In my last blog I reviewed some recent patents in the AML/CTF space. They describe what I consider very rudimentary analytics workflows: fairly simple scoring and weighting using various a-priori measures. Why are such simple approaches patentable? To give you a sense of why I would ask this question, there was a great trumpeting of news around the closing of a $6 billion money laundering operation at Liberty Reserve. But money laundering (including terrorism funding) is estimated at $500 billion to $1 trillion per year. That’s a lot of badness that needs to be stopped. Hopefully smarter is better.
There are predictive analytical solutions to various parts of the AML problem, and there is a movement away from rules-only systems (rules are here to stay, however, since policies must still be applied to predictive results). However, the adoption of predictive analytics is slowed because AML analytics boils down to an unsupervised learning problem. Real-world test cases are hard to find (or create!) and the data is exceptionally noisy and incomplete. The short message is that it’s a really hard problem to solve, and sometimes simpler approaches are just easier to make work. In this note, I’ll describe the issues a bit more and talk about where more advanced analytics have come into play. Oh, and do not forget: on the other side of the law, criminals are actively and cleverly trying to hide their activity, and they know how banks operate.
The use of algorithms for AML analytics is advancing. Since AML analytics can occur at two different levels, the network level and the individual level, it’s pretty clear that graph theory and other techniques that operate on the data in various ways are applicable. AML analytics is not simply about predicting that a particular transaction, legal entity (LE) or group of LEs is conducting money laundering operations. It’s best to view AML analytics as a collection of techniques, from probabilistic matching to graph theory to predictive analytics, that combine to identify suspicious transactions or LEs.
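To make the network-level view concrete, here is a minimal sketch of link analysis: it treats transfers as edges between LEs and groups together every LE reachable from another. The LE identifiers, amounts and the `connected_groups` helper are hypothetical and only illustrate the idea; a real system would add direction, timing and amounts to the graph.

```python
from collections import defaultdict

def connected_groups(transfers):
    """Group legal entities that are linked by at least one chain of transfers."""
    # Build an undirected adjacency map from (sender, receiver, amount) tuples.
    neighbors = defaultdict(set)
    for sender, receiver, _amount in transfers:
        neighbors[sender].add(receiver)
        neighbors[receiver].add(sender)

    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(sorted(group))
    return groups

# Hypothetical transfers: (sender LE, receiver LE, amount)
transfers = [
    ("LE-A", "LE-B", 9500.0),
    ("LE-B", "LE-C", 9400.0),
    ("LE-D", "LE-E", 120.0),
]
groups = connected_groups(transfers)
print(groups)
```

Here LE-A and LE-C never transact directly, yet they land in the same group because both deal with LE-B. That is the kind of relationship an individual-level view would miss.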
If AML analytics is still maturing, what is the current state? Rather simple, actually. Previous systems, including home-grown systems, focused on the case management and reporting aspects (that’s reporting as in reporting on the data to help an analyst analyze some flows, as well as regulatory reporting). AML analytics was also typically based on sampling!
Today, big data can help to avoid sampling issues. But current investments are focused on the data management aspects, because poor data management capabilities have greatly increased the cost of implementing AML solutions. FS institutions desperately need to reduce these costs and comply with what will be an ever-changing area of regulation. “First things first” seems to be the general thrust of AML investments.
Since AML analysis will be based on Legal Entities (people and companies) as well as products, it’s pretty clear that the unique identification of LEs and the hierarchies/taxonomies/classifications of financial instruments are important data management capabilities. The quality of results from AML analytics can be greatly degraded if the core data is noisy. When you combine the noisy data problem with today’s reality of highly siloed data systems inside banks and FS institutions, the scope of trying to implement AML analytics is quite daunting. Of course, the answer is to start simple and grow from there.
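As a rough illustration of why LE identification is a data management problem, the sketch below flags candidate duplicate LEs across records by comparing normalized names. It uses plain string similarity from Python’s standard `difflib` as a stand-in for true probabilistic matching; the record fields, the IDs and the 0.85 threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())

def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def candidate_matches(records, threshold=0.85):
    """Return (id, id, score) for every pair of records whose names look alike."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = name_similarity(records[i]["name"], records[j]["name"])
            if score >= threshold:
                pairs.append((records[i]["id"], records[j]["id"], round(score, 2)))
    return pairs

# Hypothetical LE records coming from two siloed systems.
records = [
    {"id": "crm-1", "name": "Acme Holdings, Ltd."},
    {"id": "core-7", "name": "ACME Holdings Ltd"},
    {"id": "core-9", "name": "Borealis Trading Co"},
]
pairs = candidate_matches(records)
print(pairs)
```

The all-pairs comparison is fine for a sketch but grows quadratically, so real matching engines typically block on attributes such as country or date of birth first, and they weigh several fields rather than the name alone.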
I mentioned above that there are not a lot of identifiable cases for training algorithms. While it is possible to flag some transactions and confirm them, companies must file Suspicious Activity Reports (SARs) with the government. Unfortunately, the government does not send back a list of which reported cases were actually “identified” as money laundering. So it is difficult to formulate a solution using supervised learning approaches. That’s also why it is important to attack the problem with multiple analytical approaches: no one method dominates, and you need multiple angles of attack to help tune your false positive rates and manage your workload.
When we look at the underlying data, it’s important to look not only at the data but also at the business rules currently in use (or proposed). The business rules help identify how the data is to be used per the policies set by the Compliance Officer. The rules also help orient you on the objectives of the AML program at a specific institution. Since not all institutions transact all types of financial products, the “objectives” of an AML system can be very different. Since the objectives are different, the set of analytics used is also different. For example, smaller companies may wish to use highly iterative what-if scenario analysis to refine the policies and false positive rates by adjusting parameters and thresholds (which feels very univariate). Larger banks need more sophisticated analysis based on more advanced techniques (very multivariate).
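The what-if style of analysis can be sketched in a few lines: sweep a single amount threshold and see how the alert volume, false positives and missed cases move. The amounts, the analyst-confirmed labels and the candidate thresholds below are all made up for illustration, and in practice confirmed labels are exactly what is scarce.

```python
def sweep_thresholds(amounts, confirmed, thresholds):
    """For each threshold, count alerts, false positives and missed confirmed cases."""
    rows = []
    for t in thresholds:
        # A transaction raises an alert when its amount meets the threshold.
        flagged = [i for i, amount in enumerate(amounts) if amount >= t]
        true_hits = sum(1 for i in flagged if confirmed[i])
        rows.append({
            "threshold": t,
            "alerts": len(flagged),
            "false_positives": len(flagged) - true_hits,
            "missed": sum(confirmed) - true_hits,
        })
    return rows

# Hypothetical transaction amounts and analyst-confirmed outcomes.
amounts = [2500, 8000, 9500, 9900, 12000, 15000]
confirmed = [False, False, True, False, True, False]

rows = sweep_thresholds(amounts, confirmed, [5000, 9000, 10000])
for row in rows:
    print(row)
```

Raising the threshold cuts the analyst workload but eventually starts missing confirmed cases, which is the trade-off a Compliance Officer tunes when iterating on a policy.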
We’ve mentioned rules (a-priori knowledge, etc.), predictive/data mining models (of all kinds, since you can test deviations from peer groups, predicted versus actual patterns and so on using data mining methods) and graph theory (link analysis). We’ve also mentioned master data management for LEs (don’t forget identity theft) and products, as well as taxonomies, classifications and ontologies. But we also cannot forget time series analysis for analyzing sequential events. That’s a good bag of data mining tricks to draw from, and the full list is much longer. I am often reminded of a really great statistics paper called Bump Hunting in High Dimensional Data by Jerome Friedman and Nick Fisher, because that’s conceptually what we are really doing. Naturally, criminals wish to hide their bumps and make their transactions look like normal data.
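One of the simplest versions of “deviation from peer group” is a robust score per customer: how far is this customer’s volume from the peer-group median, measured in units of the group’s median absolute deviation (MAD)? The sketch below assumes hypothetical customer records with a `peer_group` and a `monthly_volume`, and the cutoff of 5 is arbitrary. It is not the bump hunting method from the paper, just a basic unsupervised outlier check.

```python
from collections import defaultdict
from statistics import median

def peer_deviations(customers):
    """Score each customer by distance from the peer-group median, in MAD units."""
    by_group = defaultdict(list)
    for c in customers:
        by_group[c["peer_group"]].append(c["monthly_volume"])

    scores = {}
    for c in customers:
        values = by_group[c["peer_group"]]
        center = median(values)
        mad = median(abs(v - center) for v in values)
        # A MAD of zero means the group has no spread to measure against.
        scores[c["id"]] = 0.0 if mad == 0 else (c["monthly_volume"] - center) / mad
    return scores

# Hypothetical customers in two peer groups.
customers = [
    {"id": "R1", "peer_group": "retail", "monthly_volume": 1000},
    {"id": "R2", "peer_group": "retail", "monthly_volume": 1200},
    {"id": "R3", "peer_group": "retail", "monthly_volume": 1100},
    {"id": "R4", "peer_group": "retail", "monthly_volume": 900},
    {"id": "R5", "peer_group": "retail", "monthly_volume": 9000},
    {"id": "C1", "peer_group": "corporate", "monthly_volume": 50000},
    {"id": "C2", "peer_group": "corporate", "monthly_volume": 52000},
    {"id": "C3", "peer_group": "corporate", "monthly_volume": 48000},
]
scores = peer_deviations(customers)
flagged = sorted(cid for cid, s in scores.items() if abs(s) > 5)
print(flagged)
```

A volume of 9,000 stands out among retail customers but would be unremarkable for a corporate one, which is why the peer-group definition matters as much as the scoring method.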
On the data side, we have mentioned a variety of data types. The list below is a good first cut, but you also need to recognize that synthesized data, such as aggregations (both time-based aggregations and LE-based aggregations such as transaction->account->person LE->group LE), is also important for the types of analytics mentioned above:
- LE data (Know Your Customer – KYC)
- General Ledger
- Detailed Transaction data
- Product Data
- External sources: watch lists, passport lists, identity lists
- Supplemental: Reference data, classifications, hierarchies, etc.
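The aggregations mentioned above (transaction->account->person LE->group LE, combined with a time bucket) can be sketched as a simple roll-up. The account-to-person and person-to-group mappings, the identifiers, the dates and the monthly bucket are all hypothetical and stand in for what the KYC and reference data would provide.

```python
from collections import defaultdict

def roll_up(transactions, account_owner, person_group):
    """Sum transaction amounts per month at the account, person and group levels."""
    totals = {
        "account": defaultdict(float),
        "person": defaultdict(float),
        "group": defaultdict(float),
    }
    for tx in transactions:
        month = tx["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        person = account_owner[tx["account"]]
        group = person_group[person]
        totals["account"][(tx["account"], month)] += tx["amount"]
        totals["person"][(person, month)] += tx["amount"]
        totals["group"][(group, month)] += tx["amount"]
    return totals

# Hypothetical reference data linking accounts to people and people to groups.
account_owner = {"ACC-1": "P-1", "ACC-2": "P-1", "ACC-3": "P-2"}
person_group = {"P-1": "G-1", "P-2": "G-1"}

# Hypothetical detailed transaction data.
transactions = [
    {"account": "ACC-1", "date": "2013-05-03", "amount": 4000.0},
    {"account": "ACC-2", "date": "2013-05-17", "amount": 5500.0},
    {"account": "ACC-3", "date": "2013-05-20", "amount": 3000.0},
    {"account": "ACC-1", "date": "2013-06-02", "amount": 700.0},
]
totals = roll_up(transactions, account_owner, person_group)
print(dict(totals["person"]))
```

No single transaction here is large, but person P-1 moved 9,500 in one month across two accounts and the group moved 12,500. Patterns like that only become visible at the aggregated levels, and only if the LE mappings are trustworthy.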
Clearly, since there are regulatory requirements around SARs (suspicious activity reports), CTRs (currency transaction reports) and KYC, it is important that the data quality enhancements focus on those areas first.