Welcome to our comprehensive research.


This part is rather technical, though it does not give away algorithmic secrets to potential money launderers.


If you are looking for a condensed version of this research, check out our White Paper.

The following research is based on my doctoral thesis:


A Short Introduction to Benford’s Law


Benford's Law is named after the American physicist Frank Benford (1883 – 1948), who noticed that the early pages of the large books of logarithm tables, widely used for scientific calculations before computers, were more worn than the later ones. Benford went on to examine many different data sets, from the areas of rivers to death rates. In all of them, he found that numbers with low leading digits occurred more often than numbers with high leading digits.


If you Google Benford's Law, you're most likely to find the first-digit test. In our research, however, we use a more advanced test: the first-two-digit test. Why? The ACFE (Association of Certified Fraud Examiners) is the world’s leading association of fraud examiners. In a recent paper called "Using Benford's Law to Detect Fraud," it recommended the first-two-digit test for selecting efficient audit samples. Our research confirms this finding: the more advanced test lets us drill down to a more granular level in money laundering analysis.


The first-two-digit test formula:
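For reference, the standard statement of Benford's Law for the first two digits assigns each two-digit combination d an expected probability as follows, where D_1 D_2 (our notation) is the number formed by an amount's first two digits:

```latex
P(D_1 D_2 = d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d \in \{10, 11, \dots, 99\}
```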




Let's illustrate Benford's formula with a fictitious dataset. An AML (anti-money laundering) specialist discovered anomalies for the following company:


Figure 1: A fictitious TBML dataset.

In its primary form, TBML (trade-based money laundering) revolves around invoice fraud and the associated manipulation of supporting documents. When a buyer and seller work together, the price of goods (or services) can be whatever the parties want it to be, so manipulated invoices are hard to detect. For an AML specialist, the situation is even more difficult because they deal with documents, not the goods (or services) themselves. However, as we will see, with Benford's Law the specialist only needs the invoice amounts, nothing more.

In the case above (figure 1), how often do we expect to see invoices such as $51,523? Based on Benford's Law, the calculation is as follows:
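Plugging the digits 51 into the standard first-two-digit formula gives the expected share of invoices such as $51,523:

```latex
P(D_1 D_2 = 51) = \log_{10}\left(1 + \frac{1}{51}\right) = \log_{10}\left(\frac{52}{51}\right) \approx 0.0084 \approx 0.8\%
```

In other words, fewer than 1 in 100 invoices should start with 51.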


Essentially, the log transformation gives us the expected probability for each digit.


Benford’s first-two-digit test only reviews invoices of $10 or higher (lower invoices are ignored), and it only examines their first two digits. For the $51,523 invoice, for example, the test looks only at 51. Across all invoices, these digit pairs reveal abnormal duplications and possible biases.
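A minimal R sketch of this digit-extraction step, assuming amounts are stored as plain numeric dollar values. The function name `first_two_digits` is hypothetical and not part of BenfordAnalytics:

```r
# Hypothetical helper (not BenfordAnalytics code): return the first two
# digits of each invoice amount, dropping invoices below $10 as the
# first-two-digit test requires.
first_two_digits <- function(amounts) {
  amounts <- amounts[amounts >= 10]
  # The integer part of an amount >= 10 has at least two digits and the
  # same leading digits; sprintf("%.0f") avoids scientific notation.
  as.integer(substr(sprintf("%.0f", floor(amounts)), 1, 2))
}

first_two_digits(c(51523, 9.99, 10, 1250.75, 99.5))
# [1] 51 10 12 99
```

Working from the integer part as a string sidesteps floating-point surprises that a purely logarithmic digit extraction can hit at exact powers of ten.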


Visually, we can see that the digits 51 were expected in 0.8% of invoices but observed in 3.6%, which results in a delta (deviation from the expected distribution) of 2.8 percentage points.
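The comparison behind this delta can be sketched in R. This is a minimal illustration, not BenfordAnalytics code: `benford_table` is a hypothetical helper that takes a vector of first-two-digit values (10 to 99) and returns expected and observed shares plus their difference. Multiply by 100 to read the delta in percentage points.

```r
# Hypothetical helper: expected vs. observed first-two-digit shares.
# `digits` is an integer vector of first-two-digit values in 10-99.
benford_table <- function(digits) {
  expected <- log10(1 + 1 / (10:99))   # Benford expectation per digit pair
  # tabulate() counts bins 1..90, so shift the values 10-99 down by 9
  observed <- tabulate(digits - 9L, nbins = 90) / length(digits)
  data.frame(digits = 10:99, expected = expected,
             observed = observed, delta = observed - expected)
}

# Toy input: two of four invoices start with 51, so its observed share is 0.5
tab <- benford_table(c(51L, 51L, 10L, 12L))
tab[tab$digits == 51, ]
```

With real data, plotting `observed` against `expected` for 10 to 99 gives the kind of view shown in figure 2.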


Figure 2: Benford’s distribution in Tableau.

In data science, the visualization in figure 2 is one of the essential visualization types. It's called a histogram. Here, the x-axis shows the first two digits (invoices starting with the digits 10 to 99) and the y-axis shows the observed share of invoices for each digit pair. Why is it so important? One reason is that it lets us easily spot anomalies in our data.

"Benford generally conforms to accounting and finance data."

Prof. Dr. Mark Nigrini (Forensic Analytics, 2020)

The visualization also leverages our pre-attentive attributes, which are the visual cues humans process automatically in sensory memory. As a result, we can spot suspicious money flows based on Benford's Law quickly and without extraordinary effort. In figure 2, for example, we would start by investigating invoices beginning with the digits 51, such as $51,523.


Summary: BenfordAnalytics is an additional tool that helps financial services companies detect TBML. It should not be your only tool (e.g., it doesn't screen customers), but it is very powerful at identifying TBML schemes.


The following dataset (figure X), from the fictitious company Starlight Software, LLC, has not been flagged as suspicious. The observed share of each digit pair (e.g., 47) is close to Benford's expected distribution, and no bar is highlighted in red. Thus, at a 95% confidence level, we do not suspect any money laundering activity.


Figure 4: Based on Benford's Law, a non-suspicious dataset.

Let's now turn our attention to a dataset that Benford's Law has flagged as suspicious. The invoice amounts are most likely randomized, as the distribution looks reasonably normal: most observed shares fall within ±1 standard deviation (dotted orange lines).


Figure 5: A highly suspicious dataset that has been flagged based on Benford's Law.

As the research results show, randomized or nearly randomized invoices are the easiest to spot with BenfordAnalytics. In my research, randomized invoices were detected with 100% accuracy.


The next money laundering strategy is much harder to detect.


Benford's Law has also flagged the following dataset. However, it is harder to detect than the randomized invoices (figure X). Why? The observed distribution follows Benford's expected distribution most of the time, with only a few anomalies for invoices starting with 51 and 55 (bars in dark red).


Figure 6: A suspicious dataset that has been flagged based on Benford's Law.

What does "harder to detect" mean? In general, we want to detect money laundering activities as quickly as possible. However, if a money launderer creates invoices that follow Benford's expected distribution most of the time, the algorithm needs more samples before it can flag the dataset with confidence.


A visualization of the money flows can help your investigation. For our flagged company, Padlock Solutions, Inc., BenfordAnalytics shows all of its money flows, with the largest flows (by total amount) color-coded in red.


Figure 7: A suspicious dataset that has been flagged based on Benford's Law.

Conclusion: BenfordAnalytics empowers you to detect trade-based money laundering directly in Tableau. While it probably won't be the only software you use to detect TBML, BenfordAnalytics can detect money laundering attempts with up to 100% accuracy.

The 3 Worst Mistakes in Using Benford’s Law for AML

During my doctoral studies, I've uncovered some grave mistakes in using Benford's Law. Based on my research, the worst mistakes are:


(1) Some companies use percentage bands to detect anomalous behavior. While this technique might make intuitive sense, it's based on arbitrary numbers (e.g., +/- 10%). The result? We label too many transactions as potential money laundering (i.e., false positives) or miss actual money laundering (i.e., false negatives). Both are terrible. Based on my research, a better alternative is a research-based threshold such as the MAD (Mean Absolute Deviation) thresholds developed by Prof. Mark Nigrini.


(2) Ignoring the sample size is a common mistake, even among statisticians. A quick analogy illustrates this crucial point: we can't ask ten people in Florida whether they will vote for Trump and then generalize the findings to the entire US population. In other words, we need a certain number of transactions before we can use the algorithm. Based on my research, this mistake creates many false alarms (i.e., false positives), which can be tremendously frustrating for investigators. The better alternative? Determining the required sample size for a chosen confidence level, such as 90%, 95%, or 99%.


(3) The last mistake is so severe that it makes Benford's Law useless. For example, let's say we have 100 suspicious persons of interest for potential trade-based money laundering. If we calculate Benford's Law conformity on the aggregated data of all 100 persons, we might miss the money launderer: the launderer's anomalous invoices are diluted by the compliant invoices of the other 99 persons, so the pooled result is very likely misleading. However, if we analyze each person of interest separately (i.e., not aggregated), we have a much higher chance of detecting the money launderer.
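A minimal R sketch of why aggregation hurts, using simulated data rather than the study's dataset and scaled down to ten hypothetical persons: nine draw their first two digits from Benford's distribution, one randomizes them, and MAD (discussed in the next section) serves as the conformity measure. The persons, invoice counts, and seed are illustrative assumptions.

```r
# Illustrative simulation (hypothetical data): pooled vs. per-person checks.
# Two-digit MAD = mean absolute deviation over the 90 digit pairs 10-99.
benford_probs <- log10(1 + 1 / (10:99))
mad_two_digit <- function(digits) {
  observed <- tabulate(digits - 9L, nbins = 90) / length(digits)
  mean(abs(observed - benford_probs))
}

set.seed(42)
per_person <- 5000                       # invoices per person (assumed)
persons <- paste0("P", 1:10)             # hypothetical persons of interest
# P1-P9 follow Benford; P10 randomizes its invoices (uniform digits 10-99)
digits <- c(sample(10:99, 9 * per_person, replace = TRUE, prob = benford_probs),
            sample(10:99, per_person, replace = TRUE))
person <- rep(persons, each = per_person)

pooled_mad <- mad_two_digit(digits)      # P10 is diluted by the other nine
per_person_mad <- sapply(split(digits, person), mad_two_digit)
round(pooled_mad, 4)
round(per_person_mad, 4)                 # P10 stands out clearly
```

On the pooled data, P10's deviation is spread over ten times as many invoices; analyzed separately, its MAD is several times larger than anyone else's.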


Summary: some mistakes in using Benford's Law make the approach useless. BenfordAnalytics is designed to avoid these mistakes, so you can use it with confidence.

The Mean Absolute Deviation (MAD)

Hardly any dataset conforms precisely to Benford's expected distribution. Thus, we need a statistical measure that indicates how much deviation from Benford's expected distribution is acceptable. In statistics, a standard measure of conformity for categorical data is the chi-squared test. However, one problem with applying the chi-squared test to Benford's Law is that it relies on an arbitrary p-value. An alternative proposed by Prof. Mark Nigrini is the Mean Absolute Deviation (MAD). MAD takes the absolute deviations between the observed and expected proportions, sums them, and divides the sum by the number of digit combinations (90 in the first-two-digit test, for the digits 10 through 99).
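Written out, with AP_i the actual (observed) proportion and EP_i the expected Benford proportion of the i-th digit combination (the AP/EP notation Nigrini uses), and K the number of digit combinations:

```latex
\mathrm{MAD} = \frac{1}{K} \sum_{i=1}^{K} \left| AP_i - EP_i \right|,
\qquad K = 90 \text{ (first-two-digit test, 10 to 99)}, \quad K = 9 \text{ (first-digit test)}
```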


Compared to other statistical tests, MAD has a distinctive advantage: its thresholds are based on empirical observation. Prof. Mark Nigrini observed how much natural data deviates from Benford's Law and used those observations to define the threshold above which a dataset is classified as nonconforming. For the first-digit test, he found that a MAD above 0.015 indicates nonconformity. This is crucially important: with these MAD levels, we can clearly determine whether a dataset conforms or not.
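A minimal R sketch of the first-digit MAD check with the 0.015 cutoff quoted above. The helper names `first_digit_mad` and `is_nonconforming` and the simulated inputs are hypothetical; this illustrates the threshold idea, not BenfordAnalytics itself.

```r
# Hypothetical helpers: first-digit MAD and Nigrini's 0.015 cutoff.
# `first_digits` is an integer vector of leading digits (1-9).
first_digit_mad <- function(first_digits) {
  expected <- log10(1 + 1 / (1:9))
  observed <- tabulate(first_digits, nbins = 9) / length(first_digits)
  mean(abs(observed - expected))
}

is_nonconforming <- function(first_digits, cutoff = 0.015) {
  first_digit_mad(first_digits) > cutoff
}

set.seed(7)
benford_like <- sample(1:9, 5000, replace = TRUE, prob = log10(1 + 1 / (1:9)))
randomized   <- sample(1:9, 5000, replace = TRUE)   # uniform leading digits
is_nonconforming(benford_like)  # FALSE: close to Benford
is_nonconforming(randomized)    # TRUE: MAD far above 0.015
```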


Benford's Law's Major Weakness

Historically, Benford's Law has had a significant weakness: a high false-positive rate. In AML, a false positive means wrongly classifying a dataset as nonconforming (i.e., as money laundering). However, based on my research, we can reduce the false-positive rate substantially by introducing confidence levels based on the sample size. This was the focus of my thesis. In simple terms, the more samples we have, the more confident we can be in our prediction.

For example, at a 99% confidence level, we need more samples to be confident about our prediction. Conversely, at a 90% confidence level, we need fewer samples. In AML, a confidence level of 95% is generally a sweet spot.

To determine the sample size required for a desired confidence level, we used a Monte Carlo simulation in R.
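The thesis's exact procedure is not described here, so the following is only a minimal R sketch of one way such a simulation can work. For each candidate sample size n, it simulates compliant datasets, counts how often a fixed two-digit MAD cutoff wrongly flags them, and keeps the smallest n whose false-positive rate is at most 1 minus the confidence level. The 0.0022 cutoff is Nigrini's published first-two-digit nonconformity level, used here only as an example; the grid, simulation count, seed, and function names are hypothetical.

```r
# Illustrative Monte Carlo sketch (not the thesis code).
benford_probs <- log10(1 + 1 / (10:99))

mad_two_digit <- function(digits) {
  observed <- tabulate(digits - 9L, nbins = 90) / length(digits)
  mean(abs(observed - benford_probs))
}

# Share of n_sim simulated compliant datasets of size n flagged by `cutoff`
false_positive_rate <- function(n, cutoff, n_sim = 1000) {
  mads <- replicate(n_sim, mad_two_digit(
    sample(10:99, n, replace = TRUE, prob = benford_probs)))
  mean(mads > cutoff)
}

# Smallest grid value of n meeting the confidence target (NA if none does)
required_sample_size <- function(cutoff, confidence = 0.95,
                                 grid = seq(250, 5000, by = 250),
                                 n_sim = 1000) {
  for (n in grid) {
    if (false_positive_rate(n, cutoff, n_sim) <= 1 - confidence) return(n)
  }
  NA
}

set.seed(123)
n_required <- required_sample_size(cutoff = 0.0022, confidence = 0.95)
n_required
```

Because each rate is itself simulated, results near the boundary can shift slightly with the seed; more simulations or a finer grid narrow that uncertainty.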


Based on my research, we can dramatically reduce the high false-positive rate by using confidence levels based on the sample size.


Mathematically, we can see something extremely interesting in the sample size determination: the Central Limit Theorem (CLT) still holds, even for a strongly right-skewed distribution such as Benford's Law.
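A small R simulation illustrating the general CLT point (not the thesis's specific statistic), under assumed settings of n = 200 invoices per sample and 10,000 replications. The population of first-two-digit values is clearly right-skewed, yet the sample means are nearly symmetric and their spread matches the CLT's sigma / sqrt(n).

```r
# Illustrative CLT check with Benford-distributed first-two-digit values.
set.seed(2024)
d <- 10:99
p <- log10(1 + 1 / d)
mu <- sum(d * p)                           # population mean
sigma <- sqrt(sum((d - mu)^2 * p))         # population standard deviation
pop_skew <- sum((d - mu)^3 * p) / sigma^3  # population skewness (> 0)

n <- 200                                   # invoices per sample (assumed)
means <- replicate(10000, mean(sample(d, n, replace = TRUE, prob = p)))
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

c(pop_skew = pop_skew, means_skew = skewness(means))  # skew shrinks ~ 1/sqrt(n)
c(sim_sd = sd(means), clt_sd = sigma / sqrt(n))       # spread matches sigma/sqrt(n)
```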


The Research Results

This page is based on my doctoral thesis, “Improving the Accuracy of the Benford Algorithm via Monte Carlo Simulation for Sample Size Determination.”

In its basic form, Benford's Law is not an algorithm. However, if we add a threshold (the level at which we classify a dataset as suspicious) and a sample size (combined with the corresponding confidence interval), we get an algorithm. Why do we care? An algorithm allows us to objectively test its accuracy against large datasets. Additionally, we can automate the algorithm to analyze large datasets in near real time.
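A minimal R sketch of how a threshold and a sample-size requirement turn Benford's Law into a decision rule. The function `classify_dataset`, the minimum sample size of 2,000, and the 0.0022 cutoff (Nigrini's published first-two-digit MAD nonconformity level) are illustrative assumptions, not the thesis's values.

```r
# Hypothetical decision rule (not the BenfordAnalytics implementation):
# judge a dataset only once it reaches a minimum sample size, then let a
# MAD cutoff separate conforming from suspicious data.
classify_dataset <- function(mad, n, min_n, cutoff) {
  if (n < min_n) return("insufficient data")
  if (mad > cutoff) "suspicious" else "conforming"
}

# Illustrative values: min_n and cutoff are assumptions, not study results
classify_dataset(mad = 0.0030, n = 10,    min_n = 2000, cutoff = 0.0022)  # "insufficient data"
classify_dataset(mad = 0.0030, n = 10000, min_n = 2000, cutoff = 0.0022)  # "suspicious"
classify_dataset(mad = 0.0010, n = 10000, min_n = 2000, cutoff = 0.0022)  # "conforming"
```

Checking the sample size first is what keeps small, noisy datasets from being flagged on chance deviations alone.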

We assumed that money launderers create invoices independently of one another. Two events A and B (e.g., two different invoices) are independent when the occurrence of one event (say, A) does not affect the probability of the other (B); mathematically, A and B are independent when P(A and B) = P(A) × P(B). For example, if a money launderer creates an invoice of $125,000 and then another of $875,500, the two invoices are independent if the first amount has no influence on the second. Independence is a valuable property because it simplifies probability calculations.
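As a worked example with Benford's first-two-digit probabilities, let A be the event that the first invoice starts with 12 (as $125,000 does) and B the event that the second starts with 87 (as $875,500 does). If the two invoices are independent:

```latex
P(A \text{ and } B) = P(A) \times P(B)
  = \log_{10}\left(\frac{13}{12}\right) \times \log_{10}\left(\frac{88}{87}\right)
  \approx 0.0348 \times 0.0050 \approx 0.00017
```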

Benford’s first-two-digit test:


In probability theory, the marginal probabilities of all mutually exclusive and exhaustive outcomes (e.g., the probability of the digits 51) must sum to 1 (or 100%). In other words, if we sum up the probabilities for each digit pair [10, 11, …, 98, 99], the sum must equal 1.
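For Benford's first-two-digit probabilities, this holds exactly, because the sum telescopes:

```latex
\sum_{d=10}^{99} \log_{10}\left(1 + \frac{1}{d}\right)
  = \sum_{d=10}^{99} \left[\log_{10}(d+1) - \log_{10} d\right]
  = \log_{10} 100 - \log_{10} 10 = 2 - 1 = 1
```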

The Monte Carlo method was developed in the 1940s by the mathematicians John von Neumann and Stanislaw Ulam. Using the Monte Carlo method, we simulated the probabilities to calculate the sample size and its corresponding confidence intervals (90%, 95%, and 99%). Essentially, we simulate the behavior of compliant (non-money-laundering) invoices to calculate the required sample size. The following R code snippet runs 10,000 Monte Carlo simulations, each drawing n invoices whose first two digits (10 to 99) follow Benford's expected probabilities.

benford_probs <- log10(1 + 1 / (10:99))  # Benford's expected probabilities for 10-99 (they sum to 1)
n <- 100  # invoices per simulated dataset (illustrative value; vary it to study sample size)
sims <- replicate(10000, sample(10:99, n, replace = TRUE, prob = benford_probs))  # n x 10,000 matrix, one column per simulation

The confidence level, usually written as (1 - α), describes how reliably an interval estimate captures a population parameter (here, for compliant, non-money-laundering invoices). When α = 0.05 (the significance level), the confidence level is 95%: in repeated sampling, 95% of interval estimates constructed this way contain the population parameter.

In other words, determining the required sample size based on confidence levels solves a significant problem with Benford's Law: a generally high false-positive rate.


Figure 10: The breakthrough approach to reduce the false-positive rate.

The TBML dataset:

The point of the fake-data simulation is not to provide insight into the data or the real-world problem being studied but rather to evaluate the properties of the statistical methods being used. In other words, in science, we like to simulate different datasets and see how the algorithm behaves.

Our dataset comprises:

- 95 companies, each with 10,000 invoices.

- 5 companies, each with ten invoices.

- 5 companies using fabricated invoices (via randomized invoice amounts).
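A minimal R sketch of how such fake data could be generated. The generators are assumptions for illustration, not the study's code: compliant amounts are drawn log-uniformly between $10 and $1,000,000 (which yields Benford-distributed leading digits exactly), and fabricated amounts are drawn uniformly over the same range.

```r
# Illustrative fake-data generator (assumed, not the study's generator)
set.seed(99)
n <- 10000                                     # invoices per company
compliant  <- 10^runif(n, min = 1, max = 6)    # log-uniform, $10 to $1,000,000
fabricated <- runif(n, min = 10, max = 1e6)    # uniform ("randomized") amounts

# First two digits of amounts >= $10 via the integer part
first_two <- function(x) as.integer(substr(sprintf("%.0f", floor(x)), 1, 2))

# Share of invoices starting with 10 vs. Benford's expectation log10(1.1)
share_10 <- c(compliant  = mean(first_two(compliant) == 10),
              fabricated = mean(first_two(fabricated) == 10),
              benford    = log10(1 + 1 / 10))
round(share_10, 4)
```

Uniform amounts over whole decades put equal weight on every digit pair, so the share starting with 10 drops to about 1/90, far below Benford's roughly 4.1%.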


At a 95% confidence level, the BenfordAnalytics algorithm correctly classified all true positives (5) and all true negatives (90):


Figure 11: Confusion matrix for BenfordAnalytics study.

Assumptions in the research study: (1) None of the five companies with only ten invoices each committed money laundering; their sample size is too small for most statistical analyses. (2) We don't know how money launderers create fake invoices. Even if we had a sample, the next money launderer would most likely use a different strategy. Thus, we assume that money launderers randomize invoice amounts. This approach is not perfect, as we don’t know how money launderers actually create invoices. However, its advantage is that we can compare these research results objectively against other research papers and algorithms.

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (5 + 90) / (5 + 90 + 0 + 0) = 100%

Sensitivity = TP / (TP + FN) = 5 / (5 + 0) = 100%

In general, with money laundering, we care most about sensitivity: the share of money launderers we actually catch.
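The two formulas above as a small R sketch. The second pair of calls uses a hypothetical outcome (one missed launderer) to show why sensitivity matters more than accuracy when launderers are rare.

```r
# Confusion-matrix metrics (tp/tn/fp/fn = true/false positives/negatives)
accuracy    <- function(tp, tn, fp, fn) (tp + tn) / (tp + tn + fp + fn)
sensitivity <- function(tp, fn) tp / (tp + fn)

# Study counts reported above: 5 TP, 90 TN, no errors
accuracy(tp = 5, tn = 90, fp = 0, fn = 0)   # 1
sensitivity(tp = 5, fn = 0)                 # 1

# Hypothetical: one of the five launderers is missed
accuracy(tp = 4, tn = 90, fp = 0, fn = 1)   # 94/95, about 0.989
sensitivity(tp = 4, fn = 1)                 # 0.8
```

Missing one launderer barely moves accuracy but cuts sensitivity to 80%.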

Of course, past performance is no guarantee of future performance. However, the study shows that we can catch money launderers who randomize invoice amounts with 100% accuracy.

In other words, if the invoice amounts don't follow Benford's expected distribution, we suspect fabricated invoices. Fabricated invoices can follow many different distributions; in our study, we defined fabricated invoices as randomized.

If your company wants to test its algorithm against the dataset used here, please let me know. I’m happy to share the trade-based money laundering dataset.

Statistical Significance of the Research

Last but not least, we wanted to know whether our results were statistically significant, that is, whether using confidence levels based on the sample size improves our results compared to the traditional approach of using Benford's Law without confidence levels.

Assumption: datasets with too few samples were compliant.
Significance level (α): 0.05.

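Using a Monte Carlo simulation (10,000 simulations) to test the hypothesis, we obtained a p-value of 0.03. In other words, if our null hypothesis were true (i.e., confidence levels based on the sample size do not improve accuracy), we would observe a difference in accuracy at least as large as ours only about 3% of the time, which is below our 5% significance level.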
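The text does not spell out the test statistic. Assuming the usual Monte Carlo estimate, where T_i is the accuracy difference in the i-th of N simulations under the null hypothesis and T_obs is the observed difference, a p-value of 0.03 with N = 10,000 corresponds to 300 simulated differences at least as large as the observed one:

```latex
\hat{p} = \frac{\#\{\, i : T_i \ge T_{\mathrm{obs}} \,\}}{N}
  = \frac{300}{10\,000} = 0.03 < \alpha = 0.05
```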

Our research conclusion: the results are statistically significant at the 0.05 level (p = 0.03).


Research conclusion: 

With BenfordAnalytics, banks and government institutions gain a powerful tool in combating trade-based money laundering.