Researchers and practitioners use Benford's law extensively to detect fraud. The law states that the leading digits of numbers in many real-world datasets follow a specific, non-uniform distribution. To date, Benford's law has been applied in investigations of random pairs of growth rates and initial values, tax payment fraud, and accounting data fraud. The law also supports the assessment of operational effectiveness from employee data and helps address survey data problems in large datasets.

The law concerns the digits that appear in a number. It specifies how often each leading digit is expected to occur, so comparing observed with expected frequencies shows how far a distribution departs from the expected pattern and can flag the fraudulent use of certain digits in business, finance, and accounting records. The law is particularly useful for large datasets composed of several individual datasets. Individual datasets within a large dataset may not conform to Benford's law, yet the integrated dataset can still conform. Hence, the law is highly relevant to real-world problems, mainly in the management and use of data and numbers.

Based on current research, there are two popular algorithms: the first-digit and the first-two-digits algorithms. In this research paper, we focus on the first-two-digits algorithm. In datasets that obey Benford's law for the first two digits, the digits 10 appear as the leading two digits about 4.14% of the time, while the digits 99 appear as the leading two digits only about 0.44% of the time.

The current research therefore tests the claim that the law of the leading digit also applies to the leading two digits, not only to the single leading digit. To that end, it focuses on the finite observations in the selected invoice database, assessed algorithmically with an appropriate sample size. In addition, a good visual fit between the observations and the expected distribution is important. Using Monte Carlo simulation allowed us to evaluate several goodness-of-fit tests simultaneously.

Figure 1 visualizes the expected Benford distribution (blue line) for the digits 10-99. The distribution is highly right-skewed. The gray bars display the observed first two digits as percentages. For example, we observed the digits 49 almost 2.5% of the time. Measured in standard deviations, 49 shows the largest deviation from the expected Benford distribution and is therefore marked in red.

Figure 1: Expected distribution of the digits 10-99 based on Benford's law (blue line) vs. observed distribution (gray bars). The red bar indicates the largest deviation from the expected distribution based on standard deviations. Source: Franco Arda (2020).
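The ranking by standard deviations can be sketched as follows. This is a minimal illustration, not the procedure used to produce the figure: it assumes the standard deviation of each bin is the binomial standard deviation of a proportion, and the counts are made up (roughly Benford, with extra records added to bin 49). The names `z_scores` and `counts` are hypothetical.

```python
import math

# Expected Benford proportions for the first two digits 10-99.
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(10, 100)}

def z_scores(counts):
    """Return {bin: z-score} for observed counts keyed by first two digits."""
    n = sum(counts.get(d, 0) for d in EXPECTED)
    if n == 0:
        raise ValueError("no observations in bins 10-99")
    scores = {}
    for d, p in EXPECTED.items():
        observed = counts.get(d, 0) / n
        # Assumption: binomial standard deviation of a proportion.
        std = math.sqrt(p * (1 - p) / n)
        scores[d] = (observed - p) / std
    return scores

# Made-up counts: roughly Benford, plus 150 extra records starting with 49.
counts = {d: round(p * 10_000) for d, p in EXPECTED.items()}
counts[49] += 150

scores = z_scores(counts)
largest = max(scores, key=lambda d: abs(scores[d]))
print("bin with the largest deviation:", largest)
```

The bin returned by `max` is the one that would be highlighted in red in a plot of this kind.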

One popular method to classify whether a dataset obeys the law is to calculate the average absolute deviation between the observed and expected proportions across all digits, known as the mean absolute deviation (MAD). If the MAD is above a certain threshold, the dataset is classified as non-conforming to Benford's law.

Some researchers who work on fraud detection using Benford's law (Tota, Aliaj, and Lamcja, 2016) aggregate vendors (or companies) to calculate conformity to Benford's distribution. The reason for aggregating is that, to date, the necessary sample size is unknown. In other words, if we have one hundred vendors, we aggregate all their invoices and test the combined set for conformity. The aggregation is also supported by the Central Limit Theorem, which justifies that one can get a normal distribution of a large dataset by combining individual datasets with non-normal or skewed distributions.

Contrary to the common intuition that all digits should occur randomly with equal chance in real data, empirical examinations consistently show that not all digits are created equal. Low digits occur much more frequently than high digits in almost all data types, such as those relating to geology, chemistry, astronomy, physics, and engineering, as well as in accounting, financial, econometric, and demographic data sets.

The finding is named after Frank Benford, a physicist who published a seminal article on the topic. Benford started his article by noting that the first few pages of a book of common logarithms show more wear than the last few pages.

Of interest here is the fact that the first few pages of logarithm books give the logs of numbers with low first digits (e.g., 1, 2, and 3). He concluded that the first pages were more worn because most of the numbers in the world had low first digits. The first digit is the leftmost digit in a number; for example, the first digit of 153,451 is 1.

Zero is inadmissible as a first digit, so there are nine possible first digits (1, 2, ..., 9). The signs of negative numbers are ignored, so the first two digits of -32.12 are 32.
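The digit rules above can be sketched in code. The helper `first_two_digits` is hypothetical and makes two assumptions that the text does not state: leading zeros after the decimal point are skipped (so 0.0045 yields 45), and a number with a single significant digit is padded with a zero (so 7 yields 70).

```python
from decimal import Decimal

def first_two_digits(value):
    """Return the first two significant digits (10-99), or None for 0/NaN/inf."""
    d = Decimal(str(value))
    if not d.is_finite() or d == 0:
        return None
    # as_tuple().digits holds the coefficient digits without sign,
    # decimal point, or leading zeros.
    digits = d.as_tuple().digits
    first = digits[0]
    # Assumption: a lone significant digit is followed by an implicit zero.
    second = digits[1] if len(digits) > 1 else 0
    return 10 * first + second

print(first_two_digits(-32.12))   # sign ignored -> 32
print(first_two_digits(153451))   # -> 15
```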

In his research, Benford tried to collect data from as many sources as possible to include various types of data sets. His data varied from random numbers having no relationship to each other, such as the numbers from the front pages of newspapers and all the numbers in an issue of Reader's Digest, to formal mathematical tabulations, such as mathematical tables and scientific constants.

Other data sets included the drainage areas of rivers, population numbers, American League statistics, and street numbers from an American Men of Science issue. He analyzed the entire data set at hand or, in the case of large data sets, he worked to the point that he was assured that he had a fair average. All his work and calculations were done by hand, which was probably quite time-consuming.

His research showed that, on average, 30.6% of the numbers had a first digit of 1, and 18.5% had a first digit of 2. This means that 49.1% of his records had a first digit of either 1 or 2. In contrast, only 4.7% of his records had a first digit of 9.

Benford then saw a pattern in his results. The proportion for the first digit 1 was almost equal to the common logarithm of 2 (or 2/1), and the proportion for the first digit 2 was almost equal to the common logarithm of 3/2. The pattern continued through to the first digit 9, whose proportion was approximately the common logarithm of 10/9.
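As a quick check of this pattern, the logarithms can be computed directly and compared with Benford's observed proportions of 30.6%, 18.5%, and 4.7%. The function name `first_digit_probability` is only illustrative.

```python
import math

def first_digit_probability(d):
    """Common logarithm of (d + 1) / d for a first digit d in 1..9."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10((d + 1) / d)

for d in (1, 2, 9):
    print(d, f"{first_digit_probability(d):.4f}")
```

The values come out at roughly 0.301, 0.176, and 0.046, close to the proportions Benford observed, and the nine probabilities sum to 1.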

He then calculated the expected frequencies of the digits in lists of numbers, and these frequencies have now become known as Benford's law. Benford's law is usually associated with the expected first-digit (1-9) proportions, but we use a slightly more advanced version, the first-two-digits (10-99) version, proposed by Nigrini (Nigrini, 2020).

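Mathematically, a dataset satisfies Benford's law for the first two digits if the probability that the first two digits, D1 D2, equal d1 d2 is approximately:

$$P(D_1 D_2 = d_1 d_2) = \log_{10}\left(1 + \frac{1}{d_1 d_2}\right), \quad d_1 d_2 \in \{10, 11, \ldots, 99\}$$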
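A minimal sketch of the expected first-two-digits proportions, using the logarithmic form log10(1 + 1/d1d2). The names `benford_first_two` and `EXPECTED` are illustrative; the two printed bins correspond to the 4.14% and 0.44% figures quoted earlier.

```python
import math

def benford_first_two(d1d2):
    """Expected proportion of numbers whose first two digits are d1d2 (10-99)."""
    if not 10 <= d1d2 <= 99:
        raise ValueError("first two digits must be between 10 and 99")
    return math.log10(1 + 1 / d1d2)

# Expected proportions for all 90 bins.
EXPECTED = {d: benford_first_two(d) for d in range(10, 100)}

print(f"{EXPECTED[10]:.2%}")  # bin 10
print(f"{EXPECTED[99]:.2%}")  # bin 99
```

The 90 proportions sum to log10(100/10) = 1, so they form a complete probability distribution over the bins 10-99.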

Most research on Benford's distribution suggests that real data will never perfectly reflect Benford's expected proportions. Without an error term, it is too imprecise to say that a data set “does not conform to Benford's distribution.”

To test hypotheses rigorously, we need an “error measure” to evaluate whether an invoice dataset conforms. How much must it differ from the expected values to count as non-conforming?

In statistics, the chi-square test compares two categorical distributions. It is applicable here because Benford's proportions are binned (i.e., 10, 11, ..., 99), so the data is categorical rather than numerical. However, given how the statistic is calculated (Goodman, 2016), it is misleadingly sensitive to the size of the data sets being tested.
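The sensitivity to sample size can be illustrated with a small sketch. It assumes the usual Pearson chi-square statistic, written in terms of proportions as N times the sum of (observed - expected)^2 / expected, and uses made-up observed proportions in which 0.2 percentage points are moved from bin 10 to bin 99. The same distortion is then evaluated at three hypothetical record counts.

```python
import math

# Expected Benford proportions for the first two digits 10-99.
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(10, 100)}

def chi_square(observed_props, n):
    """Pearson chi-square statistic for observed proportions over n records."""
    return n * sum(
        (observed_props[d] - EXPECTED[d]) ** 2 / EXPECTED[d] for d in EXPECTED
    )

# Made-up data: identical proportions at every sample size.
observed = dict(EXPECTED)
observed[10] -= 0.002
observed[99] += 0.002

for n in (1_000, 10_000, 100_000):
    print(n, round(chi_square(observed, n), 2))
```

Because the statistic scales linearly with the number of records, the same small distortion looks negligible in a small file and large in a big one.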

Again, Nigrini (Nigrini, 2020) came up with an ingenious research approach: first, he proposed a measure that is not sensitive to the number of records, and second, he meticulously researched natural and unnatural data to determine conformity.

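The mean absolute deviation (MAD) is such a measure. It ignores the number of records because the denominator is not the number of records but the number of bins (i.e., 10-99), which always stays constant in the MAD formula:

$$\mathrm{MAD} = \frac{1}{K} \sum_{i=1}^{K} \left| AP_i - EP_i \right|$$

where AP_i is the actual (observed) proportion in bin i, EP_i is the expected Benford proportion, and K = 90 is the number of bins.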
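A minimal sketch of the MAD calculation for the first two digits, dividing the summed absolute deviations by the 90 bins rather than by the number of records. The function name and the convention of silently ignoring values outside 10-99 are assumptions made for illustration.

```python
import math
from collections import Counter

# Expected Benford proportions for the first two digits 10-99.
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(10, 100)}

def mean_absolute_deviation(first_two_digit_values):
    """MAD between observed and expected first-two-digits proportions."""
    counts = Counter(first_two_digit_values)
    # Assumption: values outside 10-99 are ignored.
    n = sum(counts[d] for d in EXPECTED)
    if n == 0:
        raise ValueError("no observations in bins 10-99")
    total = sum(abs(counts[d] / n - EXPECTED[d]) for d in EXPECTED)
    return total / len(EXPECTED)  # divide by the number of bins (90)

# Made-up sample whose bin counts follow Benford's proportions closely.
sample = [d for d, p in EXPECTED.items() for _ in range(round(p * 100_000))]
print(mean_absolute_deviation(sample))
```

For a sample built this way the MAD is close to zero, while a sample concentrated in a single bin gives a much larger value.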

A considerable amount of research went into analyzing natural datasets for the MAD levels, mainly by Nigrini (Nigrini, 1997). For example, earth science data gave near-perfect conformity to Benford's distribution and a MAD level of 0.0001. The results based on Nigrini's research (Nigrini, 2020):

In summary, Figure 4 shows the individual levels of conformity based on the mean absolute deviation. For this research thesis, we will ignore the more granular levels and focus only on nonconformity at 0.0022.
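The 0.0022 cut-off can be explored with a small Monte Carlo sketch. It is an illustration under stated assumptions, not the simulation used in this research: synthetic values are drawn as 10**U with U uniform on [0, 1), which gives Benford-distributed leading digits, and the sample sizes 200 and 50,000 are arbitrary. The names `simulated_mad` and `NONCONFORMITY_THRESHOLD` are hypothetical.

```python
import math
import random
from collections import Counter

EXPECTED = {d: math.log10(1 + 1 / d) for d in range(10, 100)}
NONCONFORMITY_THRESHOLD = 0.0022  # nonconformity level used in the text

def simulated_mad(n, rng):
    """MAD of n synthetic values whose leading digits follow Benford's law."""
    # 10 * 10**U lies in [10, 100); min() guards against float rounding.
    counts = Counter(min(99, int(10 * 10 ** rng.random())) for _ in range(n))
    total = sum(abs(counts[d] / n - EXPECTED[d]) for d in EXPECTED)
    return total / len(EXPECTED)

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (200, 50_000):
    mad = simulated_mad(n, rng)
    label = "nonconforming" if mad > NONCONFORMITY_THRESHOLD else "below threshold"
    print(n, round(mad, 5), label)
```

Although the MAD formula has no term for the number of records, the observed proportions of a small sample are noisy, so even genuinely Benford data tends to exceed 0.0022 when only a few hundred records are available. This is one way to see why aggregating invoices before testing is attractive.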