Benford's Law: A Practical Application for Fraud Analysis

Model in MS Excel: 

  

DOWNLOAD:

Benford's Law Model 

  

STEPS TO IMPORT YOUR DATA:

  

1. Import your data into a new sheet within the template.

  

2. Determine the field you wish to test, e.g. check amount.  Highlight that column, copy, and paste into tab "Data", column "A".  Ensure the first value of the data begins in cell A2.  Verify that only your data is within the column, i.e. ensure the example data has been deleted prior to pasting your new data.

  

3. Determine the range of the data.  A2 through (?).  The example data has a range of A2:A56072.

  

4. Update the range in tab "F1", cell B9.  The formula should reflect the proper range of your data.  Once that has been completed, the mode will calculate properly.

  

  

5. In tab "F1", cell B20 select the level of significance for the independence testing.  The default is 0.050.  For example, 0.050 indicates the probability that a chi-square test statistic falls between zero and the (upper tail) critical value is 0.95 or 95%.

  

  

6. Ensure your data meets the minimum frequency requirements of the chi-square test statistic.  Verify tab "F1", cell H23 reads "All frequencies meet the requirement."  If not, then chi-square will not apply due to insufficient size of the population for that specific frequency.  You can either accept this limitation or increase the scope of your data.  For example, include two months of data instead of merely one month. 

  

  

7. Ensure your data statistically conforms to a Benford distribution.  In tab "F1", cell C16 should read "The actual data conforms to a Benford distribution."  If not, choose another field to analyze.  If the data does conform to Benford, then you can say that statistically any deviations from the Benford distribution are due to human interference of what would otherwise be naturally occurring numbers. 

  

  

8. You can add up to two additional data fields that may be useful later for filtering criteria into tab "Evaluation", columns A and B.  In this example, check number and vendor number were added.  Verify that only your data is within the column, i.e. ensure the example data has been deleted prior to pasting your new data.

  

  

9. Risk is assigned based on the "Risk" tab.  The values with the highest risk will be those with a score of 4 in the "Evaluation" tab, column I.  This means that for those items, all four tests produced a red flag based on the top ten risk factors.  You can filter for those items or others by choosing from the drop down list in cell I1. 

  

Risk

  

Highest Risk

  

  

(or) depending on the version of MS Excel

  

Results 6

  

  

INTERPRETING THE RESULTS:

  

The graph from the F1 test of the example data indicates three digits where the actual exceeds the Benford distribution. 

  

  

This is supported where f0 > fe for the digits where k = 1, k = 2, and k = 7.  See tab "F1", cells I31, I32, and I37.  There is an increased risk when the observed frequency (fo) is greater than the expected frequency (fe) and the Z-score for the relative observed frequency (ZRf0) is positive. 

  

  

The same concept holds true when interpreting the other tests:

  

   F1: first digit (see above)

   D2: second digit

   D3: third digit

   D4: fourth digit

  

Risk is assigned for each of the tests based on the top ten highest positive differences between the observed and expected frequencies.  They are then rank ordered from the highest risk through the tenth highest risk.  Red cells indicate areas of risk whereas the green cells are within an acceptable range. 

  

  

The above example indicates that for test F1, the highest risk is when k = 1.  This means that for check amounts beginning with 1, there is the greatest risk for human manipulation based on this data.  For test D3, the third highest risk is when k = 5.  Thus, for check amounts with the third digit is 5, there is the third highest risk for human manipulation based on this data. 

  

Here, the overall greatest risk is for check amounts with the first four digits beginning with 1000.  This is because the risk ranking of 1, i.e. highest risk, is indicated for check amounts beginning with 1 (F1 test), where the second digit is 0 (D2 test), where the third digit is 0 (D3 test), and where the fourth digit is 0 (D4 test).  Thus, check amounts for this data have the greatest overall risk when the first four digits are 1000.  

  

Risk by test is summarized in the below table.

  

  

The example data indicates that the F1 test produced 32,165 red flags where k equaled a digit within a red cell.  In this example, only first digits where k = 1, k = 2, and k = 7 will produce a red flag. 

  

Here, only second digits where k = 0, k = 1, k = 5, and k = 9 will produce a red flag.  The D2 test produced a total of 24,095 red flags. 

  

In total, 100,321 red flags were produced by this data.

  

Risk by number of red flags is summarized in the below table.

  

  

The example data indicates that there were 2,958 checks that produced 4 red flags.  Since there are 4 tests, this means that all of those checks produced red flags from all 4 tests.  These represent the highest risk checks. 

  

To see which checks produced the highest risk rating, go to the "Evaluation" tab.  Column A was updated with vendor number and column B was updated with check number for the corresponding check amounts indicated in column C.  Cell I1 has a dropdown arrow.  Click the dropdown arrow and select 4.

  

  

Now the data is filtered to only show the checks which had red flags for all 4 tests.  Further analysis can be performed on these high risk checks now that they have been identified.  

  

You can also see which checks produced red flags only for a risk ranking of 1, i.e. the highest risk ranking.  Turn off the previous filter by selecting the (All) item from the dropdown list in cell I1. 

  

  

Next, go to the dropdown list in cell O1 and select 4.  This filter will now show you the checks that produced red flags for only the highest risk ranking of 1 for all 4 tests. 

  

Results 8

  

For the example data, cell Q2 indicates that 344 checks fit that criteria. 

  • Note: "RR1 4" indicates that for Risk Rating of 1 for all 4 tests (F1, D2, D3, and D4), for this data there were 344 checks.  "RR1 3" indicates that for Risk Rating of 1, i.e. the highest risk rating, for 3 of the 4 tests, for this data there were 1,411 checks. 
  • Note: "RR1 4" results show in cell Q2.  In order for the cell to be visable, ensure all filters, especially from cell O1, are turned off, i.e. set to "All".  

  

  

  

  

 Limitations                                                                                                   Sources

Copyright © 2007-2009       Alexis C Bell     -     All rights reserved