Thanks for posting this topic, thirtyseven. Much easier to discuss, explain and get feedback here in the forum.
---
OK, Benford’s Law is what it is, but it’s not all it’s cracked up to be. As far as my research shows, Benford’s Law applies in two types of situations:
A) When there is no real, measurable limit to the data being analyzed (e.g., address numbers: someone could start them at any given number and end them at any given number)
B) When there is a measurable limit to the data, and it falls into a range that favors 1 as the leading digit (e.g., rolling a 20-sided die: 11 of the 20 faces, namely 1 and 10 through 19, have a leading 1, so more than half the results should)
This is relatively simple to explain (I hope). Look at two things in particular, and it becomes obvious: the odds of the leading digit being a 1 vs. the odds of it being a 9. If your data is the whole numbers from 1 to 9, the likelihood of the leading digit being 1 is 11.11% (1 in 9), as are the odds of it being 9, or any other digit.
Now, look at the odds when the data can range from 1 through 19. More than half of those numbers start with 1 (1 itself plus 10 through 19, so 11 of 19). The likelihood of the leading digit being 1 is 57.89%, whereas the odds of it being 9 (or any other digit from 2 through 9) are 5.26%.
If your data ranges from 1 to 29, the odds of the leading digit being 1 or 2 are 37.93% each, and each digit from 3 through 9 is at 3.45%.
When you get to a range of 1 to 89, every digit from 1 through 8 has odds of 12.36%, while 9 has odds of only 1.12%.
When the range is 1 to 99, the odds are once again even. Everything is back to 11.11%.
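If anyone wants to check the percentages above, here is a minimal Python sketch. It assumes the data is every whole number from 1 up to the top of the range, each equally likely, which is how I read the examples; the function name `leading_digit_shares` is just something I made up for illustration.

```python
from collections import Counter

def leading_digit_shares(upper):
    """Percentage of the integers 1..upper that start with each digit 1..9.

    Assumes every integer in the range is equally likely.
    """
    counts = Counter(int(str(n)[0]) for n in range(1, upper + 1))
    return {digit: 100 * counts[digit] / upper for digit in range(1, 10)}

# Print the breakdown for the ranges discussed in the post.
for upper in (9, 19, 29, 89, 99):
    shares = leading_digit_shares(upper)
    print(upper, {digit: round(pct, 2) for digit, pct in shares.items()})
```

Swapping in other upper limits (say 20, to match the 20-sided die) shows how quickly the split shifts as the range grows.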
Now here’s the important part: through all of those possible input data ranges:
The odds of the leading digit being 1 fluctuated between a low of 11.11% and a high of 57.89%.
The odds of the leading digit being 9 fluctuated between a low of 1.12% and a high of 11.11%.
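The lows and highs can be checked by brute force too. This is a rough sketch that assumes the ranges in question are 1 to N for every N from 9 through 99 (the span the examples walk through); `share_of_digit` and `share_extremes` are hypothetical helper names, not anything standard.

```python
def share_of_digit(digit, upper):
    """Percentage of the integers 1..upper whose leading digit is `digit`."""
    hits = sum(1 for n in range(1, upper + 1) if str(n)[0] == str(digit))
    return 100 * hits / upper

def share_extremes(digit, uppers=range(9, 100)):
    """Lowest and highest share of `digit` across the ranges 1..N, N in `uppers`."""
    shares = [share_of_digit(digit, upper) for upper in uppers]
    return round(min(shares), 2), round(max(shares), 2)

print(share_extremes(1))  # (low, high) for a leading 1
print(share_extremes(9))  # (low, high) for a leading 9
```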
So this is why Benford’s Law works in some cases, but not in others.
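Averaged over the different possible data ranges, the odds form a curve roughly like the one described by Benford’s Law, which puts a leading 1 at about 30.1% and falls off to about 4.6% for a leading 9.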
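Here is one way to try the averaging idea, as a sketch rather than a proof. Two things in it are my own assumptions: the ranges averaged are 1 to N for N from 9 through 99, and each range gets equal weight. The Benford figure for digit d is the standard log10(1 + 1/d); the function names are made up for this example.

```python
import math

def leading_digit(n):
    """Leading decimal digit of a positive integer."""
    while n >= 10:
        n //= 10
    return n

def averaged_shares(lowest_upper=9, highest_upper=99):
    """Average the leading-digit percentages over the ranges 1..N,
    for N from lowest_upper to highest_upper, weighting each range equally
    (the equal weighting is an assumption, not part of Benford's Law)."""
    counts = {digit: 0 for digit in range(1, 10)}
    totals = {digit: 0.0 for digit in range(1, 10)}
    ranges_seen = 0
    for upper in range(1, highest_upper + 1):
        counts[leading_digit(upper)] += 1  # running counts for 1..upper
        if upper >= lowest_upper:
            ranges_seen += 1
            for digit in counts:
                totals[digit] += 100 * counts[digit] / upper
    return {digit: totals[digit] / ranges_seen for digit in totals}

def benford_share(digit):
    """Benford's Law percentage for a leading digit."""
    return 100 * math.log10(1 + 1 / digit)

averaged = averaged_shares()
for digit in range(1, 10):
    print(digit, round(averaged[digit], 2), round(benford_share(digit), 2))
```

With these assumptions the averaged column falls steadily from 1 down to 9, the same general shape as the Benford column, though the numbers don't line up exactly. A different choice of ranges or weighting would move the figures.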