Posting interesting threads in computer programming, technologies, science, etc.
Wednesday, April 25, 2012
The Benford's Law
(Sources: Wikipedia) Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than 5% of the time. This distribution of first digits is the same as the widths of gridlines on the logarithmic scale. Benford's law also gives the expected distribution for digits beyond the first, which approach a uniform distribution as the digit place goes to the right.
This result has been found to apply to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). It tends to be most accurate when values are distributed across multiple orders of magnitude.The graph to the right shows Benford's law for base 10. There is a generalization of the law to numbers expressed in other bases (for example, base 16), and also a generalization to second digits and later digits.It is named after physicist Frank Benford, who stated it in 1938, although it had been previously stated by Simon Newcomb in 1881.
Notice that if the numbers increased by a constant rate, for example by 0.3. Then we see that the numbers will not follow the Benford's law.
<Ex>
1, 1.3, 1.6, 1.9, 2.2, 2.5, 2.8, 3.1, 3.3, 3.5, 3.8, 4.1, 4.3, 4.7, 5, 5.3, 5.6, 5.9, 6.2, 6.5, 6.8, 7.1, 7.4, 7.7, 8, 8.3, 8.6, 8.9, 9, 9.3, 9.6, 9.9
You see that the number occurrences of the first digit does not follow the Benford's law. However, when you multiply each subsequent number by an exponential number it will behave properly.
<Ex>
1.0, 1.050, 1.103, 1.158, 1.216, 1.276, 1.340, 1.407, 1.477, 1.551, 1.629, 1.710, 1.796, 1.886, 1.980, 2.079, 2.183, 2.292, 2.407, 2.527, 2.653, 2.786, 2.925, 3.072, 3.225, 3.386, 3.556, 3.733, 3.920, 4.116, 4.322, 4.538, 4.765, 5.003, 5.253, 5.516, 5.792, 6.081, 6.385, 6.705, 7.040, 7.392, 7.762, 8.150, 8.557, 8.985, 9.434, 9.906, 10.401, 10.921
We see this time the occurrence of '1' as the first digit is much more frequent than any other digit. After some calculation, the percent of '1' as first digit turns out to be around 30.6%, which is almost identical as the Benford's Law 30%.
Benford's Law is such a profound discovery and it can be used in a wide variety of fields: Finance, population growth, bacteria growth, anything that related to exponential growth can be predicted using Benford's Law.
Finally, here's the simple program for Benford's Law:
/*Programmer: David Lin
This program examines the famous Benford's Law that stated
numbers growing exponentially tends to behave a certain way.
The program can be used to detect fraud in financial,population-related,
housing market, and etc.
*/
import java.util.*;
import java.io.*;
public class BenfordLaw {
public static void main(String[] args) throws FileNotFoundException{
intro();
Scanner console = new Scanner(System.in);
Scanner input = searchFile(console);
int[]numberCount = digitCount(input);
printStat(numberCount);
}
//printing the result and see the percentage of occurrences for each digit
public static void printStat(int[] numberCount){
System.out.println();
int total = sum(numberCount)-numberCount[0];
System.out.println("Digit Count Percent");
for(int i=1;i<numberCount.length;i++){
double pct = numberCount[i]*100.0/total;
System.out.printf("%5d %5d %7.3f\n",numberCount[i],i,pct);
}
}
//summing the occurrences for each digit
public static int sum(int[]numberCount){
int sum = 0;
for(int n:numberCount){
sum+=n;
}
return sum;
}
//Counting the occurrences of the first digit from a file of numbers
public static int[] digitCount(Scanner input){
int[]countTimes = new int[10];//# of times digit from 0-9 occur
while(input.hasNextInt()){
int num = input.nextInt();
countTimes[singleDigit(num)]++;
}
return countTimes;
}
//Performing the algorithm to give the first digit of the number intersted
public static int singleDigit(int num){
while(num>=10){
num = num/10;
}
return num;
}
//User friendly introduction to the program
public static void intro(){
System.out.println("Welcome to David's program !");
System.out.println("This program runs the famous Benford's Law");
}
//Search whether the file can be found
public static Scanner searchFile(Scanner console) throws FileNotFoundException{
System.out.print("Please enter the file name to be examined(Included extension): ");
String fileName = console.nextLine();
File f = new File(fileName);
while(!f.canRead()){
System.out.println("File not found. Please try again.");
System.out.print("Please enter the file name to be examined(Included extension): ");
fileName = console.nextLine();
}
return new Scanner(f);
}
}
Subscribe to:
Post Comments (Atom)
If the file has comments though that needed to be "skipped" over in order to read the data, how would you code that into this program?
ReplyDelete