APG-L Archives

From: Ray Beere Johnson II
Subject: Re: [APG] Social Security numbers
Date: Wed, 8 Jul 2009 07:42:26 -0700 (PDT)

First, I hope you also noted my statement that, while this was of theoretical interest, there was much room to wonder if there was any practical application. Anyway, the SSDI is _not_ needed: you could obtain the same results by stealing SSNs and data on a large group of individuals and analysing that.
It is the same old story. Computers are a tool. The same tool that makes our lives easier makes crime easier. No doubt when the first crowbar was invented, all the villagers were screaming about how now anyone could get into their cottages. :-)
But, even if the sky remains in place, it _is_ important to understand what is really happening. See my remarks below.

--- On Wed, 7/8/09, Richard A. Pence wrote:

> The patterns really only work after 1988, when the SSA began to
> automatically issue Social Security numbers at birth.
> Prior to that there was (1) no firm relationship between date of birth
> and date a card was issued nor (2) place of birth and place card issued
> (the study even wrongly states that the Death Master File gives the
> state of birth; it does not).
> It wasn't until the 1970s (as I recall) that you needed
> Social Security numbers for children and then you didn't
> really need them unless their financial status was such that
> you had to file tax information.

The patterns work _better_ after 1988, but there _is_ some correlation even in earlier years. I agree that the SSDI does not give the state of birth - but here is how the correlation works. A statistically significant number of individuals in the SSDI had SSNs issued in the same state in which they were born. The algorithm caught that correlation - the fact that the reporter mistook "issuing state" for "state of birth" is irrelevant to this (although I agree the reporter and the paper's fact-checking staff - if they still have fact checkers - were pretty sloppy).
I don't have the technical article I read in front of me right now, but I believe the success rate for predictions of numbers issued after 1988 was about 25% - and that for numbers issued earlier was much lower, single digits, but still enough to notice. What is the real lesson here? If you have powerful computers, and people who know how to use them, you can pull useful information from nearly _any_ large pool of data. Lawmakers can't stop this: our society runs on data.
Once you understand how this works, two things are clear. A: this probably isn't a 'real-world' threat. B: even if the method is eventually adopted in the real world, there is no way to stop it short of going back to horses and buggies. All that is a sideshow, in my opinion. What is really interesting about this is the potential it reveals for "pattern matching" software to extract useful information from data pools. Think of what might come out if we could persuade the geeks at Carnegie Mellon to enter a bunch of census data, vital records, tax lists, and so on and see what patterns emerged. :-D
For that matter, I'd love to see one statistic I haven't seen highlighted anywhere. What exactly is the percentage of individuals in the SSDI whose SSNs were issued in the same state where they were born - and how did this vary by decade? Now _that_ would be a useful purpose for the study.
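To make the question concrete, here is a rough sketch of how that percentage could be tallied by decade. It is only an illustration under stated assumptions: the function name, the record layout, and the sample rows are all made up, and since the SSDI does not give the state of birth, that column would have to come from some other source.

```python
from collections import defaultdict


def same_state_share_by_decade(records):
    """Return {decade: percent of records whose issuing state equals birth state}.

    `records` is an iterable of (birth_state, issue_state, issue_year) tuples.
    This layout is hypothetical; it is not the SSDI's actual format.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for birth_state, issue_state, issue_year in records:
        decade = (issue_year // 10) * 10  # e.g. 1957 -> 1950
        totals[decade] += 1
        if birth_state == issue_state:
            matches[decade] += 1
    # Sort the decades so the result reads chronologically.
    return {d: 100.0 * matches[d] / totals[d] for d in sorted(totals)}


# Made-up sample rows, for illustration only (not real data).
sample = [
    ("OH", "OH", 1952),
    ("OH", "MI", 1957),
    ("TX", "TX", 1963),
    ("NY", "NY", 1968),
    ("NY", "NJ", 1969),
    ("CA", "CA", 1991),
]

result = same_state_share_by_decade(sample)
for decade, pct in result.items():
    print(f"{decade}s: {pct:.1f}% issued in state of birth")
```

Grouping by year of issue is one choice; grouping by year of birth would answer a slightly different question, and only the decade calculation would need to change.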
Ray Beere Johnson II
