How to Look for E-Mail Addresses in the Search Data
The search data, in the form released by AOL, does not contain e-mail addresses as users would have entered them. Instead, many characters are replaced by spaces. Fortunately, for e-mail addresses this often seems to have affected only the @ (at) sign. However, automatically detecting e-mail addresses that have no @ sign is not trivial. One possibility is to look for the domains of well-known freemail providers. For example, the search queries contain several possible e-mail addresses that end in "gmail.com".
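A minimal sketch of the freemail-domain idea, assuming the @ sign was replaced by whitespace as described above. The domain list `FREEMAIL_DOMAINS` and the function `candidate_addresses` are hypothetical names chosen for illustration. This is not necessarily how the addresses discussed here were found, and it will produce false positives such as "www yahoo.com".

```python
import re

# Hypothetical list of freemail domains; extend as needed.
FREEMAIL_DOMAINS = ("gmail.com", "yahoo.com", "hotmail.com", "aol.com")

# A local part, then whitespace where the @ presumably was, then a known domain.
_CANDIDATE = re.compile(
    r"([a-z0-9][a-z0-9._-]*)\s+("
    + "|".join(re.escape(d) for d in FREEMAIL_DOMAINS)
    + r")\b"
)


def candidate_addresses(query):
    """Return possible e-mail addresses in a query, with the @ sign re-inserted."""
    return [f"{local}@{domain}" for local, domain in _CANDIDATE.findall(query.lower())]


print(candidate_addresses("email john.doe gmail.com please"))  # ['john.doe@gmail.com']
```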
Looking for Obvious E-Mail Addresses
For starters, I had my faithful slave, the computer, print out all entries that contained "mail.com", "aol.com" or "yahoo.com", but not as the first word of a query.
More than 100 Google Mail addresses can be found this way. However, recall that all these queries came from AOL users, so these e-mail addresses probably don't belong to the users who searched for them.
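The filter just described could look roughly like this. The sketch assumes the tab-separated layout of the released log files, with the anonymized user ID in the first column, the query in the second and a header line starting with "AnonID". It reads "not as the first word" as "the pattern occurs in some word after the first one", which is one possible interpretation. `is_candidate` and `scan` are hypothetical helper names.

```python
# Substrings searched for; "mail.com" also catches gmail.com and hotmail.com.
PATTERNS = ("mail.com", "aol.com", "yahoo.com")


def is_candidate(query):
    """True if a pattern occurs in some word other than the first one."""
    words = query.lower().split()
    return any(p in word for word in words[1:] for p in PATTERNS)


def scan(lines):
    """Yield (user_id, query) for matching rows of a tab-separated log.

    Assumes the user ID is in column 1, the query in column 2 and a header
    line starting with "AnonID".
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0] == "AnonID":
            continue
        if is_candidate(fields[1]):
            yield fields[0], fields[1]
```

To use it, open one of the log files and iterate over `scan(f)`, printing each user ID and query.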
Other Screen Names
Besides AOL and Yahoo, other online services appear in the search data as well. For example, a number of queries seem to contain the names of myspace profiles. But how does one recognize a screen name?
Putting it All Together
One user ID is connected to a large number of searches for a Google Mail address with minor spelling variations, all apparently belonging to the same person. This can be inferred from the search queries because the user part of the e-mail addresses in question (the part before the @ sign) appears to be a person's real name. This user ID also searched for vanity license plates in California and for a newspaper in Vallejo, California, and she is on myspace. She could quite possibly be identified by asking the people whose e-mail addresses she entered into AOL search. However, nothing new would be learned from that, because her record contains little more than what I have outlined here.
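The text doesn't say how the spelling variants were spotted; reading the queries is the likely method. As a hedged illustration, such variants could also be grouped automatically with a string-similarity measure. This sketch uses Python's `difflib.SequenceMatcher` on the part before the @ sign, with an arbitrary threshold of 0.8. The input is assumed to be (user ID, address) pairs, for example addresses reconstructed from freemail domains; `variant_groups` and `largest_group_per_user` are hypothetical names.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def local_part(address):
    """The user part of an address, i.e. everything before the @ sign."""
    return address.split("@", 1)[0]


def variant_groups(addresses, threshold=0.8):
    """Greedily group addresses whose local parts look like spelling variants.

    Each address joins the first group whose first member is similar enough.
    """
    groups = []
    for addr in sorted(set(addresses)):
        for group in groups:
            ratio = SequenceMatcher(None, local_part(addr), local_part(group[0])).ratio()
            if ratio >= threshold:
                group.append(addr)
                break
        else:
            groups.append([addr])
    return groups


def largest_group_per_user(pairs, threshold=0.8):
    """Map each user ID to its largest group of similar addresses."""
    by_user = defaultdict(list)
    for user_id, address in pairs:
        by_user[user_id].append(address)
    return {
        user_id: max(variant_groups(addrs, threshold), key=len)
        for user_id, addrs in by_user.items()
    }
```

A user ID whose largest group is unusually big would be a candidate for the kind of case described above. The greedy grouping is order-dependent and only meant as a rough first pass.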
A rather large number of queries mention myspace. Many of them appear to be web addresses erroneously pasted or typed into the search box. Unsurprisingly, the user with the most web addresses containing "myspace.com" is again the user with the most total queries. The user with the second-most queries matching this pattern is more interesting. One of the URLs she searched for is a link that contains 40 unique myspace user IDs. If these 40 myspace users are friends (the myspace kind) of the person behind this user ID, her identity could be found by looking at their friends lists and finding the people who appear in all of them. However, the rest of her recorded queries aren't particularly informative, and you'd probably learn a lot more about her from her myspace profile, which is publicly available on the internet anyway.
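Ranking users by how many of their queries contain "myspace.com" is a simple count. A minimal sketch, assuming (user ID, query) pairs have already been read from the logs; `rank_users_by_pattern` is a hypothetical name. In the data described above, the first entry would be the heavy user and the second the more interesting one.

```python
from collections import Counter


def rank_users_by_pattern(pairs, pattern="myspace.com"):
    """Return (user_id, count) pairs, users with the most matching queries first."""
    counts = Counter(
        user_id for user_id, query in pairs if pattern in query.lower()
    )
    return counts.most_common()
```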
A different user ID that matches these criteria seems to belong to a hobbyist guitar player, because almost all of her searches are related to songs, lyrics and guitar chords. She also searches for similar myspace links containing three different myspace user IDs. All three of these profiles belong to high school juniors in Nebraska. One of them is private, so you need to be that person's friend to see her friends list. The other two profiles are public, with friends lists of 106 and 152 friends respectively. The identity behind this user ID could very likely be found by looking for common friends of these two myspace users, especially with the added clue of a certain guitar obsession. This person is probably also a junior at the same high school. As in the previously described case, though, there is nothing interesting to be found in this user ID's queries.
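Both this case and the previous one come down to finding people who appear in several friends lists. A sketch, assuming the friends lists have already been collected by hand as lists of profile IDs (nothing here fetches data from myspace). `common_friends` and its `min_lists` parameter are hypothetical; `min_lists` relaxes the "appears in all lists" rule, which helps when a list is incomplete or private.

```python
from collections import Counter


def common_friends(friend_lists, min_lists=None):
    """Return IDs that appear in at least min_lists of the friends lists.

    By default an ID must appear in every list.
    """
    if min_lists is None:
        min_lists = len(friend_lists)
    counts = Counter()
    for friends in friend_lists:
        counts.update(set(friends))  # count each ID at most once per list
    return {fid for fid, n in counts.items() if n >= min_lists}
```

With the two public lists of 106 and 152 friends, `common_friends([list_a, list_b])` would yield the candidate set, which could then be narrowed down further, for instance by an interest in guitar.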
Looking for Password Reset Links
Sometimes, a user of an online community will forget her password. She'll go to a web page where she can request a new one and perhaps receive a link by e-mail. She'll click that link and be sent to a web page where she can reset her password. Since some users paste URLs into the search box, some of those links may have ended up in the search logs.
There is exactly one such link for a myspace account in the log file. It contains the user ID of the person who requested the password reset. The bad news is that it again comes from the AOL user ID with the most total queries, so the other searches associated with this user ID are probably not from the same person. One could, however, trivially find out where that person used AOL from March to May 2006. Knowing her myspace profile also helps with locating the shared AOL account this person used. Read more about that here.
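Spotting such links can be approximated with a keyword filter. The hint words in `RESET_HINTS` are guesses, since the actual format of myspace's reset links isn't given here, and every match would still need to be checked by hand. `looks_like_reset_link` and `find_reset_links` are hypothetical names; the input is assumed to be (user ID, query) pairs.

```python
# Hypothetical hint words; the real format of myspace reset links is unknown here.
RESET_HINTS = ("password", "passwd", "reset", "recover", "forgot")


def looks_like_reset_link(query, site="myspace"):
    """True if the query mentions the site and at least one reset-related word."""
    q = query.lower()
    return site in q and any(hint in q for hint in RESET_HINTS)


def find_reset_links(pairs, site="myspace"):
    """Return the (user_id, query) pairs that look like pasted reset links."""
    return [(uid, q) for uid, q in pairs if looks_like_reset_link(q, site)]
```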