30 July 2008

On Proving your Identity

I purchased a new Dell Latitude D630 laptop today. I'm excited, but I was also quite cautious, since this is more than I'd ever spent on a laptop before. Nonetheless, my old laptop is effectively dead, and I need one that can last me through the next five years of grad school.

About an hour after I placed the order, I received a call from some 800 number. I had sort of a goofy conversation with them. Here's a summary of my conversation with him, and a later conversation I had with my bank:
Them: Hi this is Mr. Foo calling from Bar, Inc. We contract with your bank or credit union to provide fraud monitoring and prevention. We noticed some questionable transactions, and wanted to verify them with you.

Me: (unimpressed by the phrase "bank or credit union") Umm... Thanks.

Them: Could you please tell me your name and the last four of your social security number?

Me: I'd love to, but first, can you prove to me that you really are affiliated with my bank?

Them: Unfortunately, we cannot do that until you verify with us first. Besides, it's just the last four digits of your social security number---what harm is there in that?

Me: (trying not to laugh) Well, I use those last four digits to prove my identity to my bank-or-credit-union, my cell phone company, my health insurance company, and even this morning to prove my identity to my University. Those last four digits have become quite an important secret, and I don't give them out readily (any more).

Furthermore, because of the structure of social security numbers, the first five digits aren't very random. The first three digits, for instance, can be looked up as a function of the year and state of your birth, so I must assume they are public knowledge. If I tell you my last four, then you know everything except the middle two; you would be able to narrow my social down to 1 in 100.

No disrespect intended, Sir, but even if I assume that you are who you say you are, and that you have a legitimate relationship with my bank-or-credit-union, I have no assurance that your company has any decent standards for data protection, or that everyone who works at your company has good intentions.

Instead, I'll have to call my bank-or-credit-union directly. At least then, I'll know who I am speaking with.
I wasn't really trying to give this guy a hard time, but we have to protect ourselves. Given the time-coincidence between the purchase and the call, I was pretty sure that Mr. Foo was indeed who he claimed to be. Nonetheless, I opted to call my bank-or-credit-union to be sure, and had the whole situation cleared up.

My main point here is to call attention to the failings of our everyday methods of proving our identities. In nearly every context, an applicant claims and identity, and then provides a piece of knowledge that is known only to the applicant and the service provider. This could be your username/password on a website, your name/last-4 to your phone company, or your name/credit card/expiration date/billing address to a vendor. Variations of this scheme go on forever, and none of them is secure.

And the failing of all of these methods is: to prove your identity, you give away your identity. If the service provider on the other end (or even one bad employee at that service provider) decides to, he may impersonate you. If someone set up a website y0urbank.com that looked exactly like yourbank.com,
you may accidentally give them your identity. It's even possible that your service provider acts in good faith, but all of your identifying information is stolen from their databases by some malicious agent.

There are alternatives; I'll present two cryptographic solutions below. Neither of these is my invention. In this modern day and age---and considering that credit card companies are liable for fraud---I don't know why they aren't in common use.

While I'm on the topic, I should also announce that I am creating an open-source RF keyfob that performs the first authenication scheme below. That's why I developed SHA-1 for the PIC16 late last month). The device is still in the works, and has been delayed by the death of my laptop, but will hit the scene before I start grad school.

The first solution is relatively simple. Presume for a second that both the applicant (me) and the service provider (my bank-or-credit-union) share some secret S_me. You can think of this as a password, or more generally as a really long number. Instead of giving S_me away (and as a result, giving away my identity), I can prove that I know S_me. In order to authenticate:
  1. I tell my bank-or-credit-union my purported identity "Nick Johnson."
  2. My bank creates a nonce N_bank or in other words a long random number that is different each time.
  3. I compare N_bank to all of the nonces I have seen in the past. If I have seen it before, I reject it and start from the beginning.
  4. I create a second nonce N_me.
  5. I can now calculate a magic number by concatenating the digits of N_bank, S_me, and N_me. Please note that only the bank-or-credit-union and I could have produced this magic number, since only we know S_me.
  6. I compute the cryptographic hash H of the magic number.
  7. I send N_me and H to my bank-or-credit-union.
  8. My bank-or-credit-union compares N_me to all of the nonces I have sent in the past. If my bank-or-credit-union has ever seen this before, he rejects the whole transaction, and tells me to start from the beginning.
  9. My bank-or-credit-union, knowing N_bank, S_me, and N_me can now also compute H.
  10. My bank-or-credit-union compares Hs, and if they match, can be confident that I know the secret.
This way, even if someone overhears the entire conversation, they neither learn any information about my secret (because of properties of the hash function), nor do they have a dialog that they can replay (because the nonces must change each time). By performing this protocol again in reverse, my bank-or-credit-union can also prove it's identity to me.

The problem with the first scheme is the shared secret. The two parties must agree upon some secret, and communicate one to the other, hoping that no one overhears. Also, although best practice is to store passwords as hash values, this scheme requires both parties to store the passwords as cleartext, making data-theft a possible issue.

There is another algorithm to perform this sort of exchange---the classic zero-knowledge proof. It is more complicated conceptually, as well as in computation-time and -space, but doesn't require the service provider to store any secrets, and is very sexy. I would love to employ it in my RF keyfob project, but I don't think I can make it fit into a small, embedded microcontroller. Let me explain it using an analogy.
There's one slice of Pizza left, and we both want it. Neither of us trust the other to divide it evenly. So, instead, we agree that one of us will cut the slice in two, and the other has first pick from the halves. The pareto-optimal strategy for each player yields an equal half-slice; the first doesn't know what the second will choose, and so he must be prepared for either choice.

So, what if instead of knowing a secret, suppose instead that I knew the solution to a complicated problem---so complicated that it would take years to solve it. This knowledge can be used represent my identity, and I can prove that I know it without giving away the solution:
  1. Both my bank-or-credit-union and I agree upon a complicated problem to which only I know the solution.
  2. We repeat these steps until the bank is satisfied of my identity:
    1. I create a derived problem which is equally complicated to solve, but which I can solve easily since I know the solution to the original. It is important that knowing the solution to the derived problem gives no hint to the solution of the original.
    2. I tell my bank-or-credit-union about this derived problem. I have committed to this problem, and in doing so have claimed that I can demonstrate both that it is equivalent to the original problem and that I know an answer to it.
    3. I ask my bank-or-credit-union: Which would you like to see---the equivalence, or the solution? (Which half would you like to eat?)
    4. My bank-or-credit-union selects one, and I provide the requested information.
    5. My bank-or-credit-union verifies my response. If it's wrong, it rejects me. If it's right, it only knows that I wasn't wrong. In the worst case I predicted its choice in [4], and my bank-or-credit-union is now twice as confident that I know the solution to the original problem.
This aforementioned protocol, of course, requires a satisfactory problem to perform. The classic example is the problem of finding the Hamiltonian Cycle on an undirected graph. Since this problem is NP-Complete, we can make the problem arbitrarily hard by using a large enough graph. However, it is very easy to compute the Hamiltonian Cycle while you build such a graph (or perhaps, more accurately, it is easy to add a Hamiltonian Cycle to a totally random graph).

With the example above, I would generate a large graph and cycle when I open an new account with the bank, and I would tell the bank the graph, but I would keep the cycle as my secret. The secret doesn't need to travel, and so we can keep closer tabs on it.

Additionally, to perform the above protocol, I need some method by which I can create a derived problem. With the Hamiltonian Cycle problem, this is also easy: I choose a new random name for each vertex in the graph, thus producing a new problem, and I perform the same substitution on the cycle, thus producing the solution to the new problem. I have created a new problem, equally as large (and presumably, equally as difficult to solve), which I can prove equivalent to the original (by disclosing the random substitution), or which I can solve on demand (by applying the random substitution to the solution to the original). Even if I disclose the derived problem and its solution, my bank-or-credit-union gains no knowledge of the solution to the original. To do so, it would need to solve the graph isomorphism problem to find the relationship between the two problems. Graph isomorphism is in NP.

The major problem with both of these authentication protocols is that they require a lot of computation---more than any human would be able to do in his head. As a result, the human needs a piece of trusted hardware---analogous to a key---which can serve as his proxy in this exchange. Such hardware exists, often in the form of smart cards. Each implementation is a black box; no one is allowed to audit them to see if they really work the way they claim. Again, sit tight and you can play with my RF keyfob project soon.

However, like a key, anything you have can be stolen. To use this device effectively, it should be combined with something that can't be stolen, for instance something you know. This project is not the ultimate solution, just a step in the right direction.

28 July 2008

This is New York:: NYPD Untouchable

I witnessed something scary at Critical Mass last Friday. I was riding in the crowd for a few hours---a totally peaceful night, no police harassment. It seemed to me like the quite before a storm.

But when we made it to Times Square, things changed. Two young police officers jumped into the road among the cyclists, and knocked one to the ground. I was immediately behind him; I didn't see him do anything illegal or even interact with the police officers.

They simply assaulted him. He tried to flee from the unlawful arrest (which is legal, since unlawful arrest is assault), and was then arrested.

I admit that, for all I know, this guy was a wanted criminal. I think this is doubtful; even if he were, I find it hard to believe that an officer could have identified him among the hundreds of cyclists and thousands of tourists in Times Square.


Either way, be very scared. Riding a bike is a crime.

UPDATE: Here's a video of it on YouTube:



Also seen on Gothamist

UPDATE: Streetsblog reports,
Mark Taylor, an attorney with the firm representing the cyclist, says he is hopeful the charges will be dropped in light of the video evidence. Asked whether the NYPD plans to go ahead with the charges, a department spokesman said the matter is being investigated. Since the video surfaced, the officer has been put on desk duty.
I hope justice might be served for once.

24 July 2008

Recovering deleted files from ext-3 filesystems

Exercise for the reader: where can you insert a space in this command to delete all of your files: "rm *.toc" ?

At work today, I managed to delete a document I had been writing for an hour and a half, just as I completed it and was about to add it to source control. What a royal pain in the butt.

It's no secret that most operating system do not actually erase data from a drive when they delete a file; rather, they update meta-data to mark the file as deleted. For this reason, there are "undelete" utilities for various operating system. And, although I found plenty for the ext-2 filesystem, there are none for the ext-3 filesystem. In fact, the journaling-aspect of ext-3 actually zeroes-out some imporant structures on disk, making traditional undelete utilities virtually impossible.

Still, I didn't give up. There is a way, which I'll document here for google to find.

I very quickly rebooted my computer into single-user mode. Single-user mode was helpful, in my opinion, because there were fewer processes running---any of which could have written to a temporary file that, by chance, could have occupied the same disk sectors as my file.

Next, I unmounted my /home partition.

Then, I went looking for my document on the partition. In my case, /home is /dev/sda6. The beauty of unix is that it treats everything---even partitions on your harddrive---as files, and it has a rich set of commands to operate on files. Here, I will literally grep my harddrive for my file.

To do this, you also need to know a crib for your file. In my case, I knew my file contained the text "An Algebra of Dispatch Rulesets."

Then, I executed this command:

strings /dev/sda6|fgrep --before=1000 --after=1000 "Algebra of Dispatch Rulesets" >foo

The strings command reads every byte from the harddrive, and only outputs runs of at least 4 printable characters. This fast pre-processing greatly reduces the work that grep has to do.

I used fgrep instead of grep because searching for a fixed pattern is much faster than searching for a regex. The --before= and --after= flags tell fgrep to print out 1000 lines of context before and after the match.

After this command completed, I was lucky enought to find the final draft of my document---among a lot of crap---in the file foo. I removed the crap, and recovered my file.

I was lucky in a couple of ways, and you may not be so lucky. First of all, my document was written in LaTeX---a plaintext format---which made it easy for me to identify the beginning and end of my document.

Also, my disk is only about 10% full, which means the operating system has no need to fragment files. As a result, my file appeared as a single, contiguous run in foo. On a more crowded disk, the OS will break the file up into blocks and stores them wherever; in that case, one would have to search for each of the blocks and re-assemble them into the original.

Anyway, I got my file back, and I hope you do too.

Oh yeah, recovering the file took about 2 hours. I probably should have just re-written it.

UPDATE: A friend tells me that this utility can undelete removed file on an ext3 filesystem.