24 July 2008

Recovering deleted files from ext-3 filesystems

Exercise for the reader: where can you insert a space in this command to delete all of your files: "rm *.toc" ?

At work today, I managed to delete a document I had been writing for an hour and a half, just as I completed it and was about to add it to source control. What a royal pain in the butt.

It's no secret that most operating systems do not actually erase data from a drive when they delete a file; rather, they update metadata to mark the file as deleted. For this reason, there are "undelete" utilities for various operating systems. And although I found plenty for the ext-2 filesystem, I found none for ext-3. In fact, the journaling aspect of ext-3 actually zeroes out some important structures on disk, making traditional undelete utilities virtually impossible.

Still, I didn't give up. There is a way, which I'll document here for google to find.

I very quickly rebooted my computer into single-user mode. Single-user mode was helpful, in my opinion, because there were fewer processes running---any of which could have written to a temporary file that, by chance, could have occupied the same disk sectors as my file.

Next, I unmounted my /home partition.
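From a root shell, those first two steps look roughly like this (a sketch, not something to run blindly: telinit assumes a SysV-style init, and nothing may be using /home when you unmount it):

```shell
telinit 1      # drop to single-user mode, stopping most other processes
umount /home   # take the partition offline so nothing else can write to it
```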

Then, I went looking for my document on the partition. In my case, /home is /dev/sda6. The beauty of unix is that it treats everything---even partitions on your hard drive---as files, and it has a rich set of commands to operate on files. Here, I will literally grep my hard drive for my file.

To do this, you also need to know a crib for your file---a string you know it contains. In my case, I knew my file contained the text "An Algebra of Dispatch Rulesets."

Then, I executed this command:

strings /dev/sda6 | fgrep --before-context=1000 --after-context=1000 "Algebra of Dispatch Rulesets" > foo

The strings command reads every byte from the hard drive and outputs only runs of at least 4 printable characters. This fast pre-processing greatly reduces the work that grep has to do.
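You can see the 4-character minimum in action on a toy file (sample.bin is just a scratch name):

```shell
# Three NUL-separated runs; only the run of 4 printable characters
# survives strings' default minimum length.
printf 'ab\0abc\0abcd\0' > sample.bin
strings sample.bin    # prints: abcd
```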

I used fgrep instead of grep because searching for a fixed string is much faster than searching for a regex. The --before-context= and --after-context= flags tell fgrep to print 1000 lines of context before and after each match.

After this command completed, I was lucky enough to find the final draft of my document---among a lot of crap---in the file foo. I removed the crap, and recovered my file.

I was lucky in a couple of ways, and you may not be so lucky. First of all, my document was written in LaTeX---a plaintext format---which made it easy for me to identify the beginning and end of my document.
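Because a LaTeX document has an unmistakable first and last line, the cleanup can even be done mechanically. A sketch, assuming the draft sits between \documentclass and \end{document} somewhere in foo (the toy printf here just stands in for the real grep dump):

```shell
# Toy stand-in for the grep dump produced by the fgrep command above.
printf 'junk\n\\documentclass{article}\nHello, world.\n\\end{document}\njunk\n' > foo

# sed's range addressing prints everything from the first \documentclass
# line through the first \end{document} line, dropping the junk around it.
sed -n '/\\documentclass/,/\\end{document}/p' foo > recovered.tex
```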

Also, my disk is only about 10% full, which means the operating system has no need to fragment files. As a result, my file appeared as a single, contiguous run in foo. On a more crowded disk, the OS will break the file up into blocks and store them wherever it can; in that case, one would have to search for each of the blocks and re-assemble them into the original.
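For that harder case, grep's byte-offset mode gives a starting point: find where each crib lands on the raw device, then carve out the enclosing filesystem block with dd. A sketch under stated assumptions---the toy img file stands in for the real partition (e.g. /dev/sda6), and the 4 KiB block size is a common ext3 default, not a given:

```shell
# Build a toy "partition" image: the crib buried at byte offset 5000.
{ head -c 5000 /dev/zero | tr '\0' 'x'
  printf 'Algebra of Dispatch Rulesets'
  head -c 3000 /dev/zero | tr '\0' 'x'; } > img

# -a treats binary data as text, -b prints the byte offset of each match,
# -o prints only the matched string.  Take the first match's offset.
offset=$(grep -a -b -o 'Algebra of Dispatch Rulesets' img | head -1 | cut -d: -f1)

# Compute the enclosing block and carve it out with dd.
block=$((offset / 4096))        # assumes 4 KiB filesystem blocks
dd if=img bs=4096 skip="$block" count=1 of=block.bin 2>/dev/null
```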

Anyway, I got my file back, and I hope you do too.

Oh yeah, recovering the file took about 2 hours. I probably should have just re-written it.

UPDATE: A friend tells me that this utility can undelete removed files on an ext3 filesystem.


Dennis Ferron said...

Great post!

I once had to do a similar thing when my logical volume got corrupted and I needed to search the ext3 file system for some code I had been working on. (You can't run fsck on a broken logical volume, because it's on a layer of abstraction below the filesystem!) Anyhow I got back a lot of binary junk with the code, so I wrote a quick Ruby script to clean it up. I'd like to send you a link to it, but I seem to have lost the script.

Dennis Ferron said...

BTW, as a Linux newbie at the time I didn't know I could use fgrep on a disk as a block file! I actually modified the source of a disk cloning program to scan the cloned sectors for my string. Your way is quite a bit less work.