Two Birds, One Web Page

Bird One

Everyone knows what soft links are, because they show up in the desktop interfaces we use. A soft link is an alias on the Macintosh or a shortcut on Windows. If you use Linux, you're used to creating soft links on the command line because of the way software is often installed. After untarring a directory, you find the author has provided a shell script to start the application, so you create a soft link in /usr/local/bin. Eventually, this becomes second nature and you no longer think about the switch in the command:

ln -s /usr/local/software/bin/ /usr/local/bin/software

The -s switch tells the ln command to make the link a soft link. We know what a soft link is. It's a pointer to another place; it's a post office box in Beverly Hills which forwards all mail to your real address in Inglewood. We're familiar with the concept.
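The forwarding behaviour is easy to see in a scratch directory. This is a minimal sketch, not the install described above: the directory layout and the start.sh script name are made up for illustration, and nothing under /usr/local is touched.

```shell
# Hypothetical layout in a throwaway directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/software/bin" "$tmp/bin"
printf '#!/bin/sh\necho "hello from software"\n' > "$tmp/software/bin/start.sh"

# -s makes a soft (symbolic) link: a small file that stores the target path.
ln -s "$tmp/software/bin/start.sh" "$tmp/bin/software"

# readlink prints the path the link forwards to.
target=$(readlink "$tmp/bin/software")
echo "$target"

# Running the script through the link reaches the real file.
output=$(sh "$tmp/bin/software")
echo "$output"

# Removing the link leaves the real script untouched.
rm "$tmp/bin/software"
ls "$tmp/software/bin"
```

This is why removing a soft link is harmless: only the forwarding address goes away.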

But what is a hard link? What would happen if I absentmindedly omitted -s and the ln command created the default hard link? Among those who don't know, you'll sometimes hear myths. I heard: "It's like having two names represent the same underlying file. If you change file attributes on the first link, it changes on the second, and if you delete one link, you delete the other too." I didn't trust myself enough to risk that, given that I've become so used to removing soft links with impunity. A friend whose UNIX skills I trust once tried to correct the misconception, but he began by giving me a crash course in filesystem fundamentals. By his fifth use of the word "inode" I was lost. I rationalized hard links away as an obscure legacy that people put up with before our new and improved version arrived -- hooray for the soft link. In my day-to-day computer use, hard links had about as much practical value as remembering peek and poke values for the Apple IIe: fun trivia and a bit of history, but nothing more.

Bird Two

I recently received my laptop back from the IBM repair service (the ol' coffee in the keyboard trick), installed Ubuntu Linux (it was Breezy before the holidays), and in less than a couple of hours I was back at home in Gnome 2.12 with data restored and customizations in place. My last backup had been made three days before the spill and wasn't completely current, so I was just lucky I hadn't done anything important in those three days. This gave me the opportunity to consider automating my backups. Yes, I'm a manual backup guy. Every Monday morning (and always with my coffee, by the way) I run a custom script to tar.gz what I need, then I copy that file to my removable drive. I remembered someone describing their simple automated backup using rsync and cron, and I decided to give it some thought.

The Web Page

Google promptly offered me Mike Rubel's web page, which describes in excellent detail how to implement rsync backups on Linux. If you're using Linux and aren't confident about your backups, read Mike's page and consider implementing his suggestions. To my delight, I also discovered that Mike gives an accessible explanation of hard links right there in the article! I quote it below for your UNIX edification, but I strongly encourage you to read his whole article; there are more gems there than this one example.

We usually think of a file's name as being the file itself, but really the name is a hard link. A given file can have more than one hard link to itself--for example, a directory has at least two hard links: the directory name and . (for when you're inside it). It also has one hard link from each of its sub-directories (the .. file inside each one). If you have the stat utility installed on your machine, you can find out how many hard links a file has (along with a bunch of other information) with the command:

stat filename
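As an aside that is not part of Mike's text, the directory claim can be checked with a short sketch. It assumes GNU stat (the -c option is a GNU coreutils feature) and uses made-up directory names. On a traditional filesystem such as ext4 the count for a directory with two sub-directories is typically 4 (its name, its own ., and one .. from each sub-directory), but some filesystems, Btrfs for example, do not maintain this count for directories, so treat the number as filesystem-dependent.

```shell
# Scratch directory with two sub-directories (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/project/src" "$tmp/project/docs"

# %h is GNU stat's format code for the number of hard links.
links=$(stat -c '%h' "$tmp/project")
echo "project has $links hard links"

# ls -ld shows the same count in its second column.
ls -ld "$tmp/project"
```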

Hard links aren't just for directories--you can create more than one link to a regular file too. For example, if you have the file a, you can make a link called b:

ln a b

Now, a and b are two names for the same file, as you can verify by seeing that they reside at the same inode (the inode number will be different on your machine):

ls -i a
   232177 a
ls -i b
   232177 b

So ln a b is roughly equivalent to cp a b, but there are several important differences:

  1. The contents of the file are only stored once, so you don't use twice the space.

  2. If you change a, you're changing b, and vice-versa.

  3. If you change the permissions or ownership of a, you're changing those of b as well, and vice-versa.

  4. If you overwrite a by copying a third file on top of it, you will also overwrite b, unless you tell cp to unlink before overwriting. You do this by running cp with the --remove-destination flag. Notice that rsync always unlinks before overwriting!!. Note, added 2002.Apr.10: the previous statement applies to changes in the file contents only, not permissions or ownership.
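Points 2 through 4 can be tried out in a scratch directory. The following is a minimal sketch rather than anything from Mike's article; it assumes GNU coreutils (stat -c and cp --remove-destination are GNU options), and the file names c and d are extra files invented for the demonstration.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
echo "original" > a
ln a b

# 2. Writing through one name changes what the other name reads.
echo "edited via a" > a
content_b=$(cat b)

# 3. Permissions are stored in the inode, so both names share them.
chmod 600 a
mode_b=$(stat -c '%a' b)

# 4. A plain cp rewrites the shared inode in place, so b changes too.
echo "third" > c
cp c a
after_cp=$(cat b)

# With --remove-destination, cp unlinks a first and creates a new file,
# so b keeps the old contents and the two names part ways.
echo "fourth" > d
cp --remove-destination d a
after_unlink=$(cat b)

echo "$content_b / $mode_b / $after_cp / $after_unlink"
```

After the last step, a and b no longer share an inode, which `ls -i a b` will confirm.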

But this raises an interesting question. What happens if you rm one of the links? The answer is that rm is a bit of a misnomer; it doesn't really remove a file, it just removes that one link to it. A file's contents aren't truly removed until the number of links to it reaches zero.
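This also settles the myth I had heard about deleting one link deleting the other. Here is a small sketch of my own (not from Mike's page), again assuming GNU stat for the -c option:

```shell
tmp=$(mktemp -d)
echo "still here" > "$tmp/a"
ln "$tmp/a" "$tmp/b"

# Two names, one file: the link count is 2.
before=$(stat -c '%h' "$tmp/b")

# rm removes only the name a; the data stays reachable through b.
rm "$tmp/a"
after=$(stat -c '%h' "$tmp/b")
survivor=$(cat "$tmp/b")

echo "links before: $before, after: $after, contents: $survivor"
```

Only when the last name is removed does the link count reach zero and the filesystem reclaim the space.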

Posted on January 10, 2006 in linux.

