Sneakernet still reigns supreme

Summary: What's the quickest way to move your data from point to point? Hint: the answer might not be the Internet.

Disaster strikes

A series of automated text messages bombards your BlackBerry, indicating a crisis at the data center -- just as you've left work for the rest of Friday afternoon. As your phone chirps noisily with each incoming text, the sinking feeling in the pit of your stomach coalesces into a knot of despair. With a mounting sense of horror, your eyes settle on two weekend-destroying words: data corruption.

More details trickle in from your team members, and the pieces begin to fit together. You have a large amount -- let's say 2 terabytes -- of mission-critical information which has suddenly been corrupted at your office in Los Angeles. Although you've been doing regular backups, these are kept offsite in New York City at your company's headquarters. A staff member there has verified these copies are secure, clean, and uncorrupted. All the backup archives are accessible through your company's high-speed VPN and are available immediately.

But the data needs to be restored by Monday morning, or heads will roll (including yours, in all probability). Assume that the only thing you need to do to get back to normal is wipe out the corrupted data and replace it with the fresh copy. What's the fastest way to do this?

Some not-so-good options

This seems like a no-brainer, doesn't it? Just ssh over to the remote machine and start an scp session. Sit back and wait for the bits to finish streaming over the wire, and you're golden.

Alas, if only things were that easy. Reality is going to throw a monkey wrench into your weekend plans, for it seems impossible to finish this task by Monday morning. Consider, for example, how long restoring 2 TB of data will take with a relatively juiced-up cable connection at 16 Mb/s: 12.14 days, running at top speed. You'll finish the Wednesday after next, which is not nearly quick enough to make the deadline.

If you had a direct fiber connection at, say, 50 Mb/s, you could speed things up by a factor of about 3; now it takes about 3.88 days. You won't finish until just after Tuesday lunch, still too late. What to do?
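The arithmetic behind these two estimates is easy to check. The sketch below assumes binary units (1 TB = 2^40 bytes, 1 Mb = 2^20 bits), which is the convention that reproduces the 12.14- and 3.88-day figures, and it assumes the link runs flat out with no protocol overhead. The helper name `transfer_days` is illustrative, not from any library.

```python
# Back-of-the-envelope transfer time over a network link.
# Assumption: binary units, which reproduce the 12.14- and 3.88-day figures
# (1 TB = 2**40 bytes, 1 Mb = 2**20 bits); no protocol overhead is modeled.
TB = 2**40              # bytes
Mb = 2**20              # bits
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: int, link_mbps: float) -> float:
    """Days needed to push size_bytes over a link running at link_mbps."""
    bits = size_bytes * 8
    seconds = bits / (link_mbps * Mb)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for mbps in (16, 50):
        print(f"{mbps:>3} Mb/s: {transfer_days(2 * TB, mbps):.2f} days")
```

Run as a script, it prints 12.14 days for the cable connection and 3.88 days for the fiber link.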

Sneakernet to the rescue

There's a faster, much lower-tech way: the sneakernet -- the transfer of data by physically carrying storage media. Even though some areas of networking have improved at a Moore's Law pace, the sneakernet still wins hands down at delivering large blocks of data when latency isn't important.

In this case, just ship the backup drive overnight to Los Angeles from the New York office. Even allowing for the time it'll take to restore from the backup, short-range same-bus speeds will easily trump transit over the public Internet. Doing the back-of-the-envelope math here yields some surprising results:

  • Suppose you order the drive shipped overnight at 6:00 PM local time and it arrives at noon the next day. This is an 18-hour window for the transit time.
  • A typical drive-to-drive copy speed for modern SATA drives is on the order of 40 MB/s when the drives are on the same bus. It will take an additional (2 TB) / (40 MB/s) = 14.56 hours to copy the drive.

We thus copied 2 TB of data in approximately 32.5 hours, for a net rate of about 150 Mb/s. If we had specialized cloning hardware, we could get a significant speed boost on the copying phase, perhaps to around a net speed of 240 Mb/s.

Further, because there's no practical cap on the number of drives you could send, absurd levels of bandwidth can be reached this way. Suppose you overnighted a box full of 100 such drives and restored them in parallel. This would only be roughly five times as expensive, but you'd wind up with a hundred-fold bandwidth improvement, to a net speed of about 15 Gb/s. That's roughly 1.7 times as fast as the current Internet2 speed record of 8.8 Gb/s.
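The sneakernet side of the comparison can be sketched the same way: total time is transit plus copy, and the net rate is bits delivered divided by that total. This minimal sketch assumes the 18-hour transit and 40 MB/s same-bus copy from the list above, binary units for drive sizes, and (hypothetically) that multiple drives are copied in parallel on separate buses. It reports rates in decimal megabits (10^6 bits/s), and the function names are made up for illustration.

```python
# Net throughput of the overnight-drive plan: bits delivered divided by
# (transit time + copy time).
# Assumptions: 1 TB = 2**40 bytes, 1 MB = 2**20 bytes, 18 h transit,
# 40 MB/s same-bus copy. For n_drives > 1 we hypothetically assume each
# drive is copied in parallel on its own bus, so the copy phase does not grow.
TB = 2**40
MB = 2**20
HOUR = 3600


def sneakernet_hours(drive_bytes, transit_h=18.0, copy_mb_per_s=40.0):
    """Wall-clock hours from shipping the drive to a fully restored copy."""
    copy_h = drive_bytes / (copy_mb_per_s * MB) / HOUR
    return transit_h + copy_h


def net_rate_mbps(drive_bytes, n_drives=1, transit_h=18.0, copy_mb_per_s=40.0):
    """Net rate in decimal megabits per second (10**6 bits/s)."""
    seconds = sneakernet_hours(drive_bytes, transit_h, copy_mb_per_s) * HOUR
    return n_drives * drive_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    print(f"total time: {sneakernet_hours(2 * TB):.2f} h")
    print(f"one drive:  {net_rate_mbps(2 * TB):.0f} Mb/s")
    print(f"100 drives: {net_rate_mbps(2 * TB, n_drives=100) / 1000:.1f} Gb/s")
```

Run as a script, it prints about 32.56 hours, about 150 Mb/s for one drive, and about 15.0 Gb/s for 100 drives. Raising `copy_mb_per_s` is a rough way to model the faster cloning hardware mentioned above.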

The price of the sneakernet

Of course, nothing comes free. By using the sneakernet, you have implicitly accepted a number of tradeoffs:

  • Up-front costs. Shipping an insured hard drive overnight cross-country sets you back about $75 at FedEx as of this writing. That's roughly the cost of a month or two of unlimited-bandwidth residential Internet access. In the emergency scenario described above, the sneakernet's probably your best shot, so you might have no choice, and the cost is a relative pittance next to saving your company's business operations. Finally, the sneakernet is usually vastly cheaper per bit delivered.
  • Latency. The time between requesting data and receiving it is much shorter over a digital channel than over the sneakernet. Even when bandwidth is low, a network's latency is bounded below only by the speed of light, so responses arrive in milliseconds rather than days. The sneakernet is thus best suited to large blocks of data whose individual pieces aren't useful without the rest.
  • Much bigger carbon footprint. Pushing bits and bytes around is cheap; an electron weighs around a million trillion trillionth of a kilogram (~10^-30 kg). But your packages take far more energy to move and ship around than electrons do, and they burn dozens of orders of magnitude more fossil fuels. This isn't terribly green.
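To put the latency point in numbers, here is a rough lower bound for a one-way signal between New York and Los Angeles, compared with an 18-hour overnight shipment. The distance (~3,940 km great-circle) and the fiber refractive index (~1.47) are assumptions added for this sketch, not figures from the article; real network paths are longer than the great circle, so actual latency is higher.

```python
# Lower bound on one-way latency NYC -> LA versus overnight shipping.
# Assumptions (not from the article): ~3,940 km great-circle distance and a
# fiber refractive index of ~1.47. Real routes are longer, so real latency is higher.
C_KM_PER_S = 299_792.458   # speed of light in vacuum
DISTANCE_KM = 3_940        # approximate NYC-LA great-circle distance (assumed)
FIBER_INDEX = 1.47         # light in glass travels at roughly c / 1.47 (assumed)
SHIPPING_S = 18 * 3600     # overnight transit time used in the text


def one_way_ms(distance_km, index=1.0):
    """Propagation delay in milliseconds through a medium with the given index."""
    return distance_km / (C_KM_PER_S / index) * 1000


vacuum = one_way_ms(DISTANCE_KM)
fiber = one_way_ms(DISTANCE_KM, FIBER_INDEX)
print(f"vacuum: {vacuum:.1f} ms, fiber: {fiber:.1f} ms")
print(f"overnight shipping is ~{SHIPPING_S * 1000 / fiber:,.0f}x slower to first byte")
```

Even with generous assumptions for the network, the package's latency is millions of times worse, which is why the sneakernet only makes sense when the whole block of data is needed at once.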

On the other hand, note that as storage density improves without a corresponding increase in network bandwidth, the sneakernet becomes more attractive. For example, there's no appreciable difference between shipping a 2 TB hard drive and a 250 GB one if they have the same form factor, yet the larger drive carries eight times as much data on the same trip.
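The density argument reduces to a linear relationship: for a fixed transit time, effective bandwidth is proportional to the capacity of the drive in the box. The sketch below assumes the 18-hour overnight window and decimal units (1 GB = 10^9 bytes, 1 Mb = 10^6 bits), and it ignores copy time; `transit_only_mbps` is a hypothetical helper name.

```python
# For a fixed transit time, effective bandwidth scales linearly with capacity.
# Assumptions: 18-hour overnight transit, decimal units, copy time ignored.
TRANSIT_S = 18 * 3600


def transit_only_mbps(capacity_gb):
    """Effective Mb/s of shipping one drive of capacity_gb (1 GB = 10**9 bytes)."""
    return capacity_gb * 1e9 * 8 / TRANSIT_S / 1e6


for gb in (250, 2000):
    print(f"{gb:>5} GB drive: {transit_only_mbps(gb):,.0f} Mb/s")
```

The shipping cost is the same for both drives, so the 2 TB drive delivers eight times the effective bandwidth for the same price.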

How does the sneakernet stack up?

As we've seen, physical transport of data can be the most effective means when you have either short physical distances or very large amounts of data, albeit at a tradeoff. This table summarizes some of the effective bandwidths you'll get [8]; the sneakernet modes of transport are the rows that cite footnote 5.

transport channel                                       effective bandwidth (Kb/s)
plain old telephone service (POTS)                      29
Integrated Services Digital Network (ISDN) channel      64
FCC "broadband" definition (2002) [1]                   200
FCC "broadband" definition (2008) [1]                   768
human stereo hearing (approximate) [2]                  ~1,536
CWA mean broadband speed, US (2007) [3]                 1,946
IEEE 802.11 (physical)                                  2,048
canine stereo hearing (approximate) [4]                 ~4,608
ADSL downstream (typical)                               6,144
CWA mean broadband speed, Canada (2007) [3]             7,782
IEEE 802.11b (physical)                                 11,264
human stereo vision (approximate) [6]                   ~19,700
IEEE 802.11a (physical)                                 40,960
IEEE 802.11g (physical)                                 55,296
CWA mean broadband speed, Japan (2007) [3]              62,464
IEEE 802.11n (user throughput)                          75,776
single 250 GB hard drive via FedEx, NYC to LAX [5]      ~150,000
freight pallet (500 × 250 GB disks), NYC to LAX [5]     ~1,200,000
walk an 8 GB USB stick across a room [5]                ~6,000,000
Internet IPv6 speed record (2008) [7]                   9,227,469
oil tanker (10% full of 250 GB disks), NYC to LAX [5]   ~200,000,000,000
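The sneakernet rows follow from dividing the bits shipped by the transport times in footnote 5. This sketch recomputes two of them, assuming decimal units (1 GB = 10^9 bytes, 1 Kb = 1,000 bits); the freight pallet comes out near 1,157,000 Kb/s and the USB stick near 6,400,000 Kb/s, in the same ballpark as the rounded table entries. The helper `effective_kbps` is hypothetical.

```python
# Effective bandwidth of a physical shipment in Kb/s, using footnote 5's
# transport times. Assumption: decimal units (1 GB = 10**9 bytes,
# 1 Kb = 1,000 bits); the table's figures are rounded.
DAY = 86_400


def effective_kbps(total_gb, transit_s):
    """Kilobits per second delivered by moving total_gb in transit_s seconds."""
    return total_gb * 1e9 * 8 / transit_s / 1e3


rows = {
    "freight pallet (500 x 250 GB), 10 days": effective_kbps(500 * 250, 10 * DAY),
    "8 GB USB stick across a room, 10 s": effective_kbps(8, 10),
}
for name, kbps in rows.items():
    print(f"{name}: ~{kbps:,.0f} Kb/s")
```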

I've also compiled these into a chart that you can view here.


Footnotes and credits

1 FCC approves new method for tracking broadband's reach. cNet News, March 19, 2008. Accessed March 28, 2009.

2 Based on CD-quality audio: 44.1 kHz sampling rate × 16-bit sampling resolution × 2 channels = 1,411,200 b/s, approximated as ~1.5 Mb/s.

3 CWA survey: average broadband speed in US is 1.9Mbps. Ars Technica, May 29, 2007. Accessed March 28, 2009.

4 Based on [2], and assuming that dogs can hear a frequency range about three times as wide as a human's.

5 Assumes the following transport times:

  • Crossing a room: 10 seconds
  • FedEx overnight: 18 hours
  • Shipping a freight pallet: 10 days
  • Shipping via oil tanker: 6 weeks

6 Bandwidth of the human eye. medGadget, July 28, 2006. Accessed March 28, 2009.

7 Internet2 Land Speed Record. Internet2. Accessed March 28, 2009.

8 I use "effective bandwidth" instead of "throughput" since it's hard to say what the bandwidth of some of these physical channels is.

Sneaker illustration by Ivan C. Reyes of Overbyte.
