<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Distilled Brilliance</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/" />
    <link rel="self" type="application/atom+xml" href="http://distilledb.com/blog/atom.xml" />
    <id>tag:distilledb.com,2008-10-25:/blog/2</id>
    <updated>2009-09-15T12:15:00Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 4.24-en</generator>


<entry>
    <title>Reaching the limits</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/09/15/reaching-the-limits.page" />
    <id>tag:distilledb.com,2009:/blog//2.80</id>

    <published>2009-09-15T08:15:00-05:00</published>
    <updated>2009-09-15T08:15:00-05:00</updated>

    <summary>Recently Distilled Brilliance has undertaken a number of new projects, and we&apos;ve found that our existing infrastructure isn&apos;t keeping up with the legal, accounting, and general technical complexities associated with bigger projects. So we&apos;re upgrading a number of internal systems....</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Announcements" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="announcements" label="announcements" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>Recently Distilled Brilliance has undertaken a number of new projects, and we've found that our existing infrastructure isn't keeping up with the legal, accounting, and general technical complexities associated with bigger projects. So we're upgrading a number of internal systems.</p>

<p>While we're poking around in there, though, we'd also like to switch our blogging platform to something more robust and internally useful. In particular, we've been looking at <a href="http://www.blogofile.com/">Blogofile</a>, a static blog publishing platform similar to Movable Type. I've been corresponding closely with <a href="http://github.com/enigmacurry">Ryan McGuire</a>, the lead developer, and we're collaborating on a raft of new features that we think will improve things greatly.</p>

<p>Stay tuned in the coming months for a possible site revamp and a switch to Blogofile for good!</p>]]>
        
    </content>
</entry>



<entry>
    <title>Getting a list of recently installed packages in Ubuntu</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/06/30/getting-a-list-of-recently-installed-packages-in-ubuntu.page" />
    <id>tag:distilledb.com,2009:/blog//2.71</id>

    <published>2009-06-30T08:00:00-05:00</published>
    <updated>2009-06-30T08:00:00-05:00</updated>

    <summary>Want to know what&apos;s recently been installed on your system? It&apos;s as simple as a few commands.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Snippets" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="instructional" label="instructional" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="ubuntu" label="Ubuntu" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>It's sometimes handy to see a list of recently installed packages, so that you can remember what you were fiddling with.</p>



<pre class="script-section">$ cat /var/log/dpkg.log* | grep &quot;\ install\ &quot; | sort</pre>



<p>This does a few things:</p>


<ul>
<li>First, we concatenate all of the package logs from <code>dpkg</code>, the package manager for Debian. The package logs will have names like <code>/var/log/dpkg.log.1</code>, <code>/var/log/dpkg.log.2</code>, and so on. (The oldest logs will be compressed with gzip by Ubuntu's built-in <a href="https://help.ubuntu.com/community/LinuxLogFiles#Log%20Rotation">log rotation</a>, so they will look like <code>/var/log/dpkg.log.3.gz</code>.)</li>
</ul>




<ul>
<li>This is then piped to <code>grep</code> so that we find only installation events. Specifically, we look only for lines that contain a space, followed by the word "install", followed by another space.</li>
</ul>




<ul>
<li>Finally, we sort the list of items we got. Because each <code>dpkg</code> record line begins with a timestamp, items that were installed earlier will be sorted first in this list.</li>
</ul>



<p>The result is a chronological list of package installation events, with the most recently installed package last. (Note that if a package was installed and subsequently removed, it will still be on the list.)</p>

<p>If you want to get some subset with the most recently installed one first, you can use a combination of <span class="script-section">sort -r</span> and <span class="script-section">head -n</span>. For example:</p>



<pre class="script-section"># Retrieve the 5 most recently installed packages, with
# the most recently installed one first.
$ cat /var/log/dpkg.log* | grep &quot;\ install\ &quot; | sort -r | head -n 5
2009-06-20 23:47:49 install easytag &lt;none&gt; 2.1.4-1.1
2009-06-20 23:47:49 install libid3-3.8.3c2a &lt;none&gt; 3.8.3-7.2
2009-06-19 18:08:02 install wv &lt;none&gt; 1.2.4-2ubuntu2
2009-06-19 18:08:01 install tracker-utils &lt;none&gt; 0.6.93-0ubuntu3
2009-06-19 18:08:01 install tracker-search-tool &lt;none&gt; 0.6.93-0ubuntu3
</pre>
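
<p>For use inside a script, the same pipeline can be sketched in Python. This is a minimal sketch rather than a replacement for the one-liner: <code>read_lines</code> and <code>install_events</code> are hypothetical helper names, and it assumes that rotated logs whose names end in <code>.gz</code> are gzip-compressed text.</p>

```python
import gzip


def read_lines(path):
    """Yield text lines from a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def install_events(paths, newest_first=False):
    """Return the ' install ' records from the given logs, sorted by time."""
    events = [
        line
        for path in paths
        for line in read_lines(path)
        if " install " in line
    ]
    # Each record starts with "YYYY-MM-DD HH:MM:SS", so a plain string sort
    # is also a chronological sort.
    return sorted(events, reverse=newest_first)
```

<p>Calling <code>install_events(glob.glob("/var/log/dpkg.log*"), newest_first=True)[:5]</code> (after <code>import glob</code>) should correspond to the <code>sort -r | head -n 5</code> example above.</p>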



<p><strong>Update:</strong> Commenter <code>maximd</code> noted that you can also bring <code>zcat</code> into the equation to get an even more comprehensive list. <code>zcat</code> decompresses the older logs that have been gzipped, so you can go farther back if needed.</p>]]>
        
    </content>
</entry>



<entry>
    <title>Sliding windows now in Mono</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/06/26/sliding-windows-now-in-mono.page" />
    <id>tag:distilledb.com,2009:/blog//2.69</id>

    <published>2009-06-26T09:30:00-05:00</published>
    <updated>2009-06-26T09:30:00-05:00</updated>

    <summary>Code from an earlier article was solicited for inclusion into Mono and is now a part of the Mono.Rocks extension methods library.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Announcements" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="c" label="C#" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="mono" label="Mono" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>An earlier article from this blog described how to use <a href="http://distilledb.com/blog/archives/date/2009/03/12/enumerable-extension-method-for-contiguous-subsequences.page">sliding windows to obtain contiguous subsequences</a>, and provided a sample implementation in the form of an extension method. The excellent <a href="http://www.jprl.com/Blog/">Jon Pryor</a> of Novell wrote in to let me know he was planning to include it in Mono. With a few modifications and some unit tests, the finished product is now part of the Mono.Rocks subproject, which provides extension methods to augment base classes.</p>

<p>As committed, the code looks like this:</p>




<pre><code class="code c-sharp:nocontrols">
public static IEnumerable&lt;IEnumerable&lt;TSource&gt;&gt;
  ContiguousSubsequences&lt;TSource&gt;(this IEnumerable&lt;TSource&gt; self, int windowSize)
{
  Check.Self (self);
  if (windowSize &lt; 1)
    throw new ArgumentOutOfRangeException("windowSize", "must be &gt;= 1");

  return CreateContiguousSubsequencesIterator (self, windowSize);
}

private static IEnumerable&lt;IEnumerable&lt;T&gt;&gt;
  CreateContiguousSubsequencesIterator&lt;T&gt;(this IEnumerable&lt;T&gt; input, int windowSize)
{
  int index = 0;
  var window = new List&lt;T&gt;(windowSize);
  window.AddRange (new T[windowSize]);
  foreach (var item in input) {
    bool initializing = index &lt; windowSize;

    if (!initializing) {
      window = window.Skip (1).ToList ();
      window.Add (default (T));
    }

    int itemIndex = initializing ? index : windowSize - 1;
    window [itemIndex] = item;

    index++;
    bool initialized = index &gt;= windowSize;
    if (initialized)
      yield return new List&lt;T&gt;(window);
  }
}
</code></pre>
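
<p>For readers who don't use C#, the same sliding-window idea can be sketched in Python. This is an illustration only, not a port of the Mono.Rocks code: <code>contiguous_subsequences</code> is a hypothetical name, and a <code>deque</code> with a maximum length stands in for the skip-and-append step above.</p>

```python
from collections import deque


def contiguous_subsequences(items, window_size):
    """Yield every run of window_size consecutive items as a list."""
    # Unlike the C# version, this check runs lazily, on first iteration,
    # because the whole function is a generator.
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    # A deque with maxlen drops its oldest element automatically on append.
    window = deque(maxlen=window_size)
    for item in items:
        window.append(item)
        if len(window) == window_size:
            # Copy the window so later appends do not alter yielded results.
            yield list(window)
```

<p>An input shorter than the window yields nothing, which matches the behavior of the iterator above.</p>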
]]>
        
    </content>
</entry>



<entry>
    <title>WordPress talk slides released</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/06/26/wordpress-talk-slides-released.page" />
    <id>tag:distilledb.com,2009:/blog//2.68</id>

    <published>2009-06-26T06:00:00-05:00</published>
    <updated>2009-06-26T06:00:00-05:00</updated>

    <summary>Last Thursday, I was graciously invited to give a talk at the University of Virginia&apos;s WordPress MUG on securing WordPress installations. For attendee reference, I&apos;ve posted the slides from the talk on SlideShare here....</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Announcements" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="announcements" label="announcements" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="presentations" label="presentations" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="wordpress" label="WordPress" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>Last Thursday, I was graciously invited to give a talk at the University of Virginia's WordPress <span class="caps">MUG</span> on securing WordPress installations. For attendee reference, I've posted the slides from the talk on SlideShare <a href="http://www.slideshare.net/dzbslideshare/wordpress-a-gentle-introduction">here</a>.</p>]]>
        
    </content>
</entry>



<entry>
    <title>TweetFastTweetFurious released</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/06/22/tweetfasttweetfurious-released.page" />
    <id>tag:distilledb.com,2009:/blog//2.67</id>

    <published>2009-06-22T08:00:00-05:00</published>
    <updated>2009-06-22T08:00:00-05:00</updated>

    <summary>This handy plugin makes issuing Tweets of Fury challenges to your fellow Twitterers a snap.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Announcements" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="announcements" label="announcements" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="greasemonkey" label="Greasemonkey" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="twitter" label="Twitter" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<div class="sidebar-graphic"><img src="http://assets.distilledb.com/assets/entry-assets/by-id/67/tweets-of-fury.jpg" alt="" /></div>

<p>Is someone you know on Twitter cruisin' for a bruisin'? Fortunately, the developers over at <a href="http://viget.com">Viget Labs</a> have too much free time on their hands. They recently released <a href="http://tweetsoffury.com">Tweets of Fury</a>, a cute application which allows you to issue challenges over Twitter that are resolved by a best-of-three rock-paper-scissors duel on the ToF side.</p>

<p>Prior to the battle, each side writes an embarrassing message which their opponent will be forced to tweet upon losing. The winner gets their message tweeted by their opponent, thus ensuring public humiliation and a fortnight of roundly deserved scorn from the community.</p>

<p><a href="http://userscripts.org/scripts/show/52163">TweetFastTweetFurious</a> is a Firefox Greasemonkey plugin that makes it much quicker to issue these challenges directly to your opponents. A special action link will be revealed when you mouse over any person's status from your main feed, which you can then click to issue a Tweets of Fury challenge.</p>

<div class="image-holder"> <img src="http://assets.distilledb.com/assets/entry-assets/by-id/67/tweet-fast-tweet-furious.png" alt="" class="figure" /> <div class="caption">Mouse over any status to issue a Tweets Of Fury challenge to impudent scoundrels.</div></div>

<p>Assuming you've already logged into Tweets of Fury and have given the application permission to access Twitter on your behalf, you'll be spared the incredibly taxing step of remembering who you just challenged and having to type in their name. Instead, once you click the challenge link, you'll be taken to the New Battle screen, which will be prepopulated for you with the challenged Twitterer's name.</p>

<div class="image-holder"> <img src="http://assets.distilledb.com/assets/entry-assets/by-id/67/tweets-of-fury-populated.png" alt="" class="figure" /></div>

<p>So what are you waiting for? Get out there and go all Aaron Burr on your foes. Download <span class="caps">TFTF </span><a href="http://userscripts.org/scripts/source/52163.user.js">here</a>.</p>

<h3>Credits</h3>

<p>Thanks to <a href="http://www.viget.com/about/team/preagan">Patrick Reagan</a> (<a href="http://twitter.com/reagent">@reagent</a>) and <a href="http://www.viget.com/about/team/kvigneault">Kevin Vigneault</a> (<a href="http://twitter.com/kvigneau">@kvigneau</a>) for their permission to embed the Tweets of Fury favicon in the plugin.</p>

<p>Thanks also to <a href="http://www.robares.com/">Rob Ares</a>, with whom I bounced this ill-advised idea around over drinks at Ruby Hack Night.</p>]]>
        
    </content>
</entry>



<entry>
    <title>Python gotcha: Bizarre integer equality</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/06/18/python-gotcha-integer-equality.page" />
    <id>tag:distilledb.com,2009:/blog//2.61</id>

    <published>2009-06-18T20:00:00-05:00</published>
    <updated>2009-06-18T20:00:00-05:00</updated>

    <summary>Are two different references to the same integer value the same object? The answer: sometimes.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Articles" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="gotchas" label="gotchas" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="python" label="Python" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>In Python, everything is an object. These semantics are predictable for the most part -- until they aren't. Here's a short but confusing snippet of Python 3 code, running from Ubuntu 9.04. Can you surmise why this inconsistency happens?</p>



<pre class="code-block">
&gt;&gt;&gt; a = 500
&gt;&gt;&gt; b = 500
&gt;&gt;&gt; a is b
False

&gt;&gt;&gt; c = 200
&gt;&gt;&gt; d = 200
&gt;&gt;&gt; c is d
True</pre>



<p>In Python, <code>is</code> tests for identity, not equality. <code>x is y</code> if and only if <code>x</code> and <code>y</code> reference the same thing. Although <code>a</code> and <code>b</code> have the same value, they are distinct objects, and so comparing the two yields <code>False</code>, as one might expect.</p>

<p>But then we're confronted with the second case. It's precisely identical to the first, just with a different assigned value. Yet it produces the opposite result. How can this be?</p>

<p>The key to this puzzle lies in a peculiar implementation detail of <a href="http://en.wikipedia.org/wiki/CPython">CPython</a>, the de facto Python implementation. As we said earlier, in Python, everything is an object, even literals. Logically, that means that two different instances should be distinct from each other, as in the first case above.</p>

<p>But in CPython, when you create an integer literal in the range <code>[-5, ..., 256]</code>, it's actually cached for performance reasons. Each further use of the same value refers to the existing cached object rather than creating a new one. Thus <code>c</code> and <code>d</code> refer to the same cached instance, and the result is <code>True</code>.</p>
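
<p>One way to probe the cache is to build each integer at runtime, so that no two literals are involved. The sketch below is only a probe of CPython's behavior, and <code>same_object</code> is a hypothetical helper name. On a CPython build with the cache range described above, the values from -5 through 256 would be expected to print <code>True</code> and the others <code>False</code>; other Python implementations may behave differently.</p>

```python
def same_object(value):
    """Build the integer twice at runtime and report whether both results
    are one and the same object."""
    first = int(str(value))
    second = int(str(value))
    return first is second


for value in (-6, -5, 200, 256, 257, 500):
    print(value, same_object(value))
```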

<p>Because of another implementation detail, two literals with the same value that are in the same compilation unit will reference the same object. Comparing literals directly results in <code>True</code> in both cases, as we see here:</p>



<pre class="code-block">
&gt;&gt;&gt; 200 is 200
True

&gt;&gt;&gt; 500 is 500
True</pre>



<p>More importantly, however, this illustrates the danger when <code>is</code> is mistakenly used to compare <em>value equality</em> instead of <em>reference equality</em>. Had you used <code>==</code> instead, the results would have been precisely what you'd expect:</p>



<pre class="code-block">
&gt;&gt;&gt; a = 500
&gt;&gt;&gt; b = 500
&gt;&gt;&gt; a == b
True

&gt;&gt;&gt; c = 200
&gt;&gt;&gt; d = 200
&gt;&gt;&gt; c == d
True</pre>]]>
        
    </content>
</entry>



<entry>
    <title>Getting IP address from GNU/Linux command line</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/06/05/getting-ip-address-from-command-line.page" />
    <id>tag:distilledb.com,2009:/blog//2.57</id>

    <published>2009-06-05T11:00:00-05:00</published>
    <updated>2009-06-05T11:00:00-05:00</updated>

    <summary>How to get your IP address from the command line in one shot with curl and sed.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Snippets" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="instructional" label="instructional" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="linux" label="Linux" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>Need to get your IP address from the command line with maximum geekiness? Two simple tools together can get the job done. This is handy for use as part of another script or just for satisfying curiosity.</p>



<pre class="script-section">$ curl -s http://checkip.dyndns.org | sed 's/[^0-9.]//g'
209.145.68.249</pre>



<p>This is a simple one-two punch. First, we use the <a href="http://curl.haxx.se/docs/"><code>curl</code></a> tool to get back an <span class="caps">HTML</span> document that contains our <span class="caps">IP</span>, using the checkip service from <a href="http://dyndns.org">DynDNS</a>. Next, we need to filter the output to return just the IP address.</p>

<p>Then we pipe the output to <a href="http://www.gnu.org/software/sed/manual/sed.html"><code>sed</code></a>, with the instructions <code>s/[^0-9.]//g</code>. This tells <code>sed</code> to do the following:</p>


<ul>
<li>invoke the (s)ubstitute command;</li>
<li>match any character ([...]) which is not ([^...]) a digit (0-9) or a period (.);</li>
<li>replace each such character with an empty string, effectively removing it;</li>
<li>repeat the substitution (g)lobally, for every match on each line rather than only the first</li>
</ul>
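
<p>The substitution described above can be mirrored with Python's <code>re.sub</code>, which may make the character class easier to experiment with. This is a minimal sketch: <code>keep_digits_and_dots</code> is a hypothetical name, and the sample text is an assumption standing in for the real checkip response.</p>

```python
import re


def keep_digits_and_dots(text):
    """Delete every character that is not a digit or a period."""
    return re.sub(r"[^0-9.]", "", text)


# Assumed sample text; the real checkip response wraps this in HTML markup.
sample = "Current IP Address: 209.145.68.249"
print(keep_digits_and_dots(sample))  # prints 209.145.68.249
```

<p>Note that the filter keeps <em>every</em> digit and period it sees, so it only works when the response contains no other numbers or periods.</p>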



<p>The result is our IP address.</p>]]>
        
    </content>
</entry>



<entry>
    <title>Giving effective technical presentations</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/05/27/giving-effective-technical-presentations.page" />
    <id>tag:distilledb.com,2009:/blog//2.56</id>

    <published>2009-05-27T06:15:00-05:00</published>
    <updated>2009-05-27T06:15:00-05:00</updated>

    <summary>Giving great talks means delivering with confidence and in a style that exercises your audience&apos;s minds, not just their eyes and ears.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Articles" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="presentations" label="presentations" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>In a digital age where screencasts and podcasts can reach millions at once, it's ironic that live public speaking can continue to be a challenge. The simple act of standing up in front of a small audience and making a presentation is still a nerve-wracking experience for many.</p>

<p>It's a widely cited (if perhaps a bit exaggerated) statistic that fear of public speaking ranks higher than fear of death. As Jerry Seinfeld noted, "this means that, to the average person, if you have to go to a funeral, you're better off in the casket than doing the eulogy."<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/05/27/giving-effective-technical-presentations.page#fn1">1</a></sup> This mindset poses a problem, because knowledge dissemination and the free exchange of ideas are some of the most important services a technical community provides.</p>

<p>So if we recognize that these are valuable pursuits, and if we want to give a great talk, what can we do to maximize our chances of making that happen? While there's no magic formula for guaranteed success, effective talks almost always have two essential characteristics: a high-bandwidth presentation style and speaker confidence.</p>

<h3>Presentation style</h3>

<p><em>How</em> you present is more important than <em>what</em> you present. No matter how intriguing or useful the topic, your listeners will get little value out of a talk that's poorly delivered. Effective presenters will therefore try to maximize the useful information they communicate to their audience.</p>

<p>The bandwidth of someone's words is very low; we could get the same information by reading a transcript of their speech. Yet people clearly find conferences more valuable, or they wouldn't pay hundreds of dollars for them. Why don't we get the same value out of reading a transcript that we do out of attending a talk? It's because the nature of the delivery and the presenter's style can greatly enhance how much we learn from a talk; these are things that are difficult to capture textually.</p>

<p>Effective presenters invite their audience to consider possibilities with them and to imagine scenarios; they exercise minds, not just eyes and ears. The epiphany gained from seeing a good speaker bring together a complex concept, the excitement from seeing a new technology, the curiosity evoked by an intriguing demonstration -- all these are emotional connections that draw your listeners closer. </p>

<p>Asking your audience rhetorical questions is a particularly good tactic here. You'll engage them directly, and you invite them to consider your point in the context of their own experiences. For example, in a talk to explain the benefits of unit testing, you might ask, "Can you think of a time where you wrote some code that worked initially but broke mysteriously later on?" This both serves as a good segue into a point about regression testing and invites the audience to link your talk with their personal experience, making for a more compelling reminder of your points.</p>

<p>Conversely, bombarding your audience with too much tangential data or straying too far from the path of the topic isn't helpful. For this reason it's important, whenever you break out of presentation mode, to minimize any downtime. Giving technical demos is a big danger zone here; make sure everything you need is set up and ready to go when you switch over to demo mode. Now is not the time to forget which function key combination breaks you back to the desktop.</p>

<h3>Confidence</h3>

<p>Confidence is the other crucial piece for effective presentations, and you get it from two sources: intrinsic and extrinsic.</p>

<p><em>Intrinsic confidence</em>, which comes from self-confidence in your expertise and knowledge, is something you can only acquire through experience with your field over time. It's the source of the internal reassurance that what you're saying is a sound opinion. By contrast, <em>extrinsic confidence</em> is measured in how you project yourself to others and the world around you.</p>

<p>The goal is to keep both of these synchronized: you want to appear as confident as you should be, no more and no less. If your extrinsic confidence is much greater than your intrinsic confidence, you are projecting a confidence that is not backed up by actual knowledge or experience. This is disingenuous and of no use to your audience. A few probing questions from observant audience members will make it obvious that you're weak on the facts, so make sure you know your material inside and out. <a href="http://www.tenhundfeld.org/blog/post/2009/05/Four-Rs-of-Presenting.aspx">Thoroughly rehearsing</a> your speech beforehand will help identify soft points and prepare you for the real deal.</p>

<p>Conversely, if your extrinsic confidence is much less than your intrinsic confidence, your points will come across as weak and ineffectual. How much would you trust a speaker who looked unsure of everything he was saying? Body language is a well-studied subject in psychology, and it offers some important precepts to inform your speaking with regard to extrinsic confidence. You should have open, welcoming body language that encourages others to listen to what you're saying.</p>

<p>Doing so invites the audience to consider your points as you make them and projects self-assurance in your opinions. Closed body language, like leaving your hands in your pockets, crossing your arms, or looking at the floor, signals an unwillingness to engage or connect with the audience. Filming yourself or having a friend watch you give the talk is a good way to make sure this doesn't happen.</p>

<h3>Giving your next talk</h3>

<p>There's plenty of advice to be had about what makes a good technical talk, but I think this captures the essential pieces of what I'd consider an effective presentation. If you've never given one before, hopefully this has encouraged you to put your thinking cap on and get to the podium with something you're passionate about. The technical community can always benefit from hearing more ideas -- perhaps it's time to share yours!</p>

<h3>Footnotes and credits</h3>

<p>Thanks very much to Justin Etheredge for <a href="http://www.codethinked.com/post/2009/05/18/A-Technical-Presenters-Journey-Part-1-Know-Your-Audience.aspx">the excellent idea to write this</a> as part of his technical-talks series.</p>

<p class="footnote" id="fn1"><sup>1</sup> Seinfeld, <a href="http://www.seinfeldscripts.com/ThePilot.html">episode 63 / The Pilot</a></p>]]>
        
    </content>
</entry>



<entry>
    <title>Periscope updated with bit.ly support</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/05/10/periscope-updated-with-bitly-support.page" />
    <id>tag:distilledb.com,2009:/blog//2.53</id>

    <published>2009-05-10T07:45:00-05:00</published>
    <updated>2009-05-10T07:45:00-05:00</updated>

    <summary>As a result of a live-coding session at this year&apos;s beCamp, Periscope has been updated to v.0.2.2 and now provides bit.ly support.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Announcements" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="greasemonkey" label="Greasemonkey" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="periscope" label="Periscope" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>This weekend I attended <a href="http://barcamp.org/beCamp2009">beCamp2009</a>, where I gave several talks. One of them was on the excellent <a href="https://addons.mozilla.org/firefox/addon/748">Greasemonkey plugin for Firefox</a> and its flexibility and power as a scripting environment. Part of the demo was to add <a href="http://bit.ly">bit.ly</a> support to <a href="http://userscripts.org/scripts/show/41581">Periscope</a> in a live-coding session.</p>

<p>I was pleased at how remarkably easy it was, especially since I was a little nervous about touching code I hadn't seen in a while. As for whether this means that Periscope is well-designed or if it's just those energy drinks temporarily increasing my focus -- well, I'll leave that for you to decide.</p>

<p>But for other curious hackers, the special sauce is in the handler map:</p>




<pre><code class="code js:nocontrols">
// For each domain, we provide a domain handler that
// understands how to provide a meaningful title for the
// periscoped link. We optionally provide an id handler that
// knows how to transform the identifier of each periscoped
// link so that it's easier to find what we're looking for.

// [other handlers elided]
handlerMap["is.gd"]          = {domainHandler: handleIsGd, idHandler: null};
handlerMap["bit.ly"]         = {domainHandler: handleGetTitle, idHandler: null};
handlerMap[""]               = {domainHandler: handleUnknown, idHandler: null};
</code></pre>




<p>Each of the <code>handlerMap</code> lines maps a particular domain name, identified from one of the document's links, to the handlers for that domain. In this case, when you mouse over a particular <code>bit.ly</code> link, you'll now see the additional information provided by the <code>handleGetTitle</code> handler. Hovering over, say, <a href="http://bit.ly/GH4Cn">http://bit.ly/GH4Cn</a> thus lets you know that this goes to Google's search page.</p>
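
<p>The lookup pattern itself, a table keyed by domain with an empty-string entry as the catch-all, can be sketched in Python. This is one plausible reading of how such a map might be consulted, not Periscope's actual code: the handler bodies, <code>describe_link</code>, and the returned strings are all hypothetical stand-ins.</p>

```python
from urllib.parse import urlparse


def handle_get_title(url):
    """Stand-in handler: a real one would fetch the page and read its title."""
    return "title lookup for " + url


def handle_unknown(url):
    """Fallback for domains without a dedicated handler."""
    return "no handler for " + url


# The empty-string key plays the role of the catch-all entry.
handler_map = {
    "bit.ly": {"domain_handler": handle_get_title, "id_handler": None},
    "": {"domain_handler": handle_unknown, "id_handler": None},
}


def describe_link(url):
    """Pick the handler registered for the link's domain, else the fallback."""
    domain = urlparse(url).hostname or ""
    entry = handler_map.get(domain, handler_map[""])
    return entry["domain_handler"](url)
```

<p>With a table like this, supporting a new shortener amounts to adding a single entry, which is in keeping with how small the bit.ly change was.</p>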

<p>In a future article I'll show how to write your own handlers. Thanks again to everyone at beCamp2009 for attending my talk. If you have any questions, feel free to drop me a line using my <a href="http://distilledb.com/blog/pages/meta/contact/contact.page">contact page</a>.</p>]]>
        
    </content>
</entry>



<entry>
    <title>Bug reports/feature requests on TicTacGo and Periscope</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/05/05/bug-reportsfeature-requests-on-tictacgo-and-periscope.page" />
    <id>tag:distilledb.com,2009:/blog//2.52</id>

    <published>2009-05-05T11:30:00-05:00</published>
    <updated>2009-05-05T11:30:00-05:00</updated>

    <summary>If you&apos;re a TicTacGo or Periscope user and you like what you see so far, but you&apos;d like some incremental improvement, let me know directly rather than use the Userscripts page. I don&apos;t look there very often (perhaps once a...</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Announcements" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="announcements" label="announcements" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="periscope" label="Periscope" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="tictacgo" label="TicTacGo" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>If you're a <a href="http://userscripts.org/scripts/show/45777">TicTacGo</a> or <a href="http://userscripts.org/scripts/show/41581">Periscope</a> user and you like what you see so far, but you'd like some incremental improvement, let me know directly rather than use the Userscripts page. I don't look there very often (perhaps once a month) since Userscripts doesn't have a very good notification system. If you'd like updates made, the best way to accommodate those requests is to <a href="http://distilledb.com/blog/pages/meta/contact/contact.page">contact me directly</a>.</p>]]>
        
    </content>
</entry>



<entry>
    <title>Ad-hoc type systems via unit tests in dynamic languages</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/04/29/ad-hoc-type-systems-in-dynamic-languages.page" />
    <id>tag:distilledb.com,2009:/blog//2.49</id>

    <published>Wed, 29 Apr 2009 07:45:00 -0500</published>
    <updated>Wed, 29 Apr 2009 07:45:00 -0500</updated>

    <summary>Do dynamic type systems encourage the proliferation of ad-hoc type checks, and possibly reduce software quality?</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Essays" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="programminglanguages" label="programming languages" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="software" label="software" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<h3>Growing Paynes: Twitter's legacy code</h3>

<p><a href="http://www.codethinked.com/page/About-Me.aspx">Justin Etheredge</a> <a href="http://www.codethinked.com/post/2009/04/05/Do-We-Create-Type-Systems-In-Dynamic-Languages.aspx">drew my attention</a> to a recent article in Artima Developer entitled <em>Twitter on Scala</em>, describing some issues that Twitter encountered as it scaled up, and providing justifications for switching the codebase to a different language. While that is interesting in and of itself, what was really thought-provoking is the excerpt from Alex Payne that Justin cites, reproduced in its entirety here:</p>

<blockquote><p><strong>Alex Payne:</strong> I'd definitely want to hammer home what Steve said about typing. As our system has grown, a lot of the logic in our Ruby system sort of replicates a type system, either in our unit tests or as validations on models. <strong>I think it may just be a property of large systems in dynamic languages, that eventually you end up rewriting your own type system, and you sort of do it badly.</strong> You're checking for null values all over the place. There's lots of calls to Ruby's <code>kind_of?</code> method, which asks, "Is this a kind of <code>User</code> object? Because that's what we're expecting. If we don't get that, this is going to explode." It is a shame to have to write all that when there is a solution that has existed in the world of programming languages for decades now. </p>

<p>&ndash; <a href="http://www.artima.com/scalazine/articles/twitter_on_scala.html">Twitter on Scala.</a> Artima Developer. Published April 3, 2009. Accessed April 7, 2009.</p></blockquote>

<p>Is Alex unfairly scapegoating Ruby for Twitter's failings, and using this as a reason to move to Scala? <a href="http://blog.obiefernandez.com/content/2009/04/my-reasoned-response-about-scala-at-twitter.html">Obie Fernandez</a> seems to think so. If the <code>kind_of?</code> checks proliferate because of weaknesses in Ruby or its frameworks, that's one thing. But if they stem from Twitter's legacy code, that's another matter entirely.</p>

<p>In a Twitter exchange between Alex and Obie, Alex admits the problems were with Twitter, not Ruby:</p>

<blockquote><p><strong>[Alex Payne]</strong> » @[Obie Fernandez] Indeed, using kind_of? in that way is "doing it wrong" in Ruby. But as our codebase grew, it became a necessity to combat bugs.</p>

<p><strong>[Obie Fernandez]</strong> » @[Alex Payne] Doesn't make sense. Do you mean non-deterministic bugs? An in-depth explanation of the bugs you're talking about would be enlightening</p>

<p><strong>[Alex Payne]</strong> » @[Obie Fernandez] Yes, I mean non-deterministic bugs in the giant, legacy, spaghetti parts of our system. Unexpected objects flying around.</p></blockquote>

<p>It's not very fair to be armchair architects and second-guess Twitter's design decisions with the benefit of hindsight. Nonetheless, I'd be willing to bet that a more thorough analysis of the code base would have revealed the source of the "unexpected objects flying around". That is likely to be the real cause of their issues, not the proliferation of ad-hoc type checks.</p>

<p>But are these perhaps part and parcel of the same underlying problem?</p>

<h3>Catching problems</h3>

<p>We've stumbled onto an interesting question: Do dynamic type systems shift some of the responsibility that static type checking would otherwise handle onto the developer, in the form of additional required tests? I'd argue that they do. Alex's experience with the Twitter codebase is compelling (albeit anecdotal) evidence for this point.</p>

<p>To see why, consider that the underlying goal of a static type system is to convert as many programming mistakes as possible into <em>type mismatch errors</em> (TME). A <span class="caps">TME </span>occurs whenever an expression needs to be of some type <code>T</code>, but its static type is of type <code>U</code>, and there is no automatic way to convert <code>U</code> into <code>T</code>.</p>

<p>Possible <span class="caps">TME</span>s are easy to identify statically. But most static languages will be conservative and reject programs which wouldn't have any issues at runtime. In other words, merely having a <em>candidate</em> for <span class="caps">TME </span>can be sufficient to cause a failed compile. For example, the C# compiler will reject a snippet such as the following:</p>




<pre><code class="code c-sharp:nocontrols">
int x = 0;
int y = 1;

if (y == 0) {
  x = "hello!";  // Won't compile, even though it's impossible to reach
}                // this line at runtime and no type error can occur.
</code></pre>




<p>In the above example, the attempt to assign <code>&quot;hello!&quot;</code> to <code>x</code> is a <span class="caps">TME</span>: in a statically typed language, the value assigned to <code>x</code> must be of the same type as <code>x</code> or of a type that can be implicitly converted to it. In this case, <code>x</code> is an <code>int</code>, and the expression <code>&quot;hello!&quot;</code> is a <code>string</code>. C# has no implicit conversion from <code>string</code> to <code>int</code>, so compilation fails.</p>

<p>But on closer inspection, we see that this line could never be reached to begin with: the <code>y == 0</code> condition cannot possibly be true. We're often told to trust the compiler, and let it do its job, because understanding compiler internals and optimizations seems to be trickier than just writing code. But this example seems to go against that; it's pretty clear that this wouldn't really be a problem. Why can't the compiler be smarter than a human in this almost absurdly trivial case?</p>
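
<p>For contrast, here is a rough sketch of the same logic in Python, chosen purely as a representative dynamically typed language. Nothing inspects the assignment until it is actually executed, and since the branch is never taken, the program runs without complaint:</p>

<pre><code class="code python:nocontrols">
x = 0
y = 1

if y == 0:
    x = "hello!"  # Never executed, so no type error can arise here.

# x is still bound to an int when we get here.
print(type(x).__name__)  # prints "int"
</code></pre>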

<h3>A tradeoff: flexibility of expression for type safety</h3>

<p>The answer is that static type systems try to be as strict as possible to make their analysis easier. In exchange for this strictness, static compilers can make more powerful guarantees about the type safety of their programs at runtime.</p>

<p>Additionally, an assignment like <code>int x = &quot;hello!&quot;;</code> might be considered wrong on its face, since there's a mismatch between what was intended and what was stated. By highlighting such errors at compile time, where they're more easily caught, we save a lot of developer time later down the road. The example we used here is trivial to the point of being a straw man, but in more complex expressions it may not be automatically obvious to a human whether the result is what was intended.</p>

<h3>Lost type safety might need to be compensated for elsewhere</h3>

<p>Many dynamic type systems give us more powerful constructs to express software than static typing does. It's often quicker to just say what you mean in Python, Groovy, or Ruby and have unit tests sort out the details later.</p>

<p>But when you get around to writing those tests, what do you actually need to test? Certainly, the tests you write for a statically typed language will be similar in some respects to dynamically typed ones. You'll still need to test the logical portions of your software -- for example, whether <code>ComputeSalesTax()</code> computes sales tax correctly, or whether a particular <code>Comparable</code> mixin sorts instances the way you expect. So we wouldn't expect to see much difference between unit tests of static and dynamic languages on this front.</p>

<p>But there's some additional work you need to do with the unit tests of dynamic languages. In general, without checking things yourself, you can't be sure at runtime that a particular object will respond to a particular method signature. More generally, you can't know in advance that it conforms to a particular interface.</p>

<p>That means that, if you really do require the full power of static typing in a dynamic language, you will unquestionably need to do some more legwork in testing to be sure that you didn't miss anything. Each and every time you invoke a method, for example, you will need to scrupulously examine its arguments for compatibility with the method signature. In effect, you will have ultimately implemented static typing for the subset of your software being tested, and in a far more verbose (and likely error-prone) way<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/04/29/ad-hoc-type-systems-in-dynamic-languages.page#fn1">1</a></sup>.</p>
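
<p>To make that concrete, here is a minimal sketch of what such tests might look like, using Python's standard <code>unittest</code> module. The <code>User</code> class, its <code>display_name</code> method, and the <code>greet</code> function are all hypothetical names invented for illustration; the point is only that each test stands in for a check a static compiler would perform for free.</p>

<pre><code class="code python:nocontrols">
import unittest


class User:
    """Hypothetical model class, used only for illustration."""

    def __init__(self, name):
        self.name = name

    def display_name(self):
        return self.name.title()


def greet(user):
    """Code under test: assumes its argument quacks like a User."""
    return "Hello, " + user.display_name()


class GreetTypeChecks(unittest.TestCase):
    def test_user_responds_to_display_name(self):
        # Stand-in for a compile-time interface check.
        method = getattr(User("ann"), "display_name", None)
        self.assertTrue(callable(method))

    def test_display_name_returns_a_string(self):
        # Stand-in for a declared return type.
        self.assertIsInstance(User("ann").display_name(), str)

    def test_greet_rejects_non_users(self):
        # An int has no display_name, so the mistake surfaces only at runtime.
        with self.assertRaises(AttributeError):
            greet(42)
</code></pre>

<p>Saved in a file of your choosing, these can be run with <code>python -m unittest</code>. None of the three tests says anything about whether the greeting is <em>correct</em>; they only re-create guarantees about shape and type.</p>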

<p>This should not be a surprising result: if you want to implement a feature of language <code>L</code> in a different language <code>M</code> using only the provided facilities of <code>M</code>, be prepared for an uphill struggle. This is why, for instance, no one has been successful in getting <a href="http://en.wikipedia.org/wiki/Software_transactional_memory" title="software transactional memory"><span class="caps">STM</span></a> into widespread, mainstream languages: they're not well suited for it.</p>

<p>The real point here is that if you need static typing, you should use a statically typed language. Similarly, if you absolutely need native concurrency primitives, you don't implement them in C#; you use Erlang. Static typing is no different from other language features in this respect.</p>

<h3>Best of both worlds: opt-in type systems</h3>

<p>As developers, that's frustrating to hear, of course. Why can't we have our cake and eat it too? What if we want the syntactical conveniences of Ruby with the static typing of Java?<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/04/29/ad-hoc-type-systems-in-dynamic-languages.page#fn2">2</a></sup></p>

<p>Fortunately, recent additions to mainstream programming languages are beginning to make this sort of thing a reality. For instance, C# 4 will soon have the <a href="http://www.nikhilk.net/Entry.aspx?id=210">much-vaunted <code>dynamic</code> pseudokeyword</a>. This essentially functions as a type modifier; expressions of type <code>dynamic</code> have their evaluation deferred to runtime. Effectively, C# now has what I'll call <em>opt-in duck typing</em>.</p>

<p>Even dynamic languages have toyed with the notion of batting for the other team. Guido van Rossum has entertained the idea of static typing in Python <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=85551">at least once before</a>. And ActionScript 3 has <a href="http://www.artima.com/lejava/articles/actionscript.html">optional static typing</a>, an obvious complement to C#'s approach.</p>

<p>You may be able to see how this feature would be valuable in some circumstances -- perhaps you can even identify situations where it'd be useful in your current project. But more importantly, were you to use these opt-in type systems, your unit tests would change accordingly. A dynamic language using opt-in static typing would see some manual type checking disappear for the relevant unit tests, and likewise a static language using opt-in dynamic typing might need a few more checks to make sure messages were going around correctly.</p>

<h3>Dynamic unit tests represent your elective use of typing</h3>

<p><strong>Dynamic unit tests, then, are like a barometer for measuring how much you care about typing in a particular segment of your software.</strong> If you're relatively indifferent to type so long as objects conform to some interface, then your unit tests will reflect this. This is often the case, which explains why dynamic languages can feel so productive and baggage-free.</p>

<p>But if you'll run into trouble unless the types are exactly right and everything lines up just the way you expect, your unit tests will be much richer in assertions and checks. They're essentially functioning as a substitute for as-yet-nonexistent language features like opt-in alternative typing systems.</p>

<p>That, I would surmise, is why Alex and the Twitter crew found themselves a new home with Scala. It wasn't that Ruby was a poor language or that it was ill-suited for development of large-scale software. It's that Scala offered the static typing and rigor that was needed for their particular application. It allowed them to retire those pervasive ad-hoc type checks and rely on the language to provide a more stable type system, one where their assertions could manifest as type declarations instead of ad-hoc typing in unit tests.</p>

<p>At the end of the day, a better product resulted, and that's the goal. Our choice of which language features to use is only tangentially related to the quality of software we produce. But if we can make it easier on everybody with elective language features, and avoid the proliferation of checks that arise when these aren't available, so much the better.</p>

<p><hr/></p>

<h3>Footnotes</h3>

<p class="footnote" id="fn1"><sup>1</sup> This of course says nothing about the quality of software produced with dynamic or static languages. That has, in my view, less to do with languages and more to do with design decisions and disciplined testing.</p>

<p class="footnote" id="fn2"><sup>2</sup> This is really a manifestation of a sort of <a href="http://en.wikipedia.org/wiki/Pareto_principle">80/20 rule</a>: we're getting most of our day-to-day language productivity directly from a relatively small subset of language features. The frustration arises because we want to do that last 20%, but it'd require clumsily accreting stuff from the remaining 80%.</p>

]]>
        
    </content>
</entry>



<entry>
    <title>Participating in C&apos;ville Wordplay</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/04/23/participating-in-cville-wordplay.page" />
    <id>tag:distilledb.com,2009:/blog//2.51</id>

    <published>Thu, 23 Apr 2009 07:45:00 -0500</published>
    <updated>Thu, 23 Apr 2009 07:45:00 -0500</updated>

    <summary>I&apos;m participating in Charlottesville Wordplay, a charity game show to benefit Literacy Volunteers of Charlottesville-Albemarle.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Announcements" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="announcements" label="announcements" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>Tomorrow night I'll be part of a three-person team in <strong>Wordplay</strong>, Charlottesville's ad hoc annual gameshow. Come watch me and my team, the Blue Screens of Death (sponsored by the <a href="http://neonguild.org">Neon Guild</a>), take on all comers. All proceeds benefit the <a href="http://literacyforall.org/">Literacy Volunteers of Charlottesville-Albemarle</a>, a great organization that's devoted to improving adult literacy in and around our area.</p>

<p>Spectator tickets are $39 in advance and $45 at the door. You'll get an incredible dinner catered by the Omni Hotel and Boar's Head Inn, and more importantly you'll get the schadenfreude satisfaction of watching me possibly get owned (however unlikely that may be). Hope to see you there, although I concede that's a bit of a long shot for those of you in NoVA and elsewhere. Even if you can't make it, you can show your support for the cause with a donation. See the <a href="http://cvillewordplay.org">Charlottesville Wordplay site</a> for more info.</p>]]>
        
    </content>
</entry>



<entry>
    <title>TicTacGo v.0.1 released</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/04/03/tictacgo-v01-released.page" />
    <id>tag:distilledb.com,2009:/blog//2.48</id>

    <published>Fri, 03 Apr 2009 11:15:00 -0500</published>
    <updated>Fri, 03 Apr 2009 11:15:00 -0500</updated>

    <summary>TicTacGo helps you search Twitter effortlessly by converting hashtags you encounter in tweets into clickable search links.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Announcements" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="greasemonkey" label="Greasemonkey" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="tictacgo" label="TicTacGo" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="twitter" label="Twitter" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p>I've released a handy Firefox script for Greasemonkey entitled <a href="http://userscripts.org/scripts/show/45777"><em>TicTacGo</em></a>. It converts hashtags in tweets into searchable links that you can click. Hashtags are Twitter's ubiquitous user-generated semantics system, wherein people mark interesting parts of their Tweets by prefixing them with the hash sign, <code>#</code>. Each link generated by TicTacGo takes you to the Twitter search page for that hashtag. For example, the <code>#awesome</code> hashtag would be converted into a link that takes you to <a href="http://search.twitter.com/search?q=%23awesome">Twitter search results for <code>#awesome</code></a>.</p>

<div class="image-holder"><a href="http://distilledb.com/blog/assets/entry-assets/by-id/48/twitter-screenshot.png" class="lightbox"><img src="http://distilledb.com/blog/assets/entry-assets/by-id/48/twitter-screenshot.png" alt="" class="figure" /></a></div>

<p>You can read more info at the <a href="http://userscripts.org/scripts/show/45777">script page</a>, or read about <a href="http://distilledb.com/blog/pages/meta/projects/projects.page">other projects I'm working on</a>. Feedback is welcome!</p>

<p>Thanks to <a href="http://ashish.tonse.com/">Ashish Tonse</a> for reminding me about how annoying it was that Twitter doesn't have this. Frustration is the mother of invention just as much as necessity, it seems!</p>]]>
        
    </content>
</entry>



<entry>
    <title>Restoring accidentally deleted files with Git</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/03/29/restoring-accidentally-deleted-files-with-git.page" />
    <id>tag:distilledb.com,2009:/blog//2.37</id>

    <published>Sun, 29 Mar 2009 13:00:00 -0500</published>
    <updated>Sun, 29 Mar 2009 13:00:00 -0500</updated>

    <summary>Ever accidentally lose some work because you didn&apos;t commit to your version control system&apos;s repository? That won&apos;t happen anymore if you&apos;re using Git.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Articles" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="git" label="Git" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="instructional" label="instructional" scheme="http://www.sixapart.com/ns/types#tag" />
    <category term="versioncontrol" label="version control" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<p><a href="http://git-scm.com/">Git</a>, the distributed version-control system, has won me over. That's not to say that <code>svn</code> is a bad choice, only that I now prefer <code>git</code> when starting anew. It's also got a highly effective, if somewhat bland, <span class="caps">GUI</span> provided by <code>git-gui</code>.</p>

<p>I'm happy to see more people in my local community using Git, or at least <a href="http://www.codethinked.com/post/2009/03/27/Always-Assume-That-Youre-The-Problem.aspx">thinking about it</a>. A recent mishap and its happy ending put another solid notch in <code>git</code>'s column for me, since I'm not sure that the result would have been the same if I were using another <span class="caps">VCS.</span></p>

<h3>Whoops</h3>

<p>I just accidentally deleted over 40 files in a project I'm working on. And some of those files had local modifications to them, representing about 45 minutes of work or so. It therefore wasn't safe to just restore from the last commit. What to do?</p>

<p>Had I been using <span class="caps">VSS, CVS,</span> Subversion or most other version control systems, this would be an unfortunate and frustrating loss of time. In most version-control environments, whatever isn't committed is gone forever. Lose your local copies and you've lost everything.</p>

<h3>Git to the rescue</h3>

<p>Git, however, has the concept of an <em>index</em>, often called the <em>staging area</em>. In the staging area, you assemble the changes you're making into a meaningful unit that is then committed to your desired repository.</p>

<div class="image-holder"><img src="http://distilledb.com/blog/assets/entry-assets/by-id/37/git-adding-to-index.png" alt="" class="figure" /><div class="caption">Typically, changes are first added to the index, then committed to a repository. Each commit generates a unique identifier.</div></div>

<p>Whenever you perform <span class="script-section">git add</span>, you stage changes that have not yet been staged. Note that you're staging <em>content</em>, not merely marking <em>files</em>: Git records a snapshot of each file as it stands at the moment you add it, so later edits aren't staged until you add them again. By default <code>git</code> assumes you would like to add all the changes in each file you mention, but you can use <span class="script-section">git add -p</span> to enter an interactive mode where you select the desired changes.</p>

<p>Periodically adding changes to the staging area is thus a good idea; it signals that you would like to eventually commit these changes and that they are important.</p>

<h3>Restoring the deleted files</h3>

<p>Because our changes are living in the staging area, restoring the deleted files is almost trivial. To see the difference between the staging area and what's in the working tree, you can invoke <a href="http://www.kernel.org/pub/software/scm/git/docs/git-ls-files.html"><code>git ls-files</code></a>. Specifically, we're interested in deleted files:</p>



<pre class="script-section">$ git ls-files --deleted</pre>



<p>Assuming that looks like what you want, go ahead and check each file back out from the index.</p>



<pre class="script-section">$ git ls-files -d -z | xargs -0 git checkout --</pre>



<p>Huzzah; your work is saved! Not bad for two commands.</p>

<p><hr/></p>

<h3>Some additional notes</h3>

<p>Normally, if you use <code>git checkout</code> without specifying a file path, you'll update your index and working tree to reflect whatever current or new branch you'd like to look at. </p>

<p>However, in this case, we pass <code>git checkout</code> a list of paths originating from <code>git ls-files</code>. When you use <code>git checkout</code> and specify a path, <code>git</code> instead checks those files out from the index. This restores files to their state the last time you added them.</p>

<p>You can read more information about how <code>git checkout</code> works at the <a href="http://www.kernel.org/pub/software/scm/git/docs/git-checkout.html">online manual</a>.</p>]]>
        
    </content>
</entry>



<entry>
    <title>Sneakernet still reigns supreme</title>
    <link rel="alternate" type="text/html" href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page" />
    <id>tag:distilledb.com,2009:/blog//2.46</id>

    <published>Sat, 28 Mar 2009 12:00:00 -0500</published>
    <updated>Sat, 28 Mar 2009 12:00:00 -0500</updated>

    <summary>What&apos;s the quickest way to move your data from point-to-point? Hint: the answer might not be the Internet.</summary>
    <author>
        <name>John</name>
        
    </author>
    
        <category term="Articles" scheme="http://www.sixapart.com/ns/types#category" />
    
    <category term="technology" label="technology" scheme="http://www.sixapart.com/ns/types#tag" />
    
    <content type="html" xml:lang="en" xml:base="http://distilledb.com/blog/">
        <![CDATA[<h3>Disaster strikes</h3>

<div class="sidebar-graphic"><img src="http://distilledb.com/blog/assets/entry-assets/by-id/46/headache.jpg" alt="" /></div>

<p>A series of automated text messages bombards your BlackBerry, indicating a crisis at the data center -- <em>just</em> as you've left work for the rest of Friday afternoon. As your phone chirps noisily with each incoming text, the sinking feeling in the pit of your stomach coalesces into a knot of despair. With a mounting sense of horror, your eyes settle on two weekend-destroying words: <strong>data corruption</strong>.</p>

<p>More details trickle in from your team members, and the pieces begin to fit together. You have a large amount -- let's say 2 terabytes -- of mission-critical information which has suddenly been corrupted at your office in Los Angeles. Although you've been doing regular backups, these are kept offsite in New York City at your company's headquarters. A staff member there has verified these copies are secure, clean, and uncorrupted. All the backup archives are accessible through your company's high-speed <span class="caps">VPN </span>and are available immediately.</p>

<p>But the data needs to be restored by Monday morning, or heads will roll (including yours, in all probability). Assume that the only thing you need to do to get back to normal is wipe out the corrupted data and replace it with the fresh copy. <strong>What's the fastest way to do this?</strong></p>

<h3>Some not-so-good options</h3>

<div class="sidebar-graphic"><img src="http://distilledb.com/blog/assets/entry-assets/by-id/46/bandwidth-shirt.jpg" alt="" /></div>

<p>This seems like a no-brainer, doesn't it? Just <code>ssh</code> over to the remote machine and start an <a href="http://en.wikipedia.org/wiki/Secure_copy"><code>scp</code></a> session. Sit back and wait for the bits to finish streaming over the wire, and you're golden.</p>

<p>Alas, if only things were that easy. Reality is going to throw a monkey wrench into your weekend plans, for it seems impossible to finish this task by Monday morning. Consider, for example, how long restoring 2 TB of data will take with a relatively juiced-up cable connection at 16 Mb/s: <strong>12.14 days</strong>, running at top speed. You'll finish the Wednesday after next, which is not nearly quick enough to make the deadline.</p>

<p>If you had a direct fiber connection at, say, 50 Mb/s, you could speed things up by a factor of about 3; now it takes about <strong>3.88 days</strong>. You won't finish until just after Tuesday lunch, still too late. What to do?</p>
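
<p>These figures can be reproduced with a few lines of arithmetic. The sketch below assumes binary units (1 TB = 2<sup>40</sup> bytes, 1 Mb = 2<sup>20</sup> bits), which is what the numbers quoted here imply, and it ignores protocol overhead entirely; the function name is made up for illustration.</p>

<pre><code class="code python:nocontrols">
BITS_PER_TB = 8 * 2**40   # assumption: binary terabytes
BITS_PER_MB = 2**20       # assumption: binary megabits
SECONDS_PER_DAY = 86400


def transfer_days(terabytes, megabits_per_second):
    """Days needed to push the data through a link running at full speed."""
    bits = terabytes * BITS_PER_TB
    seconds = bits / (megabits_per_second * BITS_PER_MB)
    return seconds / SECONDS_PER_DAY


print(round(transfer_days(2, 16), 2))  # cable: 12.14
print(round(transfer_days(2, 50), 2))  # fiber: 3.88
</code></pre>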

<h3>Sneakernet to the rescue</h3>

<div class="sidebar-graphic"><img src="http://distilledb.com/blog/assets/entry-assets/by-id/46/sneakernet-image.jpg" alt="" /></div>

<p>There's a faster, much lower-tech way: the <em>sneakernet</em> -- the transfer of data via physical storage. Despite tremendous Moore's Law-pace advancements in some areas of bandwidth innovation, the sneakernet wins hands down when it comes to delivering large blocks of data over channels where latency isn't important.</p>

<p>In this case, <strong>just ship the backup drive overnight to Los Angeles from the New York office.</strong> Even allowing for the time it'll take to restore from the backup, short-range same-bus speeds will easily trump transit over the public Internet. Doing the back-of-the-envelope math here yields some surprising results:</p>


<ul>
<li>Suppose you order the drive shipped overnight at 6:00 PM local time and it arrives at noon the next day. This is an <strong>18-hour</strong> window for the transit time.</li>
</ul>




<ul>
<li>A typical drive-to-drive copy speed for modern <span class="caps">SATA </span>drives is on the order of 40 MB/s when the drives are on the same bus. It will take an additional (2 TB) / (40 MB/s) = <strong>14.56</strong> hours to copy the drive.</li>
</ul>
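
<p>The two steps above can be checked the same way. As before, this is only a sketch: it assumes binary units (1 TB = 2<sup>40</sup> bytes, 1 MB = 2<sup>20</sup> bytes) and a copy that runs at a constant rate, and the names are invented for illustration.</p>

<pre><code class="code python:nocontrols">
def copy_hours(terabytes, megabytes_per_second):
    """Hours needed for a drive-to-drive copy at a constant rate."""
    seconds = terabytes * 2**40 / (megabytes_per_second * 2**20)
    return seconds / 3600


transit_hours = 18            # 6:00 PM pickup, noon delivery
copying = copy_hours(2, 40)   # 2 TB at 40 MB/s

print(round(copying, 2))                  # 14.56
print(round(transit_hours + copying, 2))  # 32.56
</code></pre>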



<p>We thus copied 2 TB of data in approximately 32.5 hours, for a net rate of about <strong>143 Mb/s</strong>. If we had specialized cloning hardware, we could get a significant speed boost on the copying phase, perhaps to around a net speed of <strong>240 Mb/s</strong>.</p>

<p>Further, because there's no practical cap on the number of drives you could send, absurd levels of bandwidth can be reached this way. Suppose you overnighted a box full of 100 such drives and copied them in parallel. This would only be roughly five times as expensive, but you'd wind up with a hundred-fold bandwidth improvement, to a net speed of about 14 Gb/s. That's more than one and a half times the <a href="http://www.internet2.edu/lsr/">current Internet2 speed record</a> of 8.8 Gb/s.</p>

<h3>The price of the sneakernet</h3>

<p>Of course, nothing comes free. By using the sneakernet, you have implicitly accepted a number of tradeoffs:</p>


<ul>
<li><em>Up-front costs.</em> Shipping an insured hard drive overnight cross-country sets you back about $75 at FedEx as of this writing. That's roughly the cost of a month or two of unlimited-bandwidth residential Internet access. In the emergency scenario described above, the sneakernet's probably your best shot, so you might have no choice. And it's probably worth this relative pittance to save your company's business operations. Finally, sneakernet is usually vastly cheaper in terms of dollars per bandwidth.</li>
</ul>




<ul>
<li><em>Latency.</em> The time it takes for data to arrive after being requested is much smaller with a digital channel than with the sneakernet. Even if the bandwidth is low, the lower bound on latency is set by the speed of light, yielding responses in milliseconds instead of days. The sneakernet is thus most appropriate when you have large blocks of data and small pieces of them aren't useful without the rest.</li>
</ul>




<ul>
<li><em>Much bigger carbon footprint.</em> Pushing bits and bytes around is cheap; an electron weighs around one million trillion trillionth of a kilogram (~10<sup>-30</sup> kg). But your packages cost far more energy to move and ship around than electrons do, and they burn many orders of magnitude more fossil fuel. This isn't terribly green.</li>
</ul>



<p>On the other hand, note that as data density improves without any corresponding change in bandwidth, the sneakernet becomes more attractive. For example, there's no appreciable difference between shipping a 2 TB hard drive and a 250 GB one if they have the same form factor.</p>

<h3>How does the sneakernet stack up?</h3>

<p>As we've seen, physical transport of data can be the most effective means when you have either short physical distances or very large amounts of data, albeit at a tradeoff. This table summarizes some of the effective bandwidths you'll get [9], with sneakernet modes of transport italicized.</p>

<table>
<tr><th>transport channel</th><th>effective bandwidth (Kb/s)</th></tr>
<tr><td>plain old telephone service (POTS)</td><td align="right">29</td></tr>
<tr><td>Integrated Services Digital Network (ISDN) chunk</td><td align="right">64</td></tr>
<tr><td><span class="caps">FCC </span>"broadband" definition (2002)<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn1">1</a></sup></td><td align="right">200</td></tr>
<tr><td><span class="caps">FCC </span>"broadband" definition (2008)<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn1">1</a></sup></td><td align="right">768</td></tr>
<tr><td>human stereo hearing (approximate)<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn2">2</a></sup></td><td align="right">~1,536</td></tr>
<tr><td><span class="caps">CWA </span>mean broadband speed, US (2007)<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn3">3</a></sup></td><td align="right">1,946</td></tr>
<tr><td><span class="caps">IEEE</span> 802.11 (physical)</td><td align="right">2,048</td></tr>
<tr><td>canine stereo hearing (approximate)<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn4">4</a></sup></td><td align="right">~4,608</td></tr>
<tr><td><span class="caps">ADSL </span>downstream (typical)</td><td align="right">6,144</td></tr>
<tr><td><span class="caps">CWA </span>mean broadband speed, Canada (2007)<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn3">3</a></sup></td><td align="right">7,782</td></tr>
<tr><td><span class="caps">IEEE</span> 802.11b (physical)</td><td align="right">11,264</td></tr>
<tr><td>human stereo vision (approximate)<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn6">6</a></sup></td><td align="right">~19,700</td></tr>
<tr><td><span class="caps">IEEE</span> 802.11a (physical)</td><td align="right">40,960</td></tr>
<tr><td><span class="caps">IEEE</span> 802.11g (physical)</td><td align="right">55,296</td></tr>
<tr><td><span class="caps">CWA </span>mean broadband speed, Japan (2007)<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn3">3</a></sup></td><td align="right">62,464</td></tr>
<tr><td><span class="caps">IEEE</span> 802.11n (user throughput)</td><td align="right">75,776</td></tr>
<tr><td>single 250 GB hard drive via FedEx, <span class="caps">NYC </span>to <span class="caps">LAX</span><sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn5">5</a></sup></td><td align="right"><em>~150,000</em></td></tr>
<tr><td>freight pallet (500 &#215; 250 GB disks), <span class="caps">NYC </span>to <span class="caps">LAX</span><sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn5">5</a></sup></td><td align="right"><em>~1,200,000</em></td></tr>
<tr><td>walk an 8 GB <span class="caps">USB </span>stick across a room<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn5">5</a></sup></td><td align="right"><em>~6,000,000</em></td></tr>
<tr><td>Internet IPv6 speed record (2008)<sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn7">7</a></sup></td><td align="right">9,227,469</td></tr>
<tr><td>oil tanker (10% full of 250 GB disks), <span class="caps">NYC </span>to <span class="caps">LAX</span><sup class="footnote"><a href="http://distilledb.com/blog/archives/date/2009/03/28/sneakernet-still-reigns-supreme.page#fn5">5</a></sup></td><td align="right"><em>~200,000,000,000</em></td></tr>
</table>

<p>I've also compiled these into a chart that you can view <a href="http://distilledb.com/blog/assets/entry-assets/by-id/46/sneakernet-vs-internet.png">here</a>.</p>

<div class="image-holder"><a href="http://distilledb.com/blog/assets/entry-assets/by-id/46/sneakernet-vs-internet.png" class="lightbox"><img src="http://distilledb.com/blog/assets/entry-assets/by-id/46/sneakernet-vs-internet-thumbnail.jpg" alt="" class="figure" /></a></div>
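<p>If you'd like to check the sneakernet rows yourself, the arithmetic is just payload divided by transit time. The sketch below uses the transit times from footnote 5; the decimal unit conventions (1 GB = 10<sup>9</sup> bytes, 1 Kb = 1,000 bits) and all names are assumptions of mine, and the oil tanker is left out because the post doesn't say how many disks it carries.</p>

```python
# Sketch only. Assumed conventions (not stated in the post):
# 1 GB = 10**9 bytes and 1 Kb = 1000 bits.
SECONDS_PER = {"second": 1, "hour": 3600, "day": 86400}

def effective_kbps(payload_gb: float, duration: float, unit: str) -> float:
    """Average rate in Kb/s when payload_gb arrives after `duration` units."""
    bits = payload_gb * 1e9 * 8
    return bits / (duration * SECONDS_PER[unit]) / 1000

# (label, payload in GB, transit time, unit), using footnote 5's times
rows = [
    ("8 GB USB stick across a room", 8, 10, "second"),
    ("250 GB drive, FedEx overnight", 250, 18, "hour"),
    ("pallet of 500 x 250 GB disks", 500 * 250, 10, "day"),
]

for label, payload_gb, duration, unit in rows:
    rate = effective_kbps(payload_gb, duration, unit)
    print(f"{label:32s} {rate:>12,.0f} Kb/s")
```

<p>Under these assumptions the USB stick and pallet come out close to the table's ~6,000,000 and ~1,200,000 Kb/s. The single drive at 18 hours works out nearer 31,000 Kb/s than the ~150,000 listed, so treat that row as depending on assumptions this sketch doesn't capture.</p>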

<hr/>

<h3>Footnotes and credits</h3>

<p class="footnote" id="fn1"><sup>1</sup> <a href="http://news.cnet.com/8301-10784_3-9898118-7.html"><span class="caps">FCC </span>approves new method for tracking broadband's reach.</a> CNET News, March 19, 2008. Accessed March 28, 2009.</p>

<p class="footnote" id="fn2"><sup>2</sup> Based on CD-quality audio: 44.1 kHz sampling rate &#215; 16 bits per sample &#215; 2 channels = ~1.4 Mb/s, rounded up to ~1.5 Mb/s.</p>

<p class="footnote" id="fn3"><sup>3</sup> <a href="http://arstechnica.com/tech-policy/news/2007/05/survey-average-broadband-speed-in-us-is-1-9mbps.ars"><span class="caps">CWA </span>survey: average broadband speed in US is 1.9Mbps.</a> Ars Technica, May 29, 2007. Accessed March 28, 2009.</p>

<p class="footnote" id="fn4"><sup>4</sup> Based on [2], and assuming that dogs can hear a frequency range about three times as wide as a human's.</p>

<p class="footnote" id="fn5"><sup>5</sup> Assumes the following transport times:</p>


<ul>
<li>Crossing a room: 10 seconds</li>
<li>FedEx overnight: 18 hours</li>
<li>Shipping a freight pallet: 10 days</li>
<li>Shipping via oil tanker: 6 weeks</li>
</ul>



<p class="footnote" id="fn6"><sup>6</sup> <a href="http://medgadget.com/archives/2006/07/the_bandwidth_o.html">Bandwidth of the human eye.</a> medGadget, July 28, 2006. Accessed March 28, 2009.</p>

<p class="footnote" id="fn7"><sup>7</sup> <a href="http://www.internet2.edu/lsr/">Internet2 Land Speed Record.</a> Internet2. Accessed March 28, 2009.</p>

<p class="footnote" id="fn8"><sup>8</sup> I use "effective bandwidth" instead of "throughput" since it's hard to say what the bandwidth of some of these physical channels is.</p>

<p><a href="http://www.overbyte.org/illustrations/sneakernet-illustration.html">Sneaker illustration</a> by <a href="http://icr.23d.com/">Ivan C. Reyes</a> of <a href="http://www.overbyte.org">Overbyte</a>.</p>]]>
        
    </content>
</entry>


</feed>


