March 24, 2009


Some­times we tell peo­ple that things live for­ever on the inter­net and that any­one can find them (so don’t post that pic­ture of your­self drink­ing alco­hol, young man), but I want to high­light how some impor­tant things from just a cou­ple of months ago are becom­ing impos­si­ble to find. If we’re not care­ful, the haystack is going to dis­ap­pear, never mind the needle.

For exam­ple, take the dis­cus­sion that hap­pened on Twit­ter dur­ing ALA’s Mid­win­ter Meet­ing just under two months ago. The Meet­ing had a hash­tag for track­ing con­tent (#alamw09), and almost every­one used it most of the time. There was a lot going on in that tag, so much so that I thought it was a tip­ping point for the Asso­ci­a­tion in terms of com­mu­ni­ca­tion tools. I even debriefed what hap­pened on Twit­ter for ALA staff after­wards so that they’d be able to see the patterns.

But try to find that discussion now, and it’s almost impossible. Most people (including me) rely on Twitter’s search engine (which was formerly called “Summize” and run by a different company until Twitter bought it). If you search Twitter now for the #alamw09 hashtag, you get exactly one page of results (yesterday there were two), and only a couple of those tweets were actually posted during the event itself. If you look up #alamw09 at, you’ll get more results from the Meeting itself, but there’s still only one page, and you had to have manually followed the Twitter account for them to have tracked your tweets, so even if you could see older results than what shows, it would be an incomplete archive at best. Search Technorati for #alamw09 and you get eight blog posts. Ironically, you can get most of the public tweets from Midwinter by searching FriendFeed. The same problem applies when looking for anything from #ala2008 on Twitter, although there again FriendFeed saves the day, but for how long?

So all of our attempts to aggregate that Twitter content may only work in the moment, for the moment. It turns out the tweets are miscellaneous *and* searchable in only one place (for now), a pretty bad combination in hindsight. Thank heavens I favorited so many of the alamw09 tweets in Twitter, although that’s still not ideal. I have to manually page through them to find the ones I want, and I already have 35 pages of favorites.

After Mid­win­ter, I tried to start mov­ing my #alamw09 favorites into Ever­note so that I’d be able to search and group them, but I haven’t had time to com­plete that process, and I just can’t seem to train myself to add new tweets there as I favorite them. The ratio of effort between click­ing on a star and fill­ing out a few words of meta­data is just too much in the mid­dle of my day, so this looms as a project in my future if I really want to save this stuff. Even then, there’s no guar­an­tee Ever­note will stick around, but at least I can export from it.

So if you were using a hash­tag to aggre­gate con­tent, think­ing it would be eas­ier to find it all again in the future, think again. You’re going to have to do some­thing more proac­tive and man­ual than rely­ing on Twitter’s search engine or Google. You’ll have to decide what level of ephemer­a­li­ness you’re com­fort­able with for that con­ver­sa­tion, because you may not be able to get back to it if you let some­one else man­age access to the archive. In this con­text, it’s a shame so much of the con­ver­sa­tion has moved away from blog com­ments (where indi­vid­u­als can openly archive it) to Twit­ter and Friend­Feed. And if you’re a gov­ern­ment or archive orga­ni­za­tion look­ing to pre­serve this kind of dig­i­tal con­tent, the stakes are get­ting raised on you.

Am I miss­ing any other options for find­ing past hash­tag con­ver­sa­tions? Please tell me yes in the comments.

Adden­dum: Poten­tial ideas for archiv­ing — you could sub­scribe to the RSS feed of a hash­tag in an RSS reader and export them, right? Or sub­scribe to the RSS feed via email? Other ideas?
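
One way to try the RSS idea is a minimal Python sketch that appends unseen feed items to a local file, so an archive builds up each time the feed is polled. It assumes a plain RSS 2.0 feed with `<item>` elements (an Atom feed would need namespace handling). The function name `archive_feed` and the one-JSON-object-per-line file layout are illustrative choices, not anything Twitter or a feed reader provides.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def archive_feed(rss_xml: str, archive_path: str) -> int:
    """Append feed items not yet in the archive; return how many were added."""
    path = Path(archive_path)

    # Collect the ids already stored so repeated polls do not duplicate items.
    seen = set()
    if path.exists():
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    seen.add(json.loads(line)["id"])

    added = 0
    with path.open("a", encoding="utf-8") as f:
        for item in ET.fromstring(rss_xml).iter("item"):
            # Use <guid> as the item's identity, falling back to <link>.
            item_id = item.findtext("guid") or item.findtext("link")
            if not item_id or item_id in seen:
                continue
            record = {
                "id": item_id,
                "title": item.findtext("title", default=""),
                "published": item.findtext("pubDate", default=""),
            }
            f.write(json.dumps(record) + "\n")
            seen.add(item_id)
            added += 1
    return added
```

Fetching the feed text (for example with `urllib.request`) and running this on a schedule are left out; the point is that each run only adds items the file does not already hold, and the file stays on your own disk.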

  1. Ice­rocket has a Twit­ter search func­tion, and this responds to say that it has over 1,000 results using #alamw09, but when you try and page through them, you only have 1 page of 14 hits. might be worth a look, since that does pro­duce a few more, but seems to hang (at least for me) when try­ing to recover more. If you do a Google search for #alamw09 that seems to find 340+ results. This is clearly insane, since Twit­ter stores everyone’s indi­vid­ual tweets — I checked mine back to the mid­dle of last year and they all seem to be there; at least as far as I can tell.

    How­ever, look at it whichever way you like, in many respects Twit­ter search is shot. It’s not a prob­lem with the search engines that I’ve looked at, since they all seem pretty con­sis­tent (as well as the ones that I men­tioned above I played with another 15 or so), so it’s got to be access to the data­base I sus­pect. Of course, if you really want to go totally nuts — do a search using Fire­fox and then another same search using IE, and you may well find that you get dif­fer­ent results and — wait for it — you can search back fur­ther using Fire­fox than you can with IE! How mad is that?

    Comment by Phil Bradley — March 24, 2009 @ 7:34 am

  2. I set up a lifestream on my blog using Sweetcron, which is currently in private beta. I have Sweetcron subscribed to the feed for a search for cksthree (my old twitter name) and another for cksample, my current one. As a result all my tweets and all the tweets of people replying to me get archived in my lifestream’s database (where I can back it up, search it, and have it forever). It’s a very flexible system though, so you can subscribe to archive any feed, and archiving a search for a #hashtag of some sort would seem to partially address your problem (at least going forward). Here’s my lifestream, btw: I have my BackType tracked comments going in there too, so eventually this comment will show up there.



    Comment by C.K. Sample III — March 24, 2009 @ 8:00 am

  3. Jenny,

    I like the idea of sub­scrib­ing to the feed by email (I use, and then use email fil­ter to put tweets directly into a folder so as not to over­whelm the inbox.

    Comment by Peter Bromberg — March 24, 2009 @ 8:01 am

  4. I won­der if the Twit­ter API could be used to col­lect and archive hash­tags. This wouldn’t help ret­ro­spec­tively, but it could be done for future con­fer­ence tags at least.

    Comment by DerikB — March 24, 2009 @ 8:39 am

  5. Jenny–

    A reflector could do the job of aggregating and storing content without work on any user’s part. Instead of using hashtags, everyone follows a certain user, and direct-messages that user to tweet to the group. The user (which is scripted) then retweets that direct message. The upshot is that that user now has a copy of every tweet DMed to it, with which it can do whatever it likes (store it in a SQL DB, push it to a flat text file, whatever). Wolf Rentzsch did this at C4[2] and it worked really, really well in terms of real-time interaction during the conference.

    Another option is for the hashtag’s “owner” (or anyone, really, who wants to keep a record of a tag) to pull tweets in real-time and store them. Coudal does this on their Layer Tennis Group page, which is a self-updating version of a hashtags page for a particular hashtag in order to provide commentary on their Friday live matches. Simple to do using the statuses/public_timeline method in the Twitter API and parsing through the XML.

    Comment by CJC — March 24, 2009 @ 8:48 am

  6. It is hard to comprehend, with cheap disk space, why Twitter tosses tweets, as there ought to be value in being able to mine patterns and trends over time. I struggle to find an excuse, and I imagine for a librarian, not archiving is close to a sin ;-)

    As cited above, SweetCron has potential because you cache your stream on your own server, though its search is across all the bits you pull into the stream. I’d think an RSS reader or web tool that caches posts might do it by subscribing to a twitter search feed (it’s been so long since I used BlogLines, don’t they cache past posts??)

    Comment by Alan Levine — March 24, 2009 @ 8:57 am

  7. I have been won­der­ing about if and how tweets should be archived. A ran­dom hunt in the way­back machine found that it some­times archives twit­ter pages:*/ … though even this exam­ple only cov­ers 2007 and likely only cap­tures a sub­set of the tweets.

    Of course this doesn’t help with archiv­ing hash-tagged tweets together as a group — unless we can sub­mit a spe­cific URL for all those posts. Unfor­tu­nately plans to archive a page like are thwarted by Twitter’s robots.txt file ( ) that spec­i­fies “Disallow: /*?”.

    Comment by Jeanne — March 24, 2009 @ 9:53 am

  8. I’ve been think­ing about this lately too. I haven’t found a good way to archive a hash­tagged con­ver­sa­tion either. I do peri­od­i­cally archive my own twit­ter posts via this site though:

    Comment by Chad — March 24, 2009 @ 10:15 am

  9. Did any­body ask the Twit­ter peo­ple about this? Shouldn’t the ques­tions start there?

    Comment by Wilfred Drew — March 24, 2009 @ 10:25 am

  10. Just spot­ted this framework/tool that men­tions that it has ‘Sup­port for Twit­ter as a source’:
    Looks wor­thy of fur­ther exam­i­na­tion. Has any­one tried this yet?

    Comment by Jeanne — March 24, 2009 @ 11:08 am

  12. Thank you for draw­ing atten­tion to the fragility of Twit­ter as an archive. There are many col­lec­tions of tweets which have some his­tor­i­cal value, and it’s impor­tant that they be pre­served and remain accessible.

    Twit­ter Search seems now to have a lim­i­ta­tion to 7 days’ traf­fic in many cir­cum­stances. For me today (25 March 2009) search­ing for #alamw09 on Inter­net Explorer returns 6 tweets going back only 7 days to this one dated 17 March.

    How­ever, on Fire­fox the same Twit­ter search yields 14 results going back to this tweet of 8 Feb­ru­ary. I don’t know whether IE and Fire­fox are intrin­si­cally treated dif­fer­ently by Twit­ter Search, or if my browsers have dif­fer­ent cook­ies affect­ing the result.

    You can force IE to show older tweets with the until: option on search. For exam­ple, try–02-13.

    Comment by Jim Richardson — March 24, 2009 @ 5:09 pm

  13. I would rec­om­mend search­ing with twemes. This will pull all tweets from alamw09 and flickr pho­tos tagged with alamw09. Sev­eral pages of results and they all seem to still work. I really love twemes because it mashes dif­fer­ent web­site tags.

    Comment by Brett Kochendorfer — March 25, 2009 @ 7:30 am

  14. As noted disk space is cheap — but the cost is not zero and there­fore some­one has to pay. Since Twit­ter is free that means the own­ers pay.

    Twitter itself is not intended to be any sort of “permanent” communication. E-mail isn’t, either. E-mail clients retain mail because they were programmed to do so; retaining messages isn’t part of the protocol definitions (POP3, IMAP, SMTP, MIME, etc.). Some folks even have longer-term archiving tools which use APIs to read e-mail and archive it later.

    What I am say­ing is that some­one could/should write a Twit­ter client that lets you fol­low a Twit­ter feed or feeds, fil­ter­ing on var­i­ous strings (pri­mar­ily hash tags), and adding them to some sort of archive file. When­ever you want to make sure tweets are retained, sub­scribe this client to it, con­fig­ure how and where it archives the tweets, and you’re done.

    Comment by Tim — March 25, 2009 @ 3:15 pm

  15. Thanks Brett for the pointer to twemes. How­ever, it too seems to be incom­plete: at 26 March 2009, shows noth­ing more recent than 31 Jan­u­ary, while shows num­bers of tweets since then.

    Comment by Jim Richardson — March 25, 2009 @ 4:29 pm

  19. I used Twit­ter to live­blog ILI08 — a process which I doc­u­mented at and

    I used the twit­ter search api to then dis­play the twit­ter feed in a vari­ety of ways — and over time this feed is los­ing tweets as you describe.

    I did sub­scribe to the twit­ter feed at the time via Google reader — and so have an archive of the tweets there as well, so I can get them out again. You could obvi­ously use other RSS read­ers, or use some­thing like Out­twit which would allow you to archive them in your email. Another approach could be to use a Word­Press plu­gin (can’t remem­ber the name) which can take an RSS feed and auto­mat­i­cally post the con­tents on your blog.

    Comment by Owen Stephens — March 30, 2009 @ 3:03 pm

  20. Why not simply use an on-demand web archive service like to save your twitters? Then save the links in Evernote and/or a Zotero-like application?

    Twit­ter + Back­upURL at

    Comment by Ryan Williams — March 31, 2009 @ 6:46 am

  21. Has anyone come up with any tried solutions to this problem? This is a tremendous discussion, but it seems that it’s mostly conjecture, or did I just happen to miss someone that nailed it? Great blog, btw!

    Comment by BRubinstein — April 3, 2009 @ 8:02 am

  22. It seems to me that this discussion is an example of two belief-systems:
    1) The Inter­net is an Archive: every­thing has his­tor­i­cal value
    2) The Inter­net is ephemeral: every­thing changes, noth­ing is for ever.

    In some of the “archivists’” views expressed here, there is also a bit of “Someone else’s job” too: Everything should be archived, everything should be discoverable, everything should be connected… and someone else should be doing it for me.

    My own view, which formed when the Inter­net was young, is that noth­ing does last for ever: infor­ma­tion (like knowl­edge) is ever evolv­ing. I don’t think you can keep every­thing: do you keep the things that are plain /wrong/ (pae­dophilia, tor­ture, how to blow up The White House, etc); what hap­pens when things become “polit­i­cally incor­rect” (an exam­ple being Robinson’s Jam here in the UK — they had to stop putting “gol­li­wogs” on their jars); who decides when some­thing is impor­tant to keep, and what can be dropped?

    It seems to me that the rise of hashtags is an example of people assuming one thing, and then discovering that the world doesn’t work that way.…
    Just like you do when you go to infant school (mummy isn’t there when you want her); and to col­lege (“actu­ally, the are quite fas­ci­nat­ing”); and when you start a degree (“Every­thing they taught you in school is wrong”)

    Yes, it would be lovely if The Inter­net mys­ti­cally kept every­thing we were inter­ested in… but that is an expo­nen­tial growth in mostly garbage.…

    Comment by Code Gorilla — April 3, 2009 @ 9:51 am

  23. Hi Jenny,

    It is strange how tweets seem to appear and disappear as the tubes get plugged. Here’s a possible long-term solution to make Twitter a little less disposable.

    Go to and use this syn­tax:*/status/ #hashtag

    An ear­lier com­menter was on the right track using the “site:” Google oper­a­tor, but you need to add the sta­tus & wild­card to get just tweets.

    Of course with alerts you have to set them up BEFORE the con­fer­ence, event, etc. or at least before Google indexes the tweets (As short as two hours, look for this time to get even shorter as Twit­ter search is touted as a com­peti­tor for Google.) In my test of this I set my type of alert to com­pre­hen­sive, and time span to “daily”, and my results equaled, deliv­ered right to my email. Glad to for­ward it if that’s con­fus­ing at all…

    Hope that helps :)

    Comment by @dereknp81 — April 10, 2009 @ 5:54 pm

  24. remem­ber to do a search of key­word

    Google keeps an archive of pages for a while, so even if it dis­ap­peared from its orig­i­nal place, you can still find it many times doing a search directly on Google.

    Comment by Jackie — April 12, 2009 @ 4:04 pm

  26. Seriously, how much data do you intend to ‘squirrel’ away? Live in the moment. Twitter is a ‘social’ medium, not an archival device.

    Comment by Flip — April 13, 2009 @ 6:33 pm

  27. As a newish Twit­ter user, thanks for the heads up. I’m still not sure what to make of it all, and the thought of it being here today, gone tomor­row, even on the Internet…I never would have thought of it. Time to load up the favorites bar.

    Comment by Josh Hanagarne — May 4, 2009 @ 7:14 pm

  28. twit­ter, tweets, twemes,… I tend to agree with Flip, Twit­ter is used by so many users as a social micro-blog, that the vol­ume of infor­ma­tion pro­duced over­whelms the use­ful­ness of it as a com­mu­ni­ca­tion tool. Until I read your blog, it never occurred to me that any­thing from twit­ter would be even worth preserving.

    Comment by Chris — May 20, 2009 @ 11:25 am

  29. It’d be neat to have some kind of event tracker that pulls together tweets and images (like tweme?), but also shows a list of cur­rent events and their loca­tions. Maybe peo­ple could rate events or pro­vide feed­back through that too, as a way of cap­tur­ing expe­ri­ences in mul­ti­ple ways (and with a way to export it, of course).

    Comment by Heather D. — June 4, 2009 @ 10:24 am
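
Several of the suggestions above (a client that filters on hash tags, an owner pulling tweets in real time) share one step: deciding which tweets carry the tag and keeping each of them once. A hedged Python sketch of just that step follows. The `id` and `text` keys are assumed field names for illustration, not a documented Twitter format, and how the tweets are fetched is deliberately left open.

```python
import re


def has_hashtag(text: str, tag: str) -> bool:
    """True if text contains #tag as a whole hashtag, ignoring case."""
    # The lookarounds stop "#alamw09" from matching inside "#alamw09x"
    # or in the middle of another word.
    pattern = r"(?<!\w)#" + re.escape(tag) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def select_tagged(tweets, tag):
    """Keep each tweet once (by its id) if its text carries the hashtag."""
    kept, seen = [], set()
    for tweet in tweets:
        if tweet["id"] in seen or not has_hashtag(tweet["text"], tag):
            continue
        seen.add(tweet["id"])
        kept.append(tweet)
    return kept
```

Whatever comes out of `select_tagged` can then be written wherever you control the storage (a text file, a database, an export from a feed reader), which is the part none of the search tools discussed here do for you.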

