March 24, 2009


Filed under: precat | tsladmin @ 6:19 am

Sometimes we tell people that things live forever on the internet and that anyone can find them (so don’t post that picture of yourself drinking alcohol, young man), but I want to highlight how some important things from just a couple of months ago are becoming impossible to find. If we’re not careful, the haystack is going to disappear, never mind the needle.
For example, take the discussion that happened on Twitter during ALA’s Midwinter Meeting just under two months ago. The Meeting had a hashtag for tracking content (#alamw09), and almost everyone used it most of the time. There was so much going on in that tag that I thought it was a tipping point for the Association in terms of communication tools. I even debriefed ALA staff afterwards on what happened on Twitter so that they’d be able to see the patterns.
But try to find that discussion now, and it’s almost impossible. Most people (including me) rely on Twitter’s search engine (formerly called “Summize” and run by a different company until Twitter bought it). If you search Twitter now for the #alamw09 hashtag, you get exactly one page of results (yesterday there were two), and only a couple of those tweets were actually posted during the event itself. If you look up #alamw09 on a hashtag-tracking site, you’ll get more results from the Meeting itself, but there’s still only one page, and you had to have manually followed the site’s Twitter account for it to track your tweets, so even if you could see older results than what it shows, it would be an incomplete archive at best. Search Technorati for #alamw09 and you get eight blog posts. Ironically, you can get most of the public tweets from Midwinter by searching FriendFeed. Looking for anything from #ala2008 on Twitter is even more hopeless, although there again FriendFeed saves the day, but for how long?
So all of our attempts to aggregate that Twitter content may only work in the moment, for the moment. It turns out those tweets are miscellaneous *and* searchable in only one place (for now), a pretty bad combination in hindsight. Thank heavens I favorited so many of the #alamw09 tweets in Twitter, although that’s still not ideal. I have to manually page through them to find the ones I want, and I already have 35 pages of favorites.
After Midwinter, I started moving my #alamw09 favorites into Evernote so that I’d be able to search and group them, but I haven’t had time to finish, and I just can’t seem to train myself to add new tweets there as I favorite them. The jump in effort from clicking a star to filling out a few words of metadata is just too much in the middle of my day, so this looms as a future project if I really want to save this stuff. Even then, there’s no guarantee Evernote will stick around, but at least I can export from it.
So if you were using a hashtag to aggregate content, thinking it would be easier to find it all again in the future, think again. You’re going to have to do something more proactive and manual than relying on Twitter’s search engine or Google. You’ll have to decide what level of ephemeraliness you’re comfortable with for that conversation, because you may not be able to get back to it if you let someone else manage access to the archive. In this context, it’s a shame so much of the conversation has moved away from blog comments (where individuals can openly archive it) to Twitter and FriendFeed. And if you’re a government or archive organization looking to preserve this kind of digital content, the stakes keep rising.
Am I missing any other options for finding past hashtag conversations? Please tell me yes in the comments.
Addendum: Potential ideas for archiving – you could subscribe to a hashtag’s RSS feed in an RSS reader and export the items, right? Or subscribe to the RSS feed via email? Other ideas?
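
A minimal sketch of the first addendum idea, assuming you periodically save a copy of a hashtag search’s RSS 2.0 feed (the feed URL and the download step are left out here). It files each `<item>` into a local SQLite table keyed on `<guid>`, so re-running it against a newer copy of the feed only adds items it hasn’t seen yet. The function name `archive_feed`, the table layout and the sample feed are hypothetical; an Atom feed uses namespaced `entry`/`id` elements and would need different lookups.

```python
import sqlite3
import xml.etree.ElementTree as ET

def archive_feed(rss_xml: str, conn: sqlite3.Connection) -> int:
    """Store each <item> of an RSS 2.0 document; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "guid TEXT PRIMARY KEY, title TEXT, link TEXT, pub_date TEXT)"
    )
    root = ET.fromstring(rss_xml)
    added = 0
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        # Fall back to the link when an item has no (or an empty) <guid>.
        guid = item.findtext("guid") or link
        cur = conn.execute(
            "INSERT OR IGNORE INTO items (guid, title, link, pub_date) "
            "VALUES (?, ?, ?, ?)",
            (guid, item.findtext("title", default=""), link,
             item.findtext("pubDate", default="")),
        )
        added += cur.rowcount  # 1 if inserted, 0 if the guid was already stored
    conn.commit()
    return added

# Hypothetical sample feed standing in for a saved hashtag search feed.
SAMPLE_FEED = """<rss version="2.0"><channel><title>#alamw09 search</title>
<item><title>first tweet #alamw09</title><link>http://example.com/1</link>
<guid>example-1</guid><pubDate>Sat, 24 Jan 2009 10:00:00 +0000</pubDate></item>
<item><title>second tweet #alamw09</title><link>http://example.com/2</link></item>
</channel></rss>"""

conn = sqlite3.connect(":memory:")
print(archive_feed(SAMPLE_FEED, conn))  # 2: both items are new
print(archive_feed(SAMPLE_FEED, conn))  # 0: nothing new on a re-run
```

Running something like this on a schedule (say, hourly during a conference) matters more than the storage choice: the point is to capture items while the search feed still returns them.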


  1. Icerocket has a Twitter search function, and it reports over 1,000 results for #alamw09, but when you try to page through them, you only get 1 page of 14 hits. Another service might be worth a look, since it does produce a few more results, but it seems to hang (at least for me) when trying to recover more. A Google search for #alamw09 seems to find 340+ results. This is clearly insane, since Twitter stores everyone’s individual tweets – I checked mine back to the middle of last year and they all seem to be there, at least as far as I can tell.
    However, look at it whichever way you like, in many respects Twitter search is shot. It’s not a problem with the search engines that I’ve looked at, since they all seem pretty consistent (besides the ones I mentioned above, I played with another 15 or so), so I suspect it’s a problem with access to the database. Of course, if you really want to go totally nuts – do a search using Firefox and then the same search using IE, and you may well find that you get different results and – wait for it – you can search back further using Firefox than you can with IE! How mad is that?

    Comment by Phil Bradley — March 24, 2009 @ 7:34 am

  2. I set up a lifestream on my blog using Sweetcron, which is currently in private beta. I have Sweetcron subscribed to the feed for a search for cksthree (my old Twitter name) and another for cksample, my current one. As a result, all my tweets and all the tweets of people replying to me get archived in my lifestream’s database (where I can back them up, search them, and keep them forever). It’s a very flexible system, though, so you can subscribe it to any feed you want archived, and archiving a search for a #hashtag of some sort would seem to partially address your problem (at least going forward). Here’s my lifestream, btw: I have my BackType tracked comments going in there too, so eventually this comment will show up there.

    Comment by C.K. Sample III — March 24, 2009 @ 8:00 am

  3. Jenny,
    I like the idea of subscribing to the feed by email (I use a feed-to-email service) and then using an email filter to put tweets directly into a folder so as not to overwhelm the inbox.

    Comment by Peter Bromberg — March 24, 2009 @ 8:01 am

  4. I wonder if the Twitter API could be used to collect and archive hashtags. This wouldn’t help retrospectively, but it could be done for future conference tags at least.

    Comment by DerikB — March 24, 2009 @ 8:39 am

  5. Jenny-
    A reflector could do the job of aggregating and storing content without any work on the users’ part. Instead of using hashtags, everyone follows a certain user and direct-messages that user to tweet to the group. The user (which is scripted) then retweets that direct message. The upshot is that the user now has a copy of every tweet DMed to it, which it can do whatever it likes with (store it in a SQL DB, push it to a flat text file, whatever). Wolf Rentzsch did this at C4[2], and it worked really, really well in terms of real-time interaction during the conference.
    Another option is for the hashtag’s “owner” (or anyone, really, who wants to keep a record of a tag) to pull tweets in real time and store them. Coudal does this on their Layer Tennis Group page, a self-updating version of a hashtags page for one particular hashtag, to provide commentary on their live Friday matches. It’s simple to do using the statuses/public_timeline method in the Twitter API and parsing through the XML.

    Comment by CJC — March 24, 2009 @ 8:48 am

  6. It is hard to comprehend, with cheap disk space, why Twitter tosses tweets, as there ought to be value in being able to mine patterns and trends over time. I struggle to find an excuse, and I imagine for a librarian, not archiving is close to a sin 😉
    As cited above, Sweetcron has potential because you cache your stream on your own server, though its search is across all the bits you pull into the stream. I’d think an RSS reader or web tool that caches posts might do it by subscribing to a Twitter search feed (it’s been so long since I used Bloglines; don’t they cache past posts?)

    Comment by Alan Levine — March 24, 2009 @ 8:57 am

  7. I have been wondering whether and how tweets should be archived. A random hunt in the Wayback Machine found that it sometimes archives Twitter pages (…), though even this example only covers 2007 and likely captures only a subset of the tweets.
    Of course, this doesn’t help with archiving hash-tagged tweets together as a group, unless we can submit a specific URL for all those posts. Unfortunately, plans to archive a page like … are thwarted by Twitter’s robots.txt file, which specifies “Disallow: /*?”.

    Comment by Jeanne — March 24, 2009 @ 9:53 am

  8. I’ve been thinking about this lately too. I haven’t found a good way to archive a hashtagged conversation either. I do periodically archive my own twitter posts via this site though:

    Comment by Chad — March 24, 2009 @ 10:15 am

  9. Did anybody ask the Twitter people about this? Shouldn’t the questions start there?

    Comment by Wilfred Drew — March 24, 2009 @ 10:25 am

  10. Just spotted this framework/tool that mentions that it has ‘Support for Twitter as a source’:
    Looks worthy of further examination. Has anyone tried this yet?

    Comment by Jeanne — March 24, 2009 @ 11:08 am

  11. […] ask for feedback and to get virtual questions that I answered during the presentation. Although there are problems with Twitter Search it does enable you to filter your results to those Tweets containing links – a real-time search […]

    Pingback by Intute Blog » Blog Archive » The Intute Twitter 500 — March 24, 2009 @ 12:09 pm

  12. Thank you for drawing attention to the fragility of Twitter as an archive. There are many collections of tweets which have some historical value, and it’s important that they be preserved and remain accessible.
    Twitter Search now seems to be limited to 7 days’ traffic in many circumstances. For me today (25 March 2009), searching for #alamw09 on Internet Explorer returns 6 tweets going back only 7 days, to this one dated 17 March.
    However, on Firefox the same Twitter search yields 14 results going back to this tweet of 8 February. I don’t know whether IE and Firefox are intrinsically treated differently by Twitter Search, or if my browsers have different cookies affecting the result.
    You can force IE to show older tweets with the until: option on search. For example, try

    Comment by Jim Richardson — March 24, 2009 @ 5:09 pm

  13. I would recommend searching with twemes. It pulls all tweets tagged #alamw09 along with Flickr photos tagged alamw09. There are several pages of results, and they all seem to still work. I really love twemes because it mashes up tags from different websites.

    Comment by Brett Kochendorfer — March 25, 2009 @ 7:30 am

  14. As noted, disk space is cheap, but the cost is not zero, so someone has to pay. Since Twitter is free, that means the owners pay.
    Twitter itself is not intended to be any sort of “permanent” communication. E-mail isn’t, either. E-mail clients retain mail because they were programmed to do so; retaining messages isn’t part of the protocol definitions (POP3, IMAP, SMTP, MIME, etc.). Some folks even have longer-term archiving tools which use APIs to read e-mail and archive it later.
    What I am saying is that someone could/should write a Twitter client that lets you follow a Twitter feed or feeds, filtering on various strings (primarily hash tags), and adding them to some sort of archive file. Whenever you want to make sure tweets are retained, subscribe this client to it, configure how and where it archives the tweets, and you’re done.

    Comment by Tim — March 25, 2009 @ 3:15 pm

  15. Thanks, Brett, for the pointer to twemes. However, it too seems to be incomplete: at 26 March 2009, its #alamw09 page shows nothing more recent than 31 January, while other searches show a number of tweets since then.

    Comment by Jim Richardson — March 25, 2009 @ 4:29 pm

  16. […] Levine and others point out, these little batches of information about a specific event or topic are ephemeral. Well, I guess you could save these “threads” if you start using an entirely different […]

    Pingback by Twitter… - Community College — March 26, 2009 @ 8:56 am

  17. […] be sure to save off the stream of comments because, as Jenny Levine noted, the stream of the moment is […]

    Pingback by Twitter for usability testing or doc testing? Sure, here’s how | just write click — March 26, 2009 @ 8:22 pm

  18. […] but interesting note, Jenny Levine posted a few days ago on Twitter’s apparent “ephemeraliness” in terms of archiving conference […]

    Pingback by two-way touché. « info-mational — March 27, 2009 @ 1:18 pm

  19. I used Twitter to liveblog ILI08 – a process which I documented at and
    I then used the Twitter Search API to display the Twitter feed in a variety of ways, and over time this feed is losing tweets as you describe.
    I did subscribe to the Twitter feed at the time via Google Reader, so I have an archive of the tweets there as well and can get them out again. You could obviously use other RSS readers, or use something like Outtwit, which would allow you to archive them in your email. Another approach could be to use a WordPress plugin (can’t remember the name) which can take an RSS feed and automatically post the contents on your blog.

    Comment by Owen Stephens — March 30, 2009 @ 3:03 pm

  20. Why not simply use an on-demand web archive service to save your tweets? Then save the links in Evernote and/or a Zotero-like application?
    Twitter + BackupURL at

    Comment by Ryan Williams — March 31, 2009 @ 6:46 am

  21. Has anyone come up with any proven solutions to this problem? This is a tremendous discussion, but it seems to be mostly conjecture, or did I just happen to miss someone who nailed it? Great blog, btw!

    Comment by BRubinstein — April 3, 2009 @ 8:02 am

  22. It seems to me that this discussion is an example of two belief systems:
    1) The Internet is an Archive: everything has historical value
    2) The Internet is ephemeral: everything changes, nothing is for ever.
    In some of the “archivists’” views expressed here, there is also a bit of “Someone else’s job” too: Everything should be archived, everything should be discoverable, everything should be connected… and someone else should be doing it for me.
    My own view, which formed when the Internet was young, is that nothing does last for ever: information (like knowledge) is ever evolving. I don’t think you can keep everything: do you keep the things that are plain /wrong/ (paedophilia, torture, how to blow up The White House, etc); what happens when things become “politically incorrect” (an example being Robinson’s Jam here in the UK – they had to stop putting “golliwogs” on their jars); who decides when something is important to keep, and what can be dropped?
    It seems to me that the rise of hashtags is an example of people assuming one thing, and then discovering that the world doesn’t work that way….
    Just like you do when you go to infant school (mummy isn’t there when you want her); and to college (“actually, they are quite fascinating”); and when you start a degree (“Everything they taught you in school is wrong”)
    Yes, it would be lovely if The Internet mystically kept everything we were interested in… but that is an exponential growth in mostly garbage….

    Comment by Code Gorilla — April 3, 2009 @ 9:51 am

  23. Hi Jenny,
    It is strange how tweets seem to appear and disappear as the tubes get plugged. Here’s a possible long-term solution to make Twitter a little less disposable.
    Go to and use this syntax:*/status/ #hashtag
    An earlier commenter was on the right track using the “site:” Google operator, but you need to add the status & wildcard to get just tweets.
    Of course, with alerts you have to set them up BEFORE the conference, event, etc., or at least before Google indexes the tweets (this can take as little as two hours, and look for it to get even shorter as Twitter search is touted as a competitor to Google). In my test of this, I set my type of alert to comprehensive and the time span to “daily”, and my results equaled …, delivered right to my email. Glad to forward it if that’s confusing at all…
    Hope that helps 🙂

    Comment by @dereknp81 — April 10, 2009 @ 5:54 pm

  24. Remember to do a keyword search.
    Google keeps an archive of pages for a while, so even if something has disappeared from its original location, you can often still find it by searching Google directly.

    Comment by Jackie — April 12, 2009 @ 4:04 pm

  25. […] Librarian, Jenny Levine, found out, hashtags seem to have a limited lifespan in Twitter search.  Read her blog post for how she gets around […]

    Pingback by What Does This Mean to Me, Laura? » Blog Archive » Intermediate Twitter: #hashtags — April 13, 2009 @ 7:04 am

  26. Seriously – how much data do you intend to ‘squirrel’ away? Live in the moment. Twitter is a ‘social’ medium, not an archival device.

    Comment by Flip — April 13, 2009 @ 6:33 pm

  27. As a newish Twitter user, thanks for the heads up. I’m still not sure what to make of it all, and the thought of it being here today, gone tomorrow, even on the Internet…I never would have thought of it. Time to load up the favorites bar.

    Comment by Josh Hanagarne — May 4, 2009 @ 7:14 pm

  28. twitter, tweets, twemes,… I tend to agree with Flip, Twitter is used by so many users as a social micro-blog, that the volume of information produced overwhelms the usefulness of it as a communication tool. Until I read your blog, it never occurred to me that anything from twitter would be even worth preserving.

    Comment by Chris — May 20, 2009 @ 11:25 am

  29. It’d be neat to have some kind of event tracker that pulls together tweets and images (like tweme?), but also shows a list of current events and their locations. Maybe people could rate events or provide feedback through that too, as a way of capturing experiences in multiple ways (and with a way to export it, of course).

    Comment by Heather D. — June 4, 2009 @ 10:24 am
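
CJC’s reflector idea in comment 5 boils down to two steps: archive every incoming direct message, then rebroadcast it to the group. The sketch below covers that core only, assuming DMs arrive as (id, sender, text) records. The class and field names and the “RT @sender:” format are hypothetical, and the calls that fetch DMs and post statuses are omitted rather than guessed at.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass

MAX_TWEET = 140  # status length limit at the time of this thread

@dataclass
class DirectMessage:
    msg_id: int
    sender: str
    text: str

class Reflector:
    """Archive every DM sent to the group account and build its rebroadcast.

    Fetching DMs and posting statuses are deliberately left out; a real
    reflector would call handle() between those two steps.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS dms "
            "(msg_id INTEGER PRIMARY KEY, sender TEXT, text TEXT)"
        )

    def handle(self, dm: DirectMessage) -> str | None:
        # Archive first, so the database keeps the full text even if the
        # rebroadcast below has to be shortened.
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO dms VALUES (?, ?, ?)",
            (dm.msg_id, dm.sender, dm.text),
        )
        self.conn.commit()
        if cur.rowcount == 0:
            return None  # already archived and rebroadcast on an earlier poll
        post = f"RT @{dm.sender}: {dm.text}"
        if len(post) > MAX_TWEET:
            post = post[: MAX_TWEET - 3] + "..."
        return post

reflector = Reflector(sqlite3.connect(":memory:"))
print(reflector.handle(DirectMessage(1, "cjc", "Great keynote #c4")))
print(reflector.handle(DirectMessage(1, "cjc", "Great keynote #c4")))  # None: duplicate
```

Keying the table on the DM id makes polling safe to repeat: a message seen twice is neither stored nor rebroadcast a second time.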
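
Comment 7 quotes the robots.txt rule “Disallow: /*?”. Here is a sketch of why that rule blocks archiving a hashtag search but not an individual tweet page. It assumes the wildcard convention that major crawlers honor and that RFC 9309 later standardized: `*` matches any run of characters, a trailing `$` anchors the end, and rules match as prefixes of the path. Allow rules and longest-match precedence are ignored here, and the example paths are hypothetical.

```python
from __future__ import annotations

import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path rule that may use '*' and a trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(path: str, disallow_rules: list[str]) -> bool:
    """True if any non-empty Disallow rule matches the start of the path."""
    # An empty Disallow line means "allow everything", so it is skipped.
    return any(
        rule_to_regex(rule).match(path) for rule in disallow_rules if rule
    )

rules = ["/*?"]  # the rule quoted in comment 7
print(is_disallowed("/search?q=%23alamw09", rules))  # True: has a query string
print(is_disallowed("/jenny/status/123", rules))     # False: no '?' in the path
```

In other words, any URL with a query string (which is how search results are addressed) is off limits to a polite crawler, while plain per-tweet pages are not.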
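
Jim Richardson (comment 12) notes that the until: option can force the search to show older tweets. A sketch of taking that further: generate one query per day with since:/until: so each window stays small enough to page through. It assumes the search honors both operators as date bounds, with until: exclusive; that, the function name and the example dates are assumptions. The search page’s base URL is not given in the thread, so only the encoded query string is built.

```python
from __future__ import annotations

from datetime import date, timedelta
from urllib.parse import urlencode

def daily_queries(tag: str, start: date, end: date) -> list[str]:
    """One query per day, from start up to but not including end."""
    queries = []
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        queries.append(f"{tag} since:{day.isoformat()} until:{nxt.isoformat()}")
        day = nxt
    return queries

for q in daily_queries("#alamw09", date(2009, 1, 23), date(2009, 1, 26)):
    # URL-encode each query ('#' becomes %23) for use as a "q" parameter.
    print(urlencode({"q": q}))
```

Walking back a day at a time trades more requests for smaller result sets, which is the useful direction when the interface only shows a page or two per search.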
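
Tim (comment 14) describes a client that follows feeds, filters on strings (primarily hashtags) and adds matches to an archive file. This is a minimal sketch of the filter-and-archive half only, assuming tweets have already been fetched as dicts with hypothetical "id" and "text" keys. Each watched tag gets its own JSON Lines file, and a set of (tag, id) pairs prevents duplicates across polls. That set lives in memory here, so a long-running tool would need to persist it or rebuild it from the files.

```python
from __future__ import annotations

import json
import re
import tempfile
from pathlib import Path

HASHTAG = re.compile(r"#(\w+)")

def hashtags(text: str) -> set[str]:
    """Lower-cased hashtags in a tweet, without the leading '#'."""
    return {tag.lower() for tag in HASHTAG.findall(text)}

def archive_matching(tweets: list[dict], rules: dict[str, Path], seen: set) -> int:
    """Append each tweet to the JSON Lines file of every watched tag it carries.

    rules maps a lower-cased tag (no '#') to an archive file; seen holds
    (tag, tweet id) pairs already written, so re-polling adds no duplicates.
    """
    written = 0
    for tweet in tweets:
        for tag in sorted(hashtags(tweet["text"]) & set(rules)):
            key = (tag, tweet["id"])
            if key in seen:
                continue
            with rules[tag].open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(tweet) + "\n")
            seen.add(key)
            written += 1
    return written

archive_dir = Path(tempfile.mkdtemp())
rules = {"alamw09": archive_dir / "alamw09.jsonl"}
seen: set = set()
batch = [{"id": 1, "text": "Great session #ALAMW09"}, {"id": 2, "text": "lunch?"}]
print(archive_matching(batch, rules, seen))  # 1: only the tagged tweet
print(archive_matching(batch, rules, seen))  # 0: already archived
```

Plain append-only text files fit the thread’s concern: they don’t depend on any service staying around and can be searched or imported anywhere later.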
