I’ve become a big Twitter fan recently. Before getting going earlier this year, I was a Twitter-sceptic for a long time – I actually signed up twice before and lasted all of, ooh, a day, before deciding that I couldn’t see the point and deleting my account(s). But this time I persevered for a few days, reached that critical mass of followers and people I wanted to follow, and here I am a few months later, still Twittering away.
Having decided early on that I was going to use my Twitter account primarily for work purposes, there are lots of work-related tweets in my account. Many of them are about digital preservation initiatives I’ve come across. Whilst I wouldn’t go so far as to say they are gold dust, they’re not quite transient and I’d quite like to have a copy of them – for a while, at least. So I’ve started looking into tools to archive Twitter accounts.
There’s not much on the Twitter website about it, so you have to do some rooting around. (Enter Google, stage left.) In a couple of minutes, I found:
- Tweetake
- Tweetdumpr
- A Python script to grab a copy of your tweets
- A Twitter tool for making an archive of your account (though this seems to have been taken down)
I’ve used Tweetake to grab a copy of my tweets to date, along with a list of (and info about) people I follow, my followers, and private messages. This is all wrapped up in a nice and convenient *.csv file – fairly straightforward for preservation purposes. So it was a nice experiment and I’m happy that I can get a copy of my Twitter data that’s important to me (so far, anyway; the scalability of these apps is something that I’ll look at as my account grows).
Now, here’s the wider context. I know that many people out there couldn’t give a monkey’s about saving Twitter updates and account information. And for many Twitter accounts, I’d probably agree. But there are certainly instances where Twitter accounts are valuable and interesting and worthy of archiving. Think of all the authors and poets on Twitter, for example – their feeds could provide a fascinating insight into a social and professional aspect of their lives that would otherwise be lost to time. Who should be responsible for archiving them, if anyone? What of the Twitter novels – a new and emerging art form or an exercise in futility as you can’t write anything coherent if it’s forced into 140-character batches? Future generations won’t be able to make that decision if we don’t keep a copy. What of updates from government officials and institutions – should they be kept as records? If not, why not? If people rely on the information in them then they should be kept too, surely? (As an aside, how many archivists and records keepers have applied their retention schedules to tweets?)
If we do decide we want the data, then another issue is how on earth we get it. Tweetdeck required me to give my username and password, so it’s an okay way to get data from an account when we have that information, but what about when we don’t? And if we do figure out a way, is it ethical to go around grabbing copies of other people’s Twitter data for our archives, even if they have posted it online for the world to see?
Finally (at least for now), how do we validate that what we’ve got is what we think we’ve got? Issues of identity and authenticity are real problems in the Twittersphere. There are already numerous examples of fake Twitter accounts that amass thousands of followers before they are sprung. There are probably many more still out there. And if we’re going to archive Twitter accounts then we need to be sure that we’re archiving the genuine item. Questions, questions, questions… Tweets to @mopennock please!
[...] Original post by coupsdestylo [...]
I use Tweetake, also, but I am unsure at this point what benefit the file saved periodically will have for my personal papers or my research.
With so many Twitter members and so much tweeting going on, does Twitter archive their site feeds, at all?
Hi Maureen
Thanks for pointing out TweetDumpr etc – I’ll check these out.
Re Twitter self-archiving (ahem!) – following on from discussions on Paul Walk’s blog (http://tinyurl.com/amw5k9), I made the following discoveries that might be of interest. They relate to the JISC Innovations Forum at Keele last July.
* Using Twitter Search to look for #jif08 (or jif08) returns nothing.
* Don’t believe that, however, because hunting back manually through Paul’s and Owen’s accounts (hacking the URL: “?page=200” etc) I do find their tweets from that event.
* Twemes.com, one of the sites that aggregates Twitter # tags, does still return a full, ‘threaded’ view of all the tweets tagged #jif08 back in July
* Twitter RSS feeds don’t contain more than the last few (days of) tweets
Make of that what you will: my inference is that we can’t rely on Twitter, or Twemes, to do Tweservation. If we think certain Twitter accounts/streams are of value, we need an (at least) two-pronged approach:
* Catchup: use scripting, screen-scraping, etc, of Twitter, Twemes to extract historical tweets from the HTML (and API where possible)
* Proactive: set up watchers on the feeds of your accounts to automatically harvest the RSS feeds from now on.
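The "proactive" prong could be sketched roughly as below. This is only an illustration, assuming the feed is plain RSS 2.0 with `<item>` elements carrying `<guid>`, `<title>`, `<pubDate>` and `<link>`; the sample feed, the example.org URLs and the function names are all made up for the sketch, and fetching the feed on a schedule is left out. The idea is simply to merge each harvest into a local archive keyed by guid, so that repeated harvests of the short-lived feed never duplicate a tweet.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample feed, standing in for whatever the real feed returns.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example tweets</title>
<item><guid>http://example.org/status/1</guid><title>First tweet</title>
<pubDate>Mon, 02 Mar 2009 10:01:00 +0000</pubDate>
<link>http://example.org/status/1</link></item>
<item><guid>http://example.org/status/2</guid><title>Second tweet</title>
<pubDate>Mon, 02 Mar 2009 10:05:30 +0000</pubDate>
<link>http://example.org/status/2</link></item>
</channel></rss>"""


def parse_rss_items(rss_xml):
    """Return a list of dicts, one per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        # Fall back to the link when a feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link") or ""
        items.append({
            "guid": guid.strip(),
            "title": (item.findtext("title") or "").strip(),
            "pubDate": (item.findtext("pubDate") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


def merge_into_archive(archive, items):
    """Add unseen items to archive (a dict keyed by guid); return the number added."""
    added = 0
    for entry in items:
        if entry["guid"] and entry["guid"] not in archive:
            archive[entry["guid"]] = entry
            added += 1
    return added


def save_archive(archive, path):
    """Write the archive to disk as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(archive, fh, ensure_ascii=False, indent=2)
```

Running `merge_into_archive` on every harvest and then calling `save_archive` keeps one growing file; a harvest that contains nothing new leaves the archive unchanged.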
I’m in the process of writing a little post for JISC-PoWR about using aggregator plugins for WordPress to build local blog archives from the RSS feeds of multiple remote blogs: this approach might work for preserving tweets also, or even creating a combined blog/tweet archive of (for instance) all UKOLN staff activities in those arenas!
Useful as the CSV of TweetDumpr might be for some exercises, I think that attacking the RSS is more appealing, as it’s not only a native Twitter format, but also a viable preservation format.
What fun!
What an interesting concept, I wish I had thought of it! But you’re right, considering the fact that everyone and her mother is updating their industry on Twitter it would only make sense to archive tweets.
Thanks for all your comments, also the tweets (of course!).
Richard: I considered RSS feeds but then stopped mid-post to watch CSI; clearly that wasn’t conducive to my train of thought! Thanks for flagging it up, as I hadn’t got so far as using different approaches in different scenarios. I agree that the RSS feed is more appealing for preservation and for proactive archiving. If I get the time, I’ll look into just what is provided in the RSS feeds… I doubt it would capture the wider followers/following network, for example, and with my significant properties hat on I would say that’s a desirable element to capture. All good fun indeed.
I’ll pop over to the JISC-PoWR blog now to check out your post on building local blog archives from RSS feeds…
(PS – Tweservation? Lol, why not. Bandwagon here we come!)
> Using Twitter Search to look for #jif08 (or jif08) returns nothing.
Richard – a closer look reveals this is intentional (and documented.) Twitter search only looks at the past 4 months or so of data.
The Twitter APIs look more promising, though, and give a choice of formats (JSON, XML, etc.). Tweetdumpr and friends are likely built on these.
As for tweservation – since when did Elmer Fudd get involved in this?
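For what it’s worth, turning a JSON response into the kind of CSV these tools produce is not much work. The sketch below is a guess at the shape of it rather than a description of how any of the tools actually do it: it assumes the payload is a JSON array of status objects, and the field names `id`, `created_at` and `text` are assumptions that would need checking against a real response.

```python
import csv
import io
import json

# Assumed field names; check these against an actual API response.
FIELDS = ["id", "created_at", "text"]


def statuses_json_to_csv(payload):
    """Convert a JSON array of status objects into CSV text with a header row."""
    statuses = json.loads(payload)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for status in statuses:
        # Keep only the chosen fields; missing ones become empty cells.
        writer.writerow({key: status.get(key, "") for key in FIELDS})
    return out.getvalue()
```

Anything in the response beyond the three chosen fields is dropped here, so an archive that cares about the wider account context would want a longer field list or the raw JSON kept alongside.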
[...] Archiving Twitter « Bits Bytes & Archives (tags: twitter archiveren) Tags: geen [...]
[...] are more issues to address. What of comments or embedded images? Can it handle Twitter tweets as well as blog posts? Does it scale? What of look-and-feel, individual themes, etc? Now we start [...]
I’ve been idly thinking about this as well. General Twitter just falls under the general web archive for me, for which my preferred scope is include everything unless someone indicates they don’t want it archived (eg by making their tweets private).
However, what’s more interesting is that Twitter feeds are often dependent on other context. We had a forum a couple of weeks ago with a bunch of Twitter traffic during it. I grabbed a copy of the RSS feed and have been considering linking it to the recorded video of the event (so that you can see tweets pop up at the relevant times a speaker says something). They don’t make any sense out of context.
Oh, and about “if we’re going to archive Twitter accounts then we need to be sure that we’re archiving the genuine item”. Why? It’s pretty infeasible to validate every Twitter account you might want to archive. How could you do it? Call up Ewan McGregor and ask him if he really twitters under name X? Anyone can make an account under any name; it’s the nature of the medium, and shouldn’t we be preserving things as they are, and hence including the satire and spoof Twitter feeds? Perhaps we should, though, be careful to make clear the limitations of the medium, to prevent future historians from making mistakes.
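Lining the tweets up with the video mostly comes down to turning each tweet’s timestamp into an offset from the start of the recording. A rough sketch, assuming the feed gives RFC 822 style dates in `pubDate` and that the recording’s start time is known; the function name and the dates in the usage note are invented for illustration:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def tweet_offsets(tweets, video_start):
    """Return sorted (seconds_from_start, text) pairs for tweets posted once the video has started.

    tweets is an iterable of (pub_date, text) pairs, where pub_date is an
    RFC 822 date string; video_start must be a timezone-aware datetime.
    """
    cues = []
    for pub_date, text in tweets:
        posted = parsedate_to_datetime(pub_date)
        offset = (posted - video_start).total_seconds()
        if offset >= 0:  # skip anything tweeted before the recording began
            cues.append((int(offset), text))
    return sorted(cues)
```

With a start of `datetime(2009, 3, 2, 10, 0, tzinfo=timezone.utc)`, a tweet dated `Mon, 02 Mar 2009 10:05:30 +0000` would come out at 330 seconds, which is the point in the video where it should pop up.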
Hi Alex,
I think if you’re claiming to have archived a specific person’s twitter feed, then you need to be sure that you’ve got the genuine item. If you’re just grabbing twitter traffic then it’s a different case. Hope that clarifies things!
Maureen.
I’ve been playing around with Tweetake and it looks like you can’t download more than 1,000 tweets.
[...] about their service. There are also other ways to archive your Twitter stream as discussed in a blog post I found by Maureen Pennock, and from another tool I found called Twitter to PDR Multi-archival-Webapp. These tools look to [...]