Okay, so I’ve been a bit hit and miss on the posts lately. Well, more miss than hit, I admit. Actually, far, far more miss than hit! But I have a very good reason – as those who know me in real life are aware, I have a family, and that family recently grew with the arrival of my third child (hurrah!). So I have been fully occupied with family life and have had very little time to come online. Rightly so! It will be some time before I am back online regularly, probably not until the second half of the year, as I’m making the most of this special time with my family.

That said, the arrival of our new baby prompted some observations about personal digital archives. I know there are several projects already looking into this, but the impact of the digital revolution on personal and family memory really hit home with the way his arrival was acknowledged by our family and friends. We have two archive boxes containing physical artefacts, celebratory messages and cards that were sent by our family and friends to mark the arrival of our other two children. We kept them for sentimental reasons, and so that when the children are older, we can show them how their arrivals were celebrated. This time, a substantial proportion of the messages have been sent electronically, by email, Twitter and Facebook. Unless I make some effort to preserve them, the birth box for our third child will be sadly lacking when compared to those of his siblings.

So what I’m looking out for now is a way to preserve digital objects regardless of their origin or format. Not a web archiving application, or an email archiving application, or a photo or document archiving app – a general, multi-purpose personal digital archiving solution. Ideally it will be an online solution, so that I don’t have to go into the birth box every few years and check for deterioration (my home, cosy as it is, isn’t exactly a controlled environment in terms of humidity and temperature!) or migrate to new media to avoid obsolescence. Clearly, such a solution relies upon a long-term service provider, preferably one that can be relied upon to ensure the survival of the files even if the service provider themselves cease to operate. Such a solution could serve not just this special occasion, but perhaps be revisited over the years to serve a lifetime of occasions. I wonder what I’ll find when I start looking into what’s available – perhaps I’ll be pleasantly surprised. It would be nice, anyway!

Cheerio for now,

Mo.

I’m lucky enough to have received an invitation to Google’s new communications & collaboration tool – Google Wave. Having previously avoided much of the hype, I’ve now found myself immersed in all things Wave – and, by default, all things Google. My Wave experience has so far been a fascinating (if a bit premature, because it really is very early days for Wave) glimpse into the future. It has the potential to be a *really* useful tool, significantly changing the way we communicate and collaborate electronically. And, if it takes off in the way Google expect it to – and at the moment I see no reason why it won’t – then it will also have significant implications for digital archives and archivists.

But first, a bit more about Google Wave itself, based on what I’ve read and experienced so far. You may be wondering, what is Google Wave, exactly? The best way I can describe it at the moment is as a communications hub. And it’s a communications hub that is still very much in an alpha release stage, but which has been made available to a small set of people for testing, criticism, and development – thus my earlier comment about it being a bit premature. Because it’s still in the early stages of development, functionality is limited, and much of the conversation about Wave focuses on what it may be capable of supporting in the future rather than what it can actually do now. It’s also a ‘critical mass app’ that will only truly become effective for users once they have sufficient peers using it; given that it’s only been opened up to a small number of users at the moment, the critical mass hasn’t yet been reached, so it’s difficult to get a full ‘wave experience’ just yet – you have to use your imagination.

What is it for? The core function of Google Wave is to create, share, and collaborate on ‘waves’ of information and content. Users create and share ‘waves’ between each other. A ‘wave’ is, in other words, a stream or thread of information that is collaboratively generated and managed. Waves contain ‘wavelets’, which are threaded conversations originating from an initial wave of their own, and wavelets contain ‘blips’, which are the single message units contributed by users. Google illustrate this through the following diagram on their guide to wave entities:
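
To make the wave, wavelet and blip hierarchy a little more concrete, here is a minimal Python sketch of the containment described above. It is a simplified illustration only, not Google’s actual data model or the Wave API; the class, field and method names (`Wave`, `Wavelet`, `Blip`, `participants`, `all_blips`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Blip:
    """A single message unit contributed by one participant (simplified)."""
    author: str
    text: str


@dataclass
class Wavelet:
    """A threaded conversation made up of blips (simplified)."""
    blips: list[Blip] = field(default_factory=list)


@dataclass
class Wave:
    """The top-level container shared between participants (simplified)."""
    title: str
    participants: list[str] = field(default_factory=list)
    wavelets: list[Wavelet] = field(default_factory=list)

    def all_blips(self) -> list[Blip]:
        # Flatten the hierarchy wave -> wavelets -> blips, keeping their order.
        return [blip for wavelet in self.wavelets for blip in wavelet.blips]


# Hypothetical example: a project wave with one thread of two blips.
wave = Wave("New project", participants=["mo", "alex"])
thread = Wavelet()
thread.blips.append(Blip("mo", "Project outline..."))
thread.blips.append(Blip("alex", "Looks good, one question."))
wave.wavelets.append(thread)
print(len(wave.all_blips()))  # 2
```

Even in this toy form, the point for archivists is visible: the record is the whole nested structure, not any single message.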

What purpose can a wave serve? Well, what purpose would you like it to serve? That’s a bit like asking what purpose a conversation could serve. I might start a wave to kickstart a new project I’m working on, for example. I’ll add users to this wave – probably my project team – and start typing an outline of the project. If the other users are online, they can see what I type as I type it – this is real-time communication (though I guess that’s to some extent dependent on your bandwidth).

Screenshot: Starting a new wave

A major difference between a wave communication and email is that Wave has no ‘send’ button! There is a ‘draft’ option, though that’s greyed out at the moment. There’s also a ‘done’ button, which you can click to signify that you have finished writing your blip. Other users can edit your contribution directly, or add a new blip by clicking on the ‘reply’ button. You view waves either by reading them as threaded streams, or by using the ‘playback’ option. This results in a ‘movie’ of sorts, which reveals in turn the order in which each wavelet and blip was added.
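
Reading a wave as a threaded stream and watching its playback are, in effect, two different orderings of the same blips. The sketch below assumes, hypothetically, that each blip records its parent and a logical `added_at` counter; `thread_order` gives the nested reading order (each reply straight after the blip it answers), while `playback` replays blips strictly in the order they were added. This is an illustration of the idea, not how Wave actually implements playback.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Blip:
    blip_id: str
    parent_id: Optional[str]  # None for a top-level blip
    added_at: int             # hypothetical logical clock: order of addition
    text: str


def thread_order(blips: list[Blip]) -> list[str]:
    """Depth-first reading order: each reply appears directly after its parent."""
    children: dict[Optional[str], list[Blip]] = {}
    for blip in blips:
        children.setdefault(blip.parent_id, []).append(blip)
    order: list[str] = []

    def visit(parent: Optional[str]) -> None:
        for child in sorted(children.get(parent, []), key=lambda b: b.added_at):
            order.append(child.blip_id)
            visit(child.blip_id)

    visit(None)
    return order


def playback(blips: list[Blip]) -> Iterator[Blip]:
    """Replay blips in the order they were added, like a 'movie' of the wave."""
    yield from sorted(blips, key=lambda b: b.added_at)


blips = [
    Blip("b1", None, 1, "Project outline"),
    Blip("b2", "b1", 2, "A question about the outline"),
    Blip("b3", None, 3, "A second point"),
    Blip("b4", "b2", 4, "An answer to the question"),
]
print(thread_order(blips))                   # ['b1', 'b2', 'b4', 'b3']
print([b.blip_id for b in playback(blips)])  # ['b1', 'b2', 'b3', 'b4']
```

If an archive kept only one of these orderings, part of the record’s context (who replied to what, and when) could be lost.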

So, in effect, a wave is like a record of a conversation. You can add tags to your wave, making it easier for you and others to find. Waves can go ‘public’, accessible to all other wave users, or remain private to those users you’ve selected to participate. You can ‘archive’ a wave and it will be removed from your inbox, though it’s still accessible via your navigation pane. You can add extensions or gadgets to your waves, such as a poll or a map, and of course you can also add attachments, embed URLs, other files, and so forth. The Wave API enables other developers to use and build on Google Wave by writing and making available further extensions or gadgets, and several sample gadgets are already available. At the moment, there appears to be no way to ‘save a wave’ offline. There is a ‘download’ option in the ‘file’ menu at the base of the wave, but it is currently greyed out.

What does this mean for archives and digital preservation?

If it takes off, it means we’ll have a new type of record – if not a new recordkeeping and archiving paradigm – to deal with. Google have been clear on a number of occasions already that they want Wave to replace email. In that case, we may well be in trouble! Email is a fairly straightforward technical protocol, yet it continues to regularly cause problems for records managers and archivists, insofar as both management and preservation are concerned. The problems are not just technical, they are also organisational. Compared with email, Wave is far more complicated technically and can potentially pose just as much of an organisational challenge, if not more so.

The underlying technology is fairly well described, which bodes well for preservation. Google has published a white paper on Wave’s underlying framework, which defines waves as ‘hosted XML documents that allow seamless and low latency concurrent modifications’. Wavelets are collections of documents, each of which consists of an XML document and some annotations. The white paper provides a good overview of the structure of an XML document, though significantly more detail is provided in the Google Wave Conversation Model draft protocol specification. The specification describes the Wave conversation manifest document schema and blip document schema, and together these define the Google Wave conversation model. Clearly defining and publishing the model is obviously useful for preservation purposes, as it will enable us to understand waves through their underlying conversation model long into the future (so long as the model itself is preserved, of course!)
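
The preservation value of a published conversation model is that a future reader could reconstruct the reply structure from the stored XML alone. As a rough sketch, the following parses a hypothetical, heavily simplified manifest with Python’s standard `xml.etree.ElementTree` and recovers each blip’s reply depth. The element names (`conversation`, `blip`, `thread`) and the `id` attribute are assumptions chosen for illustration; they are not copied from the draft specification, which should be consulted for the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified manifest: NOT the published Wave schema.
MANIFEST = """
<conversation>
  <blip id="b1">
    <thread id="t1">
      <blip id="b2"/>
      <blip id="b3"/>
    </thread>
  </blip>
  <blip id="b4"/>
</conversation>
"""


def blip_depths(xml_text: str) -> list[tuple[str, int]]:
    """Return (blip id, reply depth) pairs in document order."""
    root = ET.fromstring(xml_text.strip())
    result: list[tuple[str, int]] = []

    def walk(elem: ET.Element, depth: int) -> None:
        for child in elem:
            if child.tag == "blip":
                result.append((child.get("id"), depth))
                walk(child, depth + 1)  # threads inside a blip hold its replies
            elif child.tag == "thread":
                walk(child, depth)      # a thread groups blips at the same depth

    walk(root, 0)
    return result


print(blip_depths(MANIFEST))
# [('b1', 0), ('b2', 1), ('b3', 1), ('b4', 0)]
```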

At the moment, waves are hosted on remote servers via Google – Google is the only Wave service provider. However, open publication of the Google Wave Federation Protocol means that anyone can become a Wave provider and share waves with others. In practice, this means that organisations will be able either to use a hosted service provided by a third party or to run their own Wave servers – giving them more control over the waves and wave records generated by their staff. When wave participants originate from different Wave servers, each Wave server involved will keep a copy of the wave; multiple identical copies should therefore be generated in such instances. Open publication of the protocol is also useful for preservation, particularly insofar as understanding original technical hosting environments is concerned.
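
If each participating server keeps its own copy of a wave, an archive could check that those copies really are identical by comparing checksums. This is a minimal sketch that assumes each copy has already been exported as bytes in the same serialisation; the server names, the `divergent_copies` helper and the majority-vote rule are all hypothetical. In practice copies would probably need to be canonicalised before hashing, since a byte-level difference does not necessarily mean a difference in content.

```python
import hashlib
from collections import Counter


def divergent_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the servers whose copy differs from the most common version."""
    digests = {server: hashlib.sha256(data).hexdigest() for server, data in copies.items()}
    if not digests:
        return []
    # Treat the most frequent digest as the reference version (majority vote).
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(server for server, digest in digests.items() if digest != majority)


# Hypothetical copies of the same wave held by three federated servers.
copies = {
    "wave.example.org": b"<wave id='w1'>hello</wave>",
    "wave.example.com": b"<wave id='w1'>hello</wave>",
    "wave.example.net": b"<wave id='w1'>hullo</wave>",
}
print(divergent_copies(copies))  # ['wave.example.net']
```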

Now, as email was around for a few decades before it became commonplace in institutions, perhaps we’ve got some time to get to grips with solutions for managing and preserving waves. That said, I don’t think we’ve got decades this time round – more like a few years. We already have the computers at our workstations, we already have the accounts set up to use Wave, and we are (comparatively) a far more technologically advanced and willing society. So we need to watch the development of Wave carefully to make sure we can develop good practice guidelines and strategies for preserving waves sooner rather than later. Even so, it’s rather difficult to develop realistic strategies and guidelines at this early stage of Wave’s development. What we really need are some practical use cases to base them around; otherwise it’s all a bit too ‘pie in the sky’. Once we have some experience and see how people use Wave in practice, we can start to make suggestions on how to generate well-constructed, archivally sustainable waves. One thing, however, is very clear: the static and ‘simple’ record is increasingly going to be a thing of the past. If Wave is anything to go by, then the future is dynamic, complex, compound, and distributed. We need to be prepared to think about our digital archives in the same way.

The Swiss Federal Archives recently issued a call for papers for the 8th European Conference on Digital Archives. The conference will be held in Geneva from 28 to 30 April 2010, and the call invites papers that address the following topics:

  1. Archival profile: professional competence in the digital age
  2. What to keep: how to mirror the information society
  3. E-Archiving: reorganisation of processes and business models
  4. Online access: solutions and implications

Tucked away in the call for papers is the following information that will be of particular interest to those born in 1982 or later:

‘As part of an effort to stimulate the active involvement of young professionals within the ICA and to encourage an inter-generational dialogue about the state of the profession and its future needs, the conference organizers would especially welcome proposals for contributions from young professionals. To further this objective, a session each day will feature one or more young professionals who wish to speak at the Conference. The Scientific Committee will select three proposals from young professionals aged less than 28 years (born 1982 or later). The authors of the three proposals selected will be offered a grant including registration fees, hotel and transportation costs.’

That’s a great way to encourage young professionals to put papers in, as funding to attend or present at conferences can often be a barrier when you’re newly qualified, especially when conferences are abroad and in the current economic climate (though hopefully things will have improved by the time the conference comes round!). I attended a pre-conference along these lines back in September 2009, where the conference organisers wanted input from young professionals in putting the conference programme together. This was a fantastic meeting, and I hope to meet up with several other delegates at the conference in Geneva next year.

KEEP in context

I posted a while ago about the KEEP project on emulation for digital preservation. Earlier today, a former colleague of mine from the Netherlands tweeted about a webpage that clarifies the relationship between KEEP and a number of other projects, including PLANETS and the International Internet Preservation Consortium (which implies they’ll be testing on websites, hurray). This is worth a look if, like me, you’re interested in seeing how all these pieces of the jigsaw are actually coming together to form co-ordinated preservation solutions. For more details, see this KEEP webpage.

Every now and again, I come across a chorus of ‘all you need to do is just keep the bits. There’s plenty of other software out there to read file formats these days. Digital obsolescence isn’t the issue that we’ve all been led to believe it is’. Statements like these make me uneasy because I have a particular interest in ensuring that documents/records are preserved in an authentic manner, i.e. that they are not only what they purport to be, but that they are also intact in all of their essential respects. And that’s the really tricky bit, because I don’t believe that different applications are always capable of reading files in an authentic manner. This isn’t so much about obsolescence though (which I’ll return to later) as about interoperability.

Let me explain. It’s usually quite possible to use MS Word 2007 to read an MS Word 95 file. It’s quite possible to use Open Office to read files created by MS Office. It’s quite possible to read RTF files in several different applications, and it’s quite possible to use just about any browser you can get your hands on to render a web page. However, different applications render files in different ways – even when they are supposed to be rendering a standard like RTF. So documents often look slightly different, and sometimes hugely different, when rendered in different applications and re-saved to new formats. This is compounded by people’s lousy creation practices – who reading this has ever been taught how to use most of the applications on their desktop? I suspect very few. The lack of training means that most people have developed their own ways of doing things. And whilst onscreen this may look okay, it can actually result in quite a different structure and set of instructions ‘behind the scenes’, so when the file is transferred to a different environment, it can again look quite different.

Examples of these problems over the past ten years include:

  • A report created using MS Word 95. Several sections of the report had been copied and pasted over from a number of other documents. It looked fine in Word 95, but when opened in Word 2002, the formatting was all over the place. It was apparent that different fonts had been used, margin settings were different, there were problems with the headers, and images were in different places to the original report.
  • A spreadsheet created using an early spreadsheet application that was opened with Excel 2002. Some of the formulae no longer produced the same result and the application was rounding to a different number of decimal places, so long calculations were increasingly adversely affected.
  • A slideshow presentation generated using Open Office but rendered using MS PowerPoint: some slides that originally had both images and text on them in Open Office showed only the images when rendered using MS PowerPoint.
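
The spreadsheet example is a case of rounding errors compounding over a long calculation. The sketch below is a deliberately simple illustration, not a reconstruction of how Excel 2002 or the earlier application actually calculated: it sums the same value (one third) many times while rounding every entry to a fixed number of decimal places, as an application with a different precision setting might. The `running_total` function is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction


def running_total(values: list[Fraction], places: int) -> Decimal:
    """Sum exact values, rounding each entry to `places` decimal places first."""
    quantum = Decimal(1).scaleb(-places)  # e.g. places=2 -> 0.01
    total = Decimal(0)
    for value in values:
        as_decimal = Decimal(value.numerator) / Decimal(value.denominator)
        total += as_decimal.quantize(quantum, rounding=ROUND_HALF_UP)
    return total


thirds = [Fraction(1, 3)] * 300
print(sum(thirds))                 # 100 (exact)
print(running_total(thirds, 2))    # 99.00
print(running_total(thirds, 6))    # 99.999900
```

The exact answer is 100, but rounding each entry to two places gives 99; with only three entries the error would be a mere 0.01. The longer the calculation, the larger the drift, and nothing onscreen flags it.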

Now, I appreciate that some of these may be considered fairly small differences. After all, most of the content was the same; in most cases it just looked different. And the underlying data was still there, so it could have been retrieved using the ‘correct application’. But that’s not interoperability, is it? Either you can rely on interoperability or you can’t, in which case you have to go back to the original software. Some may say ‘who cares about a header or PowerPoint slides?’ Well, in many cases, I do! And I think lots of archivists would agree that there will be circumstances in which we would certainly want to keep that information, and its appearance, intact. These small differences make a big impact on the way future users will experience a document, especially if they are compounded with each and every migration. It’s the ‘slow fire’ of the digital world.

So what does this mean in terms of obsolescence? What does it actually mean for something to be obsolete? No longer in use? Out of date? Not current? No guarantee of support? There’s no agreement on what digital obsolescence actually is, and I would suggest that this does not help our cause! The Jem Report suggests that obsolete is actually the most misused word in the history of computing. At the moment, I’m inclined to agree. Let’s not lose sight of authenticity in digital preservation either.

Archiving Twitter

I’ve become a big Twitter fan recently. Before getting going earlier this year, I was a Twitter-sceptic for a long time – I actually signed up twice before and lasted all of, ooh, a day, before deciding that I couldn’t see the point and deleting my account(s). But this time I persevered for a few days, reached that critical mass of followers and people I wanted to follow, and here I am a few months later, still Twittering away.

Having decided early on that I was going to use my Twitter account primarily for work purposes, there are lots of work-related tweets in my account. Many of them are about digital preservation initiatives I’ve come across. Whilst I wouldn’t go so far as to say they are gold dust, they’re not quite transient either, and I’d quite like to have a copy of them – for a while, at least. So I’ve started looking into tools to archive Twitter accounts.

There’s not much on the Twitter website about it; you have to do some rooting around (enter Google, stage left). In a couple of minutes, I found:

I’ve used Tweettake to grab a copy of my tweets to date, along with a list of (and info about) people I follow, my followers, and private messages. This is all wrapped up in a nice and convenient *.csv file – fairly straightforward for preservation purposes. So it was a nice experiment and I’m happy that I can get a copy of my Twitter data that’s important to me (so far, anyway; the scalability of these apps is something that I’ll look at as my account grows).
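
A CSV export is easy to work with for preservation purposes. As a minimal sketch, assuming the export has a header row with hypothetical `date` and `text` columns (the real export’s column names may well differ), the following summarises the file and records a SHA-256 checksum so that later copies can be checked for fixity. The `summarise_export` helper and the sample data are invented for illustration.

```python
from __future__ import annotations

import csv
import hashlib
import io


def summarise_export(csv_bytes: bytes) -> dict[str, object]:
    """Count tweets, note the date range, and fingerprint the raw export.

    Assumes hypothetical 'date' and 'text' columns; adjust to the real export.
    """
    rows = list(csv.DictReader(io.StringIO(csv_bytes.decode("utf-8"), newline="")))
    return {
        "tweets": len(rows),
        "first": rows[0]["date"] if rows else None,
        "last": rows[-1]["date"] if rows else None,
        # Checksum of the exact bytes, for comparing future copies of the file.
        "sha256": hashlib.sha256(csv_bytes).hexdigest(),
    }


sample = b'date,text\n2009-05-01,First tweet\n2009-06-12,"Reading about PLANETS, KEEP"\n'
summary = summarise_export(sample)
print(summary["tweets"], summary["first"], summary["last"])  # 2 2009-05-01 2009-06-12
```

Storing the checksum alongside the export gives a simple way to show later that the file hasn’t changed, though it says nothing about whether the tweets themselves are genuine.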

Now, here’s the wider context. I know that many people out there couldn’t give a monkey’s about saving Twitter updates and account information. And for many Twitter accounts, I’d probably agree. But there are certainly instances where Twitter accounts are valuable, interesting and worthy of archiving. Think of all the authors and poets on Twitter, for example – their feeds could provide a fascinating insight into a social and professional aspect of their lives that would otherwise be lost to time. Who should be responsible for archiving them, if anyone? What of the Twitter novels – a new and emerging art form, or an exercise in futility because you can’t write anything coherent if it’s forced into 140-character batches? Future generations won’t be able to make that decision if we don’t keep a copy. What of updates from government officials and institutions – should they be kept as records? If not, why not? If people rely on the information in them, then surely they should be kept too? (As an aside, how many archivists and recordkeepers have applied their retention schedules to tweets?)

If we do decide we want the data, another issue is: how on earth do we get it? Tweetdeck required me to give my username and password, so it’s an okay way to get data from an account when we have that information, but what about when we don’t? And if we do figure out a way, is it ethical to go around grabbing copies of other people’s Twitter data for our archives, even if they have posted it online for the world to see?

Finally (at least for now), how do we validate that what we’ve got is what we think we’ve got? Issues of identity and authenticity are real problems in the Twittersphere. There are already numerous examples of fake Twitter accounts that amass thousands of followers before they are found out. There are probably many more still out there. And if we’re going to archive Twitter accounts, then we need to be sure that we’re archiving the genuine item. Questions, questions, questions… Tweets to @mopennock please!
