Metadata… » Gaia Resources

I enjoy quoting Jim Croft:

“Yesterday upon the stair,
Metadata wasn’t there.
It wasn’t there again today,
How I wish it would go away!”

As a (one-time?) geospatial professional, I love receiving metadata, but I hate making it. It’s been a real thorn in our side in the recent SWAEI SCP project, as we have consistently been receiving data without any accompanying metadata.

I’ve been involved in a few metadata repositories over time, as a developer, “populator”, user, or even as an invited speaker (although that gig won’t be recurring). Some of them have included:

Pilbara Biological Survey Database (http://science.dec.wa.gov.au/projects/pilbaradb/)
Biodiversity Information Projects of the World (http://www.tdwg.org/biodiv-projects)
Interragator+ (http://www2.landgate.wa.gov.au/interragatorplus/)
Metadata Entry and Search Tool (http://mest.ivec.org/geonetwork/srv/en/main.home)

All of these metadata portals work in the same manner, and this process (when heavily stylized) involves three parties:

One party enters the metadata,
One party stores or manages the metadata repository, and
One party uses the metadata (i.e. the customer).

Yes, these parties can be the same person/group/organization in certain situations, but in my experience, I’ve always been one of these three parties, working with two others. There are potential problems with all of these parties…

The Metadata Author

All of the metadata portals I’ve mentioned suffer from a common complaint: they require people to manually enter data. And given how time-poor we generally are these days (after I blog, tweet, email, phone and even talk to people…) you can’t expect a community to keep a portal up to date without providing them something in return. So the portals wither away and don’t get updated.

The System Owner

There are generally two types of owner here: the “100% uptime” owner, and the “that would be a cool feature” tinkerer. Both can be good or bad: the “100% uptime” owner may never change or update the site, while the tinkerer keeps changing it, and people get annoyed with the constant change (and often the site drifts away from what the customer wants, and then the customers drift away too…).

The Customer

The customer (I hate the term “user”, it makes people sound like drug addicts) wants the experience to be: find site, enter search term, get response, move on. They are after a quick, free response. They don’t generally have any interest in entering data into the site (often citing that they are time-poor, or that it’s not their role), but are quick to criticize and complain when the site is not updated. They’re the customer, so they’re always right!

In my experience, if you are building a metadata repository there are three things you might want to look at to ensure sustainability and success:

Make data entry automatic from somewhere (RSS feed harvester, crawler bot, Twitter feed collector)
Ensure the site is stable, and designed well, but review it regularly to…
Always provide better, technologically appropriate tools for your end users.
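
As a rough illustration of the first point, here is a minimal sketch of an RSS feed harvester in Python. Everything in it is an assumption made for the example: the sample feed, the example.org URLs, the record fields (title, url, abstract) and the function names harvest_feed and merge_records are hypothetical, and none of the portals mentioned above is known to work this way. It only shows the general idea of turning feed items into metadata records without anyone typing them in.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, standing in for one published by a data provider.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example data provider</title>
    <item>
      <title>Vegetation survey 2009</title>
      <link>http://example.org/datasets/1</link>
      <description>Plot-based vegetation survey.</description>
    </item>
    <item>
      <title>Fauna trapping records</title>
      <link>http://example.org/datasets/2</link>
      <description>Trapping records from regional sites.</description>
    </item>
  </channel>
</rss>"""


def harvest_feed(feed_xml):
    """Turn each RSS <item> into a simple metadata record (a dict)."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "abstract": (item.findtext("description") or "").strip(),
        })
    return records


def merge_records(repository, records):
    """Add or update records keyed by URL, so repeat harvests don't duplicate."""
    for record in records:
        repository[record["url"]] = record
    return repository


if __name__ == "__main__":
    repository = merge_records({}, harvest_feed(SAMPLE_FEED))
    for url, record in repository.items():
        print(url, "|", record["title"])
```

Keying the records on the URL means the harvester can be re-run on a schedule without creating duplicate entries; a real harvester would also need to fetch the feed over the network and cope with malformed or missing fields.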

Lately I’d started to think that metadata repositories were painful and hard to deal with, that there was no good software for them, and that they therefore provided little value. But in what seems to be something of a trend, I’ve had to review my feelings in the light of one or two things that have appeared on my radar.

One is that we are in the process of reviewing our metadata systems in house. We’ve been undertaking an internal review of state, national and global metadata standards, and then looking at implementing our chosen standard in a metadata application. We’re currently looking at a few different applications, but no choice has been made just yet. We’re looking really closely at the things I’ve listed above – and how we integrate metadata into the business processes and projects that we are working on is critical to the success of this in-house project.

Secondly, on the TAXACOM mailing list, people were asking for a site that lists biodiversity data and provides some sort of “review”. One of the projects we’ve worked on, the Biodiversity Information Projects of the World (BIPOTW) site, was seen as a good place to implement a solution. I’d already started talking about crawlers with TDWG (the site owners) in earlier stages of the project, after we spent far too much time entering metadata manually. We couldn’t get that off the ground then, but I think it’s time to revisit that idea.

We could make BIPOTW a model metadata portal with some changes. So here are some thoughts about how I’d like to make some changes to the existing application:

A rating system that ranks the entries, by either (or both?):
- number of views of that record (or clicks on the link?), or
- user-based ratings (like a ‘star’ system in your favorite media player, and like the one we just delivered for the Western Australian Herbarium for their image management)
Some form of harvester/crawler to try to automatically populate the database (yes, a mini-Google). How can we do this?
- Crawler bot? – too much like spam, that disturbs me
- Consume some feeds – maybe even Twitter? I know you can do this with things like WordPress blogs, like Seb Chan’s blog, so maybe there are some tools we can play with.
- Allow bulk uploads somehow? Like pulling in data from one of Tony Rees’ sites – http://www.cmar.csiro.au/datacentre/taxonomy/mainlist.htm
A means for registered users to tag entries with information if they want to add in their own comments (although that is fraught with problems – and a reason why we don’t have comments on this blog),
The ability to extract the information in the database to produce some sort of “review paper” which can be published at regular intervals (which is what started this train of thought, via a request for this type of thing on TAXACOM).
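
To make the rating idea a little more concrete, here is a minimal sketch in Python of ranking entries by both measures. The Entry class, its fields and the rank_entries function are hypothetical names invented for this example, and the ordering rule (average star rating first, view count as the tie-breaker) is just one assumption about how the two could be combined; it is not how BIPOTW works today.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A hypothetical portal entry with a view counter and user star ratings."""
    name: str
    views: int = 0
    stars: list = field(default_factory=list)  # user ratings, 1 to 5

    def average_stars(self):
        # Unrated entries score 0.0 so they sort below rated ones.
        return sum(self.stars) / len(self.stars) if self.stars else 0.0


def rank_entries(entries):
    """Order entries by average star rating, breaking ties by view count."""
    return sorted(
        entries,
        key=lambda e: (e.average_stars(), e.views),
        reverse=True,
    )


if __name__ == "__main__":
    entries = [
        Entry("Project A", views=10, stars=[4, 4]),
        Entry("Project B", views=50),
        Entry("Project C", views=5, stars=[5]),
        Entry("Project D", views=20, stars=[3, 5]),
    ]
    for entry in rank_entries(entries):
        print(entry.name, entry.average_stars(), entry.views)
```

One weakness of this simple rule is that a single five-star vote outranks a well-viewed entry with many four-star votes, so a real system would probably want to weight the rating by how many people have voted.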

So far, there has been a lot of further support for this revision on the TAXACOM mailing list, so it’s looking quite positive and it might go ahead. Next week I’ll be able to follow it up with TDWG and I’ll make sure to post anything that comes out of it. In the meantime, I’m flat out with the SWAEI workshops that are dominating my week… and really looking forward to Anthony starting on Monday!

