The challenge of open data standards is working its way back to the forefront of attention. At FOO, a group of us sat down to discuss openly owned repositories of data, and how we could gather momentum around a summit-style dialogue to dissect the issue and start to work on a balanced solution. There is an unnerving variety of data that consumers should own, or at least have access to - school records, medical records, etc. It's not just about Flickr streams. We need both an open standard data format, and an open standard for how to get data.
Standards for sharing information will drive much debate. Does user access mean that service providers need to export full datasets, which could reach terabytes? For my Flickr photos, do I just get back my uploaded pictures, or do I also get the comments from other users? What about the comments that I made on someone else's photo - do I get them back with or without the context of the photos that I commented on, but don't have the rights to? There's also a need for transparency in what companies are doing with user data. Ideally, the terms of service on these sites would draw the line in the sand for who owns what and how companies can use my information, but it's doubtful that the T&Cs anticipated open data dynamics.
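To make those scoping questions concrete, here is a minimal sketch of how an export policy might be modeled. Everything in it is hypothetical: the names `Photo`, `Comment`, `ExportPolicy`, and `build_export` are invented for illustration and do not correspond to Flickr's or any other service's actual API. It only shows that each question above becomes an explicit switch that a standard would have to define.

```python
from dataclasses import dataclass
from enum import Enum


class CommentContext(Enum):
    """How much context travels with comments I made on other people's photos."""
    NONE = "none"            # only the text of my comment
    REFERENCE = "reference"  # my comment plus the ID of the photo it refers to


@dataclass
class Photo:  # hypothetical record, not a real service schema
    photo_id: str
    owner: str


@dataclass
class Comment:  # hypothetical record, not a real service schema
    comment_id: str
    author: str
    photo_id: str
    text: str


@dataclass
class ExportPolicy:
    include_comments_on_my_photos: bool = True
    comments_i_made: CommentContext = CommentContext.REFERENCE


def build_export(user, photos, comments, policy):
    """Assemble one user's export according to the chosen policy."""
    owners = {p.photo_id: p.owner for p in photos}
    export = {
        "photos": [p.photo_id for p in photos if p.owner == user],
        "comments_received": [],
        "comments_made": [],
    }
    for c in comments:
        on_my_photo = owners.get(c.photo_id) == user
        if c.author == user:
            entry = {"text": c.text}
            # A reference to someone else's photo is included only if the policy allows it.
            if on_my_photo or policy.comments_i_made is CommentContext.REFERENCE:
                entry["photo_id"] = c.photo_id
            export["comments_made"].append(entry)
        elif on_my_photo and policy.include_comments_on_my_photos:
            export["comments_received"].append(
                {"author": c.author, "photo_id": c.photo_id, "text": c.text}
            )
    return export


photos = [Photo("p1", "alice"), Photo("p2", "bob")]
comments = [
    Comment("c1", "bob", "p1", "Nice shot"),
    Comment("c2", "alice", "p2", "Great colors"),
]
full = build_export("alice", photos, comments, ExportPolicy())
minimal = build_export(
    "alice", photos, comments,
    ExportPolicy(include_comments_on_my_photos=False,
                 comments_i_made=CommentContext.NONE),
)
```

The two calls at the end differ only in policy: `full` carries other users' comments and a pointer to the photo I commented on, while `minimal` carries neither. Which of these counts as "my data" is exactly the kind of decision the summit would need to settle.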
This upcoming summit - targeted for October in the Bay Area - will need creative people who have a substantial stake in repositories. Who should be part of the conversation? Some initial ideas:
- Organizations that can bring use cases. Holders of big data repositories such as Kaiser, Stanford Health, the Department of Homeland Security, MIT Admissions.
- Privacy/attention/rights organizations. Nonprofit groups that work on issues of privacy and ownership of consumer attention or content, and that could bring an advocate's perspective - AttentionTrust, Creative Commons, EFF, EPIC, TRUSTe. (Disclosure note: Omidyar Network is a funder of AttentionTrust, Creative Commons, EFF, and EPIC.)
- User delegates. How can we best ensure inclusivity? Perhaps potential participants could submit essays or position papers, which could serve as an appropriate barrier to entry while ensuring that participants are serious about the issue.
Other open questions that we expect to address:
- If you have to pay to get your data, is it still open?
- Is it possible to come up with a potential solution this year? At what point should a straw man be vetted?
- Is there more demand for an open data solution outside of the US? (There are much stricter regulations in Europe on privacy, data protection, and ownership of personal information.)
The general plan for this summit is to spend a day on the problem statement, trying to mark out as many angles as possible. On the second (and third?) day, we would try to put together a straw man of a solution. Current trends suggest that some of this straw man will come from the work of Julian Cash and Cliff Skolnick. What ideas do you have? Please comment if you have suggestions for companies and participants for this dialogue.
Christine
We're interested. See, for example, http://blogs.talis.com/panlibus/archives/2006/09/open_data_again.php
Posted by: Paul Miller | September 19, 2006 at 01:02 AM
These are all great comments - thanks for plugging in. Danese Cooper (http://danesecooper.blogs.com/about.html) is the one working to herd cats on this issue, and to get a variety of smart folks with bodies of relevant work into the same room.
As noted, this challenge has been lurking around unsolved for quite some time. And unsurprisingly, the ocean hasn't boiled yet. It seems a fine ambition for Danese, Tim O'Reilly, and others to catalyze movement. Nothing works better than a moving train.
Posted by: Christine | September 04, 2006 at 11:16 PM
Christine,
the problems you're addressing with regard to storing and retrieving data, and possibly user authentication, are already on the agenda and some progress has already been made. Just consider the Java Content Repository API (JCR) which is the result of the Java Community Process JSR 170. It has largely been implemented by the Apache Software Foundation as part of the Jackrabbit Project. It provides:
- Object management
- Relationship management through schemas
- Object notification
- Version management
- Configuration management
plus some sort of query language to interact with the repositories. Looks like a perfect solution to me. Thanks.
Posted by: Steve Grägert | August 31, 2006 at 07:26 AM
This looks wonderful! I would love to participate if possible. I'm part of a multi-institution collaboration that is building a public repository for patent data.
We're grappling with all the issues you've outlined. While our dataset is already "open" in theory (constitutional mandates, etc.), in practice it's a very different story. And an unfortunately common one when corporate interests collide with government data.
Anyhow, we'll be releasing our first dataset very soon (full-text searchability of all patents from 1836-1936). I think being a part of the conversation you're proposing would be invaluable for our work and, with any luck, we'll have quite a bit to share by then!
Keep me posted on the progress!
Kevin Webb
Posted by: Kevin Webb | August 30, 2006 at 08:38 AM
I am going to write on this in my own post, but Christine, you're talking about throwing together something in a couple of months over a couple of days that has been under discussion for close to 40 years in the data industry.
I have to ask: is this Web 2.0 fooflah? People getting together and tossing around a couple of buzzwords, in an effort that will fade away? Or is this truly some form of serious effort?
After what you've written here, it's hard for me to take it seriously. Especially given the location, timeline, and 'user participation criteria', you've already effectively shut the door in the faces of people who have truly worked these issues for a long time.
Posted by: Shelley | August 28, 2006 at 12:55 PM
Count me in too - this is very closely aligned with why we founded microformats.org
Posted by: Kevin Marks | August 27, 2006 at 11:03 PM
Count me in.
:-)
Posted by: Marc Canter | August 27, 2006 at 06:24 PM