Do we need standards for ethical aggregation? Do we need a Curator's Code for attributing discovery? The latest proposals to codify credit-giving on the web have been controversial. On the one hand, there's appeal to the idea of streamlining and standardizing the way people attribute the discoveries they make online. On the other hand, there's an impracticality to the notion of top-down rules on aggregating -- and, some have argued, a paternalism to it. (Gawker: "We Don't Need No Stinking Seal of Approval From the Blog Police.")

In a new paper in, yep, the International Journal of Digital Curation, Paul Groth (of VU University, Amsterdam), Yolanda Gil (the Information Sciences Institute, University of Southern California), James Cheney (University of Edinburgh), and Simon Miles (King's College London) explore the issue of the "requirements for provenance on the web." Provenance being, essentially, a combination of attribution and origin -- the creation myth of a piece of information.

The researchers look at three scenarios -- including a HuffPostian "news aggregator scenario" -- that both require and thwart some kind of provenance infrastructure. And their conclusions, for those in the pro-attribution camp, are not heartening.

"Even recognizing that provenance is central to the way that humans make use of information," the authors observe, "typically information systems offer little or no support for provenance beyond primitive (and unreliable) ownership, creation and modification timestamps." Furthermore, "we are not routinely exposed to software, databases, or web applications that understand and handle rich forms of provenance for us. However, we do use provenance in all our decision making."

So provenance is everywhere ... and provenance is nowhere. And "why is provenance not pervasive in all information systems and software?" the authors ask. Because:

although the treatment of provenance seems straightforward (after all there is nothing conceptually challenging about recording and replaying extra log information), the design of appropriate provenance solutions becomes complex very quickly.

And also because, more interestingly:

Provenance may be in an analogous situation to how user interfaces were treated a few decades ago. That is, user interfaces were viewed as an afterthought to a system's design (after all there is nothing challenging about putting a few buttons and menus here and there) but then in practice they became complex very quickly. Appropriate methodology had to be developed in order to understand the role of a user interface in software systems. Today, user interfaces are at the forefront of a system's design, ensuring the usability and success of the system. Provenance may be in the early stages of a similar cycle. We lack the principles, languages, and methodologies to incorporate provenance in the design and implementation of software systems more pervasively and thoroughly. Provenance solutions may boost the usability and ultimately the success of today's software systems, and perhaps open the door to new application areas where delegation and trust are paramount.

Perhaps -- but that will require, the authors note, new "principles, languages, and methodologies." Which are three things that have never been known to be especially adaptable. The paper's conclusion is appropriately ambiguous: We need a comprehensive, and comprehensible, attribution system -- "without provenance, information is hard to understand, integrate, and trust" -- and yet we have no idea, pragmatically, how to create one. On the one hand, "the availability of enormous volumes of information and data in the digital information age makes it crucial that we develop provenance frameworks to capture, manage, and use provenance."

On the other: "The question of how to collect adequate provenance, especially from end-users, is a challenging open problem."

Image: Toban Black/Flickr.