Cyberspace Law and Policy Centre, University of New South Wales

Tuesday, October 17, 2006


SWS: Many Other Mechanisms

From the starting point of semantic web search just for Creative Commons (CC) licensed works using RDF/XML metadata (see my previous post), we can expand the idea to include other mechanisms and other kinds of documents. For example, the AEShareNet licensing scheme does not promote RDF; instead, web pages simply link to AEShareNet's copy of the licence and display the appropriate image. A semantic web search engine could use these links to establish that such pages are licensed and, once an administrator has entered the appropriate rights metadata for these licences, include such documents in search results. Using this new architecture of link-based licence identification, we can also expand the Creative Commons search to include pages that link to their licences but for one reason or another do not include the relevant metadata (this is the primary method that Yahoo advanced search uses). Note that such link-based searching will inevitably include false positives in the search results, because a page can link to a licence without being licensed under it.
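To make the link-based idea concrete, here is a minimal sketch in Python. It assumes the administrator maintains a table from licence URL prefixes to rights metadata. The table name `KNOWN_LICENCES`, its metadata fields, and the AEShareNet placeholder address on example.org are all hypothetical, not part of any real search engine.

```python
from html.parser import HTMLParser

# Hypothetical administrator-maintained table: licence URL prefix -> rights metadata.
# The AEShareNet entry uses a placeholder address, not the scheme's real URL.
KNOWN_LICENCES = {
    "http://creativecommons.org/licenses/by/2.5/": {"scheme": "CC", "terms": "by"},
    "http://licences.example.org/aesharenet/": {"scheme": "AEShareNet", "terms": "unknown"},
}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> and <link> element in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def detect_linked_licences(html):
    """Return the rights metadata for every known licence the page links to."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        for prefix, rights in KNOWN_LICENCES.items():
            if href.startswith(prefix):
                found.append(rights)
    return found


page = '<p>Licensed under <a href="http://creativecommons.org/licenses/by/2.5/">CC BY</a></p>'
print(detect_linked_licences(page))
```

A page that merely discusses a licence and links to it would match in exactly the same way, which is where the false positives come from.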

The next mechanism that can be considered is that of HTML 'meta' tags. This is part of the HTML format, and is an alternative (and older) way of putting metadata into web pages. The same information can be carried, and given the nature of 'meta' tags they are unambiguous in a similar way to RDF, so false positives should not be a problem.
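As a rough illustration, reading rights information from 'meta' tags might look like the sketch below. The tag names it looks for (`DC.rights` and `DCTERMS.license`, in the style of Dublin Core metadata embedded in HTML) are an assumption about what publishers would use; the post does not prescribe any particular names.

```python
from html.parser import HTMLParser

# Assumed 'meta' names that carry rights information (compared in lower case).
RIGHTS_META_NAMES = {"dc.rights", "dcterms.license"}


class MetaRightsParser(HTMLParser):
    """Collects (name, content) pairs from rights-related <meta> tags."""

    def __init__(self):
        super().__init__()
        self.rights = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in RIGHTS_META_NAMES and content:
            self.rights.append((name, content))


def extract_meta_rights(html):
    """Return every rights statement declared in the page's 'meta' tags."""
    parser = MetaRightsParser()
    parser.feed(html)
    return parser.rights


head = '<head><meta name="DC.rights" content="http://creativecommons.org/licenses/by/2.5/"></head>'
print(extract_meta_rights(head))
```

Because the tag name states what the value means, a match here is a deliberate declaration by the page author rather than an incidental link.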

Another possibility is that the RDF that describes the rights in a page is not embedded in that page but exists elsewhere. This is not too much of an issue, because the search engine can certainly be made capable of reading and correctly interpreting it. However, we should have less confidence in such 'foreign RDF' than in locally embedded RDF, because it is more likely to be a demonstration or illustrative example than a serious attempt to convey licensing information.
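One way a search engine could act on this difference in confidence is to tag each piece of rights evidence with its source and prefer the more trusted one. The sketch below is only illustrative: the class name, the source labels and especially the numeric weights are assumptions, since nothing here says how much less foreign RDF should be trusted.

```python
from dataclasses import dataclass

# Illustrative weights only. The only thing they encode is the ordering:
# foreign RDF is trusted less than embedded RDF.
CONFIDENCE = {"embedded_rdf": 0.9, "foreign_rdf": 0.6}


@dataclass
class RightsEvidence:
    licence_url: str
    source: str  # "embedded_rdf" or "foreign_rdf"

    @property
    def confidence(self):
        return CONFIDENCE[self.source]


def best_evidence(evidence):
    """Return the most trusted piece of evidence, or None if there is none."""
    return max(evidence, key=lambda item: item.confidence, default=None)


foreign = RightsEvidence("http://creativecommons.org/licenses/by/2.5/", "foreign_rdf")
embedded = RightsEvidence("http://creativecommons.org/licenses/by-sa/2.5/", "embedded_rdf")
print(best_evidence([foreign, embedded]).licence_url)
```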

Text-based licences

One mechanism that poses significant challenges is what I call 'text-based licences', as compared with RDF-based (e.g. CC) or link-based (e.g. AEShareNet) licences. What I mean by 'text-based' is that the licence text is duplicated and included with each licensed work. This raises two problems: What happens if the licence text undergoes some slight changes? And what happens if the licence text is stored in a different document to the document that is a candidate for a search result? (This is common practice in software, as well as in uses of the GFDL in multi-page documents. Wikipedia is a prime example of the latter.)

The first question can be answered fairly simply, although the implementation might not be as easy: the search engine needs a feature that allows it to compare a licence text to a document to see if they are (a) not similar, (b) similar enough that the document can be considered a copy, or (c) similar, but with enough extra content in the document that it can be considered a licensed document in its own right.
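The three-way comparison can be sketched with Python's standard `difflib` module. This is one possible approach under stated assumptions, not a tested design: the function name, the word-level tokenisation and both 0.8 thresholds are hypothetical. It measures how much of the licence appears in the document (coverage) and how much of the document consists of licence text (share).

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lower-case word tokens, ignoring punctuation and spacing differences."""
    return re.findall(r"[a-z0-9']+", text.lower())


def classify(licence_text, document_text, min_coverage=0.8, min_share=0.8):
    """Classify a document as 'not_similar', 'copy' or 'licensed'.

    coverage: fraction of the licence's words found, in order, in the document.
    share:    fraction of the document's words that belong to the licence.
    Both thresholds are arbitrary starting points.
    """
    licence_words = words(licence_text)
    document_words = words(document_text)
    if not licence_words or not document_words:
        return "not_similar"
    matcher = SequenceMatcher(None, licence_words, document_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    coverage = matched / len(licence_words)
    share = matched / len(document_words)
    if coverage < min_coverage:
        return "not_similar"   # case (a)
    if share >= min_share:
        return "copy"          # case (b): the document is essentially the licence
    return "licensed"          # case (c): licence text plus substantial extra content


licence = "You may copy and distribute this work provided you keep this notice intact"
altered = "You may copy and share this work, provided you keep this notice intact."
essay = "My essay discusses search engines at length. " * 5 + licence

print(classify(licence, altered))
print(classify(licence, essay))
print(classify(licence, "A recipe for cooking pasta"))
```

Comparing words rather than characters makes the check tolerant of reformatting, and the coverage threshold is what absorbs slight changes to the licence wording.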

The other question is trickier. One possible solution might be to keep a database of all such copies of known licences (what I term 'impromptu' licences), and then, using the functionality of establishing links to licences, record every web page that links to such an impromptu licence as licensed under the original licence. This idea will be useful for all types of text-based licences, from free software to open content.
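A minimal in-memory sketch of such a database follows. The class and method names are hypothetical, the URLs are placeholders on example.org, and a real system would persist the data and populate the copies using a similarity check rather than by hand.

```python
class ImpromptuLicenceRegistry:
    """Tracks copies of known licences and the pages that link to them."""

    def __init__(self):
        self._copies = {}          # URL of impromptu copy -> original licence name
        self._licensed_pages = {}  # page URL -> set of original licence names

    def register_copy(self, copy_url, original_licence):
        """Record that copy_url holds a copy of a known licence."""
        self._copies[copy_url] = original_licence

    def record_page(self, page_url, outgoing_links):
        """Mark page_url as licensed if it links to any registered copy."""
        for link in outgoing_links:
            licence = self._copies.get(link)
            if licence is not None:
                self._licensed_pages.setdefault(page_url, set()).add(licence)

    def licences_for(self, page_url):
        """Return the original licences a page is recorded under, sorted by name."""
        return sorted(self._licensed_pages.get(page_url, set()))


registry = ImpromptuLicenceRegistry()
registry.register_copy("http://project.example.org/COPYING", "GNU GPL")
registry.record_page(
    "http://project.example.org/docs/index.html",
    ["http://project.example.org/COPYING", "http://project.example.org/about.html"],
)
print(registry.licences_for("http://project.example.org/docs/index.html"))
```

Once a copy is registered, the same link-following machinery used for link-based licences applies unchanged; the only difference is that the link target is an impromptu copy rather than the licence's official home.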

But wait, there’s more

Stay tuned for the third instalment, where I will talk about how to utilise non-web content, and the difficulties with public domain works.
