The House of Commons: July 2007

Tuesday, July 31, 2007

Music Royalty Fees

ABC Radio National, Australia Talks discussion regarding the recent ruling by the Copyright Tribunal increasing some music royalty fees. I blogged about this decision here.

(Hat tip: Matthew Rimmer)

Labels: abi, licensing

(permalink) posted by Abi Paramaguru @ Tuesday, July 31, 2007 0 comments links to this post

Thursday, July 26, 2007

Creative Commons Launching CC Learn

Creative Commons is planning to launch ccLearn. The project is aimed at minimising the barriers to sharing and reusing educational materials. This will be done using licensing, promoting the use of interoperability standards and educating teachers and learners.

(via BoingBoing)

Labels: abi, Creative Commons, licensing

(permalink) posted by Abi Paramaguru @ Thursday, July 26, 2007 0 comments links to this post

I IZ WIF US OR WOT?

House of Commons readers will know of the Virgin Australia/Creative Commons images controversy that I have blogged about previously (see ad campaign here). There were plenty of other pictures that Virgin could have used in its ad campaign, for example, why didn't Virgin just use pictures of cats (thereby avoiding issues of model releases, defamation, privacy etc) and, in the great tradition of LOLCATS, build a campaign around that? There are hundreds of thousands of CC licensed cat photos on Flickr so we decided to help Virgin out with this image based on their ad "multi tasking is like swordfighting, it always ends in tears".

(Pictured: "Matrix Swordfighting Cats", 0205billege, available under a Creative Commons Attribution Non-Commercial 2.0 license.)

Labels: abi, catherine, Fun, lolcats

(permalink) posted by Abi Paramaguru @ Thursday, July 26, 2007 0 comments links to this post

Wednesday, July 25, 2007

More on the Virgin Australia Controversy...

The Australian has done a follow up article regarding the Virgin Australia advertising controversy (blogged about here and here).

I think it is essential to reiterate that there are multiple legal issues at play and it is important not to get them confused. While I spoke about attribution requirements under the license in my previous post, these should not be mixed up with moral rights under the Copyright Act (which apply in Australia and are explicitly referenced in the Australian Creative Commons licences, but not in other jurisdictions like the US and the US Creative Commons licenses). Further, moral rights and the terms of the license apply in relation to the copyright owner/licensor, which in many cases is the photographer rather than the individuals in the photographs.

In some respects their advertising campaign is a very interesting use of Creative Commons licensed materials, providing some nice publicity for the photographers who have chosen to add an open license to their material which permits commercial use. However, it is also important for Virgin to read the terms of the license closely and fulfil their obligations under other areas of law.

Labels: abi, licensing

(permalink) posted by Abi Paramaguru @ Wednesday, July 25, 2007 1 comments links to this post

Monday, July 23, 2007

Are You Attributing or What?

Last week I blogged about Virgin Australia's latest advertising campaign, "Are you with us or what", and Virgin's use of Creative Commons licensed images in this campaign. Today there is an article in The Australian titled "Virgin 'in the wrong' on ads". The article contains comments from people unhappy about how they have been depicted in some of the images used by Virgin.

It is important to note that the person in the photograph is not necessarily the copyright owner. Generally it is the photographer who is the owner of copyright, though some exceptions apply, notably in the case of commissioned photographs. As a result, rights may not arise for the individuals in the photographs under copyright law (they will have to look elsewhere, such as trade practices law, defamation or privacy). This 'photographers and copyright' information sheet by the Australian Copyright Council provides a useful overview.

Theoretically, the copyright owner in all of these cases has chosen to attach a Creative Commons attribution license (which allows for commercial use).

While there are multiple legal issues in play here, I am interested in whether Virgin has satisfied their requirements under the Creative Commons license.

For example, the first image that they use is a turtle ("websites shouldn't take long to load"). The link available on the bottom left hand corner of the picture goes to the Flickr user's photo page (which happens to contain 1,323 photos). From there, you will have to locate the particular photo in question and click on the photo page (I managed to locate Big Turtle in the Masoala Hall, Zurich using tags). Under 'additional information' you can click on 'some rights reserved' to discover that the photo is licensed under a Creative Commons Attribution 2.0 license. I query whether the link to the user's page provided by Virgin satisfies the attribution requirements under the license. If you look at the legal code for the license above, Section 4 outlines restrictions to the rights granted under the license:

a. ...You must include a copy of, or the Uniform Resource Identifier for, this License with every copy or phonorecord of the Work You distribute, publicly display, publicly perform, or publicly digitally perform...

b. ... You must keep intact all copyright notices for the Work and give the Original Author credit reasonable to the medium or means You are utilizing by conveying the name (or pseudonym if applicable) of the Original Author if supplied; the title of the Work if supplied; to the extent reasonably practicable, the Uniform Resource Identifier, if any, that Licensor specifies to be associated with the Work, unless such URI does not refer to the copyright notice or licensing information for the Work; and in the case of a Derivative Work, a credit identifying the use of the Work in the Derivative Work...

While 'reasonable to the medium or means You are utilizing' allows for a bit of flexibility, it appears to me that Virgin Australia could have done more by way of attribution. They provide a link to the author's Flickr page; however, you still have to sift through this to find the actual photo. Only after you find the photo do you see which license is attached (and other details such as the title of the photograph). I am not sure that providing a link - which has a link - which contains a link to the license constitutes 'including a copy' of the license/URI as required under section 4(a) of the license. Similar issues arise with regard to 4(b). I would love to hear what other people think about this issue and whether they believe the terms of the licence have been breached.

The Virgin scenario bears a (slight!) resemblance to a Canadian case involving a photographer and the use of a CC licensed image by an MP (the photographer disagreeing with the political views of the MP). Attribution issues were raised here as well.

It is a shame that this and other issues have presented themselves in reference to a campaign that is in some respects an innovative and interesting use of Creative Commons licensed materials.

(Pictured: "Big Turtle in the Masoala Hall, Zurich", alex.ch, available under a Creative Commons Attribution 2.0 license.)

Big thank you to Catherine for her help with this post!

Labels: abi, licensing

(permalink) posted by Abi Paramaguru @ Monday, July 23, 2007 0 comments links to this post

Friday, July 20, 2007

Probabilistic Ripple Down Rules

Ripple down rules (RDR) is a methodology for creating decision trees in domains where even experts have trouble mapping out their knowledge. It requires the expert only to justify their correction of the system, in the context in which the particular error occurred. That's probably why the original article on ripple down rules was called Knowledge in context: a strategy for expert system maintenance.

Now, there are two possible approaches to making the RDR system probabilistic (i.e. making it predict the probability that it is wrong for a given input). First, we could try to predict the probabilities based on the structure of the RDR and which rules have fired. Alternatively, we could ask the expert explicitly for some knowledge of probabilities (in the specific context, of course).

Observational analysis

What I'm talking about here is using RDR like normal, but trying to infer probabilities based on the way it reaches its conclusion. The most obvious situation where this will work is when all the rules that conclude positive (interesting) fire and none of the rules that conclude negative (uninteresting) fire. (This does, however, mean creating a more Multiple Classification RDR type of system.) Other possibilities include watching over time to see which rules are more likely to be wrong.

These possibilities may seem weak, but they may turn out to provide just enough information. Remember, any indication that some examples are more likely to be useful is good, because it can cut down the pool of potential false negatives from the whole web to something much, much smaller.

An expert opinion

The other possibility is to ask the expert in advance how likely the system is to be wrong. Now, as I discussed, this whole RDR methodology is based around the idea that experts are good at justifying themselves in context, so it doesn't make much sense to ask the expert to look at an RDR system and say in advance how likely a given analysis is to be wrong. On the other hand, it might be possible to ask the expert, when they are creating a new rule: what is the probability that the rule will be wrong (the conclusion is wrong), given that it fires (its condition is met)? And, to get a little bit more rigorous, we would ideally also like to know: what is the probability that the rule's condition will be met, given that the rule's parent fired (the rule's parent's condition was met)?

The obvious problem with this is that the expert might not be able to answer these questions, at least with any useful accuracy. On the other hand, as I said above, any little indication is useful. Also, it's worth pointing out that what we need is not primarily probabilities, but rather a ranking or ordering of the candidates for expert evaluation, so that we know which is the most likely to be useful (rather than exactly how likely it is to be useful).

Also the calculations of probabilities could turn out to be quite complex :)

Here's what I consider a minimal RDR tree for the purposes of calculating probabilities, with some hypothetical (imaginary) given probabilities.

Let me explain. Rule 0 is the default rule (the starting point for all RDR systems). It fires 100% of the time, and in this case it is presumed to be right 99% of the time (simulating the needle-in-a-haystack scenario). Rules 1 and 2 are exceptions to rule 0, and will be considered only when rule 0 fires (which is all the time because it is the default rule). Rule 3 is an exception to rule 2, and will be considered only when rule 2 fires.

The conclusions of rules 0 and 3 are (implicitly) 'hay' (uninteresting), while the conclusions of rules 1 and 2 are (implicitly) 'needle' (interesting). This is because the conclusion of every exception rule needs to be different from the conclusion of the parent rule.

The percentage for 'Fires' represents the expert's opinion of how likely the rule is to fire (have its condition met) given that the rule is reached (its parent is reached and fires). The percentage for 'Correct' represents the expert's opinion of how likely the rule's conclusion is to be correct, given that the rule is reached and fires.
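The tree described above can be sketched in code. This is only a rough illustration: the `Rule` class, the `fired_rules` helper and the toy conditions (keys "a", "b", "c") are made up for this sketch, and only rule 0's percentages come from the text. The figure's values for rules 1 to 3 are not reproduced here, so they are left as None.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]   # hypothetical test on an example
    conclusion: str                     # 'hay' or 'needle'
    fires: Optional[float] = None       # expert's P(condition met | rule reached)
    correct: Optional[float] = None     # expert's P(conclusion right | rule fires)
    exceptions: List["Rule"] = field(default_factory=list)


def fired_rules(rule: Rule, example: dict) -> List[str]:
    """Return the names of all rules that are reached and fire for this example."""
    if not rule.condition(example):
        return []
    names = [rule.name]
    # Exceptions are only considered when their parent fires.
    for exception in rule.exceptions:
        names.extend(fired_rules(exception, example))
    return names


# Placeholder conditions; the real ones would come from the expert.
rule3 = Rule("rule 3", lambda ex: ex.get("c", False), "hay")
rule2 = Rule("rule 2", lambda ex: ex.get("b", False), "needle", exceptions=[rule3])
rule1 = Rule("rule 1", lambda ex: ex.get("a", False), "needle")
# Rule 0 is the default rule: it always fires and is presumed right 99% of the time.
rule0 = Rule("rule 0", lambda ex: True, "hay", fires=1.0, correct=0.99,
             exceptions=[rule1, rule2])

print(fired_rules(rule0, {"a": True, "b": True}))
```

Note that rule 3 is never reported unless rule 2 fires first, which matches the "considered only when rule 2 fires" description.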

With this setup, you can start to calculate some interesting probabilities, given knowledge of which rules fire for a given example. For example, what is the probability of 'needle' given that rules 1 and 2 both fire, but rule 3 doesn't? (This is presumably the most positive indication of 'needle' we can get.) What difference would it make if rule 3 did fire? If you can answer either of these questions, leave a comment.

If no rules fire, for example, the probability of 'needle' is 0.89%, which is only very slightly less than the default probability of 'needle' before using the system, which was 1%. Strange, isn't it?

Labels: artificial intelligence, ben, research

(permalink) posted by Ben Bildstein @ Friday, July 20, 2007 0 comments links to this post

Thursday, July 19, 2007

Random acts of senseless piracy steal the fun from Harry's fans?

[This is a guest post, written by David Vaile, Executive director, Cyberspace Law and Policy Centre -- Abi]

House of Commons housemeister Abi reported a dreadful attack on the rights of Harry Potter fans last week as they waited for the latest and perhaps last instalment. Once hastily scanned pages of the much-awaited new Potter book were posted on a web site, taken down (oops, too late), and distributed via file-sharing as reported here, a legion of nasty little devils have been spoiling everyone's fun by up-loading snippets in the form " dies, hahaha!" in random, inappropriate but highly visible places, like in the comments to John Howard's now infamous YouTube page.

The unreleased novel's plot secrets were effectively broadcast by these cheeky killjoys to deliberately defuse the drama and anticipation for readers everywhere. So now, beyond the routinely-claimed commercial effects of digital piracy, should we add a new offence to the litany of intellectual property crimes: intentionally undermining the pleasure of the text?

Strange new network effects have also appeared due to the ubiquity of these posts, which suddenly seemed to be everywhere on the Internet. Legions of expectant Potter readers (you know who you are), on hearing of these outrages realised that there was, for the last week, probably nowhere safe to go on the net: no way to be online without the risk of being inadvertently infected with one of these guerrilla memes, indiscriminate cluster-bombs of premature revelation peppering the online landscape, little plot viruses that leach away motivation for reading the final instalment if you are accidentally exposed to them.

An unintended consequence of this new social form of Internet 'malware' may be that, in order to preserve your blissful ignorance of key plot twists until at last you have the tome in your sweaty paws, fans may be reluctantly obliged to seek temporary safe haven in real work offline, rather than risking accidental exposure tooling about on the net 'for research' as usual.

Don't go reading the paper though -- just when we were sure this pathetic plot-rot was the work of bratty 12-year-olds, along comes Sydney's Sun-Herald to restore our faith in the Fourth Estate and Old Media's race to the bottom: there, on page 10, in upside-down back-to-front writing you can only read in the mirror, were four of the (ex-)secrets.

(OK, admittedly this was after the witching hour of publication -- but the modus operandi was the same as the net culprits', curse them all.)

Labels: guest post, piracy

(permalink) posted by Abi Paramaguru @ Thursday, July 19, 2007 3 comments links to this post

Wednesday, July 18, 2007

The Democratic Deficit in Copyright Law: A Legislative Proposal

The Cyberspace Law and Policy Centre and Linux Australia are hosting a LawTechTalk by Maureen O'Sullivan (lecturer from the National University of Ireland, Galway). The talk considers issues in Free/Libre and/or Open Source Software (FLOSS) licensing, particularly as manifest in the recently finalised GPL v3 and their impact on Spanish-speaking or civil code countries, and her proposal for an international standard law to help free software licenses work the same in all users' countries.

Topic: The Democratic Deficit in Copyright Law: A Legislative Proposal

Date: Wednesday 25 July

Time: 1-2pm

Venue: Room 101, Faculty of Law Building, University of New South Wales

More information is available here.

All are welcome. Hope to see you there!

Labels: abi, free software, seminars

(permalink) posted by Abi Paramaguru @ Wednesday, July 18, 2007 0 comments links to this post

Is Canada Really a 'Piracy Haven'?

I finally got around to watching this video by Michael Geist and Daniel Albahary. If you haven't already, take a look:

Labels: abi, piracy

(permalink) posted by Abi Paramaguru @ Wednesday, July 18, 2007 0 comments links to this post

Tuesday, July 17, 2007

Possible Solutions

In a previous post, I talked about the problems of finding commons in the deep web; now I want to talk about some possible solutions to these problems.

Ripple Down Rules

Ripple down rules is a knowledge acquisition methodology developed at the University of New South Wales. It's really simple - it's about incrementally creating a kind of decision tree based on an expert identifying what's wrong with the current decision tree. It works because the expert only needs to justify their conclusion that the current system is wrong in a particular case, rather than identify a universal correction that needs to be made, and also the system is guaranteed to be consistent with the expert's evaluation of all previously seen data (though overfitting can obviously still be a problem).

The application of ripple down rules to deep web commons is simply this: once you have a general method for flattened web forms, you can use the flattened web form as input to the ripple down rules system and have the system decide if the web form hides commons.

But how do you create rules from a list of text strings without even a known size? (For example, there could be any number of options in a select input (dropdown list), and any number of select inputs in a form.) The old "IF weather = 'sunny' THEN play = 'tennis'" type of rule doesn't work. One solution is to make the rule conditions more like questions, with rules like "IF select-option contains-word 'license' THEN form = 'commons'" (this is a suitable rule for Advanced Google Code Search). Still, I'm not sure this is the best way to express conditions. To put it another way, I'm still not sure that extracting a list of strings, of indefinite length, is the right way to flatten the form (see this post). Contact me if you know of a better way.
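As a rough sketch of what a question-style condition like "select-option contains-word 'license'" might look like in code: the function names `contains_word` and `classify_form` and the default conclusion "not commons" are assumptions for illustration, not part of any existing RDR system.

```python
import re
from typing import Iterable


def contains_word(strings: Iterable[str], word: str) -> bool:
    """True if any string contains `word` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return any(pattern.search(s) for s in strings)


def classify_form(select_options: Iterable[str], default: str = "not commons") -> str:
    # IF select-option contains-word 'license' THEN form = 'commons'
    if contains_word(select_options, "license"):
        return "commons"
    return default


options = ["any language", "Ada", "AppleScript", "GNU General Public License"]
print(classify_form(options))
```

Because the condition asks a question of the whole list, it works no matter how many select options the form happens to have.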

A probabilistic approach?

As I have said, one of the most interesting issues I'm facing is the needle in a haystack problem, where we're searching for (probably) very few web forms that hide commons, in a very very big World Wide Web full of all kinds of web forms.

Of course computers are good at searching through lots of data, but here's the problem: while you're training your system, you need examples of the system being wrong, so you can correct it. But how do you know when it's wrong? Basically, you have to look at examples and see if you (or the expert) agree with the system. Now in this case we probably want to look through all the positives (interesting forms), so we can use any false positives (uninteresting forms) to train the system, but that will quickly train the system to be conservative, which has two drawbacks. Firstly, we'd rather it wasn't conservative because we'd be more likely to find more interesting forms. Secondly, because we'll be seeing fewer errors in the forms classified as interesting, we have fewer examples to use to train the system. And to find false negatives (interesting forms incorrectly classified as uninteresting), the expert has to search through all the examples the system doesn't currently think are interesting (and that's about as bad as having no system at all, and just browsing the web).

So the solution seems, to me, to be to change the system, so that it can identify the web form that it is most likely to be wrong about. Then we can get the most bang (corrections) for our buck (our expert's time). But how can anything like ripple down rules do that?

Probabilistic Ripple Down Rules

This is where I think the needle in a haystack problem can actually be an asset. I don't know how to make a system that can tell how close an example is to the boundary between interesting and uninteresting (the boundary doesn't really exist, even). But it will be a lot easier to make a system that predicts how likely an example is to be an interesting web form.

This way, if the most likely of the available examples is interesting, it will be worth looking at (of course), and if it's classified as not interesting, it's the most likely to have been incorrectly classified, and provide a useful training example.
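The idea of ordering the candidates can be sketched as follows. The form names and probabilities here are made up purely for illustration, and `rank_for_review` is a hypothetical helper: it assumes some system has already produced a probability of 'interesting' for each form.

```python
from typing import Dict, List


def rank_for_review(probabilities: Dict[str, float]) -> List[str]:
    """Order candidate forms from most to least likely to be interesting."""
    return sorted(probabilities, key=probabilities.get, reverse=True)


# Hypothetical predicted probabilities that each form hides commons content.
scores = {"form_a": 0.02, "form_b": 0.40, "form_c": 0.10}

queue = rank_for_review(scores)
print(queue[0])  # the form the expert should look at first
```

Only the ordering matters for choosing what the expert sees next, so the probabilities don't have to be accurate in absolute terms.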

I will talk about how it might be possible to extract probabilities from a ripple down rules system, but this post is long enough already, so I'll leave that for another post.

Labels: artificial intelligence, ben, research

(permalink) posted by Ben Bildstein @ Tuesday, July 17, 2007 3 comments links to this post

Monday, July 16, 2007

Controversy Over Virgin Australia Using Creative Commons Content

Virgin Australia Mobile has started an advertising campaign called "Are you with us or what?" using Creative Commons licensed (commercial use permitted) content available on Flickr. Many images have been used and Virgin has added text to these images. At times the text they have added could be considered derogatory. This has sparked quite a bit of outrage.

Read more in this post on TechnoLlama.

Labels: abi, licensing

(permalink) posted by Abi Paramaguru @ Monday, July 16, 2007 0 comments links to this post

Thursday, July 12, 2007

The Problems

Today I just wanted to take a brief look at some of the problems I'm finding myself tackling with the deep web commons question. There are two main ones, from quite different perspectives, but for both I can briefly describe my current thoughts.

Flattening a web form

The first problem is that of how to represent a web form in such a way that it can be used as an input to an automated system that can evaluate it. Ideally, in machine learning, you have a set of attributes that form a vector, and then you use that as the input to your algorithm. Like in tic-tac-toe, you might represent a cross by -1, a nought by +1, and an empty space by 0, and then the game can be represented by 9 of these 'attributes'.
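To make the tic-tac-toe example concrete, here is a minimal sketch of that encoding. The board layout (three strings of three characters) and the name `encode_board` are just assumptions for the example.

```python
from typing import List

# Cross -> -1, nought -> +1, empty -> 0, as described above.
ENCODING = {"X": -1, "O": 1, " ": 0}


def encode_board(board: List[str]) -> List[int]:
    """Flatten a 3x3 board (three strings of three characters) into 9 attributes."""
    return [ENCODING[cell] for row in board for cell in row]


board = ["XO ",
         " X ",
         "O  "]
print(encode_board(board))  # [-1, 1, 0, 0, -1, 0, 1, 0, 0]
```

The point is that every board, whatever its state, becomes a vector of exactly 9 numbers.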

But for web forms it's not that simple. There are a few parts of the web form that are different from each other. I've identified these potentially useful places, of which there may be one or more, and all of which take the form of text. These are just the ones I needed when considering Advanced Google Code Search:

Form text. The actual text of the web form. E.g. "Advanced Code Search About Google Code Search Find results with the regular..."
Select options. Options in drop-down boxes. E.g. "any language", "Ada", "AppleScript", etc.
Field names. Underlying names of the various fields. E.g. "as_license_restrict", "as_license", "as_package".
Result text. The text of each search result. E.g. (if you search for "commons"): "shibboleth-1.3.2-install/.../WrappedLog.java - 8 identical 26: package..."
Result link name. Hyperlinks in the search results. E.g. "8 identical", "Apache"

Conditions can then be made up of these. For example, when (as in the case of Advanced Google Code Search) you see a select option like "GNU General Public License", it's an indication you've found an interesting web form.
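One possible way to hold these five places in code is sketched below. The class name `FlattenedForm` and its field names are assumptions of this sketch; the example strings are the ones quoted above from Advanced Google Code Search.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlattenedForm:
    """A web form flattened into lists of strings, one list per 'place'."""
    form_text: List[str] = field(default_factory=list)
    select_options: List[str] = field(default_factory=list)
    field_names: List[str] = field(default_factory=list)
    result_text: List[str] = field(default_factory=list)
    result_link_names: List[str] = field(default_factory=list)


form = FlattenedForm(
    select_options=["any language", "Ada", "AppleScript",
                    "GNU General Public License"],
    field_names=["as_license_restrict", "as_license", "as_package"],
    result_link_names=["8 identical", "Apache"],
)

# A condition is then a question about one of the lists.
print("GNU General Public License" in form.select_options)
```

Each list can be any length, which is exactly what makes this harder to feed to an algorithm than a fixed-size vector.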

But as far as I can tell, text makes for bad attributes. Numerical is much better. As far as I can tell. But I'll talk about that more when I talk about ripple down rules.

A handful of needles in a field of haystacks

The other problem is more about what we're actually looking for. We're talking about web forms that hide commons content. Well, the interesting thing about that is that there's bound to be very few, compared to the rest of the web forms on the Internet. Heck, they're not even all for searching. Some are for buying things. Some are polls.

And so, if, as seems likely, most web forms are uninteresting, if we need to enlist an expert to train the system, the expert is going to be spending most of the time looking at uninteresting examples.

This makes it harder, but in an interesting way: if I can find some way to have the system, while it's in training, find the most likely candidate of all the possible candidates, it could solve this problem. And that would be pretty neat.

Labels: ben, deep web, research

(permalink) posted by Ben Bildstein @ Thursday, July 12, 2007 0 comments links to this post

Wednesday, July 11, 2007

Dancing Queen No More? 1400% Increase in License Fees

When you receive several emails about a copyright related development, all before lunchtime, you know something is up (though it could just say something about me).

Yesterday the Copyright Tribunal approved an application to increase licensing fees for nightclubs and dance parties. The application was made by a copyright collecting society called Phonographic Performance Company of Australia (PPCA). PPCA are calling the decision 'a better deal for artists'. I am calling it 'a license fee increase of around 1400% - that is massive'.

The decision can be found here [RTF]. Brief outline below.

Phonographic Performance Company of Australia Limited (ACN 000 680 704) under section 154(1) of the Copyright Act 1968 (Cth) [2007] ACopyT 1

Framework: The Tribunal discusses the statutory framework for its power to confirm or vary licence schemes. At [10] they note that this involves "a value judgment as to what it considers reasonable in the circumstances. It is not usually possible to calculate mathematically the correct licence fee in any particular case."

Nightclubs and Dance Parties: The decision discusses a study commissioned by the PPCA raising groundbreaking points such as nightclubs play music and sometimes nightclubs have dance floors. Nightclub operators presented evidence of their declining patronage and how they usually operate below capacity. Dance parties are usually one-off events; the popularity of these events has also declined according to evidence presented to the Tribunal.

Current Tariffs: Current license fees for nightclubs are 7.48 cents per person per night (number is based on licensed capacity of the venue). The amount is payable for each area where music is playing (if applicable, different rooms, levels etc). Dance parties need to pay 19.8 cents per person, based on estimated attendance.

The Respondents: The Respondents include Australian Hotels Association, Clubs Australia, Clubs NSW, Explorer Cruise Lines Ltd and others together with Nightclub Respondents. Issues raised included the definition of nightclub, whether license fees should be calculated based on attendance rather than capacity, non-protected music, whether the extent that patrons are willing to pay for recorded music can be established, the way in which the fee is calculated for dance parties and whether not-for-profit organisations like Mardi Gras should be treated differently from other organisations.

Willingness to Pay: The PPCA engaged Allen Consulting Group (you might remember their report into the economic effects of copyright term extension) to estimate the value of sound recordings in nightclubs and dance parties. Allen Consulting utilised a 'choice modelling survey' to determine 'willingness to pay'. The nightclub respondents criticised the survey, claiming that it was "divorced from economic and competitive reality" and provided unrealistic choice sets (at [147]). The case continues with various factors relevant to the assessment of economic impact.

Judicial Estimate: With respect to nightclubs the PPCA claimed $2.32 per person. The Tribunal discounted this rate based on non-protected music (-20%) and competition from other late night venues providing live or recorded music (-20%), but chose not to discount for actual patronage being below or above capacity. Further:

"The division of the estimate of willingness to pay should be adjusted to reflect the fact that the entrepreneurial risk in relation to the operation of a nightclub is undertaken by the operator and not by the Society or by APRA...A more appropriate division, therefore, would be 50% to the operator and 25% to each of APRA and the Society. " at [215].

The Tribunal arrived at a figure of $1.05 for the use of protected music at nightclubs.

PPCA claimed $15.37 for value of music at dance parties. This figure was reduced by 20% for non-protected music and entrepreneurial risk, leaving this figure at $3.07 per person.
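The dance party figure can be checked with a little arithmetic. This is a sketch under one assumption: that 'entrepreneurial risk' here means applying the same 25% share to the Society that the Tribunal describes for nightclubs at [215]. Under that reading, the 20% discount followed by the 25% share reproduces the $3.07 figure. The same snippet compares the old and new per-person rates quoted above; the function name `percent_increase` is just for this example.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100


# Dance parties: $15.37 claimed, less 20% for non-protected music,
# then (assumed) a 25% share to the Society for entrepreneurial risk.
dance_rate = 15.37 * (1 - 0.20) * 0.25
print(round(dance_rate, 2))  # 3.07

# Old rates were 7.48 cents (nightclubs) and 19.8 cents (dance parties).
nightclub_increase = percent_increase(0.0748, 1.05)
dance_increase = percent_increase(0.198, 3.07)
print(round(nightclub_increase), round(dance_increase))
```

On these numbers the increases come out at roughly 1300% for nightclubs and 1450% for dance parties, which is consistent with the 'around 1400%' headline.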

Not for profit: Mardi Gras made an application under s 157(2) of the Copyright Act seeking a determination that the license scheme is unreasonable in their circumstances. The Tribunal found:

"Clearly, much of Mardi Gras’s activities are intended to serve a community purpose. However, that does not mean that the Society, and its members, must also be compelled to support those purposes. It is not for copyright owners, or any other private group in the community, to subsidise public instrumentalities or charities" at [232].

Outcome: The scheme proposed by the PPCA was approved subject to adjustments (to rates and definitions) indicated by the Tribunal. The application by Mardi Gras was refused.

Thoughts?

The case provides interesting insight into the calculation of licensing fees. The Tribunal noted that:

"The exercise that results in that figure is, of course, to a considerable extent, arbitrary and artificial. Nevertheless, it has a rational basis for arriving at what has been described as a judicial estimate of what a reasonable but not too anxious licensor would require to be paid and what a reasonable but not too anxious nightclub operator would be prepared to pay for the right to play recorded music at nightclub venues" at [217].

In the end, it is the consumer who has to pay, both through increased prices and, if this leads to closures, through reduced choice. It is the extent of this impact that is unclear. If this decision leads to the closure of less mainstream clubs then it is important to ask which artists are actually getting the better deal.

More reports here (The Age) and here (SMH).

Update: More info on the economic analysis in the Tribunal decision available here on Core Economics (hat tip: Peter Black).

Labels: abi, licensing

(permalink) posted by Abi Paramaguru @ Wednesday, July 11, 2007 0 comments links to this post

Tuesday, July 10, 2007

Current Focus - The Deep Web

My previous two posts, last week, talked about my research, but didn't really talk about what I'm researching at the moment.

The deep web

Okay, so here's what I'm looking at. It's called the deep web, and it refers to the web documents that the search engines don't know about.

Sort of.

Actually, when the search engines find these documents, they really become part of the surface web, in a process sometimes called surfacing. Now I'm sure you're wondering: what kinds of documents can't search engines find, if they're the kind of documents anyone can browse to? The simple answer is: documents that no other pages link to. But a more realistic answer is that it's documents hidden in databases, that you have to do searches on the site to find. They'll generally have URLs, and you can link to them, but unless someone does, they're part of the deep web.

Now this is just a definition, and not particularly interesting in itself. But it turns out (though I haven't counted, myself) that there are more accessible web pages in the deep web than in the surface web. And they're not beyond the reach of automated systems - the systems just have to know the right questions to ask and the right place to ask the question. Here's an example, close to Unlocking IP. Go to AEShareNet and do a search, for anything you like. The results you get (when you navigate to them) are documents that you can only find by searching like this, or if someone else has done this, found the URL, and then linked to it on the web.
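To illustrate what 'asking the right question in the right place' amounts to, here is a minimal sketch that builds the URL a search form would produce. The site `https://example.org/search` and the parameter name `q` are placeholders, not the real AEShareNet search; an automated system would have to discover the real form action and field names for each site.

```python
from urllib.parse import urlencode


def search_url(base_url: str, **params: str) -> str:
    """Build the URL a GET search form would request for the given field values."""
    return base_url + "?" + urlencode(params)


# Hypothetical search endpoint and field name.
url = search_url("https://example.org/search", q="commons")
print(url)  # https://example.org/search?q=commons
```

The results page behind a URL like this is a deep web document until someone (or something) links to it.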

Extracting (surfacing) deep web commons

So when you consider how many publicly licensed documents may be in the deep web, it becomes an interesting problem from both the law / Unlocking IP perspective and the computer science perspective, which I'm really happy about. What I'm saying here is that I'm investigating ways of building automated systems to discover deep web commons. And it's not simple.

Lastly, some examples

I wanted to close with two web sites that I think are interesting in the context of deep web commons. First, there's SourceForge, which I'm sure the Planet Linux Australia readers will know (for the rest: it's a repository for open source software). It's interesting, because their advanced search feature really doesn't give many clues about it being a search for open source software.

And then there's the Advanced Google Code Search, which searches for publicly available source code, which generally means free or open source, but sometimes just means available, because Google can't figure out what the licence is. This is also interesting because it's not what you'd normally think of as deep web content. After all Google's just searching for stuff it found on the web, right? Actually, I class this as deep web content because Google is (mostly) looking inside zip files to find the source code, so it's not stuff you can find in regular search.
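As a rough illustration of what "looking inside zip files" involves, here is a minimal sketch that opens an archive and checks each file for licence wording. This is not how Google does it; the marker phrases, function names and the example archive are all my own assumptions, and real licence detection would need far more care than simple string matching.

```python
import io
import zipfile

# Hypothetical marker phrases, chosen for illustration only.
LICENCE_MARKERS = {
    "GPL": "GNU General Public License",
    "MIT": "Permission is hereby granted, free of charge",
}


def licences_in_zip(data: bytes) -> dict:
    """Map each file name in the archive to the licence markers found in its text."""
    hits = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            text = archive.read(name).decode("utf-8", errors="replace")
            found = [key for key, phrase in LICENCE_MARKERS.items() if phrase in text]
            if found:
                hits[name] = found
    return hits


def make_example_zip() -> bytes:
    """Build a small in-memory archive so the sketch runs without any files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "hello.c",
            "/* Licensed under the GNU General Public License */\n"
            "int main(void) { return 0; }\n",
        )
        archive.writestr("notes.txt", "no licence statement here\n")
    return buffer.getvalue()


example_hits = licences_in_zip(make_example_zip())
```

The point is that the source file is invisible to an ordinary link crawl: you only learn that `hello.c` exists, and what its licence might be, by unpacking the archive.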

This search, as compared to SourceForge advanced search, makes it very clear you're searching for things that are likely to be commons content. In fact, I came up with 6 strong pieces of evidence that lead me to believe Google Code Search is commons related.

(As a challenge to my readers, see how many pieces of evidence you can find that the Advanced Google Code Search is a search for commons (just from the search itself), and post a comment).

Labels: ben, deep web, research

(permalink) posted by Ben Bildstein @ Tuesday, July 10, 2007 2 comments links to this post

Samba embraces GPLv3

The Samba team have announced they will start using the new version of the GPL, after initially indicating they might not. As I recall Andrew Tridgell saying, they were using GPL version 2 (the previous version) because they liked the words of the licence, and the principles of free software as embodied in it, and GPLv3 seemed to be going in a slightly different direction.

This announcement acknowledges that GPLv3 "is an improved version of the license to better suit the needs of Free Software in the 21st Century," saying "We feel this is an important change to help promote the interests of Samba and other Free Software."

Unfortunately, the announcement doesn't say much about how Samba made their decision or what swayed them.

Labels: ben, free software, gpl

(permalink) posted by Ben Bildstein @ Tuesday, July 10, 2007 0 comments links to this post

Friday, July 06, 2007

Quantification

Those of you who have been paying (very) close attention would have noticed that there was one thing missing from yesterday's post - the very same topic on which I've been published: quantification of online commons.

This is set to be a continuing theme in my research. Not because it's particularly valuable in the field of computer science, but because in the (very specific) field of online commons research, no one else seems to be doing much. (If you know something I don't about where to look for the research on this, please contact me now!)

I wish I could spend more time on this. What I'd do if I could would be another blog post altogether. Suffice it to say that I envisaged a giant machine (completely under my control), frantically running all over the Internets counting documents and even discovering new types of licences. If you want to hear more, contact me, or leave a comment here and convince me to post on it specifically.

So what do I have to say about this? Actually, so much that the subject has its own page. It's on unlockingip.org, here. It basically surveys what's around on the subject, and a fair bit of that is my research. But I would love to hear about yours or any one else's, published, unpublished, even conjecture.

Just briefly, here's what you can currently find on the unlockingip.org site:

My SCRIT-ed paper
My research on the initial uptake of the Creative Commons version 2.5 (Australia) licence
Change in apparent Creative Commons usage, June 2006 - March 2007
Creative Commons semi-official statistics

What else?

I'm also interested in the methods of quantification. With the current technologies, what is the best way to find out, for any given licence, how many documents (copyrighted works) are available with increased public rights? This is something I need to put to Creative Commons, because their licence statistics page barely addresses this issue.
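One naive way to approach that question, sketched here purely as an assumption about method, is to look for Creative Commons licence URLs in a set of pages and count each licence once per page. The regular expression, the function name and the sample pages below are hypothetical, and the sketch ignores everything that makes this hard in practice (sampling, duplicates, and pages that state a licence without linking to it).

```python
import re
from collections import Counter

# Matches URLs such as http://creativecommons.org/licenses/by/2.5/au/
# Groups: licence code, version, optional jurisdiction.
LICENCE_URL = re.compile(
    r"https?://creativecommons\.org/licenses/([a-z-]+)/(\d\.\d)(?:/([a-z]+))?"
)


def count_licences(pages):
    """Count pages per (code, version, jurisdiction), once per page per licence."""
    counts = Counter()
    for html in pages:
        licences_on_page = {match.groups() for match in LICENCE_URL.finditer(html)}
        counts.update(licences_on_page)
    return counts


# Made-up sample documents.
sample_pages = [
    '<a rel="license" href="http://creativecommons.org/licenses/by/2.5/au/">CC BY</a>',
    '<a href="http://creativecommons.org/licenses/by/2.5/au/">one</a> '
    '<a href="http://creativecommons.org/licenses/by/2.5/au/">two</a>',
    '<a href="http://creativecommons.org/licenses/by-nc/2.0/">CC BY-NC</a>',
    "<p>All rights reserved.</p>",
]

licence_counts = count_licences(sample_pages)
```

Even this toy version forces a decision about whether to count links or documents, which is a small example of why the method matters as much as the number.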

Labels: ben, quantification, research

(permalink) posted by Ben Bildstein @ Friday, July 06, 2007 1 comments links to this post

Thursday, July 05, 2007

My research

Recently, I've been pretty quiet on this blog, and people have started noticing and suggested I post more. Of course I don't need to point out what my response was...

The reason I haven't been blogging much (apart from laziness, which can never be ruled out) is that The House of Commons has become something of an IP blog. Okay, it sounds obvious, I know. And, as it seems I say at every turn, I have no background in law, and my expertise is in computer science and software engineering. And one of the unfortunate aspects of the blog as a medium is that you don't really know who's reading it. The few technical posts I've done haven't generated much feedback, but then maybe that's my fault for posting so rarely that the tech folks have stopped reading.

So the upshot of this is a renewed effort by me to post more often, even if it means technical stuff that doesn't make sense to all our readers. It's not like I'm short of things to say.

To start with, in the remainder of this post, I want to try to put into words, as generally as possible, what I consider my research perspective to be...

Basically what I'm most interested in is the discovery of online documents that we might consider to be commons. (But remember, I'm not the law guy, so I'm trying not to concern myself with that definition.) I think it's really interesting, technically, because it's so difficult to say whether a given document qualifies (in any automated, deterministic way, without the help of an expert - a law expert in this case).

And my computer science supervisor, Associate Professor Achim Hoffman, has taught me that computer science research needs to be as broad in application as possible, so as I investigate these things, I'm sparing a thought for their applicability to areas other than commons and even law.

In upcoming posts, I'll talk about the specifics of my research focus, some of the specific problems that make it interesting, possible solutions, and some possible research papers that might come out of it in the medium term.

Labels: ben, research

(permalink) posted by Ben Bildstein @ Thursday, July 05, 2007 3 comments links to this post

Some Friday Fun - Wikipedia Style

(Pictured: "Wikipedian Protester", Randall Munroe - via his excellent webcomic xkcd, available under a Creative Commons Attribution-NonCommercial 2.5 license)

Labels: abi, Fun

(permalink) posted by Abi Paramaguru @ Thursday, July 05, 2007 2 comments links to this post

Wednesday, July 04, 2007

The Benoit Tragedy and Wikipedia Controversy

Some readers will be aware of the recent deaths of United States pro wrestler Chris Benoit, his wife Nancy and their son Daniel. Police concluded that the deaths were a murder-suicide: Benoit killed his wife and child and then himself. However, this tragedy has been accompanied by an eerie twist: the death of Benoit’s wife was added to his Wikipedia page 14½ hours before the bodies were discovered by local authorities. The Wikipedia angle was reported around Thursday or Friday of last week, but rather than blogging and speculating about how this actually happened, I wanted to wait to see, well, what actually happened.

An anonymous individual posted on Benoit’s Wikipedia page that he was replaced by another wrestler, Johnny Nitro, for a championship wrestling event as Benoit was unable to attend the event “due to personal issues, stemming from the death of his wife Nancy.” A Wikipedia moderator took the post down an hour later on the basis that the statement needed a reliable source. A second anonymous individual then added to the site that “several pro wrestling websites” attributed Benoit’s failure to attend the event to Nancy’s death. This second post was made by an individual in Australia. The second post was then removed by Wikipedia editors on the basis that “several pro wrestling websites” was not reliable. When it was revealed that Benoit, his wife and son had died, Wikipedia editors put the puzzle together and contacted authorities. (see the Sydney Morning Herald report here).

After revealing that they were responsible for the first post, the anonymous individual said that they had made the changes to Benoit’s Wikipedia page on the basis of a number of rumours floating around the Internet. Further, they stated that

“I posted the comment we are all talking about and I am here to explain that it was A HUGE COINCIDENCE and nothing more…
I was beyond wrong for posting wrongful information, and I am sorry to everyone for this ... I just posted something that was at that time a piece of wrong unsourced information that is typical on wikipedia, as it is done all the time.” (Jano Gibson, “Benoit Mystery’s Wiki Twist: I Did It”, Sydney Morning Herald, 29 June 2007)

So does saying that “I just posted something that was at that time a piece of wrong unsourced information that is typical on Wikipedia, as it is done all the time” make it all right then? No, for a number of reasons. First, while editing Wikipedia has become all the rage, what is the rush in posting the death of an individual before it’s actually been confirmed? Even if this was based on ‘rumours’ – which in this case ended up being somewhat true – I’m not sure of the harm in waiting for a death to be confirmed by more reliable sources before adding it to the Wikipedia page. After the Sinbad incident, chances are that Jimmy Wales wouldn’t mind Wikipedia not being updated for a few hours in order to confirm that an individual in question is actually deceased. Second, if the individual is not a prankster and does in fact care about the information on Wikipedia, then surely they should not base their posts on unsubstantiated rumours, and should instead seek to dispel the misconception that Wikipedia is the place you go to post inaccurate information.

It's a shame that in such tragic circumstances this is the story that is filling the headlines.

Labels: catherine, wikis

(permalink) posted by Catherine Bond @ Wednesday, July 04, 2007 0 comments links to this post

Monday, July 02, 2007

GPLv3 has been Released

Read it here.

Labels: abi, gpl

(permalink) posted by Abi Paramaguru @ Monday, July 02, 2007 0 comments links to this post
