Luis Villa: Copyleft and data: databases as poor subject
tl;dr: Open licensing works when you strike a healthy balance between obligations and reuse. Data, and how it is used, is different from software in ways that change that balance, so that compromises that are reasonable in software (like attribution) suddenly become insanely difficult barriers.
In my last post, I wrote about how database law is a poor platform to build a global public copyleft license on top of. Of course, whether you can have copyleft in data only matters if copyleft in data is a good idea. When we compare software (where copyleft has worked reasonably well) to databases, we’ll see that databases are different in ways that make even “minor” obligations like attribution much more onerous.
In software copyleft, the most common scenarios to evaluate are merging two large programs, or copying one small file into a much larger program. In these scenarios, understanding how licenses work together is fairly straightforward: you have two licenses. If they can work together, great; if they can’t, then you don’t go forward, or, if it matters enough, you change the license on your own work to make it work.
In contrast, data is often combined in three ways that are significantly different from how software is combined:
Attribution in large software projects is painful enough that lawyers have written a lot on it, and open-source operating system vendors have built somewhat elaborate systems to manage it. This isn’t just a problem for copyleft: it is also a problem for the supposedly easy case of attribution-only licenses.
Now, again, instead of dozens of authors, often employed by the same copyright owner, imagine hundreds or thousands. And instead of combining these pieces in basically the same way each time you build the software, imagine that every different query requires different attribution data (because the relevant slices of data may have different sources or authors). That’s data!
The least-bad “solution” here is to (1) tag every field (not just data source) with licensing information, and (2) have data-reading software create new, accurate attribution information every time a new view into the data is created. (I actually know of at least one company that does this internally!) This is not impossible, but it is a big burden on data software developers, who must now include a lawyer in their product design team. Most of them will just go ahead and violate the licenses instead, pass the burden on to their users to figure out what the heck is going on, or both.
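To make that burden concrete, here is a minimal sketch of what per-field tagging and per-view attribution might look like. Everything in it is hypothetical: the `Source` and `Field` types, the `attribution_for` helper, the contributor names, the license identifiers, and the placeholder values are all invented for illustration, and this is not how any particular company or database does it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str     # hypothetical contributor or upstream dataset
    license: str  # hypothetical license identifier attached to that source


@dataclass(frozen=True)
class Field:
    value: object
    source: Source  # (1) every field carries its own licensing information


def attribution_for(rows, columns):
    """(2) Build the attribution list for one view: only the sources of the
    fields that the view actually returns, de-duplicated and sorted."""
    seen = {}
    for row in rows:
        for col in columns:
            src = row[col].source
            seen[(src.name, src.license)] = src
    return sorted(seen.values(), key=lambda s: (s.name, s.license))


# Placeholder contributors and values, not real data.
alice = Source("Alice", "CC-BY-4.0")
bob = Source("Bob", "ODbL-1.0")
carol = Source("Carol", "CC0-1.0")

table = [
    {"city": Field("Exampleville", alice), "population": Field(100, bob)},
    {"city": Field("Sampletown", alice), "population": Field(200, carol)},
]

# Two different queries over the same table need two different attributions.
city_view = attribution_for(table, ["city"])
full_view = attribution_for(table, ["city", "population"])

print([s.name for s in city_view])
print([s.name for s in full_view])
```

Even in this toy version, the attribution list is a function of the query rather than of the dataset, and the sketch does not attempt the harder part: deciding what each license actually requires once fields from several sources end up in the same result.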
Most software is either under a very standard and well-understood open source license, or is produced by a single entity (or often even a single person!) that retains copyright and can adjust that license based on their needs. So if you find a piece of software that you’d like to use, you can either (1) just read their standard FOSS license, or (2) call them up and ask them to change it. (They might not change it, but at least they can if they want to.) This helps make copyleft problems manageable: if you find a true incompatibility, you can often ask the source of the problem to fix it, or fix it yourself (by changing the license on your software).
Data sources typically can’t solve problems by relicensing, because many of the most important data sources are not authored by a single company or single author. In particular:
Copyleft (and, to a lesser extent, attribution licenses) works when the obligations placed on users are in balance with the benefits those users receive. If they aren’t in balance, the materials don’t get used. Ultimately, if the data does not get used, our egos feel good (we released this!) but no one benefits, and regardless of the license, no one gets attributed and no new material is released. Unfortunately, even minor requirements like attribution can throw the balance out of whack. So if we genuinely want to benefit the world with our data, we probably need to let it go.
So if data is legally hard to build a license for, and the nature of data makes copyleft (or even attribution!) hard, what to do? I’ll go into that in my next post.
http://lu.is/blog/2016/09/14/copyleft-and-data-databases-as-poor-subject/