I want to take a moment to talk about the availability of scientific articles, in this 30th anniversary of the world wide web.
Coming up to the 30th anniversary of the world wide web, I want to say that it’s amazing just how much information has now been made freely available. Admittedly, fake news seems to have made a disconcerting rise in recent years, but at the same time the wealth of real, reliable knowledge has exploded.
Part of this revolves around the humble PDF and the scientific article. In the past these, I understand, would largely have been hidden away in journals, and you would need to hunt down a hard copy in a library. Luckily I was born late enough to have been able to avoid this in my studies. When I want something, I can just get a copy from the web.
However, I have encountered a number of issues which make it difficult to actually access such content. I believe these cause difficulty for many other people as well, and pose a great challenge to the open web as a whole.
Paywalled articles are annoying, and relying on paid journal distributors is a poor move. Better informed individuals than me have written on why this is the wrong direction for academia, so I won’t keep beating the dead horse. I believe in open access, but for the moment that’s not a fully reliable solution.
First, on the point of them being annoying, I can’t count the number of times I’ve encountered an article but have been forbidden from accessing it, despite the fact that my university has a paid subscription for it. I have to hunt down credentials and enter them in order to view an article, and I have to create an account on myriad services just to download a PDF.
My university has a web proxy I can use, so I built a browser extension to let me quickly route through that, but I am fairly certain this is not the kind of web that Berners-Lee envisaged when he invented the web decades ago.
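For anyone curious what such an extension does under the hood, here is a minimal sketch of the URL rewriting involved. It assumes a login-prefix style proxy, where the target address is passed as a `url` query parameter; the host `proxy.example.edu` and the function name `proxied_url` are made up for illustration, and your own university's proxy may well use a different scheme.

```python
from urllib.parse import urlencode

# Hypothetical proxy login endpoint; substitute your institution's own.
PROXY_LOGIN = "https://proxy.example.edu/login"


def proxied_url(target: str, proxy_login: str = PROXY_LOGIN) -> str:
    """Wrap an article URL so that it is requested through the proxy.

    Assumes the proxy accepts the target as a percent-encoded
    `url` query parameter, which is an assumption, not a given.
    """
    return proxy_login + "?" + urlencode({"url": target})


if __name__ == "__main__":
    print(proxied_url("https://example.org/paper.pdf"))
```

A browser extension would do the same rewrite on the current tab's address and then navigate to the result.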
Back then you just accessed a server and got your document. In fact, even going as far back as the internet itself, that’s the model. I fail to see why things have to be so difficult nowadays. Even if broadband speeds have gone up, we’ve gone backwards in how easily available things are.
DRM & Formats
This is an issue that often goes hand in hand with the paywall. Digital Restrictions Management is when a special measure, typically software, is used to prevent you from using a file however you wish.
I despise DRM and thankfully haven’t had to deal with it when reading articles. It is however a concern, more particularly with eBooks. The web should have been about openness and freedom to share common readable formats of information. 30 years after the web started, any browser can still interact with HTML (even if some parts are deprecated, RIP <marquee>). I really don’t think the same can be said of DRM software.
Flash, though it hangs on by a thread, is in the throes of a very prolonged and painful death. Ignoring all the security holes, I believe this is in part thanks to the fact that it is not an open format. How long until Adobe Digital Editions dies off, and no-one can read certain formats of eBook from a publisher should they go bust?
People often tout that once something goes online it stays there forever, but that isn’t true. Things disappear, and this happens far too often with PDF articles. Theoretically, once an article is made available it should always be available in some form, whether from the author, the conference or the journal publishing it.
Authors make use of preprints and, as mentioned, journals have a monetary incentive to always offer access. The biggest offenders I’ve come across for failing to archive are conferences though. The ones I follow (typically Computer Science ones) make their PDFs available online, but there’s no guarantee they’ll stay there once next year’s web site goes up.
The most annoying part of this is when they forget to update their DOI resolvers and references. Everything else relies on these references pointing to the right document, and once they start giving you 404s, everything else breaks down. From the databases to Sci-Hub, if the DOI doesn’t point correctly, it becomes extremely difficult to find the right file.
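To make the DOI problem concrete, here is a rough sketch of how one might check whether a DOI still resolves. It relies on the public resolver at https://doi.org/ and Python's standard `urllib`; the function names `doi_url` and `doi_status` and the three status labels are my own invention. Treat the result as a hint only, since some publisher sites reject HEAD requests or block scripts even when the article is still there.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import quote
from urllib.request import Request, urlopen

DOI_RESOLVER = "https://doi.org/"  # the public DOI resolver


def doi_url(doi: str) -> str:
    """Build the resolver URL for a DOI such as '10.1000/182'."""
    return DOI_RESOLVER + quote(doi.strip(), safe="/")


def doi_status(doi: str, opener=urlopen, timeout: float = 10.0) -> str:
    """Return 'ok', 'broken' (an HTTP error such as 404) or 'unreachable'.

    `opener` is injectable so the check can be exercised without a network.
    """
    request = Request(doi_url(doi), method="HEAD")
    try:
        # urlopen follows redirects, so this lands on the publisher's page.
        with opener(request, timeout=timeout):
            return "ok"
    except HTTPError:
        # HTTPError must be caught first: it is a subclass of URLError.
        return "broken"
    except URLError:
        return "unreachable"


if __name__ == "__main__":
    print(doi_status("10.1000/182"))  # an example DOI; needs network access
```

Run over a conference's back catalogue, something like this would at least show which references have quietly started returning 404s.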
I would hope that, another 30 years from now, a better solution will have been devised to archive all of the important scientific documents, studies and proposals that have been made. This will likely never happen completely, because it conflicts with copyright, but one can hope.