Research papers are not easy to read

Research papers and scientific articles are not easy to read. There are loads of problems, and loads of ways we could improve them. Here are some that I’ve thought of recently.


When did you last read for fun? I’m reading Little Brother. So much more enjoyable than articles.

<h2>Formatting</h2>

Almost all of the papers I see are in PDF form. These are compatible with nearly all devices, but they have their problems:

  • You can’t change the fonts to make them more readable
  • You can’t easily copy embedded graphics
  • Sometimes even text doesn’t properly copy
  • You can zoom, but not increase the text size, as it does not reflow

When preparing these documents, the strict design guidelines force you to change the contents of your research. The results you present to the wider academic community should not be influenced just because your publisher wants your margins to be a certain width and your footnotes a certain size. It is good to limit how much you publish, to keep you focused on the core findings, but being so anal about a document which isn’t going to be physically printed in a book is a ridiculous waste of time and effort. I don’t think I’ve ever picked up a printed volume of articles in my whole academic career. I, and likely all subsequent generations, will be digital-first researchers.

Some publishers are increasingly making documents available in multiple formats, but it is still difficult for me to push a scientific article to my e-reader, at least not without a mess of scripts to get the content into a format that is easily readable on such devices. Yes, even cheap e-readers can open PDFs, but if you want to make the text and figures large enough to read while maintaining easy pagination, you will be out of luck.

I have heard many refuseniks argue that PDFs and the associated layout languages and engines are great because they preserve formatting and make the appearance of a document universal. To them I ask: What is more important? That your images are placed just so, or that people can read and learn from your work?

Some would argue that a common layout makes documents easier to read and makes it easier to find the information you really need. But were that the case, why do we have so many different formats? I have seen single column A4, margins of ridiculous sizes, two column publications, even some triple column sheets. Fonts and sizes of all kinds. If readability were so important, why not let people choose their own layouts?

<h2>HyperReferencing</h2>

So much of scientific reporting revolves around references; this is almost a cornerstone of the scientific method. You verify your sources and others’ sources, and build up a solid foundation for your logic. You can use references to build on previous work, to adapt it into original research, and to link to data sets and more.

But paradoxically, referencing is often the most difficult part of a research document to use. Some PDFs simply don’t include links at all, making you scroll around to find different sections. In some PDF files, clicking a reference will take you to the end of the document, losing your current place with no quick way to return. Not all PDFs let you just open links in a new tab in a web browser or dedicated reader, and those that do usually restrict this to external links, not footnotes or references.

It is apparently impossible to link directly to other documents. Despite the widespread adoption of universal identifiers like ISBN, DOI and others, the use of accessible hyperlinks to resolvers for these identifiers is completely missing across the community of academic writing.

Now, resolvers are ephemeral and change over time, but the identifiers themselves do not change. They are universal. If we used better formats than compiled PDFs, rewriting references to point to resolvers, or indeed directly to files, would be easy. That said, one can even rewrite PDFs, with a little difficulty, to achieve this correct behaviour.
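To make that concrete, here is a minimal sketch of the rewriting step: pull DOIs out of a block of reference text and pair each one with a resolver link. The function names and the regular expression are my own illustration, not part of any standard. The pattern is a rough heuristic that will miss unusual DOIs, and the resolver base is a parameter precisely because resolvers come and go.

```python
import re
from urllib.parse import quote

# Rough heuristic for modern DOIs: "10.", a registrant code, "/", then a suffix.
# This is an assumption for illustration, not an official grammar.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Build a resolver link for a DOI; the resolver base is swappable."""
    return resolver + quote(doi, safe="/")


def link_references(text: str, resolver: str = "https://doi.org/") -> list:
    """Return (doi, url) pairs for every DOI found in a block of reference text."""
    pairs = []
    for match in DOI_PATTERN.finditer(text):
        # Drop trailing sentence punctuation that the pattern swallowed.
        doi = match.group(0).rstrip(".,;)")
        pairs.append((doi, doi_to_url(doi, resolver)))
    return pairs


if __name__ == "__main__":
    sample = "See doi:10.1000/182. Also 10.5555/12345678, among others."
    for doi, url in link_references(sample):
        print(doi, "->", url)
```

If the resolver changes tomorrow, only the `resolver` argument changes; the identifiers stored in the document stay exactly as they were.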

Research life would be so much easier if we started using hyperlinks the way Tim Berners-Lee really intended when he invented the World Wide Web.

I would like to note that some publishers and academic search engines do include linkable references, but these have problems:

  • They are not universal
  • The hyperlinks exist outside the document
  • Not all such linkable references exist in machine readable open formats

<h2>Click here to Access</h2>

Publishers. Publishers everywhere. A wide array of different publishers, all with their own bespoke web pages to let them offer special™ logins (spam), special™ features (oft broken PDF renderers) and special™ search (which does not search) for their offerings.

The last thing I want is for the entire web to look the same. That is a big enough issue already. But for goodness’ sake, we have standard interfaces for things like email, calendars and more. Competing, but standardised. Why not have the same for academic articles?

There are sites, sites that I shall not link to for they are forbidden, these “hubs for science” which let you type in an identifier and then immediately serve you the document. Imagine if I could go to a legitimate resolver, type in an identifier and get that article back instantly. Or just the abstract, or the appendices, or the bibliography. With a single click.

I could supply an authentication cookie provided by my institution to bypass logins and paywalls. I wouldn’t need to have to hunt for that “download” button. I wouldn’t have to deal with the myriad means of displaying a broken PDF file with overlapping text. I wouldn’t need to click past cookie warnings and sign-ups and recommended articles.
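For the sake of argument, this is roughly what asking such a resolver for an article could look like. Everything here is hypothetical: the `resolver.example` address, the `part` parameter and the cookie value are invented for illustration, since no such standard service exists. The sketch only builds the request and never sends it.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Hypothetical resolver endpoint; an assumption, not a real service.
RESOLVER_BASE = "https://resolver.example/"

# Hypothetical names for the parts of an article a reader might ask for.
PARTS = {"full", "abstract", "appendices", "bibliography"}


def build_request(
    identifier: str,
    part: str = "full",
    session_cookie: Optional[str] = None,
) -> Request:
    """Describe a single request for one part of one article."""
    if part not in PARTS:
        raise ValueError(f"unknown part: {part!r}")
    url = f"{RESOLVER_BASE}{quote(identifier, safe='/')}?part={part}"
    headers = {}
    if session_cookie is not None:
        # The institution-provided cookie stands in for logins and paywalls.
        headers["Cookie"] = session_cookie
    return Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request("10.1000/182", part="abstract", session_cookie="session=abc")
    print(req.full_url)
```

One identifier, one optional credential, one request. No download button to hunt for.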

<h2>Search OR “Search” AND *Search?</h2>

There are few places where I can search and get everything back in one place, neatly, without expending vast amounts of effort.

  • Google Scholar is often decent enough, but I find it has incredibly finicky refinement tools, and it’s Google™
  • DBLP works well within my specific field but has a narrow scope and lacks complex querying
  • My institution pays for a custom library engine, I won’t link to it, which is woefully inadequate on all fronts
  • Various publishing websites have their own search engines, which usually work well, but only on their own catalogues

I wish that when I am searching I could just type in my query and get back a list of abstracts, identifiers, or any other metadata about my results. There are databases and search engines that come close to this, but again it’s not standard and it’s never universal. I could try to keep track of each individual database, but who knows which one wants brackets around everything, or which one supports “exact match”? Some I’ve come across don’t even understand concepts like OR and AND. And forget trying to do any of this in a programmatic way.

One important part of academic searching is tracking your progress as you refine your queries and filter your results. I know of no tools out there to help with this, other than a notepad and sub-par tagging in a reference manager.
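The kind of record I have in mind is not complicated. Below is a minimal sketch of a search log that notes where each query was run, the exact query string, how many results came back and why it was refined. The class and field names are my own invention, and the source and counts in the usage example are made-up values.

```python
import csv
import io
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class SearchStep:
    source: str        # which database or engine was queried
    query: str         # the exact query string, as typed
    result_count: int  # how many hits came back
    note: str = ""     # why the query was refined
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class SearchLog:
    steps: list = field(default_factory=list)

    def record(self, source, query, result_count, note=""):
        """Append one query to the log and return the stored step."""
        step = SearchStep(source, query, result_count, note)
        self.steps.append(step)
        return step

    def to_csv(self) -> str:
        """Serialise the whole log so the search can be repeated later."""
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=["timestamp", "source", "query", "result_count", "note"],
        )
        writer.writeheader()
        for step in self.steps:
            writer.writerow(asdict(step))
        return buffer.getvalue()


if __name__ == "__main__":
    log = SearchLog()
    log.record("DBLP", "paper readability", 120)
    log.record("DBLP", '"paper readability" pdf', 14, note="added exact match")
    print(log.to_csv())
```

A plain CSV is a deliberate choice here: it opens in any spreadsheet and survives longer than browser history does.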

Want to do a systematic comparison? Better bust out the Excel spreadsheet and hope to the gods that your browser is saving your history correctly for when it comes time to go back and repeat the query which inevitably missed a single paper your reviewer wants you to include.

In the 21st century, the age of data, these should not be hard problems to solve.


That’s the end of my rant. It’s been a long day. It continues to be a long PhD.

Header by Sylvia Yang
