Captcha Paranoia

Internet Challenge Image

With data collection getting some discussion in the media, and greater scrutiny from governmental bodies, I felt that there was perhaps too much focus on Facebook, and that something Google has been doing has slipped past everyone’s notice.

CAPTCHAs are everywhere. In the early days of the web, developers wanted a way to guarantee that certain users were real people, and not just automated tools.

Enter reCAPTCHA, Google’s solution to this problem. Solve a small problem that is difficult for an AI to do, and prove that you’re human, all by simply including a little bit of code on your website.
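As a rough illustration of what that “little bit of code” involves on the server side, here is a minimal Python sketch. It assumes Google’s documented `siteverify` endpoint, which takes `secret` and `response` form fields and replies with JSON containing a `success` field. The function names are my own, and a real integration also needs the client-side widget that produces the token.

```python
import json
import urllib.parse
import urllib.request

# Google's verification endpoint for reCAPTCHA tokens.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret_key: str, user_token: str) -> urllib.request.Request:
    """Build the POST request a site sends to Google to check a visitor's token."""
    body = urllib.parse.urlencode({"secret": secret_key, "response": user_token})
    return urllib.request.Request(VERIFY_URL, data=body.encode("ascii"), method="POST")


def is_human(verify_response_body: str) -> bool:
    """Interpret the JSON reply; the outcome is reported in the `success` field."""
    return bool(json.loads(verify_response_body).get("success", False))


def verify(secret_key: str, user_token: str) -> bool:
    """Ask Google whether the token is valid (this makes a network call)."""
    request = build_verify_request(secret_key, user_token)
    with urllib.request.urlopen(request, timeout=10) as reply:
        return is_human(reply.read().decode("utf-8"))
```

Note that both the visitor’s browser and the site’s server end up talking to Google for every check.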

But there are many problems with that idea.

Many of these problems apply to any form of CAPTCHA served by a central server, but considering the prevalence of Google’s reCAPTCHA, I think they are particularly pertinent when combined with the overreach which I believe Google exercises.

Assumption about User Ability

The first is that you, as a website owner, are making an implicit declaration that the users of your site will be able to solve whatever problem is being asked.

In the early days this meant working on the assumption that your users would be able to read distorted text. Never mind if they speak another language, or have difficulty reading.

In more recent iterations of reCAPTCHA, this task is usually a visual recognition task. Perhaps this is better, as the solution doesn’t depend on language, but you are still requiring site visitors to understand what the question is asking.

I come across elderly users of the web who are confounded when faced with challenges such as this, because after all: “If I’m just trying to access mail, or read the news, why am I suddenly being asked to click on random pictures?” Clicking on random stuff that isn’t what you were originally looking for is not a lesson we ought to be teaching novice internet users.

This also doesn’t help a new internet user from a foreign country, who might never have seen a storefront on a British street, or an American car, and when prompted to select one, might not be able to answer. Are they a robot because of this?

You might suggest using an audio-based reCAPTCHA. Well, I think the recent Yanny vs Laurel discussion is clear proof that you can’t rely on people giving one definitive answer when hearing distorted or noisy audio.

You could ask that people be given help when solving these tasks, but then you’re not encouraging digital independence and mobility.

User Autonomy in Choice of Service

In this day and age, there is a large onus placed on the end user to agree to all kinds of Terms of Service, and Privacy Agreements. But captchas somewhat undermine your ability to choose whether to accept these.

In the case of Facebook, you could simply “opt out” by not having an account. You could use a resource blocker like uBlock Origin, or even add its domains to your router’s hosts blocklist.

You can’t do that with a service offered by Google. In order to complete a reCAPTCHA you have to allow connections to both google.com and gstatic.com. If you want to “opt out” of Google, you have to give up access to any part of the web that requires solving a reCAPTCHA challenge.
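To make the trade-off concrete, here is a small Python sketch of how a domain blocklist treats the resources a reCAPTCHA needs. The blocklist and helper names are hypothetical, and the gstatic.com URL is only a stand-in for “something hosted on that domain”; this is not how any particular blocker is implemented.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist for someone trying to "opt out" of Google.
BLOCKED_DOMAINS = {"google.com", "gstatic.com"}

# Illustrative resource URLs on the two domains named above.
RECAPTCHA_RESOURCES = [
    "https://www.google.com/recaptcha/api.js",
    "https://www.gstatic.com/recaptcha/",
]


def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def can_load_recaptcha(resource_urls, blocked=BLOCKED_DOMAINS) -> bool:
    """The challenge can only load if none of its resources are blocked."""
    return not any(is_blocked(u, blocked) for u in resource_urls)


if __name__ == "__main__":
    # Blocking Google also blocks the challenge, and with it the site behind it.
    print(can_load_recaptcha(RECAPTCHA_RESOURCES))
    # Blocking Facebook's domain alone leaves the challenge untouched.
    print(can_load_recaptcha(RECAPTCHA_RESOURCES, blocked={"facebook.com"}))
```

The same blocklist that works as an “opt out” for one company becomes a lock-out from every site that puts the challenge in front of its content.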

Considering the data scandals that have happened recently, the ability to simply “opt out” of parts of the web you don’t want any dealings with is an important choice users should be able to make, but by using a centrally hosted internet challenge, you take some of that autonomy away from users.

Data is Valuable

I think it is somewhat arrogant that with reCAPTCHA you are essentially doing work, and not receiving any compensation in return other than being afforded the luxury of accessing the site you were originally trying to get to.

Consider a service like Duolingo. In the original iteration of this service, users would do work by translating foreign text. Duolingo could offer this text translation service, and the users who took part were given help in learning a new language.

Consider a service like Amazon Mechanical Turk. Participants complete small tasks of work in return for hard cash. They do some work in solving a problem, and they get paid.

With a reCAPTCHA, you are helping to train an AI for Google, or you are transcribing text for Google. Where is the compensation you are due, and would be given by other services in return for work? It feels a bit like being held to ransom if your choice is “do work for Google or don’t access this site”.

Gatekeeping, by Google™

When a website owner puts a reCAPTCHA challenge on their site, they ask Google to be the gatekeeper.

This would indeed be true of any internet challenge service, but if Google is the most prolific, then in effect most websites are implicitly voting for Google to be the gatekeeper of the web.

The other big gatekeeper that springs to mind is Cloudflare, whose business incorporates a somewhat more transparent gatekeeping model, and even it relies on Google to provide a challenge. Everything leads back to a single central service.

Whenever a government attempts to regulate access to the web, there is wide-ranging discussion. But Google could, at its own discretion, easily prevent access to any site using a reCAPTCHA, without any notice to the site owner, and this simply passes everyone by.

Just by increasing the number of iterations of a captcha a user has to solve, Google could put off a large number of users from visiting certain websites. I can recall many instances where I’ve given up trying to access a site after having failed an Internet Challenge too many times.
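A back-of-the-envelope sketch of why extra rounds matter, assuming (purely for illustration) that a user passes each round independently with the same probability and gives up on the first failure. The 90% pass rate below is a made-up figure, not a measurement.

```python
def completion_rate(pass_rate: float, rounds: int) -> float:
    """Chance of clearing `rounds` consecutive challenges, assuming each one
    is passed independently with probability `pass_rate`."""
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    if rounds < 0:
        raise ValueError("rounds must not be negative")
    return pass_rate ** rounds


if __name__ == "__main__":
    # Illustrative only: a hypothetical 90% chance of passing any single round.
    for rounds in (1, 3, 5):
        print(rounds, round(completion_rate(0.9, rounds), 3))
```

Under those assumptions, going from one round to five takes the share of users who get through from 90% to roughly 59%, without the site owner changing anything.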

You might think that seems far-fetched, but it is a known fact that Google attempts to suffocate potential competitors, and if a competitor used reCAPTCHA, Google could very easily use this to its advantage.

AI is Growing, so What’s Next?

As AI grows ever better, the makers of CAPTCHA services are going to have to look for ever more nuanced ways of testing to see if users are human.

One of the ways in which a company like Google could check whether you are a robot is by using many identifiers and tying these to how you use the service.

If AI is able to solve captchas as they currently exist, and the only remaining check is whether you interact with Google, this poses a big problem: if you hide from Google and it can’t see you as real… do you just get blocked completely?

While Google is not the only offender in many of the points I’ve made above, I just don’t see its dominance, and indeed the concept of CAPTCHAs, as sustainable in any way.

Join the Conversation

  1. Nice article, dear.
    Thanks for your precise article.

    Accessibility (#a11y) is important for me, as a web designer, a programmer and a person with some slight disabilities.

    Captchas are a bad attitude and close the WWW for many users.

    Some sites with captchas act as if users were criminals who should be rejected.
    Some sites use math captchas, which is not nice for people with dyscalculia. I even saw sites with extreme math calculations like higher algebra, e.g. integral calculation.
    Some sites use crazy images of characters that sometimes take 5 or more tries to recognize.
    Some sites use audio captchas, which are mostly not understandable.

    I dislike such captchas. I call that excluding technology: Digital Racism.
    Captchas are like an aggressive doorkeeper in front of a bar or disco telling me: “YOU NOT! Only for White!”

    1. The writer is very correct that Captcha and ReCaptcha are used to blackball select individuals. In my case I was locked out of Yahoo and Facebook for my politically independent views and excoriating both right and left for their mendacity and corruption. I never used ad-hominems or said anything that was not a commonly known fact.
      Strange event: A few minutes ago I came to login to my Vivaldi mail and was repeatedly rejected so I submitted a support request, closed Vivaldi, returned in Firefox and logged in easily. Then I closed Firefox, reopened Vivaldi, came here and logged in easily. Why did that block happen? Your Vivaldi has been alerting me about bad sites and I appreciate that very much. Thank you.

  2. Hello, I agree with what you say, but why do all browsers, or the vast majority, always have Google as a search engine? Even Vivaldi uses Chrome as a base.

    1. I think the major reason for that is simply that users expect it. If browsers want to have users, they had better include Google, or many (not all, but enough to make an impact) users would probably turn away.

    2. On my desktop iMac with Sierra 10.12.6, Apple embedded Siri to control and report searches even when Siri is disabled, so I set the private search engine DuckDuckGo as my home page in all browsers and do not use the Apple search bar. Siri cannot be removed from the OS unless you have top-level skills.

  3. Nice article. I still see sites using reCAPTCHA v1, which shows the public that the site has not upgraded to reCAPTCHA v2 and that the reCAPTCHA does not work. On https://www.microsoft.com/en-us/wdsi/support/report-unsafe-site the captcha is not always readable (sometimes the characters overlap each other, other times a character is cut off or too faint to read). There is also the check-the-box test to prove you are human, which might then ask you to pick the images with cars or store fronts or street signs etc. With Google being the biggest gatekeeper, that could be a big single point of failure if the servers handling that go down or the route to them gets hijacked.

    :robot:

    1. Oh, yes, the Microsoft site image captcha is sometimes hard to recognize, and I cannot understand everything in the audio captcha.
      That keeps people away, not spammers. 🙁

    2. You are right. The visible captcha is not always readable.

      But worse, listening to the audio of the captcha is hell.
      I cannot solve it, and I really do understand most English words.
      Microsoft fails at a11y!

    1. Google reCAPTCHA has somewhere between 94 and 99.3% global market share in CAPTCHA solutions. There are no good alternatives that are as easy to deploy and at such a low price (free).

      Coinhive CAPTCHA is technically a good solution. It requires processing power and time instead of human intuition and pattern recognition. However, Coinhive’s name is mud and their domain name is blocked in DNS because websites were hijacked and folks snuck in out-of-control variants of their cryptocurrency miner. So web publishers can’t use their CAPTCHA product.
      https://www.ctrl.blog/entry/coinhive-captcha

      Cloudflare supports Privacy Pass. It’s kind of the same solution as Coinhive’s CAPTCHA service, where the user’s computer is tasked to do some heavy computation to sign the same token over and over again. It requires processing power, which slows down automation attempts considerably. However, it requires a browser extension (Firefox on desktop and Chromium on desktop only) and isn’t available on mobile. I’m not aware of any website other than Cloudflare’s automated website security check (so tens of thousands effectively) that uses this system.
      https://privacypass.github.io/

      1. Good points.

        I think that the coinhive solution isn’t great.

        A determined spammer / bot probably has enough processing power to handle coinhive’s miner.

        And I don’t think hijacking the user’s processor is a great solution either – considering how Google’s reCAPTCHAs are omnipresent, I wouldn’t want my CPU to keep having to do lots of work every time I visited a page that had the script. If it required user consent first, it would be less of an issue.

        Also, for mobile devices with poor batteries it’s an absolute non-starter.

        As for privacy pass, I’ve tried that and I think it’s a good idea.

        It does have some flaws though: when I tried it, it required me to first solve a Google reCAPTCHA, so the root of the problem still exists.

        It also requires you to re-solve CAPTCHAs when your tokens expire.

        If anything, privacy pass undermines the whole point for including CAPTCHAs in the first place: A spammer could employ an individual to solve CAPTCHAs repeatedly, and feed the resulting tokens into bots.

  4. I have learned to hate Captcha recently. Stumbling through several iterations before being allowed to proceed. Sometimes I just leave in disgust because it is not worth the effort. The effort to stop the Bad Guys has resulted in the legitimate user being punished for trying to access their accounts.

    This is a good article with some real thought behind it.

  5. I first noticed this post a while back, but have only just had the opportunity to respond. What a fantastic article! I’m glad I’m not alone. I share a lot of your anger and frustration at Captchas. Unless I am absolutely desperate, the very appearance of one makes me leave the web site in question, as I have almost never been able to solve one. I am colour-blind, so quite often it is hard to tell what is a letter or number, and what is the surrounding “noise” designed to make it not machine-readable. Additionally, my hearing isn’t too great, so I click on the audio version, but can’t understand it at all. I think one of the comments above coins a fantastic new term: “digital racism”. That’s what it is, after all. Discrimination. If you don’t have the ability to solve this from what we provide, you aren’t worth bothering with.

    Since Google launched the new “click on the street signs” reCaptchas, I’ve found it a little easier – but often these are ambiguous. Does that corner of a street sign taking up 2 pixels in that box make it count as a “box containing a street sign”? Is that sign in a foreign language really a street sign at all, or just a shop sign or advert?

    Additionally, I don’t want me or my computer doing unpaid work for someone else. Why should I have to dance like a trained monkey, helping improve Google’s AI, just to view a web page? Alternatively, if we go down the processor-hijacking route, why should I have my browser perform some complex and CPU-intensive task so that I can view a page? It’s my power bill and network bandwidth. It’s wear-and-tear on my machine (the hotter a chip runs, the shorter its life). Plus, if I happen to be browsing on a slow/old device, such as a Netbook that’s been given a new lease of life with Linux, this is another form of discrimination, as the coin-mining (or whatever) calculation could take a prohibitively long amount of time to run. Or, as another user pointed out, I could be trying to use a mobile phone with little battery power left.

    I don’t know what the solution is, but Captchas and CPU-hijacking aren’t it. For me personally, I don’t see the problem with just having a username and password with suitable complexity anywhere where you have to post things, login, register etc. But then again, any site I’ve maintained has always had the back-end patched religiously, and I’ve always used pretty long random passwords for my own accounts, relying on a password-safe to remember them. Unfortunately, every time I sign into anything Google or Amazon nowadays (getting less frequent as I disentangle myself from them with the goal of leaving completely), I get some sort of challenge to verify I’m me. Maybe it’s because too many users pick easy-to-guess passwords and reuse them excessively, and too many site operators don’t patch vulnerabilities quickly enough, so get hacked…

  6. Personally, I hate captchas with a passion. First of all, I think that if they need some sort of proof that I am who I say I am, and I’m on that site, that should be enough. Secondly, sometimes some captchas can be difficult to get correct. If you don’t get it right, you’re basically stuck where you are. I’m starting to believe that if a site requires a captcha, I don’t need to waste my time with them.

    1. I work for a company that had to turn on CAPTCHA because of spam bots. It wasn’t a matter of not trusting you, it was a matter of stopping bots from hijacking the process.

      The newer form of CAPTCHA is better but a lot of sites still use v1 which is a hassle and annoying.

  7. And the last square with the car just keeps getting replaced with another car pic and you have to click that one last square 10 times.

  8. I also think that captchas are annoying, but the problem is that right now we don’t have much that is less ugly to use against bots and stuff.

    1. While it doesn’t solve all of the problems (sites still have to rely on google), this will go a long way towards solving accessibility problems.

      I wonder if they’ve considered users that perhaps use screen readers or spatial navigation, which may have differing usage patterns that are harder to model / outliers to the norm. Will they be notified with a legacy captcha if they fail to be recognised as “human”, or will the site simply silently fail to work, without any notice of why?

  9. Your guess is as good as (possibly better than) mine.
    I have come up with a few goodies if you are still interested while looking into this reCaptcha/Google issue:
    – some software – Reverse-engineering the new “captchaless” ReCaptcha system… at https://github.com/vladikoff/InsideReCaptcha. This chap seems to have put a fair bit of work into checking this “heavily obfuscated reCaptcha system”. Just read the ReadMe.md at the above link.
    – and a paper on reCaptcha by: Suphannee Sivakorn, Jason Polakis, and Angelos D. Keromytis from Columbia University, New York NY, USA (https://www.blackhat.com/docs/asia-16/materials/asia-16-Sivakorn-Im-Not-a-Human-Breaking-the-Google-reCAPTCHA-wp.pdf).
    All far beyond the capabilities of my four remaining brain cells. 😉
