

In Which LLMs Made the Web Worse Yet Again

2025-05-09 - [54] 2:20

I don't think I have complained about LLMs (Large Language Models) in a post yet. Well, let's fix that!

(This is an anti-LLM rant post, but more importantly to me, it's an "I'm angry that the modern web requires Javascript everywhere" post in disguise.)

While looking up resources for my projects over the last few days, I have noticed something quite annoying... Almost every website I end up visiting (outside of the search engine itself) makes me wait 10-30 seconds while an icon spins around, just to tell me to click a checkbox to prove I'm a human.

I remember when it would take quite a few seconds to get onto the Internet via dial-up (I had dial-up until 2011 at the earliest, probably mid-2012, so considerably longer than I would have liked). Once I was connected, I was good. Sure, things didn't run fast on a 56kbps connection, but for text information my only real bottleneck was the initial connection to the Internet itself. Now I hit that wait on basically every dang mainstream website, and again every few minutes whenever it decides to test whether I'm a human again...

It makes sense why this is happening. LLMs are all the rage, and that means there is money to sucker investors out of. The thing is, though, these LLMs need to be trained on human-generated text. How do you get that text? By scraping everything on the Internet, of course! Wait, hasn't that already generally been done? Yeah, but a lot of the folks scraping everything right now want their own copy of all that text data for their own LLM.

This means a bunch of site scrapers are "essentially" DoSing (DoS = Denial of Service) sites. I put "essentially" in scare quotes because DoSing isn't the intention; it's just what ends up happening, depending on how the site is set up. Honestly, I don't know how much scraping of my site is happening, but I wouldn't be surprised if 99.9%+ of the traffic to my site is now text scrapers gathering data specifically for LLMs.

Anyways, Drew DeVault posted about it, and I'm sure plenty of other folks have as well. Drew's post is linked below:

Please stop externalizing your costs directly into my face [HTTPS]

Javascript, Javascript Everywhere

What really frustrates and saddens me is that a lot of the resources I used didn't rely on Javascript until they started using Cloudflare or other proof-of-work systems. As a result, plenty of these pages are currently unavailable to web browsers that don't run Javascript, at least for the live versions of the pages. Implementing Javascript in a browser is an absolute beast of a project, so this further limits which browsers can even view things. I thought it was hard to surf the web without Javascript enabled before, but DANG IT'S GOTTEN WAY WORSE lately. A year or 2 ago, I would have said "Wow, it sure is hard to surf the web without Javascript", but if modern-day me could go back in time, I would tell myself "Don't worry, it'll get so much worse than you could have imagined. Now I'm going back to my own time, leaving you in ignorance of how much worse it is yet to get."

This is not the fault of these websites (although some of them seem to have gone all in on AI support...). It's the fault of the freeloading text scrapers who want to make some investment money (or in many cases, have already tricked investors into parting with their money [or in the case of Venture Capitalists, other people's money]) with the promise of building an LLM. Something tells me this "AI" bubble is going to keep inflating for a decent amount of time and get considerably bigger before it pops. Unfortunately, I suspect this will end up pushing web servers to support something like Google's "Web Environment Integrity" API so they only accept "trusted" browsers. Not using a Chrome-flavored browser (or maybe Firefox)? Too bad! The thing that really bothers me is that when that browser gatekeeping finally arrives, I probably won't even blame Google for it, because the scrapers will have been the trigger.

What can we do? Well, I hope real-life humans with independent personal websites consider keeping their websites available to folks who don't want to enable Javascript, but that might not be viable with how much these text scrapers are bogging down servers. I don't think it's going to get better anytime soon. In fact, I suspect it could get MUCH worse. Maybe if I notice my site being impacted, I might just drop HTTP support for a while. Who knows?

Yes, to the average person, it looks like I wrote all of this because I had to wait a couple of seconds to get some text resources. Oh well... On the bright side, at least my newest batches of sauerkraut are fermenting quite nicely, even while I rant.

Contact/Reply

If you would like to reply to this post, feel free to send me an email.

Email: vi@vigrey.com [Email]
PGP Public Key [515F AD67 F931 0A2B 9B93 CE19 814F ECB1 A398 63CE]