Protecting cgit with Anubis and Hiawatha
Like a lot of other people, I've seen my little Web server flooded by AI scrapers (or other kinds of very badly configured crawlers). They come back again and again to crawl every page they can find, even the most pointless ones. This is particularly harmful for my cgit instance, where they visit every single commit page and every possible diff page…
Fortunately, not all heroes wear capes, and one of them published Anubis[1], a self-hostable scraper blocker. They[2] really saved my life, as I was seriously and sadly wondering whether I should just shut down my cgit instance.
I'm still running the Hiawatha web server[3], so I had to adapt the configuration of my cgit instance to put it behind Anubis. The tricky part is that Anubis doesn't know how to serve CGI programs: it can only be used as a reverse proxy.
So I needed to configure Hiawatha to serve the cgit CGI on a localhost-only binding (so that nobody on the Internet can reach it directly and bypass Anubis's protection), and have Anubis forward requests to this localhost binding once they pass its check.
To begin with, I put the following values into the Anubis configuration file:
BIND=127.0.0.1:8923
TARGET=http://127.0.0.1:8088
# The rest is specific to my usage
…
Nothing special here: I configure Anubis to listen locally on port 8923 and to pass successful requests to localhost on port 8088.
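Since the whole point is that nothing public can reach cgit without going through Anubis, a quick sanity check on these two values can help. This Python sketch is purely illustrative and not part of Anubis: it assumes the file uses the plain KEY=VALUE environment-file syntax shown above, with literal IP addresses, and only checks that both BIND and TARGET point at a loopback address.

```python
import ipaddress
from urllib.parse import urlsplit


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank, comment and placeholder lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank, comment-only or placeholder line
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def is_loopback(host):
    # Assumes a literal IPv4/IPv6 address; "[::1]" brackets are stripped.
    return ipaddress.ip_address(host.strip("[]")).is_loopback


def check_anubis_env(text):
    env = parse_env(text)
    bind_host, _, bind_port = env["BIND"].rpartition(":")
    target = urlsplit(env["TARGET"])
    return {
        "bind_loopback": is_loopback(bind_host),
        "bind_port": int(bind_port),
        "target_loopback": is_loopback(target.hostname),
        "target_port": target.port,
    }


ANUBIS_ENV = """\
BIND=127.0.0.1:8923
TARGET=http://127.0.0.1:8088
# The rest is specific to my usage
"""

if __name__ == "__main__":
    report = check_anubis_env(ANUBIS_ENV)
    print(report)
    if not (report["bind_loopback"] and report["target_loopback"]):
        print("Warning: Anubis or cgit may be reachable from outside")
```

Binding to 0.0.0.0 instead would make this check fail, which is exactly the situation to avoid here.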
Then I can use the same values in the configuration file for Hiawatha:
Binding {
    BindingId = cgit-binding
    # Serves cgit only over localhost on port 8088.
    # Ensures it is no longer directly accessible from the Web.
    Port = 8088
    Interface = 127.0.0.1
}
# Depending on your Hiawatha installation, this may already be set
CGIextension = cgi
UrlToolkit {
    ToolkitID = cgit
    RequestURI exists Return
    # "Hide" the cgit.cgi part of the URL to make URLs "cleaner"
    Match ^/(.*) Rewrite /cgit.cgi/$1
}
VirtualHost {
    Hostname = git.umaneti.net
    RequiredBinding = cgit-binding
    WebsiteRoot = /usr/share/cgit
    StartFile = cgit.cgi
    ExecuteCGI = yes
    TimeForCGI = 15
    EnablePathInfo = yes
    UseToolkit = cgit
}
VirtualHost {
    Hostname = git.umaneti.net
    # Still useful for assets
    WebsiteRoot = /usr/share/cgit
    # Anubis is running on localhost on port 8923
    ReverseProxy = .* http://127.0.0.1:8923 15
    # The rest is specific to my usage
    …
}
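To make the UrlToolkit rules more concrete, here is a rough Python model of what they do to an incoming path: real files under WebsiteRoot are returned untouched, and everything else is rewritten to go through cgit.cgi, with the rest of the path following it. This is a simplified sketch, not Hiawatha's implementation; the set of existing files and the repository name myrepo are hypothetical.

```python
import re

# Simplified, hypothetical model of the "cgit" UrlToolkit above
# (not Hiawatha's actual implementation).
CLEAN_URL = re.compile(r"^/(.*)")


def apply_cgit_toolkit(request_uri, existing):
    """Return the URI left after the cgit toolkit rules.

    existing: URIs assumed to map to real files or directories
    under WebsiteRoot (/usr/share/cgit above).
    """
    # "RequestURI exists Return": real files are served as-is.
    if request_uri in existing:
        return request_uri
    # "Match ^/(.*) Rewrite /cgit.cgi/$1": Python writes $1 as \1.
    return CLEAN_URL.sub(r"/cgit.cgi/\1", request_uri, count=1)


# Hypothetical content of WebsiteRoot, for illustration only.
EXISTING = {"/", "/cgit.css", "/cgit.png"}

if __name__ == "__main__":
    for uri in ("/cgit.css", "/myrepo/log/", "/myrepo/commit/"):
        print(uri, "->", apply_cgit_toolkit(uri, EXISTING))
```

With EnablePathInfo = yes, the part after cgit.cgi (for instance /myrepo/log/) is what the CGI program receives as its path information.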
Nothing had to be changed in the cgit configuration files.
The trickiest part was understanding that I still needed to use "git.umaneti.net" as the Hostname value for the local cgit binding. It probably sounds obvious to a lot of people, but it bugged me for hours. Anyway, if there are still Hiawatha /aficionados/ out there, you now know what to do 🙂
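Why does the local VirtualHost still need Hostname = git.umaneti.net? Presumably because both Anubis and Hiawatha's ReverseProxy pass the client's original Host header along, so the request arriving on 127.0.0.1:8088 still says Host: git.umaneti.net, and Hiawatha chooses the virtual host from that header. The Python sketch below is a very simplified, hypothetical model of that lookup, assuming that the first VirtualHost whose Hostname matches the Host header and whose RequiredBinding (if any) matches the receiving binding is used. It is not Hiawatha's actual algorithm, and the binding name public is made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VirtualHost:
    hostname: str
    required_binding: Optional[str]  # None: usable from any binding
    role: str


def select_vhost(vhosts, host_header, binding_id):
    """Hypothetical, simplified virtual-host lookup (not Hiawatha's code).

    Assumes the first VirtualHost whose Hostname equals the Host header
    and whose RequiredBinding (if any) matches the receiving binding wins.
    None stands for the default website.
    """
    host = host_header.split(":", 1)[0].lower()
    for vhost in vhosts:
        if vhost.required_binding not in (None, binding_id):
            continue  # this VirtualHost is restricted to another binding
        if vhost.hostname == host:
            return vhost
    return None


# The two VirtualHost sections above; "public" is a hypothetical name
# for the Internet-facing binding, which is not shown in the post.
VHOSTS = [
    VirtualHost("git.umaneti.net", "cgit-binding", "cgit"),
    VirtualHost("git.umaneti.net", None, "anubis-proxy"),
]

if __name__ == "__main__":
    # Request forwarded by Anubis to 127.0.0.1:8088, Host header unchanged.
    print(select_vhost(VHOSTS, "git.umaneti.net", "cgit-binding").role)
    # Request from the Internet on the public binding.
    print(select_vhost(VHOSTS, "git.umaneti.net", "public").role)
```

In this model, a local VirtualHost declared with Hostname = 127.0.0.1 would never match the forwarded Host: git.umaneti.net header, which fits the pitfall described above.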
--
📅 Friday 16 May 2025 at 22:04
📝 Étienne Pflieger with GNU/Emacs 30.1 (Org mode 9.7.18)