2024-08-11 Serving bare git on the web
This page describes the setup of src.alexschroeder.ch. That's where I host public source repositories.
In the old days, I used `cgit` to render my git repositories on the web. It's simple to set up since it's a CGI script. This is ideal for URLs that get very few hits. When nobody is requesting the URL, the CGI script isn't running and no resources are being used. When a URL is requested, however, the CGI script loads, the interpreter loads, the libraries load, the script executes… It's an expensive end-point! And you know how it is. The web is full of leeches and bad bots, crawlers and idiots. Having an expensive end-point means it needs protection.
Then I started thinking about @Sandra@idiomdrottning.org's post on hosting git repos.
This is going to use the Dumb HTTP protocol.
Some libraries only support the newer Smart HTTP protocol. This setup doesn't support the newer protocol because there's no git server running. There's just a web server.
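To make the Dumb HTTP idea a bit more concrete: a client starts by requesting the plain file `info/refs` from the repository URL, and that file is just one object ID and one ref name per line, separated by a tab. Here is a minimal sketch in Python (not part of my setup) that reads such a file. The function names and the sample object IDs are made up for illustration.

```python
from urllib.request import urlopen


def parse_info_refs(text: str) -> dict[str, str]:
    """Map ref names to object IDs from the body of an info/refs file."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<object id>\t<ref name>".
        object_id, ref_name = line.split("\t", 1)
        refs[ref_name] = object_id
    return refs


def fetch_info_refs(repo_url: str) -> dict[str, str]:
    """Fetch and parse info/refs from a repository served as static files."""
    with urlopen(repo_url.rstrip("/") + "/info/refs") as response:
        return parse_info_refs(response.read().decode("utf-8"))


# Made-up sample data, so the sketch runs without network access.
sample = (
    "95dcfa3633004da0049d3d0fa03f80589cbcaf31\trefs/heads/main\n"
    "2cb58b79488a98d2721cea644875a8dd0026b115\trefs/tags/v1.0\n"
)
refs = parse_info_refs(sample)
print(sorted(refs))
```

Calling `fetch_info_refs("https://src.alexschroeder.ch/oddmu.git")` should do the same against the real thing, provided the file is up to date on the server.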
Time to fiddle with the Apache config.
I changed my site to the following in order to just serve `/home/git` from a subdomain:
ServerAdmin alex@alexschroeder.ch
ServerName src.alexschroeder.ch
Include conf-enabled/blocklist.conf
SSLEngine on
DocumentRoot /home/git
Options Indexes
AllowOverride All
Require all granted
For this to work, you need a `post-update` hook that calls `git update-server-info`.
Knowing that I'm also going to serve the bare git repositories via the web, the hook also needs to generate an `index.html` file.
Furthermore, given that the repositories have a `description` file, I update `/home/git/.htaccess` accordingly.
I prepared the hook that I want to install in every repository and saved it as /home/git/.post-update.
This is what it looks like, written in Perl.
#!/usr/bin/perl
use Modern::Perl;
use File::Slurper qw(read_text write_text);
use File::Temp qw(tempfile);
use File::Basename qw(basename);
use Encode qw(decode_utf8);
use Cwd;

# refresh the files the Dumb HTTP protocol relies on
qx(/usr/bin/git update-server-info);

# create index.html
my $branch = qx(/usr/bin/git branch --show-current);
chomp $branch;
my $template = read_text("/home/git/.readme.html");
# the hook runs inside the repository, so the directory name is the
# repository name; getcwd returns the full path, so strip the parents
my $dir = getcwd;
my $title = basename($dir);
$title =~ s/\.git$//;
my $body = decode_utf8(qx(/usr/bin/git show $branch:README.md));
# oddmu wants a page name, which is the file name without ".md"
my ($fh, $filename) = tempfile(SUFFIX => '.md');
write_text($filename, $body);
my $pagename = substr($filename, 0, -3);
my $html = decode_utf8(qx(/home/oddmu/oddmu html $pagename));
unlink($filename);
write_text("index.html", sprintf($template, $title, $html, $title, $title));

# update description
if (-r "description") {
    my $description = read_text("description");
    chomp $description;
    my $htaccess = read_text("/home/git/.htaccess");
    # keep a backup
    write_text("/home/git/.htaccess~", $htaccess);
    # drop the old entry for this repository and add the new one
    my @lines = grep { !/ \Q$title\E\.git$/ } split(/\n/, $htaccess);
    push(@lines, "AddDescription \"$description\" $title.git");
    # move all descriptions to the end, longest file name first
    my (@new, @descriptions);
    for my $line (@lines) {
        if ($line =~ /^AddDescription .* (\S+\.git)$/) {
            push(@descriptions, [length($1), $line]);
        } else {
            push(@new, $line);
        }
    }
    for my $description (sort { $b->[0] <=> $a->[0] } @descriptions) {
        push(@new, $description->[1]);
    }
    write_text("/home/git/.htaccess", join("\n", @new));
}
I turn Markdown into HTML using oddmu but feel free to use some other command-line tool like `cmark`.
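The description bookkeeping at the end of the hook is the least obvious part, so here is the same idea as a small Python sketch: remove the old `AddDescription` line for the repository, add the new one, and put all descriptions at the end, longest file name first. As far as I understand it, Apache matches partial file names and uses the first directive that matches, which is why the longer names have to come first. Treat that explanation as my assumption. The repository names `wiki.git` and `oddmu-wiki.git` are made up for the example.

```python
import re

DESCRIPTION_LINE = re.compile(r'^AddDescription .* (\S+\.git)$')


def update_descriptions(htaccess: str, repo: str, description: str) -> str:
    """Return the .htaccess text with a fresh description for repo."""
    # Drop the existing entry for exactly this repository, if any.
    lines = [line for line in htaccess.split("\n")
             if not line.endswith(f" {repo}")]
    lines.append(f'AddDescription "{description}" {repo}')
    other, described = [], []
    for line in lines:
        match = DESCRIPTION_LINE.match(line)
        if match:
            described.append((len(match.group(1)), line))
        else:
            other.append(line)
    # Longest file name first, so "wiki.git" cannot shadow "oddmu-wiki.git".
    described.sort(key=lambda pair: pair[0], reverse=True)
    return "\n".join(other + [line for _, line in described])


before = "\n".join([
    "IndexOptions FancyIndexing",
    'AddDescription "Old text" wiki.git',
    'AddDescription "Oddmu wiki" oddmu-wiki.git',
])
after = update_descriptions(before, "wiki.git", "A wiki")
print(after)
```

The other directives keep their order at the top of the file and only the descriptions get reshuffled.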
Next, I created a symlink in every git repository's `hooks` directory.
Using the Fish shell, assuming that `/home/git` is where all the repositories are, owned by the user `git`, and that you're using the root account:
for d in *.git; sudo -u git ln -sf /home/git/.post-update $d/hooks/post-update; end
The hook uses /home/git/.readme.html as a template for the `index.html` file.
This is what it looks like:
%s
%s
I'm currently hosting 95 repositories according to `ls -d /home/git/*.git | wc -l`. Some of these don't have a `README.md` file. Should I ever touch them again, I'll have to investigate.
The `/home/git` directory has an `.htaccess` file that starts out containing the following:
HeaderName .top.html
IndexOptions SuppressIcon SuppressSize FancyIndexing HTMLTable IgnoreCase
IndexOrderDefault Descending Date
IndexIgnore *~ .* Makefile
IndexHeadInsert ""
IndexOptions Charset=UTF-8
The rest of the file is all the AddDescription directives added by the `post-update` hook.
The /home/git/.top.html file contains a fragment to add to the top of the index:
Source code repositories
Hello!
I'm Alex Schroeder. These are my source code repositories. You can find out more about me on my blog. There, you'll also find a page listing ways to contact me.
As for the git repositories, you should be able to clone them as they are. For example:
git clone https://src.alexschroeder.ch/oddmu.git
For more about this setup, see How to host git repos by @Sandra and my post, 2024-08-11 Serving bare git on the web.
The only thing that's strange is that this lists all the repositories by the last modification date of the `index.html` file contained within. That's not good.
I ended up looping through all the directories a few times as I kept finding bugs in my `post-update` hook, so I wrote a `Makefile`. That's the reason `Makefile` is listed in the `IndexIgnore` directive for Apache, above.
This is the `Makefile`:
SHELL=/usr/bin/fish

# Regenerate the index.html files. Set their modification time because
# it looks like FancyIndex uses the index.html modification date.
update-indexes:
	for f in *.git; \
	    cd "$$f"; \
	    sudo -u git hooks/post-update; \
	    sudo -u git git log -1 --format='%at' \
	    | xargs -I{} date -d @{} '+%Y-%m-%d %H:%M:%S' \
	    | xargs -I{} touch index.html --date {}; \
	    cd ..; \
	end
So now this will regenerate all the `index.html` files:
make
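The timestamp trick in that `Makefile` boils down to two steps: ask git for the author date of the last commit as seconds since the epoch (`%at`), then set the modification time of `index.html` to that value. A rough Python sketch of the same two steps, with function names of my own choosing, might look like this:

```python
import os
import subprocess
import tempfile


def last_commit_epoch(repo_dir: str) -> int:
    """Author date of the most recent commit, in seconds since the epoch."""
    result = subprocess.run(
        ["git", "log", "-1", "--format=%at"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def set_mtime(path: str, epoch: int) -> None:
    """Set both access and modification time of path to epoch."""
    os.utime(path, (epoch, epoch))


# Demonstration on a throwaway file with an arbitrary timestamp,
# so that no git repository is needed to try it out.
with tempfile.TemporaryDirectory() as tmp:
    index = os.path.join(tmp, "index.html")
    open(index, "w").close()
    set_mtime(index, 1723334400)
    demo_mtime = int(os.path.getmtime(index))
print(demo_mtime)
```

For a real repository, the call would be something like `set_mtime("index.html", last_commit_epoch("."))` from inside the repository directory.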
In any case, now we're done.
#Butlerian Jihad #Git #Administration
**2024-08-12**. I wondered about links from the README to local files. Right now, linking to images and files hosted in the same repository doesn't work since they don't exist in the raw repository. The question then becomes, as far as I am concerned, whether this README is supposed to speak to developers or end-users? If it is for developers, then pictures, screenshots, PDF files and all of that don't need to be linked from the repository. If you are interested in these things, do a `git clone --depth 1` and investigate locally.
If the repository is for the end users, however, things are harder. The `post-update` hook should extract all the local files linked to from the README. Something like the following, perhaps:
for file in (printf "%s\n" $body | /home/oddmu/oddmu links - | grep -E -v '^(https?:|mailto:|/)')
    set dir (dirname $file)
    if test ! -d $dir
        mkdir -p $dir
    end
    echo $file; sudo -u git git show $branch:$file > $file
end
This uses oddmu to extract the links from a Markdown file, creates the necessary directories and checks out the files.
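The filtering step is easy to get subtly wrong, so here is a small Python sketch of just that part, using the same pattern as above: anything starting with `http:`, `https:`, `mailto:` or `/` is skipped, and the rest is treated as a path inside the repository. The function names and the sample links are made up. Note that a fragment link such as `#usage` would pass this filter, too.

```python
import re
from pathlib import PurePosixPath

# Same pattern as in the shell loop: remote, mail and absolute links.
REMOTE_OR_ABSOLUTE = re.compile(r'^(https?:|mailto:|/)')


def local_links(links):
    """Keep only the links that point into the repository itself."""
    return [link for link in links if not REMOTE_OR_ABSOLUTE.match(link)]


def directories_needed(paths):
    """Directories that must exist before the files can be written."""
    parents = {PurePosixPath(path).parent for path in paths}
    return sorted(str(parent) for parent in parents
                  if parent != PurePosixPath("."))


links = [
    "https://example.org/",
    "mailto:alex@example.org",
    "/about",
    "images/shot.png",
    "LICENSE",
]
files = local_links(links)
print(files, directories_needed(files))
```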
But if the files are no longer linked from the README, they are not deleted. If a directory is linked from the README (I have done this! 🤦), the checkout won't work.
I think the better way forward is to move this information elsewhere. The README is not the documentation.
And with that, I think I did it! Serving git repositories from static files. A single directory per project containing the bare git data and a single `index.html` file. No more gazillion end points for crawlers to lose themselves.