2024-08-11 Serving bare git on the web

This page describes the setup of src.alexschroeder.ch. That's where I host public source repositories.

src.alexschroeder.ch

In the old days, I used `cgit` to render my git repositories on the web. It's simple to set up since it's a CGI program. This is ideal for URLs that get very few hits. When nobody is requesting the URL, the program isn't running and no resources are being used. When a URL is requested, however, a new process starts, the libraries load, the program executes… It's an expensive end-point! And you know how it is. The web is full of leeches and bad bots, crawlers and idiots. Having an expensive end-point means it needs protection.

Then I started thinking about @Sandra@idiomdrottning.org's post on hosting git repos.

hosting git repos

This is going to use the Dumb HTTP protocol.

Dumb HTTP

Some libraries only support the newer Smart HTTP protocol. This setup doesn't support the newer protocol because there's no git server running. There's just a web server.

Smart HTTP

Time to fiddle with the Apache config.

I changed my site to the following in order to just serve `/home/git` from a subdomain:


<VirtualHost *:443>
    ServerAdmin alex@alexschroeder.ch
    ServerName src.alexschroeder.ch
    Include conf-enabled/blocklist.conf
    SSLEngine on
    DocumentRoot /home/git
    <Directory /home/git>
        Options Indexes
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>

For this to work, you need a `post-update` hook that calls `git update-server-info`.
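
For context: `git update-server-info` writes the file `info/refs` (and `objects/info/packs`) inside the repository, and a Dumb HTTP client starts by fetching `info/refs` to learn which branches and tags exist. Each line holds an object ID, a tab, and a ref name. Here is a minimal sketch in Python of reading that format; the function name and the object IDs in the sample are made up for illustration.

```python
def parse_info_refs(text):
    """Map ref names to object IDs, given the contents of info/refs."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        object_id, ref_name = line.split("\t", 1)
        refs[ref_name] = object_id
    return refs


# Sample content with made-up object IDs.
sample = (
    "1111111111111111111111111111111111111111\trefs/heads/main\n"
    "2222222222222222222222222222222222222222\trefs/tags/v1.0\n"
)
print(parse_info_refs(sample))
```

Annotated tags show up a second time with a `^{}` suffix that points at the tagged commit; the sketch keeps those entries as they are. Without the hook, `info/refs` goes stale after a push and clients keep seeing the old branch heads.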

Knowing that I'm also going to serve the bare git repositories via the web, the hook also needs to generate an `index.html` file.

Furthermore, given that the repositories have a `description` file, I update `/home/git/.htaccess` accordingly.

I prepared the hook that I want to install in every repository and saved it as /home/git/.post-update.

/home/git/.post-update

This is what it looks like, written in Perl.

#!/usr/bin/perl
use Modern::Perl;
use File::Slurper qw(read_text write_text);
use File::Temp qw(tempfile);
use Encode qw(decode_utf8);
use Cwd;

qx(/usr/bin/git update-server-info);

# create index.html
my $branch = qx(git branch --show-current);
chomp $branch;
my $template = read_text("/home/git/.readme.html");
my $dir = getcwd;
# getcwd returns an absolute path; keep only the repository name
my $title = $dir;
$title =~ s{^.*/}{};
$title =~ s/\.git$//;
my $body = decode_utf8(qx(/usr/bin/git show $branch:README.md));
my ($fh, $filename) = tempfile(SUFFIX => '.md');
write_text($filename, $body);
my $pagename = substr($filename, 0, -3);
my $html = decode_utf8(qx(/home/oddmu/oddmu html $pagename));
unlink($filename);
write_text("index.html", sprintf($template, $title, $html, $title, $title));

# update description
if (-r "description") {
    my $description = read_text("description");
    chomp $description;
    my $htaccess = read_text("/home/git/.htaccess");
    write_text("/home/git/.htaccess~", $htaccess);
    # \Q...\E so that dots or other special characters in the name match literally
    my @lines = grep { !/ \Q$title\E\.git$/ } split(/\n/, $htaccess);
    push(@lines, "AddDescription \"$description\" $title.git");
    my (@new, @descriptions);
    for my $line (@lines) {
	if ($line =~ /^AddDescription .* (\S+\.git)$/) {
	    push(@descriptions, [length($1), $line]);
	} else {
	    push(@new, $line);
	}
    }
    for my $description (sort { $b->[0] <=> $a->[0] } @descriptions) {
	push(@new, $description->[1]);
    }
    write_text("/home/git/.htaccess", join("\n", @new));
}

I turn Markdown into HTML using oddmu but feel free to use some other command-line tool like `cmark`.

oddmu
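
The `.htaccess` bookkeeping in the hook is easier to follow in isolation. Here is a minimal sketch in Python of the same steps: drop the old entry for this repository, append the new one, leave all other lines where they are, and move the `AddDescription` lines to the end with the longest repository names first. The ordering is presumably there because Apache matches `AddDescription` against partial filenames, so an entry for `mu.git` could otherwise also apply to `oddmu.git`. The function name and the sample data are invented.

```python
import re

DESCRIPTION_LINE = re.compile(r'^AddDescription .* (\S+\.git)$')


def update_descriptions(htaccess, repo, description):
    """Return new .htaccess text with the entry for repo replaced."""
    suffix = f" {repo}.git"
    # Remove the previous entry for this repository, if any.
    lines = [line for line in htaccess.split("\n") if not line.endswith(suffix)]
    lines.append(f'AddDescription "{description}" {repo}.git')
    other, described = [], []
    for line in lines:
        match = DESCRIPTION_LINE.match(line)
        if match:
            described.append((len(match.group(1)), line))
        else:
            other.append(line)
    # Stable sort: longest repository name first.
    described.sort(key=lambda pair: pair[0], reverse=True)
    return "\n".join(other + [line for _, line in described])


old = "\n".join([
    "HeaderName .top.html",
    'AddDescription "A wiki" mu.git',
    'AddDescription "Old text" oddmu.git',
])
print(update_descriptions(old, "oddmu", "Minimal wiki"))
```

Like the Perl version, the sketch does not escape double quotes inside the description, so a `description` file containing a `"` would produce a broken directive.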

Next, I created a symlink in every git repository's `hooks` directory.

Using the Fish shell, assuming that `/home/git` is where all the repositories are, owned by the user `git`, and that you're using the root account:

for d in *.git; sudo -u git ln -sf /home/git/.post-update $d/hooks/post-update; end

The hook uses /home/git/.readme.html as a template for the `index.html` file.

/home/git/.readme.html

This is what it looks like:



  
    
    
    
    %s
    
  
  
    
    
%s

Clone

git clone https://src.alexschroeder.ch/%s.git
      

Contact

If you like it, send an email to Alex Schroeder <alex@gnu.org> ❤️
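
The hook fills this template with `sprintf`, passing four values in order: the title, the rendered README, and the title two more times. Here is a minimal sketch of that positional filling in Python, whose `%` operator follows the same convention. The template below is a hypothetical stand-in, not the real file, and the `render` function is invented for illustration.

```python
# Hypothetical stand-in for /home/git/.readme.html; the real markup differs.
TEMPLATE = """<!DOCTYPE html>
<html>
  <head>
    <title>%s</title>
    <style>body { max-width: 100%%; }</style>
  </head>
  <body>
%s
    <pre>git clone https://src.alexschroeder.ch/%s.git</pre>
    <p>Questions about %s? Send an email.</p>
  </body>
</html>
"""


def render(title, html):
    """Fill the four %s slots in the order the hook uses."""
    return TEMPLATE % (title, html, title, title)


print(render("oddmu", "<h1>Oddmu</h1>"))
```

The thing to watch with this kind of template is that every literal percent sign, for example in CSS, has to be written as `%%`, both for Perl's `sprintf` and for the Python operator used here.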

I'm currently hosting 95 repositories according to `ls -d /home/git/*.git | wc -l`. Some of these don't have a `README.md` file. Should I ever touch them again, I'll have to investigate.

The `/home/git` directory has an `.htaccess` file that starts out containing the following:

HeaderName .top.html
IndexOptions SuppressIcon SuppressSize FancyIndexing HTMLTable IgnoreCase
IndexOrderDefault Descending Date
IndexIgnore *~ .* Makefile
IndexHeadInsert ""
IndexOptions Charset=UTF-8

The rest of the file is all the AddDescription directives added by the `post-update` hook.

The /home/git/.top.html file contains a fragment to add to the top of the index:

/home/git/.top.html

Source code repositories

Hello!

I'm Alex Schroeder. These are my source code repositories. You can find out more about me on my blog. There, you'll also find a page listing ways to contact me.

As for the git repositories, you should be able to clone them as they are. For example:

  git clone https://src.alexschroeder.ch/oddmu.git

For more about this setup, see How to host git repos by @Sandra and my post, 2024-08-11 Serving bare git on the web.

The one strange thing is that the index sorts the repositories by the modification date of the `index.html` file inside each one, not by the date of the last commit. That's not good.

I ended up looping through all the directories a few times as I kept finding bugs in my `post-update` hook, so I wrote a `Makefile`. That's the reason `Makefile` is listed in the `IndexIgnore` directive for Apache, above.

This is the `Makefile`:

SHELL=/usr/bin/fish

# Regenerate the index.html files. Set their modification time because
# it looks like FancyIndex uses the index.html modification date.
update-indexes:
	for f in *.git; \
	  cd "$$f"; \
	  sudo -u git hooks/post-update; \
	  sudo -u git git log -1 --format='%at' \
	   | xargs -I{} date -d @{} '+%Y-%m-%d %H:%M:%S' \
	   | xargs -I{} touch index.html --date {}; \
	  cd ..; \
	end

So now this will regenerate all the `index.html` files:

make
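
The timestamp step of the `Makefile` can also be written without the `date` and `xargs` round trip. Here is a minimal sketch in Python, assuming `git` is on the `PATH`: `git log -1 --format=%at` prints the author date of the latest commit as a Unix timestamp, and `os.utime` applies it to `index.html`. Both function names are made up for illustration.

```python
import os
import subprocess


def last_commit_time(repo):
    """Return the author date of the latest commit as a Unix timestamp."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%at"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def set_index_mtime(repo, timestamp):
    """Set access and modification time of repo/index.html."""
    os.utime(os.path.join(repo, "index.html"), (timestamp, timestamp))
```

For a single repository this would be called as `set_index_mtime("oddmu.git", last_commit_time("oddmu.git"))`. A repository without any commits makes `git log` fail, so `last_commit_time` raises an exception in that case.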

In any case, now we're done.

#Butlerian Jihad #Git #Administration

If the repository is for the end users, however, things are harder. The `post-update` hook should extract all the local files linked to from the README. Something like the following, perhaps:

for file in (printf "%s\n" $body | /home/oddmu/oddmu links - | egrep -v '^(https?:|mailto:|/)')
    set dir (dirname $file)
    if test ! -d $dir
        mkdir -p $dir
    end
    echo $file; sudo -u git git show $branch:$file > $file
end

This uses oddmu to extract the links from a Markdown file, creates the necessary directories and checks out the files.

oddmu
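
The `egrep -v` filter in that loop decides which links count as local files. Here is a minimal sketch of the same test in Python; the function name and the sample links are invented.

```python
import re

# Same pattern as the egrep call: web links, mail links and absolute paths.
REMOTE = re.compile(r'^(https?:|mailto:|/)')


def local_links(links):
    """Keep only links that point to files relative to the README."""
    return [link for link in links if not REMOTE.match(link)]


print(local_links([
    "https://example.org/",
    "mailto:alex@gnu.org",
    "/absolute/path",
    "images/screenshot.png",
    "LICENSE",
]))
# prints ['images/screenshot.png', 'LICENSE']
```

Note that the pattern lets through anything it doesn't recognise, so a fragment link such as `#usage` or a link using another scheme would also be treated as a file to check out.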

But if the files are no longer linked from the README, they are not deleted. If a directory is linked from the README (I have done this! 🤦), the checkout won't work.

I think the better way forward is to move this information elsewhere. The README is not the documentation.

And with that, I think I did it! Serving git repositories from static files. A single directory per project containing the bare git data and a single `index.html` file. No more gazillion end points for crawlers to lose themselves in.