
Caching openSUSE repos with Squid

... and OpenWrt, in a way that requires no specific configuration of the clients.

Squid

How to

Add this to your squid.conf:

maximum_object_size 1024 MB
cache_dir aufs /mnt/data/squid <CACHE_SIZE> 16 256
refresh_pattern \.rpm$ 10080 90% 43200
http_port 3129 intercept
url_rewrite_program /path/to/redirect.sh

For <CACHE_SIZE>, see the cache_dir documentation.
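
For illustration, here is that line with a size filled in. The 20000 MB (about 20 GB) is an arbitrary value for this example; 16 and 256 are Squid's default level-1 and level-2 directory counts:

cache_dir aufs /mnt/data/squid 20000 16 256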

redirect.sh:
#!/bin/bash

# Squid url_rewrite_program helper: for download.opensuse.org requests,
# resolve the MirrorBrain redirect ourselves and hand Squid a direct,
# cacheable mirror URL. Everything else passes through unchanged.

doo_regex='^http://download.opensuse.org/'

while read -r url extras; do
    if [[ "$url" =~ $doo_regex ]]; then
        # Ask the upstream server where it would redirect us, extract
        # the Location header (stripping the trailing CR), and downgrade
        # https to http so the intercepting Squid can cache the reply.
        location="$(curl -s --head "$url" \
            | grep -E "^Location: " \
            | sed -e 's/^Location: \(.*\)\r$/\1/' \
                  -e 's/^https:/http:/')"
        if [[ -n "$location" ]]; then
            # Rewrite the request URL to point at the mirror.
            echo "OK url=\"${location}\""
        else
            # No Location header: leave the URL untouched.
            echo "ERR"
        fi
    else
        # Not an openSUSE download URL: leave it untouched.
        echo "ERR"
    fi
done
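
You can sanity-check the helper by hand, since it simply reads URLs on stdin and answers on stdout. Using the same example request as in the curl output further down (the mirror in the answer will vary):

$ echo "http://download.opensuse.org/[...]/some.rpm" | ./redirect.sh
OK url="http://ftp.gwdg.de/pub/opensuse/[...]/some.rpm"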

OpenWrt

The following firewall rule redirects all outgoing HTTP connections to Squid's intercepting socket.

/etc/config/firewall:

config redirect
        list proto 'tcp'
        option name 'squid'
        option target 'DNAT'
        option src 'lan'
        option src_dport '80'
        option dest 'lan'
        option dest_ip '192.168.1.1'
        option dest_port '3129'

Change dest_ip if you installed Squid on a different host.
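
If you prefer the uci command line over editing the file directly, the same rule can be created like this (adjust dest_ip as above, then reload the firewall):

uci add firewall redirect
uci add_list firewall.@redirect[-1].proto='tcp'
uci set firewall.@redirect[-1].name='squid'
uci set firewall.@redirect[-1].target='DNAT'
uci set firewall.@redirect[-1].src='lan'
uci set firewall.@redirect[-1].src_dport='80'
uci set firewall.@redirect[-1].dest='lan'
uci set firewall.@redirect[-1].dest_ip='192.168.1.1'
uci set firewall.@redirect[-1].dest_port='3129'
uci commit firewall
/etc/init.d/firewall restart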

Backstory

My internet connection is over LTE and comes with a monthly transfer limit, which forced me to optimize my downloads. I have two computers running openSUSE Tumbleweed, which, as a rolling release distro, gets a lot of updates. Downloading them twice is an obvious waste of resources.

But one of those computers is my laptop, which is not always in my home network. So I wanted something that requires no specific configuration on the client side. Bonus points for running 100% on my router with OpenWrt.

First I found this guide:

It's quite complicated, because it parses the list of mirrors and generates the URL rewriter config from it. But it was a good starting point.

openSUSE's primary download server, download.opensuse.org, is an instance of MirrorBrain. It doesn't host any data itself; instead, it redirects the client to a nearby mirror. In practice it looks like this:

$ curl -s --head http://download.opensuse.org/[...]/some.rpm
HTTP/1.1 302 Found
Date: Wed, 17 Mar 2021 21:24:05 GMT
Server: Apache/2.4.43 (Linux/SUSE)
X-MirrorBrain-Mirror: ftp.gwdg.de
X-MirrorBrain-Realm: other_country
Link: <[...]>; rel=describedby; type="application/metalink4+xml"
Link: <[...]>; rel=duplicate; pri=1; geo=de
Link: <[...]>; rel=duplicate; pri=2; geo=de
Link: <[...]>; rel=duplicate; pri=3; geo=cz
Link: <[...]>; rel=duplicate; pri=4; geo=se
Link: <[...]>; rel=duplicate; pri=5; geo=ru
Location: https://ftp.gwdg.de/pub/opensuse/[...]/some.rpm
Content-Type: text/html; charset=iso-8859-1

The HTTP redirect points to the URL in the Location line, but there are also additional Link URLs. zypper uses them to download parts of a package from multiple mirrors at the same time.

Unfortunately, Squid cannot cache such partial downloads. To get rid of those additional URLs, the redirect.sh script fetches the Location from the upstream server itself and tells Squid to rewrite the request so it goes straight to that mirror. The client never sees MirrorBrain's response with the Link headers.
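
The helper replies with a plain rewrite (OK url=...), so Squid fetches the mirror transparently. If you would rather have Squid answer the client with an explicit 302, the url_rewrite_program protocol in Squid 3.4 and later also accepts a reply of this form, as a drop-in replacement for the echo in the script:

echo "OK status=302 url=\"${location}\""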

And that's it! The only thing that could "break" caching is the download server returning a different Location for every request, but so far I haven't seen it do that.

Date: 2021-03-24