Time-stamp: <2020-09-02 14h33 EDT>
This post is part of my Qubes OS content.
Qubes split-SSH
This post is about my implementation of data separation in Qubes. We'll see how i've chosen to keep SSH keys separate from the VMs which need them, and fuss about the problem of extending this to passwords.
The goal
Qubes allows for some fairly fine-grained control over which piece of software has access to what data. In particular, the RPC functionality allows for inserting ``reasonably secure'' user approval in what might otherwise be un-audited data access.
Split-keys is the idea that the keys themselves (SSH, GPG, passwords, ...) should be contained in a separate, isolated VM from the software that would otherwise directly access these data, while retaining an access path that requires user approval.
In particular, i'd like a system wherein:
- my keys are kept in a separate ``vault VM''
- these keys exist on an encrypted volume in that separate VM, which locks at my request or when i close my laptop lid, etc.
- there is an explicit whitelist of application VMs which are allowed to request keys from the vault VM
- when such a request is made, the vault VM should produce a dialogue window detailing the key requested, and the application VM that generated the request
- i may choose to grant or deny these requests; there is no default action
- if i grant this request, in the case of things like GPG and SSH keys, the key material is good for ``only one operation''
- this system should easily scale to multiple vault VMs, with application VMs potentially requesting keys from more than one vault VM
Implementation
So far i have only implemented SSH key handling, as this is my major use-case. GPG shouldn't be very different from this, but i have yet to look into the details of gpg-agent. Passwords are more problematic, and we'll think about these at the end.
The design of this system necessitates cooperating components across several VMs. These have been broken down by VM-type below.
Vaults
These VMs will be configured to do most of the heavy lifting for this implementation. The software i have chosen to use is
- OpenSSH and GPG, for agents
- tomb, for handling encrypted volumes (as well as tomb-gtk-tray)
- zenity, for presenting dialogues
Tomb is especially nice for our purposes because it has various hooks which we can leverage to auto-load keys and remove them from memory on unlocking and locking the vault, respectively. It also manages bind mounts for us, so we can set up reasonable defaults.
The scripts below function on the following layout of folders in a vault VM.
~
├── keys
│   └── ssh
│       ├── appVM-1
│       │   ├── <key>
│       │   ·
│       │   └── <key>
│       ·
│       └── appVM-K
│           ├── <key>
│           ·
│           └── <key>
├── sockets
│   └── ssh
│       ├── appVM-1
│       │   ├── auth_socket
│       │   └── pid
│       ·
│       └── appVM-K
│           ├── auth_socket
│           └── pid
├── .tomb
└── .tomb.key
Everything under sockets is managed by the scripts, and a user typically interacts only with the keys directory, creating keys in sub-directories corresponding to the application VMs and services. When creating SSH keys it's important to give the keys reasonable comments so that the prompt displays something human-readable.
Note that the scripts assume that all the keys are passwordless -- there's no reason i can see to set one -- and in fact we overwrite ssh-askpass. You have been warned!
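For instance, a key might be created like this (the VM name appVM-dev and the file name are illustrative, and ssh-keygen must be available):

```shell
# Create a passwordless key for a hypothetical "appVM-dev" domain.
# The -C comment is what the approval dialogue will display, so make
# it descriptive.
base="${KEYDIR:-$HOME/keys}"
mkdir -p "$base/ssh/appVM-dev"
rm -f "$base/ssh/appVM-dev/git-hosting" "$base/ssh/appVM-dev/git-hosting.pub"
ssh-keygen -q -t ed25519 -N "" \
    -C "appVM-dev: git hosting key" \
    -f "$base/ssh/appVM-dev/git-hosting"
```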
Vault template VM
The vault template contains most of the implementation, as there's very little that needs to be configured per vault VM instance.
To begin, let's set up the Qubes RPC handler for a service we'll call SshAgent.
/etc/qubes-rpc/qubes.SshAgent
#!/bin/sh
echo "$QREXEC_REMOTE_DOMAIN" > /tmp/qubes-ssh-agent-from
if [ -z "$(pgrep socat)" ]; then
    ssh_socket_file="/home/user/sockets/ssh/$QREXEC_REMOTE_DOMAIN/auth_socket"
    if [ -f "$ssh_socket_file" ]; then
        auth_socket="$(cat "$ssh_socket_file")"
        socat - UNIX-CONNECT:"$auth_socket"
    else
        exit 1
    fi
else
    exit 1
fi
The idea here is that SshAgent will connect an incoming request to the appropriate socket, provided no other transaction is currently in progress and there is an agent designated for the requesting application VM ($QREXEC_REMOTE_DOMAIN). The rest is book-keeping.
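Stripped of socat and qrexec, the dispatch above is just a per-domain file lookup. A self-contained sketch, using a temporary directory in place of /home/user/sockets and a hypothetical appVM-dev caller:

```shell
# Simulate the socket resolution qubes.SshAgent performs.
base="$(mktemp -d)"
QREXEC_REMOTE_DOMAIN="appVM-dev"    # set by qrexec in the real service
mkdir -p "$base/ssh/$QREXEC_REMOTE_DOMAIN"
echo "/tmp/ssh-fake/agent.1234" > "$base/ssh/$QREXEC_REMOTE_DOMAIN/auth_socket"

ssh_socket_file="$base/ssh/$QREXEC_REMOTE_DOMAIN/auth_socket"
if [ -f "$ssh_socket_file" ]; then
    auth_socket="$(cat "$ssh_socket_file")"
    echo "would relay stdio to: $auth_socket"
else
    echo "no agent for $QREXEC_REMOTE_DOMAIN" >&2
fi
```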
We're going to want to change ssh-askpass too, so that it includes the application VM that made the request.
/usr/bin/ssh-askpass
#!/bin/sh
from="$(cat /tmp/qubes-ssh-agent-from)"
message="Request from: $from. $*"
zenity --question --title "Request from $from." --text="$message"
Finally we have to manage ssh-agents. The basic idea is that ~/keys/ssh contains a directory named per allowed application VM, whose contents are all the SSH keys that that application VM has access to. See the above file-system diagram.
Our split-ssh-add script below will scan this structure, and spawn a separate ssh-agent per directory. It will also organise the socket and PID information of these agents in ~/sockets/ssh so that it can later reconnect to them to add more keys, clear keys, or otherwise interact with the SshAgent RPC above.
Note that we have used the `-c' flag so that ssh-agent will confirm each use of a key, by calling our ssh-askpass script above.
/usr/bin/split-ssh-add
#!/bin/sh
dom="$1"
key="$2"

start_agent() {
    echo "Starting agent for $dom..."
    eval "$(ssh-agent -s)"
    dir="/home/user/sockets/ssh/$dom"
    mkdir -p "$dir"
    echo "$SSH_AGENT_PID" > "$dir/pid"
    echo "$SSH_AUTH_SOCK" > "$dir/auth_socket"
}

# Ensure that we have an agent for this domain
if [ -f "/home/user/sockets/ssh/$dom/pid" ] && [ -f "/home/user/sockets/ssh/$dom/auth_socket" ]; then
    pid="$(cat "/home/user/sockets/ssh/$dom/pid")"
    socket="$(cat "/home/user/sockets/ssh/$dom/auth_socket")"
    if [ -d "/proc/$pid" ]; then
        echo "Connecting to running agent..."
        export SSH_AGENT_PID="$pid"
        export SSH_AUTH_SOCK="$socket"
    else
        echo "Cleaning up dead agent..."
        rm -rf "/home/user/sockets/ssh/$dom"
        start_agent
    fi
else
    start_agent
fi

if [ -f "/home/user/keys/ssh/$dom/$key" ]; then
    ssh-add -c "/home/user/keys/ssh/$dom/$key"
    echo ""
    echo "Current keyring:"
    ssh-add -l
else
    echo "Cannot find $key for domain $dom in ~/keys/ssh/."
    exit 1
fi
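The script decides whether an agent is still running by checking Linux's /proc. In isolation the test looks like this, using our own PID as a certainly-live example:

```shell
# A PID is considered live iff /proc/<pid> exists (Linux-specific).
pid=$$
if [ -d "/proc/$pid" ]; then
    status="alive"
else
    status="dead"
fi
echo "$status"
```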
We also want to be able to clear the keys that a particular agent has.
/usr/bin/split-ssh-clear
#!/bin/sh
for dir in /home/user/sockets/ssh/*; do
    if [ -d "$dir" ]; then
        echo "Processing $(basename "$dir")"
        if [ -f "$dir/pid" ]; then
            pid="$(cat "$dir/pid")"
            if [ -f "$dir/auth_socket" ]; then
                socket="$(cat "$dir/auth_socket")"
                if [ -d "/proc/$pid" ]; then
                    echo "Connecting to running agent and clearing..."
                    export SSH_AGENT_PID="$pid"
                    export SSH_AUTH_SOCK="$socket"
                    ssh-add -D
                else
                    rm -rf "$dir"
                    echo "Cleaned up dead agent"
                fi
            else
                echo "Can't find socket for agent, attempting to kill..."
                kill -9 "$pid"
                [ "$?" = "0" ] && echo "Agent killed!" || echo "Agent was not running!"
                rm -rf "$dir"
                echo "Cleaned up"
            fi
        fi
        echo
    fi
done
Vault VM
That is about all we can configure globally; the rest of the configuration happens per vault VM instance. For each such instance we're going to set up a Tomb ``crypt'' to handle the encrypted storage of our keys. I have employed the naming scheme vault-<description> for my vault VMs, but you're free to choose your own.
tomb
Tomb is a handy piece of software for managing encrypted volumes. We're going to make use of its bind-mount system and exec-hooks to automatically link directories in ~/ to their encrypted counterparts, and to load all keys on unlock.
Follow the online (or man page) documentation to set up a small-ish crypt -- 128 MB is plenty for our purposes. Be sure to use the --kdf flag for a more secure password derivation scheme (you'll also want a reasonable iteration time; i aimed for 3 seconds). The scripts below assume that the tomb file will be called $VMNAME.tomb, that the tomb key file will be called $VMNAME.tomb.key, and that both of these reside in $HOME.
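As a rough sketch, the setup looks something like the following; the size and the --kdf argument form are illustrative assumptions, so check `man tomb` before running it.

```
# Sketch of the crypt setup -- verify flags against your tomb version.
VMNAME="$(hostname)"
cd "$HOME"
tomb dig -s 128 "$VMNAME.tomb"                  # allocate a 128 MB container
tomb forge "$VMNAME.tomb.key" --kdf 3           # assumed syntax for ~3s KDF
tomb lock "$VMNAME.tomb" -k "$VMNAME.tomb.key"
tomb open "$VMNAME.tomb" -k "$VMNAME.tomb.key"
```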
Then, ensure that the directories
- ~/keys/ssh
- ~/sockets/ssh
- ~/QubesIncoming (we want incoming documents to be stored securely)
exist in the vault VM. With that achieved, ``open the tomb'' and write the following `exec-hooks' file in the mounted tomb.
/media/$(hostname)/exec-hooks
#!/bin/sh
vault_name="$(hostname)"
if [ "$1" = "open" ]; then
    # Start tray icon
    if [ -z "$(pidof tomb-gtk-tray)" ]; then
        nohup tomb-gtk-tray "$vault_name" >/dev/null 2>&1 &
    fi
    # Bind directories to /rw/home/user
    for dir in $(cat "/media/$vault_name/bind"); do
        if [ -d "/rw/home/user/$dir" ] && [ -d "/media/$vault_name/$dir" ]; then
            sudo mount -o bind "/media/$vault_name/$dir" "/rw/home/user/$dir"
        fi
    done
    # Unlock keys
    if [ -d "$HOME/keys/" ]; then
        cd "$HOME/keys/"
        for dir in ssh/*; do
            if [ -d "$dir" ]; then
                dom="$(basename "$dir")"
                for key in "$dir"/*; do
                    if file "$key" | grep -q 'OpenSSH private key'; then
                        split-ssh-add "$dom" "$(basename "$key")"
                    fi
                done
            fi
        done
    fi
else
    # Dismiss tray icon
    killall tomb-gtk-tray || true
    # Unload keys
    split-ssh-clear
    # Unmount binds
    for dir in $(cat "/media/$vault_name/bind"); do
        if [ -d "/rw/home/user/$dir" ] && [ -d "/media/$vault_name/$dir" ]; then
            sudo umount "/rw/home/user/$dir" || true
        fi
    done
fi
Qubes seems to be a little fussy with the way that bind-mounts are handled, and vanilla tomb was not successfully closing the file. To remedy this i added some on-close code to manually unmount the necessary directories.
Set the `bind' file of the tomb (the one read by the exec-hooks above) to the below.
/media/$(hostname)/bind
sockets sockets
keys keys
QubesIncoming QubesIncoming
Finally ensure that the directories
- /media/$(hostname)/keys/ssh
- /media/$(hostname)/sockets/ssh
- /media/$(hostname)/QubesIncoming/
all exist (note the difference to the above list!).
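The home-side skeleton, for instance, is a one-liner; the in-tomb copies under /media/$(hostname) are created the same way once the tomb is open.

```shell
# Directories the split-ssh scripts expect under the vault VM's $HOME.
mkdir -p "$HOME/keys/ssh" "$HOME/sockets/ssh" "$HOME/QubesIncoming"
```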
dom0
The last bit of back-end we need to set up is in dom0. Here we will enable the RPC we've created above, and in so doing choose a whitelist of application VMs and their corresponding vault VMs.
/etc/qubes-rpc/policy/qubes.SshAgent
<application VM> <vault VM> allow
with one such line per whitelisted pairing.
Application VM
The final piece of the puzzle is having OpenSSH conduct its business over the Qubes RPC instead of locally. Given the way we have set things up so far, this isn't difficult.
Set SSH_VAULT_VM to the name of the vault VM serving this application VM.
/rw/config/rc.local
SSH_VAULT_VM=""
if [ "$SSH_VAULT_VM" != "" ]; then
    export SSH_AUTH_SOCK="/home/user/.SSH_AGENT_$SSH_VAULT_VM"
    rm -f "$SSH_AUTH_SOCK"
    killall socat
    sudo -u user /bin/sh -c "umask 177 && exec socat 'UNIX-LISTEN:$SSH_AUTH_SOCK,fork' 'EXEC:qrexec-client-vm $SSH_VAULT_VM qubes.SshAgent'" &
fi
~/.bash_profile (or similar)
SSH_VAULT_VM=""
if [ "$SSH_VAULT_VM" != "" ]; then
    export SSH_AUTH_SOCK="/home/user/.SSH_AGENT_$SSH_VAULT_VM"
fi
... and that's it. Everything should Just Work (TM) from this point.
Problems with the current implementation
- Qubes runs /rw/config/rc.local _really_ early, and there can be some sort of race condition which prevents the appVM from correctly instantiating the UNIX socket.
- Sometimes, for reasons i haven't looked into, it is not possible to SSH twice concurrently. This shouldn't be a problem -- as i understand it, SSH only requires key operations during authentication and the scripts only block concurrent requests.
Both of these are solvable by manually recreating the socket on the application VM, however. If anyone works out what's going on, i'd love to hear about it.
split-gpg, split-pass, and future work
Qubes already has an implementation of split-gpg.
I haven't yet looked into this, but hopefully it will prove straightforward to combine this with the tomb setup above and have a uniform interface for vaults.
Passwords, on the other hand, are far more tricky. I can see no easy way to enforce ``access for a single operation'' and so having an RPC to request passwords over clear text is unsatisfactory (something like split-pass). This wouldn't be a bad compromise, but it seems that the better thing to do would involve (potentially hardware) U2F or some such. Despite its name, however, this isn't quite so universally supported. I'd certainly be open to suggestions!
Finally, there's the case of emails. Ideally the email client would run on an application VM, and request all email operations (list, fetch, store, send, ...) through an RPC channel with user approval. That is, i'd love to have my emails, passwords, and everything else live in a vault, but access these through an interface on another VM so that the vault is not internet connected.
I have no real idea how to do this, and it seems incredibly mail back-end sensitive. Perhaps an mbox file somehow accessed over RPC?
As a compromise i wouldn't mind storing my emails on an application VM, but the account information should definitely be in a vault. I believe offlineimap allows embedding arbitrary scripts in its authentication procedure, so something like this should be possible with split-pass (or a bespoke solution).
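As a sketch of that idea: offlineimap can evaluate a Python expression to obtain the password, via its `pythonfile` and `remotepasseval` options. The vault name vault-mail and the qubes.Pass service below are hypothetical stand-ins for a split-pass RPC; check the offlineimap documentation for the exact semantics.

```
# ~/.offlineimap-helpers.py -- loaded via offlineimap's `pythonfile` option.
# "vault-mail" and "qubes.Pass" are hypothetical split-pass stand-ins.
from subprocess import check_output

def get_pass():
    return check_output(["qrexec-client-vm", "vault-mail", "qubes.Pass"]).strip()

# ~/.offlineimaprc fragment:
#
#   [general]
#   pythonfile = ~/.offlineimap-helpers.py
#
#   [Repository remote]
#   remotepasseval = get_pass()
```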