SMTP Filters

Sendmail added the milter interface way back when; this complication allows arbitrary bits of code to interact with various phases of a mail transaction (HELO, MAIL FROM, RCPT TO, etc.) and make decisions about whether to accept or reject the message, or even to modify messages on the fly, such as to replace overly large attachments with a link to where the attachment data has been moved. If all goes well. Other mail servers have followed suit, e.g. smtpd-filters(7) for smtpd(8) on OpenBSD.

Prior to milters one had to accept the message first, which is problematic if one must then reject and thus bounce the (possibly forged) message by sending a new message elsewhere. You could also discard bad messages, but that can be problematic if legitimate email vanishes and was last seen being delivered to your server. There are cases where messages must be discarded, but that's rare.†

If instead the rejection happens before the message is accepted, the onus is on the sending server to handle that, which may generate a bounce, but it's not a bounce from your server so less of a problem for you. Rejecting a message during the SMTP transaction may also save resources, especially if done between the connect and RCPT TO phases, that is, before the message body is sent, though on the other hand anti-spam and other such mail filter software may use quite a lot of memory and CPU.

Complaints about spammers bogging down systems are not new, and the practice continues to this day in the form of aggressive scraping of websites by AI bots or other such malicious actors.

One of the services of the gateway is to filter out “spam,” unsolicited mail that advertises services of dubious merit. After successful early trials of the spam filter, the service was installed as a permanent feature for all users of the mail gateway, and a problem immediately became apparent. The gateway machine, antiquated and already very busy, was overwhelmed because the filtering program was taking so much time—much more time than was required for all the other processing of each message—that the mail queues filled and message delivery was delayed by hours while the system struggled to catch up.

— "The Practice of Programming". Kernighan & Pike. 1999.

Since email suffers from spammers and their ilk it may help to reject spam and other such noise, and as soon as possible. A first step would be to log what is going on. For this one can use the MTA (Mail Transfer Agent, e.g. Postfix), some existing anti-spam solution (rspamd, etc.), or a custom filter. A custom filter is likely the most flexible but is the least supportable by people who are not you, and unlike most of the other options does require writing code. Larger sites will have more rules and regulations around email handling, and more mail will need to be shoveled back to (usually) an IMAP server and then the user's mail client rather than (ideally) rejecting it as it arrives. So these things go. Smaller sites (tildes, for example) may want to document how aggressive they are about spam, on the range of "we allow most anything through" to "we run a tight ship".

Not every message should be allowed through; accepting everything would make the server an open relay, and will cause all sorts of problems for you, your ISP, and various other folks, including future users of your IP address. On the other hand one could firewall port 25 and give up on email, in which case this document is not relevant. Between these extremes there is a fairly wide range of what is permitted and how aggressively email is policed for problems. To the notion that one is supposed to be tolerant in what one accepts, the Robustness Principle does include "robustness to attacks", though distinguishing a spammer from a buggy client or weird implementation can be difficult. However, the end result is the same if a spammer or buggy client causes server performance problems or generates too many user complaints due to stuffed mailboxes: something will be done to mitigate or solve the problem.

Logging Filter

Here we use a filter for smtpd to log what is going on. Something similar could be written as a milter using MIMEDefang if you use Postfix or Sendmail. This filter assumes a not very busy server. If your mail server instead sees a lot more traffic then the logging will need to be more efficient, or done only in snapshots that sample what remote hosts are doing, or otherwise pruned down as quickly as possible to keep log growth from overrunning the server. Both small and large sites will also need to worry about rare but important messages that need to be handled correctly even in the face of stringent anti-spam rules: payment transactions that happen only monthly, or maybe someone with a wacky laptop configuration was away on sabbatical, that sort of thing.
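One way to log only a sample of sessions on a busier server is to make the decision from the session ID itself, so that every handler for the same session makes the same choice without sharing state. This is a minimal sketch and not part of OpenSMTPd::Filter: the name should_sample and the one-in-N rate are made up for illustration, and the checksum is only roughly uniform.

```perl
use strict;
use warnings;

# Decide whether to log a session, keeping roughly one in $n sessions.
# The decision depends only on the session ID string, so the HELO and
# RCPT TO handlers for one session will agree with each other.
sub should_sample {
    my ( $session, $n ) = @_;
    return 1 if $n <= 1;

    # unpack "%32C*" gives a simple 32-bit checksum of the ID's bytes.
    my $sum = unpack "%32C*", $session;
    return $sum % $n == 0 ? 1 : 0;
}
```

A handler could then return 'proceed' early when should_sample reports false, before writing anything to disk.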

    #!/usr/bin/perl
    # mylog.pl - a logging mail filter for OpenSMTPD 7.7.0. TODO install
    # this script to the libexec dir:
    #   mkdir -p /usr/local/libexec/smtpd
    #   chmod +x mylog.pl
    #   mv mylog.pl /usr/local/libexec/smtpd
    use 5.38.0;
    use Data::Dumper;
    use Sys::Syslog;

    # TODO or install from CPAN
    #   doas pkg_add p5-OpenSMTPd-Filter
    use OpenSMTPd::Filter;

    # TODO
    #   mkdir /var/spool/mylog
    #   chown _smtpd /var/spool/mylog
    my $DIR = '/var/spool/mylog';
    umask 0027;

    openlog( "mylog", "ndelay,pid", "mail" );
    syslog( "info", "starting..." );

    # TODO comment these calls out if not on OpenBSD
    use OpenBSD::Pledge;
    use OpenBSD::Unveil;
    unveil( $DIR, 'cw' )    or die "unveil '$DIR': $!\n";
    unveil()                or die "unveil: $!\n";
    pledge(qw{cpath wpath}) or die "pledge: $!\n";

    sub handle_greet {
        my ( $phase, $ctx ) = @_;
        my $state = $ctx->{state};
        my $id    = $state->{session};
        syslog( "info", "$state->{session} helo $state->{command}" );

        # TODO exclude things not to be logged here, possibly rejecting
        # bad patterns.

        open my $fh, '>', "$DIR/h.$id" or return 'proceed';
        say $fh Dumper $state;

        return 'proceed';
    }

    sub handle_rcpt {
        my ( $phase, $ctx ) = @_;
        my $state = $ctx->{state};
        my $from  = $state->{message}->{'mail-from'};
        my $to;
        for my $e ( reverse $ctx->{events}->@* ) {
            if ( $e->{request} eq 'filter' and exists $e->{address} ) {
                $to = $e->{address};
                last;
            }
        }
        # Did not parse RCPT TO: address out of the events?? Log it so
        # the failure is at least visible.
        unless ( defined $to ) {
            syslog( "warning", "$state->{session} rcpt-to address not found" );
            return 'proceed';
        }

        # TODO accept or reject based on various patterns here.

        my $id = $state->{session};
        open my $fh, '>', "$DIR/r.$id" or return 'proceed';
        say $fh "mail-from $from";
        say $fh "rcpt-to $to";
        say $fh Dumper $state;
        say $fh Dumper $ctx->{events};

        return 'proceed';
    }

    OpenSMTPd::Filter->new(
        debug => 0,
        on    => {
            filter => {
                'smtp-in' => {
                    'helo'    => \&handle_greet,
                    'ehlo'    => \&handle_greet,
                    'rcpt-to' => \&handle_rcpt,
                }
            }
        }
    )->ready;

Once installed and configured, enable the filter in smtpd.conf. The following assumes that rspamd is also being used (mostly for DKIM, in my case). Note that the configuration for smtpd has varied over time, so be sure to check the fine manual for the version of OpenBSD or OpenSMTPD at hand; the following is for OpenBSD 7.7 and OpenSMTPD 7.7.0. It may also help to only run anti-spam services on external interfaces, to lower the risk of blocking legitimate internal email, though on the other hand an internal system could be compromised and start sending spam.

    filter mylog proc-exec "mylog.pl"
    filter rspamd proc-exec "filter-rspamd"
    filter congaline chain {mylog, rspamd}

    listen on all tls pki example.org filter "congaline"

Bad Patterns

HELO

    smtp failed-command command="EHLO" result="501 5.5.1 Invalid \
      command: EHLO requires domain name"

Various client faults here include:

Not sending HELO (or EHLO, for ESMTP) at all.
Omitting the domain name.
Using an invalid domain name such as "localhost.localdomain", or an unqualified host such as "WIN-7N1FIECL6IC". These strings are probably what the system hostname is set to.
Using fake fully qualified hostnames such as "mail.example.org".
Using the IP address of the mail server being connected to.

Some of these faults may not actually be faults, though some amount of spam can be excluded by using strict HELO checks, at the cost of maybe blocking a legitimate but misconfigured mail server. Many mail servers do block poorly set up hosts so it's pretty much a requirement for legitimate mail server admins to properly configure these things, and thus probably okay to have strict HELO checks.
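Several of the faults listed above can be spotted from the HELO string alone. The following is a rough sketch under assumptions: the function name helo_fault and its return strings are invented here, and how the HELO argument and the local address are obtained from the filter is left out. A fake but well-formed name such as "mail.example.org" cannot be caught this way; that would need a DNS lookup compared against the connecting address.

```perl
use strict;
use warnings;

# Return a short description of what is wrong with a HELO/EHLO argument,
# or nothing if no fault was found. $server_ip is the address the client
# connected to, that is, our own address.
sub helo_fault {
    my ( $helo, $server_ip ) = @_;
    return 'missing domain name' if !defined $helo or $helo !~ /\S/;

    # Address literals look like [192.0.2.25]; some clients omit the
    # brackets, so compare with the brackets stripped.
    ( my $bare = $helo ) =~ s/^\[(?:IPv6:)?(.*)\]$/$1/i;
    return 'uses our own IP address'
      if defined $server_ip and lc $bare eq lc $server_ip;

    # No dot and not an address literal: probably the system hostname.
    return 'unqualified hostname' if $helo !~ /\./ and $helo !~ /^\[/;

    return 'placeholder domain'
      if $helo =~ /(?:^|\.)(?:localhost|localdomain)\.?$/i;
    return;
}
```

Whether a fault leads to a rejection or only to a log entry is a policy decision, as discussed above.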

This does mean a new mail server will need to have a lot of configuration done before it has a chance of sending mail, and even then email from it may be rejected for various reasons: the domain is "too new", or maybe a spammer had been squatting the now blacklisted IP address, or maybe you have to fill out some form on a website for your messages to be accepted, etc. These barriers to entry are largely a result of spammers ruining the commons, though large corporations with government ties may also desire that mail be sorted into fewer silos, the email being easier to collect and analyze that way.

Anti-spam systems such as rspamd will score these faults and pass these findings to the user. Another option is to reject the mail during the SMTP transaction, though again this risks blocking legitimate mail sent in an invalid way.

Metrics

Various companies scan the Internet and presumably sell that information elsewhere. These hosts can either be ignored, or blocked to cut down on log noise. The blocks could also be done by IP address range or autonomous system (AS) number, if the scanning company publishes those. Some of these may include:

criminalip.com
cyberresilience.io
internet-measurement.com
securing-email.com
shadowserver.org
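Matching a connecting host against the list above is a suffix comparison on its reverse DNS name. A small sketch, assuming the hostname comes from something like the host= field of the smtpd "connected" log line; is_scanner is a made-up name. Reverse DNS is controlled by whoever runs the address space, so this is only good for cutting down noise, not for trust decisions.

```perl
use strict;
use warnings;

# Domains of the scanning companies, taken from the list above.
my @SCANNERS = qw(
  criminalip.com cyberresilience.io internet-measurement.com
  securing-email.com shadowserver.org
);

# True if $host is one of the listed domains or a subdomain of one.
sub is_scanner {
    my ($host) = @_;
    return 0 unless defined $host;
    $host = lc $host;
    $host =~ s/\.$//;    # drop any trailing root dot
    for my $domain (@SCANNERS) {
        return 1 if $host eq $domain or $host =~ /\.\Q$domain\E$/;
    }
    return 0;
}
```

The leading dot in the pattern keeps an unrelated name that merely ends in the same letters from matching.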

Malicious Clouds

Also relevant here are random cloud services. The problem is there are bad actors mixed in with legitimate users of the services. If your users are trying to send messages from a cloud compute node, ideally have them do it over a VPN or SSH tunnel so the traffic is not coming in "from the wild".

compute.amazonaws.com
googleusercontent.com

I'm not sure what exactly googleusercontent.com hosts as the webpage is broken with JavaScript. Google does however appear to host bad actors, so it may be prudent to ban googleusercontent.com addresses until Google gets their act together.

    $ whois 162.216.149.104 | grep -i netname
    NetName:        GOOGLE-CLOUD
    $ rubbled 162.216.149.104
    162.216.149.104 zen.spamhaus.org. XBL (exploit)
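A DNS blocklist lookup like the one above works by reversing the octets of the IPv4 address and appending the blocklist zone; an address is listed if that name resolves. The sketch below only builds the query name and leaves the DNS lookup itself out. The function name dnsbl_name is an assumption for illustration, and nothing here is taken from the tool used above.

```perl
use strict;
use warnings;

# Build the DNS name to query for an IPv4 address in a DNSBL zone: the
# octets are reversed and the zone name is appended. Returns nothing if
# the address does not look like dotted-quad IPv4.
sub dnsbl_name {
    my ( $ip, $zone ) = @_;
    my @octets = split /\./, $ip, -1;
    return if @octets != 4;
    for my $o (@octets) {
        return if $o !~ /^\d{1,3}$/ or $o > 255;
    }
    return join '.', reverse(@octets), $zone;
}

print dnsbl_name( '162.216.149.104', 'zen.spamhaus.org' ), "\n";
```

For the address above this yields the name 104.149.216.162.zen.spamhaus.org.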

AWS is also bad.

    smtp connected address=3.131.215.38 host=ec2-3-131-215-38.us-east-2.compute.amazonaws.com
    smtp failed-command command="SSH-2.0-Go" result="500 5.5.1 Invalid command: Command unrecognized"
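Log lines such as these are mostly key=value pairs, which makes them easy to pick apart when hunting for patterns. A sketch, assuming the format shown above (bare or double-quoted values); log_fields and wrong_protocol are invented names, and the list of foreign protocol prefixes is only a guess at what tends to show up.

```perl
use strict;
use warnings;

# Split the key=value fields of a log line into a hash reference. Values
# may be bare (address=192.0.2.1) or double-quoted (result="500 ...").
sub log_fields {
    my ($line) = @_;
    my %field;
    while ( $line =~ /(\w[\w-]*)=(?:"([^"]*)"|(\S+))/g ) {
        $field{$1} = defined $2 ? $2 : $3;
    }
    return \%field;
}

# True if a "command" looks like it belongs to some other protocol, such
# as an SSH banner or an HTTP request line.
sub wrong_protocol {
    my ($command) = @_;
    return $command =~ m{^(?:SSH-|GET |POST |HEAD |CONNECT )} ? 1 : 0;
}
```

Feeding the failed-command line above through log_fields gives a command of SSH-2.0-Go, which wrong_protocol flags.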

Open Relay Attempts

These are messages with "RCPT TO" addresses foreign to your server; the spammer will monitor the results and if an open relay is found will open the floodgates. Note that the spammer can put any address into RCPT TO, so while it is possible the recipient address has been compromised by the spammer, they may simply have picked a random recipient to check whether your server accepts the attempt. That is, "Bar Hockman" may be a random innocent party here, or their Hotmail account may have been compromised by the spammer.

    smtp failed-command command="rcpt TO:" \
      result="550 Invalid recipient: "
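smtpd already rejects these attempts, as the log above shows, but a custom filter may still want to recognize them for logging or counting. A minimal sketch, assuming a fixed set of local domains; the %LOCAL table, the example.org entry, and the name is_local_rcpt are placeholders and not part of any library.

```perl
use strict;
use warnings;

# Domains this server accepts mail for (hypothetical; adjust to suit).
my %LOCAL = map { $_ => 1 } qw(example.org);

# True if the RCPT TO address is in a domain we handle. Anything else
# is a relay attempt unless the client is otherwise allowed to relay.
sub is_local_rcpt {
    my ($addr) = @_;
    return 0 unless defined $addr;
    $addr =~ s/^<|>$//g;    # strip the angle brackets, if present
    my ($domain) = $addr =~ /\@([^\@\s]+)$/ or return 0;
    return $LOCAL{ lc $domain } ? 1 : 0;
}
```

Counting the false results per remote address would show which hosts keep probing for a relay.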

Pipelining

    smtp bad-input result="500 5.5.1 Invalid command: Pipelining \
      not supported"

Pipelining is the practice of sending multiple commands at once rather than interacting with the server command by command. The server does need to advertise pipelining, which smtpd does not. These logs can also occur if attackers send HTTP or other such unexpected requests to an SMTP server. Buggy mail server software may also trigger this if it sends two commands at once, though to confirm that the log level may need to be turned up very high or the SMTP session recorded to see exactly what is going on.

    #!/usr/bin/perl
    # bad-relay - bad SMTP client example
    use 5.38.0;
    use IO::Socket::IP;
    use Time::Piece;

    my ( $host, $port ) = @ARGV;
    die "Usage: bad-relay host [port]\n" unless defined $host;
    $port //= 25;

    my $helo = 'client.example.org';
    my $from = 'fixme@example.org';
    my $to   = 'fixme@example.org';

    my $s = IO::Socket::IP->new(
        PeerHost => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "bad-relay: connect error '$host:$port': $!\n";

    sub tx ( $s, $command ) {
        print $s->getline;
        print " -> ", $command;
        $s->print($command);
    }

    tx $s, "HELO $helo\r\n";
    # Bad! We send two commands at once. Also this client is bad in that it
    # is not checking the responses for errors and continues on blindly
    # assuming that all is well.
    tx $s, "MAIL FROM:<$from>\r\nRCPT TO:<$to>\r\n";
    # The RCPT TO must instead be sent as a distinct command.
    #tx $s, "RCPT TO:<$to>\r\n";
    tx $s, "DATA\r\n";
    $s->print("From: <$from>\r\n");
    $s->print("Subject: test\r\n");
    my $now = localtime;
    $s->print( $now->strftime("Date: %a, %d %b %Y %H:%M:%S %z\r\n") );
    $s->print( "Message-ID: <", $now->epoch, ">\r\n" );
    $s->print("\r\n");
    $s->print("asdf\r\n");
    $s->print(".\r\n");
    tx $s, "QUIT\r\n";
    print $s->getline;
    $s->close;
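A better behaved client reads each reply and checks its code before sending the next command. The helpers below are a sketch of that check only and not a full client: parse_reply and reply_ok are made-up names, and multi-line replies are merely flagged, not collected.

```perl
use strict;
use warnings;

# Parse one SMTP reply line into its code, whether more lines follow
# ("250-" continues, "250 " is final), and the remaining text. Returns
# nothing if the line does not start with a reply code.
sub parse_reply {
    my ($line) = @_;
    my ( $code, $sep, $text ) = $line =~ /^([2-5]\d\d)([ -]?)(.*?)\r?\n?$/s
      or return;
    return +{ code => $code, more => $sep eq '-' ? 1 : 0, text => $text };
}

# 2xx and 3xx codes (such as 354 after DATA) mean the client may
# continue; 4xx and 5xx are temporary and permanent failures.
sub reply_ok {
    my ($reply) = @_;
    return 0 unless defined $reply;
    return $reply->{code} < 400 ? 1 : 0;
}
```

With this in place the client could stop as soon as reply_ok is false instead of carrying on blindly, for example on the "500 5.5.1" reply shown in the log above.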

—

† Discarding the messages of a stalker, for example, as a bounce probably will tip off the sender while a discard probably won't, as complicated by bad mail clients that load remote resources by default or helpfully send other such automatic pings to the sender.