repo: gemini-site
action: commit
revision: 
path_from: 
revision_from: 3445578b6b74d05a74c977ab7f40be37f8060410:
path_to: 
revision_to: 
git.thebackupbox.net
gemini-site
git clone git://git.thebackupbox.net/gemini-site
commit 3445578b6b74d05a74c977ab7f40be37f8060410
Author: Solderpunk 
Date:   Sun Aug 11 16:47:33 2019 +0000

    Substantially flesh out specification.

diff --git a/spec-spec.txt b/spec-spec.txt
index 515d61fde40594ea7b1f4e8367ac31e70ecdd842..
index ..f598a1331b8512fe98cb9e9555871c7fefd7fff5 100644
--- a/spec-spec.txt
+++ b/spec-spec.txt
@@ -1,13 +1,16 @@
 -----------------------------
 Project Gemini
 "Speculative specification"
-v0.0.4, July 21st 2018
+v0.9.1, August 11th 2019
 -----------------------------

-This is a very rough sketch of how I imagine the actual spec for
-Project Gemini might look, based on my current set of ideas/decisions.
-Don't code to this "spec-spec" unless you are very keen to experiment
-and won't mind having to make frequent changes as things evolve.
+This is an increasingly less rough sketch of an actual spec for
+Project Gemini.  Although not finalised yet, further changes to the
+specification are likely to be relatively small.  You can write code
+to this pseudo-specification and be confident that it probably won't
+become totally non-functional due to massive changes next week, but
+you are still urged to keep an eye on ongoing development of the
+protocol and make changes as required.

 This is provided mostly so that people can quickly get up to speed on
 what I'm thinking without having to read lots and lots of old phlog
@@ -18,6 +21,8 @@ solderpunk@sdf.org.

 -----------------------------

+1. Overview
+
 Gemini is a client-server protocol featuring request-response
 transactions, broadly similar to gopher or HTTP.  Connections are
 closed at the end of a single transaction and cannot be reused.  When
@@ -25,133 +30,493 @@ Gemini is served over TCP/IP, servers should listen on port 1965 (the
 first manned Gemini mission, Gemini 3, flew in March '65).  This is an
 unprivileged port, so it's very easy to run a server as a "nobody"
 user, even if e.g. the server is written in Go and so can't drop
-priveleges in the traditional fashion.
+privileges in the traditional fashion.
+
+1.1 Gemini transactions

-There is currently one kind of Gemini transaction defined:
+There is one kind of Gemini transaction, roughly equivalent to a
+gopher request or a HTTP "GET" request.  Transactions happen as
+follows:

-C:   Opens TLS connection
+C:   Opens connection
 S:   Accepts connection
-C/S: Complete TLs handshake
-C:   Validates TLS connection
-C:   Sends request string (one CRLF terminated line)
+C/S: Complete TLS handshake (see 1.4)
+C:   Validates server certificate (see 1.4.2)
+C:   Sends request (one CRLF terminated line) (see 1.2)
 S:   Sends response header (one CRFL terminated line), closes connection
-     under abnormal conditions (see below)
-S:   Sends response body (text or binary data)
-S:   Closes connection when done
-C:   Handles response
-
-# TLS
-
-Use of TLS is mandatory.  Minimum allowed version 1.3?
-
-Clients can validate TLS connections however they like (including not
-at all) but the strongly recommended approach is to implement a
-lightweight TOFU/pinning system which treats self-signed certificates
-as first- class citizens.  This greatly reduces TLS overhead on the
-network (only one cert needs to be sent, not a whole chain) and
-lowers the barrier to entry for setting up a Gemini site (no need to
-pay a CA or setup a Let's Encrypt cron job, just make a cert and go).
-
-# Requests
-
-Request strings look like this:
-
-
+     under non-success conditions (see 1.3.1, 1.3.2)
+S:   Sends response body (text or binary data) (see 1.3.3)
+S:   Closes connection
+C:   Handles response (see 1.3.4)

- is:
+1.2 Gemini requests

- * UTF-8 encoded text
- * 1024 bytes long at max (too high? too low?)
- * conceptually equivalent to a gopher selector or a HTTP path
+Gemini requests are a single CRLF-terminated line with the following
+structure:

-# Responses
+

-## Response headers
+ is a UTF-8 encoded absolute URL, of maximum length 1024 bytes.
+If the scheme of the URL is not specified, a scheme of gemini:// is
+implied.

-Response headers look like this:
+Sending an absolute URL instead of only a path or selector is
+effectively equivalent to building in a HTTP "Host" header.  It
+permits virtual hosting of multiple Gemini domains on the same IP
+address.  It also allows servers to optionally act as proxies.
+Including schemes other than gemini:// in requests allows servers to
+optionally act as protocol-translating gateways to e.g. fetch gopher
+resources over Gemini.  Proxying is optional and the vast majority of
+servers are expected to only respond to requests for resources at
+their own domain(s).

-
+1.3 Responses

- is a single UTF-8 byte from the following range:
+Gemini response consist of a single CRLF-terminated header line,
+optionally followed by a response body.

- "2" - Success, everything is fine, response follows (cf HTTP 200)
- "4" - Not found, requested  does not exit (cf HTTP 404)
- "5" - Server error, something went wrong, wasn't your fault (cf HTTP
-       500)
+1.3.1 Response headers

-(more may follow.  There WILL be fewer than 10)
+Gemini response headers look like this:

-Status codes under consideration, not guaranteed, feedback welcome:
+ 

- "3" - Moved (cf HTTP 301)
- "9" - Slow down, cowboy (cf HTTP 429)
-
-If a server sends a status byte other than "2" (success) it MUST close
-the connection after the header and not send a response body.
-
-If a server sends a status byte not from the above list, a compliant
-client MUST treat this as equivalent to "5" (server error) and, if the
-server does not close the connection after the non-standard status
-line, the client MUST close the connection itself and not process the
-accompanying response body (Thou shalt not extend Gemini!).
+ is a two-digit numeric status code, as described below in
+1.3.2 and in Appendix 1.

  is a UTF-8 encoded string of maximum length 1024, whose meaning
-is  dependent:
-
-If  = 2, then  is a MIME type.  The full-flavoured
-RFC-defined type, with things like "text/plain; charset=utf-8", so it
-can tell you everything you need to know to correctly consume the
-response body.
+is  dependent.
+
+If  does not belong to the "SUCCESS" range of codes, then the
+server MUST close the connection after sending the header and MUST NOT
+send a response body.
+
+If a server sends a  which is not a two-digit number or a
+ which exceeds 1024, the client SHOULD close the connection and
+disregard the response header, informing the user of an error.
+
+1.3.2 Status codes
+
+Gemini uses two-digit numeric status codes.  Related status codes share
+the same first digit.  Importantly, the first digit of Gemini status
+codes do not group codes into vague categories like "client error" and
+"server error" as per HTTP.  Instead, the first digit alone provides
+enough information for a client to determine how to handle the
+response.  By design, it is possible to write a simple but feature
+complete client which only looks at the first digit.  The second digit
+provides more fine-grained information, for unambiguous server logging,
+to allow writing comfier interactive clients which provide a slightly
+more streamlined user interface, and to allow writing more robust and
+intelligent automated clients like content aggregators, search engine
+crawlers, etc.
+
+The first digit of a response code unambiguously places the response
+into one of six categories, which define the semantics of the 
+line.
+
+1	INPUT
+
+	The requested resource accepts a line of textual user input.
+	The  line is a prompt which should be displayed to the
+	user.  The same resource should then be requested again with
+	the user's input included as a query component.  Queries are
+	included in requests as per the usual generic URL definition
+	in RFC3986, i.e. separated from the path by a ?.  There is no
+	response body.
+
+2	SUCCESS
+
+	The request was handled successfully and a response body will
+	follow the response header.  The  line is a MIME media
+	type which applies to the response body.
+
+3	REDIRECT
+
+	The server is redirecting the client to a new location for the
+	requested resource.  There is no response body.  The header
+	text is a new URL for the requested resource.  The URL may be
+	absolute or relative.  The redirect should be considered
+	temporary, i.e. clients should continue to request the
+	resource at the original address and should not performance
+	convenience actions like automatically updating bookmarks.
+	There is no response body.
+
+4	TEMPORARY FAILURE
+
+	The request has failed.  There is no response body.  The
+	nature of the failure is temporary, i.e. an identical request
+	MAY succeed in the future.  The contents of  may provide
+	additional information on the failure, and should be displayed
+	to human users.
+
+5	PERMANENT FAILURE
+
+	The request has failed.  There is no response body.  The
+	nature of the failure is permanent, i.e. identical future
+	requests will reliably fail for the same reason.  The contents
+	of  may provide additional information on the failure,
+	and should be displayed to human users.  Automatic clients
+	such as aggregators or indexing crawlers should should not
+	repeat this request.
+
+6	CLIENT CERTIFICATE REQUIRED
+
+	The requested resource requires client-certificate
+	authentication to access.  If the request was made without a
+	certificate, it should be repeated with one.  If the request
+	was made with a certificate, the server did not accept it and
+	the request should be repeated with a different certificate.
+	The contents of  may provide additional information on 
+	certificate requirements or the reason a certificate was
+	rejected.
+
+Note that for basic interactive clients for human use, errors 4 and 5
+may be effectively handled identically, by simply displaying the
+contents of  under a heading of "ERROR".  The
+temporary/permanent error distinction is primarily relevant to
+well-behaving automated clients.  Basic clients may also choose not to
+support client-certificate authentication, in which case only four
+distinct status handling routines are required (for statuses beginning
+with 1, 2, 3 or a combined 4-or-5).
+
+The full two-digit system is detailed in Appendix 1.  Note that for
+each of the six valid first digits, a code with a second digit of zero
+corresponds is a generic status of that kind with no special
+semantics.  This means that basic servers without any advanced
+functionality need only be able to return codes of 10, 20, 30, 40 or
+50.
+
+The Gemini status code system has been carefully designed so that the
+increased power (and correspondingly increased complexity) of the
+second digits is entirely "opt-in" on the part of both servers and
+clients.
+
+1.3.3 Response bodies
+
+Response bodies are just raw content, text or binary, ala gopher.
+There is no support for compression, chunking or any other kind of
+content or transfer encoding.  The server closes the connection after
+the final byte, there is no "end of response" signal like gopher's
+lonely dot.
+
+Response bodies only accompany responses whose header indicates a
+SUCCESS status (i.e. a status code whose first digit is 2).  For such
+responses,  is a MIME media type as defined in RFC 2046.

 If a MIME type begins with "text/" and no charset is explicitly given,
 the charset should be assumed to be UTF-8.  Compliant clients MUST
 support UTF-8-encoded text/* responses.  Clients MAY optionally
 support other encodings.  Clients receiving a response in a charset
-they cannot decode SHOULD gracefully inform the user what happened.
+they cannot decode SHOULD gracefully inform the user what happened
+instead of displaying garbage.

-If  is an empty string, the mimetype MUST default to
+If  is an empty string, the MIME type MUST default to
 "text/gemini; charset=utf-8".

-If  != 2,  is some human-friendly extra information
-which clients MAY show their user (like what comes after a HTTP status
-code).
-
-## Response bodies
-
-Response bodies are just raw content, text or binary, ala gopher.  The
-server closes the connection after the final byte, there is no "end of
-response" signal like gopher's lonely dot.
-
-# Response handling
+1.3.4 Response body handling

 Response handling by clients should be informed by the provided MIME
 type information.  Gemini defines one MIME type of its own
-(text/gemini) whose handling is discussed below.  In all other cases,
-clients should do "something sensible" based on the MIME type.
-Minimalistic clients might adopt a strategy of printing all other
-text/* responses to the screen without formatting and saving all
-non-text responses to the disk and that's it.  Clients for unix
-systems may consult /etc/mailcap to find installed programs for
-handling non-text types.
-
-Response bodies of type text/gemini are a kind of ultralight hypertext
-format inspired by gophermaps.  The format is line-based.  Any line
-which begins with the two character prefix "=>" is a link, analogous
-to a gopher menu item with an item type other than "i" (full syntax
-below).  All other lines are just text and should be presented as-is,
-analogous to a gopher menu item with item type "i", but without the
-overhead of the dummy item type, selector, host and port.
+(text/gemini) whose handling is discussed below in 1.3.5.  In all
+other cases, clients should do "something sensible" based on the MIME
+type.  Minimalistic clients might adopt a strategy of printing all
+other text/* responses to the screen without formatting and saving
+all non-text responses to the disk.  Clients for unix systems may
+consult /etc/mailcap to find installed programs for handling non-text
+types.
+
+1.3.5 text/gemini responses
+
+1.3.5.1 Overview
+
+In the same sense that HTML is the "native" response format of HTTP
+and plain text is the native response format of gopher, Gemini defines
+its own native response format - though of course, thanks to the
+inclusion of a MIME type in the response header Gemini can be used to
+serve plain text, rich text, HTML, Markdown, LaTeX, etc.
+
+Response bodies of type "text/gemini" are a kind of lightweight
+hypertext format inspired by gophermaps.  The format is line-based.
+Any line which begins with the two character prefix "=>" is a link,
+analogous to a gopher menu item with an item type other than "i"
+(full syntax below).  All other lines are just text and should be
+presented as-is (although see 1.3.5.3), analogous to a gopher menu
+item with item type "i", but without the overhead of the dummy item
+type, selector, host and port.
+
+1.3.5.2 Link syntax

 Link lines have the following format:

-=> []
+=>[][]
+
+where:

-where  is any non-zero number of consecutive spaces or
-tabs and the square brackets indicate that the user-friendly name is
-optional.  The whitespace between the => and the URL is optional.  All
-the following examples are valid links:
+*  is any non-zero number of consecutive spaces or
+  tabs
+* Square brackets indicate that the enclosed content is
+  optional.
+*  is a URL, which may be absolute or relative.  If the URL does
+  not include a scheme, a scheme of gemini:// is implied.
+
+All the following examples are valid links:

 => gemini://example.org/
 => gemini://example.org/ An example link
 => gemini://example.org/foo	Another example link at the same host
 =>gemini://example.org/bar Yet another example link at the same host
+=> foo/bar/baz.txt	A relative link
+=> 	gopher://example.org:70/1 A gopher link
+
+Note that link URLs may have schemes other than gemini://.  This means
+that Gemini documents can simply and elegantly link to documents
+hosted via other protocols, unlike gophermaps which can only link to
+non-gopher content via a non-standard adaptation of the `h` item-type.
+
+1.3.5.3 Text display
+
+While simple Gemini clients are likely to print non-link lines of
+documents verbatim without any regard for the length of the lines
+relative to the length of the client's display, clients MAY optionally
+apply "reflowing" of text so that long lines are shown cleanly on
+narrow displays.  Reflowing should be performed as per the definition
+of the text/enriched media type in RFC 1896: isolated CRLF pairs are
+translated into a single SPACE character, while sequences of N
+consecutive CRLF pairs are translated into N-1 actual line breaks.
+However, care should be taken that link lines are not flowed into
+text: links should always be displayed on lines of their own.  After
+reflowing has been applied, lines of text too long to be displayed on
+a single line on the client's display may be wrapped at the
+appropriate point.
+
+As a result of this optional reflowing, authors of Gemini content
+MUST NOT assume that they have any control over the fine details of
+how their text is displayed.  Authors should avoid producing content
+which critically depends upon assumptions of a particular line width
+or the use of monospaced fonts.
+
+In order to facilitate a comfortable user experience with simple
+clients which do not implement reflowing, authors SHOULD limit the
+width of lines to 78 characters, excluding CRLF pairs.
+
+1.4 TLS
+
+1.4.1 Version requirements
+
+Use of TLS for Gemini transactions is mandatory.
+
+Servers MUST use TLS version 1.2 or higher and SHOULD use TLS version
+1.3 or higher.  Clients MAY refuse to connect to servers using TLS
+version 1.2 or lower.
+
+1.4.2 Server certificate validation
+
+Clients can validate TLS connections however they like (including not
+at all) but the strongly RECOMMENDED approach is to implement a
+lightweight "TOFU" certificate-pinning system which treats self-signed
+certificates as first- class citizens.  This greatly reduces TLS
+overhead on the network (only one cert needs to be sent, not a whole
+chain) and lowers the barrier to entry for setting up a Gemini site
+(no need to pay a CA or setup a Let's Encrypt cron job, just make a
+cert and go).
+
+TOFU stands for "Trust On First Use" and is public-key security model
+similar to that used by OpenSSH.  The first time a Gemini client
+connects to a server, it accepts whatever certificate it is presented.
+That certificate's fingerprint and expiry date are saved in a
+persistent database (like the .known_hosts file for SSH), associated
+with the server's hostname.  On all subsequent connections to that
+hostname, the received certificate's fingerprint is computed and
+compared to the one in the database.  If the certificate is not the
+one previously received, but the previous certificate's expiry date
+has not passed, the user is shown a warning, analogous to the one web
+browser users are shown when receiving a certificate without a
+signature chain leading to a trusted CA.
+
+This model is by no means perfect, but it is not awful and is vastly
+superior to just accepting self-signed certificates unconditionally.
+
+1.4.3 Transient client certificate sessions
+
+Self-signed client certificates can optionally be used by Gemini
+clients to permit servers to recognise subsequent requests from the
+same client as belonging to a single "session".  This facilitates
+maintaining state in server-side applications.  The functionality is
+very similar to HTTP cookies, but with important differences.
+
+Whereas HTTP cookies are originally created by a webserver and given
+to a client via a response header, client certificates are created by
+the client and given to the server as part of the TLS handshake:
+Client certificates are fundamentally a client-centric means of
+identification.  Further, whereas HTTP cookies can be "resurrected" by
+webservers after a client deletes them if the server recognises the
+client by means of browser finger-printing or some other tracking
+technology (leading to unkillable "super cookies"), if a client
+deletes a client certificate and also the accompanying private key
+(which the server has never seen), then the session ID can never be
+recreated.  Thus clients not only need to opt in to a certificate
+session, but once they have done so they retain a guaranteed ability
+to opt out of it at any point and the server cannot defeat this
+ability.
+
+Gemini requests typically will be made without a client certificate
+being sent to the server.  If a requested resource is part of a
+server-side application which requires persistent state, a Gemini
+server can return a status code of 61 (see Appendix 1 below) to
+request that the client repeat the request with a "transient
+certificate" to initiate a client certificate section.
+
+Interactive clients for human users MUST inform users that such a
+session has been requested and require the user to approve generation
+of such a certificate.  Transient certificates MUST NOT be generated
+automatically.
+
+Transient certificates are limited in scope to a particular domain.
+Transient certificates MUST NOT be reused across different domains.
+
+Transient certificates MUST be permanently deleted when the matching
+server issues a response with a status code of 21 (see Appendix 1
+below).
+
+Transient certificates MUST be permanently deleted when the client
+process terminates.
+
+Transient certificates SHOULD be permanently deleted after not having
+been used for more than 24 hours.
+
+Appendix 1. Full two digit status codes
+
+10	INPUT
+
+	As per definition of single-digit code 1 in 1.3.2.
+
+20	SUCCESS
+
+	As per definition of single-digit code 2 in 1.3.2.
+
+21	SUCCESS - END OF CLIENT CERTIFICATE SESSION
+
+	The request was handled successfully and a response body will
+	follow the response header.  The  line is a MIME media
+	type which applies to the response body.  In addition, the
+	server is signalling the end of a transient client certificate
+	session which was previously initiated with a status 61
+	response.  The client should immediately and permanently
+	delete the certificate and accompanying private key which was
+	used in this request.
+
+30	REDIRECT - TEMPORARY
+
+	As per definition of single-digit code 3 in 1.3.2.
+
+31	REDIRECT - PERMANENT
+
+	The requested resource should be consistently requested from
+	the new URL provided in future.  Tools like search engine
+	indexers or content aggregators should update their
+	configurations to avoid requesting the old URL, and end-user
+	clients may automatically update bookmarks, etc.  Note that
+	clients which only pay attention to the initial digit of
+	status codes will treat this as a temporary redirect.  They
+	will still end up at the right place, they just won't be able
+	to make use of the knowledge that this redirect is permanent,
+	so they'll pay a small performance penalty by having to follow
+	the redirect each time.
+
+40	TEMPORARY FAILURE
+
+	As per definition of single-digit code 4 in 1.3.2.
+
+41	SERVER UNAVAILABLE
+
+	The server is unavailable due to overload or maintenance.
+	(cf HTTP 503)
+
+42	CGI ERROR
+
+	A CGI process, or similar system for generating dynamic
+	content, died unexpectedly or timed out.
+
+43	PROXY ERROR
+
+	A proxy request failed because the server was unable to
+	successfully complete a transaction with the remote host.
+	(cf HTTP 502, 504)
+
+44	SLOW DOWN
+
+	Rate limiting is in effect.   is an integer number of
+	seconds which the client must wait before another request is
+	made to this server.
+	(cf HTTP 429)
+
+50	PERMANENT FAILURE
+	
+	As per definition of single-digit code 5 in 1.3.2.
+
+51	NOT FOUND
+
+	The requested resource could not be found but may be available
+	in the future.
+	(cf HTTP 404)
+	(struggling to remember this important status code?  Easy:
+	you can't find things hidden at Area 51!)
+
+52	GONE
+
+	The resource requested is no longer available and will not be
+	available again.  Search engines and similar tools should
+	remove this resource from their indices.  Content aggregators
+	should stop requesting the resource and convey to their human
+	users that the subscribed resource is gone.
+	(cf HTTP 410)
+
+53	PROXY REQUEST REFUSED
+
+	The request was for a resource at a domain not served by the
+	server and the server does not accept proxy requests.
+
+59	BAD REQUEST
+
+	The server was unable to parse the client's request,
+	presumably due to a malformed request.
+	(cf HTTP 400)
+
+60	CLIENT CERTIFICATE REQUIRED
+
+	As per definition of single-digit code 6 in 1.3.2.
+
+61	TRANSIENT CERTIFICATE REQUESTED
+
+	The server is requesting the initiation of a transient client
+	certificate session, as described in 1.4.3.  The client should
+	ask the user if they want to accept this and, if so, generate
+	a disposable key/cert pair and re-request the resource using it.
+	The key/cert pair should be destroyed when the client quits,
+	or some reasonable time after it was last used (24 hours?
+	Less?)
+
+62	AUTHORISED CERTIFICATE REQUIRED
+
+	This resource is protected and a client certificate which the
+	server accepts as valid must be used - a disposable key/cert
+	generated on the fly in response to this status is not
+	appropriate as the server will do something like compare the
+	certificate fingerprint against a white-list of allowed
+	certificates.  The client should ask the user if they want to
+	use a pre-existing certificate from a stored "key chain".
+
+63	CERTIFICATE NOT ACCEPTED
+
+	The supplied client certificate is not valid for accessing the
+	requested resource.
+
+64	FUTURE CERTIFICATE REJECTED
+
+	The supplied client certificate was not accepted because its
+	validity start date is in the future.
+
+65	EXPIRED CERTIFICATE REJECTED
+
+	The supplied client certificate was not accepted because its
+	expiry date has passed.

-----END OF PAGE-----