repo: gemini-site action: commit revision: path_from: revision_from: 5a9bc6a735c0648ac90a5e3e39ed8c6670cb3120: path_to: revision_to:
commit 5a9bc6a735c0648ac90a5e3e39ed8c6670cb3120
Author: Solderpunk
Date:   Sun Jun 7 15:33:33 2020 +0000

    v0.13.0 spec updates.

    * Added lang parameter to text/gemini
    * Clarified need to percent-encode URLs
    * Required clients not to automatically follow links
    * Redefined list item lines to start with "* "
    * Defined quote line
    * Defined status code 11

diff --git a/docs/specification.gmi b/docs/specification.gmi
--- a/docs/specification.gmi
+++ b/docs/specification.gmi
@@ -2,7 +2,7 @@

 ## Speculative specification

-v0.12.4, June 7th 2020
+v0.13.0, June 7th 2020

 This is an increasingly less rough sketch of an actual spec for Project Gemini. Although not finalised yet, further changes to the specification are likely to be relatively small. You can write code to this pseudo-specification and be confident that it probably won't become totally non-functional due to massive changes next week, but you are still urged to keep an eye on ongoing development of the protocol and make changes as required.

@@ -69,7 +69,7 @@ The first digit of a response code unambiguously places the response into one of

 Status codes beginning with 1 are INPUT status codes, meaning:

-The requested resource accepts a line of textual user input. The line is a prompt which should be displayed to the user. The same resource should then be requested again with the user's input included as a query component. Queries are included in requests as per the usual generic URL definition in RFC3986, i.e. separated from the path by a ?. There is no response body.
+The requested resource accepts a line of textual user input. The line is a prompt which should be displayed to the user. The same resource should then be requested again with the user's input included as a query component. Queries are included in requests as per the usual generic URL definition in RFC3986, i.e. separated from the path by a ?. Reserved characters used in the user's input must be "percent-encoded" as per RFC3986, and space characters should also be percent-encoded.

 ### 3.2.2 2x (SUCCESS)

@@ -173,31 +173,46 @@ Response bodies of type "text/gemini" are a kind of lightweight hypertext format

 Similar to how the two-digit Gemini status codes were designed so that simple clients can function correctly while ignoring the second digit, the text/gemini format has been designed so that simple clients can ignore the more advanced features and still remain very usable.

-## 5.2 Line-orientation
+## 5.2 Parameters
+
+As a subtype of the top-level media type "text", "text/gemini" inherits the "charset" parameter defined in RFC 2046. However, as noted in 3.3, the default value of "charset" is "UTF-8" for "text" content transferred via Gemini.
+
+A single additional parameter specific to the "text/gemini" subtype is defined: the "lang" parameter. The value of "lang" denotes the natural language or language(s) in which the textual content of a "text/gemini" document is written. The presence of the "lang" parameter is optional. When the "lang" parameter is present, its interpretation is defined entirely by the client. For example, clients which use text-to-speech technology to make Gemini content accessible to visually impaired users may use the value of "lang" to achieve improve pronounciation of content. Clients which render text to a screen may use the value of "lang" to determine whether text should be displayed left-to-right or right-to-left. Simple clients for users who only read languages written left-to-right may simply ignore the value of "lang". When the "lang" parameter is not present, no default value should be assumed and clients which require some notion of a language in order to process the content (such as text-to-speech screen readers) should rely on user-input to determine how to proceed in the absence of a "lang" parameter.
+
+Valid values for the "lang" parameter are comma-separated lists of one or more language tags as defined in RFC4646. For example:
+
+* "text/gemini; lang=en" Denotes a text/gemini document written in English
+* "text/gemini; lang=fr" Denotes a text/gemini document written in French
+* "text/gemini; lang=en,fr" Denotes a text/gemini document written in a mixture of English and French
+* "text/gemini; lang=de-CH" Denotes a text/gemini document written in Swiss German
+* "text/gemini; lang=sr-Cyrl" Denotes a text/gemini document written in Serbian using the Cyrllic script
+* "text/gemini; lang=zh-Hans-CN" Denotes a text/gemini document written in Chinese using the Simplified script as used in mainland China
+
+## 5.3 Line-orientation

 As mentioned, the text/gemini format is line-oriented. Each line of a text/gemini document has a single "line type". It is possible to unambiguously determine a line's type purely by inspecting its first three characters. A line's type determines the manner in which it should be presented to the user. Any details of presentation or rendering associated with a particular line type are strictly limited in scope to that individual line.

-There are 6 different line types in total. However, a fully functional and specification compliant Gemini client need only recognise and handle 4 of them - these are the "core line types", (see 5.3). Advanced clients can also handle the additional "advanced line types" (see 5.4). Simple clients can treat all advanced line types as one of the core line types and still offer an adequate user experience.
+There are 7 different line types in total. However, a fully functional and specification compliant Gemini client need only recognise and handle 4 of them - these are the "core line types", (see 5.4). Advanced clients can also handle the additional "advanced line types" (see 5.5). Simple clients can treat all advanced line types as equivalent to one of the core line types and still offer an adequate user experience.

-## 5.3 Core line types
+## 5.4 Core line types

 The four core line types are:

-### 5.3.1 Text lines
+### 5.4.1 Text lines

 Text lines are the most fundamenal line type - any line which does not match the definition of another line type defined below defaults to being a text line. The majority of lines in a typical text/gemini document will be text lines.

-Text lines should be presented to the user, after being wrapped to the appropriate width for the client's viewport (see below). Text lines may be presented to the user in a visually pleasing manner for general reading, the precise meaning of which is at the client's discretion. For example, variable width fonts may be used, spacing may be normalised, with spaces between sentences being made wider than spacing between words, and other such typographical niceties may be applied. Clients may permit users to customise the appearance of text lines by altering the font, font size, text and background colour, etc. Authors should not expect to exercise any control over the precise rendering of their text lines, only of their actual textual content. Content such as ASCII art, computer source code, etc. which may appear incorrectly when treated as such should be enclosed beween preformatting toggle lines (see 5.3.3).
+Text lines should be presented to the user, after being wrapped to the appropriate width for the client's viewport (see below). Text lines may be presented to the user in a visually pleasing manner for general reading, the precise meaning of which is at the client's discretion. For example, variable width fonts may be used, spacing may be normalised, with spaces between sentences being made wider than spacing between words, and other such typographical niceties may be applied. Clients may permit users to customise the appearance of text lines by altering the font, font size, text and background colour, etc. Authors should not expect to exercise any control over the precise rendering of their text lines, only of their actual textual content. Content such as ASCII art, computer source code, etc. which may appear incorrectly when treated as such should be enclosed beween preformatting toggle lines (see 5.4.3).

 Blank lines are instances of text lines and have no special meaning. They should be rendered individually as vertical blank space each time they occur. In this way they are analogous to <br/> tags in HTML. Consecutive blank lines should NOT be collapsed into a fewer blank lines. Note also that consecutive non-blank text lines do not form any kind of coherent unit or block such as a "paragraph": all text lines are independent entities.

 Text lines which are longer than can fit on a client's display device SHOULD be "wrapped" to fit, i.e. long lines should be split (ideally at whitespace or at hyphens) into multiple consecutive lines of a device-appropriate width. This wrapping is applied to each line of text independently. Multiple consecutive lines which are shorter than the client's display device MUST NOT be combined into fewer, longer lines.

-In order to take full advantage of this method of text formatting, authors of text/gemini content SHOULD avoid hard-wrapping to a specific fixed width, in contrast to the convention in Gopherspace where text is typically wrapped at 80 characters or fewer. Instead, text which should be displayed as a contiguous block should be written as a single long line. Most text editors can be configured to "soft-wrap", i.e. to write this kind of file while displaying the long lines wrapped to fit the author's display device.
+In order to take full advantage of this method of text formatting, authors of text/gemini content SHOULD avoid hard-wrapping to a specific fixed width, in contrast to the convention in Gopherspace where text is typically wrapped at 80 characters or fewer. Instead, text which should be displayed as a contiguous block should be written as a single long line. Most text editors can be configured to "soft-wrap", i.e. to write this kind of file while displaying the long lines wrapped at word boundaries to fit the author's display device.

 Authors who insist on hard-wrapping their content MUST be aware that the content will display neatly on clients whose display device is as wide as the hard-wrapped length or wider, but will appear with irregular line widths on narrower clients.

-### 5.3.2 Link lines
+### 5.4.2 Link lines

 Lines beginning with the two characters "=>" are link lines, which have the following syntax:

@@ -225,23 +240,25 @@ All the following examples are valid link lines:
 => gopher://example.org:70/1 A gopher link
 ```

+URLs in link lines must have reserved characters and spaces percent-encoded as per RFC 3986.
+
+Note that link URLs may have schemes other than gemini://. This means that Gemini documents can simply and elegantly link to documents hosted via other protocols, unlike gophermaps which can only link to non-gopher content via a non-standard adaptation of the `h` item-type.

-Clients can present links to users in whatever fashion the client author wishes.
+Clients can present links to users in whatever fashion the client author wishes, however clients MUST NOT automatically make any network connections as part of displaying links whose scheme corresponds to a network protocol (e.g. gemini://, gopher://, https://, ftp://, etc.).

-### 5.3.3 Preformatting toggle lines
+### 5.4.3 Preformatting toggle lines

-Any line whose first three characters are "```" (i.e. three consecutive back ticks with no leading whitespace) are preformatted toggle lines. These lines should NOT be included in the rendered output shown to the user. Instead, these lines toggle the parser between preformatted mode being "on" or "off". Preformatted mode should be "off" at the beginning of a document. The current status of preformatted mode is the only internal state a parser is required to maintain. When preformatted mode is "on", the usual rules for identifying line types are suspended, and all lines should be identified as preformatted text lines (see 5.3.4).
+Any line whose first three characters are "```" (i.e. three consecutive back ticks with no leading whitespace) are preformatted toggle lines. These lines should NOT be included in the rendered output shown to the user. Instead, these lines toggle the parser between preformatted mode being "on" or "off". Preformatted mode should be "off" at the beginning of a document. The current status of preformatted mode is the only internal state a parser is required to maintain. When preformatted mode is "on", the usual rules for identifying line types are suspended, and all lines should be identified as preformatted text lines (see 5.4.4).

 Preformatting toggle lines can be thought of as analogous to <pre> and </pre> tags in HTML.

-### 5.3.4 Preformatted text lines
+### 5.4.4 Preformatted text lines

 Preformatted text lines should be presented to the user in a "neutral", monowidth font without any alteration to whitespace or stylistic enhancements. Graphical clients should use scrolling mechanisms to present preformatted text lines which are longer than the client viewport, in preference to wrapping. In displaying preformatted text lines, clients should keep in mind applications like ASCII art and computer source code: in particular, source code in langugaes with significant whitespace (e.g. Python) should be able to be copied and pasted from the client into a file and interpreted/compiled without any problems arising from the client's manner of displaying them.

 ## 5.4 Advanced line types

-The following advanced line types MAY be recognised by advanced clients. Simple clients may treat them all as text lines as per 5.3.1 without any loss of essential function.
+The following advanced line types MAY be recognised by advanced clients. Simple clients may treat them all as text lines as per 5.4.1 without any loss of essential function.

 ### 5.4.1 Heading lines

@@ -252,7 +269,11 @@ heading in the file as a human-friendly title.

 ### 5.4.2 Unordered list items

-Lines beginning with a * are unordered list items. This line type exists purely for stylistic reasons. The * may be replaced in advanced clients by a bullet symbol. Any text after the * character should be presented to the user as if it were a text line, i.e. wrapped to fit the viewport and formatted "nicely". Advanced clients can take the space of the bullet symbol into account when wrapping long list items to ensure that all lines of text corresponding to the item are offset an equal distance from the left of the screen.
+Lines beginning with "* " are unordered list items. This line type exists purely for stylistic reasons. The * may be replaced in advanced clients by a bullet symbol. Any text after the "* " should be presented to the user as if it were a text line, i.e. wrapped to fit the viewport and formatted "nicely". Advanced clients can take the space of the bullet symbol into account when wrapping long list items to ensure that all lines of text corresponding to the item are offset an equal distance from the left of the screen.
+
+### 5.4.3 Quote lines
+
+Lines beginning with ">" are quote lines. This line type exists so that advanced clients may use distinct styling to convey to readers the important semantic information that certain text is being quoted from an external source. For example, when wrapping long lines to the the viewport, each resultant line may have a ">" symbol placed at the front.

 # Appendix 1. Full two digit status codes

@@ -260,6 +281,10 @@ Lines beginning with a * are unordered list items. This line type exists purely

 As per definition of single-digit code 1 in 3.2.

+## 11 SENSITIVE INPUT
+
+As per status code 10, but for use with sensitive input such as passwords. Clients should present the prompt as per status code 10, but the user's input should not be echoed to the screen to prevent it being read by "shoulder surfers".
+
 ## 20 SUCCESS

 As per definition of single-digit code 2 in 3.2.
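
Reader's note (not part of the commit): the line-type rules as they stand after this change can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not a reference implementation. The function name `classify_lines` and the type labels are hypothetical, and the "#" prefix for heading lines is taken from the wider specification, since the heading definition itself is not shown in this diff.

```python
from typing import Iterable, Iterator, Tuple


def classify_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (line_type, line) pairs for the lines of a text/gemini document.

    Hypothetical helper: the labels are illustrative names for the seven
    line types, not identifiers defined by the specification.
    """
    preformatted = False  # the only state a parser is required to keep
    for line in lines:
        if line.startswith("```"):
            # Toggle lines flip preformatted mode and are not rendered.
            preformatted = not preformatted
            yield ("toggle", line)
        elif preformatted:
            # Normal type detection is suspended in preformatted mode.
            yield ("preformatted", line)
        elif line.startswith("=>"):
            yield ("link", line)
        elif line.startswith("#"):
            # Assumption: heading prefix as defined elsewhere in the spec.
            yield ("heading", line)
        elif line.startswith("* "):
            # As of v0.13.0 a list item needs "* ", not just "*".
            yield ("list_item", line)
        elif line.startswith(">"):
            yield ("quote", line)
        else:
            # Anything else, including blank lines, is a text line.
            yield ("text", line)
```

A simple client could map "heading", "list_item" and "quote" to plain text rendering, as the specification permits, and still handle the four core types correctly.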