Experiments with client-side scripting

auragem.ddns.net/techlog/20250630.gmi

I don't have time for an in-depth post about this, but a few hot takes:

I agree with @clseibold that this is not really in line with the Gemini specification.
In practice, there is little value to be gained. Client-side environments are too varied (from GUI to terminal, desktop to mobile); there is no one scripting language that people can rely on, like there is JavaScript on the web.
Time is much better spent doing server-side scripting (without clients knowing about it), where scripts can easily be embedded in a custom variant of Gemtext that the server reads as input, as long as it outputs standard Gemtext.
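To make the server-side idea concrete, here is a minimal sketch in Python. The `{{name}}` directive syntax, the `render` function and the `today` handler are all made up for illustration; they are not part of any spec or existing server. The point is only that the client receives plain Gemtext and never learns a script ran.

```python
import datetime
import re

# Hypothetical directive syntax (an assumption): a line consisting only of "{{name}}".
DIRECTIVE = re.compile(r"^\{\{(\w+)\}\}$")


def render(source: str, handlers: dict) -> str:
    """Expand directive lines into plain Gemtext; pass every other line through unchanged."""
    out = []
    for line in source.splitlines():
        match = DIRECTIVE.match(line.strip())
        if match and match.group(1) in handlers:
            # Replace the directive line with whatever the server-side handler returns.
            out.append(handlers[match.group(1)]())
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical handler table; each handler returns one line of standard Gemtext.
HANDLERS = {
    "today": lambda: datetime.date.today().isoformat(),
}

page = render("# Hello\n{{today}}\n=> /about.gmi About", HANDLERS)
```

Unknown directives are passed through untouched, so a typo in the source shows up in the output instead of silently vanishing.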

Jun 30 · 6 months ago · 👍 jsreed5, stack, clseibold

11 Comments ↓

When I wrote my log about this issue, I had missed the fact that the spec describes text lines as having no semantics. That certainly dissuades the use of text itself to present meaningful data to the client. However, I still worry about two (related) possible eventualities with Gemini clients, of which this scripting question is an example.

My first worry is that an entity would use legalistic word games to bend the spirit of the spec ("Text lines have no semantics, but per the spec, preformatting lines are not text lines", "Nowhere in the spec does it say we MUST present textual data to the user--only that we SHOULD", "Processing embedded code in the preformatting line is part of our rendering process and is therefore allowed", etc.). My second, more specific worry is that an entity would use some sort of trickery with the spec to justify their clients making extra network connections without the informed consent of the user.

The idea that it's perfectly fine for a Web browser to make arbitrary network requests and run arbitrary code is, I feel, at the heart of what makes the modern Web so terrible. My opinion is that Gemini should always have a one-request/one-response document retrieval paradigm, and I would support that paradigm being explicitly enforced in-spec if necessary.
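For anyone less familiar with the protocol, the one-request/one-response shape can be sketched in a few lines of Python. This is only an illustration with no networking or TLS; the function names `build_request` and `parse_response` are my own. A request is the absolute URL followed by CRLF, and a response is a single header line (status, space, meta) followed by the body.

```python
from urllib.parse import urlsplit


def build_request(url: str) -> bytes:
    """A Gemini request is just the absolute URL followed by CRLF."""
    if urlsplit(url).scheme != "gemini":
        raise ValueError("not a gemini:// URL")
    return url.encode("utf-8") + b"\r\n"


def parse_response(raw: bytes):
    """Split one complete response into (status, meta, body)."""
    header, _, body = raw.partition(b"\r\n")
    status, _, meta = header.decode("utf-8").partition(" ")
    return int(status), meta, body
```

Nothing in the parsed body triggers a second request here. Any follow-up fetch would have to come from the user selecting a link.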

🍀 meidam · Jun 30 at 19:40:

@jsreed5 Well...

If you had edited the comment in the draft editor instead of just commenting it straight in the post, you could have added three text segments in one comment instead of posting three comments.

Even without Titan. I have never used Titan before.

You can enter the draft editor (or whatever it's called) when posting a comment on a post by ending it with a backslash.

🚀 jsreed5 · Jun 30 at 20:00:

@meidam Thank you for the tip! I knew I could do that with posts, but not with comments. I've merged my comments into one so as not to clutter up the replies.

🚀 clseibold [🛂] · Jun 30 at 21:42:

@jsreed5 You do make some good points, but I just feel that if people are going to try to use technicalities to skirt the spec, then they are going to try to do that anyway, regardless of what the spec says. While a spec cannot validly be interpreted any way one wants, people are still going to try to do it and claim it's valid.

I do want to note that the spec does say preformatted lines are text lines, under the "In pre-formatted mode" section:

> Any line which does not begin with the three characters "```" is a text line.

However, the rules about no semantics in text lines are explicitly under the "In normal mode" section. So one could use some trickery to say that preformatted text lines can have semantics, but normal mode text lines cannot.
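The two modes are easy to see in a toy parser. The sketch below is not taken from any real client; `classify` and the mode labels are names chosen for illustration. It only tracks the toggle and reports which mode each line falls in.

```python
# The three-backtick toggle marker, built this way so the example's own fence stays intact.
FENCE = "`" * 3


def classify(gemtext: str):
    """Yield (mode, line) pairs, flipping between normal and preformatted mode at each toggle."""
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith(FENCE):
            preformatted = not preformatted
            yield ("toggle", line)
        elif preformatted:
            # In pre-formatted mode every non-toggle line is a text line, even "=>" lines.
            yield ("preformatted", line)
        else:
            yield ("normal", line)
```

A line such as `=> gemini://example.org/` is therefore a link in normal mode but plain text inside a preformatted block, which is the distinction being argued over.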

As for the no background requests thing, I thought that was, or at least used to be, in the spec? Hm, I'll have to check again. The problem with this is there are some good and perfectly valid reasons to make what are basically background requests, and one of these is already in use quite a bit: the Gemsub feed aggregator in Lagrange!

🚀 jsreed5 · Jun 30 at 22:17:

@clseibold I should clarify that I don't object to automated or background requests in general. What's important is that the user is made aware of them and in some way proactively permits them (adding feeds to a config, requesting URLs programmatically, selecting a link in the UI, etc.). The mainline Internet has moved far in the opposite direction; every Web page silently downloads and runs dozens of scripts, taking the position that if the user didn't explicitly say no to running them, their default answer is yes. Gemini rejects that paradigm--not by prescription but by culture.
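As a rough illustration of what "proactively permits" could mean inside a client, here is a hypothetical opt-in policy in Python. `FetchPolicy` and its methods are invented for this sketch and do not describe how Lagrange or any other client works.

```python
class FetchPolicy:
    """Allow a request only if the user triggered it or opted in to it beforehand."""

    def __init__(self):
        # URLs the user has explicitly subscribed to (for example, feeds added to a config).
        self.subscriptions = set()

    def subscribe(self, url: str) -> None:
        """Meant to be called only from an explicit user action, never from document content."""
        self.subscriptions.add(url)

    def may_fetch(self, url: str, user_initiated: bool) -> bool:
        """Background fetches pass only for subscribed URLs; user-initiated ones always pass."""
        return user_initiated or url in self.subscriptions
```

The default answer is no: a document cannot add itself to `subscriptions`, so nothing is fetched in the background until the user says so.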

Looking at the current Gemtext specification, it seems preformatted text is allowed to have semantics, at least implicitly. One suggested use of alt text in the preformatting toggle line is to indicate the language the following block is written in, which "advanced clients may use to for syntax highlighting." This would require parsing the lines in the block more meaningfully than arbitrary textual data. Since the alt text part of the spec is fairly brief, I wonder if its implications may have been overlooked.
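To show what that parsing might involve, here is a small Python sketch that pulls the alt text off each toggle line and pairs it with the block that follows. The function name `preformatted_blocks` is hypothetical, and what a client does with the alt text afterwards (highlighting or anything else) is exactly the open question.

```python
# The three-backtick toggle marker, built this way so the example's own fence stays intact.
FENCE = "`" * 3


def preformatted_blocks(gemtext: str):
    """Return a list of (alt_text, lines) pairs, one per preformatted block."""
    blocks = []
    current = None
    for line in gemtext.splitlines():
        if line.startswith(FENCE):
            if current is None:
                # Opening toggle: anything after the marker is treated as alt text.
                current = (line[len(FENCE):].strip(), [])
            else:
                # Closing toggle: the block is complete.
                blocks.append(current)
                current = None
        elif current is not None:
            current[1].append(line)
    if current is not None:
        # Unterminated block at the end of the document.
        blocks.append(current)
    return blocks
```

Simple clients can ignore the first element of each pair entirely and still render the block correctly.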

🦂 zzo38 · Jun 30 at 22:33:

Reading the linked documents and those linked from them, as well as the message on geminibbs, I will respond.

About what @skyjake wrote in 29783, I agree with all three points made.

In response to:

— ultimatumlabs.com/gemlog/20250504.gmi

Plain JavaScript without any other non-core functions other than "print" (and possibly requesting inputs as well) would be possible. A client should never automatically execute such scripts, although some clients could have a command to execute a script that is available for any preformatted block containing a script. This should be unnecessary and these scripts should be discouraged, but it would be possible to do this if you really want to.

My opinion (like many other people) is that JavaScripts should not be implemented in Gemini. (Gemini FAQ 1.13 says "Gemini is not intended to replace either Gopher or the web, but to co-exist peacefully alongside them ..." so you can still use JavaScripts in HTML with HTTP(S) anyways, if you like to do that.)

There is also another response to that previously mentioned article:

— freeshell.de/gemlog/2025-06-28__I_d_never_do_this__sounds_like_a_challenge.gmi

In response to:

— ultimatumlabs.com/gemlog/20250630.gmi

> You could roll dice or offer Magic 8 Ball answers with each page refresh. You could display the date in a locale specific format or maybe construct randomized ascii art. How about showing sunrise and sunset times for the browser's locale?

If it is done on the client, then generating random numbers shouldn't require reloading the page each time. If it does need reloading, then doing it on the server side will work better anyway and has improved compatibility (a client-side program would be useful if you wanted to save a local copy and use it without an internet connection, but I think that uxn/varvara might be better for this anyway). Scorpion file format does have a way to display a date in a locale specific format, without needing client-side scripting, and is intended to be made in such a way that clients that do not implement it will ignore it and still display the date correctly.

> This would allow client-side text games and other programs without the risk of tracking. For example, I could port my Zoe 3d Tic Tac Toe game to JavaScript.

You do not need scripts inside of documents in order to make this, though. You can also use other VM systems which are independent from the document, e.g. uxn/varvara, NES/Famicom, Infocom Z-machine, etc. (It is still possible to send any kind of file, although the client is not (and should not be) guaranteed to be able to display or execute them.)

> I could even perversely argue that playing Zoe on the client side would afford greater privacy than playing the networked version.

This is possible, although server-side implementations can still be useful for clients that want to use it for any reason. There is also the issue that the server sends a program that is somehow "marked" for some kind of tracking (even if it does not connect to the internet, someone who copies the text or makes screenshots etc); however, if the hashes of the programs are distributed then that problem can also be mitigated.
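The hash check itself is simple. Below is a minimal Python sketch, assuming the published hash is a SHA-256 hex digest; the choice of SHA-256 and the function name `matches_published_hash` are assumptions made for illustration.

```python
import hashlib
import hmac


def matches_published_hash(program: bytes, published_hex: str) -> bool:
    """Return True if the received program hashes to the publicly distributed digest."""
    digest = hashlib.sha256(program).hexdigest()
    # compare_digest compares the two hex strings in constant time.
    return hmac.compare_digest(digest, published_hex.lower())
```

If a server hands out a copy with a per-user "mark" in it, its digest will differ from the one everybody else sees, and the check fails.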

> So no, I'm not seriously proposing scripting for Gemini. Just CSS.

I do not want CSS either.

Another thing that I can say will be: In my opinion, the non-extensibility does not actually work as well as they (or you or I) might want it to be.

TerseNet (which is not enough to actually be used) allows attaching WebAssembly programs to a document, but they are not downloaded or executed unless explicitly launched by the user. (The part that it doesn't work unless explicitly launched by the user, is a good idea, but I think there are other problems with TerseNet though.)

Scorpion protocol also has a way to do client-side scripting but in a different way; it uses the uxn instruction set and not JavaScript or WebAssembly, and has the following deliberate restrictions:

It is optional to implement.
Scripts cannot be included in a document; they can only be used in the conversion file (which is also optional to implement, and does more things than only client-side scripting anyways).
The conversion file is never automatically downloaded or processed; the end user must explicitly do so. (This is part of the "one-request/one-response" that Gemini and other small-web protocols also have.)
If the conversion file is implemented, it is mandatory that the end user is able to substitute their own conversion file.
It is expected that most implementations would normally restrict I/O, although this may be configurable by the end user.
uxn and varvara are simpler to implement than JavaScript and WebAssembly (and, like the Scorpion document file format and the conversion file and many others, it is also a binary file format).

I should also mention "gemiweb0", which is designed to be a subset of WWW, and includes (a subset of) JavaScripts. So, it would also be possible to use that (serving an HTML file with HTTP(S)) if you want to use JavaScripts; a multi-protocol browser could implement multiple protocols and file-formats and some implementations could potentially include this too (I think this might be a part of the intention, at least I seem to remember it is what they told me). (I think that gemiweb0 should still specify such recommendations as e.g. that it must be possible to disable JavaScripts and CSS and some other features, to avoid needing JavaScripts and CSS for files if it could be made to not need it, and other recommendations to avoid some of the problems of WWW.)

Response to 29791 @jsreed5

> The idea that it's perfectly fine for a Web browser to make arbitrary network requests and run arbitrary code is, I feel, at the heart of what makes the modern Web so terrible.

I agree this is a part of it (it is not the only problem, but it is a very significant part of the problems).

> My opinion is that Gemini should always have a one-request/one-response document retrieval paradigm, and I would support that paradigm being explicitly enforced in-spec if necessary.

I agree with this too, at least as the default working (but see below). (However, a specification does not guarantee that everyone will implement it according to the specification.)

Response to 29798 @clseibold

> However, the rules about no semantics in text lines are explicitly under the "In normal mode" section. So one could use some trickery to say that preformatted text lines can have semantics, but normal mode text lines cannot.

It might be a mistake in the document.

> As for the no background requests thing, I thought that was, or at least used to be, in the spec? Hm, I'll have to check again. The problem with this is there are some good and perfectly valid reasons to make what are basically background requests, and one of these is already in use quite a bit: the Gemsub feed aggregator in Lagrange!

Even if that is the case, it should require the end user to explicitly configure that capability.

🚀 clseibold [🛂] · Jun 30 at 23:01:

@jsreed5 Btw, completely agree about the background connections. I think they are fine as long as there's some action of the user to enable them, just like you were saying.

As for preformatted lines, you are technically correct that they have semantics in a sense, but I do think the spec emphasizes rendering/display to the exclusion of execution, which is really the bigger point in my post, imo.

And there's a good reason I took this approach in my post. I have a B.A. in Theology, and one of the things you learn is that some people interpret Scripture in a way where the lack of mentioning something means that thing is disallowed. For example, the Bible doesn't have/mention gay marriage, therefore gay marriage is a sin, or more commonly, marriage is only between a man and a woman.

Now this approach to hermeneutics is a problem for me, because I'm obviously gay, and my personal experience tells me that God cannot find gay marriage, let alone being gay, a sin.

Therefore, I cannot say that the Bible never talking about gay marriage makes it a thing that should never exist. So failing to mention something is not the same as disallowing it. I promise this is relevant to this conversation :D

This same interpretive methodology is also used in the law and constitution. If the Constitution fails to mention a particular right at all, does it then disallow that right? I say no, other people might say yes. (Although the trick is our Bill of Rights in the US does explicitly say that there are more rights of a person than what might be mentioned in the Bill of Rights, which technically covers this problem just fine.)

So if I personally cannot say that failing to mention something is the same as disallowing it, then I also wouldn't be able to say the spec disallows scripting, because it *does* in fact fail to explicitly disallow scripting.

Therefore, my approach in the article is to talk about how it explicitly attributes display and rendering to each line, and how this conflicts with a potential execution purpose for any particular line. If a line's purpose is for display, it cannot also be for execution.

So with the preformatted lines, you are right that it doesn't explicitly disallow scripting, and it also allows semantic parsing. However, here's the trick:

> advanced clients may use to for syntax highlighting

The trick is that it says "for syntax highlighting". This to me says that the only purpose of the semantic parsing of preformatted lines is for syntax highlighting, which is yet another method of rendering/display.

🦎 bluesman · Jul 01 at 00:30:

It sounds like we all agree. More or less.

My guess is that this topic was especially interesting to JBanana and myself because we've both written Java Gemini clients and scripting in Java is relatively easy to implement. With GraalVM, you can even write interpreted scripts in Python and Ruby (or any language that can be compiled into LLVM bitcode, like C and Rust). I support three different scripting engines in my photo management software to create image filters and automate tasks. I find scripting extremely useful in that scenario. (It also brings me back to the days of ARexx on the Amiga.)

I have no plans on adding scripting to Alhena. That said, I have no problem with anyone implementing experimental features as long as those features can be switched off. Is that controversial?

Also, my comment about adding CSS was a joke, both because I don't think adding a fontsize tag to a text block is in any way equivalent to CSS and because the statement itself is a bit inflammatory if taken seriously. Comedy is not my strong suit. Anyway, the only issue with that idea that I can see (since it doesn't break anything in existing clients) is that other authors might feel obliged to add a feature not explicitly in the spec - especially if it started gaining traction. That said, I wasn't around for ANSI color or inline images so I'm no expert.

🚀 clarahd · Jul 01 at 17:01:

There was an interesting post about "How to Kill a Decentralised Network (such as the Fediverse)":

— bbs.geminispace.org/s/Fediverse/17348

— ploum.net/2023-06-23-how-to-kill-decentralised-networks.gmi

From a humble user perspective, my OperaMini browser gets refused access to a modern website, or I can't use a search engine on my Kobo ereader alternate browser due to lack of script capability.

Originally the web could transmit useful info using simple software and low resources. Then "improvements" happened. Then Gemini. Then what?

🚀 jsreed5 · Jul 02 at 15:33:

@clarahd That is precisely what I worry about. I don't want to see a situation where Gemini technically works with all clients that comply with the spec, but it's functionally useless without clients that implement additions like client-side scripting. Everyone agrees it's a bad idea, but my concern is whether it's technically allowed in the spec. If it is, all it takes is for Gemini to get popular enough, and someone is going to implement it one day.

As an analogy, HTTP and HTML are two different specs, just like Gemini and Gemtext are. By definition, HTTP works perfectly on every Web site--whether the site is Wikipedia or TikTok, if I send an HTTP request, I will get a working HTTP response, and probably an HTML document as well. However, if my client does not support JavaScript, then I can't do anything practical with TikTok's working HTTP response and subsequent HTML document. This "working but useless" trap is what we all want Gemini to avoid.

If we interpret the spec as allowing scripts, then capsules can implement scripting all they want, and Gemini will "work" as the spec is written. We then run the risk of Gemtext (and therefore Geminispace) becoming "useless" for clients that don't use the scripts.

🚀 clarahd · Jul 02 at 17:05:

@jsreed5 Oh I see that's what your point was, but I guess I got triggered by the mere MUSING of extensions to Gemini/gemtext, lol.

It just irks me - at one time I got by with a 100MB/month data plan with OperaMini - now there are these webpages that I have to switch over to Fennec for, and for what added value? Menus?? That couldn't have been HTML? And now 100MB gets eaten up just like that. Grrr..