2025-08-15 An opt-in search engine: Xobaque

I've been writing a search engine that doesn't crawl the web. I called it Xobaque.

On 2025-08-22 I set it up to search Planet Emacslife.

Xobaque is entirely opt-in and push-based: web admins have total control over what gets indexed. The drawback is that web admins have to upload their pages!

I see this as a solution for sites that don't have good local search, like Emacs Wiki, or for communities, like all the blogs on Planet Emacslife. If we want to rely less on big search engines, we have to provide our own.

Technically, Xobaque is written on top of the SQLite FTS5 Extension. This extension does all the heavy lifting: indexing, searching, ranking, highlighting, snippets, boolean operators. Boolean operators! Do you remember those? Good times. The extension is quite amazing, really.
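
To make the division of labour concrete, here is a minimal sketch in Go of what talking to FTS5 might look like. The table name `pages`, its columns, and the helper `ftsQuery` are assumptions made up for illustration, not Xobaque's actual schema or code. Only `snippet()`, `MATCH` and the `rank` column come from FTS5 itself. The sketch just builds strings, so it runs without any SQLite driver.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical schema: one FTS5 virtual table. The URL is stored but not
// indexed; title and body are searchable.
const createSQL = `CREATE VIRTUAL TABLE pages USING fts5(url UNINDEXED, title, body)`

// Ranked search with a highlighted snippet taken from column 2 (body).
const searchSQL = `SELECT url, title, snippet(pages, 2, '<b>', '</b>', '…', 16)
FROM pages WHERE pages MATCH ? ORDER BY rank`

// ftsQuery turns command-line words into an FTS5 query string.
// Ordinary words are double-quoted (with embedded quotes doubled) so that
// punctuation such as "c++" is not read as query syntax. The upper-case
// boolean operators AND, OR and NOT are passed through untouched.
func ftsQuery(words []string) string {
	parts := make([]string, 0, len(words))
	for _, w := range words {
		switch w {
		case "AND", "OR", "NOT":
			parts = append(parts, w)
		default:
			parts = append(parts, `"`+strings.ReplaceAll(w, `"`, `""`)+`"`)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println(createSQL)
	fmt.Println(searchSQL)
	fmt.Println(ftsQuery([]string{"alex", "schroeder"}))
	fmt.Println(ftsQuery([]string{"emacs", "NOT", "vim"}))
}
```

Two quoted words next to each other are an implicit AND in FTS5, so `alex schroeder` finds pages containing both. A query that starts or ends with an operator would still be a syntax error; this sketch does not guard against that.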

What Xobaque provides is the command-line interface and the web interface. Right now, the command-line interface has a command to initialize an empty database and a command to upload a single page.

Here are some examples of how to use this in a script.

My homepage:

#!/usr/bin/fish
cd /home/alex/alexschroeder.ch/wiki
for line in (/home/oddmu/oddmu list)
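  # Fields are tab-separated: page name first, then title.
  set --local info (string split \t $line)
  # Drop the directory prefix and the trailing ".md" (dot escaped and anchored, so only the extension matches).
  set --local name (string replace --regex '^/home/alex/alexschroeder.ch/wiki/' '' $info[1] | string replace --regex '\.md$' '')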
  set --local title $info[2]
  /home/xobaque/xobaque upload \
    -db "/home/xobaque/index.db" \
    -base "https://alexschroeder.ch/view/" \
    -local "$name" \
    -title "$title" \
    -filename "$name.md"
end

Emacs Wiki:

#!/usr/bin/fish
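# Skip pages that start with "#FILE ": those are uploaded files, not text pages.
for path in (grep --files-without-match '^#FILE ' /home/alex/emacswiki/git/* /home/alex/emacswiki/git/.*)
  set --local name (string replace '/home/alex/emacswiki/git/' '' $path)
  # Page names use underscores for spaces; --all replaces every one, not just the first.
  set --local title (string replace --all '_' ' ' $name)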
  /home/xobaque/xobaque upload \
    -db "/home/xobaque/index.db" \
    -base "https://www.emacswiki.org/emacs/" \
    -local "$name" \
    -title "$title" \
    -filename "/home/alex/emacswiki/git/$name"
end

The alternative to uploading via the command line, I think, will be feed ingestion. Web site owners can send their feeds to Xobaque and it will index the entries in the feed. If the entries have the full text, that's great. If they have an excerpt, that works, too. You opt in with whatever you provide.

I'm assuming that this will be a system for friends only, so the upload will be protected by a login. Otherwise, people will be able to upload other people's sites and that violates the opt-in idea.

#Search #Web #Xobaque

Planet Emacslife OPML
Indie
OSR
Other
Planet Jupiter

The feed import from the net should use ETags and If-Modified-Since headers if possible, but the convenience functions of both the feed and the OPML library don’t provide this, so I’ll have to roll my own. And then think of parallelising the requests. Go channels ahoy! 😬
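
Rolling my own could look roughly like the following. This is a sketch under assumptions, not Xobaque's code: the types `validators` and `result`, the functions `fetch` and `fetchAll`, and the example.org URLs are all invented here. Only the `net/http` calls and the header names are real. The `fakeFeed` transport stands in for a web server so the sketch runs without network access. It pretends every feed has the ETag `"v1"`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"sync"
)

// validators are the cache headers remembered from the previous fetch.
type validators struct{ ETag, LastModified string }

// result is what gets recorded for each feed URL.
type result struct {
	URL    string
	Status int
	Next   validators // what to remember for the next run
	Err    error
}

// fetch does one conditional GET. A 304 reply means the feed is unchanged.
func fetch(c *http.Client, url string, v validators) result {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return result{URL: url, Err: err}
	}
	if v.ETag != "" {
		req.Header.Set("If-None-Match", v.ETag)
	}
	if v.LastModified != "" {
		req.Header.Set("If-Modified-Since", v.LastModified)
	}
	resp, err := c.Do(req)
	if err != nil {
		return result{URL: url, Err: err}
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusNotModified {
		// Changed: a real importer would parse resp.Body here.
		v = validators{resp.Header.Get("ETag"), resp.Header.Get("Last-Modified")}
	}
	return result{URL: url, Status: resp.StatusCode, Next: v}
}

// fetchAll hands URL indexes to a fixed number of workers over a channel.
// Each worker writes to its own slice element, so no locking is needed.
func fetchAll(c *http.Client, cache map[string]validators, urls []string, workers int) []result {
	out := make([]result, len(urls))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				out[j] = fetch(c, urls[j], cache[urls[j]])
			}
		}()
	}
	for j := range urls {
		jobs <- j
	}
	close(jobs)
	wg.Wait()
	return out
}

// fakeFeed is a stand-in transport: 200 at first, 304 once the ETag matches.
type fakeFeed struct{}

func (fakeFeed) RoundTrip(r *http.Request) (*http.Response, error) {
	status := http.StatusOK
	if r.Header.Get("If-None-Match") == `"v1"` {
		status = http.StatusNotModified
	}
	return &http.Response{
		StatusCode: status,
		Header:     http.Header{"Etag": {`"v1"`}},
		Body:       io.NopCloser(strings.NewReader("<rss/>")),
	}, nil
}

// demo fetches the same feeds twice and returns the status of each round.
func demo() (first, second int) {
	client := &http.Client{Transport: fakeFeed{}}
	urls := []string{"https://example.org/a.xml", "https://example.org/b.xml"}
	cache := map[string]validators{}
	r1 := fetchAll(client, cache, urls, 2)
	for _, r := range r1 {
		cache[r.URL] = r.Next
	}
	r2 := fetchAll(client, cache, urls, 2)
	return r1[0].Status, r2[0].Status
}

func main() { fmt.Println(demo()) }
```

Note that the request header that goes with an ETag is `If-None-Match`; `If-Modified-Since` pairs with `Last-Modified`. The cache map is only read while the workers run and only written after `fetchAll` returns, which keeps the sketch free of data races.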

And then a web UI for uploads and accounts? 🥺

Error importing feed https://lifeofpenguin.blogspot.com/feeds/posts/default?alt=rss: XML syntax error on line 1: illegal character code U+FFFF
Error importing feed https://blog.laurentcharignon.com/index.xml: XML syntax error on line 231: illegal character code U+0008
Error importing feed http://feeds.feedburner.com/emacslife: parsing time "Sat, 9 Aug 2025 18:00:43 GMT" as "Mon, 02 Jan 2006 15:04:05 MST": cannot parse "9 Aug 2025 18:00:43 GMT" as "02"
Error importing feed https://www.unwoundstack.com/emacs-rss.xml: Get "https://www.unwoundstack.com/emacs-rss.xml": net/http: TLS handshake timeout
Error importing feed http://emacsmovies.org/atom.xml: parsing time "Sun, 04 Dec 2022 08:19:18 8DecGMT" as "Mon, 02 Jan 2006 15:04:05 MST": cannot parse "8DecGMT" as "MST"

But I guess I have 248/253 feeds imported.

Other issues I have noticed: Endless repeats of news items that overwrite each other, only differing in title and description, and no GUID set:

 Update http://tromey.com/elpa/news.html

Then again, the items all have dates set from 2007 to 2010, so perhaps that's OK.
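
If it ever stops being OK, one option would be to derive an identifier for items that don't bring their own. This is only a sketch of the idea: `itemKey` is a hypothetical helper, and I'm assuming the items overwrite each other because they share a link. It prefers the GUID and otherwise appends a short hash of title and description to the link.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// itemKey returns a stable identifier for a feed item.
// If the feed provides a GUID, that wins. Otherwise the key is the link
// plus the first 12 hex digits of a SHA-256 hash over title and
// description, so items sharing a link no longer collide.
func itemKey(guid, link, title, description string) string {
	if guid != "" {
		return guid
	}
	// The NUL byte separates the fields so that ("ab", "c") and
	// ("a", "bc") hash differently.
	sum := sha256.Sum256([]byte(title + "\x00" + description))
	return link + "#" + hex.EncodeToString(sum[:6])
}

func main() {
	link := "http://tromey.com/elpa/news.html"
	fmt.Println(itemKey("", link, "First news item", "..."))
	fmt.Println(itemKey("", link, "Second news item", "..."))
}
```

The catch is that an item whose title or description gets edited later would show up as a new item instead of replacing the old one.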

And now I find:

# ./xobaque search alex schroeder
Emacs: Take Two
https://takeonrules.com/2025/06/10/emacs-take-two/emacs-take-two
extend Emacs
.

Third, learning of Alex Schroeder, both involved in RPGs

It works! And I really need to strip HTML, here! 😂
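
For the stripping, a rough sketch using only the Go standard library might look like this. `stripHTML` and the `blockTags` list are my own hypothetical names, and this is a crude scanner, not an HTML parser: it does not skip the contents of script or style elements, and a bare `<` in running text would swallow everything up to the next `>`. A proper HTML tokenizer would be the more robust choice.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// blockTags get replaced by a space so that the words of neighbouring
// paragraphs don't run together. Inline tags are removed without a trace.
var blockTags = map[string]bool{
	"p": true, "br": true, "div": true, "li": true, "ul": true, "ol": true,
	"h1": true, "h2": true, "h3": true, "h4": true, "h5": true, "h6": true,
	"tr": true, "td": true, "th": true, "blockquote": true, "pre": true,
}

// stripHTML removes tags, decodes entities and collapses whitespace.
func stripHTML(s string) string {
	var out, tag strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case !inTag && r == '<':
			inTag = true
			tag.Reset()
		case inTag && r == '>':
			inTag = false
			// Reduce `/p`, `br/` or `a href="..."` to the bare tag name.
			name := strings.ToLower(strings.Trim(tag.String(), "/ "))
			if i := strings.IndexAny(name, " \t\n/"); i >= 0 {
				name = name[:i]
			}
			if blockTags[name] {
				out.WriteRune(' ')
			}
		case inTag:
			tag.WriteRune(r)
		default:
			out.WriteRune(r)
		}
	}
	// Decode entities only after the tags are gone, so that an escaped
	// "&lt;b&gt;" survives as literal text.
	text := html.UnescapeString(out.String())
	return strings.Join(strings.Fields(text), " ")
}

func main() {
	in := `<p>extend <a href="/x">Emacs</a>.</p><p>Third &amp; last</p>`
	fmt.Println(stripHTML(in))
}
```

Doing this before indexing, not at display time, would also keep tag names and attributes out of the search index.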