Fast and scalable sub-second ivy completion for V2 with thousands of zettels

Hey, I posted about this on Slack, but I think this is a more appropriate place for it.

I have an experimental ivy-based drop-in replacement for org-roam’s node completion. If you load the file, you can call a function and it will use advice to take over completion.

Here’s the file.

Background: I have over 8000 zettels and even with the optimizations I pushed, org-roam still can’t handle it as-is. I plan on adding a huge amount more and I had to do something about the responsiveness.

The two techniques I used to get performance:

  • SQLite’s Full-text search (fts5)
  • “cursoring” search, i.e. using LIMIT and OFFSET in the SQL query

I took a look at how ivy works under the hood and came up with a way to “trick” it into playing along with a sliding 20-item query. 20 items is just fine for all the fancy decorating of text you want in an ivy completion display.
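The same pattern is easy to sketch outside Emacs. Here's a minimal Python illustration of the two techniques combined (the `nodes` table and `title` column are made up; org-roam's real schema differs): an FTS5 MATCH query whose LIMIT/OFFSET move as the 20-item window slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: an FTS5 full-text index over node titles.
con.execute("CREATE VIRTUAL TABLE nodes USING fts5(title)")
con.executemany("INSERT INTO nodes VALUES (?)",
                [(f"zettel {i} on sqlite",) for i in range(100)])

def fetch_window(query, page, page_size=20):
    """One 'window' of matches: full-text MATCH plus LIMIT/OFFSET cursoring."""
    return con.execute(
        "SELECT title FROM nodes WHERE nodes MATCH ? "
        "ORDER BY rank LIMIT ? OFFSET ?",
        (query, page_size, page * page_size)).fetchall()

first = fetch_window("sqlite", 0)   # items 0-19
second = fetch_window("sqlite", 1)  # items 20-39
```

Scrolling near the edge of the current window just means bumping `page` and re-running the query; only 20 rows ever need decorating for display.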

At this point, the code is just a <250 line gist. After I (and hopefully others) have tested it a bit, I’ll invest some more time and make it into a separate org-roam-ivy package (maybe also an org-roam-fts package).

Please, anyone who tries it out, let me know what you think.

It’s a bit buggy right now – if you immediately start typing quickly it gets confused and if you scroll too violently up and down some results disappear from the list. But it seems to be functional enough for me to use, anyway.


Looks interesting; thanks for your work.

I’ve recreated the scale of your setup with fake nodes, and I can confirm that the query scales badly. Since we care about keeping core snappy, we might include such functions in core, but we’d first have to vet your implementation. There might be better backends than FTS5, or better approaches than interactively limiting the viewport, which feels a tad hackish. It’s a good idea, though, and if it’s the best we can come up with without needing to be too creative, we might go for that.

There’s still plenty of work to do on the UX front, and I’d rather focus my efforts now on porting missing behaviours from v1 into v2. The fundamental problem is that optimisation problems are going to be everywhere now that we’re giving the green light for people to add IDs with wild abandon into a single file. For instance, after generating the 8000 fake nodes, it takes about 2 minutes on my machine for our code to traverse the file and insert/verify the IDs, making the external parsing of org-mode files all the more pressing. This ought not to delay query optimisations like the one you suggest, though, and we’d be happy to come up with temporary, unharmonised solutions.


Perhaps a backend like Microfts? :smiley:

Btw, I just added optional support for FTS5’s new trigram tokenizer – it verifies trigram support before using it.
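The trigram tokenizer only landed in SQLite 3.34.0, so a runtime probe is needed. Here's a sketch of one in Python (the elisp in the gist would do the equivalent through emacsql; the probe-table name is made up):

```python
import sqlite3

def fts5_trigram_supported(con):
    """Probe for the trigram tokenizer by creating a throwaway table.

    Creating an FTS5 table with tokenize='trigram' fails with an
    OperationalError on SQLite builds that lack trigram support.
    """
    try:
        con.execute("CREATE VIRTUAL TABLE trigram_probe "
                    "USING fts5(x, tokenize='trigram')")
        con.execute("DROP TABLE trigram_probe")
        return True
    except sqlite3.OperationalError:
        return False

con = sqlite3.connect(":memory:")
supported = fts5_trigram_supported(con)
```

The probe is run once against a scratch connection so the real database is never touched.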

Emacs’ programmed completion API does not support cursoring of any sort, and neither does ivy, so there may be few other options.

Another “hack” would be to have a hard limit of N results but if a user were to hold down the down-arrow key, they would eventually reach a false “end” of matches.

I’ve been able to speed up org-roam quite a bit so far without resorting to backend programs: 6x for cleaning up 8000+ deleted files, 2x for importing 8000+ new files, and 35% for interactive search (these improvements have already been merged into V2).

I haven’t looked at speeding up imports beyond those improvements but I suspect there’s a lot more that can be done. For instance, on my laptop Emacs can read in all 8200+ of my files in less than 1.5 seconds.

(let* ((files (seq-filter (lambda (f)
                            (not (or (string-match "^\\.+" f)
                                     (file-directory-p f))))
                          (directory-files "~/Dropbox/org" t nil t)))
       (count (length files))
       (buf (get-buffer-create "TMP"))
       (time (current-time)))
  (with-current-buffer buf
    (dolist (file files)
      ;; Read each file into the buffer, then clear it for the next one.
      (insert-file-contents file)
      (erase-buffer))
    (kill-buffer))
  (list count (format "%.06f" (float-time (time-since time)))))
(8289 "1.332432")

Emacsql can insert 9000 rows on my laptop in less than 2 seconds:

(let ((con (emacsql-sqlite "/tmp/test-insert.db")))
  (emacsql con [:create-table-if-not-exists test [a b c d e]])
  (let ((time (current-time)))
    (emacsql-with-transaction con
      (cl-loop for i from 1 to 9000 do
               (emacsql con [:insert-into test :values
                             ["asdfasdfasdfadsfadsfasdfasdfasdfasdfasdfadsfadsfasdfasdfasdfasdf"
                              "asdfasdfasdfadsfadsfasdfasdfasdfasdfasdfadsfadsfasdfasdfasdfasdf"
                              "asdfasdfasdfadsfadsfasdfasdfasdfasdfasdfadsfadsfasdfasdfasdfasdf"
                              "asdfasdfasdfadsfadsfasdfasdfasdfasdfasdfadsfadsfasdfasdfasdfasdf"
                              "asdfasdfasdfadsfadsfasdfasdfasdfasdfasdfadsfadsfasdfasdfasdfasdf"]])))
    (let ((elapsed (float-time (time-since time))))
      (emacsql-close con)
      (format "%.06f" elapsed))))
"1.640338"

That’s less than 3 seconds for reading and updating 8200+ files, so a complete update of 9000 files should take seconds rather than minutes.

So I’m guessing there is not a need for alternate backends – we can make it much faster by making better choices in the lisp code. For instance:

  • org-roam-db-map-links calls org-element-parse-buffer which does a huge amount of needless work and creates a huge amount of needless garbage that Emacs has to clean up
  • org-roam also creates a new buffer for each update instead of reusing a temporary one like I did in the first code section

Thanks for your work. I’ll have to delay a proper review until we’re a little more advanced with the v2 merge, but it’s pretty high priority for me, so I’ll try to get back to you soon.

I did a quick test and reusing buffers isn’t any faster at all – it makes sense that Emacs’ buffer management facilities would be well tuned.

Here’s what happens when I just use org-roam-with-file on 8200+ files:

(let ((x 0)
      (time (current-time)))
  (cl-loop for file in (directory-files "~/Dropbox/org" t nil t)
           do
           (unless (or (string-match "^\\.+" file) (file-directory-p file))
             (org-roam-with-file file nil
               (cl-incf x))))
  (list x (format "%.06f" (float-time (time-since time)))))
(8289 "98.327005")

Setting gc-cons-threshold to most-positive-fixnum shaved 12 seconds off this.

A forced sync for me takes about 140 seconds, so it would seem that well over half the time – 60-70% of it – is consumed solely by org-roam-with-file. Also, org-roam-db-update-file appears to be the only user of org-roam-with-file.

Speeding up org-roam-with-file could increase speed by another 2x or 3x.

Also +1 for the attempt to make completion fast and scalable! I don’t fully understand the solution @zot suggests – it’s beyond my knowledge. But whatever the solution is, it should not rely on the specific mechanism by which ivy retrieves the candidates. Many people use many different completion frontends.

EDIT: The first version of this post asked for async completion, but looking at the gist, it seems like the proposal is using the async collection (“dynamic collection”) feature, right?

But whatever the solution is, it should not rely on the specific mechanism by which ivy retrieves the candidates. Many people use many different completion frontends.

Emacs’ programmed completion API allows using a function for completion candidates but this is not asynchronous in any way. There is no standard way to “asynchronously fetch” completion results as you scroll in Emacs. Any completion system that can support some kind of cursoring behavior will do it in a “non-standard” way and each one will require custom code.

Emacs’ completion API supports using a function as a collection source which it calls once each time the user changes the query string. The function returns all matches for that string which the user can then look through. Ivy is consistent with this behavior.

Ivy (as well as Emacs’ default completion code) does not call the collection function when you scroll through the list of completions; it only calls the function when the user changes the query string (that’s how Emacs’ programmed completion system works).

What I did was use ivy’s update-fn, which ivy does call when the user scrolls. My update function changes some of ivy’s settings when the user scrolls close to the end of the current portion of the results, forcing ivy to call the collection function again.

Even though I describe this as “tricking” ivy, I don’t consider this a hack – the purpose of the update-fn is for customizing ivy’s behavior.

Thanks for the clarification! But there are different ways of pulling candidates asynchronously, implemented by the different completion frontends, right? I use notmuch-consult, for example, which fetches asynchronously using consult. There was a similar package for counsel, too. I – perhaps naively – thought that these APIs take a similar approach.

But I agree, the most practical thing would be to support one completion frontend and then either refactor it or wait until some other guys (like me) pick it up and offer support for another package.

Does consult have asynchronous support? I don’t see any in the code or the README at first glance but maybe I’m missing something.

Maybe we’re not using the same terminology. This example in the readme is not asynchronous but maybe it appears to be:

This is making a synchronous call each time you change the query string, which is what Emacs’ completion API supports.

What I’m doing is also making synchronous calls when you scroll, which Emacs’ completion API does not support.

You might be right about the terminology, but let’s clear that up. I was referring to commands like consult--async-command, which launch an external process and pull in the completion candidates as they arrive. The docstring calls it an “asynchronous pipeline.” To me, that sounds close to what I would understand as ‘call when you scroll’.

Sounds neat – I’ll check that out

FWIW, Daniel Mendler is the author of consult, and also is the author of or a contributor to a number of other recent related completing-read-based packages (selectrum, vertico, marginalia, embark, etc.).

As a result of that experience, he’s now working on enhancing completing-read itself on emacs-devel, which started with this summary of his proposals.

https://lists.gnu.org/archive/html/emacs-devel/2021-04/msg00270.html

I don’t believe async is among them though.

Maybe he feels he’s figured that out.


It looks like a nice approach – to your original point though, consult’s asynchronous approach is just as consult-specific as my current code is ivy-specific. In response to your concern though, I separated out fts-specific functions from ivy-specific functions and the ivy part is quite small.

Consult seems to be using a streaming model and has to keep the entire history in memory, so the more you scroll down, the more memory it takes up, until it’s released when you finish the search (or type a character and cause a new search). My approach releases past items from memory as you scroll down and fetches them again when you scroll back up.

The streaming model applies well to a lot of external programs (like grep), making it a good approach for a general completion package.

In practice, maybe the streaming model works just fine for thousands of zettels since the memory only stays around a little while.

emacsql (which org-roam currently uses) isn’t an external program, though – it’s a (synchronous) lisp function. The query can be broken up into multiple calls, each of which returns N results. Maybe that could be made to work with consult?
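A sketch of what “breaking the query into multiple calls” could look like, in Python with a plain LIKE query for brevity (in org-roam these would be emacsql calls; the table and pattern are hypothetical): the caller consumes one continuous stream, but under the hood each SQL call only fetches N rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE nodes (title TEXT)")  # hypothetical table
con.executemany("INSERT INTO nodes VALUES (?)",
                [(f"node-{i:04d}",) for i in range(95)])

def stream_matches(pattern, batch_size=20):
    """Yield matches one at a time, fetching batch_size rows per SQL call."""
    offset = 0
    while True:
        rows = con.execute(
            "SELECT title FROM nodes WHERE title LIKE ? "
            "ORDER BY title LIMIT ? OFFSET ?",
            (pattern, batch_size, offset)).fetchall()
        if not rows:
            return  # past the last batch: the stream ends
        yield from (title for (title,) in rows)
        offset += batch_size

titles = list(stream_matches("node-%"))
```

A consult-style frontend could consume such a generator incrementally, so memory stays bounded by how far the user actually scrolls.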

Keeping past rows around instead of requerying on scroll-up might simplify my code a bit (the logic for that isn’t really very large though).

I do like the simplicity of the streaming model and I think the extra memory overhead is a fair exchange.

Thanks for telling me about this!

Could be, but the docstring of consult-async-refresh-delay says “The completion ui is only updated every consult-async-refresh-delay seconds.”

That seems to imply that results are appended as they come in, regardless of scrolling. It’s a lot of code to go through though – consult.el is about 10x as large as my code :).

It could be that when it refreshes, it only processes the visible lines plus some margin – no idea.

Yeah, looking through the email exchange, I don’t see anything about dealing with thousands of results.

Is this a non-problem?

It could be that this is just a red herring and maybe it’s reasonable to just limit it to N results (like 200) and not worry about amounts greater than that.

That just doesn’t sit well with me. If a user wants to scroll through tons and tons of results, who am I to stop them? N could be configurable, but when a user sets N to something large, we’re back to the same problem.

@zot So you are using a ‘window model’, right? I see. I’m wondering whether it would be so difficult to implement generic support for your ‘window model’ (as opposed to the ‘streaming model’) for completion. It is a rather special use case, since it requires a program in the background that reliably fetches slices of a big list on demand, which will most likely always be a database. But it seems to be as easy as “if we scroll, increase the upper window index and fetch again.” Plus, of course, memory handling, etc.

But more to the point: maybe another way to make completion more scalable would be to filter directly in the SQL query (instead of in Lisp). We could simply limit the result list to a reasonable N, but as the user types, the SQL query changes and returns a smaller result list. This would mean overriding the completion style, which sounds like it could introduce all kinds of oddities on the user side, but it could be much more efficient. And even with the return size limited, the user would effectively search all nodes.

EDIT: Or is this what you are doing? I can’t figure that out by looking at the code quickly, there are a lot of SQL queries :blush:
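If I follow, the idea would look roughly like this sketch (Python with a hypothetical `nodes` FTS5 table): every keystroke re-runs a prefix MATCH with a hard LIMIT, so the narrowing happens in SQL rather than in Lisp.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE nodes USING fts5(title)")  # hypothetical table
con.executemany("INSERT INTO nodes VALUES (?)",
                [("emacs lisp",), ("emacsql",), ("org mode",), ("org roam",)])

def complete(typed, limit=200):
    """Called on every keystroke: prefix MATCH in SQL, capped at `limit` rows.

    (Real code would need to escape FTS5 query syntax in `typed`.)
    """
    return [title for (title,) in con.execute(
        "SELECT title FROM nodes WHERE nodes MATCH ? LIMIT ?",
        (typed + "*", limit))]
```

As the input grows from "e" to "emacs", the candidate list shrinks at the database level, so the hard cap of N rarely bites in practice.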

On POSIX-like systems you can use the “window” model on streaming programs with head and tail: PROGRAM | tail -n +100 | head -n 20 gives you 20 lines starting at the 100th line of PROGRAM’s output. How this compares to the “stream” model will depend on PROGRAM and Emacs :slight_smile:.
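In Python terms, itertools.islice over a generator is the same trick as the tail/head pipeline: the stream is consumed lazily, only as far as the end of the requested window.

```python
from itertools import islice

def program():
    """Stand-in for PROGRAM: an arbitrarily long stream of lines."""
    n = 1
    while True:
        yield f"line {n}"
        n += 1

# 20 lines starting at the 100th line, i.e. PROGRAM | tail -n +100 | head -n 20.
window = list(islice(program(), 99, 119))
```

Note that a pipe can only move forward, so scrolling back up with this approach means restarting PROGRAM, whereas a database can seek to any offset.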

It’s worth thinking about but my code isn’t totally working yet so I don’t think I’m ready to try to make a general solution. Also, the only completion system I’ve built to is ivy (for microfts and for this) so I don’t think I know enough of what’s out there to generalize yet.

That’s why I’m using SQLite’s full-text indexing: so that I don’t have to filter the results in Lisp.

This is not overriding the completion style, it’s actually what Emacs’ programmed completion is designed to do. When you supply a function for completion, Emacs calls your function each time the user types (or deletes) a character and gives it your current string. It expects the function to return all of the items that match the string. So (as you say) there should be a smaller result as the user types more characters.

The “window” and “stream” models don’t fit Emacs’s API because they don’t return all the items that match, so they have to be “custom-fit” for each completion system.
