Improving performance of node-find et al.

Hello.

I think I’m starting to run into performance problems with org-roam-node-find and the other completing-read functions. They freeze Emacs for a second or two each time I call them. I’m fairly sure this is because they call org-roam-node-list, which fetches all the nodes from the database and then filters them in Emacs. I have around 2500 nodes, which seems to be enough to make Emacs struggle on my computer. Is there a better way to do this?

I’ve thrown together a quick proof of concept that does some initial filtering in the database query, so that sqlite does the search. One immediate problem with this is that it completely ignores aliases.

From ba60b5a4bfd4e32ebd074563ee0cdbe25e45a4e3 Mon Sep 17 00:00:00 2001
From: Leo <git@relevant-information.com>
Date: Thu, 21 Dec 2023 11:35:28 +0100
Subject: [PATCH] filter initial-input in sqlite

---
 org-roam-node.el | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/org-roam-node.el b/org-roam-node.el
index 720c3a0..9a0d149 100644
--- a/org-roam-node.el
+++ b/org-roam-node.el
@@ -343,10 +343,11 @@ (cl-defmethod org-roam-populate ((node org-roam-node))
             (org-roam-node-aliases node) alias-info)))
   node)
 
-(defun org-roam-node-list ()
+(defun org-roam-node-list (&optional initial-input)
   "Return all nodes stored in the database as a list of `org-roam-node's."
   (let ((rows (org-roam-db-query
-               "SELECT
+               (format
+                "SELECT
   id,
   file,
   filetitle,
@@ -411,7 +412,9 @@ (defun org-roam-node-list ()
     LEFT JOIN refs ON refs.node_id = nodes.id
     GROUP BY nodes.id, tags.tag, aliases.alias )
   GROUP BY id, tags )
-GROUP BY id")))
+  where title like '%%%%%s%%%%'
+
+GROUP BY id" initial-input))))
     (cl-loop for row in rows
              append (pcase-let* ((`(,id ,file ,file-title ,level ,todo ,pos ,priority ,scheduled ,deadline
                                         ,title ,properties ,olp ,atime ,mtime ,tags ,aliases ,refs)
@@ -529,7 +532,7 @@ (defun org-roam-node-read (&optional initial-input filter-fn sort-fn require-mat
 for an example sort function.
 If REQUIRE-MATCH, the minibuffer prompt will require a match.
 PROMPT is a string to show at the beginning of the mini-buffer, defaulting to \"Node: \""
-  (let* ((nodes (org-roam-node-read--completions filter-fn sort-fn))
+  (let* ((nodes (org-roam-node-read--completions filter-fn sort-fn (or initial-input "")))
          (prompt (or prompt "Node: "))
          (node (completing-read
                 prompt
@@ -550,7 +553,7 @@ (defun org-roam-node-read (&optional initial-input filter-fn sort-fn require-mat
     (or (cdr (assoc node nodes))
         (org-roam-node-create :title node))))
 
-(defun org-roam-node-read--completions (&optional filter-fn sort-fn)
+(defun org-roam-node-read--completions (&optional filter-fn sort-fn initial-input)
   "Return an alist for node completion.
 The car is the displayed title or alias for the node, and the cdr
 is the `org-roam-node'.
@@ -560,7 +563,7 @@ (defun org-roam-node-read--completions (&optional filter-fn sort-fn)
 for an example sort function.
 The displayed title is formatted according to `org-roam-node-display-template'."
   (let* ((template (org-roam-node--process-display-format org-roam-node-display-template))
-         (nodes (org-roam-node-list))
+         (nodes (org-roam-node-list initial-input))
          (nodes (if filter-fn
                     (cl-remove-if-not
                      (lambda (n) (funcall filter-fn n))
@@ -675,7 +678,7 @@ (defun org-roam-node-read--annotation (_node)
 ;;;; Linkage
 ;;;;; [id:] link
 ;;;###autoload
-(cl-defun org-roam-node-insert (&optional filter-fn &key templates info)
+(cl-defun org-roam-node-insert (&optional filter-fn initial-input &key templates info)
   "Find an Org-roam node and insert (where the point is) an \"id:\" link to it.
 FILTER-FN is a function to filter out nodes: it takes an `org-roam-node',
 and when nil is returned the node will be filtered out.
@@ -692,7 +695,7 @@ (cl-defun org-roam-node-insert (&optional filter-fn &key templates info)
                     (setq beg (set-marker (make-marker) (region-beginning)))
                     (setq end (set-marker (make-marker) (region-end)))
                     (setq region-text (org-link-display-format (buffer-substring-no-properties beg end)))))
-               (node (org-roam-node-read region-text filter-fn))
+               (node (org-roam-node-read (or region-text initial-input) filter-fn))
                (description (or region-text
                                 (org-roam-node-formatted node))))
           (if (org-roam-node-id node)

base-commit: 5c06471c3a11348342719fd9011486455adeb701
-- 
2.43.0



Edit: you could then use it like this:

(defun my/org-roam-node-find ()
  (interactive)
  (org-roam-node-find
   nil
   (read-from-minibuffer "Nodes: " nil
                         ;; SPC exits the minibuffer, handing the query over
                         ;; to the normal completing-read interface.
                         (let ((km (make-sparse-keymap)))
                           (define-key km (kbd "SPC") #'exit-minibuffer)
                           km))))

The effect is that you call my/org-roam-node-find with a key binding, type an initial query (without any results shown), and when you hit space you get the normal completing-read interface filtered on your initial input. This is much faster for me.

Edit 2: Updated the patch to also include org-roam-node-insert


Hi. This sounds horrible! Would you mind telling us more about your setup? Are you running Emacs on Linux/macOS/Windows? How many files hold your 2500+ nodes?

(For comparison: I run a somewhat similar note-taking package, and on an M1 Pro MBP it takes 0.66023s to compile the list of 3866 titles and aliases. On a Raspberry Pi 5 the first run took 0.786s. Subsequent runs are (strangely) faster.)

I run Linux with Emacs 29.1. I have around 2000 files. The computer is a Thinkpad T450s. It takes around 2.5s to get Emacs to an interactive state after calling org-roam-node-find.

Edit: here is the diagnostic output for org-roam:

- Emacs: GNU Emacs 29.1 (build 2, x86_64-suse-linux-gnu, GTK+ Version 3.24.38, cairo version 1.18.0)
- Framework: Doom
- Org: Org mode version 9.7 (9.7-??-e90a8a69a @ /home/leo/.emacs.d/.local/straight/build-29.1/org/)
- Org-roam: ba60b5a
- sqlite-connector: sqlite-builtin

According to Geekbench, the fastest variant of that computer (with an i7) should be on par with the Raspberry Pi 5 (which, by itself, is a crazy thing). There is at least one more person on Mastodon (Nick Anderson: "Looks like I won't break 5k org-mode files until …" - Fosstodon) complaining about a several second delay when calling org-roam-node-find.

I think completing-read in Emacs is a bit on the slow side, but the majority of your delay comes from calling org-roam-node-list, as you suggested. I’m not sure there is a way to speed up sqlite processing in Emacs.

Apologies for not having any advice, but these scaling issues are really interesting. For orgrr I am thinking of adding a “speed mode”, e.g. calling rg without --sort, or using fzf instead of completing-read…

@Zetagon How about this proposal? It’s stalled, but @d12frosted has done a working implementation with his vulpea package.


Yeah, once it has loaded all the nodes, completing-read is really fast. I don’t notice any delays at all.

I’ve experimented further with consult because it seems to have some nice stuff around asynchronous commands.

(defun my/org-roam-node-read (&optional initial-input filter-fn sort-fn require-match prompt)
  (let* (nodes
         (prompt (or prompt "Node: "))
         (node (consult--read
                (consult--dynamic-collection
                 ;; Re-query the database as the input changes; the third
                 ;; argument is the initial-input filtering from the patch.
                 (lambda (input)
                   (setq nodes (org-roam-node-read--completions filter-fn sort-fn input))
                   nodes))
                :prompt prompt
                :annotate
                (lambda (title)
                  (funcall org-roam-node-annotation-function
                           (get-text-property 0 'node title)))
                :category 'org-roam-node
                :require-match require-match
                :initial initial-input
                :history 'org-roam-node-history)))
    (or (cdr (assoc node nodes))
        (org-roam-node-create :title node))))

This is my first attempt. It isn’t great, but I think it’s an improvement. I need to look into how consult’s string splitting works, because that seems like a good way to do initial filtering through sqlite and then let completing-read do the rest once the candidate set is more manageable. I don’t know whether any of the sqlite connectors work asynchronously.


I tried vulpea-find just now and it’s much better! There is barely any noticeable delay at all.

If I understand correctly, it does some work in the backend, but it still fetches all candidates and then lets completing-read handle them, right?


Vulpea just has an extra table optimised for reading. The table is populated together with the regular org-roam tables.

P.S. Happy you find it useful :heart:


Forgot to mention something. In case you have a large set of notes and you use tags to categorize notes, you can actually create multiple vulpea-find (as well as vulpea-insert) variants for different categories. For example, I have the following setup for vulpea-find and vulpea-insert (I want to see only file-level notes that are not part of my wine db):

(setq-default
 vulpea-find-default-candidates-source #'vulpea-find-candidates
 vulpea-insert-default-candidates-source #'vulpea-insert-candidates
 vulpea-find-default-filter
 (lambda (note)
   (= (vulpea-note-level note) 0))
 vulpea-insert-default-filter
 (lambda (note)
   (= (vulpea-note-level note) 0)))

(defun vulpea-find-candidates (&optional filter)
  "Return list of candidates for `vulpea-find'.

FILTER is a `vulpea-note' predicate."
  (let ((notes (vulpea-db-query-by-tags-none '("cellar" "rating" "appellation" "grape" "region"))))
    (if filter
        (-filter filter notes)
      notes)))

(defun vulpea-insert-candidates (&optional filter)
  "Return list of candidates for `vulpea-insert'.

FILTER is a `vulpea-note' predicate."
  (let ((notes (vulpea-db-query-by-tags-none '("cellar" "rating" "appellation" "grape" "region"))))
    (if filter
        (-filter filter notes)
      notes)))

And for wine I have specialized functions:

(let ((vulpea-find-default-candidates-source
       (lambda (filter)
         (let ((notes (vulpea-db-query-by-tags-every '("wine" "cellar"))))
           (if filter
               (-filter filter notes)
             notes)))))
  (vulpea-find))

You can easily query notes using vulpea-query, but it requires loading every note into memory, which is not what you want in some cases. That’s why vulpea comes with a few specialized vulpea-db-query-by-* functions. You can find some benchmark results in the README.

Let me know if you have any questions :slight_smile:


Aye, initially I implemented it as part of the vulpea library, because my use cases required better optimisations. Then I figured it would be nice to contribute back (I even have a WIP branch somewhere), but… the war started, lots of stuff happened, and I got a bit demotivated, especially considering the lack of interest in the original proposal. So you see, I am just looking for excuses :joy_cat:


@d12frosted I am glad you have come back and posted your comments. I do not know how to express anything for your situation (I am sorry, I have no words); I am just relieved and happy to see you here.

My impression of your proposal thread is that Jethro wanted you to complete the PR. And then everything happened. Jethro moved countries and life changed. @zaeph changed jobs and life changed. And so maintenance of the package has stalled for the moment.

But I think there has been a lot of interest in the performance of org-roam-node-find and friends. As far as I can see from this forum (and a little on Reddit), people seem to start noticing slowness at around 2000-3000 nodes (or 10,000; I guess this also depends on machine power?).

I’d really love to see the PR merged. I have always thought it would be a great addition. I am hoping that 2024 will be a different year.

PS. I would use your branch, or steal the code from it, even when I have only 400 notes to worry about :slight_smile: .


Since for now org-roam isn’t getting any new commits, I put up my own little private fork: ~zetagon/org-roam - sourcehut git

I might PR this later if there’s interest. I think it works pretty well.


I am always around the corner (as long as I am not on the frontlines haha). I am just not active on forums :joy_cat:


I find myself at a crossroads with my plugin, Vulpea, which extends and utilizes Org Roam. While I deeply appreciate Org Roam’s role in the Emacs community and the innovative approach it has brought to note management, my vision for what it could be diverges somewhat from its current path.

Org Roam’s niche as a ‘rudimentary Roam replica with Org-mode’ is admirable, yet I believe its potential extends beyond. Ideally, it would serve as a robust low-level library, excelling in managing notes through powerful, fast, and extensible APIs. These APIs could then be the foundation for various applications, from Roam replicas to more unique use cases.

Vulpea was born out of a need for such flexibility and has since proven its value across several projects (vino, Barberry Garden, personal blog and other smaller scripts and applications). However, I’m aware that continuing on a separate path or (partially) merging back with Org Roam are both significant decisions.

The prospect of contributing back to Org Roam is enticing. I see value in sharing the advancements of Vulpea with the wider community, enhancing Org Roam’s core to support a broader range of use cases, particularly non-interactive ones (i.e. for building various extensions/applications). Yet, such a contribution isn’t trivial; it would entail reimagining parts of Org Roam’s architecture to accommodate extended functionality and performance improvements.

Before committing to any path, I seek clarity on the community’s needs and the current trajectory of Org Roam. If there’s a substantial overlap in vision and requirements, I am open to contributing my efforts towards enhancing Org Roam and even helping with maintenance. Conversely, if the needs are too divergent, focusing on Vulpea’s independent development might be more prudent.

Your thoughts and insights on this matter would greatly assist in making a decision that aligns with both my goals and the community’s needs.


I scoured the issue tracker and tested many of the solutions provided.

For example, Lists are slow; why not using hash tables? · Issue #2241 · org-roam/org-roam · GitHub This advises using the memoize library.

There is also a pull request (feat): Increase speed of org-roam-node-find by vherrmann · Pull Request #2341 · org-roam/org-roam · GitHub

And yet, the solution provided in issue #2330 is as simple as it is effective: `org-roam-node-find` takes around 6 seconds to show completion list · Issue #2330 · org-roam/org-roam · GitHub

It relies on creating a cache. I tested this with 12000 nodes; here are the benchmark results:

*CONTROL/BEFORE*
(12001 (0.919884862 6 0.5056896570000013) (0.9286408740000001 6 0.5203195919999999) (0.934569367 6 0.5220208249999985))

*EXPERIMENT/AFTER*
(12001 (.000070844 0 0.0) (0.000048763 0 0.0) (0.000048763 0 0.0))

The first number is the number of nodes, 12001 :grin:; each triple is the output of benchmark-run: (elapsed-seconds gc-runs gc-elapsed-seconds).

The code can be found by following the issue page, but I’m pasting it here for ease of access:

;; org-roam-node-read--completions Cache Mechanism
;; see: https://github.com/org-roam/org-roam/issues/2330
;; source: https://github.com/Konubinix/Devel/blob/30d0184db0a61f46ca35e102302d707fef964a8c/elfiles/config/after-loads/KONIX_AL-org-roam.el#L770-L787

;; The following elisp code introduces a caching mechanism for the function
;; `org-roam-node-read--completions`, used as a backend to org-roam-node-find and org-roam-node-insert

(defvar org-roam-node-read--completions/cache nil
  "Memory cache of the list of nodes.")

(defvar org-roam-node-read--completions/cache-time nil
  "The time when the cache was last refreshed.")

(defun advice/org-roam-node-read--completions-cache (orig-fun &rest args)
  "Advice to cache the result of `org-roam-node-read--completions'.

This wraps the original function and recomputes the node list only when:

- the cache is empty,
- the cache time is not set, or
- the cache is older than the last modification time of `org-roam.db'.

In those cases it rebuilds the cache and updates the cache time;
otherwise it returns the cached result."
  (when (or (not org-roam-node-read--completions/cache)
            (not org-roam-node-read--completions/cache-time)
            (time-less-p org-roam-node-read--completions/cache-time
                         (file-attribute-modification-time
                          (file-attributes org-roam-db-location))))
    (message "Rebuilding the org-roam-node-read--completions/cache")
    (setq org-roam-node-read--completions/cache-time (current-time))
    (setq org-roam-node-read--completions/cache (apply orig-fun args)))
  org-roam-node-read--completions/cache)

(advice-add #'org-roam-node-read--completions
            :around
            #'advice/org-roam-node-read--completions-cache)

Thanks to Konubinix (Samuel Loury) · GitHub

Now if only someone could provide me with some optimisations for the database syncing!
That would shave another second off my startup time :grinning: :raised_hands:


Test for yourself!

To test the miracle for yourself, create a test directory on your desktop, create a file named .dir-locals.el, and paste the following:

((nil . ((org-roam-directory . "/home/user/Desktop/test/")
         (org-roam-db-location . "/home/user/Desktop/test/org-roam.db"))))

replace user with your username

Then use this bash script to generate N files for org-roam:

#!/bin/bash

echo "Enter the number of files to generate:"
read num_files

# Check if the input is a positive integer
if [[ ! $num_files =~ ^[1-9][0-9]*$ ]]; then
    echo "Invalid input. Please enter a positive integer."
    exit 1
fi

# Generate the specified number of files
for ((i = 1; i <= num_files; i++)); do
    filename="file_$i.org"
    uuid=$(uuidgen)
    content=":PROPERTIES:
:ID: $uuid
:END:
#+TITLE: $filename"

    echo "$content" > "$filename"
    echo "Generated: $filename"
done

echo "Files generated successfully."

Then do an org-roam-db-sync and run a before/after benchmark:

(cons
 (length (org-roam-node-read--completions))
 (cl-loop
  for i from 1 to 3
  collect (benchmark-run 1 (org-roam-node-read--completions))))

One problem with using the advice function is that, since it caches the entire output of org-roam-node-read--completions, filter functions fail to work with the cache. So if a user filters their nodes, the advice function is not recommended.
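One way to lift that limitation, sketched below with hypothetical names, is to key the cache on the argument list itself, so each filter-fn gets its own entry. Note the caveat: arguments are compared with equal, so this only helps when the same filter function object is reused across calls.

```elisp
(defvar my/org-roam-completions-cache (make-hash-table :test #'equal)
  "Completion results keyed by the argument list they were built with.")

(defvar my/org-roam-completions-cache-time nil
  "Time the cache was last known to match the database.")

(defun my/org-roam-completions-cache-advice (orig-fun &rest args)
  "Cache `org-roam-node-read--completions' per ARGS, invalidating on db change."
  (let ((db-mtime (file-attribute-modification-time
                   (file-attributes org-roam-db-location))))
    ;; Drop the whole cache when the database file changes.
    (when (or (null my/org-roam-completions-cache-time)
              (time-less-p my/org-roam-completions-cache-time db-mtime))
      (clrhash my/org-roam-completions-cache)
      (setq my/org-roam-completions-cache-time (current-time)))
    ;; Reuse a cached result for these exact arguments, or compute one.
    (or (gethash args my/org-roam-completions-cache)
        (puthash args (apply orig-fun args) my/org-roam-completions-cache))))

(advice-add #'org-roam-node-read--completions
            :around #'my/org-roam-completions-cache-advice)
```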

I tested the PR linked above and it is a more comprehensive patch that doesn’t have this problem.

The benchmark with the pull request is:

(12000 (0.15328572599999998 1 0.06297729799999985) (0.218046877 2 0.12712598600000025) (0.154069664 1 0.06508798700000007))

How to apply the patch

For users who may not know, here is how to apply the patch:

  1. Save the patch file in ~/.emacs.d/elpa/org-roam-20230307.1721/, i.e. the org-roam package directory.

  2. Name the patch file something like patch.speed-completions.

  3. In your terminal, cd to that directory and run patch < patch.speed-completions.

  4. Restart Emacs; upon restart, the patched files will be byte-compiled automatically.

To reverse the patch:

patch -R < patch.speed-completions, then restart Emacs.

To get the contents of the patch file, go to the GitHub page for the pull request and append .patch to the URL:

https://github.com/org-roam/org-roam/pull/2341.patch
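Before touching the real package directory, the apply/revert cycle can be tried in a sandbox (the file names below are made up for the demo, not part of the actual patch):

```shell
# Sandbox demo of the patch apply/revert cycle on a throwaway file.
printf 'hello\n' > demo.txt
printf 'hello world\n' > demo-new.txt
diff -u demo.txt demo-new.txt > demo.patch || true
patch demo.txt < demo.patch        # apply: demo.txt now reads "hello world"
patch -R demo.txt < demo.patch     # revert: demo.txt reads "hello" again
```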

This is a much better approach.

Best,


So I packaged a couple of lightning-fast commands to replace node-find and node-insert: quickroam


For what it’s worth, I did a bit of digging tonight. sqlite is relatively fast. The real problem seems to be the processing done inside Emacs for each of the tuples returned from the db. The database query seems to take about 1/10 of the total execution time.


I heard somewhere that using filter functions speeds up find and insert significantly. If that is the case, the results are in line with expectations: it is the processing of the list retrieved from the db query that takes the time.

Here is a small snippet to profile both org-roam-node-list (the DB access) and org-roam-node-read--completions, which processes the results and prepares them for interaction with the user. In my case, the db access takes 0.15 seconds and the other 0.9 seconds. Comment out the corresponding line as needed.
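A minimal sketch of such a snippet, assuming the stock org-roam functions:

```elisp
;; Profile the DB access and the candidate preparation separately.
;; Comment out whichever line you are not measuring.
(benchmark-run 1 (org-roam-node-list))              ; DB access only
(benchmark-run 1 (org-roam-node-read--completions)) ; access + formatting
```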


What about the following (any or all below):

  1. Since the cache is valid at least until an org-roam file is saved, why not attach a function that asynchronously updates the cache?

  2. We could also extend org-roam-node-find so that the cache is rebuilt on demand while searching for a node.

  3. Or provide a function, run by midnight, that updates the cache asynchronously.

The user can choose.
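A rough sketch of option 1 (hypothetical function name; assumes the cache advice from earlier in the thread is installed). Emacs Lisp has no true asynchrony here; an idle timer merely defers the work until the user is not typing:

```elisp
(defun my/org-roam-refresh-completions-cache ()
  "Warm the completions cache shortly after saving an org-roam file."
  (when (org-roam-file-p)
    ;; Run during idle time so saving stays snappy; the advised
    ;; function repopulates the cache as a side effect.
    (run-with-idle-timer 1 nil #'org-roam-node-read--completions)))

(add-hook 'after-save-hook #'my/org-roam-refresh-completions-cache)
```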

@dmg
Generally I feel there are already lots of solutions floated for this problem, all dealing with deciding when to update a cache. The easy solution I showed earlier caches the same thing as the PR does, except that it also refreshes the cache if the current filter-fn differs from the last one (this was the case when I posted it). There was also another proposal: gut the entire user-facing system of org-roam and rewrite the interface, using org-roam only for its db layer.

The real problem, I feel (and I say this more from intuition than from statistical testing), is that the expensive step is the iteration that propertizes the entries.
A good solution must make this iteration cheaper; querying the sql db should not be a problem, since it is optimised for large data sets.

But until we have a real solution in place, all these brute-force caches come with one limitation or another. Currently the biggest problem is that after every save we face the same agonising wait while the cache refreshes (or whenever it does, depending on how it has been set up).

I currently lack the data set to do hypothesis tests, but if you can, you might try to figure out whether the expensive part is running org-roam-node-read--to-candidate over all the entries one by one whenever the cache is refreshed; without a cache in place, this happens every time the user calls find or insert.
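A quick way to test that hypothesis; this is a sketch that assumes the internal helpers org-roam-node--process-display-format and org-roam-node-read--to-candidate keep their current signatures:

```elisp
;; Time only the propertizing/formatting step, with the node list
;; already fetched, to separate it from the DB query cost.
(let ((template (org-roam-node--process-display-format
                 org-roam-node-display-template))
      (nodes (org-roam-node-list)))
  (benchmark-run 1
    (mapcar (lambda (node)
              (org-roam-node-read--to-candidate node template))
            nodes)))
```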