Org-roam-node-read speedup without cache - Guide

This post is a culmination of various discussions that have taken place over time scattered throughout - the problem is as follows - inefficiencies in the code base causes the read protocol (org-roam-node-find and org-roam-node-insert) to lag when node size crosses a certain threshold - the cause is myriad but over time two branches of solutions have come out; they are namely

  1. Hide the problem under the rug and hope for the best (use cache over cache over cache over cache)
  2. Refactor the read protocol such that no cache is requisite.

I have worked extensively over the second path - @dmg has been very kind in giving valuable feedback and correcting many initial faults, and also providing with the requisite tools to reduce GC calls.

I present the final solution - it consists of two modules

a. Implement a cache in the org-roam.db itself, since the database is a persistent cache - we do not have to rebuild the cache over and over anytime a change in db occurs. This is the materialized view - now implemented in elisp

org-roam-node-view.el

b. Rewrite the read protocol such that weaknesses are eliminated

org-roam-node-read+.el

I had gotten suggestions to make my earlier iterations more general - so it has been done,

The following is a guide on how to set it up,


Variables:

The following are the variables that should be set up by the user

org-roam-node-list-filter : This is the filtering logic - folders named herein are excluded by default in the node-list.

org-roam-node-list-sort : This is the sorting logic

Both these must be written in SQL - which is much more easy than implementing in elisp

(defun +org-roam-node-display-template (node)) : This is the display template,

for example consider the following -

Examples:

(defun org-roam-node-read-sort-by-RTAL (completion-a completion-b)
  "Custom sort function for org-roam-node completions.

RTAL: ROOT TIME ALPHABETICAL ORG-LEVEL

Root level files are sorted first.
Followed by files in the same directory,sorted by modification time.
If two files are in different directories, they are sorted lexicographically (Alphabetically) by directory path.
If two nodes are from the same file, the one with the higher level is placed below.

Arguments COMPLETION-A and COMPLETION-B are items in the form of (NODE-TITLE . ORG-ROAM-NODE-STRUCT),
where ORG-ROAM-NODE-STRUCT is an instance of the `org-roam-node` struct.

Returns t if COMPLETION-A should come before COMPLETION-B in the sorted list,
and nil if COMPLETION-A should come after COMPLETION-B."
  (let* ((node-a (cdr completion-a))
         (node-b (cdr completion-b))
         (file-a (org-roam-node-file node-a))
         (file-b (org-roam-node-file node-b))
         (root-a (not (string-match-p "^/" (file-relative-name file-a "roam"))))
         (root-b (not (string-match-p "^/" (file-relative-name file-b "roam"))))
         (dir-a (file-name-directory file-a))
         (dir-b (file-name-directory file-b))
         (mtime-a (org-roam-node-file-mtime node-a))
         (mtime-b (org-roam-node-file-mtime node-b))
         (level-a (org-roam-node-level node-a))
         (level-b (org-roam-node-level node-b)))
    (cond
     ((and root-a (not root-b)) t)
     ((and (not root-a) root-b) nil)
     ((and (string= dir-a dir-b) (time-less-p mtime-b mtime-a)) t)
     ((string< dir-a dir-b) t)
     ((and (string= file-a file-b) (> level-a level-b)) t)
     (t nil))))

such a sorting logic may be trivially implemented in sql to give equivalent results.

Note: anything after --- is comments in SQL.

(defvar org-roam-node-list-sort
  "order by
   length(nodes_view.file) - length(replace(nodes_view.file, '/', '')),
   case when nodes_view.file like '%%%%guides%%%%' then files.mtime end desc,
   files.mtime desc,
   nodes_view.\"level\" desc

  -- Keep root files in the beginning
  -- then anything in the guides folder (sort this by modification time)
  -- sort all other categories by modification time
  -- also keep headline nodes before file nodes."

  "Define default SORT of `+org-roam-node-list'.")

Similarly for a filtering logic

(defvar org-roam-node-list-filter
  "where
   nodes_view.file not like '%%%%/references/%%%%'

  -- AND nodes_view.file not like '%%%%/folder/%%%%'
  -- add additional folders like above."

  "Define default FILTER of `org-roam-node-list',
folders mentioned here are excluded from the list.")

Users are expected to set these two variables with their own requirement.

For the display template - consider the following implementation

;; Customise appearance of nodes in `org-roam-find' & `org-roam-insert'
(cl-defmethod org-roam-node-directories ((node org-roam-node))
  (if-let ((dirs (file-name-directory (file-relative-name (org-roam-node-file node) org-roam-directory))))
      (format "(%s)" (car (split-string dirs "/")))
    ""))

(cl-defmethod org-roam-node-hierarchy ((node org-roam-node))
  (let ((level (org-roam-node-level node)))
    (concat
     (when (> level 0) (concat (propertize (org-roam-node-file-title node) 'face 'italic) " / "))
     (when (> level 1) (concat (string-join (org-roam-node-olp node) " > ") " > "))
    (propertize  (org-roam-node-title node) 'face 'bold))))

(setq org-roam-node-display-template "${directories:12} ${hierarchy:*}")

This can be converted to

(defun +org-roam-node-display-template (node)
  (let* ((level (org-roam-node-level node))
	 (file (org-roam-node-file node))
	 (dir (if (string-match
		   (format "/%s/\\([^/]+\\)/" org-roam-directory-name)
		   file)
		  (match-string 1 file)
		"/"))
	 (formatted-dir (format "(%s)" dir))
	 (dir-width 10)  ;; Set fixed width for the directory component
	 (padded-dir (truncate-string-to-width (format "%-10s" formatted-dir) dir-width)))
    (concat
     padded-dir " "
     (when (> level 0) (concat (propertize (org-roam-node-file-title node) 'face 'italic) " / "))
     (when (> level 1) (concat (string-join (org-roam-node-olp node) " > ") " > "))
     (propertize (org-roam-node-title node) 'face 'bold))))

This provides the same template but being orders of magnitude faster. If users don’t want to leverage efficiency then this may be trivially disregarded by removing its reference from

(defun +org-roam-node-read--completions (&optional filter sort)
  "Modfied `org-roam-node-read--completions'
Filtering and Sorting is delegated to `+org-roam-node-list'

Return an alist for node completion.
The car is the displayed title or alias for the node, and the cdr
is the `org-roam-node'.

The displayed title is formatted according to `+org-roam-node-display-template'."
  (mapcar (lambda (node)
	    (let ((display-candidate (+org-roam-node-display-template node)))
	      (cons display-candidate node)))
	  (+org-roam-node-list filter sort)))

in place of +org-roam-node-display-template swap to org-roam-node-read--to-candidate and make appropriate substitutions to determine the template variable-- see example from the original code how to do this.

> Reducing the amount of query from the database

For users requiring to even query a minimal subset – only those that are required by the filtering, sorting, display-template case by case basis

may be set in INIT such as (only those required in our example)

(org-roam-node-struct-set-slots '(id file file-title level point olp file-mtime title))

EDIT: 2024-06-23 : (org-roam-node-struct-set-slots (arg-list) now simply sets the query subset and doesn’t alter the constructor argument list.

No other user input is required - everything else will fall in its place.

Additionally one more variable may be customised by the user

(defvar org-roam-node-list-differentiate-aliases t) : If the user prefers not to have each alias become a node by itself - it may be turned off - it is on by default.

P.S : Please make sure that title and alias are at the end in (org-roam-node-struct-set-slots … ) if org-roam-node-list-differentiate-aliases is set to t. UPDATE: order is no longer important when setting this - the function will automatically take care of order.

AND RESET YOUR DATABASE ONCE - (org-roam-db-sync 'force) this will set up the materialized view.

Please read the code in detail - you can even paste the entire code in some LLM to understand what is being done.

Best,
If you use it - and notice any issue - let me know. Thanks

1 Like

Would you please describe how option B works from a high level point of view? It is hard to code-review the code without understanding first how it works.

thanks!

Known Limitations / Intentional Breakages


1. Annotation

Usage of annotation functions has been intentionally broken - since the default is a dummy function. Most users will not face any problem due to this - instead of annotations - the display template should be used, since its doing the same thing, I presume the annotation functionality is a relic from the past.

To bring back the annotation functionality - we need to propertize the display candidate with the node struct - an additional step which otherwise serves no other purpose.

See the source for +org-roam-node-read--to-candidate

change

(mapcar (lambda (node)
	    (let ((display-candidate (+org-roam-node-display-template node)))
	      (cons display-candidate node)))
	  (+org-roam-node-list filter sort))

to

(mapcar (lambda (node)
	    (let ((display-candidate (+org-roam-node-display-template node)))
	      (cons (propertize display-candidate 'node node) node)))
	  (+org-roam-node-list filter sort))

Untested - but it should work, if you were relying on that.

2. Sort in completion ui

This is broken in org-roam. I tested the code with consult-org-roam-node-read consult-org-roam.el- it is not broken there - the sort functionality had been broken in completions read in the default org-roam-node-read before this and will continue to be broken after this. Not a fault of this mechanism.