Improving org-roam-format-template (next to make org-roam-node-find fast)

I was profiling the code of org-roam-node-find and the biggest bottleneck (after the improvements to org-roam-node-list) is the processing, for each node, of the template (it is using around 80% of the total time of finding a node with my org-roam).

The main issue is that the template (eg: “${title} ${filename}”) is parsed as many times as nodes exist (using a regular expression). If this processing was done once only, it will certainly improve the performance of org-roam-node-find.

This will require to refactor the code of this function and org-roam-node–format-entry and rewrite org-roam-node-read–to-candidate

Controlled experiment on Display Format Generation with and without GC running for node size 20000
for an example display template – all other processes have been controlled for,

Format Generation Result

| 0.8807215509999999 | 0 | 0.0 |
| 0.9324620770000001 | 0 | 0.0 |
| 0.9284624810000001 | 0 | 0.0 |
|        0.925824983 | 0 | 0.0 |
|        0.926202192 | 0 | 0.0 |
|        0.939134124 | 0 | 0.0 |
|        0.928057464 | 0 | 0.0 |
| 0.9281281800000001 | 0 | 0.0 |
| 0.9267680730000001 | 0 | 0.0 |
|        0.927419631 | 0 | 0.0 |
| 0.9294667249999999 | 0 | 0.0 |
| 1.7701753569999998 | 11 | 0.9080397510000005 |
| 1.8481772189999999 | 12 | 0.9911701650000015 |
|        1.765555101 | 11 | 0.9101836789999993 |
| 1.8435445910000001 | 12 | 0.9885689709999994 |
|        1.848470344 | 12 | 0.9940336959999989 |
|        1.769367922 | 11 | 0.9133971390000006 |
|        1.849403378 | 12 | 0.9926142330000012 |
| 1.7668457899999999 | 11 | 0.9117966240000008 |
|        1.842279775 | 12 | 0.9882231329999982 |
|        1.761206654 | 11 | 0.9065899379999998 |
|        1.843822947 | 12 | 0.9877636340000002 |

Let me know if you make any headway in reducing the number off temporary variables in the process.

https://hastebin.skyra.pw/isisibupuh.org

I attempted to totally bypass the functions and construct the display candidates directly -

|        0.512200102 | 0 | 0.0 |
| 0.5243189109999999 | 0 | 0.0 |
| 0.5256468750000001 | 0 | 0.0 |
|        0.527694037 | 0 | 0.0 |
|         0.52837358 | 0 | 0.0 |
| 0.5306920380000001 | 0 | 0.0 |
|         0.52618145 | 0 | 0.0 |
|        0.526668523 | 0 | 0.0 |
|        0.526865157 | 0 | 0.0 |
|        0.532609853 | 0 | 0.0 |
|        0.528698882 | 0 | 0.0 |
|        1.309958374 | 2 |  0.8066643420000048 |
|        0.902891209 | 1 |         0.405633967 |
|        0.892309232 | 1 | 0.40388118600000666 |
|        0.897282229 | 1 |  0.4023565109999936 |
|        0.896623804 | 1 |   0.401657111999981 |
|        1.302014031 | 2 |  0.8049902160000215 |
| 0.8949897739999999 | 1 |   0.404243699999995 |
|        0.890800676 | 1 |  0.4025110569999981 |
|         0.89391253 | 1 |   0.402396901000003 |
|        1.301704781 | 2 |  0.8076995979999992 |
| 0.9027907079999999 | 1 |  0.4049892609999972 |

My conclusion is that - the functions themselves don’t take that much time - they generate a lot of garbage sure, but even if we eliminate them – the display format templates themselves will drag down the performance,
This is the impact of using two display templates , directories and title. Using only one has significant performance boosts

If we then do our calculations to make estimate for the functions in between - and estimate for their impact - it is negligible in the scale.

(defun +org-roam-node-read--completions (&rest args)
  "Generate completion candidates for Org-roam nodes, optionally filtered and sorted."
  (let* ((template (org-roam-node--process-display-format org-roam-node-display-template))
	 (width (1- (if (bufferp (current-buffer))
				      (window-width) (frame-width)))))
    
    (mapcar 
     
     (lambda (node)
       (let ((candidate-main (+org-roam-node-read--to-candidate node width)))
	 (cons (propertize candidate-main 'node node) node)))
     
     (++org-roam-node-list))))


(defun +org-roam-node-read--to-candidate (node width)
  (concat (test/directory node) " " (test/hierarchy node)))


(defun test/directory (node)
  (if-let ((dirs (file-name-directory (file-relative-name (org-roam-node-file node) org-roam-directory))))
      (format "(%s)" (car (split-string dirs "/")))
    ""))

(defun test/hierarchy (node)
  (let ((level (org-roam-node-level node)))
    (concat
     (when (> level 0) (concat (propertize (org-roam-node-file-title node) 'face 'italic) " / "))
     (when (> level 1) (concat (string-join (org-roam-node-olp node) " > ") " > "))
    (propertize  (org-roam-node-title node) 'face 'bold))))

I spent time trying to understand where the actual bottle necks. These are my results.

basically, retrieving and formatting the fields for the template are the most expensive actions.

1 Like

Excellent results - I think your experiments also corroborate my finding that any function that comes from the files.el are disgustingly slow.

Elimination thereof skyrockets performance. I think if you collect your findings more concisely you may improve the existing implementation and reduce the function times from about 0.4 sec to 0.1. This could be a significant improvement -

  • mainly I want to focus on this part that it should be made common knowledge among org-roam users to not use any function that comes from the files.el - so to give up about finding directory information for nodes without implementing a cache in place.

For my case - I have eliminated the entire middle circuit of generation of display template from user set options so many of these functions have been totally retired for my use case.

I think you have made significant improvement - made the middle of the circuit 4 times better nevertheless - I would if possible like a confirmation on the impact of just ripping from the code any function that comes from the said file especially here

Edit I just saw your message

Excellent ! Consider doing a PR with your findings.

@nobiot maybe I can bring your attention to here – removing this function from files.elfile will speed up performance – I dont understand what this line is supposed to do here – but it creates a huge latency – GC reduction as proposed by @dmg will have the most significant impact here – this is the most expensive protocol between the trio of functions that make up the read protocol –

So you are talking about this 3-line when form in the function org-roam-node--format-entry?

         (when (and (equal field-name "file")
                    field-value)
           (setq field-value (file-relative-name field-value org-roam-directory)))

It’s a bit surprising that this form affects performance – I think it’s used to construct each candidate displayed in org-roam-node-find based on user option org-roam-node--format-entry. So… If you did’t have ${file} as part of the user option (usually, you would not), I would have thought the form would not be evaluated and hence no performance implication…

1 Like

hi nobiot,

you are right:

org-roam-node-display-template is a variable defined in ‘org-roam-node.el’.

Its value is "${title} ${tags} ${file} ${todo}"
Original value was "${title}"

So my template is the one that triggers that.

To be honest, I didn’t look into the original template. If that is the case, the “default” query for the database does not need to do all those joins!

it brings an interesting point: a major rewrite would be:

the query should retrieve only what it needs (plus the id) and the completion mechanism should be used this data only (plus perhaps an ID, if one wants to dynamically show the candidates contents)

Yes, thanks for your clarification @nobiot - I have been led to believe that I need to speak more clearly, because I feel I can never get my intent across

By default it doesn’t you are correct - I discovered when I was testing a more direct route to generate the display candidate that if I used any function coming from the files.el file the performance will degrade very fast. So in the condition it gets evaluated – it shall manifest itself as a noticeable lag.

I want to ask you a final opinion - do you think a hash table here would benefit the larger community? It will have to be reset by some mechanism when display candidate format is changed – but if say we make the node struct the key and the main candidate format as the value – then we can improve read latency easily.

Between the trio of functions - the latency here will be noticed by users first. Can you think of some reasons why a hash table by default will be a bad idea – or whether it should be a user set feature – I want to ask you this because over time collective analysis has shown that trivially memoizing this part has the greatest impact for medium sized knowledge bases.

Thanks fr your time
Best.