Preconceived Ideas & org-roam-node-find Performance

Since I finally got more comfortable with Elisp and Org-roam lately, I decided to revisit my config to make some improvements.

A big part of this work focuses on how I organize my notes (metadata, subfolders) and how I query the database for various purposes. The problem with being more comfortable with Elisp is that I now have many ways to achieve the same result. More options mean I need to base my design choices on considerations other than my ability to code, so I can think more about convenience and performance.

As I was (re)writing my different finders (i.e. filtered variations of org-roam-node-find), I started to question my decisions and practices, mostly from a database/query performance standpoint. I do not experience delays or performance issues so far, but I would like to start my Org-roam journey on a sound footing so I do not have to redo the most structural parts of my setup.

My doubts mostly come from my ignorance of some of the underlying technologies and code I am using, so I thought more knowledgeable users could help me separate reasonable assumptions from misconceptions and maybe even a touch of magical thinking :slight_smile:

It is also possible that my worries about slowing down my ’notes finders’ by using poorly written FILTER-FN are simply baseless and that writing efficient filters will only have a marginal effect (or none at all). I really do not know.

Please help debunk or confirm my ’beliefs’:

  1. The fewer filters, the better.

    This assumption of mine is the one I deem the most realistic, so I hope I will not be disappointed on that one.

    I usually do not use more than three criteria to filter my notes, most of the time two. I feel that if each node is tested against n criteria, the complexity, and hence the computing time, increases n-fold.

    True, False or kind of ?

  2. Not all criteria are equal.

    This one shares some logic with the previous one. For example, I have the preconceived idea that testing a node property against a list of strings using (member prop string-list) takes (length string-list) times longer than testing it against a single string using (string= string prop).

    Is there any truth to that ? If yes, how dramatic would the difference be in a situation like the example above ?

  3. Order matters.

    When writing a filter function, I always end up reordering my condition testing in some kind of a reverse ’funneling’ sequence.

    I basically start with the criterion that has the narrowest pool of candidates (i.e. the highest power of elimination), and work my way down. It is hard to illustrate, because it depends on the content and organization of one’s notes. But let’s imagine I write a lot about politics, and very little about religion. If I write the following function:

      (org-roam-node-find other-window nil
                              (lambda (node)
                               (and
                                (member "Politics" (org-roam-node-tags node))
                                (member "Religion" (org-roam-node-tags node)))))

I feel it will be faster if I swap the two tests: since the Religion tag is rare in my zk, more nodes will be eliminated before being tested against the more popular Politics tag. In my mind, that means less code execution before yielding results.

This idea of mine is also reinforced by the Info page for and (a ’special form’, as the doc calls it), which says that evaluation stops as soon as a condition returns nil.

> Eval args until one of them yields nil, then return nil.
> 
> The remaining args are not evalled at all.
> If no arg yields nil, return the last arg’s value.
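The short-circuit behaviour quoted above can be checked without Org-roam at all. Here is a minimal sketch that counts how many tag tests run in each order. Everything in it is a stand-in: the `my/demo-*` names and the 100 fake tag lists are hypothetical, and a real database will have a different tag distribution.

```elisp
(require 'cl-lib)

;; Hypothetical data: each element is the tag list of one imaginary note.
;; "Politics" is common (92 notes), "Religion" is rare (2 notes).
(defvar my/demo-notes
  (append (make-list 90 '("Politics"))
          (make-list 8 '("Cooking"))
          (make-list 2 '("Politics" "Religion")))
  "Stand-in tag lists, not taken from a real Org-roam database.")

(defvar my/demo-test-count 0
  "Number of tag tests performed by the last run.")

(defun my/demo-has-tag (tag tags)
  "Return non-nil if TAG is in TAGS, counting each call."
  (cl-incf my/demo-test-count)
  (member tag tags))

(defun my/demo-count-tests (tag-a tag-b)
  "Filter `my/demo-notes' on TAG-A, then TAG-B.
Return a cons cell (MATCHES . TESTS)."
  (setq my/demo-test-count 0)
  (let ((matches (cl-count-if
                  (lambda (tags)
                    ;; `and' stops at the first nil, so the second
                    ;; test only runs for notes that passed the first.
                    (and (my/demo-has-tag tag-a tags)
                         (my/demo-has-tag tag-b tags)))
                  my/demo-notes)))
    (cons matches my/demo-test-count)))

(my/demo-count-tests "Politics" "Religion") ; => (2 . 192)
(my/demo-count-tests "Religion" "Politics") ; => (2 . 102)
```

Both orders return the same two notes, but putting the rare tag first runs 102 tests instead of 192. Whether that saving is noticeable is a separate question, since each `member` call on a short tag list is very cheap.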

Thanks for sharing your knowledge/experience and helping me understand what is happening behind the scenes a bit better. Hopefully it can be useful to future readers facing the same kind of dilemmas and doubts as well.

Before continuing the discussion, have you read our long discussion of the performance of org-roam-node-find?

  1. The main problem with the function is not the DB. It is the way the nodes are created in memory, the excessive use of memory, and the repeated application of the template (which is slow) to each node.

  2. I have developed a small extension that improves the performance to make it no longer an issue at least with my db. Look at the code. It will explain what the current bottlenecks are.

Try it: it is a minor mode, so you can enable and disable it:

oh, one more thing. I believe that once the nodes are loaded in memory the biggest improvement is to create a function to format the node for presentation rather than using a template that has to be processed for every node. See the top of my code for an example of how to do such formatting.

it does not matter.

True. The harsher the filter (the more it filters) the less work the template system (and any other operation after the filter) has to do.

Also, for each node, the filter has to be run (but usually filters are extremely fast)

Marginally. Sort is a very fast function.

one more thing: garbage collection affects performance. The more garbage is created, the more garbage needs to be collected.

My best recommendation: use the profiler. Create some use cases and tests for them, and run the profiler on each. That will let you understand how much time is spent on each operation and how much garbage is created.
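As a rough illustration of that kind of measurement, here is a small sketch using the built-in `benchmark-run` macro, which returns the elapsed time together with the number of garbage collections and the time spent in them. The `my/bench-*` names and the sample strings are hypothetical, chosen only to compare a single `string=` test against a `member` test on a short list.

```elisp
(require 'benchmark)

;; Hypothetical test data, not taken from any real Org-roam database.
(defvar my/bench-prop "Religion"
  "Stand-in for a node property value.")
(defvar my/bench-list '("Politics" "Economy" "History" "Science" "Religion")
  "Stand-in for a list of accepted values.")

;; `benchmark-run' returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
(defun my/bench-string= ()
  "Time 100000 single-string comparisons."
  (benchmark-run 100000 (string= "Religion" my/bench-prop)))

(defun my/bench-member ()
  "Time 100000 membership tests against a five-element list."
  (benchmark-run 100000 (member my/bench-prop my/bench-list)))

;; Evaluate this to compare the two on your own machine.
(list :string= (my/bench-string=) :member (my/bench-member))

;; For a whole command, the built-in profiler gives a call tree:
;;   M-x profiler-start RET cpu+mem RET
;;   ... run the finder you want to measure ...
;;   M-x profiler-report
;;   M-x profiler-stop
```

The absolute numbers depend on the machine and on whether the code is byte-compiled, so they are only meaningful relative to each other and to the total time of the finder.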

Lisp is a very simple language. Its syntax is a < aloneToken > or (< tokenA > [< other tokens >])

tokenA can be:

  • a function name => call the function with the remaining tokens as parameters
  • a special form => a primitive in the language, such as let, let*, and, etc.
  • a macro => the macro is recursively expanded with the parameters until only special forms and function calls remain

If I remember correctly, that is basically it.

org-roam is full of macro invocations (which make the code easier to write) that affect its performance, such as pcase-let, cl-defstruct (this is a major consumer of time and ephemeral use of memory), dolist, etc.
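To see which of the three categories a given head symbol falls into, the built-in predicates `special-form-p`, `macrop` and `functionp` are enough. The helper `my/head-kind` below is a hypothetical name used only for this sketch.

```elisp
;; Classify the head symbol of a form, using built-in predicates only.
(defun my/head-kind (symbol)
  "Return `special-form', `macro', `function' or nil for SYMBOL."
  (cond ((special-form-p symbol) 'special-form)
        ((macrop symbol) 'macro)
        ((functionp symbol) 'function)))

(my/head-kind 'and)    ; => special-form
(my/head-kind 'when)   ; => macro
(my/head-kind 'dolist) ; => macro
(my/head-kind 'member) ; => function

;; One expansion step: `when' is rewritten in terms of `if' and `progn'.
(macroexpand '(when ready (message "go")))
;; => (if ready (progn (message "go")))
```

`macroexpand-all` (from macroexp.el) does the same recursively, which helps when reading what a form built on `pcase-let` or `dolist` actually turns into.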

Thanks for taking the time to go over each point, I get a clearer understanding of where the problems actually lie. I had skimmed through the discussion you referred to, and made note of your extension to explore later.

The discussion was quite technical, and I was not sure how much all the aspects I was setting up at the time of asking would impact future performance. Now I can relax about it, and focus on what matters most performance wise.

I will especially try to rewrite my display templates as functions as you mentioned; I did not retain this part from reading your previous discussion on the subject.

Thanks again for all the tips, I will update the post after giving those improvements a try.

one clarification: org-roam does not support functions as templates. It is a feature of my extensions. With the current code, the simpler the template, the faster (as it parses the template once per node).

If I remember correctly, the slowest stages of org-roam-node-find were: 1) the construction of the nodes; and 2) the formatting of each node with the template.

Because all nodes are always read, 1 cannot be changed (unless it is rewritten); but 2 can be reduced by filtering out as many nodes as possible using a filtering function.
