Rewriting org-roam-node-list for speed (it is not sqlite)

We can reduce the time by making the query itself more efficient. But this has to be customised per user: for certain use cases the node struct does not need to be fully filled, so we can fill in only as much information as we need and eliminate the rest. Avoiding table joins is also crucial: if we eliminate aliases, tags and refs, the query takes half the time.

#+begin_src elisp
;; benchmark-run returns (elapsed-time gc-count gc-time)
(cl-loop
 for i to 10
 collect (benchmark-run 1 (org-roam-node-list)))
#+end_src

#+RESULTS:
| 0.7831113749999999 | 7 | 0.42851951800000165 |
| 0.7781220160000001 | 7 | 0.42559097199999485 |
|        0.780368353 | 7 | 0.42647846900000275 |
|        0.786213332 | 7 |  0.4322603780000094 |
|        0.784391999 | 7 |   0.428577207999993 |
|        0.782923646 | 7 |  0.4264506800000021 |
|        0.784293421 | 7 |  0.4278606339999982 |
|        0.782726004 | 7 |  0.4278649470000033 |
|        0.782318894 | 7 |  0.4274087440000045 |
|        0.782252445 | 7 | 0.42879090999998937 |
|        0.786107294 | 7 |   0.427898036000002 |

#+begin_src elisp
(cl-loop
for i to 10
collect (benchmark-run 1 (test/org-roam-node-list)))
#+end_src

#+RESULTS:
|        0.740000569 | 6 |   0.374237762000007 |
| 0.7285562160000001 | 6 | 0.36818314399999963 |
|        0.721891912 | 6 | 0.36292082800000003 |
|        0.721998279 | 6 |  0.3617702559999998 |
|        0.720591303 | 6 | 0.36127016199999673 |
|        0.723495694 | 6 |  0.3636653589999952 |
|        0.725744881 | 6 | 0.36286276700000997 |
|        0.721128088 | 6 |  0.3609848879999902 |
|        0.722876063 | 6 |  0.3632802460000022 |
|        0.725067465 | 6 | 0.36378220699999986 |
|        0.721852864 | 6 | 0.36231498700000486 |

#+begin_src elisp
(cl-loop
for i to 10
collect (benchmark-run 1 (test1/org-roam-node-list)))
#+end_src

#+RESULTS:
| 0.32689531299999997 | 5 | 0.22068411800000032 |
|         0.327281741 | 5 | 0.22180803599999965 |
| 0.31872937799999995 | 5 |  0.2187466920000003 |
|         0.319050968 | 5 | 0.21987334300000017 |
|         0.317938584 | 5 | 0.21860417300000012 |
| 0.31835814700000004 | 5 | 0.21891438799999996 |
| 0.31891989699999995 | 5 | 0.21976714899999994 |
|         0.320068757 | 5 | 0.22048416599999943 |
|         0.319427283 | 5 |  0.2197354840000001 |
|         0.318135582 | 5 | 0.21929593100000044 |
|         0.319353553 | 5 | 0.21991869899999994 |

The first table is the control case.
The second uses your constructor function, but fills in the same amount of information; the efficiency gained by eliminating aliases is very small (<0.1 seconds).
The third case uses a new constructor function and also reduces the complexity of the query.

#+begin_src elisp
(cl-defstruct (org-roam-node (:constructor org-roam-node-create)
                             (:constructor org-roam-node-create+
                                           (id &optional file title level point properties olp))
                             (:copier nil))
  "A heading or top level file with an assigned ID property."
  file file-title file-hash file-atime file-mtime
  id level point todo priority scheduled deadline title properties olp
  tags aliases refs)

(defun test1/org-roam-node-list ()
  "Return all nodes stored in the database as a list of `org-roam-node's."
  (let* ((query
          "SELECT
      nodes.id as id,
      nodes.file as file,
      nodes.title as title,
      nodes.\"level\" as \"level\",
      nodes.pos as pos,
      nodes.properties as properties,
      nodes.olp as olp
    FROM nodes")
         (rows (org-roam-db-query query)))
    (cl-loop for row in rows
             collect (apply #'org-roam-node-create+ row))))
#+end_src

This is all the information I need in my case to complete the node-read protocol.

It decreases the time taken by more than half.

@dmg

The aliases can be processed with a map:
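Something along these lines (a sketch; the row layout and the positional constructor are assumptions based on the earlier posts, not my exact code):

#+begin_src elisp
;; Expand each DB row into one node for the title plus one node per
;; alias, splicing the per-row lists together with `mapcan'.
;; Assumes each row is (id file title aliases), ALIASES a list of strings.
(defun my/rows->nodes (rows)
  (mapcan
   (lambda (row)
     (pcase-let ((`(,id ,file ,title ,aliases) row))
       (cons (org-roam-node-create+ id file title)
             (mapcar (lambda (alias)
                       (org-roam-node-create+ id file alias))
                     aliases))))
   rows))
#+end_src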

But I am getting an error sometimes (marker does not point anywhere). However much I squint, this code looks equivalent to the old code, but something changes…

I do agree that requesting all the info from the database might not be worth it, and having fewer fields returned might reduce the time needed to process them. But I feel there are a lot of opportunities for improvement outside the database too.

I was thinking today… the real problem is trying to build the entire list of candidates before the user starts to type. But what if it were done asynchronously? consult, for example, supports that. A potential solution is to leave org-roam alone and instead create a consult-org-roam-find-node-async feature: an async version of org-roam-find.


You have made a trivial mistake that I also made earlier and corrected in the second code block.
The error happens when the column name change is not accounted for in the constructor function.

The database stores the position as pos, whereas the node struct stores it as point.
Without accounting for this name change, the marker will fail.
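Concretely, with the positional constructor from my earlier code, the fifth value of each row (pos) binds to the struct's point slot (the literal values below are just for illustration):

#+begin_src elisp
;; Query column order: id, file, title, level, pos, properties, olp.
(let ((node (org-roam-node-create+ "some-id" "notes.org" "Title" 0 42 nil nil)))
  (org-roam-node-point node))  ; => 42, so the marker resolves correctly
#+end_src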

Good suggestion. But I have decided to eliminate the org-roam-node-read--completions function altogether.

I am retiring every filtering and sorting mechanism in my distribution. This is a bottleneck in Org-roam in many places. We can embed the filtering and sorting in the queries themselves. This is what I have chosen to do for my case: live close to what I need.
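For example (a sketch; the filter and sort key here are illustrative, not my actual query):

#+begin_src elisp
;; Filtering (WHERE) and sorting (ORDER BY) happen inside sqlite,
;; replacing filter and sort passes over the full node list in elisp.
(org-roam-db-query
 "SELECT nodes.id, nodes.file, nodes.title
  FROM nodes
  WHERE nodes.\"level\" = 0
  ORDER BY nodes.title COLLATE NOCASE")
#+end_src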

Thanks!!! I'll check my bug.

What do you mean by “distribution”?

I mean: the process whereby the allocation of resources and information across interconnected components within a larger system is maintained, with the intent to ensure equilibrium and functionality. Basically, my init protocols taken altogether.

I fixed my bug. Thanks, that was it. Sorry about my question; what I was wondering was whether you have an Emacs distribution where you package org-roam.

On my computer, without any caching, the new org-roam-node-find feels much snappier. With caching it might buy me another 4000-5000 nodes before I start to worry about performance again.

But I think the async method might be worth it. What do you think about submitting a PR with this latest code? It is functionally equivalent to the old one, though I have not seen any PR merged in a long time.


I think it would be a good addition. Nevertheless, I feel that adding a new constructor specialised solely for querying the database would not mesh well with the overall system. Over time I have come to the conclusion that Org-roam is a system of mere suggestions for end users:

  • there are many places where efficiency has been intentionally sacrificed to enable greater customisation options. Not all systems are alike, and upstream has to be able to generalise to N arbitrary cases, so users must take calculated risks to shape the system to their liking.

The entire read interface is one such example: it is so generalised that after a certain point you'd feel like sacrificing customisations in favour of more stability, speed and efficiency.

Org-roam has a new set of maintainers; I feel the time is near when PRs and issues will be addressed, for the new maintainers stepped in with the intent to manage the repo.

How about we try to develop such a system with consult and make a module to attach right where needed? We can make arbitrary optimisations for our use case that way.

Also, the candidate-formatting hash-table cache will not interfere with anything else; it just needs to be reset when you change the display template, or the candidates won't be updated.

So even this cache, the least “intrusive” one, will confuse new users if enabled by default :slight_smile: The other two caches are not worth doing, in my opinion; at that point it is much better to customise the read interface. It is not too complicated to chisel it to requirement.
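The pattern I mean is roughly this (a sketch; the rendering helper is hypothetical):

#+begin_src elisp
(defvar my/candidate-cache (make-hash-table :test #'equal)
  "Formatted candidate strings, keyed by node id.")

(defun my/format-candidate (node)
  "Return the display string for NODE, memoised by node id."
  (let ((id (org-roam-node-id node)))
    (or (gethash id my/candidate-cache)
        (puthash id (my/render-candidate node) ; hypothetical formatter
                 my/candidate-cache))))

;; Must run whenever the display template changes, or stale strings
;; will be served:
(defun my/reset-candidate-cache ()
  (clrhash my/candidate-cache))
#+end_src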

Commenting for self-reference in the future.

  1. Querying for nodes.properties has the largest effect on the query, about 30%, mostly because it triggers garbage collection aggressively.

Increasing gc-cons-threshold around the call has the biggest impact on reducing latency.
Results won't be comparable across machines, because garbage collection varies widely from machine to machine.

Results for the 20,000-node synthetic database:

#+begin_src elisp
(let ((gc-cons-threshold 80000000))
  (cl-loop for i to 10
           collect (benchmark-run 1 (emacsql test-database test-query))))
#+end_src

#+RESULTS:
|          0.27203571 | 0 |                 0.0 |
|         0.264726503 | 0 |                 0.0 |
|         0.410066537 | 1 | 0.13551040900000544 |
|         0.268856756 | 0 |                 0.0 |
|         0.432024729 | 1 | 0.16527758199998743 |
|         0.280949442 | 0 |                 0.0 |
|         0.496651709 | 1 |  0.2260986379999963 |
|          0.26814452 | 0 |                 0.0 |
|         0.264481209 | 0 |                 0.0 |
| 0.34821725200000003 | 1 | 0.07965078899999867 |
|         0.270294406 | 0 |                 0.0 |

#+begin_src elisp
(setq test-database (emacsql-sqlite-builtin "/home/akash/Desktop/or.db"))

(setq test-query "
SELECT id, file, filetitle, \"level\", todo, pos, priority, scheduled, deadline, title, olp, atime, mtime,
'(' || group_concat(tags, ' ') || ')' as tags, aliases, refs
FROM
  (SELECT id, file, filetitle, \"level\", todo, pos, priority, scheduled, deadline, title, olp, atime, mtime, tags,
  '(' || group_concat(aliases, ' ') || ')' as aliases, refs
  FROM
    (SELECT nodes.id as id, nodes.file as file, nodes.\"level\" as \"level\", nodes.todo as todo, nodes.pos as pos, nodes.priority as priority, nodes.scheduled as scheduled, nodes.deadline as deadline, nodes.title as title,
    nodes.olp as olp, files.atime as atime, files.mtime as mtime, files.title as filetitle, tags.tag as tags, aliases.alias as aliases, '(' || group_concat(RTRIM(refs.\"type\", '\"') || ':' || LTRIM(refs.ref, '\"'), ' ') || ')' as refs
    FROM nodes
    LEFT JOIN files ON files.file = nodes.file
    LEFT JOIN tags ON tags.node_id = nodes.id
    LEFT JOIN aliases ON aliases.node_id = nodes.id
    LEFT JOIN refs ON refs.node_id = nodes.id
    GROUP BY nodes.id, tags.tag, aliases.alias)
  GROUP BY id, tags)
GROUP BY id;")
#+end_src

I think there are some main takeaways from these improvements:

  1. Garbage collection is mostly due to the creation of ephemeral data structures. The key is to inspect the code, see why it is garbage collecting, and try to avoid the creation of temporary variables (and parameters).

  2. sqlite gets blamed a lot for performance decay, but the reality is that as the DB grows, the number of data structures that need to be created in elisp grows.

  3. Caching can help, but it is intrusive (e.g. the cache must be invalidated when the DB is recreated, and perhaps rebuilt afterwards).

  4. There is a limit on how responsive org-roam-node-find can be made as the DB grows. If the number of nodes is very large, a new function needs to be written that reduces the amount of data retrieved from the DB based on user input (which means waiting until the user provides at least one character).
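A minimal sketch of what that could look like (the function name and query shape are mine; it reuses the positional constructor from earlier in the thread):

#+begin_src elisp
(defun my/org-roam-node-list-matching (input)
  "Return nodes whose title contains INPUT, fetched lazily from the DB."
  (cl-loop for row in (org-roam-db-query
                       [:select [id file title] :from nodes
                        :where (like title $s1)]
                       (concat "%" input "%"))
           collect (apply #'org-roam-node-create+ row)))
#+end_src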

I think we both agree that we have accomplished our personal goals (I am happy with the performance now). We’ll see what others decide.

Is it too much to ask for both of you to share your (current) final versions in a gist or on your preferred code-sharing site?

The discussion above is too involved for an onlooker like me, but I think the problem is common and your respective results are impressive. It is great to know performance can be improved without using a cache.


I apologise; we lack a practical space to share results and discussion in a less structured manner. I had to discover several aspects while going through it, so it was a series of revelations.

The problem in this function is simple: memory pressure. GC causes the computation delay by running interleaved with the computation.

Reducing the complexity of the query, and pre-filtering and sorting in the SQL query rather than using cl-filter or seq-sort, also speeds up the process.

But this has to be tailored per user, by the user, rather than through the mechanism provided by org-roam.

These were my optimisations.

Also I want to add: I created an extra index on the files table for my use case, but I don't know if that's helping.
We could investigate the other case of table joins, but I lack the appropriate data set. My intuition is that trying to optimise the DB through materialised views and so on wouldn't help here; the problem is truly not in sqlite, or in the pure query, to put it that way.

https://hastebin.skyra.pw/ebocoboyoq.el

  • Plus I disabled interleaved GC in Emacs altogether, which I will not recommend to anyone. I have developed a deep distaste for garbage-collected languages in general, but that is a different story.

Thank you.

So your optimization is through three things:

  1. Increased GC threshold
  2. Custom SQL query with less data retrieval (e.g. no aliases) and with filtering and sorting done within the SQL, replacing those in point 3
  3. Removal of filtering with cl-filter and sorting with seq-sort

Correct understanding?

Do you use the custom constructor in the end?


@nobiot That is correct.

I use a custom constructor simply so that I don't have to use pcase-let to destructure the rows and &key arguments to build the node struct.

Since I don't use aliases, it is more straightforward for my case: I can pipe the SQL output directly into the constructor to make an appropriate node struct. It's tailored to my very specific need.
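Roughly, the difference is this (sketched with a trimmed column list and a dummy row):

#+begin_src elisp
(let ((row '("some-id" "notes.org" "Title")))
  ;; Upstream pattern: destructure, then call with keyword arguments.
  (pcase-let ((`(,id ,file ,title) row))
    (org-roam-node-create :id id :file file :title title))
  ;; My pattern: pipe the row straight into the positional constructor.
  (apply #'org-roam-node-create+ row))
#+end_src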


hi nobiot,

I disagree with Akash on how to implement this. We do not need to change the behaviour (i.e. aliases can be preserved).

Here is a link to the code:

  1. We create a second constructor that does not take named parameters. The order of parameters has been optimized to match the results of the query (title and aliases come first; for the others, the order does not matter).

  2. We simplify the processing of the results of the query by using the new constructor.

That is basically it. The code is functionally equivalent, but by using mapcan instead of append, and by avoiding the named parameters, it significantly reduces the processing time and the amount of garbage created.
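To make the mapcan point concrete (a toy stand-in for the per-row expansion):

#+begin_src elisp
(defun row->nodes (row)          ; hypothetical per-row expansion
  (list row row))
(setq rows '((1) (2) (3)))

(apply #'append (mapcar #'row->nodes rows)) ; copies the intermediate sublists
(mapcan #'row->nodes rows)                  ; splices them with nconc instead
#+end_src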

Give it a try.

I do agree that we can add an ORDER BY to match the default ordering done inside org-roam-node-read--completions. That way that sort will run faster.

I don't think it is a good idea to change the gc threshold. GC is not the problem, it is a symptom.

No, the problem of org-roam-node-list is not memory limitation. It is the creation of dozens of ephemeral variables and data structures.

Every time a named-parameter function is called, a new object is created per parameter (I suspect it is a cons pair: name, value). Then the called function has to go through all the parameters to find each one by name, because it does not know their order.

This has several implications for performance:

  1. For every tuple in the DB, there is one node for the title and one node for each alias (this is expected behaviour).
  2. For each node, when calling the constructor, 19 pairs are created (name, value).
  3. Inside the original constructor, org-roam-node-create, the list of parameters has to be scanned 19 times to find each parameter (which might create another 19 variables).
  4. After the function ends, the 19 pairs are garbage.
  5. On top of that, the original for-loop created 19 variables in the pcase-let. After each DB tuple iteration, these 19 variables are garbage.

As you can see, 38 variables are created for each tuple in the DB that end up becoming garbage.

On top of that, creating these variables and processing the named parameters in the constructor takes time.
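A tiny self-contained illustration of the difference (a hypothetical three-slot struct, not org-roam's):

#+begin_src elisp
(cl-defstruct (demo (:constructor demo-create)            ; keyword constructor
                    (:constructor demo-create+ (a b c)))  ; positional (BOA)
  a b c)

;; Keyword call: the arguments travel as the plist (:a 1 :b 2 :c 3),
;; which the constructor scans by name; with 19 slots that becomes
;; dozens of short-lived objects per node.
(demo-create :a 1 :b 2 :c 3)

;; Positional call: each argument binds by position; nothing to scan,
;; nothing extra allocated.
(demo-create+ 1 2 3)
#+end_src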

As I said above, GC is not the problem, it is the symptom.

In my experiments, the new function (without the DB query) runs 10 times faster than the original function (not counting the GC cost).
