I ran the profiler and here is where 70% of the time is spent:
org-roam-format-template
But most of the time is spent in the lambda passed to it.
I reckon that building the list of candidates can be done in at least a quarter of the current time by processing all the nodes at once rather than one at a time. The template is invariant, so there is no point in parsing it with a regexp N times (once per node) for each template variable (in my case 4: “${title} ${tags} ${file} ${todo}”).
And once an attribute is identified (e.g. title), all the nodes get the same processing.
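Rough sketch of the idea (everything prefixed my/ is hypothetical, not part of org-roam): parse the template into its field names once, then map that precompiled list over all the nodes.

;; Sketch only: precompile the template once, reuse it for every node.
(defun my/template-fields (template)
  "Return the ${...} field names in TEMPLATE, in order."
  (let ((pos 0) fields)
    (while (string-match "\\${\\([^}]+\\)}" template pos)
      (push (match-string 1 template) fields)
      (setq pos (match-end 0)))
    (nreverse fields)))

(defun my/format-all-nodes (template nodes)
  "Format every node in NODES against TEMPLATE, parsing it only once."
  (let ((fields (my/template-fields template)))
    (mapcar
     (lambda (node)
       (mapconcat
        (lambda (field)
          ;; "title" -> #'org-roam-node-title, "file" -> #'org-roam-node-file, ...
          (format "%s" (or (funcall (intern (concat "org-roam-node-" field))
                                    node)
                           "")))
        fields " "))
     nodes)))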
#'org-roam-node-list
  > passes off semi-formatted data from the db
#'org-roam-node-read--completions
  > after applying filter-fn, iteratively calls:
    #'org-roam-node-read--to-candidate
      #'org-roam-node-format-entry
        #'org-roam-format-template
So indeed, using a filter-fn and not querying the full breadth of nodes will help somewhat. #'org-roam-node-list also takes some time due to its formatting, but that’s nothing compared to iteratively running the final formatting over the nodes that pass the filter-fn in question.
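For reference, the filter-fn is just a predicate over nodes handed to org-roam-node-read; something like the following (the tag is only an example) keeps most nodes away from the expensive formatting step:

;; Example: complete only over nodes tagged "project".
(org-roam-node-read
 nil                                    ; no initial input
 (lambda (node)
   (member "project" (org-roam-node-tags node))))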
Keep me updated with what tests you do and their results.
I think I got it: memoize org-roam-node-read--to-candidate. Give it a try.
In theory it should be possible to memoize it without redefining the function. No more lag for my 1000 nodes, but it still has to do the query. By the way, the function that does the query spends 4-5 times more time formatting the result than sqlite takes to actually run it.
Just keep in mind that it uses memory, though it should never get out of control, since the nodes don’t change that much. Perhaps we should have a function to reset the cache once in a while, maybe at a fixed interval.
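A minimal sketch of how that could look without redefining the function, using :around advice and a hash table (the my/ names are made up for illustration):

;; Memoize org-roam-node-read--to-candidate, keyed on the node id.
;; Assumes the display template does not change within a session.
(defvar my/candidate-cache (make-hash-table :test #'equal))

(defun my/memoized-to-candidate (orig-fn node &rest args)
  (let ((key (org-roam-node-id node)))
    (or (gethash key my/candidate-cache)
        (puthash key (apply orig-fn node args) my/candidate-cache))))

(advice-add 'org-roam-node-read--to-candidate
            :around #'my/memoized-to-candidate)

(defun my/reset-candidate-cache ()
  "Drop all memoized candidates, e.g. after the db changes."
  (interactive)
  (clrhash my/candidate-cache))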
I spent this morning making the existing cache more compact, and I also converted your function to an advice function. Your cache will kick in and make the first update after db changes faster, and the whole process smoother.
No need to embed garbage collection inside the function; users can simply run it with an idle timer.
I like using the advice. I was planning to write a decorator for the memoization (forgetting that the advice already does exactly that).
Thanks, I’ll use your version in my init. For my database (around 1000 nodes) org-roam-node-read--to-candidate seems good enough, and it is the least intrusive of all the caches. I’ll add a timer that invalidates the cache once a day.
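Something like this, reusing the hypothetical my/reset-candidate-cache from the sketch above:

;; Clear the memoized candidates now, and then once a day (86400 s).
(run-at-time nil 86400 #'my/reset-candidate-cache)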
The data is replicated twice for each case: each benchmark runs three times, and two benchmark runs were taken per case. This duplication lets us extract a confidence interval over the data. If the range between benchmark runs were wide for the same case, the benchmark would not be useful for a controlled study.
The control case shows the benchmark with every cache turned off, at a node count of 20,000 (twenty thousand).
The last case of the benchmark gives the wrong inference: to benchmark #'org-roam-node-list properly, the cache over #'org-roam-node-read--candidates should be turned off. Apologies for the oversight.
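For reference, each measurement boils down to something like this (sketch only; the exact harness may differ), evaluated twice per case to check the run-to-run spread:

;; Three repetitions per benchmark run.
(benchmark-run 3
  (org-roam-node-read--completions))
;; => (total-seconds gc-count gc-seconds)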
Thank you. Now I understand how it works and was able to use it.
On my computer the original org-roam-node-read--completions takes between 1.5 and 2 seconds. With the cache and the improvements to the node processing alone, it goes down to 0.15 seconds. One order of magnitude! And no need to worry about caching db data.
Seems you guys found a method to speed up org-roam around the same time as I gave up and wrote a replacement: GitHub - meedstrom/org-node: A notetaking system like Roam using Emacs Org-mode! A lot of the same ideas, including caching completions, and it has highlighted for me that there are a bunch of things that could be improved in org-roam.
For starters, there’s no good reason that building completion candidates should take so long that they need to be memoized. Org-node can build the candidates nearly instantly, and caches them anyway only because I want it to feel instant even on super-low-powered devices like a Kindle.
Aside from the matter of completions, you guys might have use for org-node-fakeroam-db-feed-mode if saving large files is slow, and org-node-fakeroam-db-rebuild if you frequently rebuild the DB.
It optimizes the creation of the nodes (the original constructor is very slow).
It replaces the code that formats the node given the template, and it gives the option of replacing the template processing (which is currently very expensive) with a call to a function (see the sketch below).
The overall result is that with a typical database (say 1-2k nodes) it feels instantaneous.
The best part is that there is no caching to worry about.
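To illustrate the replacement hook, a formatter can be as plain as direct accessor calls; my/format-node is only a sketch, not the actual implementation:

;; Sketch: render a node with plain accessor calls instead of
;; re-parsing "${...}" templates for each node.
(defun my/format-node (node)
  (concat (org-roam-node-title node)
          " "
          (mapconcat #'identity (org-roam-node-tags node) ":")))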
I plan to extend this minor mode with my improved template processing for org-roam (more in line with org templates).
I have been using it for more than a month and it seems stable. Since it does not modify the database, there is no risk to those using it (simply disable the minor mode if you don’t like it).