Rewriting org-roam-node-list for speed (it is not sqlite)

You speak of the sun - I speak of the bright star in the sky.

Do you know what the memory limit for garbage collection is? 8 MB on 64-bit systems.

You did indeed stop GC from interrupting by not using &key (named parameters, as you call them). But you have only delayed its appearance; it will come back to haunt you soon. You are refusing to see the systemic problem and privileging your own experience as the absolute measure of it.

Best of luck.

I am not sure we are talking about the same thing. I am avoiding creating the garbage in the first place, so there will be less garbage to collect.

Try this test. You will see how much more frequently GC is triggered with the old constructor than with the new one.
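
For context, a test along these lines is enough to show it (a sketch: it assumes the previous implementation has been kept around under the name org-roam-node-list-old, the naming used further down in this thread; each benchmark-run result is a triple of elapsed seconds, GC count, and GC seconds):

#+begin_src emacs-lisp
(require 'cl-lib)

;; Run each version ten times and compare how often GC fires.
(list
 (cl-loop repeat 10 collect (benchmark-run 1 (org-roam-node-list-old)))
 (cl-loop repeat 10 collect (benchmark-run 1 (org-roam-node-list))))
#+end_src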

I am not going to take this personally.

I think the issue is that the goal of this post was to improve org-roam-node-list, not to fix the overall performance problems of org-roam-find-node. The two problems are related, but they are not the same.

I want to thank you, because you showed me the actual danger of not letting GC run. Your code caused my entire 16 GB of RAM to be consumed in a matter of minutes. If I did not have a running memory meter in my status bar, it would have been a real problem. The original function resulted in memory run-off very quickly.

Will give it a try. Thank you for the details. Good to know about named parameters and mapcan vs. append… (to avoid the "ephemeral creation of data objects").
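
For readers following along, here is a minimal, hypothetical sketch of the two ideas (a toy struct, not the actual org-roam patch): a positional constructor avoids the &key parsing and plist consing that the default keyword constructor does on every call, and mapcan avoids the intermediate lists that mapcar plus append builds per row.

#+begin_src emacs-lisp
(require 'cl-lib)

;; Toy struct standing in for `org-roam-node', which has many more slots.
(cl-defstruct (my/node
               ;; Default keyword constructor: parses a &key plist on each call.
               (:constructor my/node-create)
               ;; Positional ("BOA") constructor: slots are filled by argument
               ;; order, with no keyword parsing and no plist consing.
               (:constructor my/node-create-fast (id title level)))
  id title level)

;; Keyword constructor + `append': extra consing per row.
(defun my/nodes-slow (rows)
  (apply #'append
         (mapcar (lambda (row)
                   (list (my/node-create :id (nth 0 row)
                                         :title (nth 1 row)
                                         :level (nth 2 row))))
                 rows)))

;; Positional constructor + `mapcan': the per-row lists are concatenated
;; destructively, so far less garbage is produced.
(defun my/nodes-fast (rows)
  (mapcan (lambda (row)
            (list (my/node-create-fast (nth 0 row) (nth 1 row) (nth 2 row))))
          rows))
#+end_src

The real rows returned by org-roam-db-query carry more columns, but the shape of the change is the same.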

((12.565267540999999 198 10.119147684) 
(12.586926016 198 10.133527428999999)
 (12.679864182 199 10.21216201)
 (12.569218254 198 10.142452860000002)
 (12.646604257 198 10.184059835)
 (12.730608683 199 10.287921325) 
(12.719753434 198 10.265040858000006)
 (12.734317783 198 10.266267747)
 (12.751079298999999 199 10.308935932999987)
 (12.655331395 198 10.225003819000008))
((2.334587381 34 1.7285304440000004)
 (2.3340318019999997 34 1.726118126000003)
 (2.3998588350000003 35 1.786226365999994) 
(2.34627646 34 1.7379468980000041)
 (2.334389745 34 1.729939129999991) 
(2.4012674169999997 35 1.7924815439999975) 
(2.355598956 34 1.7451108780000055)
 (2.3526207020000003 34 1.738609482000001)
 (2.418250885 35 1.7980149689999934)
 (2.369987739 34 1.7544976500000047))

The results really speak for themselves: GC has been reduced to a very minimal level (each triple is elapsed seconds, number of GCs, and GC seconds, as returned by benchmark-run). The pure construction is about 1.5 seconds; the rest is due to GC.

Your function is indeed a very good improvement over the original - @dmg

@dmg But my problem is that this does not translate to the part that actually queries the database. Testing the construction purely on its own and testing the construction inside the query don't match up:

((4.211816963 43 2.667408874000003)
 (3.3887830290000003 43 2.569976830999991)
 (3.404704159 43 2.5966941630000235)
 (3.513540266 36 2.6666194270000005)
 (4.014954374 37 3.1913484229999938)
 (3.995240881 36 3.1696141749999924)
 (4.292487442 37 3.4804995510000083)
 (4.200951773 36 3.3835223929999927)
 (4.09498155 36 3.2752181159999907)
 (4.036290528 36 3.2234094110000058)
 (4.069772767 36 3.254689763000016))
(2.1138285029999997 9 0.9324652439999852)
(3.312031015 9 2.2671419960000208)
((3.8493328069999997 9 2.0812000519999856) 
(4.029758998999999 9 2.272773110000003) 
(4.0702967590000005 9 2.298531553999993) 
(3.834395152 9 2.054774081000005) 
(4.008232194 9 2.2249461819999965) 
(3.9467964810000002 9 2.1564310699999965) 
(3.962750159 9 2.1469183970000074) 
(3.973150672 9 2.114172695999997) 
(4.2814362 9 2.4193735890000028) 
(5.72975772 9 3.8936696530000177) 
(3.959540304 9 2.099407384999978))

Can you tell which one is which? One doesn't use named parameters and one does!

THAT'S THE PROBLEM!

https://hastebin.skyra.pw/zipejaduru.el

I pasted the wrong result the first time; the GC is indeed low, but the change is small in the grand scheme of things. For 20,000 nodes you get roughly less than 1 second less GC, and even that value varies. It is unnoticeable for a large node count.

This might be why. I didn’t even think about what my gc-cons-threshold was.

It is set by prelude to 50 megs; you are running with 8 megs. That would explain why I get far fewer GCs.

./modules/prelude/init.el:(setq gc-cons-threshold 50000000)

But even when I change it to 8 megs (as you do), I get this for the new function:

|            0.046952 | 0 |                 0.0 |
| 0.26367700000000005 | 1 | 0.22832300000001737 |
|             0.03366 | 0 |                 0.0 |
|            0.033556 | 0 |                 0.0 |
|            0.033359 | 0 |                 0.0 |
|            0.260653 | 1 |  0.2270859999999857 |
|            0.033462 | 0 |                 0.0 |
|            0.033393 | 0 |                 0.0 |
|            0.033357 | 0 |                 0.0 |
|            0.258917 | 1 | 0.22527900000000045 |

and this for the old function:

|            0.107185 | 0 |                0.0 |
|            0.090248 | 0 |                0.0 |
|            0.316679 | 1 | 0.2257799999999861 |
|            0.090017 | 0 |                0.0 |
|            0.089758 | 0 |                0.0 |
|            0.319499 | 1 | 0.2283440000000212 |
|            0.090246 | 0 |                0.0 |
|            0.089719 | 0 |                0.0 |
| 0.31684300000000004 | 1 | 0.2266039999999805 |
|             0.09006 | 0 |                0.0 |

Still 3 times faster, but when they both have GC, they run almost at the same speed. My DB has 1000 nodes.

I think what we can both agree on is that for a holistic fix to this issue we should apply a set of fixes in a systematic manner:

  • A custom constructor function tailored to one's needs,

  • an increased GC threshold, depending on node count (with the catch that when GC does run, it runs with more intensity), plus a tailored query,

  • and overall optimisations throughout the read protocol.

We did reach the same conclusion.

Also, we should not be afraid to take a systems approach rather than trying to solve systemic problems in a piecewise fashion.

I suggested not porting something like this upstream, because users will need to write it with their own needs in mind (what information they want to query, and so on). It should be left to the discretion of users; we should only try to elucidate the main principles to be employed in the process.

With this I conclude; the forum is explicitly asking me to shut up now:

"You’ve posted a lot in this topic! Consider giving others an opportunity to reply here and discuss things with each other as well."

I apologise for being frustrated earlier. It was a pleasure working with you, and I got to know many things with your help.

Best.

Just an aside related to GC: this Reddit thread about GC and gcmh may be of interest for more customization…

I came across that earlier. I think there are dangers in setting this variable arbitrarily, even with complex mechanisms like the ones gcmh tries to use, and I don't think we need to take those risks. I will just post one last picture here showing how, with very minimal changes to the GC threshold (not using setq but a simple let), we can get very good results: just increasing it to 16 MB from 8 MB around the calls. I show in the other pane that the node count is indeed 20,000. The node list here not only generates the list but also does complex sorting based on three criteria.

Not counting the minimal GC, we have a result very close to ~0.01 seconds.
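
What I mean is nothing more exotic than a let-binding around the expensive call; here is a sketch (where exactly to put it and the 16 MB figure are choices from my setup, not universal recommendations):

#+begin_src emacs-lisp
;; Raise the GC threshold only while completion candidates are being built;
;; the previous value (and normal GC behaviour) is restored afterwards.
(define-advice org-roam-node-read--completions
    (:around (fn &rest args) my/with-larger-gc-threshold)
  "Let-bind `gc-cons-threshold' to 16 MB around candidate generation."
  (let ((gc-cons-threshold (* 16 1024 1024)))
    (apply fn args)))
#+end_src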

I had the wonderful idea of creating a database of 60k nodes. I took the linux kernel, added a header to each file with a unique ID and title, and started…
that was more than 6 hrs ago!!! At least it seems to be doing something :slight_smile:

though it has not updated the DB file in the last hr or so.

there is lots of room for improvement in org-roam for large knowledge bases.

I think it's not the fault of org-roam per se; Emacs is not designed to handle such cases. For very large knowledge systems we might have to give up a DB system altogether and move to something like denote, which organises notes based on a naming convention.

I ended up creating a DB of only 18k files/nodes (one node per file).

Here is the result of benchmarking org-roam-node-list. It still shows an improvement in the new processing: there is less garbage collection. However, the rest of the processing (without caching) means that it takes around 5 seconds to find a node. I ran with a gc-cons-threshold of 50 MB.

One of the problems with these artificial datasets is that the joins don't do much work, because there are no tags, refs, or aliases.

#+begin_src sqlite :db /tmp/test-roam.db   :exports both
select count(*) from nodes;
#+end_src

#+RESULTS:
#+begin_example
18616
#+end_example


#+begin_src emacs-lisp :exports both
(cl-loop
 for i from 1 to 10
 collect (benchmark-run 1
           (org-roam-node-list)))
#+end_src

#+RESULTS:
|           0.622012 | 0 |                0.0 |
|           2.668191 | 1 | 2.0512409999973897 |
| 0.6107699999999999 | 0 |                0.0 |
|           2.606348 | 1 | 1.9958320000005187 |
|           0.607799 | 0 |                0.0 |
|           2.795983 | 1 |  2.179583000001003 |
|           0.632062 | 0 |                0.0 |
|           2.706139 | 1 | 2.0752820000016072 |
| 0.6100519999999999 | 0 |                0.0 |
| 2.6985919999999997 | 1 | 2.0795169999983045 |

#+begin_src emacs-lisp :exports both
(cl-loop
 for i from 1 to 10
 collect (benchmark-run 1
           (org-roam-node-list-old)))
#+end_src

#+RESULTS:
|           1.689673 | 0 |                0.0 |
|           3.836726 | 1 | 2.1332390000025043 |
|           3.640173 | 1 | 2.0067009999984293 |
|           3.722971 | 1 | 2.0415150000007998 |
|           1.668975 | 0 |                0.0 |
|           3.684444 | 1 |  2.007223000000522 |
| 3.6806189999999996 | 1 | 2.0058239999998477 |
|           3.775295 | 1 | 2.0830819999973755 |
|           1.657521 | 0 |                0.0 |
| 3.6920569999999997 | 1 | 2.0023760000003676 |

Of course your fix works: it reduces GC by not creating temporary variables, there is no doubt about it. But we need to test at an 8 MB GC threshold; that should be our target because that's the default in Emacs, and there the memory limit comes into play first. Your suggestion was crucial in getting the last mile in my setup.

If the GC threshold is low, the garbage created while querying nullifies the effect of using the custom constructor function; that is what I was trying to say.

I also got the display-candidate generation down to under ~0.1 seconds by making the generation much faster. So in total I get about 0.2 to 0.3 seconds for the full round trip to complete for 20k nodes.

And please rethink running GC at 50 MB: in my tests anything beyond 16 MB has diminishing returns, and moreover, when GC does happen with larger thresholds, it takes more time to complete. Try reducing it to 16 MB; I think you would find the same benefits.

Also, I think the joins won't tax the result much: there are indexes on each of the foreign-key columns of the tables, so they shouldn't alter the result by too much.

Oh, this thread is interesting. I haven't read it fully yet, but it seems that you folks found a lot of bottlenecks in the current Lisp implementation of the query/population routine.

One of the other things that I found some time ago is how slow multiple joins become as the number of notes grows. I reported it as #1997 and even provided a link to the solution that I use in vulpea, but I never found enough time and motivation to contribute it to Org Roam :crying_cat_face:

Also, I think the joins won't tax the result much: there are indexes on each of the foreign-key columns of the tables, so they shouldn't alter the result by too much.

So according to my findings, they do tax it. A flat table is much faster, at least according to my benchmarks.

So maybe both approaches combined could give tremendous speed improvements.
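
To make the difference in shape concrete, here is a rough sketch using Emacs 29's built-in SQLite functions; the notes table is hypothetical (in the spirit of vulpea's flat table, not an existing org-roam table), and the column names should be checked against your own org-roam.db:

#+begin_src emacs-lisp
;; Requires Emacs 29+ with built-in SQLite support.
(let ((db (sqlite-open org-roam-db-location)))
  ;; Flatten once: pre-join what the completion UI needs into a single table.
  (sqlite-execute
   db
   "CREATE TABLE IF NOT EXISTS notes AS
      SELECT n.id, n.title, n.level, n.file,
             group_concat(t.tag, ',') AS tags
        FROM nodes n
        LEFT JOIN tags t ON t.node_id = n.id
       GROUP BY n.id")
  ;; Reads are then a single, join-free scan.
  (sqlite-select db "SELECT id, title, tags FROM notes"))
#+end_src

Such a table would of course have to be refreshed, or maintained incrementally the way vulpea does with advice, whenever the underlying tables change.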

Thanks. I wanted to create a custom table and replicate all the data in one place, but since I didn't have an appropriate data set, I assumed the indexes would cover it.

Will look into it, thanks for chiming in. You're the OG optimiser; your agenda mod has made a night-and-day difference for me.

Can you please comment on the optimisations I made? I will post the full modification I made to the read protocol. Performance has really increased.

I'll add your optimisations.

I wrote here about materialized views. When I thought about it then, I looked for a way to offload as much of the processing as possible to SQLite itself; I didn't want to burden the write-to-DB protocol with more entries. But when I researched it more, I realised SQLite doesn't have native support for materialized views, so I gave up on that path. My reasoning was mostly data-related: I had no way to run a controlled experiment and extract data, so it did not feel right to try to solve a problem I couldn't be sure about. I will study it more.

Edit

After reading your vulpea-db.el, I feel you have already made many of these modifications and generalised them very efficiently. We do have to spend more time during delta syncs, but if that is offset by efficiency on the read side, it may be worth it:

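   ;; :after advice: run vulpea's own insert functions right after each of
   ;; org-roam's corresponding DB insertion/mapping steps.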
   (advice-add 'org-roam-db-insert-file-node :after #'vulpea-db-insert-file-note)
   (advice-add 'org-roam-db-insert-node-data :after #'vulpea-db-insert-outline-note)
   (advice-add 'org-roam-db-map-links :after #'vulpea-db-insert-links)

I think your package solves many of the efficiency problems, and you identified the bottlenecks well before us. In effect, the bottlenecks in org-roam-node-list and org-roam-node-read--completions are solved more comprehensively by you.

Maybe @dmg's GC-reduction technique would be very useful here; that is about the only thing left to import into your approach.

From my first reading, you have already taken care of the query/filtering/sorting bottlenecks.

@d12frosted

I have posted my full modification here.

Also, if we flatten the structure, we need not be so restrictive about what we take; I have tuned this to my exact needs.

To be very honest, I eliminated all table joins except files, because I could never be sure whether the joins do tax performance. I told myself the indexes would probably be enough, but it was always bugging me, so thanks for the confirmation.

  • This can be generalised without losing performance. I think org-roam tries to be too general: in having to handle every alternative choice a user can make, it has to take more indirect routes, and this overhead runs up against the limits of Elisp itself once the node count exceeds a certain threshold.

It would be excellent to work with you and maybe improve on my approach more.

Also, I think the joins won't tax the result much: there are indexes on each of the foreign-key columns of the tables, so they shouldn't alter the result by too much.

I think the query evaluation plan will depend a lot on a user's actual data. The work done by the joins depends on the number of tags, refs, and aliases used.

In fact, I have been thinking that there is no reason why such a flat table could not be kept up to date as a user modifies files.

By the way, you can use EXPLAIN in SQLite. It is not as readable as in other DBMSs, but it gives an idea of what SQLite does when answering a query.

Just type explain (or explain query plan) in front of the query in the sqlite3 shell.
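
If you prefer to stay inside Emacs (29+, with built-in SQLite support) rather than the sqlite3 shell, the same idea looks roughly like this; the query is a simplified stand-in, not the one org-roam actually issues:

#+begin_src emacs-lisp
;; Ask SQLite how it plans to execute a (simplified) join against the
;; org-roam database; look for "USING INDEX" vs "SCAN" in the output.
(let ((db (sqlite-open org-roam-db-location)))
  (sqlite-select
   db
   "EXPLAIN QUERY PLAN
    SELECT nodes.id, nodes.title, tags.tag
      FROM nodes
      LEFT JOIN tags ON tags.node_id = nodes.id"))
#+end_src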
