What does it feel like to work with 10,000 notes in Org-roam: Benchmarking Org-roam's search methods

nobiot · May 31, 2020, 3:15pm

Intro

I got inspiration from this exchange in Org-roam Discourse on “Performance Testing”.

I went ahead, downloaded the 10,000 markdown files, and took Org-roam (with Md-roam as a companion) for a quick spin.

Here is my quick write-up, along with a short video demo, to share my impressions with the community here. The write-up reads more like lab notes than a finished paper (I have never been a scientist, so that is just my imagination).

Thank you, @cobblepot, for the link to the 10,000 markdown files.

Summary

I find it good and workable with 10,000 notes.
For org-roam--list-files-xx functions, I observe a significant performance difference between elisp and rg (the two search methods that work on Windows). I suspect it would affect the performance of org-roam-insert and org-roam-find-file, but I did not confirm this myself.
elisp starts out faster, but its elapsed time grows roughly linearly with the number of files. rg stays nearly flat at about 5 seconds per 10 calls for 100, 1,000, and 2,000 files, and rises only to about 6 seconds at 10,000. rg becomes the more attractive option somewhere between 2,000 and 3,000 files. elisp does not seem feasible at the 10,000 mark.
The first DB build for 10,000 files takes 6–7 minutes on my machine. You would normally build your DB up over time and only occasionally need to rebuild it from scratch, so this should not be a problem.
After the initial build, launching Org-roam takes a little under a minute (org-roam-db-build-cache needs this time; 47 seconds in my test). That might happen once a day when you start your PC, or once every few weeks if you never turn it off. Not bad for a knowledge base of 10,000 notes that helps you produce more and better knowledge work.
Inserting and searching for an existing note is OK. See my 2-minute YouTube video to get an impression yourself.

There are obvious limitations to my tests.

I only looked at the number of files; I did not create a large number of backlinks among them. To prepare such data, I would need some programming support to add links to the 10,000 files. I do not know the SQL and DB architecture well enough to judge whether I should expect a big difference in performance with many more backlinks.
In addition, I did not test the graph capabilities, and I had no notes in subdirectories of org-roam-directory. I do not expect subdirectories to change performance much, but I could be wrong.

Environment

Windows 10 (64-bit)
Emacs 26.3
sqlite3 3.32.0
Org-roam commit b2594b8
Md-roam commit 12dff25

I do not know at what point machine power starts to play a significant role, but here are the basic specifications of mine:

Processor: Intel Core i7-8650U 1.90 GHz
RAM: 16.0 GB

Method

It’s a rather rudimentary set-up with simple manual execution.

Download the 10,000 markdown files from here
Unzip, then copy 100, 1,000, and 2,000 of the files into separate folders (the 10,000 case has a folder of its own, too)
Change the configuration of org-roam-directory to point to the respective folder for each case
Run the test with the interactive function benchmark, as follows:

(benchmark 10 '(org-roam--list-files-xx "full/path/to/org-roam-directory"))

(I learned about benchmark from GitHub user siawyoung in this exchange on an Org-roam PR. Thanks!)

The full path seems to be required on Windows, as ~/ does not seem to expand in this form. It also seems that on Windows I need to use \\ as the path separator.

Run the benchmark a second time and record the elapsed time of both runs.
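The manual steps above could also be scripted. Here is a minimal sketch, not what I actually ran: `my/bench-list-files` is a hypothetical helper name, and it uses `benchmark-elapse` from Emacs's benchmark library, which reports elapsed seconds only (no GC counts, unlike the interactive benchmark command).

```elisp
;; Sketch only: `my/bench-list-files' is a hypothetical helper,
;; not part of Org-roam.
(require 'benchmark)

(defun my/bench-list-files (fn dirs &optional repetitions)
  "Time REPETITIONS calls (default 10) of FN on each directory in DIRS.
FN takes one argument, the directory.  Return an alist of
\(DIRECTORY . ELAPSED-SECONDS)."
  (let ((reps (or repetitions 10)))
    (mapcar (lambda (dir)
              (cons dir
                    ;; `benchmark-elapse' returns elapsed seconds as a float.
                    (benchmark-elapse
                      (dotimes (_ reps)
                        (funcall fn dir)))))
            dirs)))

;; Example call (assumes Org-roam is loaded and these folders exist):
;; (my/bench-list-files #'org-roam--list-files-elisp
;;                      '("C:\\Users\\nobiot\\100-markdown"
;;                        "C:\\Users\\nobiot\\1000-markdown"))
```

For the rg variant, which takes the executable as its first argument, FN could be a small lambda that passes the rg path along with the directory.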

Results

100 notes

(benchmark 10 '(org-roam--list-files-elisp "C:\\Users\\nobiot\\100-markdown"))
Elapsed time: 0.400056s
Elapsed time: 0.264749s

(benchmark 10 '(org-roam--list-files-rg "C:\\Users\\nobiot\\scoop\\shims\\rg.exe" "C:\\Users\\nobiot\\100-markdown"))
Elapsed time: 5.150758s
Elapsed time: 5.137724s

1,000 notes

(benchmark 10 '(org-roam--list-files-elisp "C:\\Users\\nobiot\\1000-markdown"))
Elapsed time: 2.274566s (0.079555s in 1 GCs)
Elapsed time: 2.486280s

(benchmark 10 '(org-roam--list-files-rg "C:\\Users\\nobiot\\scoop\\shims\\rg.exe" "C:\\Users\\nobiot\\1000-markdown"))
Elapsed time: 5.174479s
Elapsed time: 5.167108s

2,000 notes

(benchmark 10 '(org-roam--list-files-elisp "C:\\Users\\nobiot\\2000-markdown"))
Elapsed time: 4.788072s (0.069892s in 1 GCs)
Elapsed time: 4.725148s (0.075500s in 1 GCs)

(benchmark 10 '(org-roam--list-files-rg "C:\\Users\\nobiot\\scoop\\shims\\rg.exe" "C:\\Users\\nobiot\\2000-markdown"))
Elapsed time: 5.151170s
Elapsed time: 5.095258s

10,000 notes

(benchmark 10 '(org-roam--list-files-elisp "C:\\Users\\nobiot\\10000-markdown"))
Elapsed time: 22.899702s (0.419321s in 6 GCs)
Elapsed time: 22.441639s (0.408953s in 6 GCs)

(benchmark 10 '(org-roam--list-files-rg "C:\\Users\\nobiot\\scoop\\shims\\rg.exe" "C:\\Users\\nobiot\\10000-markdown"))
Elapsed time: 6.148009s (0.067745s in 1 GCs)
Elapsed time: 6.456070s (0.162346s in 2 GCs)
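As a rough cross-check of where rg overtakes elisp, the numbers above can be put through a simple proportional model. This is a sketch under two assumptions that the data only approximately support: elisp time is proportional to the number of files, and rg time is constant around the 2,000-file mark. `my/break-even-files` is a hypothetical helper name.

```elisp
(defun my/break-even-files (elisp-seconds elisp-files rg-seconds)
  "Estimate the file count at which elisp listing takes RG-SECONDS.
Assumes elisp time is proportional to the file count (ELISP-SECONDS
was measured for ELISP-FILES files) and that rg time is constant."
  (let ((seconds-per-file (/ (float elisp-seconds) elisp-files)))
    (round (/ rg-seconds seconds-per-file))))

;; Second-run timings from the results above: elisp at 10,000 files,
;; rg at 2,000 files (both for 10 calls).
(my/break-even-files 22.441639 10000 5.095258) ; => 2270
```

The estimate of roughly 2,270 files falls inside the 2,000 to 3,000 range. The same model predicts about 4.5 seconds for elisp at 2,000 files, against the 4.7 seconds measured, so the proportional assumption is only approximate.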

org-roam-db-build-cache

Build the db file from scratch (no db file exists).

I run this only once; I am not going to wait that long for org-roam-db-build-cache several times over.

(benchmark 1 '(org-roam-db-build-cache))

(org-roam) files: 10000, links: 0, tags: 0, titles: 10000, refs: 0, deleted: 0
Elapsed time: 376.709222s (28.662395s in 521 GCs)

Once the db is built, run org-roam-db-build-cache again. This happens every time you re-launch Emacs and Org-roam, even if no notes have changed between shutting Emacs down and restarting it.

(benchmark 1 '(org-roam-db-build-cache))

(org-roam) files: 0, links: 0, tags: 0, titles: 0, refs: 0, deleted: 0
Elapsed time: 47.147790s (0.913966s in 18 GCs)

Video demo to share impression

My video demo uses org-roam--list-files-rg, instead of the default org-roam--list-files-elisp.

Limitation of this performance testing

Few backlinks between notes
If you have multiple backlinks between notes, you might potentially have an exponentially higher volume of data. I am not sure how this translates to our perceived performance of the system.
Related to the backlinks, I didn’t test graph capabilities.
I had only a flat directory. I don’t necessarily expect much difference, but the use of subdirectories might influence performance.

Coda

It was fun. It also feels to me like a useful exercise. Hope you can take away something useful from my report and video, too.

I don’t know how I can add backlinks easily. Perhaps someone in the community can go beyond my tests here, and see where that takes us.
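One possible starting point for adding links is a small helper that appends random links to every note. This is an untested-at-scale sketch, not something used in the benchmarks above: `my/append-random-links` is a hypothetical name, and the `[[file-base-name]]` link syntax is an assumption that may need adjusting to whatever your Org-roam and Md-roam set-up recognizes. It modifies files in place, so it should only be run on a copy of the test data.

```elisp
;; Sketch only: `my/append-random-links' is hypothetical, and the
;; [[file-base-name]] link syntax is an assumption.
(defun my/append-random-links (dir n)
  "Append N random wiki-style links to every .md file in DIR.
Run this on a copy of the test data: it modifies the files in place."
  (let* ((files (vconcat (directory-files dir t "\\.md\\'")))
         (count (length files)))
    (dotimes (i count)
      (with-temp-buffer
        (dotimes (_ n)
          ;; Pick a random target; a note may occasionally link to itself.
          (insert (format "\n[[%s]]"
                          (file-name-base (aref files (random count))))))
        (insert "\n")
        ;; A non-nil APPEND argument adds the text to the end of the file.
        (write-region (point-min) (point-max) (aref files i) t 'silent)))))
```

After running it with, say, 5 links per note, rebuilding the db from scratch would show whether the link count changes the org-roam-db-build-cache timings.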

Luhmann is said to have had 90,000 slip-notes (source, and referenced here). I don’t think I’ll ever reach the 10,000 mark, but it’s good to imagine what it could be like. It seems Org-roam can continue to be your good companion there.

Happy note taking.

scotto · May 31, 2020, 7:49pm

Nice to see! I had a niggling doubt about how org-roam would scale well after a year’s worth of notes. Looks like I don’t have to worry.

nobiot · May 31, 2020, 8:24pm

Yes, that’s my conclusion, too, so far. There are limitations to my quick investigation, though. As I noted in the body, I can’t see the influence of links, which are the heart of Org-roam. I would really love to see what I have done extended to investigate whether the number of links among the notes significantly affects performance…

jethro · June 1, 2020, 5:39am

Nice benchmarks! I’d like to have these sort of benchmark tests run in the CI too, to make sure they aren’t too slow.

Someone pointed out that org-roam was performing multiple parses of each file to extract things. The numbers you see for org-roam-db-build-cache can probably be significantly improved.

siawyoung · June 6, 2020, 5:26pm

Hey! I was the original author of the PR that introduced find and ripgrep into org-roam--list-files. Glad to see someone taking more serious benchmarks. I wrote a short commentary about the implementation here: https://siawyoung.com/org-roam-rg-find based on the original, buggy PR that I submitted.

nobiot · June 6, 2020, 8:35pm

@siawyoung, yeah, cool implementation! With additional commits, now we can use rg on Windows, too (the only thing is it takes a bit to install it). I’m working to extend the benchmark I did; hopefully I can do a write-up about it, too. Thank you for your initiative to implement the feature in the first place.
