Cache and big roam files

Hi all, I’m thinking of using org-roam on two big files of mine. They are some tens of thousands of lines long now, but I would like them to scale up to some hundreds of thousands. Say 50 lines per heading and up to 5000 headings; probably fewer, but that’s my stress test. I’ve worked with files that large in Org in the past and it’s manageable, and it makes life easier for some things: the tooling around Org tends to work better with a few files, even if they are somewhat big. I don’t care about automatic cache updates; I would be happy to update the cache manually every few hours or once a day. Do you think it would fare well? Thank you in advance.

This issue might be relevant for you.

My hunch is that how “it fares” would depend more on the number of links (including http hyperlinks, file links, and ID links) in your large files than on their size. Org-roam scans the entire file for links to cache as far as I remember; this operation might start to struggle if the file size is large.

I guess … the only way to find out is to try.

By the way, the cache does not store the content of the file, so data retrieval should be fine, I guess.
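For example, assuming org-roam v2 and its `org-roam-db-query` helper (the column names here follow the v2 schema), you can peek at the cache and see that it holds only metadata and links, not the body text:

```emacs-lisp
;; A few rows of the nodes table: only the ID, the file the node lives in,
;; its title and its buffer position are cached, not the heading's content.
(org-roam-db-query [:select [id file title pos] :from nodes :limit 5])

;; Links are cached in their own table, keyed by source node and destination.
(org-roam-db-query [:select [source dest type] :from links :limit 5])
```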


Thanks for the link, it’s useful information.

Org-roam scans the entire file for links to cache as far as I remember; this operation might start to struggle if the file size is large.

Yes, this is my main concern; I assume it’s not an incremental operation. It’s fine if it takes 15 seconds, not so much if it takes 2 hours. I’m going to do some stress testing as soon as I find the time.
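For reference, the stress test I have in mind would look roughly like the sketch below, assuming org-roam v2 (`org-roam-db-sync`) and org-id (`org-id-new`); the file name and the heading/line counts are just placeholders:

```emacs-lisp
(require 'org-id)
(require 'org-roam)

;; Generate a synthetic roam file: N headings, each with an ID property,
;; some filler body lines, and an ID link back to the previous heading.
(defun my/make-stress-file (path n-headings lines-per-heading)
  "Write an Org file at PATH with N-HEADINGS headings for benchmarking."
  (let ((prev-id nil))
    (with-temp-file path
      (dotimes (i n-headings)
        (let ((id (org-id-new)))
          (insert (format "* Heading %d\n:PROPERTIES:\n:ID: %s\n:END:\n" i id))
          (when prev-id
            (insert (format "See [[id:%s][heading %d]].\n" prev-id (1- i))))
          (dotimes (_ lines-per-heading)
            (insert "Filler text for the body of this heading.\n"))
          (setq prev-id id))))))

;; Build the file inside `org-roam-directory' and time a forced full rebuild.
(my/make-stress-file (expand-file-name "stress.org" org-roam-directory) 5000 50)
(benchmark-run 1 (org-roam-db-sync 'force))
```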

I was wondering if one could write a command to update just a modified node, but I’ve taken a look at the db code and it’s not straightforward. The code is file-oriented: an update starts by clearing the entire file from the database and then re-adds all of its nodes; there is no primitive to delete and re-insert a single node. But I will give it a shot when I have more time.
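In the meantime, the file-level path the db code exposes is `org-roam-db-update-file` (in v2), which clears and re-parses a single file. Assuming that API, a minimal sketch of a command that re-syncs only the current file rather than the whole database:

```emacs-lisp
(require 'org-roam)

(defun my/org-roam-db-update-this-file ()
  "Update the org-roam cache for the file visited by the current buffer.
Still file-granular: the whole file is cleared and re-parsed, but the
rest of `org-roam-directory' is left alone."
  (interactive)
  (when buffer-file-name
    (org-roam-db-update-file buffer-file-name)))
```

Note that this still re-parses the entire file, so with a single very large file it only avoids re-scanning the rest of the directory; a true node-level delete-and-reinsert would still need the new primitive described above.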