Merging Org-roam and Org

jethro · January 16, 2022, 8:40pm

It’s already pretty much separate: org-roam’s functions are just a read-only interface on top of the database layer. I also don’t see how users can specify their own schema, when the schema and the way the database is populated are very much tied to what’s being pulled out of each Org file.

midas · January 17, 2022, 12:17pm

Out of curiosity, what is gained with an org-db? Org already has many ways to search and find headers. One can already find any header by its name, refile a header into any other header, find headers by the value of a property, search for relevant headers by regexp (org-search-view), crosslink headers, look for backlinks to a header (org-sidebar-backlinks), etc.

Many of the improvements I read above deal with making things run more efficiently/quicker/etc. However, perhaps because my org files are not large enough yet, I don’t run into problems with performance lags.

jethro · January 20, 2022, 8:41pm

For one, I think applications that require parsing everything in every Org file to produce a view (e.g. org-agenda) would receive an incredible speedup. Org-agenda, while very useful, fares very poorly when there are many large files.

midas · January 23, 2022, 10:17pm

I haven’t hit the point where classical org-agenda has become inefficient or shows lag. But I can appreciate that a db could be an efficiency boost.

Another upside is the ability to write and embed complex queries. Complex in the sense that queries allow you to:

combine all sorts of logical operators and filters,
sort results in flexible ways,
search against a wide range of header properties,
display results in various formats, e.g., tables, lists (hierarchical?), (other?).

While queries can do complex things, they ideally would be simple to write.
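To make that concrete, here is a minimal sketch of the kind of query a headline database could support. It uses Python's built-in sqlite3 module, and the `headlines` table, its columns, the sample rows, and the `open_tasks` helper are all made up for illustration; this is not org-roam's or org-db's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per Org headline. Not a real org-roam/org-db table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE headlines "
    "(file TEXT, title TEXT, todo TEXT, priority TEXT, scheduled TEXT)"
)
conn.executemany(
    "INSERT INTO headlines VALUES (?, ?, ?, ?, ?)",
    [
        ("work.org", "Write report", "TODO", "A", "2022-01-25"),
        ("work.org", "File expenses", "DONE", "B", "2022-01-20"),
        ("home.org", "Fix bike", "TODO", "C", "2022-01-22"),
        ("home.org", "Call plumber", "TODO", "A", "2022-01-24"),
    ],
)


def open_tasks(conn, before):
    """Titles of TODO headlines that are priority A or scheduled before a date.

    Combines a logical filter (AND/OR), a property comparison, and a sort,
    which are the three query features listed above.
    """
    rows = conn.execute(
        "SELECT title FROM headlines "
        "WHERE todo = 'TODO' AND (priority = 'A' OR scheduled < ?) "
        "ORDER BY scheduled",
        (before,),
    )
    return [title for (title,) in rows]


print(open_tasks(conn, "2022-01-23"))
```

Dates are stored as ISO strings here so that plain string comparison sorts them chronologically. A wrapper that hides the SQL behind something simpler to write would sit on top of a function like this.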

midas · January 23, 2022, 10:17pm

Is the following relevant?

mricke · January 25, 2022, 6:55pm

My reasoning is what Jethro mentioned. Adding on to that: if you wanted a fully scalable solution for full-text search, like you asked about in one of your questions, you would basically cache each entire Org document and create a virtual table using SQLite’s FTS extension.
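A rough sketch of that idea, using Python's built-in sqlite3 module rather than emacsql. It assumes the linked SQLite library was compiled with FTS5 (common, but not guaranteed), and the `docs` table, the sample notes, and the `search` helper are hypothetical names for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Virtual table holding the full text of each file. `path` is stored but
# excluded from the index (UNINDEXED), so only the body is searched.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("notes/emacs.org", "* Emacs\nOrg-roam keeps a cache in a SQLite database."),
        ("notes/cooking.org", "* Recipes\nSlow-cooked beans with garlic."),
        ("notes/search.org", "* Search\nFull-text search needs an index, not a recursive grep."),
    ],
)


def search(conn, query):
    """Return paths of files whose body matches an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank",
        (query,),
    )
    return [path for (path,) in rows]


print(search(conn, "database"))
```

The lookup goes through the full-text index instead of scanning every file, which is the scaling difference compared with a recursive grep. The cost is the duplicated text in the database.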

Yesterday I generated 10,000 mock Org files, which took up ~85MB of space; the database was ~140MB, and the database with fts5 enabled was ~240MB.

At that point, it might cause some people to question why files are even used to begin with, except for the convenience of using traditional file based tools.

Edit: You could enable full-text search on org-roam’s database and make other Org extensions work with it, but that means having to install org-roam just to use its database. That is not as modular as having other extensions also depend on org-db because, let’s face it, the database is what enables integration between org-roam and everything else.

And frankly, I’m not entirely convinced that org-roam’s caching model will work at larger scales. rgrep isn’t magic; it’s literally just grep -r, or recursive grep, something that all *NIX users are familiar with, and it still experiences the same slowdown when searching through tonnes of files.

mricke · January 26, 2022, 6:02pm

Is the following relevant?

Yeah, very relevant. Thank you for finding that, it’s very interesting. It would be interesting to hear his thoughts on whether or not he’s overcome the scaling problem.

What’s also interesting is that he shows how to directly query the database through emacsql and that it could be possible to create a wrapper to simplify the querying process. That would be a really cool feature.

Edit: org-db is included as part of scimax: org-db.el, so any proposal to have org-db as a separate package would involve consulting jkitchin.

As for scaling, scimax still has the same issue of indexing a large number of files. That’s to be expected, and that’s fine. The issue comes with tracking changes to files in a directory:

;; org-db balances performance and accuracy in a way that works “well enough”
;; for me. There are a number of ways it can be out of sync and inaccurate
;; though. The main way is if files get changed outside of emacs, e.g. by git

Which I think is funny, since git would be perfect for tracking changes to files outside of emacs.
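A minimal sketch of one way an indexer could notice edits made outside of Emacs. It compares content hashes between two scans; this is a stand-in technique, not git itself and not what org-db actually does, and `snapshot` and `changed_files` are hypothetical names.

```python
import hashlib
import os
import tempfile


def snapshot(directory):
    """Map each .org file under directory to the SHA-256 of its contents."""
    hashes = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".org"):
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    hashes[path] = hashlib.sha256(f.read()).hexdigest()
    return hashes


def changed_files(old, new):
    """Return (new-or-modified paths, deleted paths) between two snapshots."""
    stale = sorted(p for p, digest in new.items() if old.get(p) != digest)
    deleted = sorted(p for p in old if p not in new)
    return stale, deleted


# Small demo: edit a file between two scans, as an external tool might.
with tempfile.TemporaryDirectory() as notes:
    path = os.path.join(notes, "inbox.org")
    with open(path, "w") as f:
        f.write("* TODO first draft\n")
    before = snapshot(notes)
    with open(path, "w") as f:
        f.write("* DONE first draft\n")
    after = snapshot(notes)
    stale, deleted = changed_files(before, after)
    print(stale, deleted)
```

Only the files reported as stale would need re-indexing. Hashing every file on each scan is still linear in the size of the directory, so a real indexer would likely check modification times first and hash only the candidates.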

mricke · January 26, 2022, 11:49pm

I also don’t see how users can specify their own schema, when the schema and the way the database is populated are very much tied to what’s being pulled out of each Org file.

That is a schema, in a sense. Choosing to add headline content on top of everything else that is pulled from an Org file would be part of a separate schema. Realistically, most users would use the former, or a combination of those two schemas, for the sake of working with org-roam. But there may be some crazy fringe third, custom schema, defined in or loaded from init, for someone not necessarily interested in working with org-roam.

zot · February 9, 2022, 6:22am

Actually, I saw Kitchin’s project a while back and decided to make a faster and more scalable indexer myself: https://github.com/zot/microfts
