Merging Org-roam and Org

It’s already pretty much separate: org-roam’s functions are just a read-only interface on top of the database layer. I also don’t see how users can specify their own schema, when the schema and how the database is populated is very much tied to what’s being pulled out of each Org file.

Out of curiosity, what is gained with an org-db? Org already has many ways to search and find headers. One can already find any header by its name, refile a header into any other header, find headers by the value of a property, search for relevant headers by regexp (org-search-view), crosslink headers, look for backlinks to a header (org-sidebar-backlinks), etc.

Many of the improvements I read above deal with making things run more efficiently/quicker/etc. However, perhaps because my org files are not large enough yet, I don’t run into problems with performance lags.

For one, I think applications that require parsing everything in every Org file to produce a view (e.g. org-agenda) would receive an incredible speedup. Org-agenda, while very useful, fares very poorly when there are many large files.

I haven’t hit the point where classical org-agenda has become inefficient or shows lag. But I can appreciate that a db could be an efficiency boost.

Another upside is the ability to write and embed complex queries. Complex in the sense that queries allow:

  • combining all sorts of logical operators and filters,
  • sorting results in flexible ways,
  • searching against a wide range of header properties,
  • displaying results in various formats, e.g., tables, lists (hierarchical?), (other?)

While queries can do complex things, they ideally would be simple to write.
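To make the list above a bit more concrete, here is a minimal sketch of what such a query could look like once headlines live in a database. Everything in it is an assumption for illustration: the `headlines` and `properties` tables, their columns, and the `open_items_with_effort` helper are hypothetical and are not the schema of org-roam or org-db.

```python
import sqlite3

# Hypothetical schema: one row per headline, one row per (headline, property) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE headlines (
    id INTEGER PRIMARY KEY, file TEXT, title TEXT, todo TEXT, priority TEXT);
CREATE TABLE properties (
    headline_id INTEGER REFERENCES headlines(id), key TEXT, value TEXT);
""")
conn.executemany("INSERT INTO headlines VALUES (?, ?, ?, ?, ?)", [
    (1, "work.org", "Write report", "TODO", "A"),
    (2, "work.org", "File expenses", "DONE", "B"),
    (3, "home.org", "Fix bike", "TODO", "C"),
    (4, "home.org", "Book dentist", "NEXT", "A"),
])
conn.executemany("INSERT INTO properties VALUES (?, ?, ?)", [
    (1, "EFFORT", "2:00"),
    (3, "EFFORT", "0:30"),
    (4, "CATEGORY", "health"),
])


def open_items_with_effort(conn):
    """Open headlines that have an EFFORT property or priority A,
    sorted by priority and then by title."""
    rows = conn.execute("""
        SELECT h.title
        FROM headlines h
        LEFT JOIN properties p
               ON p.headline_id = h.id AND p.key = 'EFFORT'
        WHERE h.todo IN ('TODO', 'NEXT')                 -- filter on keyword
          AND (p.value IS NOT NULL OR h.priority = 'A')  -- logical OR
        ORDER BY h.priority, h.title                     -- flexible sorting
    """).fetchall()
    return [row[0] for row in rows]


print(open_items_with_effort(conn))
```

A friendlier query language on top of this would mostly be a matter of translating a short expression into the WHERE and ORDER BY clauses.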

Is the following relevant?

My reasoning is what Jethro mentioned. Adding on to that: if you wanted a fully scalable solution for full-text search, as you asked in one of your questions, you would basically cache each entire Org document and create a virtual table using SQLite’s FTS extension.
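For anyone who hasn’t used it, a rough sketch of the virtual-table idea follows. It assumes the SQLite library linked into Python was built with FTS5; the table name `org_fts`, the sample documents, and the `search` helper are made up for illustration and are not part of org-roam or org-db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: this SQLite build includes the FTS5 extension.
# Each row caches the full text of one Org file.
conn.execute("CREATE VIRTUAL TABLE org_fts USING fts5(path, body)")

docs = [
    ("notes/a.org", "* Agenda ideas\nSpeed up the agenda with a cache."),
    ("notes/b.org", "* Backlinks\nLinks between headlines."),
    ("notes/c.org", "* Cache\nA database cache for Org files."),
]
conn.executemany("INSERT INTO org_fts (path, body) VALUES (?, ?)", docs)


def search(conn, query):
    """Return paths of files matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT path FROM org_fts WHERE org_fts MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()
    return [row[0] for row in rows]


print(search(conn, "agenda"))
print(search(conn, "cache"))
```

Because the index stores its own copy of every document’s text, it is no surprise that turning it on makes the database noticeably larger.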

Yesterday I generated 10000 mock Org files, which took up ~85MB of space; the database was ~140MB, and the database with fts5 enabled was ~240MB.

At that point, it might cause some people to question why files are even used to begin with, except for the convenience of using traditional file-based tools.

Edit: You could enable full-text search on org-roam’s database and make other Org extensions work with it, but that means having to install org-roam just to use its database. That is not as modular as having other extensions also depend on org-db because, let’s face it, the database is what enables integration between org-roam and everything else.

And frankly, I’m not entirely convinced that org-roam’s caching model will work at larger scales. rgrep isn’t magic; it’s literally just grep -r, or recursive grep, something that all *NIX users are familiar with, and it still experiences the same slowdown when searching through tonnes of files.


Is the following relevant?

Yeah, very relevant. Thank you for finding that, it’s very interesting. It would be interesting to hear his thoughts on whether or not he’s overcome the scaling problem.

What’s also interesting is that he shows how to directly query the database through emacsql and that it could be possible to create a wrapper to simplify the querying process. That would be a really cool feature.

Edit: org-db is included as part of scimax: org-db.el, so any proposal to have org-db as a separate package would involve consulting jkitchin.

As for scaling, scimax still has the same issue of indexing a large number of files. That’s to be expected, and that’s fine. The issue comes with tracking changes to files in a directory:

;; org-db balances performance and accuracy in a way that works “well enough”
;; for me. There are a number of ways it can be out of sync and inaccurate
;; though. The main way is if files get changed outside of emacs, e.g. by git

Which I think is funny, since git would be perfect for tracking changes to files outside of emacs.
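One way an index could notice files that changed behind its back is to store a content hash per file and compare it against the directory on each refresh. The sketch below shows that idea only; it is not how org-db or org-roam actually do it, and the `content_hash` and `stale_files` helpers and the path-to-hash dictionary are hypothetical.

```python
import hashlib
from pathlib import Path


def content_hash(path):
    """SHA-256 of a file's bytes, used as a cheap change fingerprint."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_files(indexed, directory):
    """Compare an index (dict of path -> stored hash) with the files on disk.

    Returns (changed_or_new, deleted): paths that need re-indexing and
    paths whose rows should be dropped from the database.
    """
    current = {str(p): content_hash(p) for p in Path(directory).rglob("*.org")}
    changed_or_new = sorted(p for p, h in current.items() if indexed.get(p) != h)
    deleted = sorted(p for p in indexed if p not in current)
    return changed_or_new, deleted
```

Hashing every file on every refresh has its own cost at scale, so a real indexer would probably check modification times first and only hash the files that look different.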

I also don’t see how users can specify their own schema, when the schema and how the database is populated is very much tied to what’s being pulled out of each Org file

That is a schema, in a sense. Choosing to add headline content on top of everything else that is pulled from an Org file would be part of a separate schema. Realistically, most users would use the former, or a combination of those two schemas, for the sake of working with org-roam, but there may be some crazy fringe third custom schema, defined in or loaded from init, for someone not necessarily interested in working with org-roam.

Actually, I saw Kitchin’s project a while back and decided to make a faster and more scalable indexer myself: https://github.com/zot/microfts