Orgrr - org-roam-ripgrep

laotang · April 3, 2023, 6:48am

Org-roam v2 is an amazing project and brought org-roam to a new level of sophistication. Still, I never made the switch. As a reminder, the crucial difference between the two was the move from files as the main unit of interaction to “nodes”, which are org headlines with an org-id attached to them. This allowed for impressive graphs and more thorough linking to headlines (and even to show backlinks for nodes) but it never clicked with me and so I continued to use v1 (as mentioned a few times here).

This worked fine for me until recently, when a weird emacsql problem broke the interaction between Emacs 29 and org-roam v1 on a new (old) Intel Mac from work, which I wanted to use for field research (instead of my newer and expensive MBP). Enter orgrr:

Orgrr is an almost feature-complete replica of the core functionality of org-roam v1, built using ripgrep (rg), a lot of regex and hashtables. It does recognize alternative note titles (#+roam_alias) and tags (#+roam_tags) as introduced by org-roam v1. Orgrr currently only works with org-files (i.e. files ending in .org).
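To make the keyword handling concrete, here is a minimal sketch in Python (not elisp, and not orgrr's actual code) of how the `#+title`, `#+roam_alias` and `#+roam_tags` lines could be collected into a hashtable. The function names and the in-memory `notes` dictionary are hypothetical, and the assumption that aliases are double-quoted strings while tags are bare words follows the usual org-roam v1 convention.

```python
import re

# Matches the three org-roam v1 header keywords at the start of a line.
KEYWORD_RE = re.compile(
    r"^#\+(title|roam_alias|roam_tags):[ \t]*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def split_values(value):
    """Split a keyword value into items; "quoted items" may contain spaces."""
    return [quoted or bare for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', value)]


def parse_note_header(text):
    """Return title, aliases and tags found in one org file's text."""
    meta = {"title": None, "aliases": [], "tags": []}
    for keyword, value in KEYWORD_RE.findall(text):
        keyword = keyword.lower()
        if keyword == "title":
            meta["title"] = value.strip()
        elif keyword == "roam_alias":
            meta["aliases"].extend(split_values(value))
        else:
            meta["tags"].extend(split_values(value))
    return meta


def build_title_index(notes):
    """Map every title and alias to the file that declares it.

    `notes` is a hypothetical {filename: file contents} dictionary.
    """
    index = {}
    for filename, text in notes.items():
        meta = parse_note_header(text)
        names = ([meta["title"]] if meta["title"] else []) + meta["aliases"]
        for name in names:
            index[name] = filename
    return index
```

With an index like this, looking up a note by any of its names is a single dictionary access, which is the property the hashtable approach relies on.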

In part I wrote this to learn more elisp (just so you know where I’m coming from). I’ve been using orgrr for many hours in the past few days and it has worked for me (I fixed a few remaining smaller shortcomings). If you also miss the first version of org-roam or prefer not to use databases (or want to reduce dependencies), then this might be something for you.

PS For full text search I strongly recommend deadgrep, which also uses rg.

laotang · March 14, 2025, 9:54am

The GitHub traffic page for orgrr shows that folks are still being sent there from this post. On the occasion of the two-year anniversary of the project, here are some notes on how orgrr has evolved:

There have been many changes small and big (290 commits), but the three most important ones have been more ways to investigate relationships between notes (orgrr-show-related-notes, orgrr-show-sequence, orgrr-show-multiverse), the addition of a caching option (orgrr-use-caching) and support for containers (think Denote’s silos or Obsidian’s vaults).

Most of these changes have addressed personal needs of a social scientist (me). As I am certainly biased, take my words with a grain of salt - for me orgrr is the best tool to write, structure and analyze thousands of plain-text notes. I love the independence/freedom that writing this package myself provides me.

akashp · March 15, 2025, 6:13am

Very fascinating. Will try it out someday. I was looking forward to writing a minimal system from scratch for myself. Interested to know the pros/cons of going the database route versus on-the-fly regexps. Does it scale well? Is it possible to parse various information programmatically? One thing that intimidates me is having to write regexps continuously to get regular data.
Will check out your work. Thanks for the ping.

laotang · March 15, 2025, 7:36am

Thanks! For most use cases I use rg and regexps to put the data from org files into local hashtables, which are very fast. Even without caching it scales pretty well, I think.

But with caching (asynchronous, based on make-process) orgrr is ridiculously fast.

Parsing org files and filenames (as unique identifiers) is pretty straightforward because of their regular nature. In comparison, parsing HTML is just horror (for example, website2org is 90% regexp and pattern matching).
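A rough sketch of the "regexp into hashtables" idea, written in Python for illustration only: it scans note text for `[[file:...]]` links and fills a forward and a backward table keyed by filename. This is an assumption-laden stand-in, not orgrr's code: orgrr runs ripgrep over the files on disk, whereas this example uses Python's `re` on a hypothetical in-memory `{filename: contents}` dictionary, and the link pattern only covers simple `file:` links to `.org` files.

```python
import re
from collections import defaultdict

# [[file:target.org]] or [[file:target.org][description]]
LINK_RE = re.compile(r"\[\[file:([^\]]+?\.org)\](?:\[[^\]]*\])?\]")


def build_link_tables(notes):
    """Return (forward, backward) link tables for a {filename: text} dict."""
    forward = defaultdict(set)   # note -> notes it links to
    backward = defaultdict(set)  # note -> notes linking to it (backlinks)
    for source, text in notes.items():
        for target in LINK_RE.findall(text):
            if target != source:  # ignore self-links
                forward[source].add(target)
                backward[target].add(source)
    return dict(forward), dict(backward)
```

Because filenames act as the unique identifiers, no separate ID lookup is needed: the captured link target is already the key.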

nobiot · March 17, 2025, 6:28pm

@akashp

An alternative approach: GitHub - meedstrom/indexed

laotang · March 17, 2025, 6:56pm

Yes and no. It is somewhat similar in that we both keep the database in RAM only. His is in SQL+hashtables, mine is hashtables only. And his package does collect a lot more metadata (including tasks), individually parsing every orgmode file. Mine uses ripgrep and is a bit older, so I had more time to work on all the quirks and issues that came up. I think org-node - the basis for this project - is already pretty mature.

nobiot · March 17, 2025, 9:39pm

@laotang, orgrr shows links, both forward and backward, to the second degree (2-hop links). It also shows sequential links (Luhmann’s Folgezettel).

What is your experience with both?

I am thinking that backlinks are not as useful as many of us initially thought they would be. But 2-hop links may deliver what backlinks once “promised”. I am curious about your real experience as a sociologist (I suspect you mainly do qualitative research, which I am also interested in, rather than quantitative).

For sequential links, I have also started to find them useful as the number of notes has grown. They help me keep track of different trains of thoughts — I can pick up my thought where I left it a few months before. Do you have similar experience?
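For readers unfamiliar with the term, here is a small Python sketch of what "2-hop links" means on a link table. It treats links as undirected (forward and backward count alike) and returns the notes that are exactly two steps away. The function name and the input shape are hypothetical, and this is only one plausible reading of the idea, not necessarily how orgrr computes related notes.

```python
def two_hop_links(links, note):
    """Return (first_degree, second_degree) neighbours of `note`.

    `links` maps each note to the set of notes it links to.
    Direction is ignored, so backlinks count as neighbours too.
    """
    neighbours = {}
    for source, targets in links.items():
        for target in targets:
            neighbours.setdefault(source, set()).add(target)
            neighbours.setdefault(target, set()).add(source)

    first = neighbours.get(note, set())
    second = set()
    for neighbour in first:
        second |= neighbours.get(neighbour, set())
    # Keep only notes that are not already direct neighbours or the note itself.
    return first, second - first - {note}
```

The second set is where notes surface that share a neighbour with the current note without being linked to it directly.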

akashp · March 17, 2025, 11:40pm

A lot of interesting design decisions.

Honestly, I just use org-roam as a file system for all my org files. So I end up creating a major node and then putting almost all the information relating to its subtopics in it in a linear, sequential manner. For me, sequence encodes information that is lost when writing small nodes that briefly talk about a topic and then link to others. Links do not encode the same information for me. Printing all this sequential information in a single file is also important to me, and it is also a hassle. But your zettel sequencing may really be what I need to move to a one-small-node-per-file paradigm, coupled with native project support for printing, something that mirrors org-transclusion.

Definitely very alluring features, but it will require me to migrate quite a bit. It would definitely be worthwhile to know your experience in using zettel to encode information about sequences in this way.

laotang · March 18, 2025, 9:22am

In short, both for me are more useful than backlinks to retrieve the “known-knowns”, “known-unknowns” and the “unknown-unknowns”. Especially sequential notes are useful to create trains of thoughts that mix facts and ideas.

I recently finished reading Doto 2024 - A System for Writing, which is a good addition to Ahrens 2017 - How to take smart notes. If I want to see how the book and its ideas have entered my notes, I use backlinks.

If, however, I want to learn about a specific topic, I now almost always use show-multiverse, which combines a view of the sequence and related notes (and was an idea suggested to me by a user).

The drawback is that in order for sequences to be useful all notes require Zettel No values. It is a telling sign that I have added these numbers to almost all of my 4000+ old notes.
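To illustrate why such numbers make sequences possible, here is a hedged Python sketch of sorting Luhmann-style Zettel numbers. The format is an assumption (alternating runs of digits and lowercase letters such as 1, 1a, 1a1, 1b, 2), as is the function name; the thread does not specify the exact format orgrr expects.

```python
import re


def zettel_key(zettel_no):
    """Sort key for assumed Luhmann-style numbers like '1', '1a', '1a1', '10'."""
    parts = re.findall(r"\d+|[a-z]+", zettel_no.lower())
    key = []
    for part in parts:
        if part.isdigit():
            key.append((0, int(part), ""))        # numeric run: compare as integer
        else:
            key.append((1, len(part), part))      # letter run: 'b' sorts before 'aa'
    return key


def sort_sequence(zettel_numbers):
    """Return the numbers in reading order (parent before its continuations)."""
    return sorted(zettel_numbers, key=zettel_key)
```

A plain string sort would put 10 before 2; splitting the number into runs avoids that and keeps each note directly ahead of its continuations.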

On backlinks I agree, see above. I am a political scientist working mostly on local politics in rural China. Orgrr is the QDA software I use to analyse primary sources (and secondary sources, of course). To give you one example, I constantly monitor Chinese newspapers for specific terms to be informed about developments on the ground. This now completely takes place in Emacs: Elfeed > cuckoo-search > website2org > orgrr.

Yes, indeed.

laotang · March 18, 2025, 9:32am

I created the compile-sequence function for exactly this, see below. There is also an option to remove the level 1 headings linking to the notes. So one could use this to write a paper or book.

akashp · March 18, 2025, 10:24am

Something is missing from the matrix. Unknown knowns, or things I /know/ but remain unclarified due to lack of insights

laotang · March 18, 2025, 10:48am

Correct, this is missing

nobiot · March 18, 2025, 7:15pm

Thank you for sharing the detail; very insightful! QDA = Qualitative Data Analysis, I believe.

I want to read it too… Didn’t know about it.

I have taken a different approach for implementation. My observation is that notes in sequential relationships form an ordered rooted tree (a mathematical notion borrowed from Jens Getreu's blog - Set up a Zettelkasten with Tp-Note). It’s stricter than a file system because a node=file can only belong to one parent, so there is no symbolic link, which would allow a node=file to belong to more than one parent. With this, I went ahead and implemented my own hierarchical representation using the built-in hierarchy library and custom header data. I wanted to work with plaintext .txt files.

I figured that the “Zettel number” was not essential but the structure (an ordered rooted tree) is. I am not sure if it saved any effort (you had to add Zettel numbers; I needed to add the note’s ID directly in the header/metadata section), but at least I didn’t need to deal with generating a number in a certain format.

I have come to differentiate the following three types of links (My libraries in the brackets).

1. Dictionary (Ten = glossary.el)
2. Line of thinking (Tei)
3. Association (Ren, Roku, (Org-roam))

I have decided that, at least to me, they are distinct types and do not overlap, conceptually and functionally. And for each type, I have implemented my own, small program.

This is a screenshot of Tei, showing the tree on the left and one of the notes on the right.

This is something I want to look into in the next step… It has never occurred to me to display both a line of thinking (#2) and associations (#3) in the same buffer. Thank you for the inspiration.

laotang · March 18, 2025, 8:03pm

This is a fascinating system, thanks for all the details! Looking at the metadata of the note on the right hand side: what part of this decides on the position of a note?

I also separate different data points. Interviews and newspaper articles go in their own directory (=container in orgrr parlance). A first analysis (loosely based on Grounded Theory) takes place without necessarily referencing ideas from secondary literature. I have designed all functions that deal with relationships to be either limited to the current folder or to search across all active orgrr folders.

I am thinking about a function to manually stack notes. Doto has written a lot about this. I also was pretty jealous about the simplicity of Ryan Holiday’s notecard system.

nobiot · March 18, 2025, 8:49pm

It’s the arrow => followed by an ID (timestamp in my case). Let me explain.

For this little system of mine (Tei 綴 – Japanese pronunciation; I am adding this because you’d know the original Chinese way), the position is simply determined by the parent knowing what notes are its children. A root does not have a parent – that’s the “theme” or the beginning of a certain train of thought. It’s a simple rooted ordered tree: root -> children -> grand children -> .... Children do not know their parent (singular, because a note can have only one parent). But as the tree knows the entire structure, I can traverse on a theme branch, navigate from a child to its parent and vice versa. See this example.

The current note “What are we liking?” is under the root named “Note-taking” and has four child notes (each => followed by an ID. You see the file names but I have improved this aspect so that I only need the ID now). Each child can have 0, 1, or more notes.

To place a new note in an existing part of the tree, I simply write the ID following a => in the parent manually, or create a note from within the parent with the C-u tei-new-note command.
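The parent-knows-its-children scheme described above can be sketched in a few lines of Python. This is an illustration, not Tei's implementation (which is elisp on top of hierarchy.el): the exact header syntax is assumed to be a line starting with `=>` followed by an ID, and the function names and the `{id: text}` input are hypothetical.

```python
import re

# A child reference: a line beginning with "=>" followed by the child's ID.
CHILD_RE = re.compile(r"^=>[ \t]*(\S+)", re.MULTILINE)


def build_tree(notes):
    """Build (children, parent) tables from a {note_id: text} dictionary.

    Children are kept in the order they are listed, so the tree is ordered.
    A note listed under two parents would turn the tree into a graph,
    so that case is rejected.
    """
    children = {}
    parent = {}
    for note_id, text in notes.items():
        kids = CHILD_RE.findall(text)
        children[note_id] = kids
        for kid in kids:
            if kid in parent:
                raise ValueError(f"{kid} already has parent {parent[kid]}")
            parent[kid] = note_id
    return children, parent


def path_from_root(parent, note_id):
    """Walk up from a note to its root and return the path root-first."""
    path = [note_id]
    while path[-1] in parent:
        upper = parent[path[-1]]
        if upper in path:
            raise ValueError("cycle detected; not a tree")
        path.append(upper)
    return list(reversed(path))
```

Although no note stores its own parent, the derived `parent` table makes it possible to navigate from a child back up to the root, and the single-parent check guarantees exactly one path between a root and any note.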

I do not know if the hierarchy.el and my implementation scale with thousands of notes yet but I am happy and will go further with this.

laotang · March 19, 2025, 6:27am

The CCP has simplified the character a bit (缀) but the meaning (“stitching together”) is probably the same. I love how despite the differences in execution - your system is parent-based, mine children-based - the logic and the results are so similar. The limitations, if I understand correctly, are also the same: there is only one true path from root down to children and grandchildren.

nobiot · March 19, 2025, 8:55pm

Yes

I think so, and I look at it as the positive property of the structure, not as a limitation. If I understand the math correctly with the aid of ChatGPT, in an ordered rooted tree, a child can belong to only one parent. If a child can belong to multiple parents, the structure will be a network (a graph). I wanted the structure so that there is only one path from a root to a given note (and back to the root).

This is what I have been experimenting with: clear conceptual and functional separation among these three kinds of “links”:

Dictionary (definition-references, like Xref but in note-taking)
Line of thinking (ordered rooted tree)
Association (a graph, or network)

This has cleared up a lot of fog for me.

laotang · March 20, 2025, 7:16am

I agree with this differentiation into three different types of links (and notes, by extension). The difference here might be that I try to have all three in orgrr.

The thing orgrr may still be missing is something like org-transclusion - a manual synthesis of notes. But then again, why not just use that great project ;).

doubleloop · July 5, 2025, 6:30pm

Thank you for this package. I have been using org-roam v2 for a long time, but after reading A System for Writing, I am now also using orgrr to help add Folgezettel IDs to my main notes.

I’m also able to use the show sequence function, which is great. I’d love to use the show related functions too, but understandably they don’t work as I’m using org-roam v2 and org ids.

laotang · July 7, 2025, 7:27pm

Thanks for the kind words. So you are using just one node per note (=one topic per file)? If so, it should be rather easy to convert these notes.
