I’ve been tweaking a tool for the last two weeks that recommends possible connections for a given node or text portion, and I wonder if we should start working on an “org-roam-ai” package to manipulate org-roam data using AI.
My pain is that with the company package, I get recommendations based on the title in an auto-complete style, so I need to type the title of another node precisely. I’ve been creating more and more nodes with longer titles, which give more meaningful access to their content when looking at a graph but are harder to auto-complete.
To solve it, I’ve put together a very precarious implementation of a semantic search (with the help of ChatGPT!), but it is usable enough to let me test it and decide whether to invest in it. It shows the 20 most similar nodes and a 2D representation that shows how they relate to each other, with color indicating each node’s position in the similarity ordering.
I have been using it for three things:
- To find a node when I remember the subject but not the title. Prompt: “a preprocessing step for a text that splits the words into tokens based in a vocabulary built by fragments occurrence” (I was indeed thinking of WordPiece).
- To look in my notes for references to read/remember/think about a particular subject. Prompt: “The performance of my machine learning model is degrading”.
- To help me write/think/connect to other nodes. Start writing, select a sentence, and search:
I’ve been using open-source language models through the sentence-transformers library.
It aligns with AI-generated node connections using semantic similarity estimation.
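The search itself boils down to turning the query and every node into a vector and ranking the nodes by similarity. Here is a minimal sketch of that ranking step. Note that `embed` is only a toy bag-of-words stand-in so the example runs with the standard library; it is not the language model I actually use, and the names `embed`, `cosine` and `most_similar` are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector.
    A real setup would call a language model here instead."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(query, nodes, k=20):
    """Return up to k (title, score) pairs, most similar first."""
    q = embed(query)
    scored = [(title, cosine(q, embed(body))) for title, body in nodes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical notes: title -> node text.
nodes = {
    "WordPiece": "tokenizer that splits words into subword tokens from a vocabulary",
    "Model drift": "model performance degrading over time as the data changes",
    "Org-roam setup": "emacs configuration for my notes",
}
results = most_similar("splits words into tokens using a vocabulary", nodes, k=2)
```

With a real embedding model, only `embed` and the vector type would change; the ranking logic stays the same.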
I asked GPT-3 to write a set of features for an “org-roam-ai” package:
- Automatically generate a graph of related documents based on natural language processing (NLP) analysis.
- Suggest relevant documents and nodes to link to while writing in org-mode.
- Automatically detect keywords and create links to other documents.
- Cross-document/node search with full-text search capabilities.
- Automatically tag documents and nodes with relevant topics.
- Automatically suggest new topics for documents, nodes, and notes.
- Automatically generate summary cards for documents and nodes.
They all look exciting, but I’m not sure they are possible with open-source models.
So I’d like to hear from the org-roam community: should we build org-roam-ai? I’m a Data Scientist, so my Software Engineering skills are limited, and I’m open to working with more experienced developers or simply helping design the interface with these models.
Hmm, this sounds interesting. I am very far from being able to help develop something like this, but I would definitely be happy to help with testing.
In some cases, you do want to do all the work on your own, especially when first writing something, but I could see a use case for this in retrieval of notes.
I see it more as grabbing your references before writing, while writing, or after writing the first version.
I’ve been using paragraphs from articles I wrote as input, and it is great: it recommends concepts I cite in the following 1-3 paragraphs, plus others I didn’t cite but could have. It is more about augmenting my ability to put my references together than writing for me.
Yeah, I see.
It’s definitely interesting as a concept and I would love to see a complete implementation of it in org-roam.
Sounds really interesting! Have you looked at org-similarity? So far, it only works on a per-document basis - but does so surprisingly well for my 2.5K+ notes.
Oh, org-similarity looks interesting. I will definitely give it a try!
I recently put this together to have the results of org-similarity in a side-window (like org-roam v1). It makes org-similarity even more useful for me (and now looks a tiny bit like DevonThink). Use at will:
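(defun lt/org-similarity-sidebuffer ()
  "Puts the results of org-similarity in a side-window."
  (interactive)
  ;; The four arguments after the script path were missing from the post; these variable names are assumed.
  (let* ((command (format "python3 %s -i %s -d %s -l %s -n %s %s"
                          (concat org-similarity-root "/assets/org-similarity.py")
                          buffer-file-name
                          org-similarity-directory
                          org-similarity-language
                          org-similarity-number-of-documents
                          (if org-similarity-show-scores "--score" "")))
         (similarity-results (shell-command-to-string command))
         ;; Route the temp buffer to a right-hand side window (reconstructed wiring).
         (display-buffer-alist
          '(("\\*Similarity Results\\*" (display-buffer-in-side-window)
             (inhibit-same-window . t)
             (side . right)
             (window-width . 0.33)))))
    (with-output-to-temp-buffer "*Similarity Results*"
      (princ similarity-results)
      (with-current-buffer "*Similarity Results*"
        (org-mode)))))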
The screenshot shows the note “AI and the competitive struggle of nations” on the left and the org-similarity results for that buffer on the very right.
I was actually just trying to get it to insert into a separate buffer, as I didn’t like the default behaviour of inserting in the current buffer, so you saved me like 30 mins of setting this up myself.
And since you seem to have played with this more, do you also have a workaround for the inserted links showing :PROPERTIES: (which is the first line of a typical org-roam file) or am I going to have to hack that one in myself?
Sorry, no workaround. I am still on v1 and don’t use org-id.
I did not know it! Thanks for pointing it out.
One difference is that I’m using a Large Language Model to represent the input and nodes as vectors, which should provide better results than the org-similarity approach. The other is that I’m working at org-roam node granularity instead of whole documents. In theory, org-similarity could replace its approach with one based on an LLM.
I want to keep expanding it on top of what org-roam offers in terms of data (nodes, links, tags), but I understand many people will find org-similarity covering most of their usage.
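To make the node-versus-document point concrete: in org-roam v2 a node is a file or a headline that carries an `:ID:` property, so one file can hold several nodes. Below is a rough sketch of splitting Org text into such units before embedding them. It is a simplification, not what org-roam or my tool actually does: `split_into_nodes` is a made-up name, nesting is ignored, and text under a headline without an ID is dropped instead of being kept with the enclosing node.

```python
import re

HEADING = re.compile(r"^\*+\s+(.*)$")
ID_LINE = re.compile(r"^\s*:ID:\s+(\S+)\s*$")

def split_into_nodes(org_text):
    """Split Org text into {id: (title, body)} for sections that carry an :ID:."""
    # The first section collects everything before the first headline.
    sections = [["(file level)", []]]
    for line in org_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            sections.append([heading.group(1), []])
        else:
            sections[-1][1].append(line)

    nodes = {}
    for title, lines in sections:
        ids = [m.group(1) for m in map(ID_LINE.match, lines) if m]
        if not ids:
            continue  # no ID: not treated as a node in this sketch
        # Drop drawer lines such as :PROPERTIES:, :ID: and :END:.
        body = [l for l in lines if not l.lstrip().startswith(":")]
        nodes[ids[0]] = (title, "\n".join(body).strip())
    return nodes

sample = """:PROPERTIES:
:ID:       file-1
:END:
#+title: Tokenization
Notes on tokenizers.
* WordPiece
:PROPERTIES:
:ID:       node-2
:END:
Splits words into subword units.
* Scratch
No ID here.
"""
nodes = split_into_nodes(sample)
```

Each value in `nodes` can then be embedded and ranked on its own, which is what lets a search land on a single headline rather than on the whole file.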
One thing we have in common is that the org-similarity author and I are both Brazilians.
Brilliant idea!
When I was using Obsidian, there were two plugins I liked most:
Find the paths between two notes:
Find similar notes:
Org-roam deserves a more powerful tool!
I’m quite interested in this too! Besides looking at my notes, I’d love it if it looked at the stack of a thousand data science pdfs that I’ve collected over the years.
Relatedly, here’s ChatGPT in Emacs, writing elisp: ChatGPT in Emacs
This looks amazing. Although my coding skills are limited I would love to help build this.
Here’s a blog post with directions to replicate it.
Now about a Q&A with sources over Org roam notes.
I’m currently writing a new function to better surface related notes for my minimalist org-roam v1 clone (see also here). The function collects all notes related to a given note to the second degree - the backlinks for the backlinks and the outgoing links mentioned by outgoing links. To use the image of a family, it considers all parents and grandparents as well as all children and grandchildren of a note. All links to a specific note are counted and the resulting list is ranked by frequency.
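The counting described above can be sketched in a few lines. This is only an illustration of the idea in Python, not the actual orgrr code (which is Emacs Lisp and not yet released); `related_notes` and the `links` mapping of note to outgoing links are invented for the example.

```python
from collections import Counter, defaultdict

def related_notes(note, links):
    """Rank notes within two link hops of `note` by how often they are reached."""
    # Invert the outgoing-link map to get backlinks.
    backlinks = defaultdict(list)
    for source, targets in links.items():
        for target in targets:
            backlinks[target].append(source)

    counts = Counter()
    for parent in backlinks.get(note, []):
        counts[parent] += 1
        counts.update(backlinks.get(parent, []))  # grandparents
    for child in links.get(note, []):
        counts[child] += 1
        counts.update(links.get(child, []))       # grandchildren
    counts.pop(note, None)  # the note is not related to itself
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical link graph: note -> notes it links to.
links = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "E": ["A"],
    "F": ["E"],
}
ranking = related_notes("A", links)
```

For note "A", the notes "C" and "D" come out on top because each is reached twice (directly and through another child), while the parent "E" and grandparent "F" are each counted once.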
The results are really interesting (at least for me). This is a note and its backlinks:
This is the same note with org-similarity in the side-window:
And this is the same note with orgrr-related-notes (the above teased function, not yet released):