Over the last two weeks I’ve been tweaking a tool that recommends possible connections given a node or a portion of text, and I wonder whether we should start working on an “org-roam-ai” package to manipulate org-roam data using AI.
My pain point is that with the company package, I get recommendations based on node titles in an auto-complete style, so I need to type the title of another node precisely. I’ve been creating more and more nodes with longer titles, which provide more meaningful access to their content when looking at a graph, but make auto-completion harder.
To solve it, I’ve put together a very rough implementation of semantic search (with the help of ChatGPT!), but it is usable enough to let me test it and decide whether to invest in it. It shows the 20 most similar nodes and a 2D representation that lets us see how they relate to each other, with their colors reflecting the similarity ordering.
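To give an idea of the kind of pipeline I mean, here is a minimal sketch. It is not the actual implementation: the node bodies are hypothetical, and `embed` is a toy bag-of-words stand-in for the LLM embedding the real tool would call.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an LLM embedding: a bag-of-words vector.
    # The real tool would call a sentence-embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar(query, nodes, k=20):
    # Rank org-roam nodes by similarity to the query text.
    q = embed(query)
    scored = [(cosine(q, embed(body)), title) for title, body in nodes.items()]
    return sorted(scored, reverse=True)[:k]

# Hypothetical nodes, for illustration only
nodes = {
    "WordPiece": "splits words into subword tokens from a learned vocabulary",
    "Gradient descent": "optimization by following the negative gradient",
}
print(most_similar("tokens subword vocabulary", nodes, k=2))
```

The 2D view would then come from projecting the same embedding vectors down with something like PCA or UMAP and coloring points by their similarity rank.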
I have been using it for three things:
To find a node when I remember the subject but not the title. Prompt: “a preprocessing step for a text that splits the words into tokens based on a vocabulary built from fragment occurrence” (I was indeed thinking of WordPiece).
I asked GPT-3 to write a set of features for an “org-roam-ai” package:
Automatically generate a graph of related documents based on natural language processing (NLP) analysis.
Suggest relevant documents and nodes to link to while writing in org-mode.
Automatically detect keywords and create links to other documents.
Cross-document/node search with full-text search capabilities.
Automatically tag documents and nodes with relevant topics.
Automatically suggest new topics for documents, nodes, and notes.
Automatically generate summary cards for documents and nodes.
They all look exciting, but I’m not sure they are all achievable with open-source models.
So I’d like to hear from the org-roam community: should we build org-roam-ai? I’m a Data Scientist, so my Software Engineering skills are limited, and I’m open to working with more experienced developers or simply helping design the interface with these models.
Hmm, this sounds interesting. I am very far from being able to help develop something like this, but I would definitely be happy to help with testing.
In some cases, you do want to do all the work yourself - especially when first writing something - but I could see a use case for this in the retrieval of notes.
I see it more as grabbing your references before writing, while writing, or after writing the first version.
I’ve been using paragraphs from articles I wrote as input, and it is great: it recommends concepts I’d cite in the following 1-3 paragraphs, plus others I didn’t cite but could have. It is more about augmenting my ability to put my references together than writing for me.
Sounds really interesting! Have you looked at org-similarity? So far, it only works on a per-document basis - but does so surprisingly well for my 2.5K+ notes.
I recently put this together to show the results of org-similarity in a side window (like org-roam v1). It makes org-similarity even more useful for me (and it now looks a tiny bit like DevonThink). Use at will.
The screenshot shows the note “AI and the competitive struggle of nations” on the left and the org-similarity results for that buffer on the very right.
I was actually just trying to get it to insert into a separate buffer, as I didn’t like the default behaviour of inserting into the current buffer, so you saved me about 30 minutes of setting this up myself.
And since you seem to have played with this more: do you also have a workaround for the inserted links showing :PROPERTIES: (which is the first line of a typical org-roam file), or am I going to have to hack that one in myself?
I did not know about it! Thanks for pointing it out.
One difference is that I’m using a Large Language Model to represent the input and the nodes as vectors, which should give better results than org-similarity’s approach. The other is that I’m working at the granularity of org-roam nodes instead of whole documents. In theory, org-similarity could adopt the LLM-based approach too.
I want to keep expanding it on top of what org-roam offers in terms of data (nodes, links, tags), but I understand that for many people org-similarity will cover most of their usage.
One thing we have in common: the org-similarity author and I are both Brazilian.
I’m quite interested in this too! Besides looking at my notes, I’d love it if it looked at the stack of a thousand data science PDFs that I’ve collected over the years.
Relatedly, here’s ChatGPT in Emacs, writing elisp: ChatGPT in Emacs
I’m currently writing a new function to better surface related notes for my minimalist org-roam v1 clone (see also here). The function collects all notes related to a given note up to the second degree - the backlinks of the backlinks and the outgoing links mentioned by outgoing links. To use the image of a family, it considers all parents and grandparents as well as all children and grandchildren of a note. All links to a specific note are counted, and the resulting list is ranked by frequency.
The results are really interesting (at least for me). This is a note and its backlinks: