I’m a hardcore fan of org-roam and use it daily to organize my ideas and, above all, to discover new connections between them. The organizing part is in check, but as the number of notes grows, finding similar or related notes by hand becomes a daunting task with suboptimal results. So I decided to code my first Emacs package. Behold: org-similarity!
For installation instructions and usage:
Under the hood, the package uses Python’s scikit-learn and nltk modules for text feature extraction and pre-processing. More specifically, it cleans the org files (by stripping the front matter and some undesired characters), tokenizes the documents, replaces each token with its linguistic stem, generates a TF-IDF sparse matrix, and calculates the cosine similarity between the note you are currently editing and the other notes in a directory of your choice. It works with org-roam and with org-mode in general. Here’s a demo:
This is both my first Emacs package and “useful” git repo, so it was a lot of fun to learn new stuff, and I’d love to have the community’s feedback to improve this package!
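For anyone curious about the pipeline described above, here is a rough Python sketch of the same idea. It is not the package's actual code: the function names (clean, stem_tokens, rank_similar) are made up for illustration, "front matter" is assumed to mean lines starting with "#+", tokenization is a plain whitespace split, and the stemmer is assumed to be nltk's PorterStemmer.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

stemmer = PorterStemmer()


def clean(text):
    """Strip org keyword lines and non-letter characters (assumed cleaning rules)."""
    # Assumption: "front matter" means lines such as "#+title: ...".
    body = "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#+")
    )
    # Replace everything except ASCII letters and whitespace, then lowercase.
    return re.sub(r"[^A-Za-z\s]", " ", body).lower()


def stem_tokens(text):
    """Clean a document, split it on whitespace, and stem each token."""
    return [stemmer.stem(token) for token in clean(text).split()]


def rank_similar(target, others):
    """Rank notes by cosine similarity to `target`.

    `others` maps a note name to its raw text. Returns a list of
    (name, score) pairs, most similar first.
    """
    names = list(others)
    # Passing a callable analyzer makes the vectorizer use our own
    # cleaning/stemming instead of its built-in tokenizer.
    vectorizer = TfidfVectorizer(analyzer=stem_tokens)
    matrix = vectorizer.fit_transform([target] + [others[n] for n in names])
    # Row 0 is the target note; the remaining rows are the candidates.
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
```

Calling rank_similar with the text of the current note and a dict of the other notes' contents returns the candidates ordered from most to least similar. Notes that share no stems with the current one get a score of 0.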
PS: org-roam users might want to change the value of org-similarity-directory to org-roam-directory, like this:
(setq org-similarity-directory org-roam-directory)