Maximum number of notes

sprout · April 11, 2022, 10:43pm

I actually have a similar set of goals. I’m not a tech or library science person, so many of my decisions are almost certainly suboptimal though!

The basic idea is that I have a bucket of all my pdfs. I can use Zotero to search by metadata and Recoll to search by content. For work that I’ve actually read, I copy the PDF, along with a .bib that points to it, to a new directory, create reference nodes for each item using org-roam, and extract or create summaries or highlights using org-remark or org-noter.

Slightly more detail:

I use Zotero to:
1. store my pdfs, quickly access them, and search them by metadata.
2. read and do some annotations (mostly highlighting, sometimes commenting or summarising).
I use Recoll to search my pdfs by their content, because I found Recoll’s database to be faster than Zotero’s. My Recoll database right now is just my Zotero storage directory.
I use Org-Roam-Bibtex to create a reference node for each document I have read, stored in a reference directory in my org-roam directory.
I use Org-Noter to extract the highlights or other annotations from the PDFs and store them in a separate directory in my org-roam directory. Each is then linked to its corresponding reference node.
I use Org-Remark to comment on annotations I’ve made (mostly on highlights) with those comments being stored in a separate directory in my org-roam directory using a function that @nobiot wrote.
I write various different ideas inspired by highlights or annotations as standalone notes that are stored in another org-roam subdirectory (using Emacs).
I use Org-Roam-ui to see the connections between my nodes.

Again, not a library science or tech person, so I’m still ironing out a lot of kinks:

Problems

Way too much repetitive stuff. I’d like to automate the exporting and linking as much as possible.
Adding highlights as nodes is dumb, because the avalanche of text makes Roam’s unlinked references function slow and useless (too many false positives). I’m probably going to remove them as nodes and just have them as standard file links, or give them a tag and exclude them from the database. Leaning against the latter because I might want to ‘promote’ a quote to a node, but I keep tag inheritance on.
A lot of duplicated space. A PDF of any article I read is both in my bibliography folder and my Zotero database. This seems dumb and I’m pretty sure I can just symlink them, but I always forget.
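
For the duplicated PDFs, a minimal sketch of the symlink idea in Python. This is only an illustration, not something from my actual setup: the function name `replace_copy_with_symlink` and any paths you pass in are hypothetical. It only swaps the copy for a symlink when the two files are byte-for-byte identical, so a copy that has diverged is left alone.

```python
import filecmp
from pathlib import Path


def replace_copy_with_symlink(copy_path: Path, original_path: Path) -> bool:
    """Replace copy_path with a symlink to original_path.

    Returns True if the copy was replaced, False if nothing was changed.
    The swap only happens when both paths are regular files with
    identical contents, so diverged copies are never deleted.
    """
    # Skip anything that is already a symlink or is not a regular file.
    if copy_path.is_symlink() or not copy_path.is_file():
        return False
    if not original_path.is_file():
        return False
    # shallow=False forces a byte-by-byte comparison instead of
    # trusting size and modification time alone.
    if not filecmp.cmp(copy_path, original_path, shallow=False):
        return False
    copy_path.unlink()
    copy_path.symlink_to(original_path.resolve())
    return True
```

You would call it once per duplicated file, passing the copy in the bibliography folder first and the file in the Zotero storage directory second. Symlink creation may need extra permissions on some systems (Windows in particular).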

Wishlist:

I still need to implement this function so I can automatically convert the annote or note field from LaTeX to org-mode when creating reference nodes.
Need to add each org-roam subdirectory to my recoll database and make shortcuts so I can search specific directories by content easily.
Need to actually use some of the consult ripgrep functions mentioned here and here to better search through my notes.
Might want to add some way to search org-roam metadata, like org-roam-search.
Make more use of transclusion.
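
For the annote conversion, here is a rough sketch of what calling pandoc on a LaTeX string could look like from Python. It is not the function linked above, just an assumption-laden illustration: `pandoc_command` and `convert_annote` are hypothetical names, and it assumes pandoc is installed and on your PATH.

```python
import shutil
import subprocess


def pandoc_command(source_format: str = "latex", target_format: str = "org") -> list:
    """Build the pandoc argument list for a stdin-to-stdout conversion."""
    return ["pandoc", "--from", source_format, "--to", target_format]


def convert_annote(text: str) -> str:
    """Convert a LaTeX-formatted annote/note string to org-mode markup.

    Assumes pandoc is available; raises RuntimeError if it is not found.
    """
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    result = subprocess.run(
        pandoc_command(),
        input=text,            # pandoc reads the source from stdin
        capture_output=True,
        text=True,
        check=True,            # raise if pandoc exits with an error
    )
    return result.stdout
```

Pulling the annote field out of the .bib file is left out on purpose, since how you parse the bibliography depends on your export settings.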

I think this hits a lot of what you want via three tools (Emacs, Recoll, and Zotero) + five packages in Emacs (org-roam, org-noter, org-remark, org-roam-bibtex, org-roam-ui), but again inefficiently (nothing is as automated or as smooth as I’d like it).

I think it doesn’t hit your goal of finding related or suggested content through fulltext (or metadata?) analysis of your content. Recoll has a similarity function, but the docs say it’s not that useful. There’s also apparently a package called org-similarity that promises to use machine learning to find similar documents to what’s already in your buffer. I think I used it once during Org-Roam V1 and then never again. It’d be interesting to compare the quality of its (and recoll’s) results to @laotang’s suggestion of Devonthink, but I’m not a Mac user.

laotang · April 12, 2022, 7:37am

This is a very interesting workflow. I have moved away from exporting PDF highlights and started to summarize findings in my own words (to improve retention). However, Zotero’s built-in PDF highlight extraction tool is not too bad, and one could use pandoc to turn the resulting files into .org files. I also do not use Bibtex, as creating new citation styles (and dealing with non-western characters, transliterations and translations) is a pain in the …

Never heard of this one. Sounds very interesting!

This is something I am genuinely curious about. Might report back later.

sprout · April 12, 2022, 9:19pm

I’ve heard really good arguments for summarising over highlighting, but I haven’t felt ready to give up highlighting yet. I’m also probably suffering from the sunk cost fallacy, since I’ve put so much effort into extracting highlights.

And like I linked above, there’s a simple function that calls pandoc on the annote field of a bibliography exported from Zotero, converts that field into org-mode, and adds it to a node in org-roam.

Honestly, I hated Zotero’s interface for a while, but the reader and annotation support are better than what I got from Okular.

Please do! Looking at the person’s Github, they’ve apparently made a few other document similarity tools. I’ve normally tracked down papers through citation networks or impact factors rather than a find similar function, but it’d be nice to see what documents I myself made might be similar even if I haven’t linked them.

laotang · April 13, 2022, 9:03pm

To follow up on this, here is a very brief comparison between org-similarity and Devonthink (DT).

This is a list of notes org-similarity considers “similar” to the note procrastination (output limited to 15 notes):

And this is the list DT provides (in two different perspectives):

Amazingly similar. Of course, DT has a more comprehensive list but org-similarity has only 75 lines of Lisp and 166 of Python.

Some caveats regarding org-similarity:

It needed some tweaking before I got it running on my machine (especially getting the scores to appear was a bit of a headache). I don’t think it is maintained anymore.
It does work on a per-file basis, so it will have difficulties with org-roam v2, if you have more than one node per file. I am still on v1, so this was not an issue here.
It does not work well with org-id (yet). The last entry (":Properties:") is an org-id entry.
It would be more useful to display the results via display-buffer-in-side-window.
I have about 2250 notes and it takes 3-4 seconds each time I run org-similarity (on a really quick computer). DT is near instant for many times this amount of data.
It only works with one language at a time.
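
To give a feel for how this kind of "find similar notes" ranking can work, here is a minimal sketch using TF-IDF weights and cosine similarity in plain Python. I am not claiming this is what org-similarity or DT do internally; the function names and the toy notes are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Map each note name to a sparse {term: tf-idf weight} dict."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed inverse document frequency: rarer terms weigh more.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return {
        name: {t: c * idf[t] for t, c in term_counts.items()}
        for name, term_counts in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, docs, limit=15):
    """Return up to `limit` (name, score) pairs, most similar first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (name, cosine(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:limit]
```

Note that this also treats each note as one flat bag of words in a single language, which lines up with the per-file and one-language caveats above.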
