I actually have a similar set of goals. I’m not a tech or library science person, though, so many of my decisions are almost certainly suboptimal!
The basic idea is that I have a bucket of all my PDFs. I can use Zotero to search them by metadata and Recoll to search them by content. For work I’ve actually read, I copy the PDF, along with a .bib entry pointing to it, into a new directory; create a reference node for each item with org-roam; and extract or write summaries or highlights with org-remark or org-noter.
Slightly more detail:
- I use Zotero to:
  - store my PDFs, quickly access them, and search them by metadata.
  - read and do some annotation (mostly highlighting, sometimes commenting or summarising).
- I use Recoll to search my PDFs by content, because I found Recoll’s database to be faster than Zotero’s. Right now my Recoll database just indexes my Zotero storage directory.
- I use Org-Roam-Bibtex to create a reference node for each document I have read, stored in a reference directory in my org-roam directory.
- I use Org-Noter to extract the highlights or other annotations from the PDFs and store them in a separate directory in my org-roam directory. Each is then linked to its corresponding reference node.
- I use Org-Remark to comment on my annotations (mostly highlights). The comments are stored in a separate directory in my org-roam directory, using a function that @nobiot wrote.
- In Emacs, I write ideas inspired by highlights or annotations as standalone notes, stored in another org-roam subdirectory.
- I use Org-Roam-UI to see the connections between my nodes.
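Most of the repetition in the steps above is creating a note and linking it to its reference node. As a rough illustration of what automating that could look like outside Emacs, here is a minimal Python sketch that writes a highlights file as an org-roam file node (a `:PROPERTIES:` drawer with an `:ID:`, then `#+title:`) and links back to the reference node with an `id:` link. It assumes you already know the reference node’s ID; the function name, directory, and file-naming scheme are all hypothetical, and this is not how org-noter itself exports anything.

```python
import uuid
from pathlib import Path


def write_highlights_note(directory: Path, title: str, ref_id: str,
                          highlights: list[str]) -> Path:
    """Write an Org file node of highlights that links back to a reference node.

    ASSUMPTION: `ref_id` is the :ID: of an existing reference node (e.g. one
    made by org-roam-bibtex); this function does not check that it exists.
    """
    directory.mkdir(parents=True, exist_ok=True)
    node_id = str(uuid.uuid4())  # random UUID for this note's own :ID:
    lines = [
        ":PROPERTIES:",
        f":ID:       {node_id}",
        ":END:",
        f"#+title: Highlights: {title}",
        "",
        f"Source: [[id:{ref_id}][{title}]]",  # id: link back to the reference node
        "",
    ]
    lines += [f"- {h}" for h in highlights]  # one list item per highlight
    # Hypothetical naming scheme: lower-case title, non-alphanumerics -> "_".
    slug = "".join(c if c.isalnum() else "_" for c in title.lower()).strip("_")
    path = directory / f"highlights_{slug}.org"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Files written from outside Emacs probably won’t show up in org-roam until the database is refreshed, e.g. with `M-x org-roam-db-sync`.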
Again, I’m not a library science or tech person, so I’m still ironing out a lot of kinks:
Problems:
- Way too much repetitive stuff. I’d like to automate the exporting and linking as much as possible.
- Adding highlights as nodes is dumb, because the avalanche of text makes Org-roam’s unlinked references function slow and useless (too many false positives). I’m probably going to stop making them nodes and either just link to them with standard file links or give them a tag and exclude them from the database. I’m leaning against the latter: I might want to ‘promote’ a quote to a node later, but I keep tag inheritance on, so a promoted quote would inherit the exclusion tag.
- A lot of duplicated files. A PDF of every article I read is stored both in my bibliography folder and in my Zotero storage. This seems dumb, and I’m pretty sure I could just symlink them, but I always forget.
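The symlink fix for the duplicated PDFs is easy to script so it doesn’t depend on remembering. Below is a minimal Python sketch, assuming a POSIX system where you can create symlinks; the function name and folder layout are hypothetical. It puts a link to the Zotero-stored PDF in the bibliography folder, and only replaces an existing copy when it is byte-identical to the Zotero file, so nothing unique gets deleted.

```python
import filecmp
from pathlib import Path


def link_into_bibliography(zotero_pdf: Path, bib_dir: Path) -> Path:
    """Put a symlink to a Zotero-stored PDF in the bibliography folder.

    An existing regular file of the same name (an old copy) is replaced only
    if it is byte-identical to the Zotero file; otherwise an error is raised.
    """
    bib_dir.mkdir(parents=True, exist_ok=True)
    target = zotero_pdf.resolve()
    link = bib_dir / zotero_pdf.name
    # Check is_symlink() first: exists() follows links.
    if link.is_symlink():
        return link  # already a link; leave it alone
    if link.exists():
        if not filecmp.cmp(link, target, shallow=False):
            raise FileExistsError(f"{link} differs from {target}; not replacing it")
        link.unlink()  # identical copy: safe to swap for a link
    link.symlink_to(target)
    return link
```

Any .bib `file` field pointing at the bibliography folder keeps working, since the link sits at the same path the copy used to.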
Wishlist:
- I still need to implement this function so I can automatically convert the annote or note field from LaTeX to org-mode when creating reference nodes.
- Need to add each org-roam subdirectory to my recoll database and make shortcuts so I can search specific directories by content easily.
- Need to actually use some of the consult ripgrep functions mentioned here and here to better search through my notes.
- Might want to add some way to search org-roam metadata, like org-roam-search.
- Make more use of transclusion.
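For the annote/note conversion on the wishlist, the function I mentioned isn’t reproduced here. As a stand-in, here is a deliberately small Python sketch that handles only a few common, non-nested LaTeX commands and escapes; the mapping and the function name are my own assumptions. For anything richer, piping the field through pandoc (`pandoc -f latex -t org`) is probably more robust.

```python
import re

# ASSUMPTION: a small subset of LaTeX markup often found in annote/note fields,
# mapped to Org emphasis markers. Anything not listed is left untouched.
_INLINE = {
    "emph": "/",
    "textit": "/",
    "textbf": "*",
    "texttt": "~",
    "underline": "_",
}
_CMD = re.compile(r"\\(" + "|".join(_INLINE) + r")\{([^{}]*)\}")


def latex_annote_to_org(text: str) -> str:
    """Convert a few simple LaTeX constructs in a BibTeX field to Org markup."""
    # Non-nested \cmd{...} -> marker + content + marker, e.g. \emph{x} -> /x/.
    text = _CMD.sub(lambda m: _INLINE[m.group(1)] + m.group(2) + _INLINE[m.group(1)], text)
    # Common escaped characters.
    text = text.replace(r"\&", "&").replace(r"\%", "%").replace(r"\_", "_")
    # \par paragraph breaks become blank lines in Org.
    text = re.sub(r"\s*\\par\b\s*", "\n\n", text)
    # Drop remaining grouping braces, e.g. {DNA} protecting capitalisation.
    return text.replace("{", "").replace("}", "")
```

Usage: `latex_annote_to_org(r"An \emph{important} result")` gives `An /important/ result`.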
I think this covers a lot of what you want with three tools (Emacs, Recoll, and Zotero) plus five Emacs packages (org-roam, org-noter, org-remark, org-roam-bibtex, org-roam-ui), but, again, inefficiently: nothing is as automated or as smooth as I’d like.
I don’t think it meets your goal of finding related or suggested content through full-text (or metadata?) analysis of your notes. Recoll has a similarity function, but the docs say it’s not that useful. There’s also apparently a package called org-similarity that promises to use machine learning to find documents similar to the one in your current buffer. I think I used it once back in Org-roam v1 and never again. It’d be interesting to compare the quality of its (and Recoll’s) results with @laotang’s suggestion of DEVONthink, but I’m not a Mac user.