Concepts and References: Do You Split the Nodes?

I have a question I would love to get people's thoughts on, mostly about the layout of notes and link types.

I work in Machine Learning, and often you have some concept or model that is first introduced in a paper. For example, there is a model called “T5”, and it was introduced in the paper “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer” by Raffel et al.

My question is how people handle this situation and what their experience (good or bad) has been with the different approaches. I can think of a few ways to organize these notes:

  1. A separate note for “T5” as a concept (a normal org-roam node) and a citation note for notes about the paper (an org-roam node with a :roam_refs: of the citekey). The “T5” node would contain notes about the model and its usage beyond the scope of the paper, and other nodes would link to it with an id link. The paper node would hold notes about that specific paper, and links to it would be cite links. The main downsides are duplication of content between the main “T5” node and the paper node, and duplication of links: generally I want to link to the “T5” node (to see all the various projects that use it), but I also want a cite link to the paper so it appears in an exported bibliography.

  2. A shared note for both “T5” as a concept and the paper. This node would have a :roam_refs: key for the citekey of the paper, and the node title would be the paper title. I would also add a :roam_aliases: entry for what would have been the title if it were its own note. Sections in the note could separate notes about the model in general from notes about the paper. The main downside is duplication of links: I would often want both a normal link and a cite link, because paper titles (and their cite keys) often aren't very readable; “T5” doesn't appear in the paper title at all. This results in an org-roam buffer with a lot of duplication, where the same sentence appears in both the backlinks panel and the reflinks panel.

Additionally, in all of these cases I often find myself having to insert two links, a normal link (for the backlink) and the citation. If I could get it down to a single link, that would be super nice.

Another option would be to use option 2 together with the prefix feature of org-cite, putting the short name (“T5”) in the prefix. This would look good on export, but plain-text readability could suffer, and we lose the completion we get when it is a real node.

Has anyone else had this issue, and do you have any thoughts on it?

I am actually also in Machine Learning so I have faced the same problem. I don’t know if it is the best solution, but I can share what I do.

I generally combine the model and the paper into the same note (option 2), where the note is created with org-roam-bibtex. It seems easier and requires less work to keep just one note. So the note is really based on the paper, and links to that note (either citations or links by paper/note title) refer to the paper. If I want to refer to the model, I usually add an alias to the note for the name of the model (“T5”).

For the special cases where models branch out into entire families of related models (such as BERT or ResNet), I eventually make a separate note for the model family.

Typically, even if an idea is introduced in one article, it's probably not the only source of information you have. I would personally split the two: keep the ref note exclusively about the article, and use the other note as a more general collection of information on the idea.

It makes more sense to me that way.

I think if in doubt, I’d usually split.

So perhaps a short note on the paper, linked to a longer one on the concept.

But maybe it depends on how important that concept is? If other people engage with it, there seems to be value in having a dedicated note for it, so you can include multiple citations/links to those different papers.