[Org-roam-bibtex] Changing citation keys

Hi,

Org-roam is really working great, with surprisingly little friction for a package of that size. Thanks!

But as a side effect, what little friction remains looks disproportionately huge. :slight_smile:

Presently, I’m trying to establish a workflow for easily inputting bibliographical notes in ORB, without worrying about the bibtex entry.

My setup is that I have a bibliography on Zotero where I read and annotate the pdfs (either with a desktop pdf viewer or with a tablet – my screen is too small for pdf-tools+org-noter to be a comfortable option), and a “clean” bibliography which I use across documents. There are many files in my Zotero bibliography which are not part of my main bibliography, but for which I would still like to keep notes.

So far that’s not a problem: I can have ivy-bibtex scan both bibliography files, and write notes like that.

But problems can arise if I want to put entries from the Zotero bib into the main bib. Even without changing the citation key manually, there will always be a risk that a C-u C-c C-c later changes it.

Ideally, I’d like to preempt the problem and normalize the citation key when cleaning up the citation, but this will orphan the corresponding ORB notes.

In the worst case, I can of course make a script which moves the file and replaces the citation key in existing files (I assume that it would be enough?).
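A minimal sketch of such a script, in Python. It assumes (these are assumptions, not ORB guarantees) that notes are `.org` files under one directory, that they refer to the entry as `cite:KEY` (for example in `#+ROAM_KEY: cite:KEY`), and that a note file may be named `KEY.org`. The function name `rekey_notes` is made up for this illustration.

```python
import re
from pathlib import Path


def rekey_notes(notes_dir: Path, old_key: str, new_key: str) -> list[Path]:
    """Rename OLD_KEY.org (if present) and rewrite cite:OLD_KEY in all notes.

    Returns the list of files whose contents were changed.
    """
    # Assumption: the note file, if any, is named after the citation key.
    old_note = notes_dir / f"{old_key}.org"
    if old_note.exists():
        new_note = old_note.with_name(f"{new_key}.org")
        if new_note.exists():
            raise FileExistsError(f"{new_note} already exists")
        old_note.rename(new_note)

    # Match cite:OLD_KEY only when it is not the prefix of a longer key.
    # Assumption: keys consist of word characters, ':' and '-'.
    pattern = re.compile(r"cite:" + re.escape(old_key) + r"(?![\w:-])")

    changed = []
    for path in sorted(notes_dir.rglob("*.org")):
        text = path.read_text(encoding="utf-8")
        new_text = pattern.sub(lambda _match: "cite:" + new_key, text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```

Something like `rekey_notes(Path("~/org-roam").expanduser(), "oldkey", "newkey")` would then be run once per changed key, ideally on a directory that is under version control. Whether this is enough also depends on whether any cached database needs rebuilding afterwards.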

But it seems to me that it would be a very common problem, so is there a built-in solution? Like a function to update the citation key of an ORB entry, or a way to make the ORB entries independent of their bibtex key?

Thanks!

I don’t have an answer, but am interested as well.

FWIW, I maintain my bib data in Zotero, and use Better BibTeX and ZotFile to automatically maintain the bib file and PDF file names and locations for use in Emacs.

But I just refactored my keys, which is going to break the notes association.

Hi @TVerron,

I would say you should look into the root of the problem. As with arguably any database’s primary key, the BibTeX key in our case is a unique identifier that is not supposed to change. If you change the key, you effectively get a different entry - from the database point of view. So the answer to your question about a built-in solution is no.

To make ORB entries independent of BibTeX keys means one needs another identifier. There are simply no good candidates for that in the existing BibTeX scheme. Authors, title, book volume? No. DOI? Will work only for documents that have it. One could imagine using some sort of UUID. This can be easily achieved within Emacs, but will your Zotero software support them? Probably not. Chances that they are overwritten on the next BetterBibTeX export? Close to 100%. So no. (Hint: BibTeX keys are supposed to be such an identifier, by design).

In SQL, primary keys are typically generated automatically and are not supposed to be directly modified by users. BibTeX is of course not an SQL database and its keys are prone to accidental or intentional changes. It is still a database and, in my opinion, one should put serious effort into making the key generation robust, adopt a suitable naming scheme for keys early - before the database grows to thousands of entries - and stick to it throughout one’s career.

Keep in mind that this is not about ORB, ORB is just a special case. Generally, you are probably going to create dozens of publications (re-)using the keys from your database. If your keys are going to change, you’ll have big trouble compiling or even reading documents from a couple of years ago. If you don’t use BibTeX for publishing through LaTeX, then I would argue you simply don’t need BibTeX.

As a side note and a personal opinion based on personal experience: if you are committed to BibTeX, you’ll probably have to wait until Zotero becomes BibTeX-compliant (my answer is never) or at least provides a good level of built-in BibTeX compatibility. Or use one of a few available graphical frontends to BibTeX, e.g. JabRef or BibDesk.

Now to the good news. ORB has functionality to automatically generate BibTeX keys, currently rather basic though. It is mainly used internally for ORB PDF Scrapper, but there are plans to expand it into a more general solution, which could potentially be used elsewhere in Emacs, for example in BibTeX mode. There is also a standing feature request to make the ORB autokey functionality more compatible with Zotero, so that the naming scheme used in BetterBibTeX could be seamlessly applied in ORB too. These two steps would make the interaction between these tools more transparent and user-friendly. However, I currently cannot give any time estimates.


Hi, thanks for the answer!

I guess it will be a script then.

As with arguably any database’s primary key, the BibTeX key in our case is a unique identifier that is not supposed to change.

I disagree with that premise: yes, in a perfect world the bibtex key would be a unique identifier which doesn’t change, but a perfect world it isn’t. We get bibtex entries from coauthors, keys get in conflict, we want a shorter or more informative key format… I have an old bibliography which I keep around just for compiling old documents, I have papers with their (small) bibliography in version control, I have papers with the .bbl under version control… such is life. :slight_smile:

A bibtex file is a database indeed, but one that is indexed by plain-text, human-readable keys. As such, I don’t think that it is unreasonable or useless to want to be able to reindex it in a safe and robust way.

If it was SQL, typically there would be a table associating a bibtex key to each primary ID. Changing the citation key would be done by updating that table.
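To make the SQL analogy concrete, here is a small sketch using Python’s built-in `sqlite3` module. The schema (tables `entries`, `cite_keys`, `notes`) is invented for illustration and is not how ORB or bibtex-completion store anything; it only shows that notes attached to a stable ID survive a key change.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database where notes hang off a stable entry ID."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE cite_keys (
            entry_id INTEGER NOT NULL UNIQUE REFERENCES entries(id),
            cite_key TEXT NOT NULL UNIQUE
        );
        CREATE TABLE notes (
            entry_id INTEGER NOT NULL REFERENCES entries(id),
            body TEXT NOT NULL
        );
        """
    )
    return conn


def notes_for(conn: sqlite3.Connection, cite_key: str) -> list[str]:
    """Look up notes through the key table, never through the key itself."""
    rows = conn.execute(
        "SELECT n.body FROM notes AS n "
        "JOIN cite_keys AS k ON k.entry_id = n.entry_id "
        "WHERE k.cite_key = ?",
        (cite_key,),
    ).fetchall()
    return [body for (body,) in rows]


conn = make_db()
entry_id = conn.execute(
    "INSERT INTO entries (title) VALUES (?)", ("Some paper",)
).lastrowid
conn.execute("INSERT INTO cite_keys VALUES (?, ?)", (entry_id, "zoterokey2020"))
conn.execute("INSERT INTO notes VALUES (?, ?)", (entry_id, "Reading notes"))

# Rekeying touches one row; the notes table is not modified at all.
conn.execute(
    "UPDATE cite_keys SET cite_key = ? WHERE cite_key = ?",
    ("author2020", "zoterokey2020"),
)
```

After the update, `notes_for(conn, "author2020")` finds the note, while the old key no longer resolves to anything.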

Short of that, a UUID could be a good option. For instance, something like a field “orb_id” which would either not exist (in the file from zotero), in which case ORB would just use the bibtex key, or be set to the old zotero bibtex key when migrating the entry, would do the job.

But you are saying there is no way to tell ORB to use one of those?

As a side note and a personal opinion based on personal experience: if you are committed to BibTeX, you’ll probably have to wait until Zotero becomes BibTeX-compliant (my answer is never) or at least provides a good level of built-in BibTeX compatibility.

Why? I don’t rely on zotero’s bibtex, other than to have a bibliography which ivy-bibtex can scan to feed items to ORB. When moving entries to the “real” bibliography (the one I use for citations), I use biblio.el’s features to get clean bibtex entries.

Now to the good news. ORB has functionality to automatically generate BibTeX keys, currently rather basic though.

I’m not sure I understand how it will help, sorry. I currently use emacs’s (or biblio.el’s) key generation, and I could make it (almost?) compatible with zotero if I wanted. The problem is that I would rather have the key I want from emacs than the nearest key I can make zotero produce.

And, for the present problem, I want to be able to add notes right after reading a file in zotero, without going through the trouble of inserting something, regardless of its bibtex key, in my bibliography.

Keep in mind that this is not about ORB, ORB is just a special case. Generally, you are probably going to create dozens of publications (re-)using the keys from your database. If your keys are going to change, you’ll have big trouble compiling or even reading documents from a couple of years ago.

Keeping an old bibliography around is enough to keep latex happy, ORB not so much. :wink:

Ok, then you definitely don’t need it if you can figure it out with what is shipped with Emacs. Neither BibTeX mode nor biblio can generate the keys I use, hence ORB autokey. I thought you were also having trouble with the limited capabilities of the built-in BibTeX autokey. By the way, Zotero’s BetterBibTeX extension is quite flexible when it comes to key generation, you can adapt it to match that of Emacs rather than the other way around. That would remove a lot of friction.

That is of course not true. Keep the old bibliography and ORB will be happy :wink:

In fact, you probably know better than I do the hassle of reusing old pieces in a new publication.

Putting the above exchange aside, I also sometimes have to update some of my keys. These are mainly from publications ahead of press, which lack the necessary page numbers and journal volume info, so I create temporary keys until the article gets its full bibliographic data. For me the ORB issue is still only a special case, because I have to update the keys in other places as well. So solving the issue just for ORB notes doesn’t solve it automatically for my lab journal or publications in progress. On the other hand, grep and multi-occur solve it for any plain text document in my Documents directory. Hence I have little personal interest in implementing this in ORB. A much better approach would be a small independent Emacs package that would keep track of BibTeX files and Org/LaTeX documents and help update the keys when necessary. This is something I was thinking about early in the morning before I read your post.

I don’t mind orb_id in principle. This would, however, mean a paradigm shift: ORB would go from being an Org Roam extension that reads a bibliographic entry to retrieve some context, to one that writes into the bibliographic entry and steps onto the path of bibliography managers. This may sound trivial, but what should happen if the user changes orb_id from within an ORB note? Should we update it in the BibTeX file? If so, should we be able to do the same for authors, title and all other fields? If we update the file, which one exactly? Maybe there is more than one having the same entry. Should we then somehow keep track of the BibTeX file the current entry belongs to? And so on.

I finally managed to understand your workflow, which boils down to using temporary BibTeX files solely for the creation of ORB notes. These entries may then get updated after being merged into the master bib and break any existing notes. I can’t tell for sure how widespread such an approach is. I personally create a note only after an entry makes it into my single and only bib file. If it doesn’t, it’s not worth making a note of it. I’d like to have more feedback from other users using your approach, and then perhaps think about providing a viable solution.

By the way, Zotero’s BetterBibTeX extension is quite flexible when it comes to key generation, you can adapt it to match that of Emacs rather than the other way around. That would remove a lot of friction.

Thanks for the info. I am using this package, and I noted that it has a key generator built-in, which I use to get a good approximation of the emacs key, but I did not dive deep enough to see if I can get the exact same key as emacs.

That is of course not true. Keep the old bibliography and ORB will be happy :wink:

… I really did not think of that.

I mean, it’s not the same as for documents, I can’t simply keep the old bibliography around and forget that it exists, but the idea should still work. Basically after promoting an entry I get a duplicate key for that entry, and my problem is how to migrate the notes and links over to the “new” entry. But who says they must? It’s not like I can permanently remove the bibtex entry from the zotero bibtex anyway. The notes could simply stay attached to the old entry, and I would just have to remember to use the old entry for the notes.

Not perfect, but it’s not like the duplicate is going anywhere anyway. Or… I should have a look at whether ivy-bibtex has facilities for de-duplicating.

I don’t mind orb_id in principle. This would, however, mean a paradigm shift from ORB being an Org Roam extension that reads a bibliographic entry to retrieve some context to it writing into the bibliographic entry and stepping onto the path of bibliography managers.

Even without letting ORB write into that field, I think that having it as an option for users to specify their own ID for ORB to use, could be useful.

For me the ORB issue is still only a special case, because I have to update the keys in other places as well. So solving the issue just for ORB notes doesn’t solve it automatically for my lab journal or publications in progress. On the other hand, grep and multi-occur solve it for any plain text document in my Documents directory. Hence I have little personal interest in implementing this in ORB.

Aha, that makes a lot of sense.

I can’t tell for sure how widespread such an approach is.

Me neither. As far as I’m concerned, it only became a viable option thanks to org-roam and ORB. And as the issue with duplicates shows, it is certainly not quite mature yet.

A similar workflow, which does not create duplicate entries but could also use facilities for rekeying, is to copy/paste a “bad” bibtex (from zotero, from a journal’s webpage…) into the bibliography just to create a note, and later clean up that entry. It’s actually what I had been doing for a while, before trying to leverage zotero’s permanent bibtex to remove even that step.

Thanks for your answers, and thanks a lot for your hard work! I’m really only talking about minor annoyances here, it doesn’t change the fact that org-roam and ORB are life-savers. :smiley:

So if the editing of the BibTeX file will be done by the user, then such an option can be readily incorporated into ORB. In other words, if an ORB note is created from an entry that already contains an orb_id field, ORB will be able to use it as the primary identifier (#+ROAM_KEY) instead of the BibTeX key and fall back to the latter in case the former is missing. Perhaps creation of this field can be automated from Zotero or in Emacs BibTeX mode?
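The fallback rule described here is small enough to sketch. The following Python snippet is only an illustration of the logic, not ORB code: the function name `note_identifier` is hypothetical, and an entry is assumed to be available as a key plus a dictionary of lower-case field names.

```python
def note_identifier(bibtex_key: str, fields: dict[str, str]) -> str:
    """Return the identifier a note should be attached to.

    Prefer a non-empty orb_id field; otherwise fall back to the BibTeX key.
    """
    orb_id = fields.get("orb_id", "").strip()
    return orb_id or bibtex_key


# An entry promoted to the main bibliography and rekeyed, with the old
# Zotero key pinned in orb_id (hypothetical values).
promoted = note_identifier("author2020", {"title": "Some paper", "orb_id": "zoterokey2020"})

# An entry that has no orb_id keeps using its own key.
plain = note_identifier("other2019", {"title": "Another paper"})
```

With this rule, the first entry still resolves to `zoterokey2020`, so a note created before the key change stays attached.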

Since ORB relies on other packages, mainly bibtex-completion, to get the bibliographic data and these packages in turn may rely on BibTeX keys as primary identifiers, some work may be required to make them behave properly. But I think it won’t be hard. So if you are up to such a solution that will keep the one-way read-only ORB status quo, then file a feature request on GitHub. Unfortunately, I can’t promise it will be done quickly.

Although in principle ORB will work with any files in bibtex-completion-bibliography, this will lead to a mess very soon I believe. Relying on a master bib file as a sole source of truth is a better approach.

I’ve been using Zotero for quite a long period of time, maybe five years or more. But never managed to make it play nicely with Emacs. There were several packages offering integration and even interaction of Zotero and Emacs, but the workflow still felt clumsy, especially the backward import into Zotero. One of the main reasons I left it, though, was that it didn’t map to BibTeX 1:1, so I could not benefit from biblatex’s extended entry types (like review, mvcollection, etc.), could not create my own custom fields and so on. For example, I now have 4 different orthogonal tag fields: vanilla keywords and custom projects, topics and tags. This is something hardly achievable in Zotero.

Thank you for your kind words and an interesting discussion! I don’t think these are minor annoyances. They are part of the BibTeX experience and the Zotero/BibTeX ecosystem whether one likes them or not and may not be ignored. So when they are addressed, Org Roam and ORB will become better. :slight_smile:

So if the editing of the BibTeX file will be done by the user, then such an option can be readily incorporated into ORB. In other words, if an ORB note is created from an entry that already contains an orb_id field, ORB will be able to use it as the primary identifier (#+ROAM_KEY) instead of the BibTeX key and fall back to the latter in case the former is missing. Perhaps creation of this field can be automated from Zotero or in Emacs BibTeX mode?

Maybe. I can think of primarily two use-cases: the first one is rekeying, as mentioned. In that case, one might want to add to bibtex-clean-entry-hook a function which adds an orb_id field if necessary (if the key was changed and the entry does not already have an orb_id field, and possibly also only if the entry has an orb note(?)). That sounds easy enough.

The second one is “pinning” keys for entries for which the automatic key generation is not appropriate, and for users who care about which key ORB uses. That would be similar in some sense to the cite_key (or label, I don’t remember) which some citation engines can use. For that case, the user is the only one who can populate the field, and maybe they’d want to do it from zotero, I don’t know. I personally find it easier to edit bibtex directly in emacs than in bibliography management software. :slight_smile:
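For the first (rekeying) use-case, the decision such a hook function would have to make can be sketched as follows. This is Python rather than Emacs Lisp and only models the condition; `pin_orb_id` and its `has_note` argument are hypothetical names, and how the old key and the existence of a note would be obtained inside Emacs is left open.

```python
def pin_orb_id(old_key: str, new_key: str, fields: dict[str, str],
               has_note: bool = True) -> dict[str, str]:
    """Return a copy of FIELDS with orb_id set to OLD_KEY when needed.

    The field is added only if the key actually changed, the entry has no
    orb_id yet, and (optionally) a note exists for the entry.
    """
    updated = dict(fields)
    if new_key != old_key and "orb_id" not in updated and has_note:
        updated["orb_id"] = old_key
    return updated


rekeyed = pin_orb_id("zoterokey2020", "author2020", {"title": "Some paper"})
untouched = pin_orb_id("author2020", "author2020", {"title": "Some paper"})
already_pinned = pin_orb_id("tmpkey", "author2020", {"orb_id": "zoterokey2020"})
```

An existing orb_id is never overwritten, so an entry that is rekeyed several times keeps pointing at the key its note was first created under.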

Since ORB relies on other packages, mainly bibtex-completion, to get the bibliographic data and these packages in turn may rely on BibTeX keys as primary identifiers, some work may be required to make them behave properly. But I think it won’t be hard.

Actually I was also thinking of that yesterday, with the duplication thing. An easy way to de-duplicate is to have bibtex-completion make use of such an id. But then ORB could just as well rely on what key bibtex-completion feeds it, and not have to implement that logic at all.
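As a rough illustration of that de-duplication idea (in Python, and not reflecting how bibtex-completion actually works), entries from several files could be collapsed on a shared identifier. The function name `deduplicate` is hypothetical, and the entries are assumed to be listed in priority order, main bibliography first.

```python
def deduplicate(entries: list[tuple[str, dict[str, str]]]) -> list[tuple[str, dict[str, str]]]:
    """Keep the first entry seen for each identifier.

    The identifier is the orb_id field if present, else the BibTeX key.
    ENTRIES must be in priority order (e.g. main bibliography first).
    """
    seen: dict[str, tuple[str, dict[str, str]]] = {}
    for key, fields in entries:
        ident = fields.get("orb_id") or key
        # setdefault keeps the earlier (higher-priority) entry.
        seen.setdefault(ident, (key, fields))
    return list(seen.values())


main_bib = [("author2020", {"title": "Some paper", "orb_id": "zoterokey2020"})]
zotero_bib = [
    ("zoterokey2020", {"title": "Some paper"}),
    ("unread2021", {"title": "Not yet promoted"}),
]
merged = deduplicate(main_bib + zotero_bib)
```

Here the Zotero copy of the promoted entry is dropped in favour of the cleaned one, while the entry that only exists in the Zotero file is kept.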

Are there situations where ORB fetches the bibliographic data directly, without going through bibtex-completion?

So if you are up to such a solution that will keep the one-way read-only ORB status quo, then file a feature request on GitHub. Unfortunately, I can’t promise it will be done quickly.

Sure, no problem.

I’ve been using Zotero for quite a long period of time, maybe five years or more. But never managed to make it play nicely with Emacs.

Yes, that matches my experience. I have used zotero for a few years, then mendeley, then paperpile for a while, and now I’m back to zotero. None of them integrate nicely with emacs.

My solution is to treat the two problems as separate: I have zotero/paperpile/mendeley for managing the pdfs, and emacs to manage the bibtex.

In the old pre-roam days, keeping the notes in paperpile/mendeley/zotero was fine. I tried maintaining my notes in an org file with org-ref for a while, but it didn’t stick.

But now that we have org-roam and ORB, they really offer a superior experience for note taking, and I need to revisit where the separation between the bibliography software and emacs is. First world problems… :smiley:


Is this not partially or largely achievable with Better BibTeX?

I learned in the last few days, for example, it has a “post-script” option where you can customize the biblatex output. I use it, for example, to split title fields, and rename “keywords” to “tags.”