Merge two Roam Databases?

togakangaroo · March 13, 2021, 2:56pm

I’ve got a few different org-roam databases:

1. My main one (stored in Dropbox)
2. A work one (stored in Google Cloud; the work computer is locked down, so no Dropbox)
3. One I used for a demo that got away from me and that I’d now like to merge with my main one (in a temp directory)

I would like to merge the third into the first one time, and the second into the first at regular intervals.

Are there established workflows for this?

I’m thinking I can simply rsync the directories, but then I would have to write a script to detect duplicates (probably by running Levenshtein distance on the text part of the filenames?) and merge them manually. That doesn’t sound like the best process, but it is the only thought I have.
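
A minimal sketch of that duplicate check in Emacs Lisp, assuming file names follow the `TIMESTAMP-slug.org` pattern seen later in this thread and that Emacs 27 or newer is available for the built-in `string-distance` (Levenshtein distance). The function names `my/roam-slug` and `my/roam-similar-files`, and the default threshold of 2, are made up for illustration.

```elisp
(defun my/roam-slug (file)
  "Return FILE's base name with a leading timestamp like 20200529235210- removed."
  (replace-regexp-in-string "\\`[0-9]+-" "" (file-name-base file)))

(defun my/roam-similar-files (source-files target-files &optional max-distance)
  "Return (SOURCE . TARGET) pairs whose slugs are within MAX-DISTANCE edits.
MAX-DISTANCE defaults to 2 (an arbitrary threshold for this sketch)."
  (let ((limit (or max-distance 2))
        pairs)
    (dolist (src source-files)
      (dolist (tgt target-files)
        ;; `string-distance' is the Levenshtein distance between two strings.
        (when (<= (string-distance (my/roam-slug src) (my/roam-slug tgt)) limit)
          (push (cons src tgt) pairs))))
    (nreverse pairs)))
```

This only reports candidate pairs; deciding whether two notes really are the same topic, and merging their content, is still a manual step.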

Have people figured out other options?

nobiot · March 13, 2021, 5:09pm

If you just need to use different cloud sync services for work and private, perhaps you could use symlinks.

I have two sub-folders (private and work) under one org-roam-directory, with one database. Both sub-folders are symlinks pointing to folders synced by different cloud providers. Sub-folders are set to be org-roam tags, so it’s easy to filter one or the other.
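
A rough sketch of setting up such a layout from Emacs, using only the built-in `make-symbolic-link`. The helper name `my/roam-link-subfolder` and the paths in the commented usage are hypothetical; substitute wherever your sync clients actually keep their folders.

```elisp
(defun my/roam-link-subfolder (synced-dir roam-dir name)
  "Make ROAM-DIR/NAME a symlink pointing at SYNCED-DIR and return the link path.
ROAM-DIR is created if it does not exist yet."
  (let ((link (expand-file-name name roam-dir)))
    (make-directory roam-dir t)
    ;; The final t replaces an existing link of the same name.
    (make-symbolic-link (expand-file-name synced-dir) link t)
    link))

;; Hypothetical usage; these paths are placeholders, not real defaults:
;; (my/roam-link-subfolder "~/Dropbox/roam-private" "~/org/roam" "private")
;; (my/roam-link-subfolder "~/GoogleDrive/roam-work" "~/org/roam" "work")
```

In org-roam v1 I believe the setting that turns sub-folder names into tags is `org-roam-tag-sources` (for example, including `all-directories`), but treat that as an assumption and check the manual for your version.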

Or perhaps you have other requirements that symlinks won’t fulfill, or you want to have two databases?

togakangaroo · March 13, 2021, 5:22pm

Symlinks are a good solution for part of the problem, but I am specifically looking for a way to merge two existing databases. I’m picturing something akin to resolving merge conflicts with git, which would work reasonably well; it just needs to account for the fact that the file names aren’t actually the same because they include a timestamp. (Now that I think about it, I guess rsync wouldn’t work for the same reason.)

Btw, I haven’t used the tags feature yet; is there a place where this is documented? All I can find is this discussion, but the links from there are 404.

kimmy · March 13, 2021, 6:09pm

I do something like this! I have a strict subset of notes that I want to explicitly mark as sharable, and I make sure those sharable notes are synchronized occasionally.

To do this, I have a blank note in my org-roam called “Public Notes.” When I want one of my ordinary journal pages to be synchronized, I add a link to this special “Public Notes” note that flags it for sharing.

I then occasionally run a script that synchronizes all of “Public Notes”’ backlinks to a git repository.
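
This is not the actual script, just a sketch of the idea under some assumptions: notes use v1-style `[[file:...]]` links, all notes live in one flat directory, and backlinks are found by scanning file text rather than by querying the org-roam database. The function names are made up for illustration, and committing the export directory to git is left out.

```elisp
(defun my/roam-files-linking-to (directory marker-file)
  "Return .org files in DIRECTORY that contain a file link to MARKER-FILE."
  (let ((needle (concat "[[file:" (file-name-nondirectory marker-file) "]"))
        matches)
    (dolist (file (directory-files directory t "\\.org\\'"))
      (with-temp-buffer
        (insert-file-contents file)
        (goto-char (point-min))
        ;; A plain substring search is enough to spot the link target.
        (when (search-forward needle nil t)
          (push file matches))))
    (nreverse matches)))

(defun my/roam-export-shared (directory marker-file export-dir)
  "Copy every note in DIRECTORY that links to MARKER-FILE into EXPORT-DIR.
Return the list of files that were copied."
  (make-directory export-dir t)
  (let ((files (my/roam-files-linking-to directory marker-file)))
    (dolist (file files)
      ;; Trailing slash means \"copy into this directory\"; t overwrites.
      (copy-file file (file-name-as-directory export-dir) t))
    files))
```

Running `my/roam-export-shared` with the file name of the “Public Notes” note copies only the flagged notes, so anything not linked to the marker never leaves the main directory.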

nobiot · March 13, 2021, 8:40pm

Look at the org-roam manual, online or in-system. Look for tag or tags.

I don’t fully understand why you need to merge databases. I’d just delete the two db files and let org-roam create a new one, which is built anew from the merged set of notes… I am sure there is a valid reason why a merge approach is desirable rather than delete-and-create (to me, the db files are only ephemeral; they are a cache).

togakangaroo · March 13, 2021, 9:53pm

So it’s perfectly possible that I’m missing something, but yeah, let me explain a bit more.

The specific workflow I want to focus on is doing a one-way merge from the database I’m building up at my new job to my personal one. During the work day I mostly take notes on my work computer. As I’ve been onboarding, it has quickly ballooned to >100 heavily interlinked entries as I’m really using it to capture knowledge constantly coming at me in meetings.

Much of it I do want in my personal roam notes, but:

- I can’t have Dropbox installed on that machine (not approved for our environment)
- I don’t actually want any of my personal non-work-related notes on my work machine, as they could technically claim ownership of these (call this an overabundance of caution, not any sort of problem with my employer - they’re cool)
- I already have a sizable database which I cannot recreate easily

So while I don’t need notes on - for example - who our HR director is in my personal roam; I do want notes about more general things I’m learning about (such as Google Cloud, Hi-Trust Compliance, etc) there. Additionally, deciding which database I need to put a note into at the time of creating a note and switching between computers is simply too much of a workflow interruption when I’m already working as fast as I can to keep up at meetings.

So what I would like is to do a weekly one-way merge from my work notes to my personal notes, probably with an “ignore” list for notes that I don’t want to bring over because they contain info that should remain only in the work database.

Moving the files over is easy; the hard part is when I have the same note in both databases. For example, I have a Google Cloud Platform note in both, with some knowledge in my personal one and a lot more in my work one, as I’ve been getting familiar with it on company time. The content of these should really be merged.

It’s even tougher when you consider outgoing links. For example, in my personal roam, outgoing links to the Python note look like this: [[file:20200529235210-python.org][Python]]. But of course in my work one the link would be different. The note is called Python in both, but the two were created at different times, so the timestamp portion of the filename is not the same! This needs to be resolved for the note graph to remain intact. This is also why a simple script like the one @kimmy posted (thanks, by the way!) is not going to work for me.
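
To make the link problem concrete, here is a small sketch of the rewrite step, assuming v1-style `[[file:NAME][Description]]` links and a mapping from work file names to personal file names that has already been worked out somehow. The function name `my/roam-retarget-links` and the work-side file name in the example are hypothetical.

```elisp
(defun my/roam-retarget-links (text file-map)
  "Return TEXT with [[file:OLD][...]] link targets renamed according to FILE-MAP.
FILE-MAP is an alist of (OLD-FILE-NAME . NEW-FILE-NAME).  Links whose
target is not in FILE-MAP are left unchanged."
  (replace-regexp-in-string
   "\\[\\[file:\\([^]]+\\)\\]"
   (lambda (match)
     ;; MATCH is the matched text, e.g. \"[[file:20210301101010-python.org]\".
     (let* ((old (match-string 1 match))
            (new (cdr (assoc old file-map))))
       (if new (concat "[[file:" new "]") match)))
   text t t))

;; Example with a made-up work-side file name:
;; (my/roam-retarget-links
;;  "See [[file:20210301101010-python.org][Python]]."
;;  '(("20210301101010-python.org" . "20200529235210-python.org")))
```

Only the link target changes; the description part (`[Python]]`) is left as it was.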

So I’m thinking this will have to be written in elisp (well it doesn’t have to, but I have roam utility functions already available there). I’m thinking something like the following:

* The Plan
** Open a logging window
** Iterate through all the files in the source roam instance
** Pass 1: Move content
*** For each file
**** Get its title
**** and any aliases
**** If none of those is in the ignore list
***** Get all notes this note links to
****** the full link text
****** the name of the link target
***** See if target roam...
****** ...has a matching note for title or alias
****** If not found
******* Create a new note in the target
******* insert title and aliases
******* log note creation
****** Get note in target
       - This should now exist, as it was either located or created in the previous step
****** Insert source note content
****** Log note content was transferred
**** Return files in the target and all links that were in the content that was moved
** Pass 2: Recalculate links
*** For each note modified/created in the target in Pass 1
**** For each of its (now) outdated links
***** Look up the proper link for that name
***** Replace the link
**** Log number of links replaced
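
The lookup half of Pass 1 might be sketched as a dry run like the one below. It moves nothing; it only reports, per source note, whether the target already has a note with the same title. It assumes every note carries a `#+title:` keyword, matches titles case-insensitively, and ignores aliases. All function names are made up for illustration and none of them are org-roam functions.

```elisp
(require 'subr-x)

(defun my/roam-file-title (file)
  "Return the value of the first #+title: keyword in FILE, or nil."
  (with-temp-buffer
    (insert-file-contents file)
    (goto-char (point-min))
    (let ((case-fold-search t))
      (when (re-search-forward "^#\\+title:[ \t]*\\(.*\\)$" nil t)
        (string-trim (match-string 1))))))

(defun my/roam-title-index (directory)
  "Return a hash table mapping downcased note titles to files in DIRECTORY."
  (let ((index (make-hash-table :test #'equal)))
    (dolist (file (directory-files directory t "\\.org\\'"))
      (let ((title (my/roam-file-title file)))
        (when title
          (puthash (downcase title) file index))))
    index))

(defun my/roam-plan-merge (source-dir target-dir &optional ignore-titles)
  "Return a list of (ACTION SOURCE-FILE TARGET-FILE) for a one-way merge.
ACTION is `merge' when TARGET-DIR already has a note with the same
title, and `create' otherwise, in which case TARGET-FILE is nil.
Notes whose title is in IGNORE-TITLES are skipped."
  (let ((target-index (my/roam-title-index target-dir))
        (ignored (mapcar #'downcase ignore-titles))
        plan)
    (dolist (file (directory-files source-dir t "\\.org\\'"))
      (let* ((title (my/roam-file-title file))
             (key (and title (downcase title))))
        (when (and key (not (member key ignored)))
          (let ((match (gethash key target-index)))
            (push (list (if match 'merge 'create) file match) plan)))))
    (nreverse plan)))
```

Printing the returned plan before touching any files gives roughly the log the outline asks for, and the `merge` entries are exactly the notes whose content and links need manual attention.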

…of which the only bits I actually know how to do are calling org-roam-find-file and getting aliases, e.g.

     (save-window-excursion
       (save-excursion
         (find-file "/Users/gmauer/Dropbox (Personal)/org/roam/20200704164319-emacs_lisp.org")
         (org-roam--extract-titles-alias)))
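
An alternative sketch that avoids visiting the file in a window and does not depend on org-roam internals. It assumes v1-style keywords, i.e. a `#+title:` line and optional `#+roam_alias:` lines whose values may be double-quoted; `my/roam-titles-and-aliases` is a made-up name.

```elisp
(require 'subr-x)

(defun my/roam-titles-and-aliases (file)
  "Return the title and aliases of FILE as a list of strings.
Reads the #+title: and #+roam_alias: keywords from a temporary
buffer, so FILE is never displayed."
  (with-temp-buffer
    (insert-file-contents file)
    (let ((case-fold-search t)
          names)
      (goto-char (point-min))
      (when (re-search-forward "^#\\+title:[ \t]*\\(.+\\)$" nil t)
        (push (string-trim (match-string 1)) names))
      (goto-char (point-min))
      (while (re-search-forward "^#\\+roam_alias:[ \t]*\\(.+\\)$" nil t)
        ;; Aliases can be bare words or \"double quoted\" phrases.
        (dolist (alias (split-string-and-unquote (match-string 1)))
          (push alias names)))
      (nreverse names))))
```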

The rest I have no idea.

So

Question 1:
Does the above reasoning about why it’s not straightforward make sense? Have I explained the issue well? Am I missing anything?

Question 2:
Does the above plan for how to script this make sense?

Question 3:
Anyone want to help me figure out how to do any of that stuff?

masukomi · January 28, 2022, 3:10pm

Replying because no one responded and this seems like a useful question.

To me it seems that there’s a pretty straightforward answer to @togakangaroo’s problem:

Check the files into git and sync the repos, then rebuild/resync the org-roam db on the home computer. Git already has conflict resolution mechanics built in. I presume you wouldn’t have direct access to the work computer from home, but work could push to the cloud and home could pull from it.

nobiot · January 28, 2022, 4:03pm

@togakangaroo
Did you just edit the previous post to bump it so that people can see it?

If you are still seeking people’s comments on your issue, I would suggest creating a new comment here to outline the current situation (are you using V2?) or opening a new post.
