Is there a mechanism to easily identify broken links in V2?

lyndhurst · June 3, 2024, 1:09pm

Hi,

I often run into the same workflow interruption that another user described here.

Sometimes I am writing some text and I know I want a link to a note or section that does not exist yet. I might not want to interrupt my writing to think about what this new note or section should be called, or where it should go. Inserting a broken or empty link at that spot seems like a good way to mark it, so I can easily come back later and either fix the broken link or create the new note or section.

All that is missing is a simple mechanism to retrieve broken links, one that could be used in a routine notes review, for example through a dynamic block, the org-roam-buffer, or some kind of consult query.

The user I was referring to at the beginning actually offered their own solution, hosted on GitHub, to achieve exactly that, but unfortunately it is a library written for org-roam v1 that, as far as I can see, has not seen any activity in four years. That post also references an org-roam-find-broken-references command, which I cannot find anywhere either (neither among the commands nor in the manual).

Looking around, I found org-lint, which seemed promising, but it did not list my broken roam or id links when I tried it. Finally, I found this GitHub issue in the org-roam repository about an org-roam-doctor set of variables and functions that looked promising too, but it dates back to 2020, and I am afraid it was for org-roam v1 as well. M-x roam doctor does not return any result on my end…

So before I go ahead and try (and certainly fail) to reinvent the wheel, I wanted to check whether more seasoned users here knew of a solution to this problem.

Thanks for your time.

twitchy-ears · June 3, 2024, 1:41pm

Yeah, I've been meaning to update that library. I'll try to get on it in the next day or so, unless someone else comes up with a better idea.

lyndhurst · June 3, 2024, 2:35pm

Thank you, that would be very helpful.

Out of mere curiosity, are you still using org-roam v1, or did you find another way to achieve what you were describing in your original post?

I am always looking for new ideas or inspiration to improve my workflow and practices; maybe there is a better alternative to the “broken link” strategy.

twitchy-ears · June 3, 2024, 2:37pm

I switched to v2 and just kind of put up with it, because I never found the enthusiasm to port it, but I was actually thinking about it this weekend, so this is an extra burst of enthusiasm to learn more about the v2 functions and port my little libraries over.

twitchy-ears · June 3, 2024, 2:39pm

(That isn't to say there isn't an existing built-in way; if I find one, I'll let you know.)

lyndhurst · June 3, 2024, 2:42pm

I am really glad I posted today then!

lyndhurst · June 3, 2024, 6:38pm

Thanks a lot for the code. Sorry, I am AFK and only just got the notification on my phone.

To answer your first question: yes, I am using roam: links so far, but while looking for a solution to this problem I found a lot of issues about exporting files that contain this type of link. I have not had time to really look into it yet, but that might change.

Anyway, I am only just starting to be able to decipher Elisp, but from what I can see, you understood my problem correctly, and the link type could even be changed easily if that ever became necessary.

It is evening here, so I will not be able to test tonight, but I will report back first thing tomorrow morning.

lyndhurst · June 4, 2024, 9:21am

It took me a little while to understand the functions; I am not yet familiar with the org-roam node data structure, so I had to read up on it in the manual.

Both functions work very well for me. I was actually expecting them to list broken links in the current buffer, so at first I did not understand your concern about performance. I still cannot comment on that from my testing, since I do not have many roam links yet.

From my point of view, however, it is an added benefit that the function looks for broken links across the whole database. I was already thinking about how I could extend a buffer-local search to the whole database, to integrate it as a tool in my notes maintenance reviews.

Of course, that remains a benefit only as long as it does not get too slow once I reach hundreds or thousands of links. In that case two separate functions might be preferable: a fast one for the day-to-day workflow and a slower one for less frequent maintenance reviews. Hopefully by then I will be able to modify your function with more confidence. I think it should be as easy as modifying the FROM clause in the initial query.
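A minimal sketch of the two variants, assuming a trimmed-down, hypothetical copy of the org-roam v2 `nodes` and `links` tables (only the columns used here) and Python's built-in `sqlite3` module as a stand-in for Emacs. The names `BROKEN` and `broken_links` are made up for illustration. In this sketch, restricting the search to one file ends up as an extra WHERE condition on the source node's file rather than a change to the FROM clause:

```python
import sqlite3

# Hypothetical, trimmed-down stand-in for the org-roam v2 database: only the
# columns used below. Values keep Lisp double quotes (e.g. '"roam"'), which is
# assumed to match how org-roam stores strings.
db = sqlite3.connect(":memory:")
db.executescript("""
    create table nodes (id text primary key, file text, title text);
    create table links (pos integer, source text, dest text, type text);
""")
db.executemany("insert into nodes values (?, ?, ?)", [
    ('"n1"', '"/notes/a.org"', '"Note A"'),
    ('"n2"', '"/notes/b.org"', '"Note B"'),
])
db.executemany("insert into links values (?, ?, ?, ?)", [
    (10, '"n1"', '"Some future concept"', '"roam"'),  # broken
    (20, '"n1"', '"n2"', '"id"'),                     # resolves to Note B
    (30, '"n2"', '"Another idea"', '"roam"'),         # broken
])

# Broken links = roam links whose destination is not an existing node id.
BROKEN = """
    select nodes.file, links.pos, links.dest
      from links join nodes on nodes.id = links.source
     where links.type = '"roam"'
       and links.dest not in (select id from nodes)
"""


def broken_links(db, file=None):
    """All broken links, or only those whose source node lives in FILE."""
    if file is None:
        return db.execute(BROKEN + " order by nodes.file, links.pos").fetchall()
    return db.execute(BROKEN + " and nodes.file = ? order by links.pos",
                      (file,)).fetchall()


print(broken_links(db))                    # maintenance review: whole database
print(broken_links(db, '"/notes/a.org"'))  # day-to-day: current file only
```

Calling it without a file gives the whole-database review; passing the current file's (quoted) path gives the fast, day-to-day variant.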

Finally, even though I cannot compare their performance yet, I prefer your second version, because it displays the outline when the link is located in a section of the note.

The only unexpected thing I noticed is that when the sections containing the broken link are org-roam nodes themselves (meaning they have an ID in their property drawer), the outline is inverted, child > parent instead of parent > child. In addition, from what I could observe, the displayed outline is limited to two levels of depth, so in this particular edge case the note's title is not always displayed among the choices.
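The two symptoms suggest the ancestors are collected child-first and capped at two levels. The following language-neutral sketch, in Python with hypothetical names (`enclosing_headings`, `outline_label`) and a simplified list of `(level, title)` headings standing in for an Org buffer, keeps every ancestor and reverses them before joining. In Emacs Lisp, Org's own `org-get-outline-path` already returns the path from the top-level heading down, which may be the simpler fix.

```python
def enclosing_headings(headings, index):
    """Return the headings enclosing a link, ordered parent > child.

    HEADINGS is a list of (level, title) pairs in document order; the link
    sits in the body of headings[index]. Every ancestor is kept, at any depth.
    """
    chain = []
    level_seen = None
    # Walk backwards from the link: each heading with a smaller level than the
    # last one kept is an ancestor. This collects them child-first.
    for level, title in reversed(headings[:index + 1]):
        if level_seen is None or level < level_seen:
            chain.append(title)
            level_seen = level
    chain.reverse()  # child-first -> parent-first, avoiding the inversion
    return chain


def outline_label(note_title, headings, index):
    """Completion label such as 'Note > Parent > Child'."""
    return " > ".join([note_title, *enclosing_headings(headings, index)])


doc = [(1, "Intro"), (2, "Background"), (1, "Methods"), (2, "Data"), (3, "Cleaning")]
print(outline_label("My note", doc, 4))  # → My note > Methods > Data > Cleaning
```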

Thanks a lot for the code, it solves my initial problem really well.

akashp · June 4, 2024, 11:05am

I will see what I can do about the outline showing incorrectly in those edge cases; it should be easy to fix.

I want to ask: do you want to test another implementation I made?

I thought it would also be useful to find and repair all broken incoming links once you create a node for them.

P.S. Don't worry about performance with the second function: it would parse thousands of links in milliseconds.

(defun org-roam-repair-broken-links ()
  "Repair broken incoming roam: links that reference the current note.
Find links whose destination contains this note's title, then visit
and re-save each source note so Org-roam can replace them on save."
  (interactive)
  ;; First determine whether any such broken references exist.
  (when-let* ((title (org-get-title))
              (query "select links.dest, links.source, links.pos
                      from links where links.dest like $s1")
              (links (org-roam-db-query query (concat "%" title "%"))))
    ;; For each one, go to the source buffer and let Org-roam's
    ;; [[roam:*]] replacement take over; it does this automatically
    ;; on save.
    (save-excursion
      (mapc (lambda (link)
              (let ((id (nth 1 link)))  ; links.source: the source node's ID
                (+org-roam-id-goto id)
                (set-buffer-modified-p t)
                (save-buffer)))
            links))))

(defun +org-roam-id-goto (id)
  "Switch to the buffer containing the entry with id ID.
Move the cursor to that entry in that buffer.
Like `org-id-goto', but additionally uses the Org-roam database"
  (interactive "sID: ")
  (let ((m (org-roam-id-find id 'marker)))
    (unless m
      (error "Cannot find entry with ID \"%s\"" id))
    (pop-to-buffer-same-window (marker-buffer m))
    (goto-char m)
    (move-marker m nil)
    (org-fold-show-context)))


I had to define a helper function, an Org-roam equivalent of org-id-goto, for cleaner execution.

TODO: I need to fix the buffer switching here. Ideally it should leave you where you started once it has finished repairing all broken incoming links; currently it leaves you in the last buffer where it found a broken incoming link.
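One side effect of the loop above is that a note containing several matching links is visited and saved once per link. A small sketch of the lookup step with `DISTINCT` added, in Python against a hypothetical minimal `links` table (values stored with Lisp double quotes, as in org-roam's database), so each source note comes back only once; `sources_to_repair` is an illustrative name, not an org-roam function:

```python
import sqlite3

# Hypothetical minimal `links` table; strings keep their Lisp double quotes.
db = sqlite3.connect(":memory:")
db.execute("create table links (pos integer, source text, dest text, type text)")
db.executemany("insert into links values (?, ?, ?, ?)", [
    (10, '"n1"', '"Future Concept"', '"roam"'),
    (55, '"n1"', '"Future Concept"', '"roam"'),  # same note, second link
    (30, '"n2"', '"future concept"', '"roam"'),  # LIKE ignores ASCII case
    (40, '"n3"', '"Unrelated"', '"roam"'),
])


def sources_to_repair(db, title):
    """IDs of notes whose roam: links mention TITLE, each listed once."""
    rows = db.execute(
        """select distinct source from links
            where type = '"roam"' and dest like ?
            order by source""",
        (f"%{title}%",),
    ).fetchall()
    return [source for (source,) in rows]


print(sources_to_repair(db, "Future Concept"))  # → ['"n1"', '"n2"']
```

Like the original `%title%` pattern, this is a substring match: SQLite's LIKE is case-insensitive for ASCII letters, a short title such as "Org" would also match a broken link to "Org-roam", and a `%` or `_` in the title acts as a wildcard.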

lyndhurst · June 4, 2024, 12:02pm

If I understand correctly, it would actually repair the links without any manual intervention. That sounds pretty cool.

In the use case I initially described, the link description is usually a few words from the paragraph I am writing. It does not contain the future node's title, because I usually have no clear idea what the future node will be called, or I do not want to interrupt my writing to think about it. I just know that I am addressing a concept that should be developed in another, linked node, like a Wikipedia link if you will.

That means I will have to set up a few test nodes to try out this repair function, so I'll get back to you on that a bit later today.

That's impressive!

For that part, I am assuming you are talking about the org-roam-find-broken-links function; I hope I got that right…

In my use case, the full outline path, including the node title, helps clarify the choices offered when nodes are nested. From a performance perspective, I do not understand the query's INNER JOIN and FROM clauses well enough to give you a definitive answer.

I see two different scenarios: performance is affected either by the total number of roam links in the database or by the number of broken links in it.

I am not worried about the second scenario, because with your function in my toolbox I will never have more than a couple of dozen broken links in my database.

In both scenarios, if thousands of links can be processed in milliseconds, as I quoted you above, then this is no problem at all, and I would rather sacrifice some performance for better usability, since this is not a function designed to access information quickly but a tool to use when I take the time to groom and review my notes.
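One way to check both scenarios on your own machine is to time a broken-link query against synthetic data. The sketch below is an assumption-laden stand-in: Python's `sqlite3`, a hypothetical minimal schema with made-up node ids, and a query in the spirit of the ones in this thread. It varies the total number of links and the number of broken ones independently, and it prints timings rather than asserting any, since they depend on the machine and the sketch does not reproduce org-roam's real schema or indexes.

```python
import sqlite3
import time


def build_db(n_links, n_broken):
    """Synthetic org-roam-like DB with N_LINKS links, N_BROKEN of them broken."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        create table nodes (id text primary key, file text, title text);
        create table links (pos integer, source text, dest text, type text);
    """)
    ids = [f'"node-{i}"' for i in range(max(1, n_links // 10))]
    db.executemany("insert into nodes values (?, ?, ?)",
                   [(i, '"/notes/x.org"', i) for i in ids])
    rows = []
    for k in range(n_links):
        source = ids[k % len(ids)]
        if k < n_broken:  # roam link to a title that has no node
            rows.append((k, source, f'"missing {k}"', '"roam"'))
        else:             # ordinary id link to an existing node
            rows.append((k, source, ids[(k + 1) % len(ids)], '"id"'))
    db.executemany("insert into links values (?, ?, ?, ?)", rows)
    return db


def time_broken_query(db):
    """Run the broken-link query once; return (row count, seconds)."""
    start = time.perf_counter()
    rows = db.execute("""
        select links.pos, links.dest
          from links join nodes on nodes.id = links.source
         where links.type = '"roam"'
           and links.dest not in (select id from nodes)
    """).fetchall()
    return len(rows), time.perf_counter() - start


for n_links, n_broken in [(1_000, 20), (50_000, 20), (50_000, 2_000)]:
    count, seconds = time_broken_query(build_db(n_links, n_broken))
    print(f"{n_links:>7} links, {count:>5} broken: {seconds * 1000:.2f} ms")
```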

Anyway, I really appreciate the effort and thoroughness you put into the thought process behind your code design. Thanks again.

lyndhurst · June 4, 2024, 1:58pm

Thanks for the offer. My SQL-fu is definitely rusty, but first I need to play around with org-roam-db queries to really understand the structure; reading the manual helps, but I think I need a more hands-on approach.

I will definitely keep the repair mechanism. I am only just starting to really use org-roam in production (the past weeks were mostly configuration), so my workflow will definitely mature, and the underlying idea is too appealing to discard.

I can confirm that the first ‘broken link finder’ works perfectly now.

I had actually noticed a change in behavior: now I get one candidate for each broken link, even when several are in the same section, whereas before each section was listed only once even if it contained several broken links.

It is definitely more intuitive now, and I had not noticed the ‘duplicate broken link oversight’ yet.

Thanks again for taking the time to get it right with so much detail !

dmg · June 4, 2024, 5:53pm

This might be useful. This is an SQLite query that finds all the dangling links of type roam in an org-roam database. (I updated the query: I had thought roam links could use a node's title as their destination; that part is removed now.)

#+begin_src sqlite   :exports both
with 
  dangling as (
     select dest
      from links
      where type = '"roam"'
     except
      select id from nodes)

select file, title, dest, links.pos, "type", links.properties
   from links
      natural join dangling
      join nodes on (nodes.id = source);
#+end_src

A link can use the ID or the title of a node as its destination. It should be easy to reuse code from V1 using this query.
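To try the query above outside Emacs, the sketch below runs it unchanged through Python's `sqlite3` against a hypothetical miniature database. The tables keep only the columns the query touches, and the sample rows, ids and paths are invented; real org-roam tables have more columns.

```python
import sqlite3

# Hypothetical minimal subset of the org-roam v2 schema. Values keep the Lisp
# double quotes seen in org-roam's database (e.g. '"roam"').
db = sqlite3.connect(":memory:")
db.executescript("""
    create table nodes (id text primary key, file text, title text);
    create table links (pos integer, source text, dest text,
                        type text, properties text);
""")
db.executemany("insert into nodes values (?, ?, ?)", [
    ('"n1"', '"/notes/a.org"', '"Note A"'),
    ('"n2"', '"/notes/b.org"', '"Note B"'),
])
db.executemany("insert into links values (?, ?, ?, ?, ?)", [
    (10, '"n1"', '"Some future concept"', '"roam"', "(:outline nil)"),
    (20, '"n1"', '"n2"', '"id"', "(:outline nil)"),
    (30, '"n2"', '"Another idea"', '"roam"', "(:outline nil)"),
])

# dmg's query from the post above, unchanged.
DANGLING = """
with
  dangling as (
     select dest
      from links
      where type = '"roam"'
     except
      select id from nodes)

select file, title, dest, links.pos, "type", links.properties
   from links
      natural join dangling
      join nodes on (nodes.id = source);"""

# The query has no ORDER BY, so sort by file and position for stable output.
rows = sorted(db.execute(DANGLING).fetchall(), key=lambda r: (r[0], r[3]))
for row in rows:
    print(row)
```

Only the two roam links whose destination is not a node id come back; the id link is excluded by the `type` filter. The `NATURAL JOIN` matches on the shared `dest` column, so a destination referenced by several links yields one row per link.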

dmg · June 4, 2024, 6:01pm

I agree. This is something recommended by some people: create the links that you think you might need.

I also agree that such a function should work by listing all the dangling links and then letting the user jump to the location of each one. The user can then decide what to do (edit, create a node, etc.).

akashp · June 4, 2024, 6:21pm

I just implemented the same thing: the query results are fed to a completing-read interface and so on… Why is such a complex query necessary? We can do the same thing much more simply.

dmg · June 4, 2024, 6:28pm

Slightly different idea. Correct me if I am wrong: your code prompts the user with all potential dangling links and, if one is selected, jumps to it.

What I am suggesting is to create a list of dangling links (e.g. like the agenda) and then let the user review them as a list.

Oh, your code lists the links to the current node (correct me if I am wrong).

What I am suggesting is listing ALL dangling links.
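A rough sketch of the agenda-style view, covering only the formatting step: it takes rows shaped like the first four columns of dmg's query (`file`, `title`, `dest`, `pos`), groups them by file and sorts them by position. It is Python, the function names are hypothetical, and an Emacs version would presumably render the same list in a buffer with a way to jump to each position.

```python
from collections import defaultdict


def unquote(value):
    """Drop the Lisp double quotes org-roam keeps around stored strings."""
    return value.strip('"')


def dangling_report(rows):
    """Group (file, title, dest, pos) rows by file, agenda-style."""
    by_file = defaultdict(list)
    for file, title, dest, pos in rows:
        by_file[file].append((pos, title, dest))
    lines = []
    for file in sorted(by_file):
        lines.append(unquote(file))
        for pos, title, dest in sorted(by_file[file]):
            lines.append(f"  {pos:>6}  {unquote(title)} -> {unquote(dest)}")
    return "\n".join(lines)


report = dangling_report([
    ('"/notes/b.org"', '"Note B"', '"Another idea"', 30),
    ('"/notes/a.org"', '"Note A"', '"Some future concept"', 10),
    ('"/notes/a.org"', '"Note A"', '"Second idea"', 5),
])
print(report)
```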

dmg · June 4, 2024, 6:33pm

I forgot to add the position to the query. I’ll update the query now.

dmg · June 4, 2024, 6:35pm

Keep in mind what I said earlier about handing all the work to the DBMS at once. Your code has two queries: one lists all links, the other checks whether the links exist. You can do it all in the DBMS at once (as in my query), which makes the Emacs code much simpler.

(That is also why I see the DBMS as the API to org-roam.)
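A minimal sketch of the difference, assuming a hypothetical minimal schema and Python's `sqlite3`. The Elisp code being discussed is not shown in this thread, so the two-step version only reconstructs the pattern described here; both function names are illustrative. Both return the same rows, and each also reports how many statements it sent.

```python
import sqlite3

# Hypothetical minimal org-roam-like tables; strings keep Lisp double quotes.
db = sqlite3.connect(":memory:")
db.executescript("""
    create table nodes (id text primary key, file text, title text);
    create table links (pos integer, source text, dest text, type text);
""")
db.executemany("insert into nodes values (?, ?, ?)", [
    ('"n1"', '"/notes/a.org"', '"Note A"'),
    ('"n2"', '"/notes/b.org"', '"Note B"'),
])
db.executemany("insert into links values (?, ?, ?, ?)", [
    (10, '"n1"', '"Some future concept"', '"roam"'),
    (20, '"n1"', '"n2"', '"id"'),
    (30, '"n2"', '"Another idea"', '"roam"'),
])


def dangling_two_step(db):
    """List roam links, then ask the database about each one separately."""
    queries = 1
    found = []
    roam_links = db.execute(
        """select pos, source, dest from links
            where type = '"roam"' order by pos""").fetchall()
    for pos, source, dest in roam_links:
        queries += 1
        if db.execute("select 1 from nodes where id = ?", (dest,)).fetchone() is None:
            found.append((pos, source, dest))
    return found, queries


def dangling_one_step(db):
    """Same answer, with SQLite doing the existence check in one statement."""
    found = db.execute(
        """select pos, source, dest from links
            where type = '"roam"' and dest not in (select id from nodes)
            order by pos""").fetchall()
    return found, 1


print(dangling_two_step(db))
print(dangling_one_step(db))
```

With N roam links the two-step version issues N + 1 statements and the one-step version issues one; from Emacs each call presumably carries its own overhead, which is what the single query avoids.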

dmg · June 4, 2024, 7:20pm

I don't think so. They are all links in an org-roam database. Try it: add a roam link to an org-roam document and see how it appears in the database as a link.

My understanding (and my querying of the DB) leads me to believe that all links inside org-roam are stored in this database.

sqlite> select type, count(*) from links group by type;
type          count(*)
------------  --------
"DOI"         1       
"Https"       1       
"arXiv"       2       
"attachment"  4       
"coderef"     6       
"custom-id"   294     
"file"        2037    
"fuzzy"       129     
"http"        35      
"https"       6536    
"id"          730     
"mu4e"        392     
"roam"        13      
"yt"          41      
sqlite>
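Note the double quotes around every `type` value in the output above: org-roam appears to store strings together with their Lisp quotes, which is why dmg's query compares against `'"roam"'` rather than `'roam'`. A tiny sketch reproducing that with Python's `sqlite3` and a hypothetical one-column table:

```python
import sqlite3

# Assumption: as in org-roam's database, each stored string keeps its Lisp
# double quotes, so the stored value is "roam" including the quote characters.
db = sqlite3.connect(":memory:")
db.execute("create table links (type text)")
db.executemany("insert into links values (?)",
               [('"roam"',), ('"id"',), ('"roam"',), ('"https"',)])

unquoted = db.execute("select count(*) from links where type = 'roam'").fetchone()[0]
quoted = db.execute("""select count(*) from links where type = '"roam"'""").fetchone()[0]
by_type = db.execute(
    "select type, count(*) from links group by type order by type").fetchall()

print(unquoted, quoted)  # → 0 2
print(by_type)
```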

akashp · June 4, 2024, 7:22pm

But what are you using roam: links for? I have zero roam: links except the broken ones:

sqlite> select type, count(*) from links group by type;
"custom-id"|9
"file"|11
"fuzzy"|21
"http"|1
"https"|14
"roam"|1

If you use a roam: link to an existing node, it is automatically converted to an id link on save, unless you have changed something in your configuration.

dmg · June 4, 2024, 7:31pm

To link to a title: you can use the title as the destination. You can also link to the ID.

I'll check why my database allows me to have those links to titles.

What does your code do? It provides a list of roam links and asks the user to select one, and then what?
