I’ve been thinking a bit about applying some graph theory concepts to Roam, and others have expressed interest. This thread exists to further discussion of the idea and its implementation.
Why Pursue This
I’m sure I won’t have to argue that being able to view a network of links is great. I think by utilising some graph theory concepts (namely centrality and communities) we can make it easier to
1. identify important ideas in the network
2. identify closely linked clumps of notes
3. make the graph nicer to look at.
I think the benefits of 1-2 are clear. 3 is a nice bonus which comes with them.
I propose sizing nodes based on importance/centrality, and colouring them based on groups/communities.
I suspect that applying Graph Theory to the note links network will make it easier to identify particular items of interest, and gain a more accurate picture of the overall state of the network.
Graph Theory
I’ll try to give just a brief overview of the relevant ideas and the general shape of the maths, so you can get a rough sense of the calculations involved. I’ve left out details of the formulae because I doubt they’re of interest, and Wikipedia exists.
V = vertices = nodes = notes
E = edges = connections = links
Centrality
A measure of importance. A simplistic measure is how many connections a note has; however, this fails to take into account how important the connected notes themselves are. Accounting for this leads to a recursive definition, at which point matrix algebra and eigenvectors get involved. This is 𝓞(V³).
I’d consider using a tweaked version of this, namely PageRank (eigenvector centrality + a starting weight + weighting by vertex degree), while experimenting with other variants.
Good for finding vertices which are connected a lot
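To make this concrete, here is a minimal sketch using the python bindings for igraph (purely illustrative: the toy graph is made up, and this is not the proposed compiled implementation):

```python
import igraph as ig

# A toy note network: five notes, with note 2 acting as a hub.
g = ig.Graph(n=5, edges=[(0, 2), (1, 2), (2, 3), (3, 4)])

# PageRank ~ eigenvector centrality + damping/starting weight + degree weighting.
print(g.pagerank(damping=0.85))
# The hub (vertex 2) comes out with the highest score.
```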
Betweenness centrality
A different take on the same idea (importance): the extent to which a vertex lies on the shortest paths between other vertices. 𝓞(VE)
Good for finding vertices which bridge large groups of vertices
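As a quick illustration of why this is the "bridge-finding" measure (again just a toy sketch with python-igraph):

```python
import igraph as ig

# Two triangles joined through a single bridging vertex (3).
g = ig.Graph(n=7, edges=[(0, 1), (1, 2), (0, 2),   # clump A
                         (4, 5), (5, 6), (4, 6),   # clump B
                         (2, 3), (3, 4)])          # the bridge
print(g.betweenness())
# Vertex 3 scores highest: every shortest path between the clumps runs through it.
```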
k-cores
Basically, incrementally picking off the nodes with < k connections until none are left below that threshold; whatever remains is the k-core. 𝓞(V).
Good for finding a tightly-connected core
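A toy sketch of the same idea with python-igraph (igraph assigns a coreness value to every vertex in one pass):

```python
import igraph as ig

# A fully-connected core (0-3) with a loosely attached tail (4, 5).
g = ig.Graph(n=6, edges=[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                         (3, 4), (4, 5)])
print(g.coreness())
# => [3, 3, 3, 3, 1, 1]: the clique survives the k = 3 peeling, the tail does not.
```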
Communities
Clusters of closely-related vertices.
Greedy Modular Communities
Modularity is (roughly speaking) a measure of how many connections occur within communities vs. across communities. Maximising this (which can be done greedily) is an approach which generally works well. 𝓞 (N log N)
Good for finding clumps of tightly-connected vertices
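And a toy sketch of greedy modularity maximisation with python-igraph (the membership labels are what I’d propose using as the community index/colour):

```python
import igraph as ig

# Two obvious clumps connected by a single edge.
g = ig.Graph(n=6, edges=[(0, 1), (1, 2), (0, 2),
                         (3, 4), (4, 5), (3, 5),
                         (2, 3)])
membership = g.community_fastgreedy().as_clustering().membership
print(membership)
# => something like [0, 0, 0, 1, 1, 1]: each triangle ends up in its own community.
```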
Implementation
Implementing this in elisp would be (1) a huge effort and (2) likely to run into performance issues. Common libraries exist in python and R; however, performance would likely still be an issue, and requiring, say, R libraries would make end-user installation a prohibitive hassle.
From my googling so far, the most promising option appears to be making use of https://igraph.org/ to create a c/c++ source file which can be compiled on the end-user’s computer, in a similar manner to vterm. Providing pre-compiled executables would be nice, but I’m not sure how well this would work.
Since the centrality/community values are constant for a particular network structure, my current idea is to add a DB table to store the values and minimise recomputation, with a form such as this:
- network_features
  - UUID
  - page_rank
  - betweenness_centrality
  - k_core
  - community_index
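In plain SQLite terms, the table might look something like this (a sketch only; the column names are just the ones proposed above, the DB path is a placeholder, and the real schema would go through whatever DB layer the package already uses):

```python
import sqlite3

conn = sqlite3.connect("roam.db")  # placeholder path, not the actual DB location
conn.execute("""
    CREATE TABLE IF NOT EXISTS network_features (
        uuid                   TEXT PRIMARY KEY,
        page_rank              REAL,
        betweenness_centrality REAL,
        k_core                 INTEGER,
        community_index        INTEGER
    )
""")
conn.commit()
conn.close()
```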
Then, when the user asks for a view of the graph, these metrics can simply be retrieved if the network has not changed. If it has, the values need updating, so the c/c++ program could be executed asynchronously, with the results written directly to the DB.
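To sketch what that update pass might do (in python rather than the proposed c/c++, and with made-up function and argument names, so treat it purely as pseudocode that happens to run):

```python
import sqlite3
import igraph as ig

def refresh_network_features(db_path, uuids, edges):
    """Recompute all the metrics for the current network and write them back.
    `uuids` maps vertex index -> note UUID; `edges` are (index, index) pairs."""
    g = ig.Graph(n=len(uuids), edges=edges)
    membership = g.community_fastgreedy().as_clustering().membership
    rows = list(zip(uuids, g.pagerank(), g.betweenness(), g.coreness(), membership))

    conn = sqlite3.connect(db_path)
    conn.executemany(
        """INSERT OR REPLACE INTO network_features
           (uuid, page_rank, betweenness_centrality, k_core, community_index)
           VALUES (?, ?, ?, ?, ?)""",
        rows,
    )
    conn.commit()
    conn.close()
```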
Final Thoughts
I, for one, am a huge fan of this idea. However, I lack experience with c/c++ and therefore am currently unlikely to be able to give this a shot. If someone would be willing to have a go at this, that would be brilliant! While I may not be able to help with language-specific nuances, I would be keen to discuss basically anything else surrounding this.