*I’ve been thinking a bit about applying some graph theory concepts to Roam, and others have expressed interest. This thread exists to further discussion on the idea, and implementation.*

### Why Pursue This

I’m sure I won’t have to argue that being able to view a network of links is great. I think by utilising some graph theory concepts (namely centrality and communities) we can make it easier to

- identify important ideas in the network
- identify closely linked clumps of notes
- make the graph nicer to look at.

I think the benefits of 1-2 are clear. 3 is a nice bonus which comes with them.

I propose sizing nodes based on importance/centrality, and colouring them based on groups/communities.

I suspect that applying Graph Theory to the note links network will make it easier to identify particular items of interest, and gain a more accurate picture of the overall state of the network.

# Graph Theory

I’ll try to just give a brief overview of the relevant ideas, and the relevant formula so you can get a rough idea for the form of the calculations involved. I’ve left out details on the formula because I doubt that that’s of interest, and Wikipedia exists .

V = vertices = nodes = notes

E = edges = connections = links

## Centrality

A measure of importance. A simplistic measure is how many connections a note has, however this fails to take into account how important the notes connected to are. To account for this, one arrives at a recursive definition. At this point matrix algebra and eigenvectors get involved. This is *𝓞 (V³)*.

I’d consider using a tweaked version of this, namely page-rank (eigenvector centrality + starting weight + weight based on vertex degree), but experimenting with other variants.

**Good for finding vertices which are connected a lot**

## Betweenness centrality

A diferent take on the same idea (importance). The extent to which a vertex lies on the shortest path route between other vertices. *𝓞 (VE)*

**Good for finding vertices which bridge large groups of vertices**

## k-cores

Basically incrementally picking off the nodes with *< k* connections, until nothing remains. *𝓞 (V)*.

**Good for finding a tightly-connected core**

## Communities

Clusters of closely-related vertices.

### Greedy Modular Communities

Modularity is (roughly speaking) a measure of how many connections occur within communities vs. across communities. Maximising this (which can be done greedily) is an approach which generally works well. *𝓞 (N log N)*

**Good for finding clumps of tightly-connected vertices**

## Implementation

Implementing this in `elisp`

would be (1) a huge effort (2) likely run into performance issues.

Common libraries exist in `python`

and `R`

, however, performance would likely still be an issue, and requiring say, `R`

libraries would make end-user installation a prohibitive hassle.

From my googling so far, the most promising option appears to be making use of https://igraph.org/ to create a `c`

/`c++`

source file which can be compiled on the end-users computer, in a similar manner to vterm. Providing pre-compiled executables would be nice, but I’m not sure how well this would work.

Since the centrality/community values are constant for a particular network structure, my current idea is to add a DB table to store the values to minimise computation, with a form such as this

- network_features
- UUID
- page_rank
- betweeness_centrality
- k_core
- community_index

Then, when the user asks for a view of the graph, these metrics can simply be retrieved if the network has not changed. If the network has, these values would need updating, so the `c`

/`c++`

script could be asynchronously executed, with the results directly written to the DB.

## Final Thoughts

I, for one, am a huge fan of this idea. However, I lack experience with `c`

/`c++`

and therefore am currently unlikely to be able to give this a shot. If someone would be willing to have a go at this, that would be brilliant! While I may not be able to help with language-specific nuances, I would be keen to discuss basically anything else surrounding this.