How to properly group results of org-roam-db-query?

akashp · May 18, 2024, 11:37am

I have been trying to learn sql to do queries –

My question is – when I join query result from one table to another - the result is a cartesian product of the elements of the first table with the elements of the second table if the second table happens to have more than 1 element in the column in question

That is

(org-roam-db-query [:select [t0:c0 t0:c1 t1:c2]
                                  :from t0
                                  :inner :join t1 :on (= t0:common-column t1:common-column)
                                  ])

where 'ti' represents the ith table
and 'ci' represents arbitary columns

the result shall be

((c01 c11 c21) (c01 c11 c22) (c01 c11 c23) ... )

=> column c2 on table t1 has multiple entries in the same row

What I want to achieve is the following

((c01 c11 '(c21 c22 c23)) ... )

As a practical example consider querying all the links associated with and id

(let* ((sql `[:select [nodes:title
		       nodes:file
		       links:dest
		       ]
		      :from nodes
		      :inner :join links :on (= nodes:id links:source)
		      ]))
  (org-roam-db-query sql))

The only way I can make the data more structured is by using a hash-table to group out the common elements

(let* ((sql `[:select [nodes:title nodes:file
  		       links:dest links:type] ; SQL query to select title, file, and destination from nodes and links tables
		      :from nodes
		      :inner :join links :on (= nodes:id links:source)]) ; Join nodes and links tables on their IDs
       (results (org-roam-db-query sql)) ; Execute the SQL query and store the results
       (grouped-results (make-hash-table :test #'equal))) ; Initialize a hash table for grouping
  
  ;; Group the results by common elements (title and file)
  (dolist (result results) ; Iterate over each result from the SQL query
    (let* ((title (nth 0 result)) ; Extract the title from the result
	   (file (nth 1 result)) ; Extract the file from the result
	   (dest (nth 2 result)) ; Extract the destination from the result
	   (type (nth 3 result))
	   (path (concat type ":" dest))
	   
           (common-elements (list title file))) ; Create a list of common elements (title and file)
      
      ;; Add the destination to the list associated with the common elements
      (if (gethash common-elements grouped-results) ; Check if common elements already exist in the hash table
          (push path (gethash common-elements grouped-results)) ; If yes, append the destination to the list
        (puthash common-elements (list path) grouped-results)))) ; If not, create a new entry in the hash table
  
  ;; Print the grouped results
  (maphash
   (lambda (common-elements destinations) ; Iterate over each entry in the hash table
     (message " ---------- -----------") ; Print separator
     (message "Common Elements: %S" common-elements) ; Print the common elements
     (message " ---------- -----------") ; Print separator
     (dolist (destination destinations) ; Iterate over each destination associated with the common elements
       (message "Destination: %S" destination)) ; Print each destination
     (message " ")) ; Print a newline to separate groups
   grouped-results))

Please let me know if there are more efficient ways of doing this. Thanks.

nobiot · May 18, 2024, 12:40pm

… So you are trying to get

TITLE,
FILE,
all LINK DEST’s

for a given set of IDs?

What’s wrong with this?
((c01 c11 c21) (c01 c11 c22) (c01 c11 c23) … )

I feel that what’s efficient would depend on your usage of the data.

One way may be is to do sql twice: one to get TITLE and FILE, and the other to get the DEST’s, and then append the latter list to the former.

dmg · May 18, 2024, 6:42pm

Perhaps easier, write the query directly in sql, and use group_concat, which is what org-roam uses (search for group_concat in org-roam-node). It also works way faster.

sqlite> create table test(a,b);
sqlite> insert into test values (1,"a"), (1,"b"), (2,"c"), (2,"d"), (2,"f");
sqlite> select a, group_concat(b) from test group by a;
a  group_concat(b)
-  ---------------
1  a,b            
2  c,d,f          
sqlite>

akashp · May 18, 2024, 11:57pm

Thank you @nobiot and @dmg for your responses -

I was actually trying to just learn how to combine results from joining operations - for arbitary table joins – the example was just a quick show of intention - I can use group_concat() while passing the sql statement as a string

for example

"SELECT nodes.title, nodes.file, '(' || group_concat(links.dest, ' ') || ')' 
 FROM nodes
 INNER JOIN links ON nodes.id = links.source
 GROUP BY nodes.title, nodes.file"

But I realised there is a problem while converting to emacsql-vector representation – namely
the || concat operator has been stripped off emacsql intentionally so without concating the result within ( ) the group_concat is not useful, and doesn’t return the desired result

that is

[:select
 [nodes:title
  nodes:file
  (funcall group_concat links:dest " ")
  ]
 :from nodes
 :inner :join links
 :on (= nodes:id links:source)
 :group-by [nodes:title nodes:file]
 ]

CONVERTS TO

SELECT nodes.title, nodes.file, group_concat(links.dest, '" "') 
    FROM nodes INNER JOIN links ON nodes.id = links.source 
    GROUP BY nodes.title, nodes.file;

I cannot figure out a way to concat the parenthesis in between.

This seems to be a limitation in emacsql broadly. To properly use group_concat() we have to use the string representation as opposed to the vector representation.

Topic		Replies	Views
How to create a link to a org-roam search? How To	9	2537	August 24, 2021
Materialized View for Org-roam.db using sqlite triggers Development	27	324	June 3, 2024
Group links/tags in org-roam How To	15	1775	July 22, 2024
Rewriting org-roam-node-list for speed (it is not sqlite) Development	90	755	August 8, 2024
How can I query the database using variables? How To	6	188	August 27, 2024

How to properly group results of org-roam-db-query?

Related topics