Is there a simple, scalable way to do full text search?

How do I do full-text search through org-roam files? Things I’ve looked into or tried:

  • The org-roam manual suggests Deft, which I have not tried because, from what I’ve read, it doesn’t scale well.
  • The org-roam manual also suggests NotDeft, which looks complicated.
  • This thread looks very interesting, but it is above my head.
  • I used to use org-search-view, and it dutifully and reliably reported the org entries in which the search string was found. But it works on org-agenda-files, and it seems a bad idea to include all roam files in org-agenda-files.
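
A possible middle ground for the org-search-view approach is sketched below. It assumes Org’s `org-agenda-text-search-extra-files` variable (files that `org-search-view` searches in addition to the agenda files) and adds the roam files to it only for the duration of one search, so they never become agenda files. The command name `my/org-roam-search-view` is hypothetical.

```elisp
;; Load org-agenda first so `org-agenda-text-search-extra-files' is known
;; to be a special (dynamic) variable before we let-bind it.
(require 'org-agenda)

(defun my/org-roam-search-view ()
  "Run `org-search-view' over all .org files under `org-roam-directory'.
Hypothetical helper: the roam files are appended to
`org-agenda-text-search-extra-files' only while this command runs."
  (interactive)
  (let ((org-agenda-text-search-extra-files
         ;; Append after the existing value so a leading `agenda-archives'
         ;; entry, if present, stays first.
         (append org-agenda-text-search-extra-files
                 (directory-files-recursively org-roam-directory "\\.org\\'"))))
    ;; Prompts for the search string, as M-x org-search-view does.
    (org-search-view)))
```

Refreshing the resulting agenda buffer later may re-run the search without the temporary binding, so treat this as a starting point.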

I would think this is critical functionality. Pulling up a note by its title and tags works some of the time, but what do people do when they just want to search (possibly by regex) for a string across thousands of org-roam files? I guess I’m hoping that a package for such a fundamental action already exists on MELPA.

EDIT: Solved

I use ripgrep (rg) and consult-ripgrep.
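
For the consult-ripgrep route, a small wrapper can pin the search to the roam directory instead of the current project. This sketch assumes consult is installed (its `consult-ripgrep` takes an optional directory and an optional initial input) along with the `rg` executable; the command name and the key binding are hypothetical.

```elisp
(defun my/org-roam-ripgrep (&optional initial)
  "Search every file under `org-roam-directory' with `consult-ripgrep'.
INITIAL, if non-nil, pre-fills the search input."
  (interactive)
  ;; `consult-ripgrep' is autoloaded by the consult package.
  (consult-ripgrep org-roam-directory initial))

;; Hypothetical binding; pick any free key.
(global-set-key (kbd "C-c n s") #'my/org-roam-ripgrep)
```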

Deft really does not scale well. For a while I used helm-org-rifle, which has a great UI but also does not scale well. Now I use deadgrep pointed at my org-roam directory, which is just awesome. I have about 1800 notes, including several long Chinese legal documents, and it really flies.

This is one of the areas where being a plain text format truly shines. You can recursively search for any regex inside all the directory’s files using the grep tool.

In Emacs, I use two commands for this, both grep-style tools. The first is rgrep, which searches the roam directory recursively and collects every match of the requested regex in an output buffer it creates. You can then open each match and see it in context. The great thing is that, since this creates a new buffer, you can even keep the search results for later. The other tool I use is counsel-rg (as I am an Ivy user), which uses ripgrep as its backend (an excellent grep alternative written in Rust) and shows the matches in the minibuffer. You then pick what you want, and since it searches interactively as you type, you can narrow the results down as you go. I am pretty sure other completion frameworks have this as well.

I typically use rgrep when looking for multiple instances (or every instance) of a word inside the directory, as it gives me a persistent buffer that I can come back to and search repeatedly, and counsel-rg for one-off searches or when I want to build the regex as I go and watch the results show up.
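
Both commands can be pointed at the roam directory directly. A sketch, assuming Emacs’s built-in `rgrep` (called as `rgrep REGEXP FILES DIR`) and Ivy’s `counsel-rg` (whose second optional argument is the starting directory); the wrapper names are hypothetical.

```elisp
(require 'grep)

(defun my/org-roam-rgrep (regexp)
  "Run `rgrep' for REGEXP over the .org files in `org-roam-directory'.
Results go to a *grep* buffer that stays around until you kill it."
  (interactive
   (progn
     ;; rgrep's own interactive spec does this before reading input.
     (grep-compute-defaults)
     (list (grep-read-regexp))))
  (rgrep regexp "*.org" org-roam-directory))

(defun my/org-roam-counsel-rg ()
  "Incremental ripgrep search over `org-roam-directory' with `counsel-rg'."
  (interactive)
  ;; Arguments: no initial input, start in the roam directory.
  (counsel-rg nil org-roam-directory))
```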

consult-ripgrep can show the list of matches, but I can’t preview the content the way the org-notes-search result list does.
In other words:

  1. In consult-ripgrep, I can see the result list and open one of the matches, but then I lose the search results. If the content is not what I want, I have to search again and browse the results from the beginning.
  2. In org-notes-search, I can preview the content in the result list and then choose the right one to open.
    But org-notes-search only works on the org directory, so it misses the roam notes if they live in another directory.

So I wonder which approach is better. Here are some ideas:

  1. Put the org-roam directory inside the org directory.
  2. Can we use org-notes-search for org-roam?

By the way, rg is not as simple as the options above when searching for more than one keyword.

I missed helm, which is pretty easy to use: M-x helm/project-search, then use TAB to preview.

Sorry I haven’t been keeping up with this group – yes, there’s a simple, scalable way to do full text search: https://github.com/zot/microfts

The elisp package automatically indexes your org-mode files as you visit them and updates the index when you save them.

Late answer, as I missed this.

I often use emacs-velocity: https://github.com/bard/emacs-velocity. Some of its very nice features:

  • In contrast to consult-ripgrep and other ag/rg options, you can search for terms that are in the same file, but not necessarily in the same line.
  • It also offers preview.
  • I find it reasonably snappy with over 1100 notes (about as fast as org-roam-node-find).
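
To illustrate what same-file, any-line matching means, here is a minimal pure-Elisp sketch (not velocity’s or xeft’s actual code; xeft relies on a Xapian index instead). It reads every .org file under a directory and keeps those in which each term matches somewhere. The helper names are hypothetical, and re-reading every file on each query will be slower than an indexed tool on large collections.

```elisp
(require 'seq)

(defun my/files-matching-all (terms dir)
  "Return the .org files under DIR that contain every regexp in TERMS.
The matches may be on different lines anywhere in the same file."
  (seq-filter
   (lambda (file)
     (with-temp-buffer
       (insert-file-contents file)
       ;; Search each term from the top of the buffer; the file
       ;; qualifies only if all of them are found.
       (seq-every-p (lambda (term)
                      (goto-char (point-min))
                      (re-search-forward term nil t))
                    terms)))
   (directory-files-recursively dir "\\.org\\'")))

(defun my/org-roam-find-by-terms (input)
  "Prompt for space-separated words; visit a roam note containing all of them."
  (interactive "sWords: ")
  (let ((files (my/files-matching-all (split-string input) org-roam-directory)))
    (if files
        (find-file (completing-read "File: " files nil t))
      (message "No file contains all of: %s" input))))
```

Entering two words at the `Words:` prompt offers only the notes that mention both, wherever they appear in the file. Terms are treated as regexps, and case sensitivity follows `case-fold-search`.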

Disadvantages?

  • The preview is not a real preview like the one in the consult-ripgrep solution above; instead, it opens the files (so you can end up with a long list of open buffers).

  • If you are in the middle of editing a note, so that there are files whose names start with .# (Emacs lock files), you get an error when you try to use velocity.
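
Assuming the error comes from Emacs’s own lock files (Emacs creates a `.#NAME` lock file while a file-visiting buffer has unsaved changes), one possible workaround is to turn lock files off. The trade-off is that Emacs can then no longer warn you when two sessions edit the same file.

```elisp
;; Stop Emacs from creating ".#NAME" lock files next to files being edited.
;; Trade-off: no more warnings when two Emacs sessions edit the same file.
(setq create-lockfiles nil)
```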

First, I just want to point out a workaround if you end up opening a large number of buffers during a full-text search and want to kill them all quickly in one go. I suggest projectile-kill-buffers, bound to C-c p k, assuming you use the projectile package and there is an empty .projectile file inside the org-roam directory (or any directory containing your target org files). This way you can kill all open org-roam buffers, and only those, with one keystroke. This is my best way so far to kill project-specific buffers; very handy.
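
Without projectile (or a `.projectile` marker), a similar cleanup can be done in a few lines of Elisp. The sketch below kills every buffer visiting a file under a given directory, `org-roam-directory` when called interactively; the command name is hypothetical, and `file-in-directory-p` resolves symlinks before comparing paths.

```elisp
(defun my/kill-buffers-under (dir)
  "Kill every buffer whose file lives under DIR and return how many were killed.
Interactively, DIR is `org-roam-directory'."
  (interactive (list org-roam-directory))
  (let ((root (file-truename dir))
        (count 0))
    (dolist (buf (buffer-list))
      (let ((file (buffer-file-name buf)))
        ;; Skip non-file buffers; `kill-buffer' returns t only if it killed BUF.
        (when (and file (file-in-directory-p file root) (kill-buffer buf))
          (setq count (1+ count)))))
    (message "Killed %d buffer(s) under %s" count root)
    count))
```

Unmodified buffers are killed silently; `kill-buffer` still asks before killing a buffer with unsaved changes.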

Second, thank you for your input regarding emacs-velocity, which supports helm. I personally use helm-ag, but to be honest, I didn’t test its scalability. Have you compared, firsthand, how the emacs-velocity solution stacks up against others such as deadgrep or microfts, which are also mentioned positively in this thread?

Thanks a lot for your suggestion of using projectile-kill-buffers. I had not thought about it, and I’ll give it a try.

I have not tried deadgrep or microfts yet; both are on my “TO-DO” list. Based on the search tools behind those two, I’d assume they are faster, but this is just a wild guess.

In the last hour, however, I’ve been playing with xeft (which I saw mentioned somewhere else). It uses Xapian (installation was a piece of cake on Linux), and it is really fast and intuitive to use (e.g., the query syntax for searches: Xeft: Queries). The main repo is now here:

https://sr.ht/~casouri/xeft/
(but I think issues are still here: GitHub - casouri/xeft: Fast, interactive Emacs note searching).
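
If you want to try xeft on org-roam notes, a minimal configuration sketch follows. It assumes xeft’s `xeft-directory` (notes to index) and `xeft-database` (location of the Xapian index) options; check `M-x customize-group RET xeft` for the exact names in your version, and treat both paths as placeholders.

```elisp
;; Placeholder paths; adjust them to your own layout.
(setq xeft-directory (expand-file-name "~/org-roam/"))      ; notes to index
(setq xeft-database (expand-file-name "~/.cache/xeft/db"))  ; Xapian index location
```

After that, M-x xeft opens the search buffer.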


EDIT: I gave deadgrep a try. It is certainly very fast, including incremental searches. But for me, searching for files that contain all of a set of words, possibly on different lines, is much more cumbersome than with xeft or velocity. And for matching multiple terms on the same line, I find consult-ripgrep, as explained above, a lot simpler (again, disclaimer: for me and my limited knowledge of ripgrep). In addition, I could not figure out how to make it always use the interactive search by default. So for my use cases it is not a good fit. But, of course, this is just my experience, and other people here are clearly using it happily.

In case it helps, based on Allow searching in other directories · Issue #92 · Wilfred/deadgrep · GitHub, I used this function to search in my org-roam-directory:

(defun deadgrep-org-roam (search-term dir)
  "Run `deadgrep' for SEARCH-TERM in DIR (interactively, `org-roam-directory')."
  (interactive
   (progn
     ;; `deadgrep--read-search-term' is internal and may not be autoloaded.
     (require 'deadgrep)
     (list (deadgrep--read-search-term) org-roam-directory)))
  (deadgrep search-term dir))

Thank you for sharing this code. I just tried deadgrep now and my first impression as compared to helm-ag is this:

Pros:

  • much faster than helm-ag, which uses ag (The Silver Searcher)
  • a convenient, simple user interface (which I hope will gain more features in the near future, especially around choosing the directory to search)
  • the context shown around search hits is very handy: you can choose how many lines to show before and after each hit. I find this feature very useful.
  • it smartly picks the directory to search, PROVIDED that the directory is under version control (a git repo, etc.).
  • deadgrep-kill-all-buffers is always useful to clean up the mess at the end.

Cons:

  • When firing up deadgrep, the way to pick a directory to search in seems inefficient to me, especially for directories that aren’t under version control. C-u M-x deadgrep does hold off on running the search until you hit D to choose a directory, but then you again have to type in a path. Really? There must be an easier way in Emacs. One idea that came to mind is to put a list of the search directories I use most often in some variable and let helm offer them as candidates. That would be one way to be more efficient. The function you provided is one remedy to the same effect, but it means defining a dedicated function for every important directory, which seems suboptimal.
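
The favorite-directories idea could look roughly like this: a hypothetical variable `my/deadgrep-favorite-dirs` holds the directories, and `completing-read` offers them as candidates, which Helm (with `helm-mode` enabled), Ivy or Vertico will present in their own UI. Like the function above, it relies on deadgrep’s internal `deadgrep--read-search-term` and on `deadgrep` accepting a directory argument; the directory paths are placeholders.

```elisp
(defvar my/deadgrep-favorite-dirs
  '("~/org-roam/" "~/org/")
  "Hypothetical list of directories offered by `my/deadgrep-in-favorite-dir'.")

(defun my/deadgrep-in-favorite-dir (search-term dir)
  "Run `deadgrep' for SEARCH-TERM in DIR, chosen from `my/deadgrep-favorite-dirs'."
  (interactive
   (progn
     ;; Needed because `deadgrep--read-search-term' is internal.
     (require 'deadgrep)
     (list (deadgrep--read-search-term)
           ;; REQUIRE-MATCH is t: only listed directories are accepted.
           (completing-read "Search in: " my/deadgrep-favorite-dirs nil t))))
  (deadgrep search-term (expand-file-name dir)))
```

Adding a new favorite then means editing one list rather than writing one function per directory.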

I agree this kind of search is a bit cumbersome in deadgrep. You may want to try helm-rg (which uses ripgrep) together with helm-projectile, which I assume you already have installed. Put point on any word of interest and hit C-c p s r: you get search hits for that word automatically, and you can refine them with fuzzy incremental search with ease. M-d sets the directory, and M-b bounces to the search buffer.
For the complete (quite long) list of helm-rg key bindings, run M-x describe-keymap and pick helm-rg-map; there is also helm-rg--bounce-mode-map.

I didn’t try xeft or emacs-velocity, but I wonder how you would compare them with your experience of helm-rg when searching for words on different lines or doing the fuzzy searches you used to do.

EDIT:
Current verdict:
I am using now the following packages (using ripgrep) to accomplish full-text searches:

  • rg.el and wgrep to do usual ripgrep searches (no fuzzy ones) and follow-up editing using wgrep.
  • helm-projectile and helm-rg to get quick incremental fuzzy searches with C-c p s r

Emacs init.el setup:

(use-package projectile
  :diminish projectile-mode
  :config (projectile-mode)
  :bind-keymap
  ("C-c p" . projectile-command-map)
  :init
  (setq projectile-switch-project-action #'project-dired)
  )

(use-package helm-projectile
  :config
  (helm-projectile-on)
  )

(use-package wgrep
  :straight (wgrep
             :host github
             :repo "mhayashi1120/Emacs-wgrep"
             :branch "master")
  :config
  (setq wgrep-auto-save-buffer t)       ; save buffers automatically on wgrep-finish-edit
  (setq wgrep-enable-key "e")
  (setq wgrep-change-readonly-file t))  ; apply changes even if a buffer is read-only

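(use-package rg
  ;; The package and feature are named `rg'; "rg.el" is only the repo name.
  :straight (rg
             :host github
             :repo "dajva/rg.el"
             :branch "master")
  :after wgrep
  :init
  (require 'rg-isearch)
  :bind (("C-c s" . rg-menu)
         :map isearch-mode-map
         ("M-g" . rg-isearch-menu))
  :config
  ;; Note: this binds `rg-keymap-prefix' (C-c s by default) to rg's prefix
  ;; map, which may override the C-c s -> rg-menu binding above.
  (rg-enable-default-bindings))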

(use-package helm-rg
  ;; with helm-projectile C-c p s r and M-d to set dir, M-b to bounce to result buffer.
  :straight (helm-rg
	     :host github
	     :repo "cosmicexplorer/helm-rg"
	     :branch "master")
  )
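
rg.el can also declare a search pinned to one directory, which is another answer to the per-directory problem discussed above. A sketch, assuming rg.el’s `rg-define-search` macro with its `:query`, `:format`, `:files` and `:dir` keys, ripgrep’s built-in `org` file type, and that org-roam is loaded so `org-roam-directory` is set; the command name is hypothetical.

```elisp
(require 'rg)

;; Defines an interactive command, M-x my/rg-org-roam.
(rg-define-search my/rg-org-roam
  "Search the org-roam notes with ripgrep."
  :query ask                ; prompt for the search string
  :format regexp            ; interpret the input as a regexp
  :files "org"              ; ripgrep's built-in "org" file type
  :dir org-roam-directory)  ; search root
```

The results land in rg.el’s usual results buffer, so the wgrep editing workflow from the config above still applies.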


I have used helm-rg (not with helm-projectile, but on its own); I wasn’t aware of the trick for using fuzzy incremental search you suggest. I’ll try to check it out.

Thus, I cannot really compare helm-rg with xeft or emacs-velocity. However, for searching terms spread over multiple lines, I think xeft is likely to be much simpler and more flexible for me, because of its very straightforward search syntax (see xeft: Queries). In terms of speed, xeft feels immediate (it uses Xapian underneath), at least with 1400 notes: I do not notice any delay between my typing and the results appearing. For single-term searches, I could not feel any difference between xeft and deadgrep (or any other ripgrep-based solution); for all practical purposes, they are all instantaneous.

Moreover, three days ago, @casouri, xeft’s author, provided code (Show preview for all files as one goes through each one · Issue #23 · casouri/xeft · GitHub) showing how to preview all files as one goes through each one (also removing them right after stepping to another, thus avoiding filling up the buffer list).

velocity is also extremely nice, though not as fast (see also the second issue I mentioned in my first post). (I am dealing with the lingering-buffers issue as you suggested, though with project.el instead of projectile.)