Faster alternatives to “find” and “locate”?
As of early 2021, I evaluated a few tools for similar use cases and integration with fzf.

tl;dr: `plocate` is a very fast alternative to `mlocate` in most cases. Among tools which don't require an index, (GNU) `find` often still seems to be hard to beat.
`locate`-likes (using indices)
I compared the commonly used `mlocate` and `plocate`. In a database of about 61 million files, `plocate` answers specific queries (a couple of hundred results) on the order of 0.01 to 0.2 seconds and only becomes much slower (> 100 seconds) for very unspecific queries with millions of results. `mlocate` takes an almost constant 35 to 40 seconds to query the same database in all tested cases. Most of the time, `plocate` is multiple orders of magnitude faster than `mlocate`.
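
To give a feel for what "specific" versus "unspecific" means here, a hedged sketch (the patterns are made up for illustration; the timings refer to the 61-million-file database above):

```sh
plocate 'projects/foo/config.yaml'  # specific: a few hundred hits, ~0.01-0.2 s
plocate '.c'                        # unspecific: millions of hits, can exceed 100 s
```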
`find`-likes (directory traversal)
It’s harder to - sorry - locate a good `find` alternative, and the result very much depends on the specific queries. Here are some results for simply finding all files in a sample directory (~200,000 files), in descending speed order:
- GNU find (`find -type f`): Surprisingly, `find` is often the fastest tool (~0.01 seconds)
- plocate (`plocate --regexp "^$PWD"`) (< 0.1 seconds)
- ripgrep (`rg --files --no-ignore --hidden`): A good bit faster than `mlocate`, but still very slow in comparison (~0.5 seconds)
- fd (`fdfind --type f` on Debian): Not as fast as expected (~0.7 seconds)
- zsh globs (`**/*` and extended versions): Generally very slow, but can be elegant in scripts (~3 seconds)
- mlocate (`mlocate --regexp "^$PWD"`): Interestingly faster than the non-regexp queries, but the slowest here (~5 seconds)
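
Since the original goal was fzf integration: one common way to plug any of these listers into fzf is its standard `FZF_DEFAULT_COMMAND` environment variable. As an example, using the ripgrep invocation from the list above:

```sh
# feed fzf from ripgrep instead of its built-in default finder
export FZF_DEFAULT_COMMAND='rg --files --no-ignore --hidden'
```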
With directory structures not cached in RAM, `plocate` probably beats `find`, especially on disks with high seek times (HDDs), but it does of course require an up-to-date database.
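
Keeping that database current means running the indexer periodically. A sketch, assuming a Debian-style setup where plocate ships its own `updatedb` (the exact command name and scheduling vary by distribution, and it is usually handled by a cron job or systemd timer anyway):

```sh
# rebuild the plocate index manually
sudo updatedb
```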
A parallelized `find` might give better results on systems which benefit from command queuing and can serve requests in parallel. Depending on the use case, this can also be implemented in scripts by running multiple instances of `find` on different subtrees, but performance characteristics there depend on a lot of factors; a rough sketch follows.
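
A minimal sketch, assuming GNU `find` and `xargs` (the parallelism level and the split into top-level subdirectories are just one possible arrangement):

```sh
# Run one find per top-level subdirectory, up to four at a time.
# Tune -P to what the storage can usefully serve concurrently.
find . -mindepth 1 -maxdepth 1 -type d -print0 |
  xargs -0 -P4 -I{} find {} -type f

# Files directly in the current directory are not covered above:
find . -maxdepth 1 -type f
```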
`grep`-likes (looking at content)
There are a whole lot of recursive `grep`-like tools with different features, but most offer only very limited support for finding files based on their metadata rather than their contents. Even if they are fast, like `ag` (the silver searcher), `rg` (ripgrep) or `ugrep`, they are not necessarily fast when just looking at file names.
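
They can still be pressed into service for plain file-name matching by listing paths and filtering the output, though this is a workaround rather than a strength. A sketch reusing the flags that appeared above:

```sh
# list file paths with ripgrep, then match on the names themselves
rg --files --no-ignore --hidden | rg 'pattern'
```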