title="Query optimizer for find(1)"
duration="2-8 months depending on ambition"
Add a query optimizer to find(1).
Currently find builds a query plan for its search, and then executes
it with little or no optimization. Add an optimizer pass on the plan
that makes it run faster.
Concentrate on transformations that allow skipping I/O: for example,
not calling stat(2) on files that cannot match, or not recursing into
subdirectories whose contents can never match.
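
One such transformation can be sketched as a reordering of the tests in an AND chain, so that tests needing no stat(2) data run first and can reject a file before any stat(2) call is made. The sketch below is only an illustration: `struct pred`, `needs_stat` and `reorder_cheap_first` are hypothetical names and do not reflect find's actual plan structures. It also assumes every test in the chain is free of side effects; actions such as -print or -exec would have to stay where they are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical plan node: one side-effect-free test in an AND chain. */
struct pred {
	const char *name;	/* label only, e.g. "-name" */
	bool needs_stat;	/* true if the test needs stat(2) data */
};

/*
 * Stable partition: move tests that need no stat(2) data to the front,
 * preserving the relative order inside each group. Quadratic, which is
 * acceptable because command-line plans are short.
 */
void
reorder_cheap_first(struct pred *p, size_t n)
{
	size_t next = 0;	/* slot for the next cheap test */

	assert(p != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (p[i].needs_stat)
			continue;
		struct pred tmp = p[i];
		/* Shift the stat-dependent tests in [next, i) up by one. */
		for (size_t j = i; j > next; j--)
			p[j] = p[j - 1];
		p[next++] = tmp;
	}
}
```

With this ordering, a chain such as `-size +1k -name '*.c'` would evaluate the name test first, so files with other suffixes never need to be examined with stat(2).
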
Combining successive string matches into a single match pattern might
also be a win; so might precompiling match patterns into an executable
match form (as with regcomp(3)).
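
As a rough illustration of both ideas, the sketch below assumes that an expression like `-name '*.c' -o -name '*.h'` has been merged by hand into the single extended regular expression `\.(c|h)$`, which is compiled once with regcomp(3) and then reused for every file name. The helper names `matcher_init` and `matcher_test` are hypothetical, and a real optimizer would need a general, carefully checked translation from fnmatch(3) patterns.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Compile the merged pattern once, before the tree walk starts.
 * REG_NOSUB is set because only a yes/no answer is needed.
 * Returns 0 on success, or a regcomp(3) error code.
 */
int
matcher_init(regex_t *re)
{
	assert(re != NULL);
	return regcomp(re, "\\.(c|h)$", REG_EXTENDED | REG_NOSUB);
}

/* Test one file name against the precompiled pattern. */
bool
matcher_test(const regex_t *re, const char *name)
{
	assert(re != NULL && name != NULL);
	return regexec(re, name, 0, NULL, 0) == 0;
}
```

The caller would invoke `matcher_init` once, call `matcher_test` for each directory entry, and release the pattern with `regfree` when the walk is finished.
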
To benefit from many of the possible optimizations it may be necessary
to extend the fts(3) interface and/or extend the query plan schema or
the plan execution logic. For example, currently it doesn't appear to
be possible for an fts(3) client to take advantage of file type
information returned by readdir(3) to avoid an otherwise unnecessary
call to stat(2).
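
To make the potential saving concrete, the sketch below bypasses fts(3) and reads a directory directly, roughly what `find dir -maxdepth 1 -type d` needs to do. It relies on the `d_type` field of `struct dirent`, which is a BSD and Linux extension rather than part of POSIX, and falls back to a stat-family call only when the file system reports `DT_UNKNOWN`. The function name `count_subdirs` and its `stat_calls` counter are hypothetical and exist only for illustration.

```c
#define _DEFAULT_SOURCE		/* expose d_type and DT_* on glibc */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count the subdirectories of dir, using d_type where the file system
 * provides it. *stat_calls receives the number of fallback stat-family
 * calls that were still required. Returns -1 if dir cannot be opened.
 */
long
count_subdirs(const char *dir, long *stat_calls)
{
	assert(dir != NULL && stat_calls != NULL);
	*stat_calls = 0;

	DIR *d = opendir(dir);
	if (d == NULL)
		return -1;

	long ndirs = 0;
	struct dirent *e;
	while ((e = readdir(d)) != NULL) {
		if (strcmp(e->d_name, ".") == 0 ||
		    strcmp(e->d_name, "..") == 0)
			continue;
		if (e->d_type == DT_DIR) {
			ndirs++;	/* type known, no stat needed */
		} else if (e->d_type == DT_UNKNOWN) {
			/* File system gave no type: fall back to a stat. */
			struct stat sb;
			(*stat_calls)++;
			if (fstatat(dirfd(d), e->d_name, &sb,
			    AT_SYMLINK_NOFOLLOW) == 0 && S_ISDIR(sb.st_mode))
				ndirs++;
		}
		/* Any other d_type is a known non-directory: skip it. */
	}
	closedir(d);
	return ndirs;
}
```

On file systems that fill in `d_type`, `stat_calls` stays at zero, which is the kind of saving an extended fts(3) interface could pass on to find.
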
Step 1 of the project is to choose a number of candidate
optimizations, and for each identify the internal changes needed and
the expected benefits to be gained.
Step 2 is to implement a selected subset of these based on available
time and cost/benefit analysis.
It is preferable to concentrate on opportunities that arise in find
invocations likely to actually be typed by users or issued by programs
or infrastructure (e.g. in pkgsrc), rather than on theoretical
opportunities unlikely to appear in practice.