- Contact: tech-userlevel
- Mentors: David Holland
- Duration estimate: 350h
Add a query optimizer to find(1).
Currently find builds a query plan for its search, and then executes it with little or no optimization. Add an optimizer pass on the plan that makes it run faster.
Things to concentrate on are transforms that allow skipping I/O: not calling stat(2) on files that will not be matched, for example, or not recursing into subdirectories whose contents cannot ever match. Combining successive string matches into a single match pattern might also be a win; so might precompiling match patterns into an executable match form (like with regcomp(3)).
To benefit from many of the possible optimizations it may be necessary to extend the fts(3) interface and/or extend the query plan schema or the plan execution logic. For example, currently it doesn't appear to be possible for an fts(3) client to take advantage of file type information returned by readdir(3) to avoid an otherwise unnecessary call to stat(2).
Step 1 of the project is to choose a number of candidate optimizations, and for each identify the internal changes needed and the expected benefits to be gained.
Step 2 is to implement a selected subset of these based on available time and cost/benefit analysis.
It is preferable to concentrate on opportunities that can be found in find invocations likely to actually be typed by users or issued by programs or infrastructure (e.g. in pkgsrc), vs. theoretical opportunities unlikely to appear in practice.