find: files only with scm directory pruning

The version of find I’m discussing here is
find (GNU findutils) 4.7.0-git
I use this pattern frequently—
$ find . <conditions> |xargs grep <pattern>
to find files containing, say, a regular expression.  If the search tree contains mercurial or git directories, I usually want to exclude their contents from the search.
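As a concrete instance of the pattern, here is a minimal sketch on a scratch tree. The file names, the `*.c` condition and the `TODO` pattern are all made up for illustration; names containing whitespace would need more care than this plain pipe gives them.

```shell
# Scratch tree with two hypothetical source files.
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
printf 'int main(void) { return 0; } /* TODO: tidy */\n' > "$tmp/src/main.c"
printf 'nothing to see\n' > "$tmp/src/util.c"

# Here <conditions> is "-type f -name '*.c'" and <pattern> is "TODO".
# grep -l prints only the names of the files that match.
matches=$(cd "$tmp" && find . -type f -name '*.c' | xargs grep -l 'TODO')
printf '%s\n' "$matches"
```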

The -prune action prevents a search from descending into the pruned directory, but I also want to strip out all directories, because the filenames are being fed into xargs grep.  So the command feeding xargs grep looks something like:
$ find . -type d -name .hg -prune -o -type f -print
This works well.  All directory names are suppressed, along with the files contained in the .hg directory.
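This can be checked on a throwaway tree. The sketch below assumes a fake .hg directory and a couple of invented file names; nothing in it comes from a real repository.

```shell
# Build a scratch tree with a fake Mercurial directory (all names hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/.hg/store" "$tmp/src"
touch "$tmp/.hg/hgrc" "$tmp/.hg/store/data" "$tmp/src/main.c" "$tmp/README"

# Explicit -print: only regular files outside .hg should be listed.
explicit=$(cd "$tmp" && find . -type d -name .hg -prune -o -type f -print | LC_ALL=C sort)
printf '%s\n' "$explicit"
```

Neither ./.hg nor anything beneath it should appear in the listing, and ./src itself is absent because it is a directory.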

Because the default action on a find is -print, I often elide that action, so I end up with—
$ find . -type d -name .hg -prune -o -type f
Lo and behold, the name of the pruned .hg directory appears in the list of files passed to xargs.  All other directory names are suppressed.
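The same kind of scratch tree reproduces the surprise. Again, the directory and file names are invented for the sketch.

```shell
# Scratch tree with a fake .hg directory (names are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/.hg" "$tmp/src"
touch "$tmp/.hg/hgrc" "$tmp/src/main.c"

# No explicit action this time: the pruned directory's own name shows up,
# although its contents (./.hg/hgrc) and the ./src directory do not.
implicit=$(cd "$tmp" && find . -type d -name .hg -prune -o -type f | LC_ALL=C sort)
printf '%s\n' "$implicit"
```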

What seems to be going on is this: the condition before the -o matches only directories named .hg.  Those it prunes, but because -prune itself evaluates to true, the condition still returns the names of the pruned directories.  The condition following the -o matches only regular files, which filters out every other directory.  The combined list of files and .hg directories (but not their contents) is passed to xargs.

So how is it that the first version works as I want it to?  How are the names of the .hg directories suppressed?

All I can surmise is that, in the absence of a specific -print action, the default -print applies to each of the conditions, but when it is specifically applied to the -o conditions, the default is suppressed for the initial conditions.

The man page says:
If the whole expression contains no actions other than -prune or -print, -print is performed on all files for which the whole expression is true.
That is ambiguous; in fact, it seems to be false. Neither of the versions above contains any action apart from -prune and -print, and one doesn’t even contain a -print.  Yet they behave differently.

The command
$ find . \( -type d -name .hg -prune -o -type f \) -print
which pops the -print out from within the sub-expression, behaves the same as
$ find . -type d -name .hg -prune -o -type f
so that seems to be what is effectively happening in the absence of a specific -print on the or condition. (Incidentally, GNU find also accepts -or in place of -o, though the man page notes that this spelling is not POSIX compliant.)
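The equivalence can be checked mechanically. POSIX describes the rule more precisely than the man page sentence quoted above: the implicit -print wraps the whole expression only when the expression contains none of -print, -exec or -ok. Because the implied "and" binds more tightly than -o, an explicit trailing -print attaches only to -type f. The sketch below compares the two forms on a scratch tree with invented names. It also prunes .git alongside .hg, which is my own extension of the commands above rather than something tested in this post.

```shell
# Scratch tree with fake Mercurial and Git directories (names are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/.hg" "$tmp/.git" "$tmp/src"
touch "$tmp/.hg/hgrc" "$tmp/.git/config" "$tmp/src/main.c"
cd "$tmp" || exit 1

# Implicit -print versus the same expression parenthesised with -print outside.
bare=$(find . -type d -name .hg -prune -o -type f | LC_ALL=C sort)
wrapped=$(find . \( -type d -name .hg -prune -o -type f \) -print | LC_ALL=C sort)
[ "$bare" = "$wrapped" ] && echo "same output"

# Explicit -print applies only to -type f; both SCM directories are pruned.
files=$(find . -type d \( -name .hg -o -name .git \) -prune -o -type f -print | LC_ALL=C sort)
printf '%s\n' "$files"
```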
