Status quo

Monotone's current mechanism for specifying which unversioned files to ignore in a workspace is rather complicated. Whenever the workspace is scanned, every unversioned file is run through the Lua ignore_file() hook. If this hook returns true, the file is ignored. The default definition of this hook consults a file named .mtn-ignore at the root of the workspace; if that file exists, it is treated as a list of regular expressions, one per line, to be matched against the full pathname of the candidate file. Any match causes the file to be ignored. If the .mtn-ignore file doesn't exist or there is no match, the hook goes on to test the pathname against an internal list of patterns for files that are usually ignorable. There is no syntax for putting comments in .mtn-ignore, nor for disabling patterns (e.g. those from the internal list).
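The decision order described above can be modeled roughly as follows. This is an illustrative Python sketch, not monotone's actual Lua hook: the function name ignore_file_status_quo, the two entries in DEFAULT_PATTERNS, the unanchored search, and the skipping of blank lines are all assumptions made for the example.

```python
import re

# Hypothetical stand-ins for the internal list; the real entries are
# defined alongside the default hook and are not reproduced here.
DEFAULT_PATTERNS = [r"\.o$", r"~$"]


def ignore_file_status_quo(path, mtn_ignore_lines):
    """Return True if `path` would be ignored.

    `mtn_ignore_lines` is the content of .mtn-ignore as a list of lines,
    or None if the file does not exist.
    """
    # Step 1: user patterns from .mtn-ignore, if the file exists.
    for line in mtn_ignore_lines or []:
        # Every non-blank line is a regex; there is no comment syntax.
        # Blank lines are skipped here (an assumption of this sketch).
        if line and re.compile(line).search(path):
            return True
    # Step 2: no user pattern matched, so fall back to the internal list.
    return any(re.compile(p).search(path) for p in DEFAULT_PATTERNS)
```

Note that in this model a line such as "# build output" is simply another regular expression, which is why the current format has no way to carry comments.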

There are two problems with this. One is that it's simultaneously too customizable and not customizable enough: the only way to remove an entry from the internal "always ignore these" list is to modify the Lua hook, but other than that there's really no reason to modify the hook. The other is that it's really slow, because it has to copy strings in and out of the Lua interpreter for every pathname being tested, and it presently recompiles all the regexes for every pathname being tested. For example, if top-of-trunk 'mtn ls unknown' is run in an unmodified Monotone checkout (but one for which autoreconf -i has been run), the overwhelming majority of the runtime is spent in the guts of malloc() and related operations (std::string constructors, for instance). The top three non-memory-allocation operations according to cachegrind are basic_io::tokenizer::get_token, SHA_160::hash, and compile_regex.

What to do about it

It would be quite easy to teach the Lua hook to cache compiled regexes, as it currently caches the contents of the .mtn-ignore file. That would speed things up at no cost to the user.
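A minimal sketch of that caching idea, written in Python rather than Lua for brevity. The class IgnoreCache and its members are hypothetical names; the only point is that patterns are compiled once per distinct file content and then reused for every pathname. A real implementation might key the cache on something cheaper than the full list of lines.

```python
import re


class IgnoreCache:
    """Compile the ignore patterns once and reuse them for every pathname."""

    def __init__(self):
        self._source = None      # the lines the cache was built from
        self._compiled = []      # compiled regex objects
        self.compile_count = 0   # number of rebuilds, for demonstration only

    def patterns_for(self, lines):
        key = tuple(lines)
        if key != self._source:
            # Content changed (or first call): rebuild the compiled list.
            self._compiled = [re.compile(line) for line in lines if line]
            self._source = key
            self.compile_count += 1
        return self._compiled

    def is_ignored(self, path, lines):
        return any(rx.search(path) for rx in self.patterns_for(lines))


cache = IgnoreCache()
lines = [r"\.o$", r"^build/"]
results = [cache.is_ignored(p, lines) for p in ("a.o", "build/x", "src/main.c")]
```

After the three lookups above, compile_count is 1: the patterns were compiled on the first call and reused for the other two.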

NOTE: As of monotone 1.next (release after 1.0) the regular expressions are cached automatically for the user.

It is, however, worth considering a simpler, non-Lua approach. To help decide what users actually need, I read the documentation for the ignore features of a bunch of other VCSes. Just about all of them use a simple file in the root of the working copy, containing a list of patterns. There is no overall consensus on regexes versus globs, and several support both. Few have a hardwired 'always ignore these' list. (See below for details.)

Current proposal

It's possible to implement a scheme where, in normal usage, .mtn-ignore is read from C++ and Lua never gets involved, and .mtn-ignore has significantly more power, with only very small risk of breakage to existing setups. It goes like this:

  • The .mtn-ignore file continues to contain a list of regexes, one per line, matched against the full pathname relative to the workspace root. However (this part of the proposal is the only source of breakage):
    • If the very first character on a line is #, that line is a comment.
    • If the very first character on a line is !, that line adds to the anti-ignore list rather than the ignore list (see below).
    • Either of these may be escaped with \ to signify a regular expression whose first character is # or !.
    • If for some reason you need the regular expression itself to begin with \, you must double the backslash.
    • We reserve the right to make other punctuation characters have special meaning at the beginning of the line. (For example, we might in the future have a special character that means 'this pattern is a glob, not an RE'.) For forward compatibility, we recommend escaping all punctuation characters at the beginning of a line.
  • C++ code (probably in work.cc) maintains a starter list of default ignore patterns (identical to the existing starter list in std_hooks.lua). The content of .mtn-ignore is added to this starter list to form the complete list of ignore patterns. Files matching any pattern on this list are ignored.
  • The anti-ignore list is initially empty; it's filled in from .mtn-ignore lines beginning with !; files matching any pattern on this list are not ignored, even if they match a pattern on the ignore list. The primary purpose of this is to override the default patterns.
  • For backward compatibility's sake, if the ignore_file Lua hook is defined, all the above logic is skipped and we just do what we do now. It might be appropriate for monotone to print a warning in this case.
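The file format and matching rules in this list can be sketched as below. This is a Python model for illustration only, not the proposed C++ code. The names parse_mtn_ignore, is_ignored and SAMPLE are hypothetical, and DEFAULT_IGNORES holds placeholder entries rather than the real starter list. Two further assumptions: blank lines are skipped, and a leading backslash is left in place for the regex engine, so that \# simply matches a literal #. The proposal does not say whether the backslash would be stripped first. The ignore_file hook fallback is not modeled.

```python
import re

# Placeholder starter list (assumption); the proposal says the real one
# would be identical to the existing list in std_hooks.lua.
DEFAULT_IGNORES = [r"\.o$", r"~$"]


def parse_mtn_ignore(text):
    """Split .mtn-ignore content into (ignore, anti_ignore) regex lists."""
    ignore = [re.compile(p) for p in DEFAULT_IGNORES]
    anti_ignore = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank line (assumed skipped) or comment
        if line.startswith("!"):
            if line[1:]:
                anti_ignore.append(re.compile(line[1:]))
        else:
            # Includes lines starting with a backslash escape such as \# or \!
            ignore.append(re.compile(line))
    return ignore, anti_ignore


def is_ignored(path, ignore, anti_ignore):
    """Anti-ignore patterns win over ignore patterns, including defaults."""
    if any(rx.search(path) for rx in anti_ignore):
        return False
    return any(rx.search(path) for rx in ignore)


SAMPLE = "\n".join([
    "# build output",        # comment
    "^build/",               # ordinary ignore pattern
    r"!^vendor/.*\.o$",      # keep vendored objects despite the \.o$ default
    r"\#.*\#$",              # pattern whose first character is a literal #
])
ignore, anti_ignore = parse_mtn_ignore(SAMPLE)
```

With this sample, vendor/lib/foo.o is not ignored even though it matches the placeholder default \.o$, which is the override case the anti-ignore list is meant for.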

Older ideas

My recommendation is that we try to move to a model where the .mtn-ignore file is the sole source of patterns to match unknown files against. It continues to contain a list of regular expressions, one per line; additionally, # at the beginning of the line (and only the beginning of the line) should mark that line as a comment. (Permit \# at the beginning of the line to specify a pattern whose first character is #.)

Any change that removes the Lua hook will have to have some sort of transition plan. This is especially true in a context where we are expecting the "default ignores" to come from .mtn-ignore as well as user-specified ignores. My recommendation here is:

  • mtn setup creates a .mtn-ignore with the default list of ignores.
    • Shall we auto-add this file as well? Otherwise people might not notice it's there (it's hidden, after all) and it won't get added to the repository on the first commit. (ThomasKeller)
  • A new command, mtn ls default-ignores, dumps the default list of ignores to stdout.
    • What about mtn ignore --list-defaults? Two reasons: a) we don't want to clutter the ls command space even more, and b) it would be handy to have mtn ignore FILE (which would put the exact file path at the end of .mtn-ignore) and mtn unignore FILE (which would do the same, but with a marker in front that negates the entry; i.e. we'd need some kind of "do not ignore this file, even if it was ignored before" syntax). (ThomasKeller)
  • We loudly announce the removal of the ignore_file hook. We tell people that if they haven't customized it, they can just do mtn ls default-ignores >> .mtn-ignore; mtn commit -m "Add defaults to .mtn-ignore". If they removed something from the default list, they will need to repeat that removal manually after doing the above; if they made some other change, they should get in touch.

Other systems

git

Documented at http://www.kernel.org/pub/software/scm/git/docs/gitignore.html. A file named .gitignore may appear in any versioned directory; it contains a list of ignore patterns to apply to that directory and any subdirectories. .gitignore files deeper in the hierarchy override shallower ones. Additionally, patterns from $WORKSPACE/.git/info/exclude and also from a configurable (and possibly global or even system-wide) exclude file are considered. Patterns are shell-style globs which may be applied to the basename or the path relative to the .gitignore file. There is syntax for comments and for cancelling patterns. There does not appear to be any equivalent of monotone's internal always-ignore list. .git/info/exclude is created on workspace initialization, but doesn't contain any patterns.

bzr

Documented at http://doc.bazaar-vcs.org/latest/en/user-reference/bzr_man.html#ignore. A file named .bzrignore may appear in the workspace (I think only in the root; the documentation does not say clearly), listing ignore patterns to apply. Patterns may be globs or regular expressions, and may be matched against the basename or the full path. There is special glob syntax for matching at least one directory. The system may fill in .bzrignore with a set of default patterns on creation of a new empty branch; again, the documentation is vague.

Mercurial

Documented at http://www.selenic.com/mercurial/hgignore.5.html. Basically the same as bzr, except that the file is named .hgignore.

darcs

Documented at http://darcs.net/manual/node5.html#SECTION00510040000000000000 (URL may not be stable). Darcs calls ignored files "boring". Newly created repositories contain an unversioned file, within the equivalent of the bookkeeping directory, containing a list of regexps that will be matched against full pathnames within that repository. The "boringfile" can be reconfigured as a versioned file. There is also a per-user boring file.

subversion

Documented at http://svnbook.red-bean.com/en/1.4/svn.advanced.props.special.ignore.html. This is the only one that doesn't use a file. The svn:ignore property on a versioned directory specifies a list of glob patterns to be matched against basenames of files in that directory only. (In practice, people have the same or nearly the same svn:ignore property on all their directories.) There is also a per-user list of globs; basenames matching any of them are always ignored.

The manual is at pains to point out that the ignore property does not affect treatment of files already under version control. Apparently a lot of Subversion users are confused and think that if a versioned file matches an ignore glob, changes to it won't be committed unless specially requested. I don't know how anyone could get that idea - does Visual SourceSafe work that way or something?

codeville

I can't find documentation on codeville's ignore feature. http://codeville.org/doc/ToDoList includes this item:

  • Fix the file ignore pattern stuff. Should probably work more like CVS.

so it's got something, but who knows what.
