AlternativeOverview

author's user page as a draft for a wikipedia article, I'm now online and can't check if it was used after all or not; in the latter case we might want to warn the user before deletion. -->

(Though there is a Monotone manual, this alternative presentation of concepts may be of use. It might be possible to refine this into an article of the right size, scope, and neutrality to be integrated into the Monotone Wikipedia article.)

P2P Database Basics

Monotone is a command-line utility that runs on many platforms and is used to create and manage monotone databases. At a basic level, a monotone database can be thought of as storing archived snapshots of directories of documents on a file system. In this way of thinking, it is like a collection of ".zip" or ".tar" files.

Archiving

Each archive snapshot is called a revision. A directory is associated with a monotone database using a command called setup; files in it are then placed under version control with add and recorded with commit. Setup only remembers which database the directory belongs to and does not create a revision. Each commit creates a new revision and automatically generates a revision ID (a 40-digit hexadecimal hash) to identify it.

Unlike adding files to a ".zip" or a ".tar", a directory that has been set up with a monotone database retains a connection to that database and is known as a workspace. A bookkeeping directory inside the workspace records which database it belongs to, which files are tracked, and the revision ID the workspace is based on. Because monotone remembers which files are tracked, you can run commit again later to record another revision containing the same set of files (plus any you add, minus any you drop).
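A toy Python model of the setup/commit cycle may make the "workspace remembers its revision" idea concrete. It is an illustration only: the `Workspace` class, `make_revision_id`, and the dict used as a database are hypothetical, and real monotone stores far more (file identities, certs, keys) in its own database format.

```python
import hashlib


def make_revision_id(parent_id, files):
    """Toy revision ID: SHA-1 over the parent ID and the tracked files."""
    h = hashlib.sha1()
    h.update((parent_id or "").encode())
    for path in sorted(files):
        h.update(path.encode() + b"\0" + files[path].encode() + b"\0")
    return h.hexdigest()


class Workspace:
    """Hypothetical stand-in for a monotone workspace (not a monotone API)."""

    def __init__(self, db):
        # Like setup: remember which database we belong to.
        # No revision exists yet, so there is no base revision.
        self.db = db
        self.base = None
        self.files = {}  # tracked path -> contents

    def commit(self):
        # Record the tracked files as a new revision whose parent is the
        # revision this workspace is based on, then remember the new ID.
        rev_id = make_revision_id(self.base, self.files)
        self.db[rev_id] = (self.base, dict(self.files))
        self.base = rev_id
        return rev_id


database = {}
ws = Workspace(database)
ws.files["hello.txt"] = "hello\n"
first = ws.commit()  # no parent
ws.files["hello.txt"] = "hello, world\n"
second = ws.commit()  # parent is `first`
```

The second commit reuses the same list of tracked files and automatically records the first revision as its parent, which is the behavior the paragraph above describes.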

Extracting

Once a revision ID is known, you can extract the snapshot that corresponds to that revision into a new directory using a command called checkout. To move an existing workspace to a different revision, use the update command instead; update only works inside a directory that is already a workspace. Just as with setup, performing a checkout keeps a connection between the resulting workspace and the monotone database.
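A minimal sketch of the difference between checkout (creates a new workspace) and update (moves an existing one). The dict-based database mapping a revision ID to a (parent ID, files) pair, and the `checkout`/`update` functions, are hypothetical simplifications, not monotone's API.

```python
def checkout(db, rev_id):
    """Make a new workspace whose files are a copy of revision `rev_id`."""
    _parent, files = db[rev_id]
    return {"base": rev_id, "files": dict(files)}


def update(db, workspace, rev_id):
    """Move an existing workspace to revision `rev_id`.

    Simplification: this toy refuses when there are uncommitted edits,
    whereas monotone's update would try to merge them in.
    """
    _parent, base_files = db[workspace["base"]]
    if workspace["files"] != base_files:
        raise RuntimeError("workspace has uncommitted changes")
    _parent, new_files = db[rev_id]
    workspace["files"] = dict(new_files)
    workspace["base"] = rev_id


# Hypothetical database: revision ID -> (parent ID, {path: contents}).
db = {
    "r1": (None, {"notes.txt": "draft\n"}),
    "r2": ("r1", {"notes.txt": "final\n"}),
}
```

For example, `ws = checkout(db, "r1")` followed by `update(db, ws, "r2")` leaves `ws` based on `r2` with the newer file contents.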

Transferring

Revisions can be transferred from one monotone database into another. The monotone application is able to run as a server and accept requests from another instance of monotone running as a client. These client requests can push revisions from the client's monotone database into one residing on the server. They can also pull revisions from the server and into a monotone database managed by the client.

(NOTE: Technically, both the client and server could run on the same machine, allowing one monotone database to be updated from another monotone database on the same file system.)

Once revisions from another database have been transferred into your monotone database, you can checkout or update to them to unpack snapshots of the files they track. No network connection is needed for these checkouts and updates, as long as the push or pull that transferred the revisions has completed.

The sync command pushes and pulls at the same time. It is slightly faster than running the two in succession, and it can use your upstream and downstream bandwidth simultaneously. (Push and pull are implemented on top of a core sync routine; in those cases one side simply does not send anything.)
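The net effect of push, pull, and sync can be sketched with plain dicts standing in for two databases (revision ID to revision data). This only models the outcome; real netsync negotiates over the network which revisions each side is missing, and the function names here are hypothetical.

```python
def pull(client, server):
    """Copy into `client` every revision that `server` has and `client` lacks."""
    for rev_id, data in server.items():
        client.setdefault(rev_id, data)


def push(client, server):
    """Copy into `server` every revision that `client` has and `server` lacks."""
    for rev_id, data in client.items():
        server.setdefault(rev_id, data)


def sync(client, server):
    """Both directions; afterwards each database holds the union.

    Real monotone does this in a single exchange; this sketch simply
    runs the two copies one after the other.
    """
    pull(client, server)
    push(client, server)
```

Because revisions are identified by their IDs, copying is idempotent: a revision that both sides already have is simply skipped.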

Because a monotone database is actually an SQLite database, it is a single file that is binary compatible across machine architectures. A monotone database usually starts out empty and is filled by commits and pulls. However, copying a prepared database that someone else has already populated may be faster than running the pulls yourself.
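Since the database is an ordinary SQLite file, generic SQLite tools can open it. The sketch below uses Python's standard `sqlite3` module to list the tables of any SQLite file in read-only mode; it makes no assumption about monotone's schema, whose table names vary between monotone versions.

```python
import sqlite3


def list_tables(db_path):
    """Return the table names in an SQLite file, opened read-only.

    Works on any SQLite file, including a monotone database; which
    tables appear depends on the monotone version that created it.
    """
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("project.mtn")` (a hypothetical file name) on a copied database is a quick way to confirm the file arrived intact. Read-only mode avoids accidentally modifying a database that monotone manages.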

Labeling

Security and network authorization between Monotone clients and servers are managed with cryptographic keys. Monotone can also attach digitally signed metadata, called certs, to individual revisions. Contributors may add as many certs as they like, but once a cert has been added it is not generally practical to remove it.
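A cert can be thought of as a (revision ID, name, value) triple plus a signature over it. Monotone signs certs with the author's public-key (RSA) key pair so anyone holding the public key can verify them; Python's standard library has no RSA, so this sketch swaps in HMAC with a shared secret purely to show the sign-then-verify shape. The function names and the cert dict layout are hypothetical.

```python
import hashlib
import hmac


def _cert_message(rev_id, name, value):
    # NUL-separated so field boundaries are unambiguous in this toy format.
    return "\0".join([rev_id, name, value]).encode()


def make_cert(key, rev_id, name, value):
    """Attach signed (name, value) metadata to a revision ID."""
    msg = _cert_message(rev_id, name, value)
    sig = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return {"rev": rev_id, "name": name, "value": value, "sig": sig}


def verify_cert(key, cert):
    """True only if the cert was signed with `key` and has not been altered."""
    msg = _cert_message(cert["rev"], cert["name"], cert["value"])
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["sig"])
```

Changing any field (for example the value of a branch cert) invalidates the signature, which is what makes signed metadata trustworthy.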

Versioning in Monotone

Most of Monotone's interesting features depend on tracking how each revision relates to the others. Technically, the revisions form a directed acyclic graph: a revision may have several parents and several children, but no revision can be its own ancestor.

Hierarchy

During a commit, a relationship is generally recorded between a "child" revision (the new one being created) and a "parent" revision (one that already exists in the monotone database). If a commit names a parent that already has a child, that parent ends up with multiple child revisions.

These relationships enable commands like "get the next/previous revision", so users do not have to keep their own list of revision IDs. When a commit is executed, the parent is assumed to be the revision the workspace is currently based on: the one last fetched from the database by checkout or update or, if neither has been run since, the one produced by the previous commit. The first commit after a setup has no parent.
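A small sketch of how "previous" and "next" can be answered from stored parent links alone: children are found by inverting the parent map. The `{revision: [parents]}` representation and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def children_of(parents):
    """Invert {revision: [parents]} into {revision: [children]}."""
    children = defaultdict(list)
    for rev, rev_parents in parents.items():
        for p in rev_parents:
            children[p].append(rev)
    return children


def previous_revisions(parents, rev):
    return list(parents[rev])


def next_revisions(parents, rev):
    return sorted(children_of(parents).get(rev, []))


def forks(parents):
    """Revisions with more than one child: places where history diverged."""
    return sorted(r for r, kids in children_of(parents).items() if len(kids) > 1)


# Hypothetical history: B and C were both committed with A as their parent.
history = {"A": [], "B": ["A"], "C": ["A"]}
```

Here `next_revisions(history, "A")` yields both `B` and `C`, and `forks(history)` reports `A` as a parent with multiple children.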

(NOTE: For efficiency, Monotone also uses ancestry as a hint for how to store revisions. A parent and child are assumed to have a large amount in common, so the database can store a "diff" instead of full copies of the changed files.)
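The idea of storing a child as a delta against its parent can be sketched with Python's `difflib.SequenceMatcher`: unchanged runs become "copy" instructions and only new lines are stored literally. This is a toy delta format for illustration, not monotone's on-disk encoding, and `make_delta`/`apply_delta` are hypothetical names.

```python
import difflib


def make_delta(parent, child):
    """Describe `child` (a list of lines) as copies from `parent` plus new lines."""
    ops = []
    matcher = difflib.SequenceMatcher(a=parent, b=child, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))  # reuse parent[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", child[j1:j2]))  # store only the new lines
        # "delete": those parent lines are simply not copied
    return ops


def apply_delta(parent, ops):
    """Rebuild the child from the parent and a delta."""
    child = []
    for op in ops:
        if op[0] == "copy":
            child.extend(parent[op[1]:op[2]])
        else:
            child.extend(op[1])
    return child
```

The more a parent and child have in common, the more of the delta consists of cheap copy ranges, which is exactly why ancestry makes a good hint for storage.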

Merging

Users on different machines often add independent children to a revision that they both pulled from a common source. When these children are brought together into the same database, it becomes apparent that one parent has multiple children. Such a fork in the ancestry is usually a sign that these revisions should be merged as soon as possible. When two revisions are merged, the resulting revision is a child of both, which ties up a "stray" leaf in the ancestry graph.
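The effect of a merge on the shape of history can be sketched by recording a revision with two parents and watching the number of loose ends drop. Only the ancestry is modeled here; computing the merged file contents is out of scope, and the function names are hypothetical.

```python
def leaves(parents):
    """Revisions that are nobody's parent, i.e. the loose ends of history."""
    has_child = {p for rev_parents in parents.values() for p in rev_parents}
    return sorted(rev for rev in parents if rev not in has_child)


def record_merge(parents, left, right, merged_rev):
    """A merge result is a child of both inputs (file contents not modeled)."""
    parents[merged_rev] = [left, right]


# Hypothetical history: two users committed B and C on top of A.
history = {"A": [], "B": ["A"], "C": ["A"]}
```

Before the merge, `leaves(history)` returns `["B", "C"]`; after `record_merge(history, "B", "C", "M")` it returns only `["M"]`.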

The large revision ID that identifies a revision (generated by either a commit or a merge) is a cryptographic hash of the file contents and the ancestry information. In terms of security, this gives strong assurance that whenever someone mentions a revision ID, it can be verified as referring to the same data the original speaker intended. Another consequence of naming revisions this way is that there is no conflict at all if two users perform a merge or commit that produces identical results: both get the same revision ID, and they can push those identical revisions into a common monotone database without incident.
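A toy content-addressed ID shows both properties: anyone can recompute the hash to verify a claimed ID, and identical contents with identical parents always produce the same ID. This is an assumption-laden sketch, not monotone's actual hashing scheme (monotone hashes a textual revision description that in turn refers to hashed file contents); `revision_id` and `verify` are hypothetical names, and sorting the parents is a choice made here so argument order does not matter.

```python
import hashlib


def revision_id(parent_ids, files):
    """Toy ID derived from the parents and the contents of each file."""
    h = hashlib.sha1()
    for p in sorted(parent_ids):
        h.update(b"parent " + p.encode() + b"\n")
    for path in sorted(files):
        content_hash = hashlib.sha1(files[path]).hexdigest()
        h.update(f"file {path} {content_hash}\n".encode())
    return h.hexdigest()


def verify(claimed_id, parent_ids, files):
    """Recompute the ID from the data and check that it matches the claim."""
    return revision_id(parent_ids, files) == claimed_id
```

Two users who independently produce the same files on top of the same parent compute the same ID, so pushing both results into one database just yields one revision.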

As part of its operation, the update command will merge the target revision into your workspace when you have local changes that you have not committed yet. This has the disadvantage of making it impossible to return your workspace to the state it was in before the update. It is almost always better to commit your working changes first, then merge, and then update your workspace. This gives your pre-merge changes a revision ID, allowing you to extract or push them later if you need to.

Branches

One of the most widespread uses of certs in Monotone is to provide a textual label known as a branch. By convention, branch names are hierarchical, which makes wildcard pattern matching convenient. The push and pull commands typically take such wildcards, as a way of saying "push or pull all the revisions whose branch matches this pattern". Creating a named branch usually expresses a temporary (or permanent) intention to develop code that is not to be merged into the "mainline" of a project. This is especially useful for revisions that are part of a parallel development effort on new, tricky, or disruptive code.
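A sketch of why hierarchical names pair well with wildcards. Python's `fnmatch.fnmatchcase` is used here as a stand-in for monotone's own glob syntax, which may differ in details (for example brace alternatives); the branch names and `select_branches` are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical branch names following the hierarchical convention.
branches = [
    "net.example.project",
    "net.example.project.experimental",
    "net.example.other",
]


def select_branches(names, pattern):
    """Return the branch names that a glob pattern selects."""
    return [name for name in names if fnmatchcase(name, pattern)]
```

A pattern such as `net.example.project*` picks up the mainline together with every branch nested under it, while `net.example.*` selects the whole family.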

Many monotone commands do not take revision IDs directly but instead derive the relevant IDs by querying branch labels. Because branch information builds on the cert system rather than on raw revision IDs, claims that a revision is the "latest" or "the most interesting one everyone should merge" can be tested cryptographically. (The commands do not check these signatures at the time of a pull, but rather when a merge or checkout is run.)

A useful command to run on a branch is heads, which lists the branch's heads: the revisions in the branch that have no descendants within that branch. Each head represents a continuing, independent line of development within the branch. Because branches are implemented using certs (and a cert can be put on any revision), it is possible to label an arbitrary revision in the history as belonging to a branch, which can give that branch a new head. Heads are typically merged together within a branch, and they can be merged across branches using the propagate command.
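A sketch of computing a branch's heads from ancestry plus branch labels: take the branch's members and discard any member that is an ancestor of another member. The dict representations (`parents`, and `branch_of` standing in for branch certs) and the function names are hypothetical.

```python
def ancestors(parents, rev):
    """All proper ancestors of `rev` in a {revision: [parents]} graph."""
    seen = set()
    stack = list(parents.get(rev, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def branch_heads(parents, branch_of, branch):
    """Members of `branch` that have no descendant inside the same branch.

    branch_of maps each revision to the set of branch names on it.
    """
    members = {rev for rev, names in branch_of.items() if branch in names}
    superseded = set()
    for rev in members:
        superseded |= ancestors(parents, rev) & members
    return sorted(members - superseded)
```

Labeling an older revision into a branch (adding to its `branch_of` entry) can add a head to that branch, mirroring the cert behavior described above.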

Histories

Currently, pushing or pulling a specific revision between monotone databases also pushes or pulls every revision in its ancestry. This means that if you join an existing project with considerable history, you cannot pull only the last few revisions. There is no fundamental technical reason why this needs to be the case, although allowing partial histories may make it harder to verify the legitimacy of the cryptographic hashes.
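The set of revisions that must travel with a requested revision is its ancestry closure, minus whatever the receiving database already has. The sketch below computes that set over a hypothetical `{revision: [parents]}` graph; the function names are illustrative only.

```python
def revisions_to_transfer(parents, requested):
    """The requested revision plus every ancestor of it."""
    needed = set()
    stack = [requested]
    while stack:
        rev = stack.pop()
        if rev not in needed:
            needed.add(rev)
            stack.extend(parents.get(rev, []))
    return needed


def missing_on_receiver(parents, requested, receiver_has):
    """What actually has to be sent, given what the receiver already stores."""
    return revisions_to_transfer(parents, requested) - set(receiver_has)
```

For a project with long history, `revisions_to_transfer` for the newest revision covers nearly the whole graph, which is why a first pull cannot be limited to recent revisions.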

A side effect of committing frequently (instead of relying on update to merge uncommitted work) is that you create intermediate revisions that others are unlikely to be interested in and that you may not wish to share. To share a single revision containing only your final changes, you can use this procedure:

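monotone diff -r starting_base_revision > /tmp/mydiff   # run inside your current workspace
monotone checkout -r starting_base_revision ../squashed
cd ../squashed
patch -p0 < /tmp/mydiff   # monotone diff paths are relative to the workspace root
monotone commit -m "Combined changes"   # first "monotone add"/"monotone drop" any files added or removed along the way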

Note that this procedure does not remove the intermediate revisions from your own database. If they carry the same branch cert as the final revision, a later push of that branch will still send them, so they need to be kept in a separate database or on a private branch. If your main concern is that others might merge your commits too soon, it is simpler not to push until the work is ready.

Used this way, frequent local commits become an acceptable way of tracking small changes to files while editing.