Subversion on Berkeley DB                                    -*- text -*-

There are many different ways to implement the Subversion filesystem
interface. You could implement it directly using ordinary POSIX
filesystem operations; you could build it using an SQL server as a
back end; you could build it on RCS; and so on.

This implementation of the Subversion filesystem interface is built on
top of Berkeley DB (http://www.sleepycat.com). Berkeley DB supports
transactions and recoverability, making it well-suited for Subversion.



Nodes and Node Revisions

In a Subversion filesystem, a `node' corresponds roughly to an
`inode' in a Unix filesystem:

   * A node is either a file or a directory.

   * A node's contents change over time.

   * When you change a node's contents, it's still the same node; it's
     just been changed. So a node's identity isn't bound to a specific
     set of contents.

   * If you rename a node, it's still the same node, just under a
     different name. So a node's identity isn't bound to a particular
     filename.

A `node revision' refers to a node's contents at a specific point in
time. Changing a node's contents always creates a new revision of that
node. Once created, a node revision's contents never change.

When we create a node, its initial contents are the initial revision of
the node. As users make changes to the node over time, we create new
revisions of that same node. When a user commits a change that deletes
a file from the filesystem, we don't delete the node, or any revision
of it --- those stick around to allow us to recreate prior revisions of
the filesystem. Instead, we just remove the reference to the node
from the directory.



ID's

Within the database, we refer to nodes and node revisions using a
string of three unique identifiers (the "node ID", the "copy ID", and
the "txn ID"), separated by periods.

    node_revision_id ::= node_id '.' copy_id '.' txn_id

The node ID is unique to a particular node in the filesystem across
all of revision history.
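
As a rough illustration of the ID syntax above, the following Python
sketch splits a node revision ID string into its three parts. The
names NodeRevisionId and parse_node_revision_id are hypothetical and
not part of Subversion, and the sample IDs are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRevisionId:
    """Hypothetical holder for the three parts of a node revision ID."""
    node_id: str
    copy_id: str
    txn_id: str


def parse_node_revision_id(text: str) -> NodeRevisionId:
    """Split 'node_id.copy_id.txn_id' into its three components."""
    parts = text.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed node revision ID: {text!r}")
    return NodeRevisionId(*parts)


# Made-up IDs: two revisions of one node carry the same node ID ("3")
# but come from different transactions ("a" and "c").
older = parse_node_revision_id("3.0.a")
newer = parse_node_revision_id("3.0.c")
print(older.node_id == newer.node_id)
```

Keeping the parts as opaque strings avoids committing to any
particular key encoding.
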
Two node revisions that share revision history (for example, because
they are different revisions of the same node, or because one is a
copy of the other) have the same node ID, whereas two node revisions
that have no common revision history will not have the same node ID.

The copy ID is a key into the `copies' table (see `Copies' below), and
identifies that a given node revision, or one of its ancestors,
resulted from a unique filesystem copy operation.

The txn ID is just an identifier that is unique to a single filesystem
commit. All node revisions created as part of a commit share this txn
ID (which, incidentally, gets its name from the fact that this id is
the same id used as the primary key of Subversion transactions; see
`Transactions' below).

A directory entry identifies the file or subdirectory it refers to
using a node revision ID --- not a node ID. This means that a change
to a file far down in a directory hierarchy requires the parent
directory of the changed node to be updated, to hold the new node
revision ID. Now, since that parent directory has changed, its parent
needs to be updated, and so on to the root. We call this process
"bubble-up".

If a particular subtree was unaffected by a given commit, the node
revision ID that appears in its parent will be unchanged. When
doing an update, we can notice this, and ignore that entire
subtree. This makes it efficient to find localized changes in
large trees.



A Word About Keys

Some of the Subversion database tables use base-36 numbers as their
keys. Some debate exists about whether the use of base-36 (as opposed
to, say, regular decimal values) is either necessary or good. It is
outside the scope of this document to make a claim for or against this
usage. As such, the reader will please note that for the majority of
the document, the use of the term "number" when referring to keys of
database tables should be interpreted to mean "a monotonically
increasing unique key whose order with respect to other keys in the
table is irrelevant". :-)

To determine the actual type currently in use for the keys of a given
table, you are invited to check out the "Appendix: Filesystem
structure summary" section of this document.



NODE-REVISION and HEADER: how we represent a node revision

We represent a given revision of a file or directory node using a list
skel (see skel.h for an explanation of skels). A node revision skel
has the form:

    (HEADER PROP-KEY KIND-SPECIFIC ...)

where HEADER is a header skel, whose structure is common to all nodes,
PROP-KEY is the key of the representation that contains this node's
properties list, and the KIND-SPECIFIC elements carry data dependent
on what kind of node this is --- file, directory, etc.

HEADER has the form:

    (KIND CREATED-PATH PRED-ID PRED-COUNT)

where:

   * KIND indicates what sort of node this is. It must be one of the
     following:
     - "file", indicating that the node is a file (see FILE below).
     - "dir", indicating that the node is a directory (see DIR below).

   * CREATED-PATH is the canonicalized absolute filesystem path at
     which this node was created.

   * PRED-ID, if present, indicates the node revision which is the
     immediate ancestor of this node.

   * PRED-COUNT, if present, indicates the number of predecessors the
     node revision has (recursively).

Note that a node cannot change its kind from one revision to the next.
A directory node is always a directory; a file node is always a file;
etc.
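
To make the HEADER layout concrete, here is a minimal Python sketch
that models an already-parsed header skel as a plain list of strings
and checks the rules stated above. The check_header and same_kind
helpers and the sample values are hypothetical. Real skels are handled
by the code described in skel.h, which this sketch does not attempt to
reproduce.

```python
VALID_KINDS = {"file", "dir"}


def check_header(header):
    """Validate a header modeled as [KIND, CREATED-PATH, PRED-ID?, PRED-COUNT?]."""
    if not 2 <= len(header) <= 4:
        raise ValueError("expected KIND and CREATED-PATH, plus optional PRED fields")
    kind, created_path = header[0], header[1]
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown node kind: {kind!r}")
    # Assumption: an absolute filesystem path starts with "/".
    if not created_path.startswith("/"):
        raise ValueError("CREATED-PATH must be an absolute path")
    return kind


def same_kind(pred_header, succ_header):
    """Return True if a revision and its successor have the same kind."""
    return check_header(pred_header) == check_header(succ_header)


# Made-up headers for two successive revisions of one file node.
first = ["file", "/trunk/README"]
second = ["file", "/trunk/README", "3.0.a", "1"]
print(same_kind(first, second))
```
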
The fact that the node's kind is stored in each node revision,
rather than in some revision-independent place, might suggest that
it's possible for a node to change kinds from revision to revision,
but Subversion does not allow this.

PROP-KEY is a key into the `representations' table (see
REPRESENTATIONS below), whose value is a representation pointing to a
string (see `strings' table) that is a PROPLIST skel.

The KIND-SPECIFIC portions are discussed below.



PROPLIST: a property list is a list skel of the form:

    (NAME1 VALUE1 NAME2 VALUE2 ...)

where each NAMEi is the name of a property, and VALUEi is the value of
the property named NAMEi. Every valid property list has an even
number of elements.



FILE: how files are represented.

If a NODE-REVISION's header's KIND is "file", then the node-revision
skel represents a file, and has the form:

    (HEADER PROP-KEY DATA-KEY [EDIT-DATA-KEY])

where DATA-KEY identifies the representation for the file's current
contents, and EDIT-DATA-KEY identifies the representation currently
available for receiving new contents for the file.

See discussion of representations later.



DIR: how directories are represented.

If the header's KIND is "dir", then the node-revision skel
represents a directory, and has the form:

    (HEADER PROP-KEY ENTRIES-KEY)

where ENTRIES-KEY identifies the representation for the directory's
entries list (see discussion of representations later). An entries
list has the form

    (ENTRY ...)

where each entry is

    (NAME ID)

where:

   * NAME is the name of the directory entry, in UTF-8, and

   * ID is the ID of the node revision to which this entry refers



REPRESENTATIONS: where and how Subversion stores your data.

Some parts of a node revision are essentially constant-length: for
example, the KIND field and the REV. Other parts can have
arbitrarily varying length: property lists, file contents, and
directory entry lists.
This variable-length data is often similar
from one revision to the next, so Subversion stores just the deltas
between them, instead of successive fulltexts.

The HEADER portion of a node revision holds the constant-length stuff,
which is never deltified. The rest of a node revision just points to
data stored outside the node revision proper. This design makes the
repository code easier to maintain, because deltification and
undeltification are confined to a layer separate from node revisions,
and makes the code more efficient, because Subversion can retrieve
just the parts of a node it needs for a given operation.

Deltifiable data is stored in the `strings' table, as mediated by the
`representations' table. Here's how it works:

The `strings' table stores only raw bytes. A given string could be
any one of these:

   - a file's contents
   - a delta that reconstructs file contents, or part of a file's contents
   - a directory entry list skel
   - a delta that reconstructs a dir entry list skel, or part of same
   - a property list skel
   - a delta that reconstructs a property list skel, or part of same

There is no way to tell, from looking at a string, what kind of data
it is. A directory entry list skel is indistinguishable from file
contents that just happen to look exactly like the unparsed form of a
directory entry list skel. File contents that just happen to look
like svndiff data are indistinguishable from delta data.

The code is able to interpret a given string because Subversion

   a) knows whether to be looking for a property list or some
      kind-specific data,

   b) knows the `kind' of the node revision in question,

   c) always goes through the `representations' table to discover if
      any undeltification or other transformation is needed.

The `representations' table is an intermediary between node revisions
and strings.
Node revisions never refer directly into the `strings'
table; instead, they always refer into the `representations' table,
which knows whether a given string is a fulltext or a delta, and if it
is a delta, what it is a delta against. That, combined with the
knowledge in (a) and (b) above, allows Subversion to retrieve the data
and parse it appropriately. A representation has the form:

   (HEADER KIND-SPECIFIC)

where HEADER is

   (KIND TXN [CHECKSUM])

The KIND is "fulltext" or "delta". TXN is the txn ID for the txn in
which this representation was created. CHECKSUM is a checksum of the
representation's contents, that is, what the representation produces,
regardless of whether it is stored deltified or as fulltext. (For
compatibility with older versions of Subversion, CHECKSUM may be
absent, in which case the filesystem behaves as though the checksum is
there and is correct.)

The TXN also serves as a kind of mutability flag: if txn T tries to
change a representation's contents, but the rep's TXN is not T, then
something has gone horribly wrong and T should leave the rep alone
(and probably error). Of course, "change a representation" here means
changing what the rep's consumer sees. Switching a representation's
storage strategy, for example from fulltext to deltified, wouldn't
count as a change, since that wouldn't affect what the rep produces.

KIND-SPECIFIC varies considerably depending on the kind of
representation. Here are the two forms currently recognized:

   (("fulltext" TXN CHECKSUM) KEY)

       The data is at KEY in the `strings' table.

   (("delta" TXN CHECKSUM) (OFFSET WINDOW) ...)

       Each OFFSET indicates the point in the fulltext that this
       element reconstructs, and WINDOW says how to reconstruct it:

       WINDOW ::= (DIFF SIZE REP-KEY [REP-OFFSET]) ;
       DIFF   ::= ("svndiff" VERSION STRING-KEY)

       Notice that a WINDOW holds only metadata.
       REP-KEY says what the window should be applied against, or none
       if this is a self-compressed delta; SIZE says how much data
       this window reconstructs; VERSION says what version of the
       svndiff format is being used (currently only version 0 is
       supported); and STRING-KEY says which string contains the
       actual svndiff data (there is no diff data held directly in
       the representations table, of course).

       Note also that REP-KEY might refer to a representation that
       itself requires undeltification. We use a delta combiner to
       combine all the deltas needed to reproduce the fulltext from
       some stored plaintext.

       Branko says this is what REP-OFFSET is for:
       > The offsets embedded in the svndiff are stored in a string;
       > these offsets would be in the representation. The point is that
       > you get all the information you need to select the appropriate
       > windows from the rep skel -- without touching a single
       > string. This means a bit more space used in the repository, but
       > lots less memory used on the server.

       We'll see if it turns out to be necessary.

In the future, there may be other representations, for example
indicating that the text is stored elsewhere in the database, or
perhaps in an ordinary Unix file.

Let's work through an example node revision:

   (("file" REV COUNT) PROP-KEY "2345")

The entry for key "2345" in `representations' is:

   (("delta" TXN CHECKSUM) (0 (("svndiff" 0 "1729") 65 "2343")))

and the entry for key "2343" in `representations' is:

   (("fulltext" TXN CHECKSUM) "1001")

while the entry for key "1729" in `strings' is:

   <some unprintable glob of svndiff data>

which, when applied to the fulltext at key "1001" in strings, results
in this new fulltext:

   "((some text) (that looks) (deceptively like) (directory entries))"

Et voila! Subversion knew enough, via the `representations' and
`strings' tables, to undeltify and get that fulltext; and knew enough,