       (there is no diff data held directly in the representations
       table, of course).

       Note also that REP-KEY might refer to a representation that
       itself requires undeltification.  We use a delta combiner to
       combine all the deltas needed to reproduce the fulltext from
       some stored plaintext.

       Branko says this is what REP-OFFSET is for:
       > The offsets embedded in the svndiff are stored in a string;
       > these offsets would be in the representation. The point is that
       > you get all the information you need to select the appropriate
       > windows from the rep skel -- without touching a single
       > string. This means a bit more space used in the repository, but
       > lots less memory used on the server.

       We'll see if it turns out to be necessary.

In the future, there may be other representations, for example
indicating that the text is stored elsewhere in the database, or
perhaps in an ordinary Unix file.

Let's work through an example node revision:

   (("file" REV COUNT) PROP-KEY "2345")

The entry for key "2345" in `representations' is:

   (("delta" TXN CHECKSUM) (0 (("svndiff" 0 "1729") 65 "2343")))

and the entry for key "2343" in `representations' is:

   (("fulltext" TXN CHECKSUM) "1001")

while the entry for key "1729" in `strings' is:

   <some unprintable glob of svndiff data>

which, when applied to the fulltext at key "1001" in strings, results
in this new fulltext:

   "((some text) (that looks) (deceptively like) (directory entries))"

Et voila!  Subversion knew enough, via the `representations' and
`strings' tables, to undeltify and get that fulltext; and knew enough,
because of the node revision's "file" type, to interpret the result as
file contents, not as a directory entry list.
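
The lookup chain in this example can be sketched in Python.  This is a
toy model, not Subversion's code: the two tables are plain dicts, the
skels are flattened into small tuples, and the "delta" is a made-up
list of copy/insert instructions standing in for real (binary) svndiff
data.  The names `undeltify' and `apply_toy_delta' are hypothetical,
and the sketch rebuilds each base fulltext recursively rather than
using a delta combiner.

```python
# Toy stand-ins for the `strings' and `representations' tables.
# ASSUMPTION: a delta string is a list of instructions, where
# ("copy", offset, length) copies from the base fulltext and
# ("insert", text) adds new text.  Real svndiff data is binary.
strings = {
    "1001": "((some text) (that looks) (like) (directory entries))",
    "1729": [
        ("copy", 0, 26),
        ("insert", "(deceptively like)"),
        ("copy", 32, 21),
    ],
}

# ASSUMPTION: simplified rep layout.
#   ("fulltext", STRING-KEY)
#   ("delta", DELTA-STRING-KEY, BASE-REP-KEY)
representations = {
    "2345": ("delta", "1729", "2343"),
    "2343": ("fulltext", "1001"),
}


def apply_toy_delta(base, ops):
    """Apply the toy copy/insert instructions to the base fulltext."""
    out = []
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out.append(base[offset:offset + length])
        elif op[0] == "insert":
            out.append(op[1])
        else:
            raise ValueError("unknown instruction: %r" % (op,))
    return "".join(out)


def undeltify(rep_key):
    """Return the fulltext for REP-KEY, following delta bases as needed."""
    rep = representations[rep_key]
    if rep[0] == "fulltext":
        return strings[rep[1]]
    _, delta_key, base_rep_key = rep
    # The base may itself be a delta, so recurse until we hit a fulltext.
    base = undeltify(base_rep_key)
    return apply_toy_delta(base, strings[delta_key])


print(undeltify("2345"))
```

Note that nothing in the sketch looks at node revisions: the walk
touches only rep keys and string keys.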

(Note that the `strings' table stores multiple DB values per key.
That is, although it's accurate to say there is one string per key,
the string may be divided into multiple consecutive blocks, all
sharing that key.  You use a Berkeley DB cursor to find the desired
value[s], when retrieving a particular offset+len in a string.)
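
As a rough illustration of reading an offset+len range out of a string
that is split into blocks, here is a Python sketch.  It assumes the
blocks for one key have already been fetched, in order, into a list
(in the real table a Berkeley DB cursor would walk them); the function
name `read_string' is hypothetical.

```python
def read_string(blocks, offset, length):
    """Return up to `length` bytes starting at `offset`.

    `blocks` is the ordered list of byte blocks sharing one string key.
    """
    out = []
    pos = 0  # position of the current block's first byte in the string
    for block in blocks:
        if length <= 0:
            break
        end = pos + len(block)
        if offset < end:
            # The wanted range starts (or continues) inside this block.
            start = offset - pos
            piece = block[start:start + length]
            out.append(piece)
            offset += len(piece)
            length -= len(piece)
        pos = end
    return b"".join(out)


blocks = [b"abcd", b"efgh", b"ij"]
print(read_string(blocks, 2, 5))
```

A request that runs past the end of the string simply returns the
bytes that exist, which is a choice made for this sketch only.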

Representations know nothing about ancestry -- the `representations'
table never refers to node revision id's, only to strings or to other
representations.  In other words, while the `nodes' table allows
recovery of ancestry information, the `representations' and `strings'
tables together handle deltification and undeltification
*independently* of ancestry.  At present, Subversion generally stores
the youngest strings in "fulltext" form, and older strings as "delta"s
against them.  However, there's nothing magic about that particular
arrangement.  Other interesting alternatives:

   * We could store the N most recently accessed strings as fulltexts,
     letting access patterns determine the most appropriate
     representation for each revision.

   * We could occasionally store deltas against the N'th younger
     revision, storing larger jumps with a frequency inverse to the
     distance covered, yielding a tree-structured history.

Since the filesystem interface doesn't expose these details, we can
change the representation pretty much as we please to optimize
whatever parameter we care about --- storage size, speed, robustness,
etc.

Representations never share strings.  Every string is represented by
exactly one representation; every representation represents exactly
one string.  This is so we can replace a string with a deltified version
of itself, change the representation referring to it, and know that
we're not messing up any other reps by doing so.


Further Notes On Deltifying:
----------------------------

When a representation is deltified, it is changed in place, along with
its underlying string.  That is, the node revision referring to that
representation will not be changed; instead, the same rep key will now
be associated with a different value.  That way, we get reader locking
for free: if someone is reading a file while Subversion is deltifying
that file, one of the two sides will get a DB_DEADLOCK and
svn_fs__retry_txn() will retry.
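
The retry behaviour can be pictured with a small Python sketch.  It is
not the signature or the logic of svn_fs__retry_txn(); the `Deadlock'
exception stands in for DB_DEADLOCK, and the `max_attempts' bound is an
assumption added so the example terminates.

```python
class Deadlock(Exception):
    """Hypothetical stand-in for Berkeley DB's DB_DEADLOCK error."""


def retry_txn(body, max_attempts=5):
    """Run body() and rerun it from scratch if it hits a deadlock.

    ASSUMPTION: the attempt limit is illustrative only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return body()
        except Deadlock:
            if attempt == max_attempts:
                raise
            # Otherwise fall through and run the whole body again.


# Usage: a reader that loses the deadlock twice, then succeeds.
calls = {"n": 0}


def read_file():
    calls["n"] += 1
    if calls["n"] < 3:
        raise Deadlock()
    return "file contents"


print(retry_txn(read_file))
```

Because the rep key stays the same while its value changes, the
retried body just looks the key up again and sees whichever value is
current.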

### todo: add a note about cycle-checking here, too.



The Berkeley DB "nodes" table

The database contains a table called "nodes", which is a btree indexed
by node revision ID's, mapping them onto REPRESENTATION skels.  Node 0
is always the root directory, and node revision ID 0.0.0 is always the
empty directory.  We use the value of the key 'next-id' to indicate
the next unused node ID.

Assuming that we store the most recent revision on every branch as
fulltext, and all other revisions as deltas, we can retrieve any node
revision by searching for the last revision of the node, and then
walking backwards to the specific revision we desire, applying deltas as
we go.



REVISION: filesystem revisions, and the Berkeley DB "revisions" table

We represent a filesystem revision using a skel of the form:

    ("revision" TXN)

where TXN is the key into the `transactions' table (see 'Transactions' below)
whose value is the transaction that was committed to create this revision.

The database contains a table called "revisions", which is a
record-number table mapping revision numbers onto REVISION skels.
Since Berkeley DB record numbers start with 1, whereas Subversion
filesystem revision numbers start at zero, revision V is stored as
record number V+1 in the `revisions' table.  Filesystem revision zero
always has node revision 0.0.0 as its root directory; that node
revision is guaranteed to be an empty directory.
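
The off-by-one mapping is easy to get wrong, so here is a minimal
Python sketch of it.  The dict stands in for the record-number table,
and the helper names are hypothetical.

```python
def rev_to_recno(rev):
    """Map a filesystem revision number (from 0) to a record number (from 1)."""
    if rev < 0:
        raise ValueError("revision numbers start at zero")
    return rev + 1


def recno_to_rev(recno):
    """Map a record number (from 1) back to a revision number (from 0)."""
    if recno < 1:
        raise ValueError("record numbers start at one")
    return recno - 1


# Toy `revisions' table: record number -> REVISION skel (as a tuple).
revisions = {}


def put_revision(rev, txn_id):
    revisions[rev_to_recno(rev)] = ("revision", txn_id)


def get_revision(rev):
    return revisions[rev_to_recno(rev)]


put_revision(0, "0")
put_revision(1, "7")
print(get_revision(1))
```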



Transactions

Every transaction ends when it is either successfully committed, or
aborted.  We call a transaction which has been either committed or
aborted "finished", and one which hasn't "unfinished".  

Transactions are identified by unique numbers, called transaction
ID's.  Currently, transaction ID's are never reused, though this is
not mandated by the schema.  In the database, we always represent a
transaction ID in its shortest ASCII form.

The Berkeley DB `transactions' table records both unfinished and
committed transactions.  Every key in this table is a transaction ID.
Unfinished transactions have values that are skels of one of the
following forms:

   ("transaction" ROOT-ID BASE-ID PROPLIST COPIES)
   ("dead" ROOT-ID BASE-ID PROPLIST COPIES)

where:

   * ROOT-ID is the node revision ID of the transaction's root
     directory.

   * BASE-ID is the node revision ID of the root of the transaction's
     base revision.

   * PROPLIST is a skel giving the revision properties for the
     transaction.

   * COPIES contains a list of keys into the `copies' table,
     referencing all the filesystem copies created inside of this
     transaction.  If the transaction is aborted, these copies get
     removed from the `copies' table.

   * A "dead" transaction is one that has been requested to be
     destroyed, and should never, ever, be committed.

Committed transactions, however, have values that are skels of the form:

   ("committed" ROOT-ID REV PROPLIST COPIES)

where:

   * ROOT-ID is the node revision ID of the committed transaction's (or
     revision's) root node.

   * REV represents the revision that was created when the
     transaction was committed.

   * PROPLIST is a skel giving the revision properties for the
     committed transaction.

   * COPIES contains a list of keys into the `copies' table,
     referencing all the filesystem copies created by this committed
     transaction.  Nothing currently uses this information for
     committed transactions, but it could be useful in the future.

As the sole exception to the rule above, the `transactions' table
always has one entry whose key is `next-id', and whose value is the
lowest transaction ID that has never yet been used.  We use this entry
to allocate ID's for new transactions.
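
A sketch of how allocation via `next-id' could look, in Python.  The
dict stands in for the `transactions' table.  Treating IDs as decimal
numbers is an assumption of this sketch; the text above only says they
are stored in their shortest ASCII form.  `begin_txn' is a
hypothetical name.

```python
def begin_txn(table, root_id, base_id):
    """Allocate a fresh transaction ID and store an unfinished transaction."""
    txn_id = table["next-id"]
    # ASSUMPTION: IDs are decimal; str(int(...)) never adds leading zeros,
    # so the stored form stays the shortest one.
    table["next-id"] = str(int(txn_id) + 1)
    # ("transaction" ROOT-ID BASE-ID PROPLIST COPIES), with empty lists
    # standing in for the PROPLIST and COPIES skels.
    table[txn_id] = ("transaction", root_id, base_id, [], [])
    return txn_id


transactions = {"next-id": "0"}
first = begin_txn(transactions, "0.0.0", "0.0.0")
second = begin_txn(transactions, "0.0.0", "0.0.0")
print(first, second, transactions["next-id"])
```

In the real database both steps would have to happen inside one
Berkeley DB transaction, so that two writers cannot be handed the same
ID.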

The `transactions' table is a btree, with no particular sort order.



Changes

As modifications are made (files and dirs added or removed, text and
properties changed, etc.) on Subversion transaction trees, the
filesystem tracks the basic change made in the Berkeley DB `changes'
table.  

The `changes' table is a btree with Berkeley's "duplicate keys"
functionality (and with no particular sort order), and maps the
one-to-many relationship of a transaction ID to a "change" item.
Change items are skels of the form:

   ("change" PATH ID CHANGE-KIND TEXT-MOD PROP-MOD)

where:

   * PATH is the path that was operated on to enact this change.

   * ID is the node revision ID of the node changed (may be a zero
     atom, but only in the "reset" kind case).

   * CHANGE-KIND is one of the following:

     - "add"     : PATH/ID was added to the filesystem.
     - "delete"  : PATH/ID was removed from the filesystem.
     - "replace" : PATH/ID was removed, then re-added to the filesystem.
     - "modify"  : PATH/ID was otherwise modified.
     - "reset"   : Ignore any previous changes for PATH/ID in this txn.

   * TEXT-MOD is a bit specifying whether or not the contents of
     this node were modified.

   * PROP-MOD is a bit specifying whether or not the properties of
     this node were modified.

In order to fully describe the changes made to any given path as part
of a single transaction, one must read all the change items associated
with the transaction's ID, and "collapse" multiple entries that refer
to that path. 
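
One way to picture the "collapse" step is the Python sketch below.
The folding rules in it are assumptions chosen for illustration, not
the rules Subversion actually applies: a "reset" discards what came
before, a "modify" keeps the earlier kind, an "add" after a "delete"
becomes a "replace", and the TEXT-MOD and PROP-MOD bits are OR'ed
together.  The function name `collapse_changes' is hypothetical.

```python
def collapse_changes(changes):
    """Fold ordered change items into one summary per path.

    Each item is (path, node_id, kind, text_mod, prop_mod).
    Returns {path: (node_id, kind, text_mod, prop_mod)}.
    """
    result = {}
    for path, node_id, kind, text_mod, prop_mod in changes:
        if kind == "reset":
            # Ignore any previous changes for this path.
            result.pop(path, None)
            continue
        prev = result.get(path)
        if prev is None:
            result[path] = (node_id, kind, text_mod, prop_mod)
            continue
        _, prev_kind, prev_text, prev_prop = prev
        if kind == "modify":
            new_kind = prev_kind      # e.g. a modified add is still an add
        elif kind == "add" and prev_kind == "delete":
            new_kind = "replace"      # removed, then re-added
        else:
            new_kind = kind
        result[path] = (node_id, new_kind,
                        prev_text or text_mod, prev_prop or prop_mod)
    return result


changes = [
    ("/a", "1.0.1", "add", True, False),
    ("/a", "1.0.1", "modify", False, True),
    ("/b", "2.0.1", "delete", False, False),
    ("/b", "3.0.1", "add", True, False),
    ("/c", "4.0.1", "modify", True, False),
    ("/c", "0", "reset", False, False),
]
for path, summary in sorted(collapse_changes(changes).items()):
    print(path, summary)
```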



Copies

Each time a filesystem copy operation is performed, Subversion records
meta-data about that copy.  

Copies are identified by unique numbers called copy ID's.  Currently,
copy ID's are never reused, though this is not mandated by the schema.
In the database, we always represent a copy ID in its shortest ASCII
form.

The Berkeley DB `copies' table records all filesystem copies.  Every
key in this table is a copy ID, and every value is a skel of one of the
following forms:

   ("copy" SRC-PATH SRC-TXN DST-NODE-ID)
   ("soft-copy" SRC-PATH SRC-TXN DST-NODE-ID)

where:

   * SRC-PATH and SRC-TXN are the canonicalized absolute path and
     transaction ID, respectively, of the source of the copy.

   * DST-NODE-ID represents the new node revision created as a result
     of the copy.

As the sole exception to the rule above, the `copies' table always has
one entry whose key is `next-id', and whose value is the lowest copy ID
that has never yet been used.  We use this entry to allocate new
copy ID's.

The `copies' table is a btree, with no particular sort order.



Merge rules

The Subversion filesystem must provide the following characteristics:

- clients can submit arbitrary rearrangements of the tree, to be
  performed as atomic changes to the filesystem tree
- multiple clients can submit non-overlapping changes at the same time,
  without blocking
- readers must never block other readers or writers
- writers must never block readers
- writers may block writers

Merging rules:

   The general principle: a series of changes can be merged iff the
   final outcome is independent of the order you apply them in.
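
   To make the principle concrete, here is a brute-force Python check.
   It is not Subversion's merge algorithm: the tree is flattened into a
   dict from path to contents, a change is a hypothetical
   (op, path, value) tuple, and every ordering is tried, which is only
   practical for a handful of changes.

```python
from itertools import permutations


def apply_change(tree, change):
    """Return a new tree with one ("set" or "delete") change applied."""
    op, path, value = change
    new_tree = dict(tree)
    if op == "set":
        new_tree[path] = value
    elif op == "delete":
        new_tree.pop(path, None)
    else:
        raise ValueError("unknown op: %r" % (op,))
    return new_tree


def mergeable(tree, changes):
    """True iff every ordering of `changes` yields the same final tree."""
    outcomes = set()
    for order in permutations(changes):
        result = tree
        for change in order:
            result = apply_change(result, change)
        outcomes.add(frozenset(result.items()))
    return len(outcomes) == 1


base = {"/a": "one", "/b": "two"}

# Non-overlapping changes: order does not matter.
print(mergeable(base, [("set", "/a", "uno"), ("delete", "/b", None)]))

# Two different edits to the same path: order matters.
print(mergeable(base, [("set", "/a", "uno"), ("set", "/a", "eins")]))
```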

Merging two nodes, A and B, with respect to a common ancestor
ANCESTOR: