The Tradeoff
============

CVS indexes tags by file. When you create a CVS tag, a label must be
applied to every single file in the tree you're tagging. That takes
O(N) time, where N is the size of that tree. The tradeoff, however, is
that someone can look at a specific version of a file and see *all*
the tag-labels attached to it.

Subversion's repository has the data indexed in the other direction.
When you create an SVN tag, it makes a single new directory node that
points to an existing tree. That takes O(1) (constant) time. The
tradeoff, however, is that a version of a file is 'shared' by any
number of directory paths. That means it takes O(N) time to find
every tag that contains a specific file.

Why?
====

Why does Subversion index data this way? There are a few reasons the
designers chose to do this. Having branches and tags in normal
directory space makes it easy to browse them, easy to do access
control on them, and (of course) they're automatically versioned.

Also, the designers thought this would be optimizing the right
operations. Organizations tend to create branches and tags quite
frequently -- much more frequently than asking the question "which
tags contain a specific file?" So if you can only make one of these
operations O(1), you want it to be tagging. (Of course, if we knew a
way to make them *both* cheap, that would be the best solution! But
we haven't found a way yet.)

Questions that Users Ask
========================

Here are some questions Subversion users might ask, and how Subversion
deals with each question.

1. "What version of foo.c is in tag X?"

This is the easiest question to answer. Go into the tag-tree, and
look at the version of foo.c it contains.

(This can be done with a simple "svn ls -v URL", where URL is a path
to the specific tag directory. Look at the first column of numbers.)

2. "Does tag X contain the latest version of foo.c?"

This is a bit harder to answer. From question 1, it's easy to see
that tag X contains version N of foo.c.
But how do we know if that's the *latest* foo.c?

Running 'svn log' on version N of foo.c won't help, because it only
goes backwards in time. That is, it only shows predecessor nodes, not
successor nodes.

Subversion-1.0 uses BerkeleyDB. The only reason 'svn log' shows
predecessor nodes easily is that each node contains a back-pointer
to its predecessor. It would be extremely painful to search BDB for
successors; BDB is mostly a glorified hashtable with transactions.

   [Note from kfogel: Is there some reason we can't store successor
   nodes at commit time? That is, if N is a direct successor of M,
   then when we create N, we add it to M's "successors list". Then we
   could track forward as well as backward... Nothing against having
   an SQL backend someday, of course, just pointing out that this
   particular problem can be solved simply in Berkeley DB.]

Post-1.0 Subversion, however, will be able to use a SQL backend, and
then it will be very quick and easy to query for node successors. At
that point, Subversion could make nice "complete history" graphs of
nodes, just like ClearCase does.

3. "Which tags contain version N of foo.c?"

This is the killer question, and the crux of the Tradeoff mentioned at
the beginning of this document. Because Subversion has O(1) tagging,
the only way to answer this question is by brute-force searching.

But there are two consolations to this tradeoff:

   A) "Rethink your work habits"

      From experience, when users ask question #3, it can very often
      be rephrased as a question about a *specific* tag. Very often,
      the manager doesn't really want to see the exhaustive list of
      every tag containing the file; instead, they simply want to know
      if a *certain* tag has the file. ("Did we give that file to a
      particular customer?") It turns into a "type-1" question.

      If you're used to CVS, it's very easy to instantly get the list
      of all tags attached to a file-version.
      So you come to rely on that, and use the tag list as your main
      means of answering all your type-1 questions. But the list is
      certainly not *required* to answer type-1 questions.

   B) "Build a cache"

      If a brute-force search is ever performed, it shouldn't be too
      difficult to cache the results of the search, because repository
      trees are immutable. That means the next time somebody runs the
      search, the search becomes *much* smaller. Eventually, the
      search can dwindle down into what feels like O(1) time, at least
      when viewed from a distance. :-)
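
The caching idea in B) can be sketched with a toy model. This is not
Subversion code: FileNode, DirNode, TagIndex and the "name@rev" ids are
all hypothetical names invented for illustration, and the only
assumption taken from this document is that trees are immutable and
may be shared by several tags. Under that assumption, a brute-force
answer computed once for a directory node stays valid forever, so it
can be memoized.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class FileNode:
    node_id: str  # hypothetical id for one immutable file version


@dataclass(frozen=True)
class DirNode:
    node_id: str  # hypothetical id for one immutable directory version
    entries: Tuple[Tuple[str, "Node"], ...]  # (name, child) pairs


Node = Union[FileNode, DirNode]


class TagIndex:
    """Toy repository: tags are pointers to existing immutable trees."""

    def __init__(self) -> None:
        self.tags: Dict[str, DirNode] = {}
        self.visits = 0  # directory nodes actually searched so far
        self._cache: Dict[Tuple[str, str], bool] = {}

    def create_tag(self, name: str, root: DirNode) -> None:
        # O(1): one new pointer, no per-file work.
        self.tags[name] = root

    def _contains(self, node: Node, file_id: str) -> bool:
        if isinstance(node, FileNode):
            return node.node_id == file_id
        key = (node.node_id, file_id)
        if key not in self._cache:
            # Brute-force walk, done at most once per (directory, file)
            # pair because the directory can never change afterwards.
            self.visits += 1
            self._cache[key] = any(
                self._contains(child, file_id) for _, child in node.entries
            )
        return self._cache[key]

    def tags_containing(self, file_id: str) -> List[str]:
        return sorted(
            name
            for name, root in self.tags.items()
            if self._contains(root, file_id)
        )


# Two trees that differ only in the version of foo.c they hold.
foo7, foo9, bar3 = FileNode("foo.c@7"), FileNode("foo.c@9"), FileNode("bar.c@3")
root_a = DirNode("root@10", (("src", DirNode("src@10", (("foo.c", foo7), ("bar.c", bar3)))),))
root_b = DirNode("root@12", (("src", DirNode("src@12", (("foo.c", foo9), ("bar.c", bar3)))),))

index = TagIndex()
index.create_tag("release-1.0", root_a)
index.create_tag("release-1.1", root_b)
index.create_tag("customer-x", root_a)  # shares release-1.0's tree

print(index.tags_containing("foo.c@7"))  # ['customer-x', 'release-1.0']
print(index.visits)                      # 4
print(index.tags_containing("foo.c@7"))  # same answer, served from cache
print(index.visits)                      # still 4
```

In this sketch the first search walks four directory nodes, and every
later search for the same file walks none, even after new tags are
created on trees that were already searched. A real cache would also
need a storage and eviction policy, which this toy model ignores.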