The Subversion Project:  Building a Better CVS
==============================================

Ben Collins-Sussman  <sussman@collab.net>

Written in August 2001
Published in Linux Journal, January 2002

Abstract
--------

This article discusses the history, goals, features and design of
Subversion (http://subversion.tigris.org), an open-source project that
aims to produce a compelling replacement for CVS.

Introduction
------------

If you work on any kind of open-source project, you've probably worked
with CVS.  You probably remember the first time you learned to do an
anonymous checkout of a source tree over the net -- or your first
commit, or learning how to look at CVS diffs.  And then the fateful
day came: you asked your friend how to rename a file.

"You can't", was the reply.

What?  What do you mean?

"Well, you can delete the file from the repository and then re-add it
under a new name."

Yes, but then nobody would know it had been renamed...

"Let's call the CVS administrator.  She can hand-edit the repository's
RCS files for us and possibly make things work."

What?

"And by the way, don't try to delete a directory either."

You rolled your eyes and groaned.  How could such simple tasks be
difficult?

The Legacy of CVS
-----------------

No doubt about it, CVS has evolved into the standard Software
Configuration Management (SCM) system of the open source community.
And rightly so!  CVS itself is Free software, and its wonderful
"non-locking" development model -- whereby dozens of far-flung
programmers collaborate -- fits the open-source world very well.  In
fact, one might argue that without CVS, it's doubtful whether sites
like Freshmeat or Sourceforge would ever have flourished as they do
now.  CVS and its semi-chaotic development model have become an
essential part of open source culture.

So what's wrong with CVS?

Because it uses the RCS storage-system under the hood, CVS can only
track file contents, not tree structures.  As a result, the user has
no way to copy, move, or rename items without losing history.
Tree rearrangements are always ugly server-side tweaks.

The RCS back-end cannot store binary files efficiently, and branching
and tagging operations can grow to be very slow.  CVS also uses the
network inefficiently; many users are annoyed by long waits, because
file differences are sent in only one direction (from server to
client, but not from client to server), and binary files are always
transmitted in their entirety.

From a developer's standpoint, the CVS codebase is the result of
layers upon layers of historical "hacks".  (Remember that CVS began
life as a collection of shell-scripts to drive RCS.)  This makes the
code difficult to understand, maintain, or extend.  For example: CVS's
networking ability was essentially "stapled on".  It was never
designed to be a native client-server system.

Rectifying CVS's problems is a huge task -- and we've listed only a
few of the many common complaints here.

Enter Subversion
----------------

In 1995, Karl Fogel and Jim Blandy founded Cyclic Software, a company
for commercially supporting and improving CVS.  Cyclic made the first
public release of a network-enabled CVS (contributed by Cygnus
software.)  In 1999, Karl Fogel published a book about CVS and the
open-source development model it enables (cvsbook.red-bean.com).  Karl
and Jim had long talked about writing a replacement for CVS; Jim had
even drafted a new, theoretical repository design.  Finally, in
February of 2000, Brian Behlendorf of CollabNet (www.collab.net)
offered Karl a full-time job to write a CVS replacement.  Karl
gathered a team together and work began in May.

The team settled on a few simple goals: it was decided that Subversion
would be designed as a functional replacement for CVS.  It would do
everything that CVS does -- preserving the same development model
while fixing the flaws in CVS's (lack-of) design.  Existing CVS users
would be the target audience: any CVS user should be able to start
using Subversion with little effort.
Any other SCM "bonus features" were decided to be of secondary
importance (at least before a 1.0 release.)

At the time of writing, the original team has been coding for a little
over a year, and we have a number of excellent volunteer contributors.
(Subversion, like CVS, is an open-source project!)

Subversion's Features
---------------------

Here's a quick run-down of some of the reasons you should be excited
about Subversion:

  * Real copies and renames.  The Subversion repository doesn't use
    RCS files at all; instead, it implements a 'virtual' versioned
    filesystem that tracks tree-structures over time (described
    below).  Files *and* directories are versioned.  At last, there
    are real client-side `mv' and `cp' commands that behave just as
    you think.

  * Atomic commits.  A commit either goes into the repository
    completely, or not at all.

  * Advanced network layer.  The Subversion network server is Apache,
    and client and server speak WebDAV(2) to one another.  (See the
    'design' section below.)

  * Faster network access.  A binary diffing algorithm is used to
    store and transmit deltas in *both* directions, regardless of
    whether a file is of text or binary type.

  * Filesystem "properties".  Each file or directory has an invisible
    hashtable attached.  You can invent and store any arbitrary
    key/value pairs you wish: owner, perms, icons, app-creator,
    mime-type, personal notes, etc.  This is a general-purpose feature
    for users.  Properties are versioned, just like file contents.
    And some properties are auto-detected, like the mime-type of a
    file (no more remembering to use the '-kb' switch!)

  * Extensible and hackable.  Subversion has no historical baggage; it
    was designed and then implemented as a collection of shared C
    libraries with well-defined APIs.  This makes Subversion extremely
    maintainable and usable by other applications and languages.

  * Easy migration.
    The Subversion command-line client is very similar to CVS; the
    development model is the same, so CVS users should have little
    trouble making the switch.  Development of a 'cvs2svn' repository
    converter is in progress.

  * It's Free.  Subversion is released under an Apache/BSD-style
    open-source license.

Subversion's Design
-------------------

Subversion has a modular design; it's implemented as a collection of C
libraries.  Each layer has a well-defined purpose and interface.  In
general, code flow begins at the top of the diagram and flows
"downward" -- each layer provides an interface to the layer above it.

    <<insert diagram here: svn.tiff>>

Let's take a short tour of these layers, starting at the bottom.

--> The Subversion filesystem.

The Subversion Filesystem is *not* a kernel-level filesystem that one
would install in an operating system (like the Linux ext2 fs.)
Instead, it refers to the design of Subversion's repository.  The
repository is built on top of a database -- currently Berkeley DB --
and thus is a collection of .db files.  However, a library accesses
these files and exports a C API that simulates a filesystem --
specifically, a "versioned" filesystem.

This means that writing a program to access the repository is like
writing against other filesystem APIs: you can open files and
directories for reading and writing as usual.  The main difference is
that this particular filesystem never loses data when written to; old
versions of files and directories are always saved as historical
artifacts.

Whereas CVS's backend (RCS) stores revision numbers on a per-file
basis, Subversion numbers entire trees.  Each atomic 'commit' to the
repository creates a completely new filesystem tree, and is
individually labeled with a single, global revision number.  Files and
directories which have changed are rewritten (and older versions are
backed up and stored as differences against the latest version), while
unchanged entries are pointed to via a shared-storage mechanism.
This is how the repository is able to version tree structures, not
just file contents.

Finally, it should be mentioned that using a database like Berkeley DB
immediately provides other nice features that Subversion needs: data
integrity, atomic writes, recoverability, and hot backups.  (See
www.sleepycat.com for more information.)

--> The network layer.

Subversion has the mark of Apache all over it.  At its very core, the
client uses the Apache Portable Runtime (APR) library.  (In fact, this
means that the Subversion client should compile and run anywhere
Apache httpd does -- right now, this list includes all flavors of
Unix, Win32, BeOS, OS/2, Mac OS X, and possibly Netware.)

However, Subversion depends on more than just APR -- the Subversion
"server" is Apache httpd itself.

Why was Apache chosen?  Ultimately, the decision was about not
reinventing the wheel.  Apache is a time-tested, open-source server
process that is ready for serious use, yet is still extensible.  It
can sustain a high network load.  It runs on many platforms and can
operate through firewalls.  It's able to use a number of different
authentication protocols.  It can do network pipelining and caching.
By using Apache as a server, Subversion gets all these features for
free.  Why start from scratch?

Subversion uses WebDAV as its network protocol.  DAV (Distributed
Authoring and Versioning) is a whole discussion in itself (see
www.webdav.org) -- but in short, it's an extension to HTTP that allows
reads/writes and "versioning" of files over the web.  The Subversion
project is hoping to ride a slowly rising tide of support for this
protocol: all of the latest file-browsers for Win32, MacOS, and GNOME
speak this protocol already.  Interoperability will (hopefully) become
more and more of a bonus over time.

For users who simply wish to access Subversion repositories on local
disk, the client can do this too; no network is required.  The
"Repository Access" layer (RA) is an abstract API implemented by both
the DAV and local-access RA libraries.
This is a specific benefit of writing a "librarized" version control
system; it's a big win over CVS, which has two very different,
difficult-to-maintain codepaths for local vs. network
repository-access.  Feel like writing a new network protocol for
Subversion?  Just write a new library that implements the RA API!

--> The client libraries.

On the client side, the Subversion "working copy" library maintains
administrative information within special SVN/ subdirectories, similar
in purpose to the CVS/ administrative directories found in CVS working
copies.

A glance inside the typical SVN/ directory turns up a bit more than
usual, however.  The `entries' file contains XML which describes the
current state of the working copy directory (and which basically
serves the purposes of CVS's Entries, Root, and Repository files
combined).  But other items present (and not found in CVS/) include
storage locations for the versioned "properties" (the metadata
mentioned in 'Subversion Features' above) and private caches of
pristine versions of each file.  This latter feature provides the
ability to report local modifications -- and do reversions --
*without* network access.  Authentication data is also stored within
SVN/, rather than in a single .cvspass-like file.

The Subversion "client" library has the broadest responsibility; its
job is to mingle the functionality of the working-copy library with
that of the repository-access library, and then to provide a
highest-level API to any application that wishes to perform general
version control actions.

For example: the C routine `svn_client_checkout()' takes a URL as an
argument.  It passes this URL to the repository-access library and
opens an authenticated session with a particular repository.  It then
asks the repository for a certain tree, and sends this tree into the
working-copy library, which then writes a full working copy to disk
(SVN/ directories and all.)

The client library is designed to be used by any application.
While the Subversion source code includes a standard command-line
client, it should be very easy to write any number of GUI clients on
top of the client library.  Hopefully, these GUIs should someday prove
to be much better than the current crop of CVS GUI applications (the
majority of which are no more than fragile "wrappers" around the CVS
command-line client.)

In addition, proper SWIG bindings (www.swig.org) should make the
Subversion API available to any number of languages: Java, Perl,
Python, Guile, and so on.  In order to Subvert CVS, it helps to be
ubiquitous!

Subversion's Future
-------------------

The release of Subversion 1.0 is currently planned for early 2002.
After the release of 1.0, Subversion is slated for additions such as
i18n support, "intelligent" merging, better "changeset" manipulation,
client-side plugins, and improved features for server administration.
(Also on the wishlist is an eclectic collection of ideas, such as
distributed, replicating repositories.)

A final thought from Subversion's FAQ:

   "We aren't (yet) attempting to break new ground in SCM systems, nor
    are we attempting to imitate all the best features of every SCM
    system out there.  We're trying to replace CVS."

If, in three years, Subversion is widely presumed to be the "standard"
SCM system in the open-source community, then the project will have
succeeded.  But the future is still hazy: ultimately, Subversion
will have to win this position on its own technical merits.

Patches are welcome.

For More Information
--------------------

Please visit the Subversion project website at
http://subversion.tigris.org.  There are discussion lists to join, and
the source code is available via anonymous CVS -- and soon through
Subversion itself.
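
For readers who want a concrete picture of the tree-numbering scheme
described under "The Subversion filesystem" above, here is a minimal
Python sketch.  It is a toy model, not Subversion's code or its C API:
the names `Repo', `commit', `read' and `_updated' are hypothetical,
and it keeps full file contents in memory rather than storing older
versions as differences in Berkeley DB.  It shows only three ideas
from the text: every commit yields a whole new tree with one global
revision number, unchanged entries are shared between trees, and a
commit goes in completely or not at all.

```python
"""Toy model of a versioned filesystem (illustration only)."""


class Repo:
    def __init__(self):
        # Revision 0 is an empty root directory.
        self.revisions = [{}]

    def commit(self, changes):
        """Apply {path: contents, or None to delete} as one commit.

        Returns the new global revision number.  If any change fails,
        no new revision is recorded at all.
        """
        new_root = self.revisions[-1]
        for path, contents in changes.items():
            parts = path.strip("/").split("/")
            new_root = _updated(new_root, parts, contents)
        self.revisions.append(new_root)
        return len(self.revisions) - 1

    def read(self, revision, path):
        """Return the file contents or directory at `path` in `revision`."""
        node = self.revisions[revision]
        for name in path.strip("/").split("/"):
            node = node[name]
        return node


def _updated(directory, parts, contents):
    """Return a copy of `directory` with one path changed.

    Only the directories along the changed path are copied; every
    other child is the very same object as in the previous tree.
    """
    if not isinstance(directory, dict):
        raise NotADirectoryError("/".join(parts))
    new_dir = dict(directory)  # shallow copy keeps unchanged children shared
    name, rest = parts[0], parts[1:]
    if rest:
        new_dir[name] = _updated(directory.get(name, {}), rest, contents)
    elif contents is None:
        new_dir.pop(name, None)
    else:
        new_dir[name] = contents
    return new_dir


if __name__ == "__main__":
    repo = Repo()
    first = repo.commit({"trunk/a.c": "int a;", "trunk/lib/b.c": "int b;"})
    second = repo.commit({"trunk/a.c": "int a = 1;"})
    print(first, second)
    print(repo.read(first, "trunk/a.c"))
    print(repo.read(first, "trunk/lib") is repo.read(second, "trunk/lib"))
```

In this sketch the old tree stays readable after the second commit,
and the untouched `trunk/lib' directory is one shared object in both
revisions, which is the sense in which the repository versions tree
structures rather than individual files.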