	GIT - the stupid content tracker

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command.  The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

This is a stupid (but extremely fast) directory content manager.  It
doesn't do a whole lot, but what it _does_ do is track directory
contents efficiently.

There are two object abstractions: the "object database", and the
"current directory cache" aka "index".



	The Object Database (SHA1_FILE_DIRECTORY)

The object database is literally just a content-addressable collection
of objects.  All objects are named by their content, which is
approximated by the SHA1 hash of the object itself.  Objects may refer
to other objects (by referencing their SHA1 hash), and so you can build
up a hierarchy of objects.

All objects have a statically determined "type" aka "tag", which is
determined at object creation time, and which identifies the format of
the object (ie how it is used, and how it can refer to other objects).
There are currently three different object types: "blob", "tree" and
"commit".

A "blob" object cannot refer to any other object, and is, like the tag
implies, a pure storage object containing some user data.  It is used to
actually store the file data, ie a blob object is associated with some
particular version of some file.

A "tree" object is an object that ties one or more "blob" objects into a
directory structure. In addition, a tree object can refer to other tree
objects, thus creating a directory hierarchy.

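To make the object database a bit more concrete, here is a minimal
sketch in Python (it is not git's own code).  The ObjectDatabase class,
its in-memory dict, the mode strings and the one-line-per-entry text
layout used by write_tree are all hypothetical, chosen only for
illustration.  The header layout and the choice to hash the deflated
bytes follow what this README says all objects have in common.

```python
import hashlib
import zlib


class ObjectDatabase:
    """Toy in-memory stand-in for the directory named by SHA1_FILE_DIRECTORY."""

    def __init__(self):
        self._files = {}  # object name (hex SHA1) -> deflated bytes

    def write(self, tag, data):
        # Header is <tag> <space> <decimal size> <NUL>, followed by the data.
        header = tag.encode("ascii") + b" " + str(len(data)).encode("ascii") + b"\0"
        packed = zlib.compress(header + data)
        # As described in this README, the name is the hash of the
        # *compressed* object, not of the original data.
        name = hashlib.sha1(packed).hexdigest()
        self._files[name] = packed
        return name

    def read(self, name):
        packed = self._files[name]
        if hashlib.sha1(packed).hexdigest() != name:
            raise ValueError("hash does not match content")
        header, _, data = zlib.decompress(packed).partition(b"\0")
        tag, _, size = header.partition(b" ")
        if int(size) != len(data):
            raise ValueError("size in header does not match data")
        return tag.decode("ascii"), data


def write_tree(db, entries):
    # entries are (mode, filename, object name); the text layout is an
    # assumption made for this sketch, not the real tree encoding.
    lines = ["%s %s %s" % e for e in sorted(entries, key=lambda e: e[1])]
    return db.write("tree", "\n".join(lines).encode("utf-8"))


db = ObjectDatabase()
first = db.write("blob", b"hello\n")
second = db.write("blob", b"hello\n")   # same contents, same name
sub = write_tree(db, [("100644", "README", first)])
root = write_tree(db, [("100644", "COPYING", second), ("40000", "doc", sub)])

print(first == second)
print(db.read(first))
```

Writing the same contents twice yields the same name, so only one blob
is stored, and the root tree reaches the "doc" subtree purely through
its SHA1 name.
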
Finally, a "commit" object ties such directory hierarchies together into
a DAG of revisions - each "commit" is associated with exactly one tree
(the directory hierarchy at the time of the commit). In addition, a
"commit" refers to one or more "parent" commit objects that describe the
history of how we arrived at that directory hierarchy.

As a special case, a commit object with no parents is called the "root"
object, and is the point of an initial project commit.  Each project
must have at least one root, and while you can tie several different
root objects together into one project by creating a commit object which
has two or more separate roots as its ultimate parents, that's probably
just going to confuse people.  So aim for the notion of "one root object
per project", even if git itself does not enforce that.

Regardless of object type, all objects share the following
characteristics: they are all deflated with zlib, and have a header
that not only specifies their tag, but also size information about the
data in the object.  It's worth noting that the SHA1 hash that is used
to name the object is always the hash of this _compressed_ object, not
the original data.

As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
be validated by verifying that (a) their hashes match the content of the
file and (b) the object successfully inflates to a stream of bytes that
forms a sequence of <ascii tag without space> + <space> + <ascii decimal
size> + <byte\0> + <binary object data>.

The structured objects can further have their structure and connectivity
to other objects verified.
This is generally done with the "fsck-cache" program, which generates a
full dependency graph of all objects, and verifies their internal
consistency (in addition to just verifying their superficial consistency
through the hash).

The object types in some more detail:

  BLOB: A "blob" object is nothing but a binary blob of data, and
	doesn't refer to anything else.  There is no signature or any
	other verification of the data, so while the object is
	consistent (it _is_ indexed by its sha1 hash, so the data itself
	is certainly correct), it has absolutely no other attributes.
	No name associations, no permissions.  It is purely a blob of
	data (ie normally "file contents").

	In particular, since the blob is entirely defined by its data,
	if two files in a directory tree (or in multiple different
	versions of the repository) have the same contents, they will
	share the same blob object. The object is totally independent
	of its location in the directory tree, and renaming a file does
	not change the object that file is associated with in any way.

  TREE: The next hierarchical object type is the "tree" object.  A tree
	object is a list of mode/name/blob data, sorted by name.
	Alternatively, the mode data may specify a directory mode, in
	which case instead of naming a blob, that name is associated
	with another TREE object.

	Like the "blob" object, a tree object is uniquely determined by
	the set contents, and so two separate but identical trees will
	always share the exact same object. This is true at all levels,
	ie it's true for a "leaf" tree (which does not refer to any
	other trees, only blobs) as well as for a whole subdirectory.

	For that reason a "tree" object is just a pure data abstraction:
	it has no history, no signatures, no verification of validity,
	except that since the contents are again protected by the hash
	itself, we can trust that the tree is immutable and its contents
	never change.
	So you can trust the contents of a tree to be valid, the same
	way you can trust the contents of a blob, but you don't know
	where those contents _came_ from.

	Side note on trees: since a "tree" object is a sorted list of
	"filename+content", you can create a diff between two trees
	without actually having to unpack two trees.  Just ignore all
	common parts, and your diff will look right.  In other words,
	you can effectively (and efficiently) tell the difference
	between any two random trees by O(n) where "n" is the size of
	the difference, rather than the size of the tree.

	Side note 2 on trees: since the name of a "blob" depends
	entirely and exclusively on its contents (ie there are no names
	or permissions involved), you can see trivial renames or
	permission changes by noticing that the blob stayed the same.
	However, renames with data changes need a smarter "diff"
	implementation.

CHANGESET: The "changeset" object (called a "commit" above) is an
	object that introduces the notion of history into the picture.
	In contrast to the other objects, it doesn't just describe the
	physical state of a tree, it describes how we got there, and
	why.

	A "changeset" is defined by the tree-object that it results in,
	the parent changesets (zero, one or more) that led up to that
	point, and a comment on what happened.  Again, a changeset is
	not trusted per se: the contents are well-defined and "safe" due
	to the cryptographically strong signatures at all levels, but
	there is no reason to believe that the tree is "good" or that
	the merge information makes sense.  The parents do not have to
	actually have any relationship with the result, for example.

	Note on changesets: unlike real SCM's, changesets do not contain
	rename information or file mode change information.  All of that
	is implicit in the trees involved (the result tree, and the
	result trees of the parents), and describing that makes no sense
	in this idiotic file manager.

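The side notes on trees can be sketched in a few lines of Python.  This
is only an illustration under stated assumptions, not how git itself
diffs trees: STORE, OPENED, put_blob, put_tree, the "40000" directory
marker and the status letters are all hypothetical, and names are plain
SHA1 hashes of an invented text layout.  The point is that two entries
with the same name never need to be opened.

```python
import hashlib

STORE = {}     # hypothetical in-memory object database: name -> tree entries
OPENED = []    # names of the trees the diff actually had to read
DIR = "40000"  # assumed marker for "this entry names another tree"


def put_blob(data):
    # Only the name matters for diffing, so blob contents are not stored.
    return hashlib.sha1(b"blob " + data).hexdigest()


def put_tree(entries):
    # entries are (mode, filename, object name) tuples, kept sorted by filename
    entries = tuple(sorted(entries, key=lambda e: e[1]))
    text = "\n".join(" ".join(e) for e in entries)
    name = hashlib.sha1(b"tree " + text.encode("utf-8")).hexdigest()
    STORE[name] = entries
    return name


def read_tree(name):
    if name is None:  # no tree on this side of the comparison
        return ()
    OPENED.append(name)
    return STORE[name]


def diff_trees(old, new, prefix=""):
    """Yield (status, path): A added, D deleted, M modified, P mode change only."""
    if old == new:
        return  # same name means same contents: skip the whole subtree
    a, b = read_tree(old), read_tree(new)
    i = j = 0
    while i < len(a) or j < len(b):
        ea = a[i] if i < len(a) else None
        eb = b[j] if j < len(b) else None
        if eb is None or (ea is not None and ea[1] < eb[1]):
            i += 1
            yield from one_sided("D", ea, prefix)
        elif ea is None or eb[1] < ea[1]:
            j += 1
            yield from one_sided("A", eb, prefix)
        else:
            i += 1
            j += 1
            if ea == eb:
                continue  # common part: ignore it
            path = prefix + ea[1]
            if ea[0] == DIR and eb[0] == DIR:
                yield from diff_trees(ea[2], eb[2], path + "/")
            elif ea[0] == DIR or eb[0] == DIR:
                yield from one_sided("D", ea, prefix)
                yield from one_sided("A", eb, prefix)
            else:
                yield ("M" if ea[2] != eb[2] else "P", path)


def one_sided(status, entry, prefix):
    mode, name, obj = entry
    if mode != DIR:
        yield (status, prefix + name)
    elif status == "D":
        yield from diff_trees(obj, None, prefix + name + "/")
    else:
        yield from diff_trees(None, obj, prefix + name + "/")


one, two, three = put_blob(b"one\n"), put_blob(b"two\n"), put_blob(b"three\n")
docs = put_tree([("100644", "guide.txt", one)])
old = put_tree([("100644", "a.c", one), ("100644", "b.c", two), (DIR, "docs", docs)])
new = put_tree([("100755", "a.c", one), ("100644", "b.c", three),
                ("100644", "c.c", two), (DIR, "docs", docs)])

changes = list(diff_trees(old, new))
print(changes)          # [('P', 'a.c'), ('M', 'b.c'), ('A', 'c.c')]
print(docs in OPENED)   # False
```

The unchanged "docs" tree is never read, and the permission change on
"a.c" shows up as a change of mode with the blob name unchanged.  Note
that this sketch still scans every entry of each tree that does differ,
so its cost follows the size of the changed trees rather than the size
of the whole hierarchy.
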
  TRUST: The notion of "trust" is really outside the scope of "git", but
	it's worth noting a few things.  First off, since everything is
	hashed with SHA1, you _can_ trust that an object is intact and
	has not been messed with by external sources.  So the name of an
	object uniquely identifies a known state - just not a state that
	you may want to trust.

	Furthermore, since the SHA1 signature of a changeset refers to
	the SHA1 signatures of the tree it is associated with and the
	signatures of the parent, a single named changeset specifies
	uniquely a whole set of history, with full contents.  You can't
	later fake any step of the way once you have the name of a
	changeset.

	So to introduce some real trust in the system, the only thing
	you need to do is to digitally sign just _one_ special note,
	which includes the name of a top-level changeset.  Your digital
	signature shows others that you trust that changeset, and the
	immutability of the history of changesets tells others that they
	can trust the whole history.

	In other words, you can easily validate a whole archive by just
	sending out a single email that tells the people the name (SHA1
	hash) of the top changeset, and digitally sign that email using
	something like GPG/PGP.

	In particular, you can also have a separate archive of "trust
	points" or tags, which document your (and other people's) trust.
	You may, of course, archive these "certificates of trust" using
	"git" itself, but it's not something "git" does for you.

Another way of saying the last point: "git" itself only handles content
integrity, the trust has to come from outside.



	The "index" aka "Current Directory Cache" (".git/index")

The index is a simple binary file, which contains an efficient
representation of a virtual directory content at some random time.  It
does so by a simple array that associates a set of names, dates,
permissions and content (aka "blob") objects together.
The cache is always kept ordered by name, and names are unique (with a
few very specific rules) at any point in time, but the cache has no
long-term meaning, and can be partially updated at any time.

In particular, the index certainly does not need to be consistent with
the current directory contents (in fact, most operations will depend on
different ways to make the index _not_ be consistent with the directory
hierarchy), but it has three very important attributes:

 (a) it can re-generate the full state it caches (not just the directory
     structure: it contains pointers to the "blob" objects so that it
     can regenerate the data too)

     As a special case, there is a clear and unambiguous one-way mapping
     from a current directory cache to a "tree object", which can be
     efficiently created from just the current directory cache without
     actually looking at any other data.  So a directory cache at any
     one time uniquely specifies one and only one "tree" object (but
     has additional data to make it easy to match up that tree object
     with what has happened in the directory)

 (b) it has efficient methods for finding inconsistencies between that
     cached state ("tree object waiting to be instantiated") and the
     current state.

 (c) it can additionally efficiently represent information about merge
     conflicts between different tree objects, allowing each pathname to
     be associated with sufficient information about the trees involved
     that you can create a three-way merge between them.

Those are the three ONLY things that the directory cache does.  It's a