   Some JCR operations are defined to affect the persistent workspace storage
   directly without going through the transient space of the session. Such
   operations are handled by creating a new draft revision for just that
   operation and persisting it as described above. If the operation succeeds,
   the session is updated to use the persisted revision as the new base
   revision.

[ngp/workspace.jpg] Workspace operation

Advanced Features

   The revision model offers very straightforward implementations of many
   advanced features. This section discusses some of the most prominent
   examples.

* Transactions

   Transactions that span multiple Session.save() operations are handled
   with an alternative branch of persisted revisions. Instead of making a
   persisted revision globally available as the latest revision of the
   workspace, it is kept local to the transaction. When the transaction is
   committed, all the revisions in the transaction branch are merged into
   a single draft revision that is then persisted normally as described above. 

[ngp/transaction.jpg] Transaction

   If the merged revision cannot be persisted (causing the commit to fail) or
   if the transaction is explicitly rolled back, then the revisions in the
   transaction branch are discarded.

   This model can also easily support two-phase commits in a distributed
   transaction.
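   The branch-and-merge behaviour described above can be pictured with a
   minimal Java sketch. All names here (<<<Revision>>>, <<<Workspace>>>,
   <<<Transaction>>>) are illustrative, not the actual implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a transaction accumulates revisions on a private
// branch; commit merges them into a single draft revision that is then
// persisted to the shared workspace, while rollback discards the branch.
class Revision {
    final Map<String, String> changes = new HashMap<>(); // path -> value
}

class Workspace {
    final List<Revision> persisted = new ArrayList<>(); // shared history

    void persist(Revision r) { persisted.add(r); }
}

class Transaction {
    private final List<Revision> branch = new ArrayList<>(); // local branch

    void save(Revision r) { branch.add(r); } // a Session.save() inside the tx

    // Merge the branch into one draft revision and persist it normally.
    void commit(Workspace ws) {
        Revision merged = new Revision();
        for (Revision r : branch) {
            merged.changes.putAll(r.changes); // later changes win
        }
        ws.persist(merged);
        branch.clear();
    }

    void rollback() { branch.clear(); } // branch revisions are discarded
}
```

   Note that however many saves the transaction contained, the workspace
   history gains exactly one revision on commit and none on rollback.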

* Namespace and Node Type Management

   If the revision model were repository-scoped as discussed above, then
   the namespace and node type registries could be managed as normal
   (write-protected) content under the global <<<jcr:system>>> subtree as
   described in the JCR specification. Such a solution, while probably more
   complex than keeping the registries in custom data structures, would have
   many nice features.

   If these global registries were managed as normal content, then most of
   the other advanced features would also cover repository management. For
   example, it would be possible to register or modify node types
   transactionally, or to make the node type and namespace registries
   versionable! Backup and recovery operations would automatically include
   this repository metadata, and no extra code would be required for
   clustering support of node type or namespace changes. Even observation of
   the <<<jcr:system/jcr:nodeTypes>>> subtree would come for free.

* Versioning

   Since the revision model by default maintains a full change history of
   the entire repository, it is possible to heavily optimize versioning
   operations. For example, a check-in operation can be performed by simply
   recording the persisted revision in which the checked-in node was found.
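   A sketch of how cheap such a check-in could be, assuming immutable
   persisted revisions identified by a number (the <<<VersionHistory>>> name
   is illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: because persisted revisions are never modified,
// checking in a node can be as cheap as remembering the revision number in
// which the node's current state was persisted. No content is copied.
class VersionHistory {
    // node path -> revision numbers recorded at each check-in
    private final Map<String, List<Long>> versions = new HashMap<>();

    void checkin(String path, long persistedRevision) {
        versions.computeIfAbsent(path, p -> new ArrayList<>())
                .add(persistedRevision);
    }

    List<Long> getVersions(String path) {
        return versions.getOrDefault(path, List.of());
    }
}
```

   Restoring a version is then just a matter of reading the node's state
   from the recorded revision.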

* Observation

   All the information needed for sending JCR observation events is
   permanently stored in the persisted revisions, which not only simplifies
   the observation implementation but also enables many advanced observation
   features.

   One tricky issue that this model solves quite nicely is the problem of how
   to handle access control for item removal events. Once the item in question
   has been removed, many access control implementations no longer have a way
   to determine whether access to that item should be granted to a given
   session. With the revision model it is possible to ask whether a session
   would have been allowed to access the item while it still existed, and to
   filter access to the removal events based on that information.

   The full change history kept by the revision model enables a new feature,
   <persistent observation>, in which a client can request all events since
   a given checkpoint to be replayed to the registered event listeners
   of a session.

   The revision history can also be used as a full write-level audit trail
   of the content repository.
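   Persistent observation falls out of the model almost directly: since every
   revision permanently records its change events, replaying from a checkpoint
   is just an iteration over the stored history. A minimal sketch, with
   illustrative names and events modelled as strings:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch: a listener that registers with a checkpoint (a past
// revision number) can have all later change events replayed to it before
// it starts receiving live events.
class RevisionLog {
    private final List<List<String>> events = new ArrayList<>(); // per revision

    void persist(List<String> revisionEvents) { events.add(revisionEvents); }

    long head() { return events.size(); } // current checkpoint

    // Replay every event recorded after the given checkpoint revision.
    void replaySince(long checkpoint, Consumer<String> listener) {
        for (long r = checkpoint; r < events.size(); r++) {
            events.get((int) r).forEach(listener);
        }
    }
}
```

   The same iteration, formatted for humans instead of event listeners, is
   what makes the revision history usable as a write-level audit trail.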

* Hot and Incremental Backups

   Implementing hot backups is almost trivial since persisted revisions are
   never modified. Thus it is possible for a backup tool to simply copy the
   persisted revisions even if the repository that created them is still
   running.

   Once a full repository or workspace backup has been made, only new revision
   files need to be copied to keep the backed up copy up to date. If the
   revisions are stored as files on disk, then standard tools like <<<rsync>>>
   can be used to maintain an incremental hot backup of the repository. 
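   The incremental step can be sketched in a few lines: because revision files
   are immutable, a file whose name already exists in the backup never needs
   to be re-checked or re-copied. The class and file names below are
   illustrative:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: copy only those revision files from the source
// directory that are not yet present in the backup directory.
class IncrementalBackup {
    static int copyNewRevisions(Path source, Path backup) {
        int copied = 0;
        try {
            Files.createDirectories(backup);
            try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
                for (Path file : files) {
                    Path target = backup.resolve(file.getFileName());
                    if (!Files.exists(target)) { // immutable: name is enough
                        Files.copy(file, target);
                        copied++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copied;
    }
}
```

   This is essentially what <<<rsync>>> does for us when the revisions are
   stored as ordinary files on disk.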

* Point-in-Time Recovery

   The revision model allows a repository or a workspace to be "rewound" back
   to a previous point in time without doing a full recovery from backups.
   This makes it very easy and efficient to undo operations like accidental
   removals of large parts of the repository.

* Clustering

   A repository cluster can be implemented on top of the revision model by
   making sure that operations to persist revisions are synchronized across
   cluster nodes.

   For example, a token-passing system can be used to ensure that only one
   cluster node can persist changes at a time. Once a node has persisted
   a revision, it can multicast it to the other nodes and release the
   synchronization token. Since all change information is included in the
   revision, the other nodes can, for example, easily send the appropriate
   observation events.

   A node can easily be added to or removed from a cluster. A fresh node
   will bootstrap itself by streaming the entire repository contents from
   the other nodes.
   
   An isolated cluster node can continue normal operation as a standalone
   repository. When the node is returned to the cluster it will first stream
   any new revisions from the other cluster nodes and request the
   synchronization token to merge those changes with any revisions that were
   persisted while the node was isolated. If the merge succeeds, the merged
   revisions are multicasted to the cluster and the node takes back its place
   within the cluster. If the merge fails, the node will release the
   synchronization token and remain isolated from the cluster. In such a case
   an administrator needs to either manually resolve the merge failure or
   use the point-in-time recovery feature to revert the isolated repository
   to a state where it can rejoin the cluster.
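   The core of the token-passing scheme can be modelled in plain Java. This
   is a single-process model with illustrative names, not the distributed
   implementation; the shared list stands in for the multicast:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Illustrative single-process model: only the node currently holding the
// synchronization token may append a revision to the replicated history.
class Cluster {
    final Semaphore token = new Semaphore(1);         // one token per cluster
    final List<String> revisions = new ArrayList<>(); // replicated history

    // A node persists a revision: acquire the token, append the revision
    // (the multicast is modelled here as one shared list), pass the token on.
    void persist(String revision) {
        token.acquireUninterruptibly(); // wait for the synchronization token
        try {
            revisions.add(revision);    // safe: the token serializes writers
        } finally {
            token.release();            // release the token for other nodes
        }
    }
}
```

   Read operations never touch the token, which is what lets the other
   cluster nodes keep serving content while one node persists a revision.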

Performance

   It is still an open question how the revisions could be organized
   internally to implement efficient access across histories that might
   consist of thousands or even millions of individual revisions.
 
   Efficient internal data structures are a key to achieving this goal,
   but there are also a number of high-level optimizations that can be used
   on top of the revision level to achieve better performance. Many of these
   optimizations are independent of each other and require little or no
   changes in other repository operations. 

* Internal Data Structures

   Simply persisting a list of added, modified, and removed items in a
   revision is not likely to produce good performance as any content accesses
   would then potentially need to traverse all the revisions to find the
   item in question. Even if each revision is internally indexed so that
   each item can be accessed in constant time, item access can still take
   O(n) time where n is the number of persisted revisions. Thus a key to
   improving performance is finding a way to avoid having to iterate through
   all past revisions when locating a given node.

   One potential approach could be to assign each node a sequence number
   based on its location in the document order of the repository, and to
   manage these sequence numbers as they change over revisions. Each revision
   would list the sequence number ranges affected by its changes. With this
   information the implementation could in many cases infer that a node
   cannot possibly exist in certain revisions, and thus skip those revisions
   when looking for the node.
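   The range-skipping lookup can be sketched as follows, assuming each
   revision records the sequence number range it touches (all names are
   illustrative):

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch: a revision annotated with the range of document-order
// sequence numbers that its changes affect.
class RangedRevision {
    final long lo, hi;             // sequence number range touched
    final Map<Long, String> items; // sequence number -> item state

    RangedRevision(long lo, long hi, Map<Long, String> items) {
        this.lo = lo; this.hi = hi; this.items = items;
    }
}

class RevisionStore {
    // Search the history newest-first, skipping every revision whose
    // recorded range cannot contain the requested sequence number.
    static String find(List<RangedRevision> history, long seq) {
        for (int i = history.size() - 1; i >= 0; i--) {
            RangedRevision r = history.get(i);
            if (seq < r.lo || seq > r.hi) {
                continue; // range check: skip without reading the revision
            }
            String value = r.items.get(seq);
            if (value != null) {
                return value;
            }
        }
        return null; // node not present in any revision
    }
}
```

   The range check turns many of the O(n) revision scans into cheap
   comparisons that never touch disk.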

   Another alternative would be to use some sort of a backward-looking
   item index that indicates the revision in which a given item was last
   stored. Unless such an index is stored as a part of the revisions (probably
   not in each revision), maintaining it could introduce an unwanted
   synchronization block.

   Since persisted revisions are never modified it is possible to heavily
   read-optimize and index each revision. Especially for common situations
   where read performance is heavily prioritized over write performance it
   makes sense to spend extra time preparing complex read-only indexes or
   other data structures when the revision is persisted. For example it might
   be worth the effort to use some statistical access pattern data to find
   the best possible ordering and indexing for a persisted revision. 

* Combined Revisions

   The number and granularity of revisions will likely be a limiting factor
   in how efficiently the repository contents can be accessed. Many of the
   potential internal revision data structures also work better the more
   content there is in a revision. Thus it would be beneficial to increase
   the size of individual revisions.

   A repository implementation cannot affect how large the revisions
   persisted by JCR clients are, but it can transparently combine or merge
   any number of subsequent small revisions into one larger revision.

[ngp/merge.jpg] Combined revision

   The combined revision can be used instead of the smaller revisions for all
   operations where the exact revision of a modified item does not matter.
   For example when querying and traversing the repository such transparent
   combined revisions can speed things up considerably. 

   Revisions can be combined for example in a low-priority background thread.
   Alternatively the repository implementation can offer an administrative
   interface for explicitly combining selected revisions. The combine
   operation can also be limited to just selected subtrees to optimize
   access to those parts of the repository.

   As an extreme case the combine operation can be performed on <all> revisions
   up to a specified checkpoint. The combined revision will then contain the
   full content tree up to that point in time. If the original revisions
   are no longer needed for things like point-in-time recovery or persistent
   observation, the combined revision could actually even replace all the
   individual revisions it contains to avoid using excessive amounts of disk
   space.
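   At its core the combine operation is an overlay of change maps,
   oldest-first, so that the latest state of each item wins. A minimal sketch
   (revisions modelled as path-to-value maps; removals are omitted for
   brevity):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: collapse a run of consecutive revisions into one
// combined revision, exactly as if the intermediate states never existed.
class CombinedRevisions {
    static Map<String, String> combine(List<Map<String, String>> revisions) {
        Map<String, String> combined = new LinkedHashMap<>();
        for (Map<String, String> r : revisions) {
            combined.putAll(r); // overlay oldest-first: later changes win
        }
        return combined;
    }
}
```

   A real implementation would also fold in removals (for example as
   tombstone entries) and rebuild the read-optimized indexes for the
   combined revision.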

* Caching and Lazy Loading

   Since the persisted revisions are never modified, it is possible to cache
   their contents very aggressively. The caches can be very simple since there
   is no need for any cache coherency algorithms.

   The read-only nature of the revisions also allows many operations to be
   postponed to the very last moment the relevant information is needed. For
   example a JCR Node instance can simply keep a reference to the on-disk
   storage of the last version of the node and load any related information
   like property values or child node references only when it is actually
   requested.
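   The lazy-loading pattern is particularly simple here because the loaded
   value can be cached forever without any coherency protocol. A sketch with
   illustrative names, using a supplier to stand in for the on-disk read:

```java
import java.util.function.Supplier;

// Illustrative sketch: a node handle keeps only a reference to its persisted
// state and loads a property value on first access. Since the persisted
// revision never changes, the cached value never needs invalidation.
class LazyNode {
    private final Supplier<String> loader; // reads from the persisted revision
    private String cached;                 // loaded at most once

    LazyNode(Supplier<String> loader) { this.loader = loader; }

    String getProperty() {
        if (cached == null) {
            cached = loader.get(); // deferred to the last possible moment
        }
        return cached;
    }
}
```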

* Concurrency

   In a world where multiprocessor servers and multicore or even manycore
   processors are commonplace, it is essential for best performance
   that a software system like a content repository uses every opportunity
   for increased concurrency.

   The revision model makes it possible to avoid all blocking of read
   operations and requires write synchronization only when new revisions are
   persisted. With optimistic constraint checking and a fallback mechanism the
   write synchronization can even be limited to just the very last step of
   persisting a revision. However, this and the clustering support mentioned
   above are not the only opportunities for concurrency that the model
   allows.

   Repository operations like search queries, XML exports, and many
   consistency and constraint checks can be formulated as map-reduce
   operations that can concurrently operate (map) on many past revisions
   and combine (reduce) the partial outcomes into the final result of the
   operation. Such algorithms might not be worthwhile on normal repositories,
   but offer a way to harness the benefits of massive parallelism in huge
   content repositories that may reside in grid environments.
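   Because the past revisions are immutable, the map phase needs no locking
   at all. A minimal single-process illustration of the pattern, using a
   parallel stream as the stand-in for a grid of workers (the operation shown,
   counting all recorded changes, is just an example):

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch: map over past revisions independently and reduce
// the partial results into the final answer.
class RevisionMapReduce {
    static long countChanges(List<Map<String, String>> revisions) {
        return revisions.parallelStream()     // map each revision concurrently
                        .mapToLong(Map::size) // partial result per revision
                        .sum();               // reduce into the final answer
    }
}
```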
