📄 reclaimspacehelper.java
/*

   Derby - Class org.apache.derby.impl.store.raw.data.ReclaimSpaceHelper

   Copyright 1998, 2004 The Apache Software Foundation or its licensors, as applicable.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

 */

package org.apache.derby.impl.store.raw.data;

import org.apache.derby.impl.store.raw.data.BasePage;
import org.apache.derby.impl.store.raw.data.ReclaimSpace;

import org.apache.derby.iapi.services.daemon.DaemonService;
import org.apache.derby.iapi.services.daemon.Serviceable;
import org.apache.derby.iapi.services.sanity.SanityManager;
import org.apache.derby.iapi.error.StandardException;

import org.apache.derby.iapi.store.access.TransactionController;

import org.apache.derby.iapi.store.raw.ContainerKey;
import org.apache.derby.iapi.store.raw.ContainerHandle;
import org.apache.derby.iapi.store.raw.LockingPolicy;
import org.apache.derby.iapi.store.raw.Page;
import org.apache.derby.iapi.store.raw.PageKey;
import org.apache.derby.iapi.store.raw.RecordHandle;
import org.apache.derby.iapi.store.raw.Transaction;

import org.apache.derby.iapi.store.raw.xact.RawTransaction;
import org.apache.derby.iapi.store.raw.data.RawContainerHandle;

/**
This class helps a BaseDataFactory reclaim unused space.
Space needs to be reclaimed in the following cases:
<BR><NL>
<LI> Row with long columns or overflow row pieces is deleted
<LI> Insertion of a row that has long columns or overflows to other row pieces is rolled back
<LI> Row is updated and the head row or some row pieces shrunk
<LI> Row is updated and some long columns are orphaned because
they are updated
<LI> Row is updated and some long columns are created but the update is rolled back
<LI> Row is updated and some new row pieces are created but the update is rolled back
</NL>

<P>
We can implement a lot of optimizations if we know that btree does not overflow.
However, since that is not the case and Raw Store cannot tell if it is dealing
with a btree page or a heap page, they all have to be treated gingerly.  E.g.,
in a heap page, once a head row is deleted (via a delete operation or via a
rollback of insert), all the long rows and long columns can be reclaimed - in
fact, most of the head row can be removed and reclaimed, only a row stub needs
to remain for locking purposes.  But in the btree, a deleted row still needs to
contain the key values so it cannot be cleaned up until the row is purged.

<P><B>Row with long columns or long row is deleted</B><BR>

When Access purges a committed deleted row, the purge operation will see if the
row has overflowed row pieces or if it has long columns.  If it has, then all
the long columns and row pieces are purged before the head row piece can be
purged.  When a row is purged from an overflow page and it is the only row on
the page, then the page is deallocated in the same transaction.  Note that
non-overflow pages are removed by Access but overflow pages are removed by Raw
Store.  Note that page removal is done in the same transaction and not post
commit.  This is, in general, dangerous because if the transaction does not
commit for a long time, an uncommitted deallocated page slows down page
allocation for this container.  However, we know that Access only purges
committed deleted rows in Access post commit processing so we know the
transaction will tend to commit relatively fast.  The alternative is to queue
up a post commit ReclaimSpace.PAGE to reclaim the page after the purge commits.
In order to do
that, the time stamp of the page must also be remembered because post commit
work may be queued more than once, but in this case, it can only be done once.
Also, doing the page deallocation post commit adds to the overall cost and
tends to fill up the post commit queue. <BR>

This approach is simple but has the drawback that the entire long row and all
the long columns are logged in the purge operation.  The alternative is more
complicated: we can remember all the long columns on the head row piece and
where the row chain starts and clean them up during post commit.  During post
commit, because the head row piece is already purged, there is no need to log
the long column or the long rows, just wipe the page or just reuse the page if
that is the only thing on the page.  The problem with this approach is that we
need to make sure the purging of the head row does indeed commit (the
transaction may commit but the purging may be rolled back due to savepoint).
So, we need to find the head row in the post commit and only when we cannot
find it can we be sure that the purge is committed.  However, in cases where
the page can reuse its record Id (namely in btree), a new row may reuse the
same recordId.  In that case, the post commit can purge the long columns or the
rest of the row piece only if the head piece no longer points to it.  Because
of the complexity of this latter approach, the first simple approach is used.
However, if the performance due to extra logging becomes unbearable, we can
consider implementing the second approach.

<P><B>Insertion of a row with long column or long row is rolled back.</B><BR>

Insertion can be rolled back with either delete or purge.  If the row is rolled
back with purge, then all the overflow column pieces and row pieces are also
rolled back with purge.
When a row is purged from an overflow page and it is
the only row on the page, then a post commit ReclaimSpace.PAGE work is queued
by Raw Store to reclaim that page.<BR>

If the row is rolled back with delete, then all the overflow column pieces and
row pieces are also rolled back with delete.  Access will purge the deleted row
in due time, see above.

<P><B>Row is updated and the head row or some row pieces shrunk</B><BR>

On every page that an update operation touches, the operation checks whether
the record on that page has any reserve space.  If it does, and if the reserve
space plus the record size exceed the minimum record size, then a post commit
ROW_RESERVE work will be queued to reclaim all unnecessary row reserved space
for the entire row.

<P><B>Row is updated and old long columns are orphaned</B><BR>

The ground rule is, whether a column is a long column or not before an update
has nothing to do with whether a column will be a long column or not after the
update.  In other words, update can turn a non-long column into a long column,
or it can turn a long column into a non-long column, or a long column can be
updated to another long column and a non-long column can be updated to a
non-long column.  The last case - update of a non-long column to another
non-long column - is only of concern if it shrinks the row piece it is on (see
above).<BR>

So update can be looked at as 2 separate problems: A) a column is a long column
before the update and the update will "orphan" it.  B) a column is a long
column after the update and the rollback of the update will "orphan" it if it
is rolled back with a delete.  This section deals with problem A, the next
section deals with problem B.<BR>

Update specifies a set of columns to be updated.  If a row piece contains one
or more columns to be updated, those columns are examined to see if they are
actually long column chains.  If they are, then after the update, those long
column chains will be orphaned.
So before the update happens, a post commit
ReclaimSpace.COLUMN_CHAIN work is queued which contains the head row's id, the
column number, the location of the first piece of the column chain, and the
time stamp of the first page of the column chain. <BR>

If the update transaction commits, the post commit work will walk the row until
it finds the column number (note that it may not be on the page where the
update happened because of subsequent row splitting), and if it doesn't point
to the head of the column chain, we know the update operation has indeed
committed (versus rolled back by a savepoint).  If a piece of the column
chain takes up an entire page, then the entire page can be reclaimed without
first purging the row because the column chain is already orphaned.<BR>

We need the page time stamp of the first page of the column chain because if
the post commit ReclaimSpace.COLUMN_CHAIN is queued more than once, as can
happen in repeated rollback to savepoint, then after the first time the column
is reclaimed, the pages in the column chain can be reused.  Therefore, we
cannot reclaim the column chain again.  Since there is no back pointer from the
column chain to the head row, we need the time stamp to tell us if that column
chain has already been touched (reclaimed) or not.

<P><B> Row is updated with new long columns and update is rolled back.</B><BR>

When the update is rolled back, the new long columns, which got there by
insertion, get rolled back either by delete or by purge.  If they were rolled
back with delete, then they will be orphaned and need to be cleaned up with
post abort work.  Therefore, insertion of long columns due to update must be
rolled back with purge.<BR>

This is safe because the moment the rollback of the head row piece happens, the
new long column is orphaned anyway and nobody will be able to get to it.
Since
we don't attempt to share long column pages, we know that nobody else could be
on the page and it is safe to deallocate the page.

<P><B>Row is updated with new long row piece and update is rolled back.</B><BR>

When the update is rolled back, the new long row piece, which got there by
insertion, gets rolled back either by delete or by purge.  As with new long
columns, it should be rolled back with purge.  However, there is a problem
in that the insert log record does not contain the head row handle.  It is
possible that another long row emanating from the same head page overflows to
this page.  That row may since have been deleted and is now in the middle of a
purge, but the purge has not committed.  To the code that is rolling back the
insert (caused by the update that split off a new row piece) the overflow page
looks empty.  If it went ahead and deallocated the page, then the transaction
which purged the row piece on this page won't be able to roll back.  For this
reason, the insert of a long row piece due to update must be rolled back with
delete.  Furthermore, there is no easy way to lodge a post termination work to
reclaim this deleted row piece so it will be lost forever.<BR>

RESOLVE: need to log the head row's handle in the insert log record, i.e., any
insert due to update of long row or column piece should have the head row's
handle on it so that when the insert is rolled back with purge, and there is no
more row on the page, it can file a post commit to reclaim the page safely.
The post commit reclaim page needs to lock the head row and latch the head page
to make sure the entire row chain is stable.

<P><B>
*/
public class ReclaimSpaceHelper
{
    /**
        Reclaim space based on work.
     */
    public static int reclaimSpace(BaseDataFileFactory dataFactory,
                                   RawTransaction tran,
                                   ReclaimSpace work)
         throws StandardException
    {
        if (work.reclaimWhat() == ReclaimSpace.CONTAINER)
            return reclaimContainer(dataFactory, tran, work);

        // Else, not reclaiming container.
        // Get a no-wait shared lock on the
        // container regardless of how the user transaction had the
        // container opened.

        LockingPolicy container_rlock =
            tran.newLockingPolicy(LockingPolicy.MODE_RECORD,
                                  TransactionController.ISOLATION_SERIALIZABLE,
                                  true /* stricter OK */ );
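
The listing is cut off at this point. As an aside on the class comment: it explains that post commit COLUMN_CHAIN work may be queued more than once, and that the time stamp of the chain's first page is what stops a second reclaim after the pages have been reused. The following is a minimal, standalone sketch of that idea only. It is not Derby code: `ColumnChainReclaimSketch`, `SketchPage`, `ColumnChainWork`, `queue` and `reclaim` are hypothetical names invented for illustration, and a plain counter stands in for the page time stamp.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these types exist in Derby.
class ColumnChainReclaimSketch {

    // Stand-in for a page: the time stamp changes whenever the page is modified.
    static final class SketchPage {
        long timeStamp = 1;
        boolean allocated = true;

        void touch() {
            timeStamp++;
        }
    }

    // Stand-in for queued post commit work: it remembers the first page of the
    // column chain and the time stamp that page had when the work was queued.
    static final class ColumnChainWork {
        final int firstPageNumber;
        final long timeStampAtQueueTime;

        ColumnChainWork(int firstPageNumber, long timeStampAtQueueTime) {
            this.firstPageNumber = firstPageNumber;
            this.timeStampAtQueueTime = timeStampAtQueueTime;
        }
    }

    final Map<Integer, SketchPage> pages = new HashMap<>();

    // Queue work for the chain starting at pageNumber, capturing the current time stamp.
    ColumnChainWork queue(int pageNumber) {
        return new ColumnChainWork(pageNumber, pages.get(pageNumber).timeStamp);
    }

    // Returns true if the chain was reclaimed, false if the work was skipped
    // because the page is gone or was modified after the work was queued.
    boolean reclaim(ColumnChainWork work) {
        SketchPage page = pages.get(work.firstPageNumber);
        if (page == null || page.timeStamp != work.timeStampAtQueueTime) {
            return false; // already reclaimed (and possibly reused): do nothing
        }
        page.allocated = false; // reclaim the page
        page.touch();           // reclaiming modifies the page, so the time stamp moves
        return true;
    }

    public static void main(String[] args) {
        ColumnChainReclaimSketch store = new ColumnChainReclaimSketch();
        store.pages.put(7, new SketchPage());

        // The same work is queued twice, as after repeated rollback to savepoint.
        ColumnChainWork first = store.queue(7);
        ColumnChainWork second = store.queue(7);

        System.out.println(store.reclaim(first));  // prints true
        System.out.println(store.reclaim(second)); // prints false
    }
}
```

The second request is rejected purely by comparing time stamps, which matches the comment's point that the column chain has no back pointer to the head row that could be checked instead.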