m4_comment([$Id: diskspace.so,v 10.17 2002/08/09 13:43:47 bostic Exp $])
m4_ref_title(Access Methods, Disk space requirements, @disk space requirements, am_misc/dbsizes, am_misc/tune)
m4_p([dnl
It is possible to estimate the total database size based on the size of
the data.  The following calculations are an estimate of how many bytes
you will need to hold a set of data and then how many pages it will take
to actually store it on disk.])
m4_p([dnl
Space freed by deleting key/data pairs from a Btree or Hash database is
never returned to the filesystem, although it is reused where possible.
This means that the Btree and Hash databases are grow-only.  If enough
keys are deleted from a database that shrinking the underlying file is
desirable, you should create a new database and copy the records from
the old one into it.])
m4_p([dnl
These are rough estimates at best.  For example, they do not take into
account overflow records, filesystem metadata information, large sets
of duplicate data items (where the key is only stored once), or
real-life situations where the sizes of key and data items are wildly
variable, and the page-fill factor changes over time.])
m4_section(Btree)
m4_p([dnl
The formulas for the Btree access method are as follows:])
m4_indent([dnl
useful-bytes-per-page = (page-size - page-overhead) * page-fill-factor
m4_blank
bytes-of-data = n-records * (bytes-per-entry + page-overhead-for-two-entries)
m4_blank
n-pages-of-data = bytes-of-data / useful-bytes-per-page
m4_blank
total-bytes-on-disk = n-pages-of-data * page-size])
m4_p([dnl
The m4_bold(useful-bytes-per-page) is a measure of the bytes on each page
that will actually hold the application data.  It is computed as the total
number of bytes on the page that are available to hold application data,
corrected by the percentage of the page that is likely to contain data.
The reason for this correction is that the percentage of a page that
contains application data can vary from close to 50% after a page split
to almost 100% if the entries in the database were inserted in sorted
order.  Obviously, the m4_bold(page-fill-factor) can drastically alter
the amount of disk space required to hold any particular data set.  The
page-fill factor of any existing database can be displayed using the
m4_ref(db_stat) utility.])
m4_p([dnl
The page-overhead for Btree databases is 26 bytes.  As an example, using
an 8K page size, with an 85% page-fill factor, there are 6941 bytes of
useful space on each page:])
m4_indent([6941 = (8192 - 26) * .85])
m4_p([dnl
The total m4_bold(bytes-of-data) is an easy calculation: it is the number
of key and data items multiplied by the size of each item plus the
overhead required to store each item on a page.  The overhead to store a
key or data item on a Btree page is 5 bytes.  So, it would take
1,560,000,000 bytes, or roughly 1.45GB of total data to store 60,000,000
key/data pairs, assuming each key or data item was 8 bytes long:])
m4_indent([1560000000 = 60000000 * ((8 + 5) * 2)])
m4_p([dnl
The total pages of data, m4_bold(n-pages-of-data), is the
m4_bold(bytes-of-data) divided by the m4_bold(useful-bytes-per-page).  In
the example, there are 224751 pages of data.])
m4_indent([224751 = 1560000000 / 6941])
m4_p([dnl
The total bytes of disk space for the database is m4_bold(n-pages-of-data)
multiplied by the m4_bold(page-size).  In the example, the result is
1,841,160,192 bytes, or roughly 1.71GB.])
m4_indent([1841160192 = 224751 * 8192])
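m4_p([dnl
As an illustration only, the following small C program (it is not part
of the library or its utilities) performs the same Btree arithmetic
using this section's example values: an 8K page size, an 85% page-fill
factor, and 60,000,000 pairs of 8-byte keys and 8-byte data items.
Substitute measurements from your own application to estimate your own
databases.])
m4_indent([dnl
#include <stdio.h>

int
main()
{
    long long n_records = 60000000;        /* number of key/data pairs */
    long long key_size = 8, data_size = 8; /* bytes per key, per data item */
    long long page_size = 8192;            /* database page size */
    double fill_factor = 0.85;             /* page-fill factor (see db_stat) */
    long long page_overhead = 26;          /* Btree page overhead */
    long long item_overhead = 5;           /* overhead per key or data item */

    long long useful_bytes_per_page =
        (long long)((page_size - page_overhead) * fill_factor);
    long long bytes_of_data = n_records *
        (key_size + item_overhead + data_size + item_overhead);
    long long n_pages_of_data = bytes_of_data / useful_bytes_per_page;
    long long total_bytes_on_disk = n_pages_of_data * page_size;

    printf("useful bytes per page: %lld\n", useful_bytes_per_page);
    printf("bytes of data:         %lld\n", bytes_of_data);
    printf("pages of data:         %lld\n", n_pages_of_data);
    printf("total bytes on disk:   %lld\n", total_bytes_on_disk);
    return (0);
}])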
m4_section(Hash)
m4_p([dnl
The formulas for the Hash access method are as follows:])
m4_indent([dnl
useful-bytes-per-page = (page-size - page-overhead)
m4_blank
bytes-of-data = n-records * (bytes-per-entry + page-overhead-for-two-entries)
m4_blank
n-pages-of-data = bytes-of-data / useful-bytes-per-page
m4_blank
total-bytes-on-disk = n-pages-of-data * page-size])
m4_p([dnl
The m4_bold(useful-bytes-per-page) is a measure of the bytes on each page
that will actually hold the application data.  It is computed as the total
number of bytes on the page that are available to hold application data.
If the application has explicitly set a page-fill factor, pages will
not necessarily be kept full.  For databases with a preset fill factor,
see the calculation below.  The page-overhead for Hash databases is 26
bytes and the page-overhead-for-two-entries is 6 bytes.])
m4_p([dnl
As an example, using an 8K page size, there are 8166 bytes of useful space
on each page:])
m4_indent([8166 = (8192 - 26)])
m4_p([dnl
The total m4_bold(bytes-of-data) is an easy calculation: it is the number
of key/data pairs multiplied by the size of each pair plus the overhead
required to store each pair on a page.  In this case, the overhead is 6
bytes per pair.  So, assuming 60,000,000 key/data pairs, each consisting
of an 8-byte key and an 8-byte data item, there are 1,320,000,000 bytes,
or roughly 1.23GB of total data:])
m4_indent([1320000000 = 60000000 * (16 + 6)])
m4_p([dnl
The total pages of data, m4_bold(n-pages-of-data), is the
m4_bold(bytes-of-data) divided by the m4_bold(useful-bytes-per-page).  In
this example, there are 161646 pages of data.])
m4_indent([161646 = 1320000000 / 8166])
m4_p([dnl
The total bytes of disk space for the database is m4_bold(n-pages-of-data)
multiplied by the m4_bold(page-size).  In the example, the result is
1,324,204,032 bytes, or roughly 1.23GB.])
m4_indent([1324204032 = 161646 * 8192])
m4_p([dnl
Now, let's assume that the application specified a fill factor explicitly.
The fill factor indicates the target number of items to place on a single
page (a fill factor might reduce the utilization of each page, but it can
be useful in avoiding splits and preventing buckets from becoming too
large).  Using our estimates above, each key/data pair takes 22 bytes on a
page (16 + 6), and there are 8166 useful bytes on a page (8192 - 26).
That means that, on average, you can fit 371 pairs per page.])
m4_indent([371 = 8166 / 22])
m4_p([dnl
However, let's assume that the application designer knows that although
most items are 8 bytes, they can sometimes be as large as 10, and it's
very important to avoid overflowing buckets and splitting.  Then, the
application might specify a fill factor of 314, allowing each pair up to
26 bytes (10 + 10 + 6) on the page.])
m4_indent([314 = 8166 / 26])
m4_p([dnl
With a fill factor of 314, the formula for computing database size is])
m4_indent([n-pages-of-data = npairs / pairs-per-page])
m4_p([or 191082 pages.])
m4_indent([191082 = 60000000 / 314])
m4_p([dnl
At 191082 pages, the total database size would be 1,565,343,744 bytes, or
roughly 1.46GB.])
m4_indent([1565343744 = 191082 * 8192])
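m4_p([dnl
Again as an illustration only (this program is not part of the library
or its utilities), the following C fragment performs the Hash arithmetic
shown above for both cases, the default behavior and an explicit fill
factor of 314, using the same example values; any small difference from
the figures above is in the rounding of the page count.])
m4_indent([dnl
#include <stdio.h>

int
main()
{
    long long n_pairs = 60000000;  /* number of key/data pairs */
    long long pair_size = 16;      /* bytes per pair: 8-byte key + 8-byte data */
    long long page_size = 8192;    /* database page size */
    long long page_overhead = 26;  /* Hash page overhead */
    long long pair_overhead = 6;   /* page-overhead-for-two-entries */
    long long ffactor = 314;       /* explicitly chosen fill factor */

    long long useful_bytes_per_page = page_size - page_overhead;

    /* Default case: pages are filled as completely as possible. */
    long long bytes_of_data = n_pairs * (pair_size + pair_overhead);
    long long n_pages = bytes_of_data / useful_bytes_per_page;
    printf("no fill factor:  %lld pages, %lld bytes on disk\n",
        n_pages, n_pages * page_size);

    /* Explicit fill factor: a fixed number of pairs per page. */
    long long ff_pages = n_pairs / ffactor;
    printf("fill factor %lld: %lld pages, %lld bytes on disk\n",
        ffactor, ff_pages, ff_pages * page_size);
    return (0);
}])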
m4_p([dnl
There are a few additional caveats with respect to Hash databases.  This
discussion assumes that the hash function does a good job of evenly
distributing keys among hash buckets.  If the function does not do this,
you may find your table growing significantly larger than you expected.
Secondly, in order to provide support for Hash databases coexisting with
other databases in a single file, pages within a Hash database are
allocated in power-of-two chunks.  That means that a Hash database with 65
buckets will take up as much space as a Hash database with 128 buckets;
each time the Hash database grows beyond its current power-of-two number
of buckets, it allocates space for the next power-of-two buckets.  This
space may be sparsely allocated in the file system, but the files will
appear to be their full size.  Finally, because of this need for
contiguous allocation, overflow pages and duplicate pages can be allocated
only at specific points in the file, and this too can lead to sparse hash
tables.])
m4_page_footer