<P><I>Teach Yourself Oracle8 in 21 Days</I>, ch11.htm</P>
</BLOCKQUOTE>

<P><FONT COLOR="#000077"><B>Degree of Parallelism</B></FONT></P>
<P>Properly distributing I/Os and the degree of parallelism are the two most important
things to tune in the Parallel Query option. Tuning the degree of parallelism is
partially trial and error and partially analysis. It is very important to take notes
when you are experimenting with the degree of parallelism. Your first iteration should
be based on the following factors:

<UL>
	<LI>The CPU capacity of your system--the number and speed of CPUs have an effect
	on the number of query processes you should run.
	<P>
	<LI>The capacity of the system to handle large numbers of processes--some operating
	systems can handle many simultaneous threads; others are more limited.
	<P>
	<LI>The system load--if the system is already running at 100% capacity, the degree
	of parallelism doesn't have much effect. If you are running at 90%, too many query
	processes can overload the system.
	<P>
	<LI>The amount of query processing on the system--if most operations are updates
	but there are a few critical queries, you might want many query processes.
	<P>
	<LI>The I/O capacity of the system--if your disks are striped or if you are using
	a disk array, you should be able to handle a large number of parallel queries.
	<P>
	<LI>The types of operations--are you performing a large number of full-table scans
	or sorts? These operations benefit greatly from parallel query servers.
</UL>

<P>All these parameters should have some influence on the degree of parallelism you
set up for your system. Remember that the preceding points are just guidelines to
help with your best attempt at a starting point. Here are a few other suggestions:

<UL>
	<LI>CPU-intensive operations such as sorts should indicate a lower degree of parallelism.
	CPU-bound tasks are already taking advantage of the CPUs and tend not to be waiting
	on I/O.
	<P>
	<LI>Disk-intensive operations such as full-table scans should indicate a higher degree
	of parallelism. The more operations waiting on I/O, the more the system can benefit
	from another query server.
	<P>
	<LI>Table size is very important. A small table does not need parallel queries. Reserve
	this for larger tables.
	<P>
	<LI>Many concurrent processes should indicate a lower degree of parallelism. Too
	many processes can overload the system.
</UL>
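<P>Once you have chosen a starting degree of parallelism from these guidelines, you
can set it with a <TT>PARALLEL</TT> hint or on the table definition. As a sketch (the
<TT>sales</TT> table name and the degree of 4 are only illustrative):
<PRE><FONT COLOR="#0066FF">SQL&gt; ALTER TABLE sales PARALLEL (DEGREE 4);

SQL&gt; SELECT /*+ PARALLEL(sales, 4) */ COUNT(*) FROM sales;
</FONT></PRE>
<P>The hint overrides the table definition for that statement only, which makes it
convenient for the trial-and-error experiments described previously.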

<P>Once you determine your starting point, you can monitor your system by querying
the dynamic performance table, <TT>V$PQ_SYSSTAT</TT>. This can be done with the query
shown in Listing 11.1.
<H4><FONT COLOR="#000000"><B>INPUT:</B></FONT></H4>
<H4><FONT COLOR="#000077"><B>Listing 11.1. Monitoring a parallel query from </B>V$PQ_SYSSTAT<B>.</B></FONT></H4>
<PRE><FONT COLOR="#0066FF">SQL&gt; select * from v$pq_sysstat;
STATISTIC                           VALUE
------------------------------ ----------
Servers Busy                            0
Servers Idle                           12
Servers Highwater                      16
Server Sessions                       380
Servers Started                         4
Servers Shutdown                        4
Servers Cleaned Up                      0
Queries Initiated                      21
DFO Trees                              77
Local Msgs Sent                   2459361
Distr Msgs Sent                         0
Local Msgs Recv'd                 2459318
Distr Msgs Recv'd                       0</FONT></PRE>
<P><FONT COLOR="#000000"><B><BR>
OUTPUT:</B></FONT></P>
<PRE><FONT COLOR="#0066FF">13 rows selected.
</FONT></PRE>
<P>When examining the output of this query, the following statistics are particularly useful:

<UL>
	<LI><TT>Servers Busy</TT>--This is the number of servers busy at any one time. Check
	this statistic several times to get a good idea of the average value. If the value
	is consistently equal to the initialization parameter <TT>PARALLEL_MIN_SERVERS</TT>,
	all your minimum query servers are in use and you might benefit from increasing it.
	<P>
	<LI><TT>Servers Idle</TT>--This is the number of servers idle at any one time. If
	you always have many idle servers, consider reducing <TT>PARALLEL_MIN_SERVERS</TT>.
	<P>
	<LI><TT>Servers Started</TT>--This is the number of query servers that have started
	up in this instance. If the value for <TT>Servers Busy</TT> is low but you see a
	large number in <TT>Servers Started</TT>, you might be using query servers sporadically.
	<P>
	<LI><TT>Servers Shutdown</TT>--This is the number of query servers that have been
	shut down because they are idle. This value is most likely similar to the <TT>Servers
	Started</TT> value.
</UL>

<P>After you determine your degree of parallelism, begin testing; evaluate the information
you get from <TT>V$PQ_SYSSTAT</TT> and from your operating-system-monitoring facilities.
Keep an eye out for CPU usage and excessive waiting on I/O. If the CPU usage is too
high, try reducing the degree of parallelism. If the CPU usage is too low and there
is significant waiting for I/O, try increasing the degree of parallelism.</P>
<P>Remember that the degree of parallelism is determined by SQL hints, table definitions,
and initialization parameters. The total number of query servers is determined by
the initialization parameter <TT>PARALLEL_MAX_SERVERS</TT>; the number started up
initially is determined by the initialization parameter <TT>PARALLEL_MIN_SERVERS</TT>.</P>
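<P>In the initialization file, these two parameters might look as follows (the
values shown are only illustrative; choose them based on the guidelines given
earlier in this section):
<PRE><FONT COLOR="#0066FF">PARALLEL_MIN_SERVERS = 4
PARALLEL_MAX_SERVERS = 16
</FONT></PRE>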
<P>The total number of query servers in use is the number of queries executed in
parallel multiplied by their degree of parallelism. If you try to use more than <TT>PARALLEL_MAX_SERVERS</TT>,
you won't be able to parallelize your query.
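<P>As a quick worked example of this arithmetic, assume <TT>PARALLEL_MAX_SERVERS</TT>
is set to 16 (an illustrative value):
<PRE><FONT COLOR="#0066FF">4 concurrent queries * degree of parallelism 4 = 16 query servers
</FONT></PRE>
<P>A fifth concurrent query requesting a degree of 4 would exceed <TT>PARALLEL_MAX_SERVERS</TT>
and could not be parallelized.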
<H3><FONT COLOR="#000077"><B>Direct Write Sorts</B></FONT></H3>
<P>You can use the Direct Write Sort option with the Parallel Query option and have
the query servers each perform their own direct writes.</P>
<P>As you saw earlier, using direct writes causes the server processes to write the
output of sort operations directly to disk, bypassing the buffer cache. The effect
of direct writes is that, for sort operations, large numbers of block buffers are
not ejected from the buffer cache. This leaves the buffer cache available for normal
queries and updates. When using direct write sorts with the Parallel Query option,
each query server gets its own set of direct write buffers.


<BLOCKQUOTE>
	<P>
<HR>
<FONT COLOR="#000077"><B>NOTE:</B></FONT><B> </B>Direct write sorts take more memory
	than normal sorts. The amount of memory they use with the Parallel Query option can
	be determined with the following formula:<BR>
	<BR>
	Direct write sort memory = <BR>
	(number of query servers) * (<TT>SORT_WRITE_BUFFERS</TT>) * (<TT>SORT_WRITE_BUFFER_SIZE</TT>)<BR>
	<BR>
	Use direct write sorts only if you have sufficient memory and temporary disk space.
	The temporary disk area should have enough space and I/O bandwidth to handle
	the load. 
<HR>


</BLOCKQUOTE>
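<P>As a worked example of the preceding formula, assume 8 query servers with
<TT>SORT_WRITE_BUFFERS</TT> set to 4 and <TT>SORT_WRITE_BUFFER_SIZE</TT> set to
32,768 bytes (these values are only illustrative):
<PRE><FONT COLOR="#0066FF">8 * 4 * 32,768 bytes = 1,048,576 bytes (1MB) of direct write sort memory
</FONT></PRE>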

<H3><FONT COLOR="#000077"><B>Parallel Index Creation</B></FONT></H3>
<P>Another feature of the Parallel Query option is its capability to create indexes
in parallel. With the parallel index creation feature, the time it takes to create
an index can be greatly reduced.</P>
<P>As in parallel query processing, a coordinator process dispatches two sets of
query servers. One set scans the table to be indexed to obtain the <TT>ROWID</TT>s
and column values needed for the index. Another set performs the sorting on those
values and passes the results to the coordinator process. The coordinator process
then puts together the B<SUP>*</SUP>-tree index from these sorted items.</P>
<P>When creating an index, the degree of parallelism follows the same precedence
as it does in parallel query processing. The first value is an optional <TT>PARALLEL</TT>
clause in the <TT>CREATE INDEX</TT> statement, followed by the table definition and
finally the initialization parameters.</P>
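<P>For example, a parallel index creation might look as follows (the <TT>emp</TT>
table, <TT>ename</TT> column, and degree of 4 are only illustrative):
<PRE><FONT COLOR="#0066FF">SQL&gt; CREATE INDEX emp_name_ix ON emp (ename)
  2  PARALLEL (DEGREE 4);
</FONT></PRE>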
<P>Creating an index in parallel can be several times faster than creating an index
by normal means. The same conditions apply for index creation as were given for parallel
query processing. A system that has been configured to take advantage of parallel
query processing will also see good performance from parallel index creation.
<H3><FONT COLOR="#000077"><B>Parallel Data Loading</B></FONT></H3>
<P>Loading can be done in parallel by having multiple concurrent sessions perform
a direct path load into the same table. Depending on the configuration of the system,
you can see excellent load performance by loading in parallel. Because loading is
both CPU and I/O intensive, you should see good results in an SMP or MPP environment
with a high-bandwidth I/O subsystem.</P>
<P>Parallel loads are performed by multiple direct-loader processes, each using
the <TT>PARALLEL=TRUE</TT> and <TT>DIRECT=TRUE</TT> options. When you specify <TT>PARALLEL=TRUE</TT>,
the loader does not place an exclusive lock on the table being loaded as it would
otherwise. During the parallel load, the loader creates temporary segments for each
of the concurrent processes and merges them together on completion.</P>
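<P>As a sketch, a parallel direct path load might be run from two concurrent sessions
as follows (the user name and control-file names are only illustrative; each control
file would reference its own input datafile):
<PRE><FONT COLOR="#0066FF">$ sqlldr scott/tiger CONTROL=load1.ctl DIRECT=TRUE PARALLEL=TRUE &amp;
$ sqlldr scott/tiger CONTROL=load2.ctl DIRECT=TRUE PARALLEL=TRUE &amp;
</FONT></PRE>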
<P>Although parallel loading performs best when each temporary file is located on
a separate disk, the increased performance of the load does not usually justify the
complexity of the manual striping needed to do this. I still recommend striping the
tables on an OS level or preferably on a hardware disk array. Performance can be
improved by putting each of the input files on a separate volume to take advantage
of the sequential nature of the reads.</P>
<P>Parallel loading can be beneficial, especially if load time is critical in your
environment. By putting each of the input files on separate disk volumes, you can
increase performance. Overall, the general tuning principles used in parallel query
processing are valid in parallel loading also.
<H3><FONT COLOR="#000077"><B>Parallel Recovery</B></FONT></H3>
<P>Having multiple concurrent sessions perform a recovery is probably my favorite
feature of the Parallel Query option. When benchmarking Oracle and testing hardware
and software, it is often necessary to intentionally crash the system to prove recoverability.
With the Parallel Recovery option, the time it takes to perform an instance recovery
can be dramatically reduced.</P>
<P>Recovery time is significantly reduced when the system being recovered has many
disks and supports asynchronous I/O. For a small system that has few drives or for
an operating system that does not support asynchronous I/O, it might not be wise
to enable parallel recovery.</P>
<P>In traditional recovery, one process reads from the redo log files and applies
changes to the datafiles, as shown in Figure 11.10. This operation can take a significant
amount of time because the recovery process must wait for disk I/Os to complete.</P>
<P><A NAME="10"></A><A HREF="10.htm"><B>Figure 11.10.</B></A></P>
<P><I>Recovery without parallel recovery.</I></P>
<P>With the Parallel Recovery option, one process is responsible for reading and
dispatching redo entries from the redo log files and passing those entries on to
the recovery processes that apply the changes to the datafiles, as shown in Figure
11.11.</P>
<P><A NAME="11"></A><A HREF="11.htm"><B>Figure 11.11.</B></A></P>
<P><I>Parallel recovery.</I></P>
<P>Because the dispatching process reads sequentially from the redo log files, its
I/O performance is much higher than that of the recovery processes, which write to
random locations throughout the datafiles. Because writing the data is very seek intensive,
it is a good idea to have one or two recovery processes for each data disk in the system.</P>
<P>By having more recovery processes, you can have more outstanding I/Os and thus
simultaneously use all the data drives. Because recovery is done at instance startup,
this arrangement reduces dead time when no other database processing can be done.</P>
<P>The number of concurrent recovery processes is set with the initialization parameter
<TT>RECOVERY_PARALLELISM</TT>. The value of this parameter cannot exceed the value specified
in the initialization parameter <TT>PARALLEL_MAX_SERVERS</TT>.</P>
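<P>In the initialization file, this might look as follows (the value of 8 is only
illustrative; the guideline of one or two recovery processes per data disk applies):
<PRE><FONT COLOR="#0066FF">RECOVERY_PARALLELISM = 8
</FONT></PRE>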
<P>By specifying a sufficient number of recovery servers, you'll see an immediate
improvement in instance recovery time. Do not use parallel recovery if your system
does not support asynchronous I/O or if you are limited to a small number of disk
drives. If your I/O subsystem is fast enough and your data is properly striped across
these drives (either through software or hardware RAID), you should see very good
improvement.</P>
<P>In summary, the Parallel Query option is useful in distributing processing loads
so that CPUs are kept busy processing while other processes are waiting for I/Os
to complete. With multiprocessor machines, the Parallel Query option can be quite
beneficial.</P>