Feed aggregator
IOTs by the Oracle Indexing Expert
I’m really pleased to see that Richard Foote has started a series on Index Organized Tables. You can see his introductory post on the topic here. As ever with Richard, he puts in lots of detail and explanation and I’ve been a fan of his blogging style for a long time.
I’ve got a few posts on the topic left to do myself, maybe this competition will spur me to get on and write them!
What I will also be very interested to see is the different way we will cover the same topic. Richard has already put in a block dump and dug into the details a little at a low level about how the data is stored, which I have not done. He has also shown how using an IOT instead of a fully overloaded index (where you create an index covering all the columns of the table, to avoid visiting the table for other columns) results in not only saving the space taken up by the redundant heap table but that the IOT index is smaller than the fully overloaded index. This is due to the lack of a rowid.
I put in occasional pictures and maybe write more about how the example matches real world situations. If you want, you can look back at my own introduction to the topic.
I’m sure this is going to be an excellent series and I’ll be following it myself.
Index Organized Tables – An Introduction Of Sorts (Pyramid Song)
Fixed Objects Statistics and why they are important
Fixed objects are the x$ tables and their indexes. The v$performance views in Oracle are defined in top of X$ tables (for example V$SQL and V$SQL_PLAN). Since V$ views can appear in SQL statements like any other user table or view then it is important to gather optimizer statistics on these tables to help the optimizer generate good execution plans. However, unlike other database tables, dynamic sampling is not automatically use for SQL statement involving X$ tables when optimizer statistics are missing. The Optimizer uses predefined default values for the statistics if they are missing. These defaults may not be representative and could potentially lead to a suboptimal execution plan, which could cause severe performance problems in your system. It is for this reason that we strong recommend you gather fixed objects statistics.
Fixed Object statistics must be manually gathered. They are not created or maintained by the automatic statistics gathering job. You can collect statistics on fixed objects using DBMS_STATS.GATHER_FIXED_OBJECTS_STATS. BEGIN
DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
END;
The DBMS_STATS.GATHER_FIXED_OBJECTS_STATS procedure gathers the same statistics as DBMS_STATS.GATHER_TABLE_STATS except for the number of blocks. Blocks is always set to 0 since the x$ tables are in memory structures only and are not stored on disk. You must have the ANALYZE ANY DICTIONARY or SYSDBA privilege or the DBA role to update fixed object statistics.
Because of the transient nature of the x$ tables it is import that you gather fixed object statistics when there is a representative workload on the system. This may not always be feasible on large system due to additional resource need to gather the statistics. If you can’t do it during peak load you should do it after the system has warmed up and the three key types of fixed object tables have been populated:
Structural data - for example, views covering datafiles, controlfile contents, etcSession based data - for example, v$session, v$access, etc.
Workload data - for example, v$sql, v$sql_plan,etc
It is recommended that you re-gather fixed object statistics if you do a major database or application upgrade, implement a new module, or make changes to the database configuration. For example if you increase the SGA size then all of the x$ tables that contain information about the buffer cache and shared pool may change significantly, such as x$ tables used in v$buffer_pool or v$shared_pool_advice.
Dynamic Sampling On Multiple Partitions - Bugs
1. The table level dynamic sampling hint uses a different number of blocks for sampling than the session / cursor level dynamic sampling. So even if for both for example level 5 gets used the number of sampled blocks will be different for most of the 10 levels available (obviously level 0 and 10 are exceptions)
2. The Dynamic Sampling code uses a different approach for partitioned objects if it is faced with the situation that there are more partitions than blocks to sample according to the level (and type table/cursor/session) of Dynamic Sampling
Note that all this here applies to the case where no statistics have been gathered for the table - I don't cover the case when Dynamic Sampling gets used on top of existing statistics.
Dynamic Sampling Number Of Sample Blocks
Jonathan Lewis has a short post describing 1. above, although I believe that his post has a minor inaccuracy: The number of blocks sampled for the table level dynamic sampling is 32 * 2^(level - 1) not 32 * 2^level.
Note that the constant 32 is defined by the internal parameter "_optimizer_dyn_smp_blks" and is independent from the block size. So this is one of the cases where a larger block size potentially gives better results because more data might be sampled, of course it also means performing more work for the sampling.
Here are two excerpts from optimizer trace files that show both the difference between the table and cursor/session level sample sizes as well as the 2^(level -1) formula for the table level:
Table level 5:
** Executed dynamic sampling query:
level : 5
.
.
.
max. sample block cnt. : 512
Cursor/session level 5:
** Executed dynamic sampling query:
level : 5
.
.
.
max. sample block cnt. : 64
So both cases use level 5, but the number of sample blocks is different, and for the table level 5 it is 32 * 2^4 = 32 * 16 = 512 blocks
Dynamic Sampling On Multiple Partitions
Point 2. above is also described in one of the comments to the post mentioned. In principle the Dynamic Sampling code seems to assume an overhead of one sample block per (sub)segment, so the effective number of blocks to sample will fall short by the number of (sub)segments to sample.
Probably this is based on the assumption that the segment header block needs to be accessed anyway when reading a segment.
If the code didn't cater for this fact it could potentially end up with an effective number of blocks sampled that is far greater than defined by the sample size when dealing with partitioned objects.
For non-partitioned objects this is not a big deal because it means exactly one block less than defined by the sample size.
But if Dynamic Sampling needs to sample multiple partitions this has several consequences:
a. The number of blocks that are effectively sampled for data can be far less than expected according to the number of blocks to be sampled, because the code reduces the number of blocks by the number of partitions to sample
b. The point above poses a special challenge if there are actually more partitions to sample than blocks
Note that Dynamic Sampling uses static / compile time partition pruning information to determine the number of partitions that need to be sampled.
The upshot of this is that when sampling multiple partitions the sample sizes of the lower cursor/session Dynamic Sampling levels can be far too small for reasonable sample results.
If the Dynamic Sampling code faces the situation where more partitions need to be sampled than blocks, it uses a different approach.
Rather than sampling the whole table and therefore potentially accessing more partitions than blocks defined by the sample size it will randomly select (sample blocks / 2) subsegments.
According to the number of blocks determined per subsegment it will then use a sample size such that in total (sample blocks / 2) blocks will be sampled for data.
Of course you'll appreciate that this means that on average exactly one data block will be sampled for data per subsegment.
The sample query looks different in such a case because the subsegments sampled are explicitly mentioned and combined via UNION ALL resulting in quite a lengthy statement - even with a small sample size like 32 blocks 16 queries on subsegments will be UNIONed together.
Here are again two excerpts from optimizer trace files that show the two different approaches in action:
More sample blocks than partitions:
** Performing dynamic sampling initial checks. **
** Dynamic sampling initial checks returning TRUE (level = 5).
** Dynamic sampling updated table stats.: blocks=17993
** Generated dynamic sampling query:
query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ 1 AS C1, 1 AS C2 FROM "T" SAMPLE BLOCK (0.711388 , 1) SEED (1) "T") SAMPLESUB
*** 2012-01-03 09:45:22.695
** Executed dynamic sampling query:
level : 5
sample pct. : 0.711388
total partitions : 384
partitions for sampling : 384
actual sample size : 7452
filtered sample card. : 7452
orig. card. : 98028
block cnt. table stat. : 17993
block cnt. for sampling: 17993
Potentially all partitions get sampled and the query used is similar to the one used for non-partitioned objects.
Less or equal blocks than partitions:
** Performing dynamic sampling initial checks. **
** Dynamic sampling initial checks returning TRUE (level = 5).
** Dynamic sampling updated table stats.: blocks=1496
*** 2012-01-03 09:44:04.492
** Generated dynamic sampling query:
query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT 1 AS C1, 1 AS C2 FROM ((SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(6) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(21) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(28) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(30) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(68) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(80) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(83) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(98) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(102) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(109) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(134) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(141) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(153) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(158) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(176) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(177) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(179) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(205) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(206) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(249) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(257) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(260) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(263) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(265) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(273) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(277) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(309) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(339) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(341) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(342) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(359) SAMPLE BLOCK (2.139037 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(368) SAMPLE BLOCK (2.139037 , 1) SEED (1))) "T") SAMPLESUB
** Executed dynamic sampling query:
level : 5
sample pct. : 2.139037
total partitions : 384
partitions for sampling : 384
partitions actually sampled from : 32
actual sample size : 2583
filtered sample card. : 2583
orig. card. : 98028
block cnt. table stat. : 1496
block cnt. for sampling: 17952
partition subset block cnt. : 1496
You can clearly see that the query looks quite different by listing a number of subpartitions explicitly. Also the text dumped to the trace file is different and says that it will restrict the sampling to 32 partitions.
And it is this special case where in versions below 11.2.0.3 a silly bug in the code leads to incorrect cost estimates: When putting together the number of blocks that should be used for sampling and those that are extrapolated for the whole table the code copies the wrong number into the table stats - it uses the number of blocks to sample instead of the assumed table size. This can lead to a dramatic cost underestimate for a corresponding full table scan operation.
The issue seems to be fixed in 11.2.0.3, but you can see in above excerpt from 11.2.0.1 the problem by checking carefully these lines:
...
** Dynamic sampling updated table stats.: blocks=1496 <=== wrong number copied from below
...
block cnt. table stat. : 1496 <=== this should be on the next line
block cnt. for sampling: 17952 <=== this should be on the previous line
partition subset block cnt. : 1496
The two figures "block cnt. for sampling" and "block cnt. table stat." are swapped - and the wrong number is copied to the table stats line.
This will result in a potential underestimate of the table blocks. The first plan is generated with the session level 5 sample size where the bug copies the wrong number of blocks:
---------------------------------------+-----------------------------------+---------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | Pstart| Pstop |
---------------------------------------+-----------------------------------+---------------+
| 0 | SELECT STATEMENT | | | | 246 | | | |
| 1 | PARTITION RANGE ALL | | 996K | 89M | 246 | 00:00:03 | 1 | 12 |
| 2 | PARTITION HASH ALL | | 996K | 89M | 246 | 00:00:03 | 1 | 32 |
| 3 | TABLE ACCESS FULL | T | 996K | 89M | 246 | 00:00:03 | 1 | 384 |
---------------------------------------+-----------------------------------+---------------+
The second plan is generated for the same data set but using the table level 5 sample size that results in using the different code path that is not affected by the bug:
---------------------------------------+-----------------------------------+---------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | Pstart| Pstop |
---------------------------------------+-----------------------------------+---------------+
| 0 | SELECT STATEMENT | | | | 3637 | | | |
| 1 | PARTITION RANGE ALL | | 970K | 86M | 3637 | 00:00:44 | 1 | 12 |
| 2 | PARTITION HASH ALL | | 970K | 86M | 3637 | 00:00:44 | 1 | 32 |
| 3 | TABLE ACCESS FULL | T | 970K | 86M | 3637 | 00:00:44 | 1 | 384 |
---------------------------------------+-----------------------------------+---------------+
Note that although a minor discrepancy might be explained by the different sample sizes a cost estimate difference by an order of magnitude is clearly questionable.
Nasty Bug When Using Indexes
Finally there is another nasty bug waiting for you in the case of partitioned objects - and this time it doesn't matter if the number of partitions is more or less than the number of blocks to be sampled:
Dynamic Sampling will also make use of eligible indexes if a filter predicate is applied to a table and a suitable index exists (which probably means that it starts with the predicates applied but I haven't investigated that to a full extent).
The idea behind this is probably that by using the index a very cheap operation can be used to obtain a very precise selectivity estimate for highly selective predicates. Dynamic Sampling has some built-in sanity checks that reject the Dynamic Sampling result if not a reasonable number of rows pass the filter predicates applied - similar to saying "not enough data found to provide a reasonable estimate". So in case the filter predicates identify only a few rows out of many it requires a pretty high sample level in order to have the Dynamic Sampling results not rejected by these sanity checks.
Things look different however if there is a suitable index available: Dynamic Sampling will run an additional index-only query that is limited to a small number of rows (2,500 rows seems to be a common number) and a where clause corresponding to the filter predicates. If the number of rows returned by this query is less than 2,500 Dynamic Sampling knows that this corresponds exactly to the cardinality / selectivity of the filter predicates.
In case of partitioned objects though there is again a silly bug where the case of 100% matching rows is not handled correctly - so for any filter predicate that matches more than 2,500 rows the cardinality / selectivity estimate will be potentially incorrect.
Here are again two optimizer trace excerpts that show the bug in action:
Without a suitable index the cardinality estimate for a not really selective predicate (90%) is in the right ballpark:
** Dynamic sampling initial checks returning TRUE (level = 5).
** Dynamic sampling updated table stats.: blocks=1585
*** 2012-01-09 09:53:13.651
** Generated dynamic sampling query:
query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE */ 1 AS C1, CASE WHEN "T"."ID">100000 THEN 1 ELSE 0 END AS C2 FROM ((SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(5) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(20) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(27) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(29) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(67) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(79) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(82) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(97) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(101) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(108) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(133) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(140) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(152) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(157) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(175) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(176) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(178) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(204) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(205) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(248) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(256) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(259) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(262) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(264) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(272) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(276) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(308) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(338) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(340) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(341) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(358) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(367) SAMPLE BLOCK (2.018927 , 1) SEED (1))) "T") SAMPLESUB
*** 2012-01-09 09:53:13.869
** Executed dynamic sampling query:
level : 5
sample pct. : 2.018927
total partitions : 384
partitions for sampling : 384
partitions actually sampled from : 32
actual sample size : 2063
filtered sample card. : 2003
orig. card. : 98028
block cnt. table stat. : 1585
block cnt. for sampling: 19020
partition subset block cnt. : 1585
max. sample block cnt. : 64
sample block cnt. : 32
min. sel. est. : 0.05000000
** Using dynamic sampling card. : 1226196
** Dynamic sampling updated table card.
** Using single table dynamic sel. est. : 0.97091614
Table: T Alias: T
Card: Original: 1226196 Rounded: 1190533 Computed: 1190533.13 Non Adjusted: 1190533.13
.
.
.
---------------------------------------+-----------------------------------+---------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | Pstart| Pstop |
---------------------------------------+-----------------------------------+---------------+
| 0 | SELECT STATEMENT | | | | 360 | | | |
| 1 | PARTITION RANGE ALL | | 1163K | 103M | 360 | 00:00:05 | 1 | 12 |
| 2 | PARTITION HASH ALL | | 1163K | 103M | 360 | 00:00:05 | 1 | 32 |
| 3 | TABLE ACCESS FULL | T | 1163K | 103M | 360 | 00:00:05 | 1 | 384 |
---------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
3 - filter("ID">100000)
With a suitable index in place the cardinality is estimated at 2,500 for the same data set:
** Dynamic sampling initial checks returning TRUE (level = 5).
** Dynamic sampling updated index stats.: T_IDX, blocks=3840
** Dynamic sampling index access candidate : T_IDX
** Dynamic sampling updated table stats.: blocks=1585
*** 2012-01-09 10:01:32.960
** Generated dynamic sampling query:
query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ IGNORE_WHERE_CLAUSE */ 1 AS C1, CASE WHEN "T"."ID">100000 THEN 1 ELSE 0 END AS C2 FROM ((SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(5) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(20) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(27) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(29) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(67) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(79) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(82) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(97) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(101) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(108) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(133) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(140) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(152) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(157) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(175) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(176) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(178) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(204) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(205) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(248) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(256) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(259) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(262) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(264) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(272) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(276) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(308) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(338) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(340) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(341) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(358) SAMPLE BLOCK (2.018927 , 1) SEED (1)) UNION ALL (SELECT /*+ NO_PARALLEL("T") FULL("T") NO_PARALLEL_INDEX("T") */ * FROM "T" SUBPARTITION(367) SAMPLE BLOCK (2.018927 , 1) SEED (1))) "T") SAMPLESUB
*** 2012-01-09 10:01:33.100
** Executed dynamic sampling query:
level : 5
sample pct. : 2.018927
total partitions : 384
partitions for sampling : 384
partitions actually sampled from : 32
actual sample size : 2063
filtered sample card. : 2003
orig. card. : 98028
block cnt. table stat. : 1585
block cnt. for sampling: 19020
partition subset block cnt. : 1585
max. sample block cnt. : 64
sample block cnt. : 32
min. sel. est. : 0.05000000
** Using recursive dynamic sampling card. est. : 1226195.625000
*** 2012-01-09 10:01:33.163
** Generated dynamic sampling query:
query text :
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS opt_param('parallel_execution_enabled', 'false') NO_PARALLEL(SAMPLESUB) NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0), NVL(SUM(C3),0) FROM (SELECT /*+ NO_PARALLEL("T") INDEX("T" T_IDX) NO_PARALLEL_INDEX("T") */ 1 AS C1, 1 AS C2, 1 AS C3 FROM "T" "T" WHERE "T"."ID">100000 AND ROWNUM <= 2500) SAMPLESUB
*** 2012-01-09 10:01:33.179
** Executed dynamic sampling query:
level : 5
sample pct. : 100.000000
total partitions : 384
partitions for sampling : 384
actual sample size : 1226196
filtered sample card. : 2500
filtered sample card. (index T_IDX): 2500
orig. card. : 1226196
block cnt. table stat. : 1585
block cnt. for sampling: 1585
max. sample block cnt. : 4294967295
sample block cnt. : 1585
min. sel. est. : 0.05000000
** Increasing dynamic sampling selectivity
for predicate 0 from 0.002039 to 0.970916.
** Increasing dynamic sampling selectivity
for predicate 1 from 0.002039 to 0.970916.
index T_IDX selectivity est.: 0.00203883
** Using dynamic sampling card. : 1226196
** Dynamic sampling updated table card.
** Using single table dynamic sel. est. : 0.00203883
Table: T Alias: T
Card: Original: 1226196 Rounded: 2500 Computed: 2500.00 Non Adjusted: 2500.00
.
.
.
-------------------------------------------------------+-----------------------------------+---------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time | Pstart| Pstop |
-------------------------------------------------------+-----------------------------------+---------------+
| 0 | SELECT STATEMENT | | | | 55 | | | |
| 1 | PARTITION RANGE ALL | | 2500 | 222K | 55 | 00:00:01 | 1 | 12 |
| 2 | PARTITION HASH ALL | | 2500 | 222K | 55 | 00:00:01 | 1 | 32 |
| 3 | TABLE ACCESS BY LOCAL INDEX ROWID | T | 2500 | 222K | 55 | 00:00:01 | 1 | 384 |
| 4 | INDEX RANGE SCAN | T_IDX | 2500 | | 20 | 00:00:01 | 1 | 384 |
-------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
4 - access("ID">100000)
Again it can be seen from these lines:
** Increasing dynamic sampling selectivity
for predicate 0 from 0.002039 to 0.970916.
** Increasing dynamic sampling selectivity
for predicate 1 from 0.002039 to 0.970916.
index T_IDX selectivity est.: 0.00203883
that in principle the selectivity estimate from the table level operation is supposed to be used but finally the wrong selectivity gets copied over which is then echoed by the final execution plan.
This bug is tracked with bug "6408301: Bad cardinality estimate from dynamic sampling for indexes on partitioned table" and patches are available. The issue is fixed in 11.2.0.2, but the "wrong number of table blocks" issue is only fixed in 11.2.0.3. I don't have a bug number at hand for that bug, though.
Summary
If you plan to use Dynamic Sampling on partitioned objects with many partitions where the number of partitions to sample cannot be significantly limited by partition pruning the result of Dynamic Sampling might be questionable for lower levels.
In addition there is a bug that leads to wrong cost estimates for a full segment scan operation that is only fixed in the most recent releases.
It probably makes sense to use higher Dynamic Sampling levels in such cases - the side effect of this is not only more reasonable sampling results but it might also allow to avoid the mentioned bug if the number of blocks sampled is greater than the number of partitions to sample.
Be aware of the case where an index can be used by Dynamic Sampling in addition - for partitioned objects a bug might lead to dramatic underestimates of the cardinality.
Testcase Script
The issues described here can easily reproduced by using the following simple test case:
drop table t;
purge table t;
create table t
partition by range (pkey)
subpartition by hash (hash_id) subpartitions 32
(
partition pkey_1 values less than (2)
, partition pkey_2 values less than (3)
, partition pkey_3 values less than (4)
, partition pkey_4 values less than (5)
, partition pkey_5 values less than (6)
, partition pkey_6 values less than (7)
, partition pkey_7 values less than (8)
, partition pkey_8 values less than (9)
, partition pkey_9 values less than (10)
, partition pkey_10 values less than (11)
, partition pkey_11 values less than (12)
, partition pkey_12 values less than (13)
)
storage (initial 64k)
as
select
rownum as id
, mod(rownum, 12) + 1 as pkey
--, 12 as pkey
--, 1 as hash_id
, rownum as hash_id
, rpad('x', 100) as filler
from
dual
connect by
level <= 1000000
;
set echo on time on
alter session set optimizer_dynamic_sampling = 5;
alter session set tracefile_identifier = 'composite_part_dyn_samp';
alter session set events '10053 trace name context forever, level 1';
explain plan
for
select * from t
;
explain plan
for
select /*+ dynamic_sampling(t 5) */ * from t
;
alter session set tracefile_identifier = 'composite_part_dyn_samp_where';
explain plan
for
select /*+ dynamic_sampling(t 5) */ * from t
where id > 100000
;
alter session set tracefile_identifier = 'dummy';
create index t_idx on t (id) global;
alter session set tracefile_identifier = 'composite_part_dyn_samp_index';
explain plan
for
select /*+ dynamic_sampling(t 5) */ * from t
where id > 100000
;
Dumping Trace Events – What You See is Not Necessarily What You Get
Getting started with Apache Pig
If, like me, you want to play around with data in a Hadoop cluster without having to write hundreds or thousands of lines of Java MapReduce code, you most likely will use either Hive (using the Hive Query Language HQL) or Pig.
Hive is a SQL-like language which compiles to Java map-reduce code, while Pig is a data flow language which allows you to specify your map-reduce data pipelines using high level abstractions.
The way I like to think of it is that writing Java MapReduce is like programming in assembler: you need to manually construct every low level operation you want to perform. Hive allows people familiar with SQL to extract data from Hadoop with ease and – like SQL – you specify the data you want without having to worry too much about the way in which it is retrieved. Writing a Pig script is like writing a SQL execution plan: you specify the exact sequence of operations you want to undertake when retrieving the data. Pig also allows you to specify more complex data flows than is possible using HQL alone.
As a crusty old RDBMS guy, I at first thought that Hive and HQL was the most attractive solution and I still think Hive is critical to enterprise adoption of Hadoop since it opens up Hadoop to the world of enterprise Business Intelligence. But Pig really appeals to me as someone who has spent so much time tuning SQL. The Hive optimizer is currently at the level of early rule-based RDBMS optimizers from the early 90s. It will get better and get better quickly, but given the massive size of most Hadoop clusters, the cost of a poorly optimized HQL statement is really high. Explicitly specifying the execution plan in Pig arguably gives the programmer more control and lessens the likelihood of the “HQL statement from Hell” brining a cluster to it’s knees.
So I’ve started learning Pig, using the familiar (to me) Oracle sample schema which I downloaded using SQOOP. (Hint: Pig likes tab separated files, so use the --fields-terminated-by '\t' flag in your SQOOP job).
Here’s a diagram I created showing how some of the more familiar HQL idioms are implemented in Pig:
Note how using Pig we explicitly control the execution plan: In HQL it’s up to the optimizer whether tables are joined before or after the “country_region=’Asia’” filter is applied. In Pig I explicitly execute the filter before the join. It turns out that the Hive optimizer does the same thing, but for complex data flows being able to explicitly control the sequence of events can be an advantage.
Pig is only a little more wordy than HQL and while I definitely like the familiar syntax of HQL I really like the additional control of Pig.
The Most Brilliant Science Graphic I Have Ever Seen
The below link takes you to an absolutely fantastic interactive demonstration of the relative size of everything. Everything. Stop reading this and go look at it, when it finishes loading, move the blue blob at the bottom of the screen left and right.
The Relative_scale_of_everything
The raw web link is:
http://www.primaxstudio.com/stuff/scale_of_universe/scale-of-universe-v1.swf
The web page says scale_of_the_universe but it should be relative_scale_of_everything_in_the_universe. Did you go look at it? NO!?! If it’s because you have seen it before then fair enough – otherwise stop reading this stupid blog and Look At It! NOW! GO ON!!!
Yes, I do think it is good.
I have to thank Neil Chandler for his tweet about this web page which led me to look at it. Neil and I talked about relative sizes of things in the pub towards the end of last year, in one of the Oracle London Beers sessions. I think it was Neil himself who suggested we should convert MB, GB and TB into time to get a real feel for the size of data we are talking about, you know, when we chuck the phrases GB and TB around with abandon. Think of 1KB as a second. A small amount of time for what is now regarded as a small amount of data – This blog so far is around 1.2kb of letters. Given this scale:
1KB = 1 second. About the time it takes to blink 5, possibly 6 times, as fast as you can.
1MB = Just under 17 minutes. Time enough to cook fish fingers and chips from scratch.
1GB = 11 and a half days. 1KB->1GB is 1 second -> 1.5 weeks.
1TB = Just under 32 years. Yes, from birth to old enough to see your first returning computer fad.
1PB = pretty much all of known human history, cave paintings and Egyptian pyramids excepting, as the Phoenicians invented writing about 1150BC ago.
The wonderful thing about the web page this blog is about is that you can scan in and out and see the relative sizes of things, step by step, nice and slowly. Like how small our sun is compared to proper big ones and how the Earth is maybe not quite as small compared to Saturn as you thought. At the other end of the scale, how small a HIV virus is and how it compares to the pits in a CD and the tiniest of transistors on a silicon chip. I’m particularly struck by the size of DNA compared to a human red blood cell, as in how relatively large DNA is. Red blood cells are pretty big cells and yet all human cells (except, ahem, red blood cells) have 3.2 million letters of DNA in each and every one of them. That’s some packaging, as cells have a lot of other stuff in there too.
Down at the sub-atomic scale there are a fair number of gaps, where one graphic is pretty much off the scale before the next one resolves from a dot to anything discernable, but that is what it’s like down that end of things. Besides. It’s so small it’s hard to “look around” as there is nothing small enough (like, lightwaves went by several orders of magnitude ago) to look around with.
My one criticism? It’s a shame Blue Whale did not make it into the show
I actually had flashbacks looking at this web page. I remember, back in the mid-70′s I think, going to the cinema. Back then, you still had ‘B’ shows, a short film, cartoon or something before the main event. I no longer have a clue what the main event was, but the ‘B’ movie fascinated me. I think it started with a boy fishing next to a pond and it zoomed in to a mosquito on his arm, then into the skin and through the layers of tissue to blood vessels, to a blood cell… you get the idea, eventually to an atom. Some of the “zooming in” where it swapped between real footage was poor but it was 1970 or so and we knew no better. It then quickly zoomed back out to the boy, then to an aerial view of the field, out to birds-eye… satellite-like…the earth… solar system… I think it stopped at milky way. I wish I knew what that documentary was called or how to find it on the web…
{Update, see comments. Someone links to the film. I know I looked for this film a few years back and I did have a quick look again before I posted this message. I did not immediately find it but someone else did, in 10 seconds via Google. Shows how rubbish I am at using web searches…}
Curious Case Of The Ever Increasing Index Solution (A Big Hurt)
Faulty Quotes 7 – Deadlock Kills Sessions?
Curious Case Of The Ever Increasing Index Quiz (She’ll Drive The Big Car)
Index Organized Tables – A Start (Star)
What is the Meaning of the %CPU Column in an Explain Plan? 2
Book Review: Oracle Core Essential Internals for DBAs and Developers
Merry Christmas (plus long lost “The Jean Genie”)
The Transforming Face of the Oracle Support Site
Friday Philosophy – Christmas Cheer and Business Bah-Humbug
For many, today is the last working day before Christmas and the festive season – So I sincerely wish upon everyone a Merry Christmas.. If you don’t celebrate Christmas, well the intent of my wishes still holds – I hope everyone; whether working or not; religious leanings for, against or indifferent; has an enjoyable few days during whatever end-of-year festives you have.
I’m going to be miserably now. You might want to stop reading here and maybe go to the shops for that last spell of retail hell or some other Christmas tradition. It’s probably best if you do…
You see, despite the best wishes above, generally speaking I am not a big fan of Christmas and have not been for as long as I can remember. It is not the principle of Christmas I am not keen on {I rather like both the religious and secular aspects of the whole thing, especially the seeing-people part like Di and Bri and ringing up old friends}, it is what Business does to it. Like many people, I really object to the bombarding we endure of advertising, selling and down-right commercialist bullying for what seems to be 3 months on the run-up to Christmas. I know, I know, many people make this very same point ad nauseum around this time. What ticks me off the most is that I don’t think it would be an easy thing to change, for the fundamental reason that the businesses that are so set on telling us that Christmas will not be as good as it could be if we don’t buy their food to make us fat/get expensive presents for the kids to break/buy this bottle of smelly stuff so we get more sex/buy this booze cheap, probably for the same reason as the smelly stuff {or to help ignore the lack of sex}/take out a loan to make this Christmas REALLY “special” and you can pay it off for the whole of the rest of the year and be miserable as a result, {pause to catch breath…} as I was saying, any business that sells more stuff as a result of their advertising, no matter how much it annoys other people or adds to the degrading of the whole Christmas experience, will do better than a company that does not. And so will out-compete less tacky, crass and manipulative businesses.
That’s the huge problem with Christmas and other celebratory times. We live in a commercial society and commercial selection pressure means those companies that can squeeze the most out of a situation to sell tat will win. They give not a hoot about if we enjoy ourselves really {we are back to the smelly stuff and booze again, aren’t we?}, it’s profit. Oh, if enjoying ourselves in some way aids them in getting more profit then they won’t object, but it is not in the company mission statement of 99% of companies – and any that it is in are doing it for cynical, commercialist reasons.
So, all successful businesses are Evil and are ruining Christmas for us all {OK, so that’s a bit of a big leap, stay with me….} So, have your revenge!!!
Next year:
- Don’t buy stuff people probably don’t want. No adult wants 95% of what they get so….get nothing.
- Tell everyone “I have all the stuff I need, buy yourself something instead – treat yourself on me”. You can buy the stuff you really want from the savings from point 1.
- Having established the principle of reciprocal meanness above, that’s all that shopping hassle ditched.
- Get normal food you like {and that does not play merry hell with your digestive system}. Preferably stuff you can freeze or keep a while, so you don’t need to go into the supermarket after Dec 20th.
- Turn off the TV in December {or at least record everything and skip the adverts}. There is no decent TV in December anyway, it is all being saved up for the end of the month and, heck, even that is pretty awful.
- Don’t read the paper. Or if you do, if you must, first four pages and last four pages only and scribble over adverts with a felt-tip pen. You’ll get the gist of world events and if your team is winning or losing.
- That company you work for, that thinks paying you a wage means it owns your soul? It’s Evil, you owe them nothing they are not getting out of you already, so have a nice break at Christmas. {Unless you work at the same place as me, then they will need you to fill in for me as I will be on holiday}.
You will now be more relaxed, less stressed, have more time and generally be a nicer person. Take people to the pub, spend more time with people who like you being around (and this will be easier due to the people who no longer like you as you did not buy them any socks or a rubbish “humorous” golf book). Do things you actually enjoy. This year it is just going to be me, my gorgeous wife and the cat over Christmas and Boxing Day. The cat is really happy about this as we both like scratching the cat’s ears.
I might invite some neighbours over. They won’t come as they have to fulfil their awful Christmas Obligations – but they will like the fact they were invited. Heck, if they do turn up I’ll be in such a fine, happy mood I will even be nice to them.
Go and walk the hills of Mid Wales with your brother and relax.
Extended DISPLAY_CURSOR With Rowsource Statistics
So this will be my Oracle related Christmas present for you: A prototype implementation that extends the DBMS_XPLAN.DISPLAY_CURSOR output making it hopefully more meaningful and easier to interpret. It is a simple standalone SQL*Plus script with the main functionality performed by a single SQL query. I've demoed this also during my recent "optimizer hacking sessions".
DBMS_XPLAN.DISPLAY_CURSOR together with the Rowsource Statistics feature (enabled via SQL_TRACE, GATHER_PLAN_STATISTICS hint, STATISTICS_LEVEL set to ALL or controlled via the corresponding hidden parameters "_rowsource_execution_statistics" and "_rowsource_statistics_sampfreq") allows since Oracle 10g a sophisticated analysis of the work performed by a single SQL statement.
Of course you'll appreciate that it doesn't go as far as the Real-Time SQL Monitoring feature added in Oracle 11g but only available with Enterprise Edition + Diagnostic + Tuning Pack that is "always on" and provides similar (and much more) information while a statement is executing and doesn't require reproducing the execution with the corresponding hints / parameters set.
It's usually necessary to reproduce the execution without the Tuning Pack because the overhead of the Rowsource Statistics is significant and therefore it doesn't make sense to have them always enabled - unfortunately Oracle 11g gathers the same information "always on" but you're only allowed to access that information if you have the Tuning Pack license.
But for users without the corresponding licenses DBMS_XPLAN.DISPLAY_CURSOR together with Rowsource Statistics is still a very valuable tool.
However during my seminars and consulting at client sites I've realized that people quite often struggle to interpret the output provided for several reasons:
1. They have problems in general to interpret the execution plan - here I refer in particular to the flow of execution and underlying execution mechanics
2. They have problems in identifying the operations that are responsible for the majority of the work due to the cumulative nature of the work-related figures provided like Elapsed Time, Logical I/O, Physical Reads etc.
3. They are potentially mislead when trying to identify those steps in the execution plan that are subject to cardinality mis-estimates of the optimizer - the single most common reason for inefficient execution plans - due to the way the optimizer shows the number of estimated rows for operations that are executed multiple times (for example the inner row source of a Nested Loop join).
I've tried to address all of the above points (and even more) with this prototype implementation. In fact point 1 above has already been addressed by Adrian Billington's XPLAN wrapper utility that adds the Parent ID and Order of Execution to the DBMS_XPLAN output and I've picked up that idea of injecting additional information into the output with this prototype, so kudos to Adrian for his great idea and implementation.
Apart from any home-grown scripts there have probably been numerous attempts to address point 2 und 3 above, the latest one I know of being Kyle Hailey's DISPLAY_CURSOR post and his "TCF query" provided in the same article. I've included his TCF-GRAPH and LIO-RATIO information, so also thanks to Kyle for posting this.
I plan to eventually turn this into a SQL statement analysis "Swiss-army knife" for non-Tuning Pack users with more sophisticated formatting options (for example specifying which columns to show and in which order) and the ability to combine the information with the ASH info available from the Diagnostic Pack license (similar to the output provided by the Real-Time SQL Monitoring text mode).
However I believe that this prototype is already quite helpful and therefore decided to publish it as it is.
Let's have a look what the extended output has to offer by performing a couple of sample Rowsource Profiles.
Examples
The first example is deliberately kept as simple as possible to explain the basic functionality by performing a full table scan.
SQL> alter session set statistics_level = all;
Session altered.
Elapsed: 00:00:00.00
SQL>
SQL> select count(*) from t1;
COUNT(*)
----------
1000000
Elapsed: 00:00:03.01
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID 5bc0v4my7dvr5, child number 0
-------------------------------------
select count(*) from t1
Plan hash value: 3724264953
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 3 | SELECT STATEMENT | | 1 | | 1 |00:00:02.98 | 15390 | 15386 | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 1 | 0 | 2 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:02.98 | 15390 | 15386 | 00:00:01.39 | 0 | 0 | @@@@@@ | | | 0 | | 1 |
| 2 | 1 | 1 | TABLE ACCESS FULL| T1 | 1 | 1000K| 1000K|00:00:01.58 | 15390 | 15386 | 00:00:01.58 | 15390 | 15386 | @@@@@@ | @@@@@@@@@@@@| @@@@@@@@@@@@| 0 | | 1000K|
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
14 rows selected.
The first thing that becomes obvious is the fact that you need a veeery wide display setting to see all the columns provided :-)
As you can see if you call the script without any parameters it will try to pick up the last statement executed by the current session and call DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST formatting option. Further options can be found in the documentation provided with the script.
To the left you can see the "Pid" and "Ord" column that Adrian added in his original XPLAN wrapper script - these define the Parent Id as well as the Order of Execution. Note that this Order of Execution is only correct for the common cases - it doesn't cater for the various exceptions to the general rules and therefore can be misleading. You'll find below an example that demonstrates this.
Furthermore you see in addition the following columns to what is provided out of the box by DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST option:
A-Time Self: This is the time spent on the operation itself. For leaf operations this corresponds to the A-Time, but for all non-leaf operations this is the time that was spent on the operation itself obtained by subtracting the time spent on all direct descendant operations from the time shown for the parent operation. Please note that if you use a lower rowsource sample frequency (for example as set by the GATHER_PLAN_STATISTICS hint) the A-Time information will be pretty wrong and misleading. You need to set the sample frequency to 1 to get a stable time information reported - of course this means that the overhead of the rowsource sampling gets maximized
Bufs Self/Reads Self/Write Self: This is the corresponding self-operation statistic obtaining in the same way as just described
Graphs: The self-operation work shown relative to the total work performed. Note that the "total" is defined by querying the MAX value found in the statistics rather than picking the top-most cumulative value. This is because for queries that are cancelled or performed using Parallel Execution the top-most value may either not be populated at all or may be different from the values accumulated by the Parallel Slaves. So there are cases where the Graphs may be wrong and misleading - treat them carefully
LIO Ratio: This the simply the ratio between the number of rows generated by the row source and the number of logical I/O recorded for the particular operation required to generate them. As usual care should be taken when interpreting a ratio, but in general a high value here might indicate that there are more efficient ways to generate the data, like a more selective access path. This can be very misleading for aggregation operations for example - a COUNT(*) will potentially show a huge LIO ratio but doesn't indicate a problem by itself
TCF Graph: "Tuning by Cardinality Feedback" - this is a graph in a different style - it shows either plus or minus signs and each plus / minus corresponds to one order of magnitude difference between the estimated and the actual rows. Plus stands for underestimates, minus for overestimates. So two plus signs indicate that the the actual number of rows was 100 times greater than the estimated number, and similarly two minus signs would indicate an overestimate of factor 100. Note that this information will be partially misleading with Parallel Execution, because an operation that is only started once with serial execution might be started several times to obtain the complete result set when executed in parallel. Also cancelling queries might show misleading information here, see the "E-Rows*Sta" column description for an explanation why.
E-Rows*Sta: This is the estimated number of rows times the Starts column. This tries to address the point 3 above where the simple comparison of E-Rows and A-Rows can be very misleading, however doesn't indicate a problem at all if the operation has been started a corresponding number of times. If a query gets cancelled then this might still indicate a difference between this and A-Rows simply because the operation wasn't run to completion. Also for Parallel Execution this information needs to be carefully treated because an operation executed in parallel will be started many times that gets only executed once with serial execution
Looking at above example graph it becomes obvious that all of the logical and physical I/O has been caused by the full table scan of course, but with the increased STATISTICS_LEVEL setting you can see that the SORT AGGREGATE function also required some time - presumably CPU time due to instrumentation overhead whereas the top-most operation didn't account for any work at all. The cardinality estimate is also spot on.
The next example shows a different shape:
SQL> select
2 sum(row_num)
3 from
4 (
5 select
6 row_number() over (partition by object_type order by object_name) as row_num
7 , t.*
8 from
9 t
10 where
11 object_id > :x
12 );
2.7338E+11
Elapsed: 00:00:41.83
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID fmbq5ytmh0hng, child number 0
-------------------------------------
select sum(row_num) from ( select
row_number() over (partition by object_type order by object_name) as
row_num , t.* from t
where object_id > :x )
Plan hash value: 1399240396
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp| A-Time Self |Bufs Self |Reads Self|Write Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|Write S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 6 | SELECT STATEMENT | | 1 | | 1 |00:00:41.84 | 1469K| 49356 | 10578 | | | | | 00:00:00.00 | 0 | 0 | 0 | | | | | 0 | | |
| 1 | 0 | 5 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:41.84 | 1469K| 49356 | 10578 | | | | | 00:00:02.19 | 0 | 0 | 0 | @ | | | | 0 | | 1 |
| 2 | 1 | 4 | VIEW | | 1 | 16 | 1466K|00:00:39.64 | 1469K| 49356 | 10578 | | | | | 00:00:04.22 | 0 | 0 | 0 | @ | | | | 0 | ++++ | 16 |
| 3 | 2 | 3 | WINDOW SORT | | 1 | 16 | 1466K|00:00:35.42 | 1469K| 49356 | 10578 | 93M| 3312K| 55M (1)| 84992 | 00:00:11.85 | 6 | 24410 | 10578 | @@@ | | @@@@@@ | @@@@@@@@@@@@| 0 | ++++ | 16 |
| 4 | 3 | 2 | TABLE ACCESS BY INDEX ROWID| T | 1 | 16 | 1466K|00:00:23.58 | 1469K| 24946 | 0 | | | | | 00:00:19.73 | 1466K| 21707 | 0 | @@@@@@ | @@@@@@@@@@@@| @@@@@ | | 1 | ++++ | 16 |
|* 5 | 4 | 1 | INDEX RANGE SCAN | I | 1 | 16 | 1466K|00:00:03.85 | 3240 | 3239 | 0 | | | | | 00:00:03.85 | 3240 | 3239 | 0 | @ | | @ | | 0 | ++++ | 16 |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("OBJECT_ID">:X)
25 rows selected.
Have I already mentioned that you need a veery wide display setting :-) ??
Anyway here we can see a couple of interesting points:
- An example of a parent operation requiring a significant amount of time - in this case a WINDOW SORT operation that spills to disk (see the Used-Tmp and Writes columns)
- A problem with the cardinality estimates as indicated by the TCF Graph. In this case it is the reason for an inefficient index-based access path. Note that the LIO Ratio isn't indicating this problem here very clearly
- The majority of the logical I/O (and time and work) is caused by the random access to the table, again caused by the bad choice of the optimizer due to the wrong cardinality estimates
Here is another example of a more complex execution plan:
SQL> alter session set statistics_level = all;
Session altered.
Elapsed: 00:00:00.00
SQL>
SQL> alter session set star_transformation_enabled = temp_disable;
Session altered.
Elapsed: 00:00:00.00
SQL>
SQL> select * from (
2 select t1.id as t1_id, t1.filler, s.id as s_id from t1, (
3 select
4 f.id
5 from
6 t f
7 , (select * from d where is_flag_d1 = 'Y') d1
8 , (select * from d where is_flag_d2 = 'Y') d2
9 , (select * from d where is_flag_d3 = 'Y') d3
10 where
11 f.fk1 = d1.id
12 and f.fk2 = d2.id
13 and f.fk3 = d3.id
14 ) s
15 where t1.id = s.id
16 )
17 where rownum > 1
18 ;
no rows selected
Elapsed: 00:00:21.26
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID 5u3x96k4s5zt6, child number 0
-------------------------------------
select * from ( select t1.id as t1_id, t1.filler, s.id as s_id from t1,
( select f.id from t f , (select * from d where
is_flag_d1 = 'Y') d1 , (select * from d where is_flag_d2 = 'Y') d2
, (select * from d where is_flag_d3 = 'Y') d3 where f.fk1 =
d1.id and f.fk2 = d2.id and f.fk3 = d3.id ) s where t1.id = s.id
) where rownum > 1
Plan hash value: 42027304
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 30 | SELECT STATEMENT | | 1 | | 0 |00:00:21.23 | 2161K| 43798 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
| 1 | 0 | 29 | COUNT | | 1 | | 0 |00:00:21.23 | 2161K| 43798 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
|* 2 | 1 | 28 | FILTER | | 1 | | 0 |00:00:21.23 | 2161K| 43798 | | | | 00:00:00.44 | 0 | 0 | | | | | | |
| 3 | 2 | 27 | NESTED LOOPS | | 1 | | 1000K|00:00:20.79 | 2161K| 43798 | | | | 00:00:00.44 | 0 | 0 | | | | 0 | | |
| 4 | 3 | 25 | NESTED LOOPS | | 1 | 9 | 1000K|00:00:11.10 | 2131K| 21150 | | | | 00:00:00.41 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 5 | 4 | 23 | HASH JOIN | | 1 | 9 | 1000K|00:00:06.12 | 19549 | 17970 | 33M| 6589K| 65M (0)| 00:00:00.59 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 6 | 5 | 21 | HASH JOIN | | 1 | 9 | 1000K|00:00:05.53 | 19385 | 17970 | 37M| 6044K| 69M (0)| 00:00:00.57 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 7 | 6 | 19 | HASH JOIN | | 1 | 10 | 1000K|00:00:04.95 | 19221 | 17970 | 1452K| 1452K| 1002K (0)| 00:00:00.53 | 0 | 0 | | | | 0 | +++++ | 10 |
|* 8 | 7 | 1 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
| 9 | 7 | 18 | VIEW | VW_ST_84A34AF1 | 1 | 10 | 1000K|00:00:04.42 | 19057 | 17970 | | | | 00:00:00.18 | 0 | 0 | | | | 0 | +++++ | 10 |
| 10 | 9 | 17 | NESTED LOOPS | | 1 | 10 | 1000K|00:00:04.24 | 19057 | 17970 | | | | 00:00:00.37 | 0 | 0 | | | | 0 | +++++ | 10 |
| 11 | 10 | 15 | BITMAP CONVERSION TO ROWIDS| | 1 | 10 | 1000K|00:00:00.41 | 2107 | 1020 | | | | 00:00:00.11 | 0 | 0 | | | | 0 | +++++ | 10 |
| 12 | 11 | 14 | BITMAP AND | | 1 | | 11 |00:00:00.30 | 2107 | 1020 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 13 | 12 | 5 | BITMAP MERGE | | 1 | | 11 |00:00:00.12 | 863 | 400 | 1024K| 512K| 2804K (0)| 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 14 | 13 | 4 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.10 | 863 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 15 | 14 | 2 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 16 | 14 | 3 | BITMAP INDEX RANGE SCAN| T_FK1 | 100 | | 800 |00:00:00.10 | 699 | 400 | | | | 00:00:00.10 | 699 | 400 | | | | 0 | | |
| 17 | 12 | 9 | BITMAP MERGE | | 1 | | 11 |00:00:00.12 | 847 | 400 | 2802K| 512K| 2804K (0)| 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 18 | 17 | 8 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.10 | 847 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 19 | 18 | 6 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 20 | 18 | 7 | BITMAP INDEX RANGE SCAN| T_FK3 | 100 | | 800 |00:00:00.10 | 683 | 400 | | | | 00:00:00.10 | 683 | 400 | | | | 0 | | |
| 21 | 12 | 13 | BITMAP MERGE | | 1 | | 11 |00:00:00.06 | 397 | 220 | 1024K| 512K| 1581K (0)| 00:00:00.01 | 0 | 0 | | | | 0 | | |
| 22 | 21 | 12 | BITMAP KEY ITERATION | | 1 | | 440 |00:00:00.05 | 397 | 220 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 23 | 22 | 10 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
|* 24 | 22 | 11 | BITMAP INDEX RANGE SCAN| T_FK2 | 10 | | 440 |00:00:00.05 | 233 | 220 | | | | 00:00:00.05 | 233 | 220 | | | | 0 | | |
| 25 | 10 | 16 | TABLE ACCESS BY USER ROWID | T | 1000K| 1 | 1000K|00:00:03.46 | 16950 | 16950 | | | | 00:00:03.46 | 16950 | 16950 | @@ | | @@@@@ | 0 | | 1000K|
|* 26 | 6 | 20 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 27 | 5 | 22 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 28 | 4 | 24 | INDEX RANGE SCAN | T1_IDX | 1000K| 1 | 1000K|00:00:04.57 | 2111K| 3180 | | | | 00:00:04.57 | 2112K| 3180 | @@@ | @@@@@@@@@@@@| @ | 2 | | 1000K|
| 29 | 3 | 26 | TABLE ACCESS BY INDEX ROWID | T1 | 1000K| 1 | 1000K|00:00:09.25 | 29628 | 22648 | | | | 00:00:09.25 | 29628 | 22648 | @@@@@ | | @@@@@@ | 0 | | 1000K|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM>1)
5 - access("ITEM_1"="D"."ID")
6 - access("ITEM_3"="D"."ID")
7 - access("ITEM_2"="D"."ID")
8 - filter("IS_FLAG_D2"='Y')
15 - filter("IS_FLAG_D1"='Y')
16 - access("F"."FK1"="D"."ID")
19 - filter("IS_FLAG_D3"='Y')
20 - access("F"."FK3"="D"."ID")
23 - filter("IS_FLAG_D2"='Y')
24 - access("F"."FK2"="D"."ID")
26 - filter("IS_FLAG_D1"='Y')
27 - filter("IS_FLAG_D3"='Y')
28 - access("T1"."ID"="ITEM_4")
Note
-----
- star transformation used for this statement
68 rows selected.
This is another case where a more efficient execution plan could be found if the cardinality estimate was in the right ballpark - you can see this pretty clearly in the "TCF Graph" column. Due to the strong underestimation several bad choices have been made: Reading all rows from T by ROWID rather than performing simply a full table scan and again an index driven random access to T1 which drives up the logical I/O unnecessarily. This is a crafted example that minimizes the logical and physical I/O due to the good clustering of T1 in relation to the data returned by the driving row source - a more real-life bad clustering together with larger table sizes would have turned this into a more or less infinitely running query.
It is also an example that simply looking at E-Rows and A-Rows can be misleading: Check operations 28 and 29: A-Rows is 1000K but E-Rows is 1, so should this be worrying? Not at all if you look at the "E-Rows*Sta" column because the operation has been started 1000K times hence the estimate is spot on.
The "LIO Ratio" for operation 23 is 16 - this means it took 16 LIOs on average to generate a single row and might indicate that there are more efficient ways to generate those rows than a full table scan.
By the way, the 11g buffer pinning optimization also helped to minimize the logical I/O on the T1 table.
Here is the same query, but this time with a bad clustering of T1 - I've cancelled it after 40 seconds to give you an example of that you can use DBMS_XPLAN.DISPLAY_CURSOR without the need to run a statement for completion.
SQL> select * from (
2 select t1.id as t1_id, t1.filler, s.id as s_id from t1, (
3 select
4 f.id
5 from
6 t f
7 , (select * from d where is_flag_d1 = 'Y') d1
8 , (select * from d where is_flag_d2 = 'Y') d2
9 , (select * from d where is_flag_d3 = 'Y') d3
10 where
11 f.fk1 = d1.id
12 and f.fk2 = d2.id
13 and f.fk3 = d3.id
14 ) s
15 where t1.id = s.id
16 )
17 where rownum > 1
18 ;
select t1.id as t1_id, t1.filler, s.id as s_id from t1, (
*
ERROR at line 2:
ORA-01013: user requested cancel of current operation
Elapsed: 00:00:40.71
SQL> @xplan_extended_display_cursor
SQL_ID 5u3x96k4s5zt6, child number 0
-------------------------------------
select * from ( select t1.id as t1_id, t1.filler, s.id as s_id from t1,
( select f.id from t f , (select * from d where
is_flag_d1 = 'Y') d1 , (select * from d where is_flag_d2 = 'Y') d2
, (select * from d where is_flag_d3 = 'Y') d3 where f.fk1 =
d1.id and f.fk2 = d2.id and f.fk3 = d3.id ) s where t1.id = s.id
) where rownum > 1
Plan hash value: 42027304
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 30 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 0 | 0 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
| 1 | 0 | 29 | COUNT | | 1 | | 0 |00:00:00.01 | 0 | 0 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
|* 2 | 1 | 28 | FILTER | | 1 | | 0 |00:00:00.01 | 0 | 0 | | | | 00:00:00.00 | 0 | 0 | | | | | | |
| 3 | 2 | 27 | NESTED LOOPS | | 1 | | 102K|00:00:40.42 | 337K| 143K| | | | 00:00:00.16 | 0 | 0 | | | | 0 | | |
| 4 | 3 | 25 | NESTED LOOPS | | 1 | 9 | 102K|00:00:13.08 | 235K| 40794 | | | | 00:00:00.13 | 0 | 0 | | | | 0 | ++++ | 9 |
|* 5 | 4 | 23 | HASH JOIN | | 1 | 9 | 102K|00:00:05.70 | 19420 | 17970 | 33M| 6589K| 65M (0)| 00:00:00.40 | 0 | 0 | | | | 0 | ++++ | 9 |
|* 6 | 5 | 21 | HASH JOIN | | 1 | 9 | 1000K|00:00:05.30 | 19385 | 17970 | 37M| 6044K| 69M (0)| 00:00:00.57 | 0 | 0 | | | | 0 | +++++ | 9 |
|* 7 | 6 | 19 | HASH JOIN | | 1 | 10 | 1000K|00:00:04.73 | 19221 | 17970 | 1452K| 1452K| 1010K (0)| 00:00:00.52 | 0 | 0 | | | | 0 | +++++ | 10 |
|* 8 | 7 | 1 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
| 9 | 7 | 18 | VIEW | VW_ST_84A34AF1 | 1 | 10 | 1000K|00:00:04.21 | 19057 | 17970 | | | | 00:00:00.18 | 0 | 0 | | | | 0 | +++++ | 10 |
| 10 | 9 | 17 | NESTED LOOPS | | 1 | 10 | 1000K|00:00:04.03 | 19057 | 17970 | | | | 00:00:00.37 | 0 | 0 | | | | 0 | +++++ | 10 |
| 11 | 10 | 15 | BITMAP CONVERSION TO ROWIDS| | 1 | 10 | 1000K|00:00:00.42 | 2107 | 1020 | | | | 00:00:00.10 | 0 | 0 | | | | 0 | +++++ | 10 |
| 12 | 11 | 14 | BITMAP AND | | 1 | | 11 |00:00:00.32 | 2107 | 1020 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 13 | 12 | 5 | BITMAP MERGE | | 1 | | 11 |00:00:00.13 | 863 | 400 | 1024K| 512K| | 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 14 | 13 | 4 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.11 | 863 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 15 | 14 | 2 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 16 | 14 | 3 | BITMAP INDEX RANGE SCAN| T_FK1 | 100 | | 800 |00:00:00.11 | 699 | 400 | | | | 00:00:00.11 | 699 | 400 | | | | 0 | | |
| 17 | 12 | 9 | BITMAP MERGE | | 1 | | 11 |00:00:00.13 | 847 | 400 | 2802K| 512K| 2804K (0)| 00:00:00.02 | 0 | 0 | | | | 0 | | |
| 18 | 17 | 8 | BITMAP KEY ITERATION | | 1 | | 800 |00:00:00.11 | 847 | 400 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 19 | 18 | 6 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 20 | 18 | 7 | BITMAP INDEX RANGE SCAN| T_FK3 | 100 | | 800 |00:00:00.11 | 683 | 400 | | | | 00:00:00.11 | 683 | 400 | | | | 0 | | |
| 21 | 12 | 13 | BITMAP MERGE | | 1 | | 11 |00:00:00.06 | 397 | 220 | 1024K| 512K| | 00:00:00.01 | 0 | 0 | | | | 0 | | |
| 22 | 21 | 12 | BITMAP KEY ITERATION | | 1 | | 440 |00:00:00.05 | 397 | 220 | | | | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 23 | 22 | 10 | TABLE ACCESS FULL | D | 1 | 10 | 10 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 16 | | 10 |
|* 24 | 22 | 11 | BITMAP INDEX RANGE SCAN| T_FK2 | 10 | | 440 |00:00:00.05 | 233 | 220 | | | | 00:00:00.05 | 233 | 220 | | | | 0 | | |
| 25 | 10 | 16 | TABLE ACCESS BY USER ROWID | T | 1000K| 1 | 1000K|00:00:03.24 | 16950 | 16950 | | | | 00:00:03.24 | 16950 | 16950 | @ | @ | @ | 0 | | 1000K|
|* 26 | 6 | 20 | TABLE ACCESS FULL | D | 1 | 100 | 100 |00:00:00.01 | 164 | 0 | | | | 00:00:00.00 | 164 | 0 | | | | 1 | | 100 |
|* 27 | 5 | 22 | TABLE ACCESS FULL | D | 1 | 100 | 11 |00:00:00.01 | 35 | 0 | | | | 00:00:00.00 | 35 | 0 | | | | 3 | | 100 |
|* 28 | 4 | 24 | INDEX RANGE SCAN | T1_IDX | 102K| 1 | 102K|00:00:07.26 | 216K| 22824 | | | | 00:00:07.26 | 216K| 22824 | @@ | @@@@@@@@ | @@ | 2 | | 102K|
| 29 | 3 | 26 | TABLE ACCESS BY INDEX ROWID | T1 | 102K| 1 | 102K|00:00:27.19 | 102K| 102K| | | | 00:00:27.19 | 102K| 102K| @@@@@@@@ | @@@@ | @@@@@@@@@ | 1 | | 102K|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(ROWNUM>1)
5 - access("ITEM_1"="D"."ID")
6 - access("ITEM_3"="D"."ID")
7 - access("ITEM_2"="D"."ID")
8 - filter("IS_FLAG_D2"='Y')
15 - filter("IS_FLAG_D1"='Y')
16 - access("F"."FK1"="D"."ID")
19 - filter("IS_FLAG_D3"='Y')
20 - access("F"."FK3"="D"."ID")
23 - filter("IS_FLAG_D2"='Y')
24 - access("F"."FK2"="D"."ID")
26 - filter("IS_FLAG_D1"='Y')
27 - filter("IS_FLAG_D3"='Y')
28 - access("T1"."ID"="ITEM_4")
Note
-----
- star transformation used for this statement
68 rows selected.
The result is similar to the previous, but you can see that the increased number of physical reads on the T1 table segment slowed down the execution significantly.
Here is an example of Parallel Execution. Note that I strongly recommend the Real-Time SQL Monitoring feature if you have to deal a lot with Parallel Execution, because it is offering much more insight and information than DBMS_XPLAN.DISPLAY_CURSOR.
SQL> alter session set statistics_level = all;
Session altered.
Elapsed: 00:00:00.01
SQL>
SQL> select /*+ parallel(t1, 4) */ count(*) from t1;
COUNT(*)
----------
1000000
Elapsed: 00:00:01.59
SQL>
SQL> @xplan_extended_display_cursor "" "" ALLSTATS
SQL> set echo off verify off termout off
SQL_ID 92661sht5tyw1, child number 0
-------------------------------------
select /*+ parallel(t1, 4) */ count(*) from t1
Plan hash value: 3110199320
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 7 | SELECT STATEMENT | | 1 | | 1 |00:00:01.53 | 7 | 2 | 00:00:00.00 | 0 | 0 | | | | 0 | | |
| 1 | 0 | 6 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:01.53 | 7 | 2 | 00:00:00.00 | 0 | 0 | | | | 0 | | 1 |
| 2 | 1 | 5 | PX COORDINATOR | | 1 | | 4 |00:00:01.53 | 7 | 2 | 00:00:01.53 | 7 | 2 | @@@ | | | 1 | | |
| 3 | 2 | 4 | PX SEND QC (RANDOM) | :TQ10000 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | 00:00:00.00 | 0 | 0 | | | | | | 0 |
| 4 | 3 | 3 | SORT AGGREGATE | | 4 | 1 | 4 |00:00:05.84 | 15541 | 15385 | 00:00:00.14 | 0 | 3841 | | | @@@ | 0 | | 4 |
| 5 | 4 | 2 | PX BLOCK ITERATOR | | 4 | 1000K| 1000K|00:00:05.71 | 15541 | 11544 | 00:00:00.27 | 0 | 0 | @ | | | 0 | | 4000K|
|* 6 | 5 | 1 | TABLE ACCESS FULL| T1 | 52 | 1000K| 1000K|00:00:05.44 | 15541 | 15385 | 00:00:05.44 | 15541 | 15385 | @@@@@@@@@@@ | @@@@@@@@@@@@| @@@@@@@@@@@@| 0 | - | 52M|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access(:Z>=:Z AND :Z<=:Z)
23 rows selected.
The important points to consider when dealing with Parallel Execution are:
- You need to use ALLSTATS instead of ALLSTATS LAST in order to get a meaningful output. ALLSTATS LAST would only show the activity of the Query Coordinator. However ALLSTATS means that it shows the statistics cumulative for all executions so far, so if the query has been executed multiple times this will show you not only the statistics of the last execution. If you want to ensure to see only the statistics for the last execution you need to create a new cursor, for example by adding a simple comment that makes the SQL statement text unique
- You can see in the output that the "TCF Graph" and "E-Rows*Sta" columns can be misleading for Parallel Execution - the full table scan has been divided into 52 chunks executed by four parallel slaves, hence the Starts column shows 52 but the cardinality estimate of 1000K rows was spot on instead of wrong by factor 52
- The elapsed time information for the parts executed in parallel is not the wall clock time but the accumulated time spent by all parallel slaves, hence the Graphs will be partially misleading due to the MAX value found
- In this case also the "Reads Self" column seems to indicate reads by the SORT AGGREGATE operation - this looks questionable, too
The last example shows that it takes just a simple scalar subquery to make the output misleading again - so be aware that are some exceptions (like scalar / early filter subqueries, certain Parallel Execution plans etc.) to the rules how to interpret execution plans and usually any automated interpretation of such plans is therefore mislead:
SQL> select count(id) from (select (select id from t1 t1_i where t1_i.id = t1.id) as id from t1);
COUNT(ID)
----------
1000000
Elapsed: 00:00:17.50
SQL>
SQL> @xplan_extended_display_cursor
SQL> set echo off verify off termout off
SQL_ID af2gry2z9g7vt, child number 0
-------------------------------------
select count(id) from (select (select id from t1 t1_i where t1_i.id =
t1.id) as id from t1)
Plan hash value: 1144741071
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Pid | Ord | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | A-Time Self |Bufs Self |Reads Self|A-Ti S-Graph |Bufs S-Graph |Reads S-Graph|LIO Ratio |TCF Graph |E-Rows*Sta|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | | 4 | SELECT STATEMENT | | 1 | | 1 |00:00:17.51 | 1551K| 17617 | 00:00:00.00 | 0 | 0 | | | | 0 | | |
|* 1 | 0 | 1 | INDEX RANGE SCAN | T1_IDX | 1000K| 1 | 1000K|00:00:09.36 | 1536K| 2231 | 00:00:09.36 | 1536K| 2231 | @@@@@@ | @@@@@@@@@@@@| @@ | 1 | | 1000K|
| 2 | 0 | 3 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:17.51 | 1551K| 17617 | 00:00:15.76 | 1536K| 2231 | @@@@@@@@@@@ | @@@@@@@@@@@@| @@ | 1536485 | | 1 |
| 3 | 2 | 2 | TABLE ACCESS FULL| T1 | 1 | 1000K| 1000K|00:00:01.74 | 15390 | 15386 | 00:00:01.74 | 15390 | 15386 | @ | | @@@@@@@@@@ | 0 | | 1000K|
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1_I"."ID"=:B1)
The scalar subquery is shown as child operation to the root node (or in 10g with the missing ID = 0 operation in V$SQL_PLAN_STATISTICS(_ALL) as independent operation with no parent at all) and according to the usual rules therefore would be executed first (see the "Ord" column), but this is not true - the execution starts with the first leaf of the main branch of the plan (the full table scan of T1).
Note that not only the "Ord" column is wrong, also the fact that the SORT AGGREGATE operation includes the work performed by the scalar subquery is not interpreted correctly by the remaining logic calculating the operation self statistics.
The Script
Below you can find the current version of the script. If you're too lazy to copy&paste (and because I don't have a fancy "copy to clipboard" button) you can also download the script from here.
Of course I'm interested in feedback. This prototype is not tested much yet, so expect glitches and problems. If you get back to me with reproducible cases I'll try to address them and publish updated versions of the script.
A final note: This tool comes for free but with no warranties at all. Use at your own risk.
Happy rowsource profiling (and holiday season)!
set echo off verify off termout off
set doc off
doc
-- ----------------------------------------------------------------------------------------------
--
-- Script: xplan_extended_display_cursor.sql
--
-- Version: 0.9
-- December 2011
--
-- Author: Randolf Geist
-- oracle-randolf.blogspot.com
--
-- Description: A free-standing SQL wrapper over DBMS_XPLAN. Provides access to the
-- DBMS_XPLAN.DISPLAY_CURSOR pipelined function for a given SQL_ID and CHILD_NUMBER
--
-- This is a prototype for an extended analysis of the data provided by the
-- Runtime Profile (aka. Rowsource Statistics enabled via
-- SQL_TRACE = TRUE, STATISTICS_LEVEL = ALL or GATHER_PLAN_STATISTICS hint)
-- and reported via the ALLSTATS/MEMSTATS/IOSTATS formatting option of
-- DBMS_XPLAN.DISPLAY_CURSOR
--
-- Versions: This utility will work for all versions of 10g and upwards.
--
-- Required: The same access as DBMS_XPLAN.DISPLAY_CURSOR requires. See the documentation
-- of DISPLAY_CURSOR for your Oracle version for more information
--
-- The script directly queries
-- 1) V$SESSION
-- 2) V$SQL_PLAN_STATISTICS_ALL
--
-- Credits: Based on the original XPLAN implementation by Adrian Billington (http://www.oracle-developer.net/utilities.php
-- resp. http://www.oracle-developer.net/content/utilities/xplan.zip)
-- and inspired by Kyle Hailey's TCF query (http://dboptimizer.com/2011/09/20/display_cursor/)
--
-- Features: In addition to the PID (The PARENT_ID) and ORD (The order of execution, note that this doesn't account for the special cases so it might be wrong)
-- columns added by Adrian's wrapper the following additional columns over ALLSTATS are provided:
--
-- A_TIME_SELF : The time taken by the operation itself - this is the operation's cumulative time minus the direct descendants operation's cumulative time
-- LIO_SELF : The LIOs done by the operation itself - this is the operation's cumulative LIOs minus the direct descendants operation's cumulative LIOs
-- READS_SELF : The reads performed the operation itself - this is the operation's cumulative reads minus the direct descendants operation's cumulative reads
-- WRITES_SELF : The writes performed the operation itself - this is the operation's cumulative writes minus the direct descendants operation's cumulative writes
-- A_TIME_SELF_GRAPH : A graphical representation of A_TIME_SELF relative to the total A_TIME
-- LIO_SELF_GRAPH : A graphical representation of LIO_SELF relative to the total LIO
-- READS_SELF_GRAPH : A graphical representation of READS_SELF relative to the total READS
-- WRITES_SELF_GRAPH : A graphical representation of WRITES_SELF relative to the total WRITES
-- LIO_RATIO : Ratio of LIOs per row generated by the row source - the higher this ratio the more likely there could be a more efficient way to generate those rows (be aware of aggregation steps though)
-- TCF_GRAPH : Each "+"/"-" sign represents one order of magnitude based on ratio between E_ROWS_TIMES_START and A-ROWS. Note that this will be misleading with Parallel Execution (see E_ROWS_TIMES_START)
-- E_ROWS_TIMES_START : The E_ROWS multiplied by STARTS - this is useful for understanding the actual cardinality estimate for related combine child operations getting executed multiple times. Note that this will be misleading with Parallel Execution
--
-- More information including demos can be found online at http://oracle-randolf.blogspot.com/2011/12/extended-displaycursor-with-rowsource.html
--
-- Usage: @xplan_extended_display_cursor.sql [sql_id] [cursor_child_number] [format_option]
--
-- If both the SQL_ID and CHILD_NUMBER are omitted the previously executed SQL_ID and CHILD_NUMBER of the session will be used
-- If the SQL_ID is specified but the CHILD_NUMBER is omitted then CHILD_NUMBER 0 is assumed
--
-- This prototype does not support processing multiple child cursors like DISPLAY_CURSOR is capable of
-- when passing NULL as CHILD_NUMBER to DISPLAY_CURSOR. Hence a CHILD_NUMBER is mandatory, either
-- implicitly generated (see above) or explicitly passed
--
-- The default formatting option for the call to DBMS_XPLAN.DISPLAY_CURSOR is ALLSTATS LAST - extending this output is the primary purpose of this script
--
-- Note: You need a veeery wide terminal setting for this prototype, something like linesize 400 should suffice
--
-- This tool is free but comes with no warranty at all - use at your own risk
--
#
col plan_table_output format a400
set linesize 400 pagesize 0 tab off
/* ALLSTATS LAST is assumed as the default formatting option for DBMS_XPLAN.DISPLAY_CURSOR */
define default_fo = "ALLSTATS LAST"
column prev_sql_id new_value prev_sql_id
column prev_child_number new_value prev_cn
/* Get the previous command as default
if no SQL_ID / CHILD_NUMBER is passed */
select
prev_sql_id
, prev_child_number
from
v$session
where
sid = userenv('sid')
;
-- The following is a hack to use default
-- values for defines
column 1 new_value 1
column 2 new_value 2
column 3 new_value 3
select
'' as "1"
, '' as "2"
, '' as "3"
from
dual
where
rownum = 0;
column si new_value si
column cn new_value cn
column fo new_value fo
/* Use passed parameters else refer to previous SQL_ID / CHILD_NUMBER
ALLSTATS LAST is default formatting option */
select
nvl('&1', '&prev_sql_id') as si
, coalesce('&2', '&prev_cn', '0') as cn
, nvl('&3', '&default_fo') as fo
from
dual
;
column last new_value last
/* Last or all execution */
select
case
when instr('&fo', 'LAST') > 0
then 'last_'
end as last
from
dual
;
set termout on
with
-- The next three queries are based on the original XPLAN wrapper by Adrian Billington
-- to determine the PID and ORD information, only slightly modified to deal with
-- the 10g special case that V$SQL_PLAN_STATISTICS_ALL doesn't include the ID = 0 operation
-- and starts with 1 instead for Rowsource Statistics
sql_plan_data as
(
select
id
, parent_id
from
v$sql_plan_statistics_all
where
sql_id = '&si'
and child_number = &cn
),
hierarchy_data as
(
select
id
, parent_id
from
sql_plan_data
start with
id in
(
select
id
from
sql_plan_data p1
where
not exists
(
select
null
from
sql_plan_data p2
where
p2.id = p1.parent_id
)
)
connect by
prior id = parent_id
order siblings by
id desc
),
ordered_hierarchy_data as
(
select
id
, parent_id as pid
, row_number() over (order by rownum desc) as oid
, max(id) over () as maxid
, min(id) over () as minid
from
hierarchy_data
),
-- The following query uses the MAX values
-- rather than taking the values of PLAN OPERATION_ID = 0 (or 1 for 10g V$SQL_PLAN_STATISTICS_ALL)
-- for determining the grand totals
--
-- This is because queries that get cancelled do not
-- necessarily have yet sensible values in the root plan operation
--
-- Furthermore with Parallel Execution the elapsed time accumulated
-- with the ALLSTATS option for operations performed in parallel
-- will be greater than the wallclock elapsed time shown for the Query Coordinator
--
-- Note that if you use GATHER_PLAN_STATISTICS with the default
-- row sampling frequency the (LAST_)ELAPSED_TIME will be very likely
-- wrong and hence the time-based graphs and self-statistics will be misleading
--
-- Similar things might happen when cancelling queries
--
-- For queries running with STATISTICS_LEVEL = ALL (or sample frequency set to 1)
-- the A-TIME is pretty reliable
totals as
(
select
max(&last.cu_buffer_gets + &last.cr_buffer_gets) as total_lio
, max(&last.elapsed_time) as total_elapsed
, max(&last.disk_reads) as total_reads
, max(&last.disk_writes) as total_writes
from
v$sql_plan_statistics_all
where
sql_id = '&si'
and child_number = &cn
),
-- The totals for the direct descendants of an operation
-- These are required for calculating the work performed
-- by a (parent) operation itself
-- Basically this is the SUM grouped by PARENT_ID
direct_desc_totals as
(
select
sum(&last.cu_buffer_gets + &last.cr_buffer_gets) as lio
, sum(&last.elapsed_time) as elapsed
, sum(&last.disk_reads) as reads
, sum(&last.disk_writes) as writes
, parent_id
from
v$sql_plan_statistics_all
where
sql_id = '&si'
and child_number = &cn
group by
parent_id
),
-- Putting the three together
-- The statistics, direct descendant totals plus totals
extended_stats as
(
select
stats.id
, stats.parent_id
, stats.&last.elapsed_time as elapsed
, (stats.&last.cu_buffer_gets + stats.&last.cr_buffer_gets) as lio
, stats.&last.starts as starts
, stats.&last.output_rows as a_rows
, stats.cardinality as e_rows
, stats.&last.disk_reads as reads
, stats.&last.disk_writes as writes
, ddt.elapsed as ddt_elapsed
, ddt.lio as ddt_lio
, ddt.reads as ddt_reads
, ddt.writes as ddt_writes
, t.total_elapsed
, t.total_lio
, t.total_reads
, t.total_writes
from
v$sql_plan_statistics_all stats
, direct_desc_totals ddt
, totals t
where
stats.sql_id='&si'
and stats.child_number = &cn
and ddt.parent_id (+) = stats.id
),
-- Further information derived from above
derived_stats as
(
select
id
, greatest(elapsed - nvl(ddt_elapsed , 0), 0) as elapsed_self
, greatest(lio - nvl(ddt_lio, 0), 0) as lio_self
, trunc((greatest(lio - nvl(ddt_lio, 0), 0)) / nullif(a_rows, 0)) as lio_ratio
, greatest(reads - nvl(ddt_reads, 0), 0) as reads_self
, greatest(writes - nvl(ddt_writes,0) ,0) as writes_self
, total_elapsed
, total_lio
, total_reads
, total_writes
, trunc(log(10, nullif(starts * e_rows / nullif(a_rows, 0), 0))) as tcf_ratio
, starts * e_rows as e_rows_times_start
from
extended_stats
),
/* Format the data as required */
formatted_data1 as
(
select
id
, lio_ratio
, total_elapsed
, total_lio
, total_reads
, total_writes
, to_char(numtodsinterval(round(elapsed_self / 10000) * 10000 / 1000000, 'SECOND')) as e_time_interval
/* Imitate the DBMS_XPLAN number formatting */
, case
when lio_self >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when lio_self >= 10000000000000000000 then to_char(lio_self/1000000000000000000, 'FM99999') || 'E'
when lio_self >= 10000000000000000 then to_char(lio_self/1000000000000000, 'FM99999') || 'P'
when lio_self >= 10000000000000 then to_char(lio_self/1000000000000, 'FM99999') || 'T'
when lio_self >= 10000000000 then to_char(lio_self/1000000000, 'FM99999') || 'G'
when lio_self >= 10000000 then to_char(lio_self/1000000, 'FM99999') || 'M'
when lio_self >= 100000 then to_char(lio_self/1000, 'FM99999') || 'K'
else to_char(lio_self, 'FM99999') || ' '
end as lio_self_format
, case
when reads_self >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when reads_self >= 10000000000000000000 then to_char(reads_self/1000000000000000000, 'FM99999') || 'E'
when reads_self >= 10000000000000000 then to_char(reads_self/1000000000000000, 'FM99999') || 'P'
when reads_self >= 10000000000000 then to_char(reads_self/1000000000000, 'FM99999') || 'T'
when reads_self >= 10000000000 then to_char(reads_self/1000000000, 'FM99999') || 'G'
when reads_self >= 10000000 then to_char(reads_self/1000000, 'FM99999') || 'M'
when reads_self >= 100000 then to_char(reads_self/1000, 'FM99999') || 'K'
else to_char(reads_self, 'FM99999') || ' '
end as reads_self_format
, case
when writes_self >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when writes_self >= 10000000000000000000 then to_char(writes_self/1000000000000000000, 'FM99999') || 'E'
when writes_self >= 10000000000000000 then to_char(writes_self/1000000000000000, 'FM99999') || 'P'
when writes_self >= 10000000000000 then to_char(writes_self/1000000000000, 'FM99999') || 'T'
when writes_self >= 10000000000 then to_char(writes_self/1000000000, 'FM99999') || 'G'
when writes_self >= 10000000 then to_char(writes_self/1000000, 'FM99999') || 'M'
when writes_self >= 100000 then to_char(writes_self/1000, 'FM99999') || 'K'
else to_char(writes_self, 'FM99999') || ' '
end as writes_self_format
, case
when e_rows_times_start >= 18000000000000000000 then to_char(18000000000000000000/1000000000000000000, 'FM99999') || 'E'
when e_rows_times_start >= 10000000000000000000 then to_char(e_rows_times_start/1000000000000000000, 'FM99999') || 'E'
when e_rows_times_start >= 10000000000000000 then to_char(e_rows_times_start/1000000000000000, 'FM99999') || 'P'
when e_rows_times_start >= 10000000000000 then to_char(e_rows_times_start/1000000000000, 'FM99999') || 'T'
when e_rows_times_start >= 10000000000 then to_char(e_rows_times_start/1000000000, 'FM99999') || 'G'
when e_rows_times_start >= 10000000 then to_char(e_rows_times_start/1000000, 'FM99999') || 'M'
when e_rows_times_start >= 100000 then to_char(e_rows_times_start/1000, 'FM99999') || 'K'
else to_char(e_rows_times_start, 'FM99999') || ' '
end as e_rows_times_start_format
, rpad(' ', nvl(round(elapsed_self / nullif(total_elapsed, 0) * 12), 0) + 1, '@') as elapsed_self_graph
, rpad(' ', nvl(round(lio_self / nullif(total_lio, 0) * 12), 0) + 1, '@') as lio_self_graph
, rpad(' ', nvl(round(reads_self / nullif(total_reads, 0) * 12), 0) + 1, '@') as reads_self_graph
, rpad(' ', nvl(round(writes_self / nullif(total_writes, 0) * 12), 0) + 1, '@') as writes_self_graph
, ' ' ||
case
when tcf_ratio > 0
then rpad('-', tcf_ratio, '-')
else rpad('+', tcf_ratio * -1, '+')
end as tcf_graph
from
derived_stats
),
/* The final formatted data */
formatted_data as
(
select
/*+ Convert the INTERVAL representation to the A-TIME representation used by DBMS_XPLAN
by turning the days into hours */
to_char(to_number(substr(e_time_interval, 2, 9)) * 24 + to_number(substr(e_time_interval, 12, 2)), 'FM900') ||
substr(e_time_interval, 14, 9)
as a_time_self
, a.*
from
formatted_data1 a
),
/* Combine the information with the original DBMS_XPLAN output */
xplan_data as (
select
x.plan_table_output
, o.id
, o.pid
, o.oid
, o.maxid
, o.minid
, a.a_time_self
, a.lio_self_format
, a.reads_self_format
, a.writes_self_format
, a.elapsed_self_graph
, a.lio_self_graph
, a.reads_self_graph
, a.writes_self_graph
, a.lio_ratio
, a.tcf_graph
, a.total_elapsed
, a.total_lio
, a.total_reads
, a.total_writes
, a.e_rows_times_start_format
, x.rn
from
(
select /* Take advantage of 11g table function dynamic sampling */
/*+ dynamic_sampling(dc, 2) */
/* This ROWNUM determines the order of the output/processing */
rownum as rn
, plan_table_output
from
table(dbms_xplan.display_cursor('&si',&cn, '&fo')) dc
) x
, ordered_hierarchy_data o
, formatted_data a
where
o.id (+) = case
when regexp_like(x.plan_table_output, '^\|[\* 0-9]+\|')
then to_number(regexp_substr(x.plan_table_output, '[0-9]+'))
end
and a.id (+) = case
when regexp_like(x.plan_table_output, '^\|[\* 0-9]+\|')
then to_number(regexp_substr(x.plan_table_output, '[0-9]+'))
end
)
/* Inject the additional data into the original DBMS_XPLAN output
by using the MODEL clause */
select
plan_table_output
from
xplan_data
model
dimension by (rn as r)
measures
(
cast(plan_table_output as varchar2(4000)) as plan_table_output
, id
, maxid
, minid
, pid
, oid
, a_time_self
, lio_self_format
, reads_self_format
, writes_self_format
, e_rows_times_start_format
, elapsed_self_graph
, lio_self_graph
, reads_self_graph
, writes_self_graph
, lio_ratio
, tcf_graph
, total_elapsed
, total_lio
, total_reads
, total_writes
, greatest(max(length(maxid)) over () + 3, 6) as csize
, cast(null as varchar2(128)) as inject
, cast(null as varchar2(4000)) as inject2
)
rules sequential order
(
/* Prepare the injection of the OID / PID info */
inject[r] = case
/* MINID/MAXID are the same for all rows
so it doesn't really matter
which offset we refer to */
when id[cv(r)+1] = minid[cv(r)+1]
or id[cv(r)+3] = minid[cv(r)+3]
or id[cv(r)-1] = maxid[cv(r)-1]
then rpad('-', csize[cv()]*2, '-')
when id[cv(r)+2] = minid[cv(r)+2]
then '|' || lpad('Pid |', csize[cv()]) || lpad('Ord |', csize[cv()])
when id[cv()] is not null
then '|' || lpad(pid[cv()] || ' |', csize[cv()]) || lpad(oid[cv()] || ' |', csize[cv()])
end
/* Prepare the injection of the remaining info */
, inject2[r] = case
when id[cv(r)+1] = minid[cv(r)+1]
or id[cv(r)+3] = minid[cv(r)+3]
or id[cv(r)-1] = maxid[cv(r)-1]
then rpad('-',
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
14 else 0 end /* A_TIME_SELF */ +
case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) > 0 then
11 else 0 end /* LIO_SELF */ +
case when coalesce(total_reads[cv(r)+1], total_reads[cv(r)+3], total_reads[cv(r)-1]) > 0 then
11 else 0 end /* READS_SELF */ +
case when coalesce(total_writes[cv(r)+1], total_writes[cv(r)+3], total_writes[cv(r)-1]) > 0 then
11 else 0 end /* WRITES_SELF */ +
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
14 else 0 end /* A_TIME_SELF_GRAPH */ +
case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) > 0 then
14 else 0 end /* LIO_SELF_GRAPH */ +
case when coalesce(total_reads[cv(r)+1], total_reads[cv(r)+3], total_reads[cv(r)-1]) > 0 then
14 else 0 end /* READS_SELF_GRAPH */ +
case when coalesce(total_writes[cv(r)+1], total_writes[cv(r)+3], total_writes[cv(r)-1]) > 0 then
14 else 0 end /* WRITES_SELF_GRAPH */ +
case when coalesce(total_lio[cv(r)+1], total_lio[cv(r)+3], total_lio[cv(r)-1]) > 0 then
11 else 0 end /* LIO_RATIO */ +
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
11 else 0 end /* TCF_GRAPH */ +
case when coalesce(total_elapsed[cv(r)+1], total_elapsed[cv(r)+3], total_elapsed[cv(r)-1]) > 0 then
11 else 0 end /* E_ROWS_TIMES_START */
, '-')
when id[cv(r)+2] = minid[cv(r)+2]
then case when total_elapsed[cv(r)+2] > 0 then
lpad('A-Time Self |' , 14) end ||
case when total_lio[cv(r)+2] > 0 then
lpad('Bufs Self |' , 11) end ||
case when total_reads[cv(r)+2] > 0 then
lpad('Reads Self|' , 11) end ||
case when total_writes[cv(r)+2] > 0 then
lpad('Write Self|' , 11) end ||
case when total_elapsed[cv(r)+2] > 0 then
lpad('A-Ti S-Graph |', 14) end ||
case when total_lio[cv(r)+2] > 0 then
lpad('Bufs S-Graph |', 14) end ||
case when total_reads[cv(r)+2] > 0 then
lpad('Reads S-Graph|', 14) end ||
case when total_writes[cv(r)+2] > 0 then
lpad('Write S-Graph|', 14) end ||
case when total_lio[cv(r)+2] > 0 then
lpad('LIO Ratio |' , 11) end ||
case when total_elapsed[cv(r)+2] > 0 then
lpad('TCF Graph |' , 11) end ||
case when total_elapsed[cv(r)+2] > 0 then
lpad('E-Rows*Sta|' , 11) end
when id[cv()] is not null
then case when total_elapsed[cv()] > 0 then
lpad(a_time_self[cv()] || ' |', 14) end ||
case when total_lio[cv()] > 0 then
lpad(lio_self_format[cv()] || '|', 11) end ||
case when total_reads[cv()] > 0 then
lpad(reads_self_format[cv()] || '|', 11) end ||
case when total_writes[cv()] > 0 then
lpad(writes_self_format[cv()] || '|', 11) end ||
case when total_elapsed[cv()] > 0 then
rpad(elapsed_self_graph[cv()], 13) || '|' end ||
case when total_lio[cv()] > 0 then
rpad(lio_self_graph[cv()], 13) || '|' end ||
case when total_reads[cv()] > 0 then
rpad(reads_self_graph[cv()], 13) || '|' end ||
case when total_writes[cv()] > 0 then
rpad(writes_self_graph[cv()], 13) || '|' end ||
case when total_lio[cv()] > 0 then
lpad(lio_ratio[cv()] || ' |', 11) end ||
case when total_elapsed[cv()] > 0 then
rpad(tcf_graph[cv()], 9) || ' |' end ||
case when total_elapsed[cv()] > 0 then
lpad(e_rows_times_start_format[cv()] || '|', 11) end
end
/* Putting it all together */
, plan_table_output[r] = case
when inject[cv()] like '---%'
then inject[cv()] || plan_table_output[cv()] || inject2[cv()]
when inject[cv()] is present
then regexp_replace(plan_table_output[cv()], '\|', inject[cv()], 1, 2) || inject2[cv()]
else plan_table_output[cv()]
end
)
order by
r
;
undefine default_fo
undefine prev_sql_id
undefine prev_cn
undefine last
undefine si
undefine cn
undefine fo
undefine 1
undefine 2
undefine 3
col plan_table_output clear
col prev_sql_id clear
col prev_child_number clear
col si clear
col cn clear
col fo clear
col last clear
Oracle Core: Essential Internals for DBAs and Developers book by Jonathan Lewis
In case you didn’t know, Jonathan Lewis’es new Oracle Core: Essential Internals for DBAs and Developers book is out (for a few weeks already).
I was the technical reviewer for that book and I can say it’s awesome! It will likely be the best Oracle internals book out there for the coming 10 years, just like Steve Adams’es Oracle Internal Services book was in the last decade :)
Jonathan does a very good job explaining complex things in a simple enough way – and the book is not just dry listing of how things work inside Oracle database, but also why they work like they do and what are the benefits, limitations and side effects of the behavior.
So check out Jonathan’s book page for more details (and I think you can still order a heavily discounted ebook from Apress too):
Enjoy :)
Why was the RULE hint ignored?
Recently I got a call from a customer that had just upgraded to Oracle Database 11.2.0.2 and was panicking because they thought the Rule Based Optimizer (RBO) had been removed. The reason they thought this was because when they tried to add the RULE hint to a SQL statement they still got a Cost Based Optimizer (CBO) plan. I quickly reassured them that the RBO had not been removed in 11.2.0.2 and asked them to describe their scenario. It turned out that the statement they added the RULE hint to, already had several hints specified in it. The RULE mode is ignored if the statement contains hints other than the RULE hint itself, hence their cost based plan.
Although the RBO was depreciated in Oracle Database 10g, the RULE mode and hint are still honored assuming the query can actually use the RBO. The following object / query properties force the CBO to be used even when the RULE mode / hint is specified:
- Other hints are specified in the SQL statement
- One or more partitioned tables are accessed in the SQL statement
- One or more IOTs are accessed in the SQL statement
- One or more Materialized views are accessed in the SQL statement
- A SAMPLE clauses is specified in a SELECT statement
- A spreadsheet clause is specified
- Parallel execution is used
- Grouping sets are used
- Group outer-join is used
- A create table with a parallel clause
- A left outer join (ANSI) is specified
- A full outer join (ANSI) is specified
- Flashback cursor (AS OF) is used
You should also be aware that if you use the RULE hint you severely limit the Optimizer options. It will eliminate all cost-based transformation (including star transformation), and prevents certain access paths and join methods from being considered. For example,
- Function based indexes
- Bitmap indexes
- Reverse key indexes
- Index skip scans
- Hash Joins
Although hints are some time necessary to work around a suboptimal execution plan, it is far wiser to use a specific hint then to try and revert back to RBO via the RULE hint. Ideally you should take advantage of SQL Plan Management (SPM) in these cases, rather than adding the hint directly to the SQL Statement. Using SPM you can associate a hinted plan with a non-hinted SQL statement.
