Funny how easy it is to get side-tracked. Well not really side-tracked, but just more and more involved in one particular thing, so that you end up burrowing so deeply into it that it loses all relevance to the real world. Just as a simple SQL statement generates deeper and deeper levels of more and more obscure recursive statements, so it can be with the most simple of tasks.
Those of you still awake may recall that I was looking at materialized view refresh, and in the course of getting to grips with the optimum refresh methodology for fast refreshing multiple MVs I've been picking the bones out of some lovely warm trace files. My guidance has come from two books that are probably familiar to many readers, and which have become indispensable to me: Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning; Shee, Deshpande and Gopalakrishnan; Oracle Press, and Optimizing Oracle Performance; Millsap and Holt; O'Reilly. And let me first say that they are a beautiful complement to each other. While the former goes into just the right level of detail on how to understand and mitigate the wait events themselves, the latter presents a methodology for data capture and event interpretation backed up with a strong background in queueing theory.
Now the usual challenge is to translate an approach aimed at OLTP databases into something that we can use in data warehousing, and fortunately this is not too tricky in the case of wait event capture and analysis. One of the major challenges associated with wait event analysis on OLTP databases is simply in capturing the correct data in the first place (a topic addressed in depth by the Millsap/Holt book) but this is relatively easy in "the other world": a problematic query that we are interested in tuning is generally well-known and easily isolated, whether it is part of the ETL routine or part of a report.
Issues relating to use of shared servers, which can scatter a single session's wait events among multiple trace files practically at random, are also probably rare. Shared server configurations are inherently problematic in an environment where system load is dominated by fewer, longer running queries than an OLTP system would experience, and so are rarely used.
On the other hand we do have the challenge of parallel query, in which the most meaningful wait events get pushed to different trace files, one for each PQ process. Without identifying these files, capturing their data, and consolidating them into a single entity for analysis there is little that can usefully be done.
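As a hedged illustration of that identification step: the slave trace file names can usually be constructed from the OS process IDs of the PQ slaves, which v$px_process exposes. The instance name and the _pNNN_ file-naming convention below are assumptions based on the usual defaults, not something specific to any one system.

```sql
-- While the parallel query is running, list the slave processes. Their trace
-- files are typically named <instance>_p<nnn>_<spid>.trc and are written to
-- background_dump_dest (not user_dump_dest), e.g. slimdb_p000_12345.trc.
select server_name, spid, sid, serial#
from   v$px_process;
```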
And now a word about recursive SQL. Recursive SQL statements are issued by Oracle in support of the execution of the SQL statements submitted to the database. They are used to parse the query, to maintain the data dictionary (for example to add extents to table and index segments), to perform dynamic sampling, to retrieve information on segments for the optimizer to use, etc. So we might be able to deduce immediately that recursive SQL plays a very large role in the execution of data warehouse queries for a number of reasons, the foremost of which are the high parse ratio and the complexity of the objects that the optimizer has to consider.
The high parse ratio is caused by our desire for the optimizer to be sensitive to literal values in the predicates (histogram usage being particularly important in this environment), and where an OLTP system could be unscalable due to the workload imposed by a high hard parse rate, the longer duration of the data warehouse queries renders this overhead less significant. Not wholly insignificant, but not so significant that it is worth eliminating entirely.
Now there is also the issue of object complexity. I venture to suggest that in data warehousing we have a much higher probability that partitioning is being used, and also that we will be leveraging partition pruning. So despite the apparent simplicity of many queries against star schemas there are actually a disproportionate number of data and index segments for the optimizer to be aware of. There is also a high chance that we will be using query rewrite, and hence that there are a great many more objects to be considered even if they are never going to be touched by the optimized query.
Now I rather glossed over the issue of how easy it is in a data warehouse to identify the SQL for which we want to capture the wait events. However if you read much of the literature on wait event analysis you will find that this is actually a very large part of the formal methodologies generally presented. Do you want to identify the queries with the longest duration, or the most logical reads, or the most physical I/O, or what other considerations are there? When data warehousing relieves us of that issue we are left with two other major issues:
- What makes our query run for so long?
- What can we do about it?
And these are by no means trivial tasks. However I would suggest that the first of these issues really ought to be more trivial than it currently is. Consider the steps we might follow in a wait event analysis for a hypothetical data warehouse query.
- Extract the query, along with representative predicate literals
- Execute the query with the 10046 event set at level 8 ("8" rhymes with "wait", which is how I remember that level 8 gives us wait events and level 4 gives us binds :) … and level 12 gives us both of course).
- Find the trace files (parallel query giving us multiple files, of course)
- Run tkprof on the trace files to turn them into something readable
- Coalesce the results
- Analyze what events are causing us pain (generally I/O-related ones and sometimes high CPU usage, I find).
- For I/O-related events, examine what objects they are occurring on, how many there are, their average duration, whether there are I/O hot spots on our devices, etc.
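The tracing step above is just the usual extended SQL trace setting, which for completeness looks like this:

```sql
-- Level 8 gives wait events; level 4 gives binds; level 12 gives both.
alter session set events '10046 trace name context forever, level 8';

-- ... execute the query under test here ...

alter session set events '10046 trace name context off';
-- The trace file appears in the directory pointed to by user_dump_dest.
```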
For me the tricky part starts at step 4: I can extract queries, set events, find trace files without much problem, but the rest leaves me a little cold.
Let's start with tkprof. A fine program in general, but remembering the options for sorting the queries is beyond me, and having listed my major wait events I then have to go probing through the raw file again, plus querying the data dictionary, to find out what segment and file the events occurred on. And remember that we may have upwards of a dozen files to contend with. Unless I'm missing out on something, that means a dozen executions of tkprof and then a manual process to merge and examine all those raw files anyway.
Merging, sorting and reporting: these are the bread and butter tasks of a database, not of a command-line program. So while staring at the ceiling in the dark one night I had the following thoughts:
- "Load the raw files into the database then report on them from there."
- "That means I can link the I/O-related wait events directly to the data dictionary to get object and file names automatically".
- "That means we'd have a history of the the exact wait events and the real execution plan associated with a query within the database itself"
- "That's so obvious that it must have been done before."
- "Even if it has, I bet I could do it easily and customize it for my own purposes."
So that's what I've been up to for the past couple of weeks, in the little time I have between my real work and my family, instead of going through the materialized view refresh analysis that I actually promised. But it's a step in the right direction anyway — should make that process easier.
How do you go about designing and implementing such a methodology then? Here are the aims that I started out with.
Load into database tables the significant data from multiple trace files
For me the data of interest are the wait events, and in order to put them into a usable context I also need to associate them with SQL statements (cursors, in the trace file), with the file name, and with operations such as PARSE, EXEC and FETCH. They all have to be stored in a set of dedicated tables.
Exactly which data would we load for each of these types of trace information? As much as possible, and for the most part that means all of it. We can get timings, recursive depth (very useful!), user id's, just about anything we need.
Associate significant system data with the trace file data
In our world of data warehousing we regularly add, truncate, split, merge and drop partitions, and when we do so we change the object id to which the wait event parameters refer. It seems prudent for this reason and for performance purposes (data dictionary views being notoriously cranky when it comes to joining to other data) to provide the option of storing the appropriate data at the time that the trace file is loaded.
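A sketch of the sort of snapshot I mean follows; the table name and the load-id column are purely illustrative, not the actual TRACELDR schema:

```sql
-- Capture dba_extents as it stands at load time, keyed to this load, so that
-- the file#/block# wait parameters can still be resolved to segments even
-- after later partition maintenance has recycled the object ids.
insert into trace_extent_snapshot
       (load_id, owner, segment_name, partition_name, file_id, block_id, blocks)
select :load_id, owner, segment_name, partition_name, file_id, block_id, blocks
from   dba_extents;
```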
Associate multiple trace files with a single report
This is of particular use in coalescing the results of multiple files generated by parallel query for a single submitted SQL statement. So we'd want to load multiple trace files and say "these belong together".
Provide predefined mechanisms for reporting on the events
A set of views on the database tables was going to be required, to allow simpler custom queries to be defined on the trace data. Analogous to the data dictionary views provided in the database, in fact. In just the same way as "select * from table(dbms_xplan.display())" reports on the most recently generated explain plan, we can provide a mechanism for reporting on the most recently loaded trace file(s), for example.
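For example, a "most recent load" view might look something like the following; the underlying table and column names here are assumptions for illustration, not the real schema:

```sql
-- Restrict the full wait-event table to the most recently loaded trace file,
-- in the spirit of dbms_xplan.display() reporting on the most recent plan.
create or replace view last_trace_waits as
select w.*
from   trace_waits w
where  w.file# = (select max(f.file#) from trace_files f);
```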
Provide some administrative tools for purging old data
Purge all the data, or all the data prior to some date, and some options like that.
The progress so far has been encouraging. Loading in a trace file with a known name is now a trivial task, and basic reporting on the events is a doddle. For example I'm currently using the following command to load a single file:
begin
  trace.load_file (directory   => 'UDUMP',               -- 1
                   file_name   => 'slimdb_ora_4680.trc', -- 2
                   debug       => 0,                     -- 3
                   log_extents => true,                  -- 4
                   log_files   => true);                 -- 5
end;
/
The parameters are pretty intuitive:
- The name of the directory object that points to the location of the file to be loaded.
- The name of the file to be loaded (duh).
- The level of debugging message required (spat out through DBMS_OUTPUT).
- Whether to take a snapshot of the relevant entities in the dba_extents system view, to enhance later reporting (default "false" due to relatively high overhead).
- Whether to take a snapshot of the relevant entities in the dba_data_files system view, to enhance later reporting (default "true").
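The extents snapshot exists to support the classic file-and-block lookup. For the db file sequential read and db file scattered read events, p1 is the file# and p2 the starting block#, so against the live dictionary the resolution query is the well-known:

```sql
-- Resolve a single I/O wait event to its segment. This is slow against
-- dba_extents on a heavily partitioned warehouse, which is exactly why
-- snapshotting the extent data at load time is attractive.
select owner, segment_name, partition_name, segment_type
from   dba_extents
where  file_id = :p1
and    :p2 between block_id and block_id + blocks - 1;
```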
With the data loaded I can then report in the following manner:
SQL> select * from all_trace_files;

     FILE# FILE_NAME                      READ_DATE
---------- ------------------------------ ---------
    132440 slimdb_ora_4680.trc            05-OCT-05

SQL> column SUM_WT_ELA format 999,999,990
SQL> column AVG_WT_ELA format 999,999,990
SQL> column waits format 999,990
SQL> break on report
SQL> compute sum of SUM_WT_ELA on report
SQL> select substr(WT_NAME,1,25) wt_name,
  2         sum(WT_ELA) sum_wt_ela,
  3         avg(wt_ela) avg_wt_ela,
  4         count(*) waits
  5  from   last_trace_waits
  6  group by substr(WT_NAME,1,25)
  7  /

WT_NAME                     SUM_WT_ELA   AVG_WT_ELA    WAITS
------------------------- ------------ ------------ --------
SQL*Net message from clie   27,372,317    9,124,106        3
SQL*Net message to client           14            5        3
db file scattered read       4,771,858       18,640      256
db file sequential read      1,490,384        3,644      409
                          ------------
sum                         33,634,573

SQL> select CUR_DEP,
  2         count(*)
  3  from   last_trace_cursors
  4  group by cur_dep
  5  order by 1
  6  /

   CUR_DEP   COUNT(*)
---------- ----------
         0          2
         1        391
         2       2189
         3         92
         4          2
         5          1

6 rows selected.

SQL> set pagesize 100
SQL> select substr(WT_NAME,1,23) wt_name,
  2         substr(segment_name,1,20) segment_name,
  3         sum(WT_ELA) sum_wt_ela,
  4         avg(wt_ela) avg_wt_ela,
  5         count(*) waits
  6  from   last_trace_obj_waits
  7  group by substr(WT_NAME,1,23),substr(segment_name,1,20)
  8  order by 3 desc
  9  /

WT_NAME                 SEGMENT_NAME           SUM_WT_ELA   AVG_WT_ELA    WAITS
----------------------- -------------------- ------------ ------------ --------
db file scattered read  FCT_FSA_1               4,698,204       19,908      236
db file sequential read INDSUBPART$             3,963,720        1,149    3,450
db file sequential read C_OBJ#_INTCOL#          2,355,677        4,403      535
db file sequential read C_FILE#_BLOCK#          2,129,218        1,342    1,587
db file sequential read I_OBJ2                    676,269        1,375      492
db file sequential read I_FILE#_BLOCK#            470,275        4,237      111
db file sequential read FCT_FSA_1                 417,323        5,962       70
db file sequential read TABSUBPART$               170,888        4,069       42
db file scattered read  INDCOMPART$               130,647        6,221       21
db file sequential read HIST_HEAD$                 62,906        7,863        8
db file sequential read I_INDSUBPART$              61,507        1,922       32
db file scattered read  TABCOMPART$                57,753        6,417        9
db file sequential read I_CDEF2                    34,272        4,284        8
db file sequential read SNAP$                      30,162        7,541        4
db file sequential read PARTOBJ$                   29,690        7,423        4
db file sequential read SUMAGG$                    24,404        6,101        4
db file sequential read INDCOMPART$                19,534        9,767        2
db file sequential read SUMKEY$                    19,116        4,779        4
db file sequential read DIMJOINKEY$                17,296        4,324        4
db file sequential read PARTCOL$                   15,068        3,767        4
db file sequential read TABCOMPART$                14,988        7,494        2
db file sequential read SUMDELTA$                  12,143       12,143        1
db file sequential read I_TABSUBPART$              11,146        5,573        2
db file sequential read I_CDEF3                    10,627       10,627        1
db file sequential read I_SUBCOLTYPE1              10,131       10,131        1
db file sequential read I_NTAB2                     9,696        9,696        1
db file sequential read I_CCOL1                     9,366        4,683        2
db file sequential read C_COBJ#                     9,322        9,322        1
db file sequential read SUM$                        8,722        4,361        2
db file sequential read I_OBJ#_INTCOL#              7,496        7,496        1
db file sequential read I_SUBPARTCOL$               7,059        7,059        1
db file sequential read I_DIMATTR$_2                6,601        6,601        1
db file sequential read DEPENDENCY$                 6,432        3,216        2
db file sequential read I_HH_OBJ#_INTCOL#           6,267        6,267        1
db file sequential read I_DEFSUBPART$               5,595        5,595        1
db file sequential read VIEW$                       5,231        5,231        1
db file sequential read DEFSUBPART$                 4,795        4,795        1
db file sequential read I_DEPENDENCY1               4,649        4,649        1
db file sequential read I_REFCON2                   3,544        3,544        1
db file sequential read I_PARTCOL$                  2,705        2,705        1
db file scattered read  SUM$                          742          371        2
db file sequential read I_OPQTYPE1                    195          195        1
db file sequential read SUBPARTCOL$                   138          138        1
db file sequential read I_DIMLEVELKEY$_2              136          136        1
                                             ------------
sum                                            15,541,655
The last two are pretty interesting, for a trace file based on a single star schema query, huh? Sorry about the crappy formatting though.
There's still work to do, and at the top of my list are the following:
- Allow grouping of trace files as a single logical entity
- Add more reporting views, focusing on filtering out recursive statements and on a set of "MY_%" views
- Associating recursive queries with their parents to enable reporting views based on CONNECT BY queries
- Better handling of files
- Creating a load script for the TRACELDR schema under which all this runs
- Creating a Trace Loader role to allow other users to access the package and views.
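For the recursive-parent item above, the reporting side could then be a straightforward hierarchical query; the table and column names below are assumptions about how the parent link might be stored:

```sql
-- Walk from each top-level statement (depth 0) down through the recursive
-- SQL it triggered, indenting by recursive depth.
select lpad(' ', 2 * (level - 1)) || cursor_id as cursor_tree,
       cur_dep,
       sql_text
from   trace_cursors
start with cur_dep = 0
connect by prior cursor_id = parent_cursor_id;
```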
Based on that you can probably expect to hear from me again around Christmas.
Nah, not really. Soon.