Testing a No-statistics Environment.

I am absolutely sick and tired of dealing with issues where misleading partition statistics (for example) cause ETL queries to go crazy and try to sort-merge tables with tens of millions of rows each.

Since we’re about to run a test load on production data I’m going to drop all table and index statistics and lock them. OK, actually I’m going to back the statistics up first, and then I’m going to drop and lock them.

We’ll see how we do on a 100% dynamic sampling-based system …
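
For the record, the mechanics I have in mind are roughly as follows (the schema and statistics table names are invented for the example):

    -- create a holding table for the current statistics
    exec dbms_stats.create_stat_table(ownname => 'ETL_OWNER', stattab => 'STATS_BACKUP')

    -- back up every table, index and column statistic in the schema
    exec dbms_stats.export_schema_stats(ownname => 'ETL_OWNER', stattab => 'STATS_BACKUP')

    -- drop the statistics ...
    exec dbms_stats.delete_schema_stats(ownname => 'ETL_OWNER')

    -- ... and lock them so nothing quietly regathers them behind our backs
    exec dbms_stats.lock_schema_stats(ownname => 'ETL_OWNER')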


16 thoughts on “Testing a No-statistics Environment.”

  1. I agree, big ETL with “wrong” statistics can be a real pain! On an old 9.2 DW I spent longer maintaining statistics than I did physically loading the thing!

    But if you’re using the partition statistics, doesn’t that imply you are using just one partition? If so, perhaps good old partition exchange is a possibility; trivial for an insert into an empty partition, slightly more challenging otherwise (like inserting the whole of the target partition into the exchange table before updating it and exchanging back into the partitioned table…). Sketch below.
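
    Something like this for the empty-partition case (all object names invented):

        -- load a standalone table that matches the partitioned table's structure
        insert /*+ append */ into sales_exchange
        select txn_id, txn_date, amount from staging_sales;

        -- swap the loaded table into place as the target partition
        alter table sales
          exchange partition p_20080101 with table sales_exchange
          including indexes without validation;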

  2. There are some interesting issues surrounding partitioning in this system. The system is actually completely reloaded every day, rather than adding new data on a periodic cycle, and that made sense when it was a smaller temporary solution, less sense now that it’s loading hundreds of millions of rows every day. The partitioning is arranged so that multiple updates and inserts can run against the tables at the same time, so it’s really an enabler of manual parallelism.

    It’s a bit odd, frankly. One of those systems with many issues that you’d refactor out of the ETL if the business was willing to dedicate development days to it.

  3. Pingback: Testing a No-statistics Environment: Part II « The Oracle Sponge

  4. I’ll be interested in your results. I had a quick squizz at some of the internals of dynamic sampling. On 10.2.0.3 it samples 32 blocks ( referred to as the default block size ). I’ve yet to find the time, though, to investigate whether this is the same for all dynamic sampling settings, or to confirm via 10046 how the block sampling occurs ( multi block file I/O? ); there’s a quick tracing sketch at the end of this comment.

    Regardless, I’m interested to see whether this works well in an ETL environment. I’d expect that this could be a big performance hit, if for example your ETL uses literals in SQL ( it might do this to make use of histograms for example ), and therefore has to parse lots.

    Look forward to the follow up post.

    Mat.
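
    P.S. For anyone wanting to check the I/O pattern themselves, something along these lines should show it ( table name invented ):

        -- extended SQL trace with wait events for the session
        alter session set events '10046 trace name context forever, level 8';

        -- a statement against an unanalysed table, so dynamic sampling fires at parse time
        select /*+ dynamic_sampling(t 2) */ count(*) from some_unanalysed_table t;

        alter session set events '10046 trace name context off';

        -- then look in the trace file for 'db file scattered read' (multiblock)
        -- versus 'db file sequential read' (single block) waits during the recursive sampling query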

  5. I have found in the past that level 4 seems to be a pretty good choice: documentation on the different levels in 10gR2 is here http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i43032

    According to documented behaviour we’d be sampling 64 blocks for unanalyzed tables at level 4, which ought to take on the order of two tenths of a second. Based on that I’m very willing to go higher, to level 5 or 6, as the overwhelming majority of ETL queries take multiple tens of seconds to execute (see the snippet at the end of this comment). I doubt that it’s multiblock I/O; in fact I hope it isn’t, as that would call the randomness into question on large tables, particularly date-partitioned ones.

    I guess that those defaults are really estimates: the sampling is implemented with a SAMPLE clause, which I think is not really deterministic. I wouldn’t be surprised if in fact the 64 blocks were in some cases 58 and in others 69, though I haven’t tested that.
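
    To be concrete about the levels, I mean something like this, at either the instance or the session level (the alter system form assumes an spfile):

        -- raise the default dynamic sampling level instance-wide ...
        alter system set optimizer_dynamic_sampling = 4 scope = both;

        -- ... or just for the ETL session
        alter session set optimizer_dynamic_sampling = 6;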

  6. I have the same issue. We partition by month. When the first-of-the-month load happens, Oracle thinks there are “0” rows in the table. After it is loaded, when the batch process kicks off, it does a nested loop join whereas it used to do a hash join the previous month. The hash join with hundreds of millions of rows finishes in an hour or so; the nested loop join takes 19 hours! I tried deleting and locking stats on the partition and it didn’t work. I tried importing stats from another partition, and it didn’t work. I tried setting the partition stats by numrows and numblocks and it didn’t work. I tried increasing the high value, and it didn’t work. I am just going to force the ETL developer to gather partition stats on the first of the month after he loads the table. I don’t know what else to try.
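
    For reference, the kind of thing I tried when setting the partition stats by hand (all names and numbers invented):

        begin
          dbms_stats.set_table_stats(
            ownname  => 'DW_OWNER',
            tabname  => 'FACT_SALES',
            partname => 'P_200805',
            numrows  => 200000000,
            numblks  => 2500000);
        end;
        /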

  7. One other interesting issue to look at is the optimizer mode — I found in my testing that an optimizer mode of “CHOOSE” gave me in some cases an RBO-based execution plan even in 10g, hence a bias towards nested loop joins. Now in the context of a partitioned table that caveat may not apply — you’re going to get CBO no matter what you do.

    Did you try dynamic sampling for that? I honestly cannot recall a case where it failed to produce a good plan.

  8. There is a hidden parameter that I think controls the default block size for dynamic sampling ( obviously named ). I haven’t yet confirmed this with a test.

    Mat.

  9. I could “HINT” dynamic sampling but I was trying to avoid it in order to let the CBO make the best decision. I deleted and locked stats on the partition and dynamic sampling was not used. Could it be the high value that causes the CBO to think this won’t return any rows, since the date they want is greater than what I “think” the high value is?
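
    (The kind of hint I mean, with invented table and column names:)

        select /*+ dynamic_sampling(s 4) */
               s.txn_date, sum(s.amount)
        from   fact_sales s
        where  s.txn_date >= date '2008-05-01'
        group  by s.txn_date;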

  10. Interesting comment about the SAMPLE clause in the doco. I really wish I had more time to check out some of this detail. The dilemma of having small children, I guess.

    The doco ref is here:

    http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#sthref9790

    This suggests that the SAMPLE clause can be asked to look at a specific number of blocks; I’m guessing that’s based on the setting of the param to which I referred. The doco also has some notes on the restriction of access paths that might be used ( it includes FTS and IFFS as possibilities ).

    It also describes a mechanism to attempt to allow the DB to use the same sample from one execution to the next ( not clear what this means when the data may have changed, maybe it just accesses the same blocks, if available? )

    All of which just adds some background, and the detail may just be academic.

    Curious to find out what the behaviour is on your system, and any insights you might uncover as to how this all works.

    Cheers.

  11. Tom,

    I see the dynamic sampling hint differently from many other hints, which you might indeed leave alone in order to let the optimizer make its own choice. By specifying a higher level of sampling than the system default you’re giving the optimizer permission to spend more time and use more resources specifically in order to make a better decision. That’s not a trade-off the optimizer is going to make on its own.

    Whether it is the high value in the global statistics that causes the incorrect plan, or the partition statistics that say “no rows in here”, depends on whether the optimizer knows at parse time which single partition the rows are going to be in.

  12. Mathew,

    If you’re referring to the BLOCK clause, I believe that just changes the mechanism of identifying which data to sample: you are specifying the percentage of blocks to sample instead of the percentage of rows (or the probability of an individual row or block being sampled, depending on your POV), which may be appropriate and more efficient under some circumstances.
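
    Roughly the difference, on an invented table:

        -- row sampling: each row has about a 1% chance of being included
        select count(*) from fact_sales sample (1);

        -- block sampling: each block has about a 1% chance; cheaper, but lumpier results
        select count(*) from fact_sales sample block (1);

        -- SEED asks for a repeatable sample from one execution to the next
        select count(*) from fact_sales sample block (1) seed (42);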

  13. David,

    I am going to build some test tables with partitions and play with moving stats around, collecting stats, different sampling hints etc… I will report my findings back here : )

    I also have Jonathan Lewis’s CBO book sitting here collecting dust. I bought it and haven’t read much of it. I guess I know what I will be doing this weekend!

  14. Pingback: May Dynamic Sampling be an Answer for OLAP Systems’ Statistics Needs? « H.Tonguç Yılmaz - Oracle Blog

  15. Pingback: Log Buffer #96: A Carnival of the Vanities for DBAs
