<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Testing a No-statistics Environment.</title>
	<atom:link href="http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/feed/" rel="self" type="application/rss+xml" />
	<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/</link>
	<description>Oracle Data Warehouse Design and Architecture</description>
	<pubDate>Fri, 04 Jul 2008 18:52:20 +0000</pubDate>
	<generator>http://wordpress.org/?v=MU</generator>
		<item>
		<title>By: Tom</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-51004</link>
		<dc:creator>Tom</dc:creator>
		<pubDate>Fri, 09 May 2008 23:03:13 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-51004</guid>
		<description>David,

I am going to build some test tables with partitions and play with moving stats around, collecting stats, different sampling hints etc... I will report my findings back here : )

I also have Jonathan Lewis's CBO book sitting here collecting dust. I bought it and haven't read much of it. I guess I know what I will be doing this weekend!</description>
		<content:encoded><![CDATA[<p>David,</p>
<p>I am going to build some test tables with partitions and play with moving stats around, collecting stats, different sampling hints etc&#8230; I will report my findings back here : )</p>
<p>I also have Jonathan Lewis&#8217;s CBO book sitting here collecting dust. I bought it and haven&#8217;t read much of it. I guess I know what I will be doing this weekend!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Todor Botev</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-51003</link>
		<dc:creator>Todor Botev</dc:creator>
		<pubDate>Fri, 09 May 2008 12:59:50 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-51003</guid>
		<description>Tom and David,

As for the hints question - another well-known Tom has defined the term "good hints". Those are "good" to use because they help the optimizer make a better choice:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:7038986332061#14621915372791</description>
		<content:encoded><![CDATA[<p>Tom and David,</p>
<p>As for the hints question - another well-known Tom has defined the term &#8220;good hints&#8221;. Those are &#8220;good&#8221; to use because they help the optimizer make a better choice:</p>
<p><a href="http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:7038986332061#14621915372791" rel="nofollow">http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:7038986332061#14621915372791</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Aldridge</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-51002</link>
		<dc:creator>David Aldridge</dc:creator>
		<pubDate>Thu, 08 May 2008 14:36:22 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-51002</guid>
		<description>Mathew,

If you're referring to the BLOCK clause I believe that just changes the mechanism of identifying which data to sample -- you are specifying the percentage of blocks to sample instead of the percentage of rows (or the probability of an individual row or block being sampled, depending on your POV), which may be appropriate and more efficient under some circumstances.</description>
		<content:encoded><![CDATA[<p>Mathew,</p>
<p>If you&#8217;re referring to the BLOCK clause I believe that just changes the mechanism of identifying which data to sample &#8212; you are specifying the percentage of blocks to sample instead of the percentage of rows (or the probability of an individual row or block being sampled, depending on your POV), which may be appropriate and more efficient under some circumstances.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Aldridge</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-51001</link>
		<dc:creator>David Aldridge</dc:creator>
		<pubDate>Thu, 08 May 2008 14:29:28 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-51001</guid>
		<description>Tom,

I see the dynamic sampling hint differently to many others that you might indeed leave alone in order to let the optimizer make its own choice. By specifying a higher level of sampling than the system default you're giving permission to the optimizer to spend more time and use more resources specifically in order to make a better decision. That's not a choice that the optimizer is going to make right otherwise.

Whether it is the high value in the global statistics that causes the incorrect plan or the partition statistics that say "no rows in here" depends on whether the optimizer knows at parse time which single partition the rows are going to be in.</description>
		<content:encoded><![CDATA[<p>Tom,</p>
<p>I see the dynamic sampling hint differently to many others that you might indeed leave alone in order to let the optimizer make its own choice. By specifying a higher level of sampling than the system default you&#8217;re giving permission to the optimizer to spend more time and use more resources specifically in order to make a better decision. That&#8217;s not a choice that the optimizer is going to make right otherwise.</p>
<p>Whether it is the high value in the global statistics that causes the incorrect plan or the partition statistics that say &#8220;no rows in here&#8221; depends on whether the optimizer knows at parse time which single partition the rows are going to be in.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mathew Butler</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-51000</link>
		<dc:creator>Mathew Butler</dc:creator>
		<pubDate>Thu, 08 May 2008 09:26:38 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-51000</guid>
		<description>Interesting comment about the SAMPLE clause in the doco. I really wish I had more time to check out some of this detail. The dilemma of having small children, I guess.

The doco ref is here:

http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#sthref9790

This suggests that the SAMPLE clause can be asked to look at a specific number of blocks, I'm guessing based on the setting of the param to which I referred. The doco also has some notes on the restriction of access paths that might be used ( includes FTS and IFFS as a possibility).

It also describes a mechanism to attempt to allow the DB to use the same sample from one execution to the next ( not clear what this means when the data may have changed, maybe it just accesses the same blocks, if available? )

All of which just adds some background, and the detail may just be academic. 

Curious to find out what the behaviour is on your system, and any insights you might uncover as to how this all works.

Cheers.</description>
		<content:encoded><![CDATA[<p>Interesting comment about the SAMPLE clause in the doco. I really wish I had more time to check out some of this detail. The dilemma of having small children, I guess.</p>
<p>The doco ref is here:</p>
<p><a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#sthref9790" rel="nofollow">http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#sthref9790</a></p>
<p>This suggests that the SAMPLE clause can be asked to look at a specific number of blocks, I&#8217;m guessing based on the setting of the param to which I referred. The doco also has some notes on the restriction of access paths that might be used ( includes FTS and IFFS as a possibility).</p>
<p>It also describes a mechanism to attempt to allow the DB to use the same sample from one execution to the next ( not clear what this means when the data may have changed, maybe it just accesses the same blocks, if available? )</p>
<p>All of which just adds some background, and the detail may just be academic. </p>
<p>Curious to find out what the behaviour is on your system, and any insights you might uncover as to how this all works.</p>
<p>Cheers.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-50999</link>
		<dc:creator>Tom</dc:creator>
		<pubDate>Thu, 08 May 2008 00:28:56 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-50999</guid>
		<description>I could "HINT" dynamic sampling but I was trying to avoid it in order to let the CBO make the best decision. I deleted and locked stats on the partition and dynamic sampling was not used. Could it be the high value that causes the CBO to think this won't return any rows since the date they want is greater than what I "think" the high value is?</description>
		<content:encoded><![CDATA[<p>I could &#8220;HINT&#8221; dynamic sampling but I was trying to avoid it in order to let the CBO make the best decision. I deleted and locked stats on the partition and dynamic sampling was not used. Could it be the high value that causes the CBO to think this won&#8217;t return any rows since the date they want is greater than what I &#8220;think&#8221; the high value is?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mathew Butler</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-50994</link>
		<dc:creator>Mathew Butler</dc:creator>
		<pubDate>Wed, 07 May 2008 16:19:28 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-50994</guid>
		<description>There is a hidden parameter that I think controls the default block size for dynamic sampling ( obviously named ). I haven't yet confirmed this with a test.

Mat.</description>
		<content:encoded><![CDATA[<p>There is a hidden parameter that I think controls the default block size for dynamic sampling ( obviously named ). I haven&#8217;t yet confirmed this with a test.</p>
<p>Mat.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Aldridge</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-50993</link>
		<dc:creator>David Aldridge</dc:creator>
		<pubDate>Wed, 07 May 2008 14:10:17 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-50993</guid>
		<description>One other interesting issue to look at is the optimizer mode -- I found in my testing that an optimizer mode of "CHOOSE" gave me in some cases an RBO-based execution plan even in 10g, hence a bias towards nested loop joins. Now in the context of a partitioned table that caveat may not apply -- you're going to get CBO no matter what you do.

Did you try dynamic sampling for that? I honestly cannot recall a case where it failed to produce a good plan.</description>
		<content:encoded><![CDATA[<p>One other interesting issue to look at is the optimizer mode &#8212; I found in my testing that an optimizer mode of &#8220;CHOOSE&#8221; gave me in some cases an RBO-based execution plan even in 10g, hence a bias towards nested loop joins. Now in the context of a partitioned table that caveat may not apply &#8212; you&#8217;re going to get CBO no matter what you do.</p>
<p>Did you try dynamic sampling for that? I honestly cannot recall a case where it failed to produce a good plan.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tom</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-50992</link>
		<dc:creator>Tom</dc:creator>
		<pubDate>Wed, 07 May 2008 13:37:29 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-50992</guid>
		<description>I have the same issue. We partition by month. When the first of the month load happens, Oracle thinks there are "0" rows in the table. After it is loaded, when the batch process kicks off, it does a nested loop whereas it used to do a hash join the previous month. The hash join with hundreds of millions of rows finishes in an hour or so, the nested loop join takes 19 hours! I tried deleting and locking stats on the partition and it didn't work. I tried importing stats from another partition, it didn't work. I tried setting the partition stats by numrows and numblocks and it didn't work. I tried increasing the high value, and it didn't work. I am just going to force the ETL developer to gather partition stats on the first of the month after he loads the table. I don't know what else to try.</description>
		<content:encoded><![CDATA[<p>I have the same issue. We partition by month. When the first of the month load happens, Oracle thinks there are &#8220;0&#8221; rows in the table. After it is loaded, when the batch process kicks off, it does a nested loop whereas it used to do a hash join the previous month. The hash join with hundreds of millions of rows finishes in an hour or so, the nested loop join takes 19 hours! I tried deleting and locking stats on the partition and it didn&#8217;t work. I tried importing stats from another partition, it didn&#8217;t work. I tried setting the partition stats by numrows and numblocks and it didn&#8217;t work. I tried increasing the high value, and it didn&#8217;t work. I am just going to force the ETL developer to gather partition stats on the first of the month after he loads the table. I don&#8217;t know what else to try.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Aldridge</title>
		<link>http://oraclesponge.wordpress.com/2008/05/06/testing-a-no-statistics-environment/#comment-50991</link>
		<dc:creator>David Aldridge</dc:creator>
		<pubDate>Wed, 07 May 2008 13:04:32 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=372#comment-50991</guid>
		<description>I have found in the past that level 4 seems to be a pretty good choice: documentation on the different levels in 10gR2 is here http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i43032

According to documented behaviour we'd be sampling 64 blocks for unanalyzed tables at level 4, which ought to take in the order of two tenths of a second. Based on that I'm very willing to go higher, to levels 5 or 6, as the overwhelming majority of ETL queries take multiple tens of seconds to execute. I doubt that it's multiblock I/O -- in fact I hope it isn't, as that would call the randomness into question on large tables, particularly date-partitioned ones.

I guess that those defaults are estimates really -- the sampling is implemented with a SAMPLE clause, which is not really deterministic I think. I wouldn't be surprised if in fact the 64 blocks was in some cases 58 and in others 69 -- not tested that though.</description>
		<content:encoded><![CDATA[<p>I have found in the past that level 4 seems to be a pretty good choice: documentation on the different levels in 10gR2 is here <a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i43032" rel="nofollow">http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/stats.htm#i43032</a></p>
<p>According to documented behaviour we&#8217;d be sampling 64 blocks for unanalyzed tables at level 4, which ought to take in the order of two tenths of a second. Based on that I&#8217;m very willing to go higher, to levels 5 or 6, as the overwhelming majority of ETL queries take multiple tens of seconds to execute. I doubt that it&#8217;s multiblock I/O &#8212; in fact I hope it isn&#8217;t, as that would call the randomness into question on large tables, particularly date-partitioned ones.</p>
<p>I guess that those defaults are estimates really &#8212; the sampling is implemented with a SAMPLE clause, which is not really deterministic I think. I wouldn&#8217;t be surprised if in fact the 64 blocks was in some cases 58 and in others 69 &#8212; not tested that though.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
