Using Histograms on Evenly Distributed Values

A quick demo … here's a script …

drop table demo_hist
/

create table demo_hist
as
select floor(rownum/5) rn,do.* from dba_objects do
/

begin
dbms_stats.gather_table_stats
(
ownname => user,
tabname => 'DEMO_HIST',
estimate_percent => 100,
method_opt => 'for columns OWNER size 254'
);
end;
/

select distinct column_name from user_tab_histograms
where table_name = 'DEMO_HIST'
/

Select min(rn),
max(rn),
count(distinct rn),
count(*)
from demo_hist
/

set autotrace on

select count(*) from demo_hist
where rn = 5
/

select count(*) from demo_hist
where rn between 1 and 100
/

begin
dbms_stats.gather_table_stats
(
ownname => user,
tabname => 'DEMO_HIST',
estimate_percent => 100,
method_opt => 'for columns RN size 254'
);
end;
/

select count(*) from demo_hist
where rn = 5
/

select count(*) from demo_hist
where rn between 1 and 100
/

When I run this I get a demonstration that with a histogram on the RN column the optimizer gets a much better cardinality estimate for the two Count(*)'s on RN = 5 and RN between 1 and 100.

But I'm not going to show you by how much because you need to go and run this script yourself ;)

Whether the results are significant in terms of getting a good execution plan, well this test doesn't demonstrate that. But I believe that it does. More of that damned instinct.

OK, if someone begs me then I'll post my results. But it'll have to be a real grovelling.

About these ads

3 thoughts on “Using Histograms on Evenly Distributed Values

  1. Hi David, I’m fairly new to your site, and like it.

    I just read your example, and am curious as to why you set the estimate percent to 100 rather than doing a compute, by setting it to null (on a side note, wasn’t the default null in 9i instead of the param?).

    Also, what is the point of doing the histogram on the owner column? Just to show that it works?

    I did see how the histogram, even on an evenly distributed column, caused the cardinality in the plans to more accurately reflect the actual data.

    I’ve been suggesting to people to always gather histograms, because the more info for the CBO the better, and now I have a cool example to point them to. Thanks!

    (also, for those who tell me to only get histos on columns with skewed data distributions, how is one to know in most cases the distribution without actually calculating it? If you are going to go to the trouble of calculating it yourself, might as well let oracle do it for you and gather the histo).

  2. Hello David,
    I think you are wrong here.
    You are demonstrating the difference between optimizer estimations having histogram statistics or no statistics on the columns of interest at all (so using some defaults). Using simple statistics (no histograms, relying on NUM_DISTINCT,NUM_NULLS,DENSITY,LOWVALUE and HIGH_VALUE only) is giving (almost) the same estimations for evenly distributed values as with histogram statistics available.
    Regards,
    Rainer Stenzel

  3. Hello David,
    during preperation of a 10g migration I found this additional hint, that histograms on evenly distributed values seems to yield no advantage:
    “Oracle will verify whether the column is skewed before creating a histogram..”
    from oracle’s white paper
    Upgrading from Oracle Database
    9i to 10g: What to expect from the
    Optimizer
    section:
    New default parameter values for DBMS_STATS.GATHER_*_STATS
    I Would propose to remove this blog at least from blog The Best of The Oracle Sponge.
    Best Regards,
    Rainer Stenzel

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s