Thread: AW: [HACKERS] create index updates nrows statistics

AW: [HACKERS] create index updates nrows statistics

From
ZEUGSWETTER Andreas IZ5
Date:
> > a create index updates the statistics in pg_class,
> > this leads to substantial performance degradation compared to
> > 6.4.2.
> 
> Create index did that in 6.4.2 as well --- how could it be making
> performance worse?
> 
I am not sure why, but in 6.4.2 a create table, create index, insert,
select * from tab where indexedcol=5 did actually use the index path,
even if table reltuples and relpages was 0.
It currently uses a seq scan, which is exactly what we wanted to avoid 
in the newly created table case, but do want on an actually small table.

Please apply the patch I previously sent.

Andreas


Re: AW: [HACKERS] create index updates nrows statistics

From
jwieck@debis.com (Jan Wieck)
Date:
>
> > > a create index updates the statistics in pg_class,
> > > this leads to substantial performance degradation compared to
> > > 6.4.2.
> >
> > Create index did that in 6.4.2 as well --- how could it be making
> > performance worse?
> >
> I am not sure why, but in 6.4.2 a create table, create index, insert,
> select * from tab where indexedcol=5 did actually use the index path,
> even if table reltuples and relpages was 0.
> It currently uses a seq scan, which is exactly what we wanted to avoid
> in the newly created table case, but do want on an actually small table.
>
> Please apply the patch I previously sent.

    From memory not verified:

    Doesn't CREATE INDEX update pg_statistics? I think it does so
    the faked statistics only cause different joins to happen  as
    long  as  there  is no index created immediately after CREATE
    TABLE (HASHJOIN vs. NESTLOOP).


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: AW: [HACKERS] create index updates nrows statistics

From
Tom Lane
Date:
ZEUGSWETTER Andreas IZ5 <Andreas.Zeugswetter@telecom.at> writes:
>>>> a create index updates the statistics in pg_class,
>>>> this leads to substantial performance degradation compared to
>>>> 6.4.2.
>> 
>> Create index did that in 6.4.2 as well --- how could it be making
>> performance worse?
>> 
> I am not sure why, but in 6.4.2 a create table, create index, insert,
> select * from tab where indexedcol=5 did actually use the index path,
> even if table reltuples and relpages was 0.

Hmm, you're right.  Using 6.4.2:

play=> create table foobar (f1 int4);
CREATE
play=> explain select * from foobar where f1 = 4;
NOTICE:  QUERY PLAN:

Seq Scan on foobar  (cost=0.00 size=0 width=4)

play=> create index foobar_f1 on foobar(f1);
CREATE
play=> explain select * from foobar where f1 = 4;
NOTICE:  QUERY PLAN:

Index Scan using foobar_f1 on foobar  (cost=0.00 size=0 width=4)

whereas in 6.5 you still get a sequential scan because it estimates the
cost of the index scan at 1.0 not 0.0.  I think I'm to blame for this
behavior change: I remember twiddling costsize.c to provide more
realistic numbers for an index scan, and in particular to ensure that
an index scan would be considered more expensive than a sequential scan
unless it was able to eliminate a useful number of rows.  But when
the estimated relation size is zero (or very small) the selectivity
benefit can't make up even a mere 1.0 cost bias.

I believe 6.5 is operating as it should --- 6.4 was producing inferior
plans for small tables.  But it is clearly a Bad Thing to allow the 6.5
optimizer to believe that a relation is empty when it isn't.  I concur
with your suggestion to hack up CREATE INDEX so that creating an index
before you load the table isn't quite such a losing proposition.

> Please apply the patch I previously sent.

Will do.
        regards, tom lane