Thread: Re: question about seq scan and index scan

Re: question about seq scan and index scan

From
Hervé Piedvache
Date:
Hi Seb,

You omit to say one thing ...

We have an index on tablea on mynumber (integer).
Those tables are vacuumed every nights.

If we do :
explain select count(*) from tablea a, tableb b, tablec c where a.id=b.id and
a.id=c.id and a.mynumber>23000;
NOTICE:  QUERY PLAN:

Aggregate  (cost=51898.63..51898.63 rows=1 width=48)
  ->  Hash Join  (cost=33905.40..51887.68 rows=4379 width=48)
        ->  Seq Scan on tableb b  (cost=0.00..14391.60 rows=471460 width=16)
        ->  Hash  (cost=33894.45..33894.45 rows=4379 width=32)
              ->  Hash Join  (cost=13522.17..33894.45 rows=4379 width=32)
                    ->  Seq Scan on tablec c  (cost=0.00..16781.60
rows=471460 width=16)
                    ->  Hash  (cost=13511.23..13511.23 rows=4379 width=16)
                          ->  Seq Scan on tablea a  (cost=0.00..13511.23
rows=4379 width=16)

EXPLAIN


And If we do an :
explain select count(*) from tablea a, tableb b, tablec c where a.id=b.id and
a.id=c.id and a.mynumber>23350;
NOTICE:  QUERY PLAN:

Aggregate  (cost=50522.53..50522.53 rows=1 width=48)
  ->  Nested Loop  (cost=13521.19..50512.57 rows=3987 width=48)
        ->  Hash Join  (cost=13521.19..32709.93 rows=3987 width=32)
              ->  Seq Scan on tableb b  (cost=0.00..16781.60 rows=471460
width=16)
              ->  Hash  (cost=13511.23..13511.23 rows=3987 width=16)
                    ->  Seq Scan on tablea a  (cost=0.00..13511.23 rows=3987
width=16)
        ->  Index Scan using tablec_pkey on tablec c  (cost=0.00..4.45 rows=1
width=16)

EXPLAIN

But If I do :
explain select count(*) from tablea a, tableb b, tablec c where a.id=b.id and
a.id=c.id and a.mynumber>24000;
NOTICE:  QUERY PLAN:

Aggregate  (cost=414.72..414.72 rows=1 width=48)
  ->  Nested Loop  (cost=0.00..414.65 rows=29 width=48)
        ->  Nested Loop  (cost=0.00..284.62 rows=29 width=32)
              ->  Index Scan using ix_mynumber_key on tablea a
(cost=0.00..116.44 rows=29 width=16)
              ->  Index Scan using tableb_pkey on tableb b  (cost=0.00..5.76
rows=1 width=16)
        ->  Index Scan using tablec_pkey on tablec c  (cost=0.00..4.45 rows=1
width=16)

EXPLAIN

Thanks for you help ... to "explain" us why Postgresql do not use all the
time the index scan for this kind of request for 3 tables with the same
primary key id.

regards,

Le Vendredi 25 Octobre 2002 18:18, Sébastien PALLEAU a écrit :
> hi,
> I tried to execute the following request on a postgresql 7.2.3 system :
>
> select count(*) from table tablea a, tableb b, tablec c where a.id=b.id and
> a.id=c.id and a.mynumber > 5000; tablea, tableb and tablec have primary on
> id that identify a unique member.
>
> tablea contains 471413 records
> tableb contains 471413 records
> tablec contains 471413 records
>
> a.id is tablea primary key
> b.id is tableb primary key
> c.id is tablec primary key
>
> An explain on the request provides the following results.
> seq scan on tableb.id (I dont agree)
> seq scan on tablec.id (i dont agree)
> seq scan on tablea.id (seems normal but why executed last ?)
>
> and the most strange is that for the following request :
> explain select count(*) from table tablea a, tableb b, tablec c where
> a.id=b.id and a.id=c.id and a.mynumber > 20000; provides :
> index scan using numberpoints_tablea_key
> index scan using tableb_pkey
> index scan using tablec_pkey
>
> why doesn't postgres uses indexes in the first case ?
> thanks for your anwers.

--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Tel. 33-144949901
fax. 33-144949902

Re: question about seq scan and index scan

From
Stephan Szabo
Date:
On Fri, 25 Oct 2002, [iso-8859-15] Herv� Piedvache wrote:


> Thanks for you help ... to "explain" us why Postgresql do not use all the
> time the index scan for this kind of request for 3 tables with the same
> primary key id.

Depending on the number of rows returned, index scan may not be faster
than a sequential scan.  As you up the constant, the estimated number of
rows drops which make the index scan a better and better plan.  Are
the estimated number of rows for the scan of a (4379, 3987, and 29)
realistic for mynumber>(23000, 23500, 24000)


Re: question about seq scan and index scan

From
Hervé Piedvache
Date:
Le Vendredi 25 Octobre 2002 20:17, Stephan Szabo a écrit :
> On Fri, 25 Oct 2002, [iso-8859-15] Hervé Piedvache wrote:
> > Thanks for you help ... to "explain" us why Postgresql do not use all the
> > time the index scan for this kind of request for 3 tables with the same
> > primary key id.
>
> Depending on the number of rows returned, index scan may not be faster
> than a sequential scan.  As you up the constant, the estimated number of
> rows drops which make the index scan a better and better plan.  Are
> the estimated number of rows for the scan of a (4379, 3987, and 29)
> realistic for mynumber>(23000, 23500, 24000)

I'm surpise about this answer for

>23000
result is 98

>23350
result is 96

>24000
result is  93

With a table of 471 413 records and most of the values for mynumber are
< of 2500 result is 467 902 records ...

I do not understand the explain result sorry ! ;o/

Regards,
--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Tel. 33-144949901
fax. 33-144949902

Re: question about seq scan and index scan

From
Tom Lane
Date:
=?iso-8859-15?q?Herv=E9=20Piedvache?= <herve@elma.fr> writes:
> Le Vendredi 25 Octobre 2002 20:17, Stephan Szabo a �crit :
>> Are the estimated number of rows for the scan of a (4379, 3987, and 29)
>> realistic for mynumber>(23000, 23500, 24000)

> I'm surpise about this answer for

>> 23000
> result is 98

>> 23350
> result is 96

>> 24000
> result is  93

Hmm, have you "ANALYZE"d this table recently?  If so it would be
interesting to see the pg_stats row for tablea.mynumber.

If the distribution of mynumber is very uneven, it might be worth
your while to increase the statistics target for the column, so that
the planner has more data points with which to make a better estimate.
See ALTER TABLE SET STATISTICS.

            regards, tom lane

Re: question about seq scan and index scan

From
Hervé Piedvache
Date:
Hi Tom,

Le Samedi 26 Octobre 2002 16:31, Tom Lane a écrit :
> =?iso-8859-15?q?Herv=E9=20Piedvache?= <herve@elma.fr> writes:
> > Le Vendredi 25 Octobre 2002 20:17, Stephan Szabo a écrit :
> >> Are the estimated number of rows for the scan of a (4379, 3987, and 29)
> >> realistic for mynumber>(23000, 23500, 24000)
> >
> > I'm surpise about this answer for
> >
> >> 23000
> >
> > result is 98
> >
> >> 23350
> >
> > result is 96
> >
> >> 24000
> >
> > result is  93
>
> Hmm, have you "ANALYZE"d this table recently?  If so it would be
> interesting to see the pg_stats row for tablea.mynumber.

Every night ...

> If the distribution of mynumber is very uneven, it might be worth
> your while to increase the statistics target for the column, so that
> the planner has more data points with which to make a better estimate.
> See ALTER TABLE SET STATISTICS.

Yep done like this ... I have deleted my index on tablea on mynumber, and put
an alter table set statistics 100 ... and now it's really better !

Many thanks !
--
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Tel. 33-144949901
fax. 33-144949902