Thread: 7.3.1 index use / performance

7.3.1 index use / performance

From
Achilleus Mantzios
Date:
Hi,

i am just in the stage of having migrated my test system to 7.3.1
and i am experiencing some performance problems.

i have a table "noon"
                     Table "public.noon"
         Column         |          Type          | Modifiers
------------------------+------------------------+-----------
 v_code                 | character varying(4)   |
 log_no                 | bigint                 |
 report_date            | date                   |
 report_time            | time without time zone |
 voyage_no              | integer                |
 charterer              | character varying(12)  |
 port                   | character varying(24)  |
 duration               | character varying(4)   |
 rotation               | character varying(9)   |
......

with a total of 278 columns.

it has indexes:
Indexes: noonf_date btree (report_date),
         noonf_logno btree (log_no),
         noonf_rotation btree (rotation text_ops),
         noonf_vcode btree (v_code),
         noonf_voyageno btree (voyage_no)

On the test 7.3.1 system (a FreeBSD 4.7-RELEASE-p2, Celeron 1.2GHz
400Mb, with 168Mb for pgsql),
i get:
dynacom=# EXPLAIN ANALYZE select
FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
v_code='4500' and rotation='NOON     ' and report_date between
'2002-01-07' and '2003-01-07';
                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
 Index Scan using noonf_date on noon  (cost=0.00..4.46 rows=1 width=39)
(actual time=0.27..52.89 rows=259 loops=1)
   Index Cond: ((report_date >= '2002-01-07'::date) AND (report_date <=
'2003-01-07'::date))
   Filter: ((v_code = '4500'::character varying) AND (rotation = 'NOON
'::character varying))
 Total runtime: 53.98 msec
(4 rows)

after i drop the noonf_date index i actually get better performance
cause the backend uses now the more appropriate index noonf_vcode :

dynacom=# EXPLAIN ANALYZE select
FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
v_code='4500' and rotation='NOON     ' and report_date between
'2002-01-07' and '2003-01-07';
                                                               QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using noonf_vcode on noon  (cost=0.00..3122.88 rows=1
width=39) (actual time=0.16..13.92 rows=259 loops=1)
   Index Cond: (v_code = '4500'::character varying)
   Filter: ((rotation = 'NOON     '::character varying) AND (report_date
>= '2002-01-07'::date) AND (report_date <= '2003-01-07'::date))
 Total runtime: 14.98 msec
(4 rows)

On the pgsql 7.2.3 development system (a RH linux 2.4.7, PIII 1 GHz,
1Mb, with 168M for pgsql), i always get the right index use:

dynacom=# EXPLAIN ANALYZE select
FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
v_code='4500' and rotation='NOON     ' and report_date between
'2002-01-07' and '2003-01-07';
NOTICE:  QUERY PLAN:

Index Scan using noonf_vcode on noon  (cost=0.00..3046.38 rows=39
width=39) (actual time=0.09..8.55 rows=259 loops=1)
Total runtime: 8.86 msec

EXPLAIN

Is something i am missing??
Is this reasonable behaviour??

P.S.
Yes i have vaccumed analyzed both systems before the queries were issued.
==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr



Re: [SQL] 7.3.1 index use / performance

From
Tomasz Myrta
Date:
Achilleus Mantzios wrote:
<cut>

 > it has indexes:
 > Indexes: noonf_date btree (report_date),
 >          noonf_logno btree (log_no),
 >          noonf_rotation btree (rotation text_ops),
 >          noonf_vcode btree (v_code),
 >          noonf_voyageno btree (voyage_no)
 >
<cut>

 >
 >
-------------------------------------------------------------------------------------------------------------------
 >  Index Scan using noonf_date on noon  (cost=0.00..4.46 rows=1 width=39)
 > (actual time=0.27..52.89 rows=259 loops=1)
 >    Index Cond: ((report_date >= '2002-01-07'::date) AND (report_date <=
 > '2003-01-07'::date))
 >    Filter: ((v_code = '4500'::character varying) AND (rotation = 'NOON
 > '::character varying))
 >  Total runtime: 53.98 msec
 > (4 rows)
<cut>

Maybe it is not an answer to your question, but why don't you help
Postgres by yourself?
For this kind of queries it's better to drop index on report_date - your
report period is one year and answer to this condition is 10% records (I
suppose)
It would be better to change 2 indexes on v_code and rotation into one
index based on both fields.
What kind of queries do you have? How many records returns each "where"
condition? Use indexes on fields, on which condition result in smallest
amount of rows.

Regards,
Tomasz Myrta


Re: [SQL] 7.3.1 index use / performance

From
Achilleus Mantzios
Date:
On Tue, 7 Jan 2003, Tomasz Myrta wrote:

> Maybe it is not an answer to your question, but why don't you help
> Postgres by yourself?

Thanx,

i dont think that the issue here is to help postgresql by myself.
I can always stick to 7.2.3, or use indexes that 7.3.1 will
acknowledge, like noonf_vcode_date on noon (v_code,report_date).
(unfortunately when i create the above noonf_vcode_date index, it is only
used until
the next vacuum analyze, hackers is this an issue too???),
but these options are not interesting from a postgresql perspective :)

> For this kind of queries it's better to drop index on report_date - your
> report period is one year and answer to this condition is 10% records (I
> suppose)

I cannot drop the index on the report_date since a lot of other queries
need it.

> It would be better to change 2 indexes on v_code and rotation into one
> index based on both fields.
> What kind of queries do you have? How many records returns each "where"
> condition? Use indexes on fields, on which condition result in smallest
> amount of rows.
>
> Regards,
> Tomasz Myrta
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr


Re: 7.3.1 index use / performance

From
Stephan Szabo
Date:
On Tue, 7 Jan 2003, Achilleus Mantzios wrote:

> i am just in the stage of having migrated my test system to 7.3.1
> and i am experiencing some performance problems.
>
> i have a table "noon"
>                      Table "public.noon"
>          Column         |          Type          | Modifiers
> ------------------------+------------------------+-----------
>  v_code                 | character varying(4)   |
>  log_no                 | bigint                 |
>  report_date            | date                   |
>  report_time            | time without time zone |
>  voyage_no              | integer                |
>  charterer              | character varying(12)  |
>  port                   | character varying(24)  |
>  duration               | character varying(4)   |
>  rotation               | character varying(9)   |
> ......
>
> with a total of 278 columns.
>
> it has indexes:
> Indexes: noonf_date btree (report_date),
>          noonf_logno btree (log_no),
>          noonf_rotation btree (rotation text_ops),
>          noonf_vcode btree (v_code),
>          noonf_voyageno btree (voyage_no)
>
> On the test 7.3.1 system (a FreeBSD 4.7-RELEASE-p2, Celeron 1.2GHz
> 400Mb, with 168Mb for pgsql),
> i get:
> dynacom=# EXPLAIN ANALYZE select
> FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
> v_code='4500' and rotation='NOON     ' and report_date between
> '2002-01-07' and '2003-01-07';
>                                                     QUERY PLAN
>

> -------------------------------------------------------------------------------------------------------------------
>  Index Scan using noonf_date on noon  (cost=0.00..4.46 rows=1 width=39)
> (actual time=0.27..52.89 rows=259 loops=1)


>  Index Scan using noonf_vcode on noon  (cost=0.00..3122.88 rows=1
> width=39) (actual time=0.16..13.92 rows=259 loops=1)


What do the statistics for the three columns actually look like and what
are the real distributions and counts like?
Given an estimated cost of around 4 for the first scan, my guess would be
that it's not expecting alot of rows between 2002-01-07 and 2003-01-07
which would make that a reasonable plan.


Re: [SQL] 7.3.1 index use / performance

From
Achilleus Mantzios
Date:
On Tue, 7 Jan 2003, Stephan Szabo wrote:

>
> On Tue, 7 Jan 2003, Achilleus Mantzios wrote:
>
> > i am just in the stage of having migrated my test system to 7.3.1
> > and i am experiencing some performance problems.
> >
> > i have a table "noon"
> >                      Table "public.noon"
> >          Column         |          Type          | Modifiers
> > ------------------------+------------------------+-----------
> >  v_code                 | character varying(4)   |
> >  log_no                 | bigint                 |
> >  report_date            | date                   |
> >  report_time            | time without time zone |
> >  voyage_no              | integer                |
> >  charterer              | character varying(12)  |
> >  port                   | character varying(24)  |
> >  duration               | character varying(4)   |
> >  rotation               | character varying(9)   |
> > ......
> >
> > with a total of 278 columns.
> >
> > it has indexes:
> > Indexes: noonf_date btree (report_date),
> >          noonf_logno btree (log_no),
> >          noonf_rotation btree (rotation text_ops),
> >          noonf_vcode btree (v_code),
> >          noonf_voyageno btree (voyage_no)
> >
> > On the test 7.3.1 system (a FreeBSD 4.7-RELEASE-p2, Celeron 1.2GHz
> > 400Mb, with 168Mb for pgsql),
> > i get:
> > dynacom=# EXPLAIN ANALYZE select
> > FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
> > v_code='4500' and rotation='NOON     ' and report_date between
> > '2002-01-07' and '2003-01-07';
> >                                                     QUERY PLAN
> >
>
> > -------------------------------------------------------------------------------------------------------------------
> >  Index Scan using noonf_date on noon  (cost=0.00..4.46 rows=1 width=39)
> > (actual time=0.27..52.89 rows=259 loops=1)
>
>
> >  Index Scan using noonf_vcode on noon  (cost=0.00..3122.88 rows=1
> > width=39) (actual time=0.16..13.92 rows=259 loops=1)
>
>
> What do the statistics for the three columns actually look like and what
> are the real distributions and counts like?

The two databases (test 7.3.1 and development 7.2.3) are identical
(loaded from the same pg_dump).

About the stats on these 3 columns i get: (see also attachment 1 to avoid
identation/wraparound problems)

 schemaname | tablename |   attname   | null_frac | avg_width | n_distinct |
   most_common_vals                                                 |
most_common_freqs                                             |
histogram_bounds                                                        | correlation 

------------+-----------+-------------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+-------------
 public     | noon      | v_code      |         0 |         8 |        109 |
{4630,4650,4690,4670,4520,4610,4550,4560,4620,4770}                                                            |
{0.0283333,0.028,0.0256667,0.0243333,0.024,0.0236667,0.0233333,0.0233333,0.0226667,0.0226667}             |
{2070,3210,4330,4480,4570,4680,4751,4820,4870,4940,6020}
      |   -0.249905 
 public     | noon      | report_date |         0 |         4 |       3408 |
{2001-11-14,1998-10-18,2000-04-03,2000-07-04,2000-12-20,2000-12-31,2001-01-12,2001-10-08,2001-12-25,1996-01-23}|
{0.002,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00133333}|
{"0001-12-11
BC",1994-09-27,1996-03-26,1997-07-29,1998-08-26,1999-03-29,1999-11-30,2000-09-25,2001-05-25,2002-01-17,2002-12-31}|
-0.812295
 public     | noon      | rotation    |         0 |        13 |          6 | {"NOON     ","PORT LOG ","ARRIVAL
",DEPARTURE,"SEA     ","NEXT PORT"}                                         |
{0.460333,0.268667,0.139,0.119667,0.007,0.00533333}                                                       |
                                                                                                                   |
0.119698
(3 rows)


About distributions, i have:

dynacom=# SELECT rotation,count(*) from noon group by rotation;
 rotation  | count
-----------+-------
           |     2
 000000000 |    65
 ARRIVAL   |     1
 ARRIVAL   | 15471
 DEPARTURE | 15030
 NEXT PORT |   462
 NOON      | 50874
 PORT LOG  | 25688
 SEA       |  1202
(9 rows)

dynacom=# SELECT v_code,count(*) from noon group by v_code;
 v_code | count
--------+-------
 0004   |     1
 1030   |     1
 2070   |   170
 2080   |   718
 2110   |   558
 2220   |   351
 2830   |  1373
 2840   |   543
 2860   |   407
 2910   |   418
 3010   |   352
 3020   |   520
 3060   |    61
 3130   |   117
 3140   |     1
 3150   |   752
 3160   |   811
 3170   |   818
 3180   |  1064
 3190   |   640
 3200   |   998
 3210   |  1512
 3220   |   595
 3230   |   374
 3240   |   514
 3250   |    13
 3260   |   132
 3270   |   614
 4010   |   413
 4020   |   330
 4040   |   728
 4050   |   778
 4060   |   476
 4070   |   534
 4310   |   759
 4320   |   424
 4330   |   549
 4360   |   366
 4370   |   334
 4380   |   519
 4410   |   839
 4420   |   183
 4421   |   590
 4430   |   859
 4450   |   205
 4470   |   861
 4480   |   766
 4490   |   169
 4500   |   792
 4510   |  2116
 4520   |  2954
 4530   |  2142
 4531   |   217
 4540   |  2273
 4550   |  2765
 4560   |  2609
 4570   |  2512
 4580   |  1530
 4590   |  1987
 4600   |   308
 4610   |  2726
 4620   |  2698
 4630   |  2813
 4640   |  1733
 4650   |  2655
 4660   |  2139
 4661   |    65
 4670   |  2607
 4680   |  1729
 4690   |  2587
 4700   |  2101
 4710   |  1830
 4720   |  1321
 4730   |  1258
 4740   |  1506
 4750   |  1391
 4751   |   640
 4760   |  1517
 4770   |  2286
 4780   |  1353
 4790   |  1209
 4800   |  2414
 4810   |   770
 4820   |  1115
 4830   |  1587
 4840   |   983
 4841   |   707
 4850   |  1297
 4860   |   375
 4870   |  1440
 4880   |   456
 4881   |   742
 4890   |   210
 4891   |    45
 4900   |     2
 4910   |  1245
 4920   |   414
 4930   |  1130
 4940   |  1268
 4950   |   949
 4960   |   836
 4970   |  1008
 4980   |  1239
 5510   |   477
 5520   |   380
 5530   |   448
 5540   |   470
 5550   |   352
 5560   |   148
 5570   |   213
 5580   |   109
 5590   |    55
 6010   |   246
 6020   |   185
 9180   |     1

(Not all the above vessels are active or belong to me:) )

The distribution on the report_date has no probabilistic significance
since each report_date usually corresponds to one row.
So,
dynacom=# SELECT count(*) from noon;
 count
--------
 108795
(1 row)

dynacom=#

Now for the specific query the counts have as follows:

dynacom=# select count(*) from noon where v_code='4500';
 count
-------
   792
(1 row)

dynacom=# select count(*) from noon where rotation='NOON     ';
 count
-------
 50874
(1 row)

dynacom=# select count(*) from noon where report_date between '2002-01-07'
and '2003-01-07';
 count
-------
  7690
(1 row)

dynacom=#

> Given an estimated cost of around 4 for the first scan, my guess would be
> that it's not expecting alot of rows between 2002-01-07 and 2003-01-07
> which would make that a reasonable plan.
>

As we see the rows returned for v_code='4500' (792) are much fewer than
the rows returned for the dates between '2002-01-07' and '2003-01-07'
(7690).

Is there a way to provide you with more information?

And i must note that the two databases were worked on after a fresh
createdb on both systems (and as i told they are identical).
But, for some reason the 7.2.3 *always* finds the best index to use :)

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr

Re: [SQL] 7.3.1 index use / performance

From
Tom Lane
Date:
Achilleus Mantzios <achill@matrix.gatewaynet.com> writes:
> About the stats on these 3 columns i get:

Does 7.2 generate the same stats?  (minus the schemaname of course)

Also, I would like to see the results of these queries on both versions,
so that we can see what the planner thinks the index selectivity is:

EXPLAIN ANALYZE select * from noon where
v_code='4500';

EXPLAIN ANALYZE select * from noon where
report_date between '2002-01-07' and '2003-01-07';

            regards, tom lane

Re: [SQL] 7.3.1 index use / performance

From
Achilleus Mantzios
Date:
On Tue, 7 Jan 2003, Tom Lane wrote:

> Achilleus Mantzios <achill@matrix.gatewaynet.com> writes:
> > About the stats on these 3 columns i get:
>
> Does 7.2 generate the same stats?  (minus the schemaname of course)

Not absolutely but close:

(See attachment)

>
> Also, I would like to see the results of these queries on both versions,
> so that we can see what the planner thinks the index selectivity is:
>
> EXPLAIN ANALYZE select * from noon where
> v_code='4500';
>
> EXPLAIN ANALYZE select * from noon where
> report_date between '2002-01-07' and '2003-01-07';
>

On 7.3.1 (On a FreeBSD)
=======================
dynacom=# EXPLAIN ANALYZE select * from noon where v_code='4500';
                                                         QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using noonf_vcode on noon  (cost=0.00..3066.64 rows=829
width=1974) (actual time=2.02..1421.14 rows=792 loops=1)
   Index Cond: (v_code = '4500'::character varying)
 Total runtime: 1424.82 msec
(3 rows)


dynacom=# EXPLAIN ANALYZE select * from noon where report_date between
'2002-01-07' and '2003-01-07';
                                                           QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using noonf_date on noon  (cost=0.00..15919.50 rows=11139
width=1974) (actual time=2.05..13746.17 rows=7690 loops=1)
   Index Cond: ((report_date >= '2002-01-07'::date) AND (report_date <=
'2003-01-07'::date))
 Total runtime: 13775.48 msec
(3 rows)

On 7.2.3 (Linux)
==================
dynacom=# EXPLAIN ANALYZE select * from noon where v_code='4500';
NOTICE:  QUERY PLAN:

Index Scan using noonf_vcode on noon  (cost=0.00..3043.45 rows=827
width=1974) (actual time=19.59..927.06 rows=792 loops=1)
Total runtime: 928.86 msec

dynacom=# EXPLAIN ANALYZE select * from noon where report_date between
'2002-01-07' and '2003-01-07';
NOTICE:  QUERY PLAN:

Index Scan using noonf_date on noon  (cost=0.00..16426.45 rows=11958
width=1974) (actual time=29.64..8854.05 rows=7690 loops=1)
Total runtime: 8861.90 msec

EXPLAIN

>             regards, tom lane
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr

Re: [SQL] 7.3.1 index use / performance

From
Tom Lane
Date:
Achilleus Mantzios <achill@matrix.gatewaynet.com> writes:
>> Also, I would like to see the results of these queries on both versions,
>> so that we can see what the planner thinks the index selectivity is:
>>
> [ data supplied ]

There is something really, really bizarre going on there.  You have

dynacom=# EXPLAIN ANALYZE select * from noon where report_date between '2002-01-07' and '2003-01-07';
                                                           QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------
 Index Scan using noonf_date on noon  (cost=0.00..15919.50 rows=11139 width=1974) (actual time=2.05..13746.17 rows=7690
loops=1)
   Index Cond: ((report_date >= '2002-01-07'::date) AND (report_date <= '2003-01-07'::date))
 Total runtime: 13775.48 msec
(3 rows)

and from your earlier message

dynacom=# EXPLAIN ANALYZE select FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
v_code='4500' and rotation='NOON     ' and report_date between '2002-01-07' and '2003-01-07';
                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------
 Index Scan using noonf_date on noon  (cost=0.00..4.46 rows=1 width=39) (actual time=0.27..52.89 rows=259 loops=1)
   Index Cond: ((report_date >= '2002-01-07'::date) AND (report_date <= '2003-01-07'::date))
   Filter: ((v_code = '4500'::character varying) AND (rotation = 'NOON'::character varying))
 Total runtime: 53.98 msec
(4 rows)

There is no way that adding the filter condition should have reduced the
estimated runtime for this plan --- reducing the estimated number of
output rows, yes, but not the runtime.  And in fact I can't duplicate
that when I try it here.  I did this on 7.3.1:

regression=# create table noon (v_code character varying(4) ,
regression(# report_date   date ,
regression(# rotation   character varying(9));
CREATE TABLE
regression=# create index noonf_date on noon(report_date);
CREATE INDEX
regression=# EXPLAIN select * from noon where report_date between
regression-# '2002-01-07' and '2003-01-07';
                                         QUERY PLAN

---------------------------------------------------------------------------------------------
 Index Scan using noonf_date on noon  (cost=0.00..17.08 rows=5 width=25)
   Index Cond: ((report_date >= '2002-01-07'::date) AND (report_date <= '2003-01-07'::date))
(2 rows)

regression=# explain select *  from noon where
regression-# v_code='4500' and rotation='NOON     ' and report_date between
regression-# '2002-01-07' and '2003-01-07';
                                            QUERY PLAN

--------------------------------------------------------------------------------
------------------
 Index Scan using noonf_date on noon  (cost=0.00..17.11 rows=1 width=25)
   Index Cond: ((report_date >= '2002-01-07'::date) AND (report_date <= '2003-01-07'::date))
   Filter: ((v_code = '4500'::character varying) AND (rotation = 'NOON     '::character varying))
(3 rows)

Note that the cost went up, not down.

I am wondering about a compiler bug, or some other peculiarity on your
platform.  Can anyone else using FreeBSD try the above experiment and
see if they get different results from mine on 7.3.* (or CVS tip)?

            regards, tom lane

Re: [SQL] 7.3.1 index use / performance

From
Achilleus Mantzios
Date:
On Tue, 7 Jan 2003, Tom Lane wrote:

> There is no way that adding the filter condition should have reduced the
> estimated runtime for this plan --- reducing the estimated number of
> output rows, yes, but not the runtime.  And in fact I can't duplicate

My case persists:
After clean install of the database, and after vacuum analyze,
i get

dynacom=# EXPLAIN ANALYZE select
FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
report_date between '2002-01-07' and '2003-01-07';
                                                         QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using noonf_date on noon  (cost=0.00..16458.54 rows=10774
width=39) (actual time=0.13..205.86 rows=7690 loops=1)
   Index Cond: ((report_date >= '2002-01-07'::date) AND (report_date <=
'2003-01-07'::date))
 Total runtime: 233.22 msec

dynacom=# EXPLAIN ANALYZE select
FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
report_date between '2002-01-07' and '2003-01-07' and v_code='4500';
                                                       QUERY PLAN


------------------------------------------------------------------------------------------------------------------------
 Index Scan using noonf_vcode on noon  (cost=0.00..3092.52 rows=83
width=39) (actual time=0.15..15.08 rows=373 loops=1)
   Index Cond: (v_code = '4500'::character varying)
   Filter: ((report_date >= '2002-01-07'::date) AND (report_date <=
'2003-01-07'::date))
 Total runtime: 16.56 msec
(4 rows)

I thought PostgreSQL in some sense (hub.org) used FreeBSD,
is there any 4.7 FreeBSD server with pgsql 7.3.1 you could use?

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr


Re: [SQL] 7.3.1 index use / performance

From
Tom Lane
Date:
Achilleus Mantzios <achill@matrix.gatewaynet.com> writes:
> My case persists:
> After clean install of the database, and after vacuum analyze,
> i get

Um ... is it persisting?  That looks like it's correctly picked the
vcode index this time.  Strange behavior though.  By "clean install"
do you mean you rebuilt Postgres, or just did dropdb/createdb/reload
data?

            regards, tom lane

Re: [SQL] 7.3.1 index use / performance

From
Achilleus Mantzios
Date:
On Tue, 7 Jan 2003, Tom Lane wrote:

> Achilleus Mantzios <achill@matrix.gatewaynet.com> writes:
> > My case persists:
> > After clean install of the database, and after vacuum analyze,
> > i get
>
> Um ... is it persisting?  That looks like it's correctly picked the
> vcode index this time.  Strange behavior though.  By "clean install"
> do you mean you rebuilt Postgres, or just did dropdb/createdb/reload
> data?

Just dropdb/createdb/reload.

>
>             regards, tom lane
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr


Re: [SQL] 7.3.1 index use / performance

From
Rod Taylor
Date:
> I am wondering about a compiler bug, or some other peculiarity on your
> platform.  Can anyone else using FreeBSD try the above experiment and
> see if they get different results from mine on 7.3.* (or CVS tip)?

On FreeBSD 4.7 I received the exact same results as Tom using the
statements shown by Tom.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Re: [SQL] 7.3.1 index use / performance

From
Tom Lane
Date:
Rod Taylor <rbt@rbt.ca> writes:
>> I am wondering about a compiler bug, or some other peculiarity on your
>> platform.  Can anyone else using FreeBSD try the above experiment and
>> see if they get different results from mine on 7.3.* (or CVS tip)?

> On FreeBSD 4.7 I received the exact same results as Tom using the
> statements shown by Tom.

On looking at the code, I do see part of a possible mechanism for this
behavior: cost_index calculates the estimated cost for qual-clause
evaluation like this:

    /*
     * Estimate CPU costs per tuple.
     *
     * Normally the indexquals will be removed from the list of restriction
     * clauses that we have to evaluate as qpquals, so we should subtract
     * their costs from baserestrictcost.  XXX For a lossy index, not all
     * the quals will be removed and so we really shouldn't subtract their
     * costs; but detecting that seems more expensive than it's worth.
     * Also, if we are doing a join then some of the indexquals are join
     * clauses and shouldn't be subtracted.  Rather than work out exactly
     * how much to subtract, we don't subtract anything.
     */
    cpu_per_tuple = cpu_tuple_cost + baserel->baserestrictcost;

    if (!is_injoin)
        cpu_per_tuple -= cost_qual_eval(indexQuals);

In theory, indexQuals will always be a subset of the qual list on which
baserestrictcost was computed, so we should always end up with a
cpu_per_tuple value at least as large as cpu_tuple_cost.  I am wondering
if somehow in Achilleus's situation, cost_qual_eval() is producing a
silly result leading to negative cpu_per_tuple.  I don't see how that
could happen though --- nor why it would happen on his machine and not
other people's.

            regards, tom lane

Re: [SQL] 7.3.1 index use / performance

From
Achilleus Mantzios
Date:
On Tue, 7 Jan 2003, Tom Lane wrote:

> Rod Taylor <rbt@rbt.ca> writes:
> >> I am wondering about a compiler bug, or some other peculiarity on your
> >> platform.  Can anyone else using FreeBSD try the above experiment and
> >> see if they get different results from mine on 7.3.* (or CVS tip)?
>
> > On FreeBSD 4.7 I received the exact same results as Tom using the
> > statements shown by Tom.
>
> On looking at the code, I do see part of a possible mechanism for this
> behavior: cost_index calculates the estimated cost for qual-clause
> evaluation like this:
>

This bizarre index decreased cost (when adding conditions) behaviour maybe
was due to some vacuums.
(i cant remember how many reloads and vacuums i did to the database
in the period petween the two emails).

However my linux machine with the same pgsql 7.3.1, with a full clean
installation also gives the same symptoms:
Choosing the slow index, and after some (random)
vacuums choosing the right index, and then after some vacuums chooses the
bad
index again.


>
>             regards, tom lane
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr


Re: [SQL] 7.3.1 index use / performance

From
Tom Lane
Date:
Just to close off the thread, here is the end-result of investigating
Achilleus Mantzios' problem.

------- Forwarded Message

Date:    Wed, 08 Jan 2003 11:54:36 -0500
From:    Tom Lane <tgl@sss.pgh.pa.us>
To:      Achilleus Mantzios <achill@matrix.gatewaynet.com>
Subject: Re: [SQL] [PERFORM] 7.3.1 index use / performance

I believe I see what's going on.  You have a number of silly outlier
values in the report_date column --- quite a few instances of '10007-06-09'
for example.  Depending on whether ANALYZE's random sample happens to
include one of these, the histogram generated by ANALYZE might look like
this (it took about half a dozen tries with ANALYZE to get this result):

dynacom=# analyze noon;
ANALYZE
dynacom=# select histogram_bounds from pg_stats where attname = 'report_date';
                                                      histogram_bounds

-----------------------------------------------------------------------------------------------------------------------------

{1969-06-26,1994-09-24,1996-04-05,1997-07-21,1998-08-27,1999-03-13,1999-11-11,2000-08-18,2001-04-18,2002-01-04,10007-06-09}
(1 row)

in which case we get this:

dynacom=# EXPLAIN select * from noon where
dynacom-# report_date between '2002-01-07' and '2003-01-07';
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Index Scan using noonf_date on noon  (cost=0.00..4.08 rows=1 width=1975)
   Index Cond: ((report_date >= '2002-01-07'::date) AND (report_date <= '2003-01-07'::date))
(2 rows)

Seeing this histogram, the planner assumes that one-tenth of the table
is uniformly distributed between 2002-01-04 and 10007-06-09, which leads
it to the conclusion that the range between 2002-01-07 and 2003-01-07
probably contains only about one row, which causes it to prefer a scan
on report_date rather than on v_code.

The reason the problem comes and goes is that any given ANALYZE run
might or might not happen across one of the outliers.  When it doesn't,
you get a histogram that leads to reasonably accurate estimates.

There are a couple of things you could do about this.  One is to
increase the statistics target for report_date (see ALTER TABLE SET
STATISTICS) so that a finer-grained histogram is generated for the
report_date column.  The other thing, which is more work but probably
the best answer in the long run, is to fix the outliers, which I imagine
must be incorrect entries.

You could perhaps put a constraint on report_date to prevent bogus
entries from sneaking in in future.

It looks like increasing the stats target would be worth doing also,
if you make many queries using ranges of report_date.

            regards, tom lane

------- End of Forwarded Message