Re: [SQL] [PERFORM] 7.3.1 index use / performance - Mailing list pgsql-general

From Achilleus Mantzios
Subject Re: [SQL] [PERFORM] 7.3.1 index use / performance
Date
Msg-id Pine.LNX.4.44.0301071754580.8183-200000@matrix.gatewaynet.com
Whole thread Raw
In response to Re: [PERFORM] 7.3.1 index use / performance  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Responses Re: [SQL] [PERFORM] 7.3.1 index use / performance
List pgsql-general
On Tue, 7 Jan 2003, Stephan Szabo wrote:

>
> On Tue, 7 Jan 2003, Achilleus Mantzios wrote:
>
> > i am just in the stage of having migrated my test system to 7.3.1
> > and i am experiencing some performance problems.
> >
> > i have a table "noon"
> >                      Table "public.noon"
> >          Column         |          Type          | Modifiers
> > ------------------------+------------------------+-----------
> >  v_code                 | character varying(4)   |
> >  log_no                 | bigint                 |
> >  report_date            | date                   |
> >  report_time            | time without time zone |
> >  voyage_no              | integer                |
> >  charterer              | character varying(12)  |
> >  port                   | character varying(24)  |
> >  duration               | character varying(4)   |
> >  rotation               | character varying(9)   |
> > ......
> >
> > with a total of 278 columns.
> >
> > it has indexes:
> > Indexes: noonf_date btree (report_date),
> >          noonf_logno btree (log_no),
> >          noonf_rotation btree (rotation text_ops),
> >          noonf_vcode btree (v_code),
> >          noonf_voyageno btree (voyage_no)
> >
> > On the test 7.3.1 system (a FreeBSD 4.7-RELEASE-p2, Celeron 1.2GHz
> > 400Mb, with 168Mb for pgsql),
> > i get:
> > dynacom=# EXPLAIN ANALYZE select
> > FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where
> > v_code='4500' and rotation='NOON     ' and report_date between
> > '2002-01-07' and '2003-01-07';
> >                                                     QUERY PLAN
> >
>
> > -------------------------------------------------------------------------------------------------------------------
> >  Index Scan using noonf_date on noon  (cost=0.00..4.46 rows=1 width=39)
> > (actual time=0.27..52.89 rows=259 loops=1)
>
>
> >  Index Scan using noonf_vcode on noon  (cost=0.00..3122.88 rows=1
> > width=39) (actual time=0.16..13.92 rows=259 loops=1)
>
>
> What do the statistics for the three columns actually look like and what
> are the real distributions and counts like?

The two databases (test 7.3.1 and development 7.2.3) are identical
(loaded from the same pg_dump).

About the stats on these 3 columns i get: (see also attachment 1 to avoid
identation/wraparound problems)

 schemaname | tablename |   attname   | null_frac | avg_width | n_distinct |
   most_common_vals                                                 |
most_common_freqs                                             |
histogram_bounds                                                        | correlation 

------------+-----------+-------------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+-------------
 public     | noon      | v_code      |         0 |         8 |        109 |
{4630,4650,4690,4670,4520,4610,4550,4560,4620,4770}                                                            |
{0.0283333,0.028,0.0256667,0.0243333,0.024,0.0236667,0.0233333,0.0233333,0.0226667,0.0226667}             |
{2070,3210,4330,4480,4570,4680,4751,4820,4870,4940,6020}
      |   -0.249905 
 public     | noon      | report_date |         0 |         4 |       3408 |
{2001-11-14,1998-10-18,2000-04-03,2000-07-04,2000-12-20,2000-12-31,2001-01-12,2001-10-08,2001-12-25,1996-01-23}|
{0.002,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00133333}|
{"0001-12-11
BC",1994-09-27,1996-03-26,1997-07-29,1998-08-26,1999-03-29,1999-11-30,2000-09-25,2001-05-25,2002-01-17,2002-12-31}|
-0.812295
 public     | noon      | rotation    |         0 |        13 |          6 | {"NOON     ","PORT LOG ","ARRIVAL
",DEPARTURE,"SEA     ","NEXT PORT"}                                         |
{0.460333,0.268667,0.139,0.119667,0.007,0.00533333}                                                       |
                                                                                                                   |
0.119698
(3 rows)


About distributions, i have:

dynacom=# SELECT rotation,count(*) from noon group by rotation;
 rotation  | count
-----------+-------
           |     2
 000000000 |    65
 ARRIVAL   |     1
 ARRIVAL   | 15471
 DEPARTURE | 15030
 NEXT PORT |   462
 NOON      | 50874
 PORT LOG  | 25688
 SEA       |  1202
(9 rows)

dynacom=# SELECT v_code,count(*) from noon group by v_code;
 v_code | count
--------+-------
 0004   |     1
 1030   |     1
 2070   |   170
 2080   |   718
 2110   |   558
 2220   |   351
 2830   |  1373
 2840   |   543
 2860   |   407
 2910   |   418
 3010   |   352
 3020   |   520
 3060   |    61
 3130   |   117
 3140   |     1
 3150   |   752
 3160   |   811
 3170   |   818
 3180   |  1064
 3190   |   640
 3200   |   998
 3210   |  1512
 3220   |   595
 3230   |   374
 3240   |   514
 3250   |    13
 3260   |   132
 3270   |   614
 4010   |   413
 4020   |   330
 4040   |   728
 4050   |   778
 4060   |   476
 4070   |   534
 4310   |   759
 4320   |   424
 4330   |   549
 4360   |   366
 4370   |   334
 4380   |   519
 4410   |   839
 4420   |   183
 4421   |   590
 4430   |   859
 4450   |   205
 4470   |   861
 4480   |   766
 4490   |   169
 4500   |   792
 4510   |  2116
 4520   |  2954
 4530   |  2142
 4531   |   217
 4540   |  2273
 4550   |  2765
 4560   |  2609
 4570   |  2512
 4580   |  1530
 4590   |  1987
 4600   |   308
 4610   |  2726
 4620   |  2698
 4630   |  2813
 4640   |  1733
 4650   |  2655
 4660   |  2139
 4661   |    65
 4670   |  2607
 4680   |  1729
 4690   |  2587
 4700   |  2101
 4710   |  1830
 4720   |  1321
 4730   |  1258
 4740   |  1506
 4750   |  1391
 4751   |   640
 4760   |  1517
 4770   |  2286
 4780   |  1353
 4790   |  1209
 4800   |  2414
 4810   |   770
 4820   |  1115
 4830   |  1587
 4840   |   983
 4841   |   707
 4850   |  1297
 4860   |   375
 4870   |  1440
 4880   |   456
 4881   |   742
 4890   |   210
 4891   |    45
 4900   |     2
 4910   |  1245
 4920   |   414
 4930   |  1130
 4940   |  1268
 4950   |   949
 4960   |   836
 4970   |  1008
 4980   |  1239
 5510   |   477
 5520   |   380
 5530   |   448
 5540   |   470
 5550   |   352
 5560   |   148
 5570   |   213
 5580   |   109
 5590   |    55
 6010   |   246
 6020   |   185
 9180   |     1

(Not all the above vessels are active or belong to me:) )

The distribution on the report_date has no probabilistic significance
since each report_date usually corresponds to one row.
So,
dynacom=# SELECT count(*) from noon;
 count
--------
 108795
(1 row)

dynacom=#

Now for the specific query the counts have as follows:

dynacom=# select count(*) from noon where v_code='4500';
 count
-------
   792
(1 row)

dynacom=# select count(*) from noon where rotation='NOON     ';
 count
-------
 50874
(1 row)

dynacom=# select count(*) from noon where report_date between '2002-01-07'
and '2003-01-07';
 count
-------
  7690
(1 row)

dynacom=#

> Given an estimated cost of around 4 for the first scan, my guess would be
> that it's not expecting alot of rows between 2002-01-07 and 2003-01-07
> which would make that a reasonable plan.
>

As we see the rows returned for v_code='4500' (792) are much fewer than
the rows returned for the dates between '2002-01-07' and '2003-01-07'
(7690).

Is there a way to provide you with more information?

And i must note that the two databases were worked on after a fresh
createdb on both systems (and as i told they are identical).
But, for some reason the 7.2.3 *always* finds the best index to use :)

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr

Attachment

pgsql-general by date:

Previous
From: "Lowther, David W"
Date:
Subject: Connection / Access / Strange Problems
Next
From: Tom Lane
Date:
Subject: Re: [SQL] [PERFORM] 7.3.1 index use / performance