Re: [SQL] [PERFORM] 7.3.1 index use / performance - Mailing list pgsql-general
From | Achilleus Mantzios |
---|---|
Subject | Re: [SQL] [PERFORM] 7.3.1 index use / performance |
Date | |
Msg-id | Pine.LNX.4.44.0301071754580.8183-200000@matrix.gatewaynet.com Whole thread Raw |
In response to | Re: [PERFORM] 7.3.1 index use / performance (Stephan Szabo <sszabo@megazone23.bigpanda.com>) |
Responses |
Re: [SQL] [PERFORM] 7.3.1 index use / performance
|
List | pgsql-general |
On Tue, 7 Jan 2003, Stephan Szabo wrote: > > On Tue, 7 Jan 2003, Achilleus Mantzios wrote: > > > i am just in the stage of having migrated my test system to 7.3.1 > > and i am experiencing some performance problems. > > > > i have a table "noon" > > Table "public.noon" > > Column | Type | Modifiers > > ------------------------+------------------------+----------- > > v_code | character varying(4) | > > log_no | bigint | > > report_date | date | > > report_time | time without time zone | > > voyage_no | integer | > > charterer | character varying(12) | > > port | character varying(24) | > > duration | character varying(4) | > > rotation | character varying(9) | > > ...... > > > > with a total of 278 columns. > > > > it has indexes: > > Indexes: noonf_date btree (report_date), > > noonf_logno btree (log_no), > > noonf_rotation btree (rotation text_ops), > > noonf_vcode btree (v_code), > > noonf_voyageno btree (voyage_no) > > > > On the test 7.3.1 system (a FreeBSD 4.7-RELEASE-p2, Celeron 1.2GHz > > 400Mb, with 168Mb for pgsql), > > i get: > > dynacom=# EXPLAIN ANALYZE select > > FUELCONSUMPTION,rpm,Steam_Hours,voyage_activity,ldin from noon where > > v_code='4500' and rotation='NOON ' and report_date between > > '2002-01-07' and '2003-01-07'; > > QUERY PLAN > > > > > ------------------------------------------------------------------------------------------------------------------- > > Index Scan using noonf_date on noon (cost=0.00..4.46 rows=1 width=39) > > (actual time=0.27..52.89 rows=259 loops=1) > > > > Index Scan using noonf_vcode on noon (cost=0.00..3122.88 rows=1 > > width=39) (actual time=0.16..13.92 rows=259 loops=1) > > > What do the statistics for the three columns actually look like and what > are the real distributions and counts like? The two databases (test 7.3.1 and development 7.2.3) are identical (loaded from the same pg_dump). About the stats on these 3 columns i get: (see also attachment 1 to avoid identation/wraparound problems) schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+-----------+-------------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+------------- public | noon | v_code | 0 | 8 | 109 | {4630,4650,4690,4670,4520,4610,4550,4560,4620,4770} | {0.0283333,0.028,0.0256667,0.0243333,0.024,0.0236667,0.0233333,0.0233333,0.0226667,0.0226667} | {2070,3210,4330,4480,4570,4680,4751,4820,4870,4940,6020} | -0.249905 public | noon | report_date | 0 | 4 | 3408 | {2001-11-14,1998-10-18,2000-04-03,2000-07-04,2000-12-20,2000-12-31,2001-01-12,2001-10-08,2001-12-25,1996-01-23}| {0.002,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00166667,0.00133333}| {"0001-12-11 BC",1994-09-27,1996-03-26,1997-07-29,1998-08-26,1999-03-29,1999-11-30,2000-09-25,2001-05-25,2002-01-17,2002-12-31}| -0.812295 public | noon | rotation | 0 | 13 | 6 | {"NOON ","PORT LOG ","ARRIVAL ",DEPARTURE,"SEA ","NEXT PORT"} | {0.460333,0.268667,0.139,0.119667,0.007,0.00533333} | | 0.119698 (3 rows) About distributions, i have: dynacom=# SELECT rotation,count(*) from noon group by rotation; rotation | count -----------+------- | 2 000000000 | 65 ARRIVAL | 1 ARRIVAL | 15471 DEPARTURE | 15030 NEXT PORT | 462 NOON | 50874 PORT LOG | 25688 SEA | 1202 (9 rows) dynacom=# SELECT v_code,count(*) from noon group by v_code; v_code | count --------+------- 0004 | 1 1030 | 1 2070 | 170 2080 | 718 2110 | 558 2220 | 351 2830 | 1373 2840 | 543 2860 | 407 2910 | 418 3010 | 352 3020 | 520 3060 | 61 3130 | 117 3140 | 1 3150 | 752 3160 | 811 3170 | 818 3180 | 1064 3190 | 640 3200 | 998 3210 | 1512 3220 | 595 3230 | 374 3240 | 514 3250 | 13 3260 | 132 3270 | 614 4010 | 413 4020 | 330 4040 | 728 4050 | 778 4060 | 476 4070 | 534 4310 | 759 4320 | 424 4330 | 549 4360 | 366 4370 | 334 4380 | 519 4410 | 839 4420 | 183 4421 | 590 4430 | 859 4450 | 205 4470 | 861 4480 | 766 4490 | 169 4500 | 792 4510 | 2116 4520 | 2954 4530 | 2142 4531 | 217 4540 | 2273 4550 | 2765 4560 | 2609 4570 | 2512 4580 | 1530 4590 | 1987 4600 | 308 4610 | 2726 4620 | 2698 4630 | 2813 4640 | 1733 4650 | 2655 4660 | 2139 4661 | 65 4670 | 2607 4680 | 1729 4690 | 2587 4700 | 2101 4710 | 1830 4720 | 1321 4730 | 1258 4740 | 1506 4750 | 1391 4751 | 640 4760 | 1517 4770 | 2286 4780 | 1353 4790 | 1209 4800 | 2414 4810 | 770 4820 | 1115 4830 | 1587 4840 | 983 4841 | 707 4850 | 1297 4860 | 375 4870 | 1440 4880 | 456 4881 | 742 4890 | 210 4891 | 45 4900 | 2 4910 | 1245 4920 | 414 4930 | 1130 4940 | 1268 4950 | 949 4960 | 836 4970 | 1008 4980 | 1239 5510 | 477 5520 | 380 5530 | 448 5540 | 470 5550 | 352 5560 | 148 5570 | 213 5580 | 109 5590 | 55 6010 | 246 6020 | 185 9180 | 1 (Not all the above vessels are active or belong to me:) ) The distribution on the report_date has no probabilistic significance since each report_date usually corresponds to one row. So, dynacom=# SELECT count(*) from noon; count -------- 108795 (1 row) dynacom=# Now for the specific query the counts have as follows: dynacom=# select count(*) from noon where v_code='4500'; count ------- 792 (1 row) dynacom=# select count(*) from noon where rotation='NOON '; count ------- 50874 (1 row) dynacom=# select count(*) from noon where report_date between '2002-01-07' and '2003-01-07'; count ------- 7690 (1 row) dynacom=# > Given an estimated cost of around 4 for the first scan, my guess would be > that it's not expecting alot of rows between 2002-01-07 and 2003-01-07 > which would make that a reasonable plan. > As we see the rows returned for v_code='4500' (792) are much fewer than the rows returned for the dates between '2002-01-07' and '2003-01-07' (7690). Is there a way to provide you with more information? And i must note that the two databases were worked on after a fresh createdb on both systems (and as i told they are identical). But, for some reason the 7.2.3 *always* finds the best index to use :) > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > ================================================================== Achilleus Mantzios S/W Engineer IT dept Dynacom Tankers Mngmt Nikis 4, Glyfada Athens 16610 Greece tel: +30-10-8981112 fax: +30-10-8981877 email: achill@matrix.gatewaynet.com mantzios@softlab.ece.ntua.gr
Attachment
pgsql-general by date: