Re: Hash join on int takes 8..114 seconds - Mailing list pgsql-performance
From | Andrus |
---|---|
Subject | Re: Hash join on int takes 8..114 seconds |
Date | |
Msg-id | 75A918DFCF25470FAF96B3FE3D25A38B@andrusnotebook Whole thread Raw |
In response to | Re: Hash join on int takes 8..114 seconds (PFC <lists@peufeu.com>) |
Responses |
Re: Hash join on int takes 8..114 seconds
Re: Hash join on int takes 8..114 seconds Re: Hash join on int takes 8..114 seconds |
List | pgsql-performance |
PFC, thank you. > OK so vmstat says you are IO-bound, this seems logical if the same plan > has widely varying timings... > > Let's look at the usual suspects : > > - how many dead rows in your tables ? are your tables data, or bloat ? > (check vacuum verbose, etc) set search_path to firma2,public; vacuum verbose dok; vacuum verbose rid INFO: vacuuming "firma2.dok" INFO: index "dok_pkey" now contains 1235086 row versions in 9454 pages DETAIL: 100 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.16s/0.38u sec elapsed 0.77 sec. INFO: index "dok_dokumnr_idx" now contains 1235086 row versions in 9454 pages DETAIL: 100 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.14s/0.40u sec elapsed 0.78 sec. INFO: index "dok_klient_idx" now contains 1235086 row versions in 18147 pages DETAIL: 887 index row versions were removed. 3265 index pages have been deleted, 3033 are currently reusable. CPU 0.36s/0.46u sec elapsed 31.87 sec. INFO: index "dok_krdokumnr_idx" now contains 1235086 row versions in 11387 pages DETAIL: 119436 index row versions were removed. 1716 index pages have been deleted, 1582 are currently reusable. CPU 0.47s/0.55u sec elapsed 63.38 sec. INFO: index "dok_kuupaev_idx" now contains 1235101 row versions in 10766 pages DETAIL: 119436 index row versions were removed. 659 index pages have been deleted, 625 are currently reusable. CPU 0.62s/0.53u sec elapsed 40.20 sec. INFO: index "dok_tasudok_idx" now contains 1235104 row versions in 31348 pages DETAIL: 119436 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 1.18s/1.08u sec elapsed 118.97 sec. INFO: index "dok_tasudok_unique_idx" now contains 99 row versions in 97 pages DETAIL: 98 index row versions were removed. 80 index pages have been deleted, 80 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.48 sec. INFO: index "dok_tasumata_idx" now contains 1235116 row versions in 11663 pages DETAIL: 119436 index row versions were removed. 5340 index pages have been deleted, 5131 are currently reusable. CPU 0.43s/0.56u sec elapsed 53.96 sec. INFO: index "dok_tellimus_idx" now contains 1235122 row versions in 11442 pages DETAIL: 119436 index row versions were removed. 1704 index pages have been deleted, 1569 are currently reusable. CPU 0.45s/0.59u sec elapsed 76.91 sec. INFO: index "dok_yksus_pattern_idx" now contains 1235143 row versions in 5549 pages DETAIL: 119436 index row versions were removed. 529 index pages have been deleted, 129 are currently reusable. CPU 0.19s/0.46u sec elapsed 2.72 sec. INFO: index "dok_doktyyp" now contains 1235143 row versions in 3899 pages DETAIL: 119436 index row versions were removed. 188 index pages have been deleted, 13 are currently reusable. CPU 0.14s/0.44u sec elapsed 1.40 sec. INFO: index "dok_sihtyksus_pattern_idx" now contains 1235143 row versions in 5353 pages DETAIL: 119436 index row versions were removed. 286 index pages have been deleted, 5 are currently reusable. CPU 0.13s/0.45u sec elapsed 3.01 sec. INFO: "dok": removed 119436 row versions in 13707 pages DETAIL: CPU 0.80s/0.37u sec elapsed 14.15 sec. INFO: "dok": found 119436 removable, 1235085 nonremovable row versions in 171641 pages DETAIL: 2 dead row versions cannot be removed yet. There were 1834279 unused item pointers. 0 pages are entirely empty. CPU 6.56s/6.88u sec elapsed 450.54 sec. INFO: vacuuming "pg_toast.pg_toast_40595" INFO: index "pg_toast_40595_index" now contains 0 row versions in 1 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_40595": found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming "firma2.rid" INFO: index "rid_pkey" now contains 3275197 row versions in 13959 pages DETAIL: 38331 index row versions were removed. 262 index pages have been deleted, 262 are currently reusable. CPU 0.42s/1.05u sec elapsed 58.56 sec. INFO: index "rid_dokumnr_idx" now contains 3275200 row versions in 14125 pages DETAIL: 38331 index row versions were removed. 572 index pages have been deleted, 571 are currently reusable. CPU 0.49s/1.14u sec elapsed 71.57 sec. INFO: index "rid_inpdokumnr_idx" now contains 3275200 row versions in 15103 pages DETAIL: 38331 index row versions were removed. 579 index pages have been deleted, 579 are currently reusable. CPU 0.66s/1.03u sec elapsed 68.38 sec. INFO: index "rid_toode_idx" now contains 3275224 row versions in 31094 pages DETAIL: 38331 index row versions were removed. 2290 index pages have been deleted, 2290 are currently reusable. CPU 1.39s/1.58u sec elapsed 333.82 sec. INFO: index "rid_rtellimus_idx" now contains 3275230 row versions in 7390 pages DETAIL: 18591 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.18s/0.66u sec elapsed 1.78 sec. INFO: index "rid_toode_pattern_idx" now contains 3275230 row versions in 16310 pages DETAIL: 17800 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.44s/1.04u sec elapsed 6.55 sec. INFO: "rid": removed 38331 row versions in 3090 pages DETAIL: CPU 0.20s/0.10u sec elapsed 5.49 sec. INFO: "rid": found 38331 removable, 3275189 nonremovable row versions in 165282 pages DETAIL: 0 dead row versions cannot be removed yet. There were 1878923 unused item pointers. 0 pages are entirely empty. CPU 5.06s/7.27u sec elapsed 607.59 sec. Query returned successfully with no result in 1058319 ms. > - what's the size of the dataset relative to the RAM ? Db size is 7417 MB relevant table sizes in desc by size order: 1 40595 dok 2345 MB 2 1214 pg_shdepend 2259 MB 3 40926 rid 2057 MB 6 1232 pg_shdepend_depender_index 795 MB 7 1233 pg_shdepend_reference_index 438 MB 8 44286 dok_tasudok_idx 245 MB 9 44299 rid_toode_idx 243 MB 10 44283 dok_klient_idx 142 MB 11 19103791 rid_toode_pattern_idx 127 MB 14 44298 rid_inpdokumnr_idx 118 MB 15 44297 rid_dokumnr_idx 110 MB 16 43573 rid_pkey 109 MB 18 44288 dok_tasumata_idx 91 MB 19 44289 dok_tellimus_idx 89 MB 20 44284 dok_krdokumnr_idx 89 MB 21 44285 dok_kuupaev_idx 84 MB 23 43479 dok_pkey 74 MB 24 44282 dok_dokumnr_idx 74 MB 25 19076304 rid_rtellimus_idx 58 MB 26 18663923 dok_yksus_pattern_idx 43 MB 27 18801591 dok_sihtyksus_pattern_idx 42 MB 29 18774881 dok_doktyyp 30 MB 46 40967 toode 13 MB server is HP Proliant DL320 G3 http://h18000.www1.hp.com/products/quickspecs/12169_ca/12169_ca.HTML CPU is 2.93Ghz Celeron 256kb cache. Server has 2 GB RAM. It has SATA RAID 0,1 integrated controller (1.5Gbps) and SAMSUNG HD160JJ mirrored disks. > Now let's look more closely at the query : > > explain analyze > SELECT sum(1) > FROM dok JOIN rid USING (dokumnr) > JOIN toode USING (toode) > LEFT JOIN artliik using(grupp,liik) > WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' > > > I presume doing the query without artliik changes nothing to the runtime, > yes ? Yes. After removing artkliik from join I got response times 65 and 50 seconds, so this does not make difference. > Your problem here is that, no matter what, postgres will have to examine > - all rows where dok.kuupaev>='2008-09-01', > - and all rows where rid.toode = 'X05'. > If you use dok.kuupaev>='2007-09-01' (note : 2007) it will probably have > to scan many, many more rows. Probably yes, since then it reads one year more sales data. > If you perform this query often you could CLUSTER rid on (toode) and dok > on (kuupaev), but this can screw other queries. Some reports are by sales date (dok.kuupaev) and customers. CLUSTER rid on (toode) slows them down. Also autovacuum cannot do clustering. > What is the meaning of the columns ? This is typical sales data: -- Receipt headers: DOK ( dokumnr INT SERIAL PRIMARY KEY, kuupaev DATE --- sales date ) -- Receipt details RID ( dokumnr INT, toode CHAR(20), -- item code CONSTRAINT rid_dokumnr_fkey FOREIGN KEY (dokumnr) REFERENCES dok (dokumnr), CONSTRAINT rid_toode_fkey FOREIGN KEY (toode) REFERENCES firma2.toode (toode) ) -- Products TOODE ( toode CHAR(20) PRIMARY KEY ) > To make this type of query faster I would tend to think about : > - denormalization (ie adding a column in one of your tables and a > multicolumn index) For this query it is possible to duplicate kuupaev column to rid table. However most of the this seems to go to scanning rid table, so I suspect that this will help. > - materialized views > - materialized summary tables (ie. summary of sales for last month, for > instance) There are about 1000 items and reports are different. Andrus.
pgsql-performance by date: