Re: Hash join on int takes 8..114 seconds - Mailing list pgsql-performance
From: Andrus
Subject: Re: Hash join on int takes 8..114 seconds
Date:
Msg-id: AD6DEEFF0CE54A3982082A2CA7C2F722@andrusnotebook
In response to: Re: Hash join on int takes 8..114 seconds (PFC <lists@peufeu.com>)
Responses: Re: Hash join on int takes 8..114 seconds
           Re: Hash join on int takes 8..114 seconds
List: pgsql-performance
> You could try writing a plpgsql function which would generate the data
> set.
> Or you could use your existing data set.

Creating 3.5 mln rows using a stored procedure is probably slow. It would
probably be better and faster to use some random() and generate_series()
trick, something along the lines of the sketch below. That way others can
try it as well, and generating the dataset is faster.
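A rough, untested sketch of what I mean (the table and column definitions
are simplified guesses based on the query further below, not the real
schema; the row counts, date range and product codes are made up):

-- simplified stand-ins for the real orders / order_products tables
CREATE TABLE toode (toode char(20) PRIMARY KEY);
CREATE TABLE dok   (dokumnr serial PRIMARY KEY, kuupaev date NOT NULL);
CREATE TABLE rid   (dokumnr int NOT NULL, toode char(20) NOT NULL);

-- a small product catalogue
INSERT INTO toode
SELECT 'TOODE' || g::text FROM generate_series(0, 9999) AS g;

-- ~1.2 mln orders spread over roughly four years
INSERT INTO dok (kuupaev)
SELECT date '2005-01-01' + (random() * 1400)::int
FROM generate_series(1, 1200000);

-- ~3.5 mln order rows with randomly distributed products
INSERT INTO rid (dokumnr, toode)
SELECT (random() * 1199999)::int + 1,
       'TOODE' || (random() * 9999)::int::text
FROM generate_series(1, 3500000);

CREATE INDEX rid_toode_idx   ON rid (toode);
CREATE INDEX rid_dokumnr_idx ON rid (dokumnr);
CREATE INDEX dok_kuupaev_idx ON dok (kuupaev);
ANALYZE toode; ANALYZE dok; ANALYZE rid;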
> By the way, a simple way to de-bloat your big table without blocking
> would be this :
>
> - stop all inserts and updates
> - begin
> - create table new like old table
> - insert into new select * from old (order by perhaps)
> - create indexes
> - rename new into old
> - commit
>
> If this is just a reporting database where you insert a batch of new data
> every day, for instance, that's very easy to do. If it's OLTP, then, no.

Those are the orders and order_products tables.

I ran vacuum full analyze verbose last night. The database is now 4832 MB in
size, including 1 GB of bloated pg_shdepend indexes. I added
max_fsm_pages=150000 and re-booted.

The query below and other queries are still too slow:

set search_path to firma2,public;
explain analyze
SELECT sum(1)
FROM dok JOIN rid USING (dokumnr)
JOIN toode USING (toode)
WHERE rid.toode = 'X05' AND dok.kuupaev >= '2008-09-01'

"Aggregate  (cost=181795.13..181795.14 rows=1 width=0) (actual time=23678.265..23678.268 rows=1 loops=1)"
"  ->  Nested Loop  (cost=73999.44..181733.74 rows=24555 width=0) (actual time=18459.230..23598.956 rows=21476 loops=1)"
"        ->  Index Scan using toode_pkey on toode  (cost=0.00..6.01 rows=1 width=24) (actual time=0.134..0.145 rows=1 loops=1)"
"              Index Cond: ('X05'::bpchar = toode)"
"        ->  Hash Join  (cost=73999.44..181482.18 rows=24555 width=24) (actual time=18459.076..23441.098 rows=21476 loops=1)"
"              Hash Cond: ("outer".dokumnr = "inner".dokumnr)"
"              ->  Bitmap Heap Scan on rid  (cost=4082.88..101779.03 rows=270252 width=28) (actual time=9337.782..12720.365 rows=278182 loops=1)"
"                    Recheck Cond: (toode = 'X05'::bpchar)"
"                    ->  Bitmap Index Scan on rid_toode_idx  (cost=0.00..4082.88 rows=270252 width=0) (actual time=9330.634..9330.634 rows=278183 loops=1)"
"                          Index Cond: (toode = 'X05'::bpchar)"
"              ->  Hash  (cost=69195.13..69195.13 rows=112573 width=4) (actual time=8894.465..8894.465 rows=109890 loops=1)"
"                    ->  Bitmap Heap Scan on dok  (cost=1492.00..69195.13 rows=112573 width=4) (actual time=1618.763..8404.847 rows=109890 loops=1)"
"                          Recheck Cond: (kuupaev >= '2008-09-01'::date)"
"                          ->  Bitmap Index Scan on dok_kuupaev_idx  (cost=0.00..1492.00 rows=112573 width=0) (actual time=1612.177..1612.177 rows=110484 loops=1)"
"                                Index Cond: (kuupaev >= '2008-09-01'::date)"
"Total runtime: 23678.790 ms"

Here is a list of untried recommendations from this thread:

1. CLUSTER rid ON rid_toode_pkey; CLUSTER dok ON dok_kuupaev_idx
   In 8.1.4 this form of CLUSTER causes a syntax error and I have no idea
   what syntax to use. Risky to try on the production server. Requires
   creating a randomly distributed product_id testcase to measure the
   difference.

2. Change the CHAR(20) product index to an int index by adding an update
   trigger. Risky to try on the production server. Requires creating a
   randomly distributed product_id testcase to measure the difference.

3. Denormalize the sale date into the order_products table by adding an
   update trigger. Risky to try on the production server. Requires creating
   a randomly distributed product_id testcase to measure the difference.

4. Check the performance of the RAID: does it leverage NCQ appropriately
   when running queries in parallel? No idea how.

5. Materialized views. I need date granularity so that it is possible to
   sum only one day's sales.
   http://www.pgcon.org/2008/schedule/events/69.en.html
   Seems to require a major application re-write; no idea how.

Approaches which probably do not change performance:

6. Upgrade to 8.4 or to 8.3.5.

7. Run the server in standalone mode and recover the 1 GB bloated
   pg_shdepend index.

8. Tune some conf file parameters (a rough postgresql.conf sketch based on
   the quoted advice is at the end of this mail):

> work_mem = 512

I'd consider increasing this value a little - 0.5 MB seems too low to me
(but not necessarily).

> effective_cache_size= 70000

Well, your server has 2GB of RAM and usually it's recommended to set this
value to about 60-70% of your RAM, so using 540MB (25%) seems quite low.

The data size is nearly the same as the RAM size. It is an unpleasant
surprise that queries take such a long time. What should I do next?

Andrus.

Largest relations (rank, oid, relation, size):

 1  40926     firma2.rid                               1737 MB
 2  40595     firma2.dok                               1632 MB
 3  1214      pg_catalog.pg_shdepend                   1235 MB
 4  1232      pg_catalog.pg_shdepend_depender_index     795 MB
 7  1233      pg_catalog.pg_shdepend_reference_index    439 MB
 8  44299     firma2.rid_toode_idx                      298 MB
 9  44286     firma2.dok_tasudok_idx                    245 MB
10  19103791  firma2.rid_toode_pattern_idx              202 MB
11  44283     firma2.dok_klient_idx                     160 MB
12  44298     firma2.rid_inpdokumnr_idx                 148 MB
13  44297     firma2.rid_dokumnr_idx                    132 MB
14  43573     firma2.rid_pkey                           130 MB
17  40556     pg_toast.pg_toast_40552                   112 MB
18  44288     firma2.dok_tasumata_idx                   103 MB
19  44289     firma2.dok_tellimus_idx                   101 MB
20  44284     firma2.dok_krdokumnr_idx                  101 MB
21  44285     firma2.dok_kuupaev_idx                     94 MB
22  19076304  firma2.rid_rtellimus_idx                   90 MB
24  44282     firma2.dok_dokumnr_idx                     74 MB
25  43479     firma2.dok_pkey                            74 MB
26  18663923  firma2.dok_yksus_pattern_idx               65 MB
27  18801591  firma2.dok_sihtyksus_pattern_idx           64 MB
32  18774881  firma2.dok_doktyyp                         47 MB

Output from vacuum full:

INFO:  free space map contains 14353 pages in 314 relations
DETAIL:  A total of 20000 page slots are in use (including overhead).
89664 page slots are required to track all free space.
Current limits are:  20000 page slots, 1000 relations, using 182 KB.
NOTICE:  number of page slots needed (89664) exceeds max_fsm_pages (20000)
HINT:  Consider increasing the configuration parameter "max_fsm_pages" to a
value over 89664.

Query returned successfully with no result in 10513335 ms.
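If I understood the quoted advice and the vacuum HINT correctly, the
relevant postgresql.conf lines for 8.1 on this 2 GB server would look
something like the following; the exact numbers are my guesses, not tested
values:

# max_fsm_pages must cover the ~90000 page slots vacuum reported;
# 150000 is the value already applied above
max_fsm_pages = 150000
# work_mem is in kB in 8.1; 512 kB was considered too low, so try e.g. 8 MB
work_mem = 8192
# effective_cache_size is in 8 kB pages in 8.1; 170000 pages is roughly
# 1.3 GB, about 65% of the 2 GB of RAM as suggested above
effective_cache_size = 170000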