Re: Hash join on int takes 8..114 seconds - Mailing list pgsql-performance
From | PFC |
---|---|
Subject | Re: Hash join on int takes 8..114 seconds |
Date | |
Msg-id | op.ukw669uvcigqcu@soyouz Whole thread Raw |
In response to | Re: Hash join on int takes 8..114 seconds ("Andrus" <kobruleht2@hot.ee>) |
Responses |
Re: Hash join on int takes 8..114 seconds
|
List | pgsql-performance |
OK so vmstat says you are IO-bound, this seems logical if the same plan has widely varying timings... Let's look at the usual suspects : - how many dead rows in your tables ? are your tables data, or bloat ? (check vacuum verbose, etc) - what's the size of the dataset relative to the RAM ? Now let's look more closely at the query : explain analyze SELECT sum(1) FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) LEFT JOIN artliik using(grupp,liik) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' OK, so artliik is a very small table (84 rows) : Seq Scan on artliik (cost=0.00..6.84 rows=84 width=19) (actual time=20.104..29.845 rows=84 loops=1) I presume doing the query without artliik changes nothing to the runtime, yes ? Let's look at the main part of the query : FROM dok JOIN rid USING (dokumnr) JOIN toode USING (toode) WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01' Postgres's plan is logical. It starts by joining rid and dok since your WHERE is on those : -> Hash Join (cost=52103.94..233488.08 rows=24126 width=24) (actual time=100386.921..114037.986 rows=20588 loops=1)" Hash Cond: ("outer".dokumnr = "inner".dokumnr)" -> Bitmap Heap Scan on rid (cost=4127.51..175020.84 rows=317003 width=28) (actual time=11119.932..76225.918 rows=277294 loops=1)" Recheck Cond: (toode = 'X05'::bpchar)" -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4127.51 rows=317003 width=0) (actual time=11105.807..11105.807 rows=280599 loops=1)" Index Cond: (toode = 'X05'::bpchar)" -> Hash (cost=47376.82..47376.82 rows=93444 width=4) (actual time=35082.427..35082.427 rows=105202 loops=1)" -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..47376.82 rows=93444 width=4) (actual time=42.110..34586.331 rows=105202 loops=1)" Index Cond: (kuupaev >= '2008-09-01'::date)" Your problem here is that, no matter what, postgres will have to examine - all rows where dok.kuupaev>='2008-09-01', - and all rows where rid.toode = 'X05'. If you use dok.kuupaev>='2007-09-01' (note : 2007) it will probably have to scan many, many more rows. If you perform this query often you could CLUSTER rid on (toode) and dok on (kuupaev), but this can screw other queries. What is the meaning of the columns ? To make this type of query faster I would tend to think about : - materialized views - denormalization (ie adding a column in one of your tables and a multicolumn index) - materialized summary tables (ie. summary of sales for last month, for instance) "Aggregate (cost=234278.53..234278.54 rows=1 width=0) (actual time=114479.933..114479.936 rows=1 loops=1)" " -> Hash Left Join (cost=52111.20..234218.21 rows=24126 width=0) (actual time=100435.523..114403.293 rows=20588 loops=1)" " Hash Cond: (("outer".grupp = "inner".grupp) AND ("outer".liik = "inner".liik))" " -> Nested Loop (cost=52103.94..233735.35 rows=24126 width=19) (actual time=100405.258..114207.387 rows=20588 loops=1)" " -> Index Scan using toode_pkey on toode (cost=0.00..6.01 rows=1 width=43) (actual time=18.312..18.325 rows=1 loops=1)" " Index Cond: ('X05'::bpchar = toode)" " -> Hash Join (cost=52103.94..233488.08 rows=24126 width=24) (actual time=100386.921..114037.986 rows=20588 loops=1)" " Hash Cond: ("outer".dokumnr = "inner".dokumnr)" " -> Bitmap Heap Scan on rid (cost=4127.51..175020.84 rows=317003 width=28) (actual time=11119.932..76225.918 rows=277294 loops=1)" " Recheck Cond: (toode = 'X05'::bpchar)" " -> Bitmap Index Scan on rid_toode_idx (cost=0.00..4127.51 rows=317003 width=0) (actual time=11105.807..11105.807 rows=280599 loops=1)" " Index Cond: (toode = 'X05'::bpchar)" " -> Hash (cost=47376.82..47376.82 rows=93444 width=4) (actual time=35082.427..35082.427 rows=105202 loops=1)" " -> Index Scan using dok_kuupaev_idx on dok (cost=0.00..47376.82 rows=93444 width=4) (actual time=42.110..34586.331 rows=105202 loops=1)" " Index Cond: (kuupaev >= '2008-09-01'::date)" " -> Hash (cost=6.84..6.84 rows=84 width=19) (actual time=30.220..30.220 rows=84 loops=1)" " -> Seq Scan on artliik (cost=0.00..6.84 rows=84 width=19) (actual time=20.104..29.845 rows=84 loops=1)" "Total runtime: 114480.373 ms"
pgsql-performance by date: