Inefficient queryplan for query with intersectable subselects/joins - Mailing list pgsql-performance
From | Arjen van der Meijden |
---|---|
Subject | Inefficient queryplan for query with intersectable subselects/joins |
Date | |
Msg-id | 430EF844.70905@tweakers.net Whole thread Raw |
Responses |
Re: Inefficient queryplan for query with intersectable
|
List | pgsql-performance |
Hi list, I'm writing an application that will aggregate records with a few million records into averages/sums/minimums etc grouped per day. Clients can add filters and do lots of customization on what they want to see. And I've to translate that to one or more queries. Basically, I append each filter as either an extra and-in-the-where or joined with the clauses as ON-clause. The application now uses 8.1devel but I think the basic plans are similar to 8.0. At least for this first query. I noticed a query taking over 25 seconds to execute: SELECT "make a timestamp" grouper, chart_2.Prijs as field_2_0 FROM pwprijs as chart_2 JOIN pwprodukten t_0 ON chart_2.ProduktID = t_0.ID AND t_0.Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545) JOIN pwprijs t_1 ON chart_2.ProduktID = t_1.ProduktID AND t_1.LeverancierID = 938 AND t_1.recordtimestamp >= "last timestamp" WHERE chart_2.Prijs > 0 It yields quite a long plan, so I've send that as an attachment along. Basically it combines two tables against an original to fetch "all prices (of all suppliers) for products of a certain category that are sold by a certain supplier". I was wondering how rewriting it to subselects would improve performance, but that wasn't a very clear winner. It shaved of about 5 seconds. So I took the subselects and used INTERSECT to unite them and have only one IN-clause in the query. That made it go down to around 13 seconds. I noticed it was doing a seq scan on the largest table to get the "Prijs > 0"-condition. But since there are only 947 of the 7692207 with prijs = 0 and none with < 0, it shouldn't be the main thing to look for. Dropping the clause made a minor improvement in performance for the queries. But disabling sequential scans allowed an improvement to only 660 ms compared to the 13 seconds earlier! Row-estimates seem to be quite a bit off, so I already set the statistics target to 1000 and re-analyzed. Btw, adding the prijs-clause again would make it choose another index and thus resulted in much longer operation. The final query, only taking 650ms, would be: SELECT "make a timestamp" as grouper, chart_2.Prijs as field_2_0 FROM pwprijs as chart_2 WHERE chart_2.ProduktID IN (SELECT ID FROM pwprodukten WHERE Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545) INTERSECT SELECT produktid FROM pwprijs WHERE LeverancierID = 938 AND recordtimestamp >= "last timestamp") So I'm wondering: how can I make postgres decide to use the (correct) index without having to disable seq scans and how can I still add the prijs-clause without dropping the index for it (since it should be used for other filters). And for ease of use in my application I'd prefer to use the first query or the version with two seperate IN-clauses. Is that possible? I left all the configuration-stuff to the defaults since changing values didn't seem to impact much. Apart from the buffers and effective cache, increasing those made the performance worse. Best regards, Arjen Table "public.pwprijs" Column | Type | Modifiers -----------------+---------+----------- produktid | integer | not null leverancierid | integer | not null prijs | real | not null Time | bigint | not null recordtimestamp | bigint | not null Indexes: "pwprijs_levid_idx" btree (leverancierid), tablespace "raptor" "pwprijs_levid_pid_idx" btree (leverancierid, produktid), tablespace "raptor" "pwprijs_levid_rects_idx" btree (leverancierid, recordtimestamp), tablespace "raptor" "pwprijs_produktid_timestamp_idx" btree (produktid, recordtimestamp), tablespace "raptor" "pwprijs_rec_levid_pid" btree (recordtimestamp, leverancierid, produktid), tablespace "raptor" "pwprijs_recordtimestamp_idx" btree (recordtimestamp), tablespace "raptor" "pwprijs_recordtimestamp_produktid_prijs_idx" btree (recordtimestamp, produktid, prijs), tablespace "raptor" Tablespace: "raptor" Table "public.pwprodukten" Column | Type | Modifiers -----------------+-----------------------+------------------------------ id | integer | not null naam | character varying(80) | not null cat2 | smallint | not null recordtimestamp | bigint | not null Indexes: "pwprodukten_pkey" PRIMARY KEY, btree (id) "pwprodukten_cat_idx" btree (cat2) SELECT CAST('epoch'::timestamp + (chart_2.RecordTimestamp - (chart_2.RecordTimestamp % 86400)) * interval '1 second' as date)as grouper, chart_2.Prijs as field_2_0 FROM pwprijs as chart_2 JOIN pwprodukten t_0 ON chart_2.ProduktID = t_0.ID AND t_0.Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545) JOIN pwprijs t_1 ON chart_2.ProduktID = t_1.ProduktID AND t_1.LeverancierID = 938 AND t_1.recordtimestamp >= (SELECT max_date - 60 FROM last_dates WHERE table_name = 'pricetracker') WHERE chart_2.Prijs > 0 Hash Join (cost=72062.34..940100.37 rows=22901407 width=12) (actual time=746.520..29655.748 rows=58065 loops=1) Hash Cond: ("outer".produktid = "inner".id) InitPlan -> Seq Scan on last_dates (cost=0.00..1.06 rows=1 width=8) (actual time=0.041..0.045 rows=1 loops=1) Filter: ((table_name)::text = 'pricetracker'::text) -> Seq Scan on pwprijs chart_2 (cost=0.00..156721.59 rows=7691800 width=16) (actual time=8.457..14713.939 rows=7691260loops=1) Filter: (prijs > 0::double precision) -> Hash (cost=71769.47..71769.47 rows=42319 width=8) (actual time=606.996..606.996 rows=103 loops=1) -> Hash Join (cost=4329.24..71769.47 rows=42319 width=8) (actual time=599.771..606.782 rows=103 loops=1) Hash Cond: ("outer".produktid = "inner".id) -> Bitmap Heap Scan on pwprijs t_1 (cost=787.84..65786.28 rows=84640 width=4) (actual time=36.049..40.618rows=4405 loops=1) Recheck Cond: ((leverancierid = 938) AND (recordtimestamp >= $1)) -> Bitmap Index Scan on pwprijs_levid_rects_idx (cost=0.00..787.84 rows=84640 width=0) (actual time=36.004..36.004rows=4405 loops=1) Index Cond: ((leverancierid = 938) AND (recordtimestamp >= $1)) -> Hash (cost=3399.01..3399.01 rows=22156 width=4) (actual time=561.313..561.313 rows=1458 loops=1) -> Seq Scan on pwprodukten t_0 (cost=10.45..3399.01 rows=22156 width=4) (actual time=60.957..559.595rows=1458 loops=1) Filter: (subplan) SubPlan -> Materialize (cost=10.45..10.53 rows=8 width=2) (actual time=0.002..0.006 rows=8 loops=44313) -> Unique (cost=10.40..10.44 rows=8 width=2) (actual time=44.831..44.846 rows=8 loops=1) -> Sort (cost=10.40..10.42 rows=8 width=2) (actual time=44.829..44.834 rows=8loops=1) Sort Key: "?column?" -> Append (cost=0.00..10.28 rows=8 width=2) (actual time=0.006..44.754 rows=8loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..0.02 rows=1 width=0) (actualtime=0.004..0.006 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003rows=1 loops=1) -> Subquery Scan "*SELECT* 2" (cost=2.02..10.26 rows=7 width=2) (actualtime=44.723..44.738 rows=7 loops=1) -> Bitmap Heap Scan on cat (cost=2.02..10.19 rows=7 width=2)(actual time=44.719..44.726 rows=7 loops=1) Recheck Cond: (parentid = 545) -> Bitmap Index Scan on cat_parentid_idx (cost=0.00..2.02rows=7 width=0) (actual time=44.704..44.704 rows=7 loops=1) Index Cond: (parentid = 545) Total runtime: 29688.736 ms SELECT CAST('epoch'::timestamp + (chart_2.RecordTimestamp - (chart_2.RecordTimestamp % 86400)) * interval '1 second' as date)as grouper, chart_2.Prijs as field_2_0 FROM pwprijs as chart_2 WHERE chart_2.ProduktID IN (SELECT ID FROM pwprodukten WHERE Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)) AND chart_2.ProduktID IN (SELECT produktid FROM pwprijs WHERE LeverancierID = 938 AND recordtimestamp >= (SELECT max_date - 60 FROM last_dates WHERE table_name = 'pricetracker')) AND chart_2.Prijs > 0 Nested Loop IN Join (cost=2850.25..230556.55 rows=1699 width=12) (actual time=482.862..24419.848 rows=58065 loops=1) InitPlan -> Seq Scan on last_dates (cost=0.00..1.06 rows=1 width=8) (actual time=0.022..0.025 rows=1 loops=1) Filter: ((table_name)::text = 'pricetracker'::text) -> Hash IN Join (cost=2849.18..200109.24 rows=207947 width=20) (actual time=360.511..15450.592 rows=190487 loops=1) Hash Cond: ("outer".produktid = "inner".id) -> Seq Scan on pwprijs chart_2 (cost=0.00..156721.59 rows=7691800 width=16) (actual time=11.587..10137.642 rows=7691260loops=1) Filter: (prijs > 0::double precision) -> Hash (cost=2846.19..2846.19 rows=1198 width=4) (actual time=244.576..244.576 rows=1458 loops=1) -> Nested Loop (cost=12.93..2846.19 rows=1198 width=4) (actual time=48.646..243.239 rows=1458 loops=1) -> Unique (cost=10.40..10.44 rows=8 width=2) (actual time=19.552..19.580 rows=8 loops=1) -> Sort (cost=10.40..10.42 rows=8 width=2) (actual time=19.550..19.559 rows=8 loops=1) Sort Key: "?column?" -> Append (cost=0.00..10.28 rows=8 width=2) (actual time=0.005..19.532 rows=8 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..0.02 rows=1 width=0) (actual time=0.005..0.007rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.003 rows=1loops=1) -> Subquery Scan "*SELECT* 2" (cost=2.02..10.26 rows=7 width=2) (actual time=19.501..19.518rows=7 loops=1) -> Bitmap Heap Scan on cat (cost=2.02..10.19 rows=7 width=2) (actual time=19.499..19.506rows=7 loops=1) Recheck Cond: (parentid = 545) -> Bitmap Index Scan on cat_parentid_idx (cost=0.00..2.02 rows=7 width=0)(actual time=19.493..19.493 rows=7 loops=1) Index Cond: (parentid = 545) -> Bitmap Heap Scan on pwprodukten (cost=2.52..352.58 rows=150 width=6) (actual time=3.717..27.732rows=182 loops=8) Recheck Cond: (pwprodukten.cat2 = "outer"."?column?") -> Bitmap Index Scan on pwprodukten_cat_idx (cost=0.00..2.52 rows=150 width=0) (actual time=1.510..1.510rows=182 loops=8) Index Cond: (pwprodukten.cat2 = "outer"."?column?") -> Index Scan using pwprijs_levid_pid_idx on pwprijs (cost=0.00..34.16 rows=3 width=4) (actual time=0.044..0.044 rows=0loops=190487) Index Cond: ((pwprijs.leverancierid = 938) AND ("outer".produktid = pwprijs.produktid)) Filter: (recordtimestamp >= $0) Total runtime: 24466.560 ms SELECT CAST('epoch'::timestamp + (chart_2.RecordTimestamp - (chart_2.RecordTimestamp % 86400)) * interval '1 second' as date)as grouper, chart_2.Prijs as field_2_0 FROM pwprijs as chart_2 WHERE chart_2.ProduktID IN (SELECT ID FROM pwprodukten WHERE Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545) INTERSECT SELECT produktid FROM pwprijs WHERE LeverancierID = 938 AND recordtimestamp >= (SELECT max_date - 60 FROM last_dates WHERE table_name = 'pricetracker')) AND chart_2.Prijs > 0 Hash Join (cost=77791.44..457846.83 rows=2010732 width=12) (actual time=409.838..13202.566 rows=58065 loops=1) Hash Cond: ("outer".produktid = "inner".id) -> Seq Scan on pwprijs chart_2 (cost=0.00..156721.59 rows=7691800 width=16) (actual time=7.896..8253.585 rows=7691260loops=1) Filter: (prijs > 0::double precision) -> Hash (cost=77769.98..77769.98 rows=8584 width=4) (actual time=325.008..325.008 rows=103 loops=1) -> SetOp Intersect (cost=77254.95..77684.14 rows=8584 width=4) (actual time=317.384..324.932 rows=103 loops=1) -> Sort (cost=77254.95..77469.55 rows=85838 width=4) (actual time=317.361..320.519 rows=5863 loops=1) Sort Key: id -> Append (cost=12.93..69491.92 rows=85838 width=4) (actual time=38.765..308.837 rows=5863 loops=1) -> Subquery Scan "*SELECT* 1" (cost=12.93..2858.17 rows=1198 width=4) (actual time=38.763..256.795rows=1458 loops=1) -> Nested Loop (cost=12.93..2846.19 rows=1198 width=4) (actual time=38.761..254.989 rows=1458loops=1) -> Unique (cost=10.40..10.44 rows=8 width=2) (actual time=8.994..9.027 rows=8 loops=1) -> Sort (cost=10.40..10.42 rows=8 width=2) (actual time=8.993..9.000 rows=8loops=1) Sort Key: "?column?" -> Append (cost=0.00..10.28 rows=8 width=2) (actual time=0.003..8.978rows=8 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..0.02 rows=1 width=0)(actual time=0.003..0.005 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001rows=1 loops=1) -> Subquery Scan "*SELECT* 2" (cost=2.02..10.26 rows=7 width=2)(actual time=8.949..8.963 rows=7 loops=1) -> Bitmap Heap Scan on cat (cost=2.02..10.19 rows=7 width=2)(actual time=8.945..8.952 rows=7 loops=1) Recheck Cond: (parentid = 545) -> Bitmap Index Scan on cat_parentid_idx (cost=0.00..2.02rows=7 width=0) (actual time=8.940..8.940 rows=7 loops=1) Index Cond: (parentid = 545) -> Bitmap Heap Scan on pwprodukten (cost=2.52..352.58 rows=150 width=6) (actualtime=3.847..30.522 rows=182 loops=8) Recheck Cond: (pwprodukten.cat2 = "outer"."?column?") -> Bitmap Index Scan on pwprodukten_cat_idx (cost=0.00..2.52 rows=150 width=0)(actual time=1.583..1.583 rows=182 loops=8) Index Cond: (pwprodukten.cat2 = "outer"."?column?") -> Subquery Scan "*SELECT* 2" (cost=788.90..66633.75 rows=84640 width=4) (actual time=36.653..46.266rows=4405 loops=1) -> Bitmap Heap Scan on pwprijs (cost=788.90..65787.35 rows=84640 width=4) (actual time=36.651..41.326rows=4405 loops=1) Recheck Cond: ((leverancierid = 938) AND (recordtimestamp >= $0)) InitPlan -> Seq Scan on last_dates (cost=0.00..1.06 rows=1 width=8) (actual time=0.025..0.029rows=1 loops=1) Filter: ((table_name)::text = 'pricetracker'::text) -> Bitmap Index Scan on pwprijs_levid_rects_idx (cost=0.00..787.84 rows=84640 width=0)(actual time=36.614..36.614 rows=4405 loops=1) Index Cond: ((leverancierid = 938) AND (recordtimestamp >= $0)) Total runtime: 13234.570 ms set enable_seqscan = false; without prijs-clause: Nested Loop (cost=77299.32..7858118.21 rows=2013780 width=12) (actual time=33.964..634.359 rows=58065 loops=1) -> SetOp Intersect (cost=77296.51..77727.66 rows=8623 width=4) (actual time=33.632..42.356 rows=103 loops=1) -> Sort (cost=77296.51..77512.08 rows=86231 width=4) (actual time=33.616..36.979 rows=5863 loops=1) Sort Key: id -> Append (cost=12.93..69495.92 rows=86231 width=4) (actual time=0.214..25.380 rows=5863 loops=1) -> Subquery Scan "*SELECT* 1" (cost=12.93..2858.17 rows=1198 width=4) (actual time=0.213..8.779 rows=1458loops=1) -> Nested Loop (cost=12.93..2846.19 rows=1198 width=4) (actual time=0.211..7.065 rows=1458 loops=1) -> Unique (cost=10.40..10.44 rows=8 width=2) (actual time=0.095..0.122 rows=8 loops=1) -> Sort (cost=10.40..10.42 rows=8 width=2) (actual time=0.093..0.099 rows=8 loops=1) Sort Key: "?column?" -> Append (cost=0.00..10.28 rows=8 width=2) (actual time=0.003..0.079 rows=8loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..0.02 rows=1 width=0) (actualtime=0.003..0.005 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001rows=1 loops=1) -> Subquery Scan "*SELECT* 2" (cost=2.02..10.26 rows=7 width=2) (actualtime=0.050..0.065 rows=7 loops=1) -> Bitmap Heap Scan on cat (cost=2.02..10.19 rows=7 width=2) (actualtime=0.048..0.054 rows=7 loops=1) Recheck Cond: (parentid = 545) -> Bitmap Index Scan on cat_parentid_idx (cost=0.00..2.02rows=7 width=0) (actual time=0.040..0.040 rows=7 loops=1) Index Cond: (parentid = 545) -> Bitmap Heap Scan on pwprodukten (cost=2.52..352.58 rows=150 width=6) (actual time=0.076..0.658rows=182 loops=8) Recheck Cond: (pwprodukten.cat2 = "outer"."?column?") -> Bitmap Index Scan on pwprodukten_cat_idx (cost=0.00..2.52 rows=150 width=0) (actualtime=0.056..0.056 rows=182 loops=8) Index Cond: (pwprodukten.cat2 = "outer"."?column?") -> Subquery Scan "*SELECT* 2" (cost=794.53..66637.75 rows=85033 width=4) (actual time=1.064..10.785rows=4405 loops=1) -> Bitmap Heap Scan on pwprijs (cost=794.53..65787.42 rows=85033 width=4) (actual time=1.062..5.792rows=4405 loops=1) Recheck Cond: ((leverancierid = 938) AND (recordtimestamp >= $0)) InitPlan -> Index Scan using last_dates_pkey on last_dates (cost=0.00..3.33 rows=1 width=8) (actualtime=0.042..0.044 rows=1 loops=1) Index Cond: ((table_name)::text = 'pricetracker'::text) -> Bitmap Index Scan on pwprijs_levid_rects_idx (cost=0.00..791.20 rows=85033 width=0)(actual time=1.027..1.027 rows=4405 loops=1) Index Cond: ((leverancierid = 938) AND (recordtimestamp >= $0)) -> Bitmap Heap Scan on pwprijs chart_2 (cost=2.82..895.85 rows=234 width=16) (actual time=0.363..3.980 rows=564 loops=103) Recheck Cond: (chart_2.produktid = "outer".id) -> Bitmap Index Scan on pwprijs_produktid_timestamp_idx (cost=0.00..2.82 rows=234 width=0) (actual time=0.206..0.206rows=564 loops=103) Index Cond: (chart_2.produktid = "outer".id) Total runtime: 665.335 ms with prijs-clause: Hash Join (cost=134379.17..514503.02 rows=2013674 width=12) (actual time=3089.037..15063.409 rows=58065 loops=1) Hash Cond: ("outer".produktid = "inner".id) -> Bitmap Heap Scan on pwprijs chart_2 (cost=56543.72..213260.22 rows=7691800 width=16) (actual time=2972.869..10299.608rows=7691260 loops=1) Recheck Cond: (prijs > 0::double precision) -> Bitmap Index Scan on pwprijs_recordtimestamp_produktid_prijs_idx (cost=0.00..56543.72 rows=7691800 width=0)(actual time=2962.018..2962.018 rows=7691260 loops=1) Index Cond: (prijs > 0::double precision) -> Hash (cost=77813.89..77813.89 rows=8623 width=4) (actual time=41.373..41.373 rows=103 loops=1) -> SetOp Intersect (cost=77296.51..77727.66 rows=8623 width=4) (actual time=33.770..41.297 rows=103 loops=1) -> Sort (cost=77296.51..77512.08 rows=86231 width=4) (actual time=33.756..36.971 rows=5863 loops=1) Sort Key: id -> Append (cost=12.93..69495.92 rows=86231 width=4) (actual time=0.247..25.379 rows=5863 loops=1) -> Subquery Scan "*SELECT* 1" (cost=12.93..2858.17 rows=1198 width=4) (actual time=0.246..8.816rows=1458 loops=1) -> Nested Loop (cost=12.93..2846.19 rows=1198 width=4) (actual time=0.243..7.092 rows=1458loops=1) -> Unique (cost=10.40..10.44 rows=8 width=2) (actual time=0.110..0.135 rows=8 loops=1) -> Sort (cost=10.40..10.42 rows=8 width=2) (actual time=0.108..0.115 rows=8loops=1) Sort Key: "?column?" -> Append (cost=0.00..10.28 rows=8 width=2) (actual time=0.004..0.092rows=8 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..0.02 rows=1 width=0)(actual time=0.004..0.006 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003rows=1 loops=1) -> Subquery Scan "*SELECT* 2" (cost=2.02..10.26 rows=7 width=2)(actual time=0.062..0.077 rows=7 loops=1) -> Bitmap Heap Scan on cat (cost=2.02..10.19 rows=7 width=2)(actual time=0.059..0.066 rows=7 loops=1) Recheck Cond: (parentid = 545) -> Bitmap Index Scan on cat_parentid_idx (cost=0.00..2.02rows=7 width=0) (actual time=0.042..0.042 rows=7 loops=1) Index Cond: (parentid = 545) -> Bitmap Heap Scan on pwprodukten (cost=2.52..352.58 rows=150 width=6) (actualtime=0.078..0.658 rows=182 loops=8) Recheck Cond: (pwprodukten.cat2 = "outer"."?column?") -> Bitmap Index Scan on pwprodukten_cat_idx (cost=0.00..2.52 rows=150 width=0)(actual time=0.058..0.058 rows=182 loops=8) Index Cond: (pwprodukten.cat2 = "outer"."?column?") -> Subquery Scan "*SELECT* 2" (cost=794.53..66637.75 rows=85033 width=4) (actual time=1.075..10.724rows=4405 loops=1) -> Bitmap Heap Scan on pwprijs (cost=794.53..65787.42 rows=85033 width=4) (actual time=1.072..5.846rows=4405 loops=1) Recheck Cond: ((leverancierid = 938) AND (recordtimestamp >= $0)) InitPlan -> Index Scan using last_dates_pkey on last_dates (cost=0.00..3.33 rows=1 width=8)(actual time=0.045..0.047 rows=1 loops=1) Index Cond: ((table_name)::text = 'pricetracker'::text) -> Bitmap Index Scan on pwprijs_levid_rects_idx (cost=0.00..791.20 rows=85033 width=0)(actual time=1.037..1.037 rows=4405 loops=1) Index Cond: ((leverancierid = 938) AND (recordtimestamp >= $0)) Total runtime: 15095.402 ms
pgsql-performance by date: