BUG #4284: Optimizer chooses bad plan with LEFT join - Mailing list pgsql-bugs
From | David Rowley |
---|---|
Subject | BUG #4284: Optimizer chooses bad plan with LEFT join |
Date | |
Msg-id | 200807051711.m65HBPkd083426@wwwmaster.postgresql.org Whole thread Raw |
List | pgsql-bugs |
The following bug has been logged online: Bug reference: 4284 Logged by: David Rowley Email address: dgrowley@gmail.com PostgreSQL version: 8.3.3 Operating system: Windows XP and 2003 Description: Optimizer chooses bad plan with LEFT join Details: I was asked a few days ago to gather certain information from our database. The schema was never really designed for this query and itâs quite a bodge job, but it gets the intended results. I was quite surprised that the query I came up with took over 2 minutes to run, giving that the biggest of the tables is only around 30k rows. A quick look at the query plan showed me that the optimizer was choosing to perform a nested loop for the left join, probably since it thought there was only 1 row in that (derived) table. I have managed to re-create this and I have included a full re-creation script including create tables and inserts to generate some mocked up data. Iâve only tested on 8.3.3, I will test if required on other versions but hopefully the recreation script will stop the need. Iâve tuned the number of rows in the mocked up tables so the example query only takes around 10 seconds (on this computer). That should be enough time to show that itâs not the best plan. Let me know if you are unable to re-created and Iâll see what I can do to provide more information, (settings etc.) I can get the query to run more quickly in the meantime just by using a FULL OUTER JOIN and adding a WHERE clause to filter out the NULLs Anyway⦠less talk more facts you say⦠CREATE TABLE production ( despid SERIAL, productiondate DATE NOT NULL, lineid INT NOT NULL, partcode VARCHAR(20) NOT NULL, quantity INT NOT NULL, PRIMARY KEY(despid) ); CREATE TABLE production_line ( lineid INT PRIMARY KEY, linename VARCHAR(16) NOT NULL ); CREATE TABLE batches ( batchid SERIAL, batchcode VARCHAR(16) NOT NULL, lineid INT NOT NULL, partcode VARCHAR(20) NOT NULL, productiondate DATE NOT NULL, bestbefore DATE NOT NULL, PRIMARY KEY(batchid) ); /* Table to help create mock data */ CREATE TABLE parts ( partcode VARCHAR(20) NOT NULL, PRIMARY KEY(partcode) ); /* Create fake parts */ INSERT INTO parts (partcode) VALUES('PART1'),('PART2'),('PART3'),('PART4'),('PART5'),('PART6'),('PART7'); /* Table to help create mock data */ CREATE TABLE calendar ( date DATE, PRIMARY KEY(date) ); /* 200 rows here is ok for testing */ INSERT INTO calendar (date) SELECT '2008-06-30'::DATE + num * '1 day'::INTERVAL FROM (select generate_series(1,200) AS num) AS series; INSERT INTO production_line (lineid,linename) VALUES(1,'Line 1'),(2,'Line 2'),(3,'Line 3'); /* Populate production with the cartasian product of calendar, production_line and parts * We need a table big enough to get the optimizer not to nest loop. Or at least to produce * a query where a nested loop would be slower than a merge join. */ INSERT INTO production (productiondate,lineid,partcode,quantity) SELECT cal.date, l.lineid, p.partcode, CAST(RANDOM() * 1000 AS INT) AS qty FROM calendar AS cal, production_line AS l, parts AS p; /* Now to populate the batches table, cartasian product once again. */ INSERT INTO batches (batchcode,lineid,partcode,productiondate,bestbefore) SELECT 'ABC123', l.lineid, p.partcode, cal.date, cal.date + '100 days'::INTERVAL FROM calendar AS cal, production_line AS l, parts AS p; /* Ensure we have stats for these tables */ ANALYZE; EXPLAIN ANALYZE SELECT t1.productiondate, t3.linename, t1.partcode, t1.batchcode, t1.bestbefore, t4.quantity / t2.number_of_bbdates AS est_packs_in_batch, t2.number_of_bbdates AS number_of_batches FROM batches t1 INNER JOIN (SELECT productiondate, lineid, partcode, COUNT(DISTINCT bestbefore) AS number_of_bbdates FROM batches GROUP BY productiondate,lineid,partcode ) AS t2 ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid AND t1.partcode = t2.partcode INNER JOIN production_line t3 ON t1.lineid = t3.lineid LEFT OUTER JOIN (SELECT productiondate, lineid, partcode, SUM(quantity) AS quantity FROM production GROUP BY productiondate,partcode,lineid ) t4 ON t1.productiondate = t4.productiondate AND t1.lineid = t4.lineid AND t1.partcode = t4.partcode ; /* test=# SELECT VERSION(); version ----------------------------------------------------- PostgreSQL 8.3.3, compiled by Visual C++ build 1400 EXPLAIN ANALYZE output QUERY PLAN ---------------------------------------------------------------------------- ---------------------------------------------------------------------------- ----- - Nested Loop Left Join (cost=501.52..638.58 rows=1 width=44) (actual time=155.552..40524.149 rows=4200 loops=1) Join Filter: ((t1.productiondate = production.productiondate) AND (t1.lineid = production.lineid) AND ((t1.partcode)::text = (production.partcode)::text)) -> Hash Join (cost=390.52..510.78 rows=1 width=40) (actual time=120.452..146.105 rows=4200 loops=1) Hash Cond: ((t1.productiondate = batches.productiondate) AND (t1.lineid = batches.lineid) AND ((t1.partcode)::text = (batches.partcode)::text)) -> Seq Scan on batches t1 (cost=0.00..73.00 rows=4200 width=25) (actual time=0.020..6.100 rows=4200 loops=1) -> Hash (cost=390.41..390.41 rows=6 width=85) (actual time=120.392..120.392 rows=4200 loops=1) -> Hash Join (cost=326.83..390.41 rows=6 width=85) (actual time=35.510..106.457 rows=4200 loops=1) Hash Cond: (batches.lineid = t3.lineid) -> GroupAggregate (cost=325.76..383.51 rows=420 width=18) (actual time=35.446..90.534 rows=4200 loops=1) -> Sort (cost=325.76..336.26 rows=4200 width=18) (actual time=35.402..41.801 rows=4200 loops=1) Sort Key: batches.productiondate, batches.lineid, batches.partcode Sort Method: quicksort Memory: 424kB -> Seq Scan on batches (cost=0.00..73.00 rows=4200 width=18) (actual time=0.011..7.317 rows=4200 loops=1) -> Hash (cost=1.03..1.03 rows=3 width=11) (actual time=0.035..0.035 rows=3 loops=1) -> Seq Scan on production_line t3 (cost=0.00..1.03 rows=3 width=11) (actual time=0.014..0.020 rows=3 loops=1) -> HashAggregate (cost=111.00..116.25 rows=420 width=18) (actual time=0.007..5.387 rows=4200 loops=4200) -> Seq Scan on production (cost=0.00..69.00 rows=4200 width=18) (actual time=0.024..5.880 rows=4200 loops=1) Total runtime: 40528.569 ms (18 rows) Time: 40532.405 ms EXPLAIN ANALYZE with SET enable_nestloop = off; QUERY PLAN ---------------------------------------------------------------------------- ---------------------------------------------------------------------------- - Hash Left Join (cost=518.32..638.75 rows=1 width=44) (actual time=171.395..206.840 rows=4200 loops=1) Hash Cond: ((t1.productiondate = t4.productiondate) AND (t1.lineid = t4.lineid) AND ((t1.partcode)::text = (t4.partcode)::text)) -> Hash Join (cost=390.52..510.78 rows=1 width=40) (actual time=115.980..134.411 rows=4200 loops=1) Hash Cond: ((t1.productiondate = batches.productiondate) AND (t1.lineid = batches.lineid) AND ((t1.partcode)::text = (batches.partcode)::text)) -> Seq Scan on batches t1 (cost=0.00..73.00 rows=4200 width=25) (actual time=0.020..4.160 rows=4200 loops=1) -> Hash (cost=390.41..390.41 rows=6 width=85) (actual time=115.916..115.916 rows=4200 loops=1) -> Hash Join (cost=326.83..390.41 rows=6 width=85) (actual time=35.429..102.996 rows=4200 loops=1) Hash Cond: (batches.lineid = t3.lineid) -> GroupAggregate (cost=325.76..383.51 rows=420 width=18) (actual time=35.366..86.402 rows=4200 loops=1) -> Sort (cost=325.76..336.26 rows=4200 width=18) (actual time=35.316..40.902 rows=4200 loops=1) Sort Key: batches.productiondate, batches.lineid, batches.partcode Sort Method: quicksort Memory: 424kB -> Seq Scan on batches (cost=0.00..73.00 rows=4200 width=18) (actual time=0.011..7.131 rows=4200 loops=1) -> Hash (cost=1.03..1.03 rows=3 width=11) (actual time=0.035..0.035 rows=3 loops=1) -> Seq Scan on production_line t3 (cost=0.00..1.03 rows=3 width=11) (actual time=0.007..0.013 rows=3 loops=1) -> Hash (cost=120.45..120.45 rows=420 width=74) (actual time=55.356..55.356 rows=4200 loops=1) -> Subquery Scan t4 (cost=111.00..120.45 rows=420 width=74) (actual time=22.242..42.829 rows=4200 loops=1) -> HashAggregate (cost=111.00..116.25 rows=420 width=18) (actual time=22.236..31.771 rows=4200 loops=1) -> Seq Scan on production (cost=0.00..69.00 rows=4200 width=18) (actual time=0.024..5.811 rows=4200 loops=1) Total runtime: 211.055 ms (20 rows) Time: 214.725 ms */
pgsql-bugs by date: