Re: Planner doesn't look at LIMIT?
From: Dawid Kuroczko
Subject: Re: Planner doesn't look at LIMIT?
Msg-id: 758d5e7f05072209094156c984@mail.gmail.com
In response to: Re: Planner doesn't look at LIMIT? (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
On 7/22/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dawid Kuroczko <qnex42@gmail.com> writes:
> > qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
>
> >  Limit  (cost=15912.20..15912.31 rows=1 width=272)
> >    ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
>
> This is quite strange.  The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost.  And it is preferred in simple tests for me.  It seems there must
> be something specific to your installation that's causing the planner to
> go wrong.  Can you develop a self-contained test case that behaves this
> way for you?

Why, certainly.  I also tested it on Gentoo Linux with PostgreSQL 8.0.1
(yeah, a bit older one), and the behaviour is the same.

The test looks like this:

-- First let's make a "small" lookup table -- 400000 rows.
CREATE TABLE lookup (
    lookup_id serial PRIMARY KEY,
    value integer NOT NULL
);
INSERT INTO lookup (value) SELECT * FROM generate_series(1, 400000);
VACUUM ANALYZE lookup;

-- Then let's make a huge data table...
CREATE TABLE huge_data (
    huge_data_id serial PRIMARY KEY,
    lookup_id integer NOT NULL
);
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM lookup;
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --    800 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  1 600 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  3 200 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  6 400 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 12 800 000

-- You may want to put ANALYZE and EXPLAIN between each of these
-- steps.  In my case, at 12.8 mln rows PostgreSQL went for a hash join
-- each time.  YMMV, so you may try to push it up to 1024 mln rows.
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 25 600 000
ANALYZE huge_data;
EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;

My EXPLAIN output from a Linux box (SMP P-III), with PostgreSQL 8.0.1,
taken while building this test case:

qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                                      QUERY PLAN
------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.21 rows=1 width=12)
   ->  Nested Loop  (cost=0.00..19557596.04 rows=6094777 width=12)
         ->  Seq Scan on huge_data  (cost=0.00..95372.42 rows=6399942 width=8)
         ->  Index Scan using lookup_pkey on lookup  (cost=0.00..3.02 rows=1 width=8)
               Index Cond: ("outer".lookup_id = lookup.lookup_id)
(5 rows)

Time: 4,333 ms

qnex=# INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data;  -- 12 800 000
INSERT 0 6400000
Time: 501014,692 ms
qnex=# ANALYZE huge_data;
ANALYZE
Time: 4243,453 ms
qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=11719.00..11719.09 rows=1 width=12)
   ->  Hash Join  (cost=11719.00..1212739.73 rows=12800185 width=12)
         Hash Cond: ("outer".lookup_id = "inner".lookup_id)
         ->  Seq Scan on huge_data  (cost=0.00..190747.84 rows=12800184 width=8)
         ->  Hash  (cost=5961.00..5961.00 rows=400000 width=8)
               ->  Seq Scan on lookup  (cost=0.00..5961.00 rows=400000 width=8)
(6 rows)

   Regards,
      Dawid
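
PS: Looking at the numbers again, the Limit node's cost does look like
the usual linear interpolation between the join's startup and total
cost, so the LIMIT fraction is being applied to each plan:

    11719.00 + (1/12800185) * (1212739.73 - 11719.00) ~= 11719.09

and, for the earlier nested-loop plan:

    0.00 + (1/6094777) * (19557596.04 - 0.00) ~= 3.21

Which makes it all the stranger that the hash join, at 11719.09 to the
first row, wins over a nested loop that would reach it for about 3.21.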
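
PPS: A quick way to see what the planner thinks of the plan it is
rejecting (just a diagnostic sketch, assuming the tables from the test
script above) is to disable hash joins for the session and re-run the
EXPLAIN:

-- Bias the planner away from hash joins for this session only;
-- no data is changed, only the plan choice.
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
-- Compare the Limit cost against the hash-join version, then restore:
SET enable_hashjoin = on;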