Re: Planner doesn't look at LIMIT? - Mailing list pgsql-performance

From Dawid Kuroczko
Subject Re: Planner doesn't look at LIMIT?
Msg-id 758d5e7f05072209094156c984@mail.gmail.com
In response to Re: Planner doesn't look at LIMIT?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 7/22/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dawid Kuroczko <qnex42@gmail.com> writes:
> > qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
>
> >  Limit  (cost=15912.20..15912.31 rows=1 width=272)
> >    ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
>
> This is quite strange.  The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost.  And it is preferred in simple tests for me.  It seems there must
> be something specific to your installation that's causing the planner to
> go wrong.  Can you develop a self-contained test case that behaves this
> way for you?

Why, certainly.  I also tested it on Gentoo Linux with PostgreSQL 8.0.1 (yeah,
a bit older release), and the behaviour is the same.  The test looks like this:

-- First let's make a "small" lookup table -- 400000 rows.
CREATE TABLE lookup (
    lookup_id serial PRIMARY KEY,
    value integer NOT NULL
);
INSERT INTO lookup (value) SELECT * FROM generate_series(1, 400000);
VACUUM ANALYZE lookup;
-- Then let's make a huge data table...
CREATE TABLE huge_data (
    huge_data_id serial PRIMARY KEY,
    lookup_id integer NOT NULL
);
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM lookup;
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --    800 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  1 600 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  3 200 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  6 400 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 12 800 000
-- You may want to put ANALYZE and EXPLAIN between each of these
-- steps.  In my case, PostgreSQL goes for the hash join every time
-- once the table reaches 12.8 mln rows.  YMMV, so you may have to
-- push it up to 1024 mln rows.
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 25 600 000
ANALYZE huge_data;
EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
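
If you want to see what the nestloop would be priced at after the plan
flips, one quick diagnostic (a sketch only, not part of the run below) is
to disable hash joins for the session and re-run the EXPLAIN;
enable_hashjoin is the standard planner GUC for that:

-- Diagnostic only: steer the planner away from hash joins for this session.
SET enable_hashjoin = off;
EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
RESET enable_hashjoin;  -- put the planner back to normal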

My EXPLAIN output from Linux (an SMP P-III box) with PostgreSQL 8.0.1,
taken while building this test case:

qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
Limit  (cost=0.00..3.21 rows=1 width=12)
  ->  Nested Loop  (cost=0.00..19557596.04 rows=6094777 width=12)
        ->  Seq Scan on huge_data  (cost=0.00..95372.42 rows=6399942 width=8)
        ->  Index Scan using lookup_pkey on lookup  (cost=0.00..3.02 rows=1 width=8)
              Index Cond: ("outer".lookup_id = lookup.lookup_id)
(5 rows)

Time: 4,333 ms
qnex=# INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 12 800 000
INSERT 0 6400000
Time: 501014,692 ms
qnex=# ANALYZE huge_data;
ANALYZE
Time: 4243,453 ms
qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
Limit  (cost=11719.00..11719.09 rows=1 width=12)
  ->  Hash Join  (cost=11719.00..1212739.73 rows=12800185 width=12)
        Hash Cond: ("outer".lookup_id = "inner".lookup_id)
        ->  Seq Scan on huge_data  (cost=0.00..190747.84 rows=12800184 width=8)
        ->  Hash  (cost=5961.00..5961.00 rows=400000 width=8)
              ->  Seq Scan on lookup  (cost=0.00..5961.00 rows=400000 width=8)
(6 rows)
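
For what it's worth, if I read the cost model right, the Limit node's
cost above is just the hash join's startup cost plus a 1/rows fraction
of its run cost:

    11719.00 + (1/12800185) * (1212739.73 - 11719.00)  ~=  11719.09

whereas a nestloop like the one in the earlier plan starts at cost 0.00
and fetches its first row for a few cost units (one index probe), so
under LIMIT 1 it should win by more than three orders of magnitude.
That is what makes the hash join choice here so strange.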

   Regards,
     Dawid
