bad plan and LIMIT - Mailing list pgsql-performance

From James Nelson
Subject bad plan and LIMIT
Date
Msg-id 6B6B69ED-D08F-49B9-B43C-2A2FC2EED07E@photoshelter.com
Whole thread Raw
Responses Re: bad plan and LIMIT  (Adam Ruth <adamruth@mac.com>)
Re: bad plan and LIMIT  (Grzegorz Jaśkiewicz <gryzman@gmail.com>)
Re: bad plan and LIMIT  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hi, I'm hoping you guys can help with improving this query I'm having
a problem with. The main problem is that the query plan changes
depending on the value of the LIMIT clause, with small values using a
poor plan and running very slowly. The two times are roughly 5 minutes
for the bad plan and 1.5 secs for the good plan.

I have read a little about how the query planner takes into account
the limit clause, and I can see the effect this has on the costs shown
by explain. The problem is that the estimated cost ends up being
wildly inaccurate. I'm not sure if this a problem with the planner or
if it is something I am doing wrong on my end.

the query (without the limit clause):

SELECT ID FROM ps_image WHERE id IN (SELECT image_id FROM
ps_gallery_image WHERE gallery_id='G00007ejKGoWS_cY') ORDER BY
LOWER(FILE_NAME) ASC

The ps_image table has about 24 million rows, ps_gallery_image has
about 14 million. The query above produces roughly 50 thousand rows.

When looking at the explain with the limit, I can see the
interpolation that the planner does for the limit node (arriving at a
final cost of 458.32 for this example) but not sure why it is
inaccurate compared to the actual times.

Thanks in advance for taking a look at this, let me know if there is
additional information I should provide.

Some information about the tables  and the explains follow below.

James Nelson

[james@db2 ~] psql --version
psql (PostgreSQL) 8.3.5
contains support for command-line editing

photoshelter=# \d ps_image
                               Table "public.ps_image"
    Column     |           Type           |                 Modifiers
---------------+--------------------------
+-------------------------------------------
id            | character varying(16)    | not null
user_id       | character varying(16)    |
album_id      | character varying(16)    | not null
parent_id     | character varying(16)    |
file_name     | character varying(200)   |
file_size     | bigint                   |
.... 20 rows snipped ....
Indexes:
    "ps_image_pkey" PRIMARY KEY, btree (id)
    "i_file_name_l" btree (lower(file_name::text))
.... indexes, fk constraints and triggers snipped ....

photoshelter=# \d ps_gallery_image
                  Table "public.ps_gallery_image"
    Column     |           Type           |       Modifiers
---------------+--------------------------+------------------------
gallery_id    | character varying(16)    | not null
image_id      | character varying(16)    | not null
display_order | integer                  | not null default 0
caption       | character varying(2000)  |
ctime         | timestamp with time zone | not null default now()
mtime         | timestamp with time zone | not null default now()
id            | character varying(16)    | not null
Indexes:
    "ps_gallery_image_pkey" PRIMARY KEY, btree (id)
    "gi_gallery_id" btree (gallery_id)
    "gi_image_id" btree (image_id)
Foreign-key constraints:
    "ps_gallery_image_gallery_id_fkey" FOREIGN KEY (gallery_id)
REFERENCES ps_gallery(id) ON DELETE CASCADE
    "ps_gallery_image_image_id_fkey" FOREIGN KEY (image_id) REFERENCES
ps_image(id) ON DELETE CASCADE
Triggers:
    ps_image_gi_sync AFTER INSERT OR DELETE OR UPDATE ON
ps_gallery_image FOR EACH ROW EXECUTE PROCEDURE ps_image_sync()

=
=
=
=
=
=
=
=
=
=
=
=
=
========================================================================
explain analyze for bad plan

photoshelter=# explain  analyze SELECT ID FROM ps_image WHERE id IN
(SELECT image_id FROM ps_gallery_image WHERE
gallery_id='G00007ejKGoWS_cY') ORDER BY LOWER(FILE_NAME) ASC limit 1;
                                                                         QUERY
  PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.00..458.32 rows=1 width=36) (actual
time=709831.847..709831.847 rows=1 loops=1)
   ->  Nested Loop IN Join  (cost=0.00..17700128.78 rows=38620
width=36) (actual time=709831.845..709831.845 rows=1 loops=1)
         ->  Index Scan using i_file_name_l on ps_image
(cost=0.00..1023863.22 rows=24460418 width=36) (actual
time=0.063..271167.293 rows=8876340 loops=1)
         ->  Index Scan using gi_image_id on ps_gallery_image
(cost=0.00..0.85 rows=1 width=17) (actual time=0.048..0.048 rows=0
loops=8876340)
               Index Cond: ((ps_gallery_image.image_id)::text =
(ps_image.id)::text)
               Filter: ((ps_gallery_image.gallery_id)::text =
'G00007ejKGoWS_cY'::text)
Total runtime: 709831.932 ms

=
=
=
=
=
=
=
=
=
=
=
=
=
========================================================================
explain analyze for good plan

photoshelter=# explain  analyze SELECT ID FROM ps_image WHERE id IN
(SELECT image_id FROM ps_gallery_image WHERE
gallery_id='G00007ejKGoWS_cY') ORDER BY LOWER(FILE_NAME) ASC limit 600;
                                                                              QUERY
  PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=154650.99..154652.49 rows=600 width=36) (actual
time=1886.038..1886.404 rows=600 loops=1)
   ->  Sort  (cost=154650.99..154747.54 rows=38619 width=36) (actual
time=1886.038..1886.174 rows=600 loops=1)
         Sort Key: (lower((ps_image.file_name)::text))
         Sort Method:  top-N heapsort  Memory: 75kB
         ->  Nested Loop  (cost=42394.02..152675.86 rows=38619
width=36) (actual time=135.132..1838.491 rows=50237 loops=1)
               ->  HashAggregate  (cost=42394.02..42780.21 rows=38619
width=17) (actual time=135.079..172.563 rows=50237 loops=1)
                     ->  Index Scan using gi_gallery_id on
ps_gallery_image  (cost=0.00..42271.79 rows=48891 width=17) (actual
time=0.063..97.539 rows=50237 loops=1)
                           Index Cond: ((gallery_id)::text =
'G00007ejKGoWS_cY'::text)
               ->  Index Scan using ps_image_pkey on ps_image
(cost=0.00..2.83 rows=1 width=36) (actual time=0.031..0.031 rows=1
loops=50237)
                     Index Cond: ((ps_image.id)::text =
(ps_gallery_image.image_id)::text)
Total runtime: 1886.950 ms







pgsql-performance by date:

Previous
From: Whit Armstrong
Date:
Subject: Re: partition question for new server setup
Next
From: Adam Ruth
Date:
Subject: Re: bad plan and LIMIT