From: Murat Tasan
Subject: does the query planner consider work_mem?
Msg-id: CA+YV+HxVLPGYRaN9HEfH7UDUDP40g4a=dKRC-NZF+i_a8oVGVg@mail.gmail.com
List: pgsql-performance

hi all - i'm having a bit of trouble with some queries that are
running painfully slowly after migrating my database from one machine
using PostgreSQL 8.2 to another machine with PostgreSQL 8.4.
as far as i can tell, the two *servers* (not the physical machines)
are set up pretty close to identically, and as far as query planning
is concerned, the only setting that seems to be different is
'default_statistics_target', which is 10 on the 8.2 server and 100 on
the 8.4 server (the defaults)... and presumably this should be giving
the 8.4 server more data to work with for better query plans (AFAIK).
(other settings, e.g. cost estimates for page/row/etc access are
identical between the servers.)
i VACUUMed and ANALYZEd on both databases (and all tables) prior to
running any of the examples below.
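
for anyone reproducing the comparison, this is roughly what i checked
on each server (the settings list here is illustrative, not
exhaustive):

SELECT name, setting
FROM pg_settings
WHERE name IN ('work_mem', 'default_statistics_target',
               'random_page_cost', 'effective_cache_size');

VACUUM ANALYZE;   -- refresh statistics before timing anything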

i've narrowed some of the queries down to a few simpler test cases and
found one example of a huge difference.
here's a basic query reproducing the results:

explain analyze select lt.listing_id from listing__tag lt
  where lt.tag_name in (select tag_name from listing__tag x
                         where x.listing_id = 91032370)
  group by lt.listing_id;

[YES: i know i can use a join here. i extracted this from a much
larger query that originally had 7 joins; to help the query planner
out a bit, i isolated one nested subquery that makes a large
performance difference the planner was otherwise ignoring. for the
purposes of exploring this behavior, it serves as a nice example,
since it seems to force a nested plan. a roughly equivalent join form
is sketched below.]
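
for reference, the roughly equivalent join form would look something
like this (a sketch, not the original 7-join query):

select lt.listing_id
from listing__tag lt
join listing__tag x on x.tag_name = lt.tag_name
where x.listing_id = 91032370
group by lt.listing_id;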

here's the analysis on the 8.2 machine:

HashAggregate  (cost=23240.77..23297.35 rows=5658 width=4) (actual time=5515.065..5600.372 rows=279785 loops=1)
   ->  Nested Loop  (cost=3461.24..23226.63 rows=5658 width=4) (actual time=134.604..5276.014 rows=304248 loops=1)
         ->  HashAggregate  (cost=3288.18..3288.19 rows=1 width=13) (actual time=0.097..0.120 rows=11 loops=1)
               ->  Index Scan using listing__tag___listing_id on listing__tag x  (cost=0.00..3286.06 rows=851 width=13) (actual time=0.069..0.084 rows=11 loops=1)
                     Index Cond: (listing_id = 91032370)
         ->  Bitmap Heap Scan on listing__tag lt  (cost=173.06..19867.70 rows=5659 width=17) (actual time=115.275..474.135 rows=27659 loops=11)
               Recheck Cond: (lt.tag_name = x.tag_name)
               ->  Bitmap Index Scan on listing__tag___tag_name  (cost=0.00..171.64 rows=5659 width=0) (actual time=113.595..113.595 rows=27659 loops=11)
                     Index Cond: (lt.tag_name = x.tag_name)
 Total runtime: 5615.036 ms

here's the same query on the 8.4 machine, trial 1 (almost exactly the
same data... just very slightly fewer records):

Group  (cost=782153.70..807638.31 rows=516717 width=4) (actual time=184264.479..184434.087 rows=275876 loops=1)
   ->  Sort  (cost=782153.70..794896.00 rows=5096921 width=4) (actual time=184264.476..184353.314 rows=299992 loops=1)
         Sort Key: lt.listing_id
         Sort Method:  external merge  Disk: 4096kB
         ->  Nested Loop  (cost=306.17..5271.26 rows=5096921 width=4) (actual time=126.267..183408.035 rows=299992 loops=1)
               ->  HashAggregate  (cost=270.42..270.43 rows=1 width=10) (actual time=57.744..57.771 rows=11 loops=1)
                     ->  Index Scan using listing__tag___listing_id on listing__tag x  (cost=0.00..270.25 rows=68 width=10) (actual time=57.728..57.731 rows=11 loops=1)
                           Index Cond: (listing_id = 91032370)
               ->  Bitmap Heap Scan on listing__tag lt  (cost=35.75..4983.92 rows=1353 width=14) (actual time=59.723..16652.856 rows=27272 loops=11)
                     Recheck Cond: (lt.tag_name = x.tag_name)
                     ->  Bitmap Index Scan on listing__tag___tag_name  (cost=0.00..35.41 rows=1353 width=0) (actual time=58.036..58.036 rows=27272 loops=11)
                           Index Cond: (lt.tag_name = x.tag_name)
 Total runtime: 184455.567 ms

5.6 seconds vs 3 minutes!
so i obviously first noticed that the nested loop was taking up all
the time... though that part of the plan is identical between the two
servers.
another noticeable difference is that the row estimates are better in
the 8.4 plan (estimated around 500,000 vs around 5,000 on 8.2, against
an actual count of around 300,000).
the only structural difference between the plans is the choice of
Group->Sort vs HashAggregate, but both of those operate on the Nested
Loop results.
despite seeing all the time being spent in the Nested Loop, i also
noticed that the sort was spilling to disk... so i upped work_mem from
the 1 MB default to 8 MB.
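
(per session, i.e. something like:

SET work_mem = '8MB';
SHOW work_mem;   -- confirm the new value took

the same setting can also go into postgresql.conf as work_mem = 8MB,
followed by a config reload.)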
i then re-ran the 8.4 query, and got these times (for the same query):

Group  (cost=642783.70..668268.31 rows=516717 width=4) (actual time=1946.838..2102.696 rows=275876 loops=1)
   ->  Sort  (cost=642783.70..655526.00 rows=5096921 width=4) (actual time=1946.835..2023.099 rows=299992 loops=1)
         Sort Key: lt.listing_id
         Sort Method:  external merge  Disk: 4096kB
         ->  Nested Loop  (cost=306.17..5271.26 rows=5096921 width=4) (actual time=4.336..1518.962 rows=299992 loops=1)
               ->  HashAggregate  (cost=270.42..270.43 rows=1 width=10) (actual time=0.069..0.089 rows=11 loops=1)
                     ->  Index Scan using listing__tag___listing_id on listing__tag x  (cost=0.00..270.25 rows=68 width=10) (actual time=0.052..0.058 rows=11 loops=1)
                           Index Cond: (listing_id = 91032370)
               ->  Bitmap Heap Scan on listing__tag lt  (cost=35.75..4983.92 rows=1353 width=14) (actual time=11.585..130.841 rows=27272 loops=11)
                     Recheck Cond: (lt.tag_name = x.tag_name)
                     ->  Bitmap Index Scan on listing__tag___tag_name  (cost=0.00..35.41 rows=1353 width=0) (actual time=6.588..6.588 rows=27272 loops=11)
                           Index Cond: (lt.tag_name = x.tag_name)
 Total runtime: 2123.992 ms

2 seconds!
great! kind of.
this really just raised a whole series of questions that are going to
bother me in my future query writing.

1) obvious basic question: why, after raising work_mem to 8 MB, is
the sort still happening on disk? or is it actually not on disk,
despite being reported that way?
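
one probe i can think of (the 64 MB value below is just an
illustrative guess) is to keep raising work_mem and watch the Sort
Method line:

SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT ... ;   -- same query as above
-- an in-memory sort reports something like:
--   Sort Method:  quicksort  Memory: NNNNkB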

2) the Nested Loop time dropped dramatically (basically all of the
time savings)... but i had no warning beforehand that the work_mem
limit was being hit inside that Nested Loop.
is there any way to see which steps require additional disk I/O for
processing... similar to the "external merge" reported for the Sort
Method?
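
the closest things i'm aware of: the trace_sort developer setting
(which logs per-sort spill details to the server log), and, in
releases from 9.0 onward (so not 8.4), per-node buffer counts:

SET trace_sort = on;                      -- details go to the server log
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;   -- 9.0+ syntax only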

3) here's the biggest problem/issue in my brain: work_mem on the 8.2
server was also set to the 1 MB default, but the query ran quite
speedily there!
the full migration will take a while, so there will still be query
development/optimization on one system, and i'd love for those many
hours of testing to be worth something when ported over to the other
system.
in this particular example, the Nested Loop seems to fit in the 1 MB
work_mem space on the 8.2 server, but not on the 8.4 server? does
this seem right to anybody?

4) i thought maybe i was reading the ANALYZE report incorrectly, and
that the true bottleneck in the first 8.4 query was indeed the
on-disk sorting step (which would make sense), and that upping the
work_mem limit alleviated it (which would also make sense).
but then why does the second 8.4 query still report a sort on disk,
despite the giant speed boost?

5) and finally, if indeed the sort is happening on disk in the first
8.4 query, does the query planner consider work_mem as part of the
planning process?
i.e. will a slightly faster plan (in terms of data page reads) be
downgraded if it looks like it'll require a sort that exceeds
work_mem, when a slightly slower plan that's expected to fit in
work_mem is available?
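
a crude way to probe this (values illustrative) would be to toggle
work_mem and diff the plain EXPLAIN output, since the estimates are
made at plan time:

SET work_mem = '1MB';
EXPLAIN SELECT ... ;    -- same query as above
SET work_mem = '256MB';
EXPLAIN SELECT ... ;    -- does Group->Sort flip back to HashAggregate?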

any insights here?

cheers!

-murat
