does the query planner consider work_mem? - Mailing list pgsql-performance
From: Murat Tasan
Subject: does the query planner consider work_mem?
Msg-id: CA+YV+HxVLPGYRaN9HEfH7UDUDP40g4a=dKRC-NZF+i_a8oVGVg@mail.gmail.com
Responses:
  Re: does the query planner consider work_mem?
  Re: does the query planner consider work_mem?
List: pgsql-performance
hi all - i'm having a bit of trouble with some queries that are running painfully slowly after migrating my database from one machine running PostgreSQL 8.2 to another machine running PostgreSQL 8.4. as far as i can tell, the two *servers* (not the physical machines) are set up nearly identically, and as far as query planning is concerned, the only setting that differs is 'default_statistics_target', which is 10 on the 8.2 server and 100 on the 8.4 server (the respective defaults)... and presumably this should be giving the 8.4 server *more* data to work with for better query plans (AFAIK). (other settings, e.g. the cost estimates for page/row/etc access, are identical between the servers.) i VACUUMed and ANALYZEd both databases (and all tables) prior to running any of the examples below.

i've narrowed some of the queries down to a few simpler test cases and found one example with a huge difference. here's a basic query reproducing the results:

explain analyze
select lt.l_id
from lt
where lt.t_name in (select t_name from lt x where x.l_id = 91032370)
group by lt.l_id;

[YES: i know i can use a join here. i extracted this from a much larger query that originally had 7 joins, and to help the query planner out a bit i managed to isolate one nested query that makes a large performance-increasing difference that the planner was ignoring... for the purposes of exploring this behavior, though, this serves as a nice example, as it seems to force a nested query.]
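(for completeness, the join form i'm alluding to would be a sketch along these lines — note the column names here follow the anonymized query text above (l_id/t_name), which appear in the plans below as listing_id/tag_name on the listing__tag table:)

```sql
-- join form of the same query (sketch; self-join on the tag name)
select lt.l_id
from lt
  join lt x on x.t_name = lt.t_name
where x.l_id = 91032370
group by lt.l_id;
```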
here's the plan on the 8.2 machine:

HashAggregate  (cost=23240.77..23297.35 rows=5658 width=4) (actual time=5515.065..5600.372 rows=279785 loops=1)
  ->  Nested Loop  (cost=3461.24..23226.63 rows=5658 width=4) (actual time=134.604..5276.014 rows=304248 loops=1)
        ->  HashAggregate  (cost=3288.18..3288.19 rows=1 width=13) (actual time=0.097..0.120 rows=11 loops=1)
              ->  Index Scan using listing__tag___listing_id on listing__tag x  (cost=0.00..3286.06 rows=851 width=13) (actual time=0.069..0.084 rows=11 loops=1)
                    Index Cond: (listing_id = 91032370)
        ->  Bitmap Heap Scan on listing__tag lt  (cost=173.06..19867.70 rows=5659 width=17) (actual time=115.275..474.135 rows=27659 loops=11)
              Recheck Cond: (lt.tag_name = x.tag_name)
              ->  Bitmap Index Scan on listing__tag___tag_name  (cost=0.00..171.64 rows=5659 width=0) (actual time=113.595..113.595 rows=27659 loops=11)
                    Index Cond: (lt.tag_name = x.tag_name)
Total runtime: 5615.036 ms

and the same query on the 8.4 machine, trial 1 (almost exactly the same data... just (very) slightly fewer records):

Group  (cost=782153.70..807638.31 rows=516717 width=4) (actual time=184264.479..184434.087 rows=275876 loops=1)
  ->  Sort  (cost=782153.70..794896.00 rows=5096921 width=4) (actual time=184264.476..184353.314 rows=299992 loops=1)
        Sort Key: lt.listing_id
        Sort Method:  external merge  Disk: 4096kB
        ->  Nested Loop  (cost=306.17..5271.26 rows=5096921 width=4) (actual time=126.267..183408.035 rows=299992 loops=1)
              ->  HashAggregate  (cost=270.42..270.43 rows=1 width=10) (actual time=57.744..57.771 rows=11 loops=1)
                    ->  Index Scan using listing__tag___listing_id on listing__tag x  (cost=0.00..270.25 rows=68 width=10) (actual time=57.728..57.731 rows=11 loops=1)
                          Index Cond: (listing_id = 91032370)
              ->  Bitmap Heap Scan on listing__tag lt  (cost=35.75..4983.92 rows=1353 width=14) (actual time=59.723..16652.856 rows=27272 loops=11)
                    Recheck Cond: (lt.tag_name = x.tag_name)
                    ->  Bitmap Index Scan on listing__tag___tag_name  (cost=0.00..35.41 rows=1353 width=0) (actual time=58.036..58.036 rows=27272 loops=11)
                          Index Cond: (lt.tag_name = x.tag_name)
Total runtime: 184455.567 ms

roughly 6 seconds vs 3 minutes! the first thing i noticed was that the nested loop was taking up all the time... though that part of the query plan is identical between the two servers. another noticeable difference is that the row estimates are better in the 8.4 query (estimated around 500,000 vs 5,000, with the actual number of rows around 300,000). the only difference between the query plans is the choice of Group->Sort vs HashAggregate, but both of those operate on the Nested Loop results.

despite seeing all the time being spent in the Nested Loop, i also noticed that the sort was happening on disk... so i upped work_mem from the 1 MB default to 8 MB. i then re-ran the 8.4 query and got these times (for the same query):

Group  (cost=642783.70..668268.31 rows=516717 width=4) (actual time=1946.838..2102.696 rows=275876 loops=1)
  ->  Sort  (cost=642783.70..655526.00 rows=5096921 width=4) (actual time=1946.835..2023.099 rows=299992 loops=1)
        Sort Key: lt.listing_id
        Sort Method:  external merge  Disk: 4096kB
        ->  Nested Loop  (cost=306.17..5271.26 rows=5096921 width=4) (actual time=4.336..1518.962 rows=299992 loops=1)
              ->  HashAggregate  (cost=270.42..270.43 rows=1 width=10) (actual time=0.069..0.089 rows=11 loops=1)
                    ->  Index Scan using listing__tag___listing_id on listing__tag x  (cost=0.00..270.25 rows=68 width=10) (actual time=0.052..0.058 rows=11 loops=1)
                          Index Cond: (listing_id = 91032370)
              ->  Bitmap Heap Scan on listing__tag lt  (cost=35.75..4983.92 rows=1353 width=14) (actual time=11.585..130.841 rows=27272 loops=11)
                    Recheck Cond: (lt.tag_name = x.tag_name)
                    ->  Bitmap Index Scan on listing__tag___tag_name  (cost=0.00..35.41 rows=1353 width=0) (actual time=6.588..6.588 rows=27272 loops=11)
                          Index Cond: (lt.tag_name = x.tag_name)
Total runtime: 2123.992 ms

2 seconds! great! kind of. this really just raised a whole series of questions that are going to bother me for my future query-writing purposes.
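(for anyone reproducing this: work_mem can be changed per-session, so no config edit or server restart is needed to experiment — this SET syntax works on both 8.2 and 8.4; the query below is the test query from above:)

```sql
set work_mem = '8MB';   -- affects only the current session
show work_mem;          -- confirm the new value
explain analyze
select lt.l_id
from lt
where lt.t_name in (select t_name from lt x where x.l_id = 91032370)
group by lt.l_id;
```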
1) obvious basic question: why, after raising work_mem to 8 MB, is the sort still happening on disk? or is it actually happening on disk, despite being reported as such?

2) the Nested Loop time dropped dramatically (essentially all of the time savings)... but i had no warning beforehand that the work_mem limit was being hit inside that Nested Loop. is there any way to see which steps require additional disk I/O for processing... similar to the "external merge" reported for the Sort Method?

3) here's the biggest problem/issue in my brain: work_mem on the 8.2 server was also set to the 1 MB default, yet the query ran quite speedily! the full migration will take a while, so there will still be query development/optimization on one system, and i'd love for those many hours of testing to be worth something when ported over to the other system. in this particular example, the Nested Loop seems to fit in the 1 MB work_mem space on the 8.2 server, but not on the 8.4 server? does this seem right to anybody?

4) i thought maybe i was reading the ANALYZE report incorrectly, and the true bottleneck in the first 8.4 query was indeed the on-disk sorting step (which would make sense), so upping the work_mem limit would alleviate that (which would also make sense). but then why does the second 8.4 query still report a sort on disk, despite the giant speed boost?

5) and finally, if the sort in the first 8.4 query is indeed happening on disk, does the query planner consider work_mem as part of the planning process? i.e. will a slightly faster plan (in terms of data page reads) be downgraded if it looks like it'll require a sort that exceeds work_mem, when a slightly slower plan that's expected to fit in work_mem is available?

any insights here? cheers!

-murat