Thread: PostgreSQL 13 query engine regression
Hi,

I am running PostgreSQL 13 as a backend for Odoo, and I believe I have
discovered a regression with the query engine.

This (simplified) query generated by the ORM takes 47.683s to complete
(the result set is empty):

SELECT "account_bank_statement_line".id
FROM "account_bank_statement_line"
LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
    ON ("account_bank_statement_line"."move_id" =
        "account_bank_statement_line__move_id"."id")
WHERE
    (
        ("account_bank_statement_line"."move_id" in
            (
                SELECT "account_move".id
                FROM "account_move"
                WHERE ("account_move"."state" = 'posted')
                    AND ("account_move"."company_id" IS NULL OR
                        ("account_move"."company_id" in (1)))
                ORDER BY "account_move"."id"
            )
        )
        AND ("account_bank_statement_line__move_id"."journal_id" = 29)
    )
ORDER BY "account_bank_statement_line__move_id"."date" DESC,
    "account_bank_statement_line"."id" DESC
LIMIT 1

If I remove the "LIMIT 1" on the last line, the query completes in 0.036s.

If I remove the WHERE clause, the query completes in 0.032s.

If I run the original query on PostgreSQL 12.6 (on a lower spec'd
host), it completes in 0.067s.
--
Jonathan Chen <jonc@chen.org.nz>
On 5/10/21 2:13 PM, Jonathan Chen wrote:
> Hi,
>
> I am running PostgreSQL 13 as a backend for Odoo, and I believe I have
> discovered a regression with the query engine.

See here: https://wiki.postgresql.org/wiki/Slow_Query_Questions for
the information needed to get an answer to this sort of question.

> This (simplified) query generated by the ORM takes 47.683s to complete
> (the result set is empty):
>
> SELECT "account_bank_statement_line".id
> FROM "account_bank_statement_line"
> LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
>     ON ("account_bank_statement_line"."move_id" =
>         "account_bank_statement_line__move_id"."id")
> WHERE
>     (
>         ("account_bank_statement_line"."move_id" in
>             (
>                 SELECT "account_move".id
>                 FROM "account_move"
>                 WHERE ("account_move"."state" = 'posted')
>                     AND ("account_move"."company_id" IS NULL OR
>                         ("account_move"."company_id" in (1)))
>                 ORDER BY "account_move"."id"
>             )
>         )
>         AND ("account_bank_statement_line__move_id"."journal_id" = 29)
>     )
> ORDER BY "account_bank_statement_line__move_id"."date" DESC,
>     "account_bank_statement_line"."id" DESC
> LIMIT 1
>
> If I remove the "LIMIT 1" on the last line, the query completes in 0.036s.
>
> If I remove the WHERE clause, the query completes in 0.032s.
>
> If I run the original query on PostgreSQL 12.6 (on a lower spec'd
> host), it completes in 0.067s.

--
Adrian Klaver
adrian.klaver@aklaver.com
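For readers following along: below is a minimal sketch of the kind of information that wiki page asks for. These statements are standard PostgreSQL; the VACUUM targets are the two tables from the query above, and the final SELECT is only a stand-in for the full problem query.

SELECT version();                 -- exact server version
SHOW shared_buffers;              -- memory-related settings the wiki asks about
SHOW work_mem;
SHOW effective_cache_size;

-- Make sure planner statistics are current before capturing plans.
VACUUM (ANALYZE) account_bank_statement_line;
VACUUM (ANALYZE) account_move;

-- Capture the actual plan, with timing and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT "account_bank_statement_line".id
FROM "account_bank_statement_line"
LIMIT 1;  -- stand-in: run the full problem query here instead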
Hi,
Sorry to jump in, but it looks like I have a similar problem on a 12.6 instance.

On a quite simple query, with LIMIT 1000 it takes 27ms; with LIMIT 10, it takes 145000ms.

Looking at both EXPLAIN ANALYZE plans, there is a huge difference:

With LIMIT 1000, Postgres optimizes the plan, putting the joins in an efficient order and using a hash left join.

With LIMIT 10: no such optimization, so the joins run in an inefficient order, a huge amount of data is read, and a nested loop left join is used.

Stats have been updated through VACUUM ANALYZE.

If anyone is interested I can post the plans (in another thread...); a sketch of how such a comparison can be captured follows after the quoted mail below.
On Mon, May 10, 2021 at 11:13 PM Jonathan Chen <jonc@chen.org.nz> wrote:
> Hi,
>
> I am running PostgreSQL 13 as a backend for Odoo, and I believe I have
> discovered a regression with the query engine.
>
> This (simplified) query generated by the ORM takes 47.683s to complete
> (the result set is empty):
>
> SELECT "account_bank_statement_line".id
> FROM "account_bank_statement_line"
> LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
>     ON ("account_bank_statement_line"."move_id" =
>         "account_bank_statement_line__move_id"."id")
> WHERE
>     (
>         ("account_bank_statement_line"."move_id" in
>             (
>                 SELECT "account_move".id
>                 FROM "account_move"
>                 WHERE ("account_move"."state" = 'posted')
>                     AND ("account_move"."company_id" IS NULL OR
>                         ("account_move"."company_id" in (1)))
>                 ORDER BY "account_move"."id"
>             )
>         )
>         AND ("account_bank_statement_line__move_id"."journal_id" = 29)
>     )
> ORDER BY "account_bank_statement_line__move_id"."date" DESC,
>     "account_bank_statement_line"."id" DESC
> LIMIT 1
>
> If I remove the "LIMIT 1" on the last line, the query completes in 0.036s.
>
> If I remove the WHERE clause, the query completes in 0.032s.
>
> If I run the original query on PostgreSQL 12.6 (on a lower spec'd
> host), it completes in 0.067s.
> --
> Jonathan Chen <jonc@chen.org.nz>
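A hypothetical illustration of the comparison the previous poster describes (their actual query is not shown, so the table and column names below are invented): capture the same query twice with EXPLAIN (ANALYZE, BUFFERS), varying only the LIMIT, then diff the two plans.

-- Invented example query; only the LIMIT differs between the two runs.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
ORDER BY o.created_at DESC
LIMIT 1000;  -- reported fast: efficient join order, hash left join

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id
FROM orders AS o
LEFT JOIN customers AS c ON c.id = o.customer_id
ORDER BY o.created_at DESC
LIMIT 10;    -- reported slow: nested loop left join reading far more data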
Jonathan Chen <jonc@chen.org.nz> writes:
> I am running PostgreSQL 13 as a backend for Odoo, and I believe I have
> discovered a regression with the query engine.

As Adrian noted, you haven't provided enough detail to let anyone
offer more than generalities. However, in the spirit of generalities ...

ORDER BY with a small LIMIT is often a hard problem for the planner.
It has to guess whether a "fast start" plan is likely to win out over
a more straightforward plan. "Fast start" typically looks like "scan
from the end of an index on the ORDER BY column, immediately emitting
any row that satisfies the WHERE conditions, and stop when you have
enough". The other way generally involves collecting all the rows
satisfying WHERE, sorting them, and emitting the first few sort outputs.

For small LIMIT, the "fast start" way can win big, by not having to
visit most of the rows nominally satisfying WHERE. It can also lose
big, if most of the rows near the end of the index turn out not to
satisfy the WHERE, so that it has to keep on scanning.

With the limited facts you've offered, it appears that the v12 planner
chose the right way and v13 didn't, but we can't tell which way is the
one that was faster.

It's unlikely that this is an actual regression, in the sense of being
anything we could fix. It's at least as likely that v12 was just lucky,
or was right for the wrong reasons. With an example like you have here,
with hard-to-estimate WHERE conditions, there's a lot of luck involved
... especially if the WHERE conditions are at all correlated with the
ORDER BY order. In this case, with an "IN sub-SELECT" condition that
looks suspiciously correlated with the outer join condition, it's quite
likely that the planner is arriving at a totally off-base guess about
how many rows the IN eliminates.

Rather than blaming the planner for not being lucky, you should be
thinking about what you can do to help it out. Can you get rid of the
additional join to account_move? (Note that the one I'm suspecting is
extra is the LEFT JOIN, not the IN.) I'd also counsel losing the ORDER
BY in the sub-select, as that's either going to be ignored altogether
or force a probably-useless sort, not to mention creating an
optimization barrier.

			regards, tom lane
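To make Tom's advice concrete, here is one hypothetical hand-rewrite (a sketch, not something proposed verbatim in the thread). It assumes account_move.id is the primary key; in that case the IN sub-select and the LEFT JOIN both target the same unique key, so the two can be folded into a single inner join, and the sub-select's useless ORDER BY disappears with it.

-- Hypothetical rewrite along Tom's suggestions; assumes account_move.id
-- is unique, so the IN semi-join is equivalent to this inner join.
SELECT sl.id
FROM account_bank_statement_line AS sl
JOIN account_move AS m ON m.id = sl.move_id
WHERE m.state = 'posted'
  AND (m.company_id IS NULL OR m.company_id IN (1))
  AND m.journal_id = 29
ORDER BY m.date DESC, sl.id DESC
LIMIT 1;

With the duplicate scan of account_move gone, the planner should have only one set of selectivity estimates to get right, and no optimization barrier from the sub-select.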
Hi,

On Tue, 11 May 2021 at 10:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[...]
> As Adrian noted, you haven't provided enough detail to let anyone
> offer more than generalities.

I'll be providing the output of EXPLAIN (ANALYZE, BUFFERS) in a separate post.

> However, in the spirit of generalities ...

[.. helpful explanation snipped ..]

> Rather than blaming the planner for not being lucky, you should be
> thinking about what you can do to help it out. Can you get rid of
> the additional join to account_move? (Note that the one I'm
> suspecting is extra is the LEFT JOIN, not the IN.) I'd also
> counsel losing the ORDER BY in the sub-select, as that's either
> going to be ignored altogether or force a probably-useless sort,
> not to mention creating an optimization barrier.

Unfortunately, the SQL is generated by Odoo's ORM, and as much as I'd
like to 'help it out', their code has too many feature-hooks for me to
easily hack ...

Odoo has used PostgreSQL as its database since its inception back in
2008. It is just unfortunate that it has hit this stumbling block with
the latest version of the planner.

Cheers.
--
Jonathan Chen <jonc@chen.org.nz>
Hi,

Here's a bit more detail:

select version();
                                                                       version
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.2 on amd64-portbld-freebsd12.2, compiled by FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2), 64-bit

For reference, the query is (this is generated by the Odoo ORM, with
all its redundant bits):

SELECT "account_bank_statement_line".id
FROM "account_bank_statement_line"
LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
    ON ("account_bank_statement_line"."move_id" =
        "account_bank_statement_line__move_id"."id")
WHERE
    (
        ("account_bank_statement_line"."move_id" in
            (
                SELECT "account_move".id
                FROM "account_move"
                WHERE ("account_move"."state" = 'posted')
                    AND ("account_move"."company_id" IS NULL OR
                        ("account_move"."company_id" in (1)))
                ORDER BY "account_move"."id"
            )
        )
        AND ("account_bank_statement_line__move_id"."journal_id" = 29)
    )
order by "account_bank_statement_line__move_id"."date" DESC,
    "account_bank_statement_line"."id" DESC
LIMIT 1;

Query plan:

Limit  (cost=14226.75..15563.99 rows=1 width=8) (actual time=129162.351..129162.409 rows=0 loops=1)
  Buffers: shared hit=97340
  ->  Nested Loop Semi Join  (cost=14226.75..6134771.07 rows=4577 width=8) (actual time=129162.350..129162.408 rows=0 loops=1)
        Join Filter: (account_bank_statement_line.move_id = account_move.id)
        Rows Removed by Join Filter: 1198285616
        Buffers: shared hit=97340
        ->  Gather Merge  (cost=1010.15..20911.67 rows=4577 width=16) (actual time=3.596..24.065 rows=13453 loops=1)
              Workers Planned: 1
              Workers Launched: 1
              Buffers: shared hit=93061
              ->  Incremental Sort  (cost=10.14..19396.75 rows=2692 width=16) (actual time=0.669..34.507 rows=6726 loops=2)
                    Sort Key: account_bank_statement_line__move_id.date DESC, account_bank_statement_line.id DESC
                    Presorted Key: account_bank_statement_line__move_id.date
                    Full-sort Groups: 29  Sort Method: quicksort  Average Memory: 27kB  Peak Memory: 27kB
                    Pre-sorted Groups: 17  Sort Method: quicksort  Average Memory: 30kB  Peak Memory: 30kB
                    Buffers: shared hit=93061
                    Worker 0:  Full-sort Groups: 185  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
                      Pre-sorted Groups: 56  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
                    ->  Nested Loop  (cost=0.58..19299.09 rows=2692 width=16) (actual time=0.284..31.917 rows=6726 loops=2)
                          Buffers: shared hit=92996
                          ->  Parallel Index Scan Backward using account_move_date_index on account_move account_bank_statement_line__move_id  (cost=0.29..13152.37 rows=16040 width=8) (actual time=0.271..14.465 rows=13556 loops=2)
                                Filter: (journal_id = 29)
                                Rows Removed by Filter: 40958
                                Buffers: shared hit=25318
                          ->  Index Scan using account_bank_statement_line_move_id_idx on account_bank_statement_line  (cost=0.29..0.37 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=27112)
                                Index Cond: (move_id = account_bank_statement_line__move_id.id)
                                Buffers: shared hit=67678
        ->  Materialize  (cost=13216.60..14771.30 rows=88840 width=4) (actual time=0.003..3.716 rows=89072 loops=13453)
              Buffers: shared hit=4279
              ->  Sort  (cost=13216.60..13438.70 rows=88840 width=4) (actual time=35.560..39.472 rows=89072 loops=1)
                    Sort Key: account_move.id
                    Sort Method: quicksort  Memory: 6906kB
                    Buffers: shared hit=4279
                    ->  Seq Scan on account_move  (cost=0.00..5914.43 rows=88840 width=4) (actual time=0.007..21.195 rows=89072 loops=1)
                          Filter: (((company_id IS NULL) OR (company_id = 1)) AND ((state)::text = 'posted'::text))
                          Rows Removed by Filter: 19957
                          Buffers: shared hit=4279
Planning:
  Buffers: shared hit=83
Planning Time: 0.708 ms
Execution Time: 129162.830 ms

If I remove the "LIMIT 1", the query plan changes to:

Sort  (cost=21589.50..21600.94 rows=4577 width=8) (actual time=46.266..46.269 rows=0 loops=1)
  Sort Key: account_bank_statement_line__move_id.date DESC, account_bank_statement_line.id DESC
  Sort Method: quicksort  Memory: 25kB
  Buffers: shared hit=4390
  ->  Hash Join  (cost=16543.33..21311.21 rows=4577 width=8) (actual time=46.263..46.266 rows=0 loops=1)
        Hash Cond: (account_bank_statement_line__move_id.id = account_bank_statement_line.move_id)
        Buffers: shared hit=4390
        ->  Bitmap Heap Scan on account_move account_bank_statement_line__move_id  (cost=635.74..5255.59 rows=27268 width=8) (actual time=1.258..1.259 rows=1 loops=1)
              Recheck Cond: (journal_id = 29)
              Heap Blocks: exact=1
              Buffers: shared hit=108
              ->  Bitmap Index Scan on account_move_sequence_index2  (cost=0.00..628.93 rows=27268 width=0) (actual time=1.125..1.125 rows=27112 loops=1)
                    Index Cond: (journal_id = 29)
                    Buffers: shared hit=107
        ->  Hash  (cost=15678.84..15678.84 rows=18300 width=12) (actual time=44.992..44.994 rows=0 loops=1)
              Buckets: 32768  Batches: 1  Memory Usage: 256kB
              Buffers: shared hit=4282
              ->  Merge Semi Join  (cost=14323.71..15678.84 rows=18300 width=12) (actual time=44.991..44.992 rows=0 loops=1)
                    Merge Cond: (account_bank_statement_line.move_id = account_move.id)
                    Buffers: shared hit=4282
                    ->  Index Scan using account_bank_statement_line_move_id_idx on account_bank_statement_line  (cost=0.29..901.45 rows=18300 width=8) (actual time=0.004..0.004 rows=1 loops=1)
                          Buffers: shared hit=3
                    ->  Sort  (cost=13216.60..13438.70 rows=88840 width=4) (actual time=35.261..39.169 rows=89072 loops=1)
                          Sort Key: account_move.id
                          Sort Method: quicksort  Memory: 6906kB
                          Buffers: shared hit=4279
                          ->  Seq Scan on account_move  (cost=0.00..5914.43 rows=88840 width=4) (actual time=0.006..20.668 rows=89072 loops=1)
                                Filter: (((company_id IS NULL) OR (company_id = 1)) AND ((state)::text = 'posted'::text))
                                Rows Removed by Filter: 19957
                                Buffers: shared hit=4279
Planning:
  Buffers: shared hit=40
Planning Time: 0.471 ms
Execution Time: 46.321 ms

For comparison, the original query on PostgreSQL 12:

select version();
                                                 version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

The query plan result is:

Limit  (cost=21228.94..21228.94 rows=1 width=8) (actual time=75.172..75.172 rows=0 loops=1)
  Buffers: shared hit=10127, temp read=153 written=154
  ->  Sort  (cost=21228.94..21240.31 rows=4547 width=8) (actual time=75.171..75.171 rows=0 loops=1)
        Sort Key: account_bank_statement_line__move_id.date DESC, account_bank_statement_line.id DESC
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=10127, temp read=153 written=154
        ->  Hash Join  (cost=16505.50..21206.20 rows=4547 width=8) (actual time=75.167..75.168 rows=0 loops=1)
              Hash Cond: (account_bank_statement_line__move_id.id = account_bank_statement_line.move_id)
              Buffers: shared hit=10127, temp read=153 written=154
              ->  Bitmap Heap Scan on account_move account_bank_statement_line__move_id  (cost=629.42..5183.52 rows=26968 width=8) (actual time=1.253..5.304 rows=27166 loops=1)
                    Recheck Cond: (journal_id = 29)
                    Heap Blocks: exact=1337
                    Buffers: shared hit=1444
                    ->  Bitmap Index Scan on account_move_sequence_index2  (cost=0.00..622.68 rows=26968 width=0) (actual time=1.116..1.116 rows=27166 loops=1)
                          Index Cond: (journal_id = 29)
                          Buffers: shared hit=107
              ->  Hash  (cost=15646.27..15646.27 rows=18385 width=12) (actual time=66.937..66.937 rows=1 loops=1)
                    Buckets: 32768  Batches: 1  Memory Usage: 257kB
                    Buffers: shared hit=8683, temp read=153 written=154
                    ->  Merge Semi Join  (cost=14283.78..15646.27 rows=18385 width=12) (actual time=66.896..66.933 rows=1 loops=1)
                          Merge Cond: (account_bank_statement_line.move_id = account_move.id)
                          Buffers: shared hit=8683, temp read=153 written=154
                          ->  Index Scan using account_bank_statement_line_move_id_idx on account_bank_statement_line  (cost=0.29..907.70 rows=18385 width=8) (actual time=0.009..3.437 rows=18385 loops=1)
                                Buffers: shared hit=4466
                          ->  Sort  (cost=13173.22..13395.83 rows=89045 width=4) (actual time=45.424..53.377 rows=89103 loops=1)
                                Sort Key: account_move.id
                                Sort Method: external merge  Disk: 1224kB
                                Buffers: shared hit=4217, temp read=153 written=154
                                ->  Seq Scan on account_move  (cost=0.00..5852.72 rows=89045 width=4) (actual time=0.007..23.127 rows=89108 loops=1)
                                      Filter: (((company_id IS NULL) OR (company_id = 1)) AND ((state)::text = 'posted'::text))
                                      Rows Removed by Filter: 19940
                                      Buffers: shared hit=4217
Planning Time: 0.495 ms
Execution Time: 75.544 ms
--
Jonathan Chen <jonc@chen.org.nz>
Hmm, there's a bit of text-mangling. Let's use https://explain.depesz.com:

PG13 LIMIT 1 : https://explain.depesz.com/s/GFki
PG13 no LIMIT: https://explain.depesz.com/s/U4vR
PG12 LIMIT 1 : https://explain.depesz.com/s/JAp4
--
Jonathan Chen <jonc@chen.org.nz>
On Tue, 11 May 2021 at 11:34, Jonathan Chen <jonc@chen.org.nz> wrote:
> PG13 LIMIT 1 : https://explain.depesz.com/s/GFki
> PG13 no LIMIT: https://explain.depesz.com/s/U4vR
> PG12 LIMIT 1 : https://explain.depesz.com/s/JAp4

The difference is coming from the fact that PostgreSQL 13 has
incremental sort and can use the account_move_date_index to provide
partially sorted input for the ORDER BY clause. PG12 didn't have this
feature, so there was no index that could help getting pre-sorted
input to the ORDER BY.

You'd get along much better if you got rid of the
account_move_date_index index and replaced it with:

CREATE INDEX account_move_date_id_index ON account_move (date, id);

Or instead of removing account_move_date_index, you could add an index
such as:

CREATE INDEX account_move_journal_id_date_index ON account_move (journal_id, date, id);

That should allow the query to run a bit more quickly. However, if the
first of the two is fast enough then it might be better to not add too
many extra indexes.

David
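A side note for anyone who wants to test David's diagnosis before touching any indexes: PostgreSQL 13 puts incremental sort behind a planner GUC (enable_incremental_sort, on by default), so it can be switched off for the current session and the plan re-captured.

-- Session-local experiment (not from David's mail): disable incremental
-- sort, re-run the EXPLAIN (ANALYZE, BUFFERS) from upthread, compare
-- the plans, then restore the default.
SET enable_incremental_sort = off;
-- ... re-run the problem query with EXPLAIN (ANALYZE, BUFFERS) here ...
RESET enable_incremental_sort;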
On Tue, 11 May 2021 at 12:49, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 11 May 2021 at 11:34, Jonathan Chen <jonc@chen.org.nz> wrote:
> > PG13 LIMIT 1 : https://explain.depesz.com/s/GFki
> > PG13 no LIMIT: https://explain.depesz.com/s/U4vR
> > PG12 LIMIT 1 : https://explain.depesz.com/s/JAp4
>
> The difference is coming from the fact that PostgreSQL 13 has
> incremental sort and can use the account_move_date_index to provide
> partially sorted input for the ORDER BY clause. PG12 didn't have this
> feature, so there was no index that could help getting pre-sorted
> input to the ORDER BY.
>
> You'd get along much better if you got rid of the
> account_move_date_index index and replaced it with:
>
> CREATE INDEX account_move_date_id_index ON account_move (date, id);
>
> Or instead of removing account_move_date_index, you could add an index
> such as:
>
> CREATE INDEX account_move_journal_id_date_index ON account_move (journal_id, date, id);
>
> That should allow the query to run a bit more quickly. However, if the
> first of the two is fast enough then it might be better to not add too
> many extra indexes.

If I drop the index account_move_date_index, the speed of the query
improves to 0.035s. Adding the replacement index on (date, id) or
(journal_id, date, id) increases the query time to over 47s again.

This gives me a bit of something to play with. Thanks for the insight.
--
Jonathan Chen <jonc@chen.org.nz>
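As a closing note, a general technique (not one suggested in the thread) for trialling an index drop like this before committing to it: DROP INDEX is transactional in PostgreSQL, so it can be rolled back. Be aware it takes an ACCESS EXCLUSIVE lock on the table until the transaction ends, so this is only suitable on a quiet or test system.

BEGIN;
DROP INDEX account_move_date_index;
-- re-run EXPLAIN (ANALYZE, BUFFERS) on the problem query here
ROLLBACK;  -- the rollback restores the index untouched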