Thread: Postgresql 13 query engine regression

Postgresql 13 query engine regression

From: Jonathan Chen
Hi,

I am running Postgresql 13 as a backend for Odoo, and I believe I have
discovered a regression with the query engine.

This (simplified) query generated by the ORM takes 47.683s to complete
(the result set is empty):
SELECT "account_bank_statement_line".id
FROM "account_bank_statement_line"
LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
  ON ("account_bank_statement_line"."move_id" = "account_bank_statement_line__move_id"."id")
WHERE
(
  ("account_bank_statement_line"."move_id" in
    (
      SELECT "account_move".id
      FROM "account_move"
      WHERE ("account_move"."state" = 'posted')
      AND ("account_move"."company_id" IS NULL OR ("account_move"."company_id" in (1)))
      ORDER BY "account_move"."id"
    )
  )
  AND ("account_bank_statement_line__move_id"."journal_id" = 29)
)
ORDER BY "account_bank_statement_line__move_id"."date" DESC, "account_bank_statement_line"."id" DESC
LIMIT 1

If I remove the "LIMIT 1" on the last line, the query completes in 0.036s.

If I remove the WHERE clause, the query completes in 0.032s.

If I run the original query on Postgresql 12.6 (on a lower spec'd
host), it completes in 0.067s.
-- 
Jonathan Chen <jonc@chen.org.nz>



Re: Postgresql 13 query engine regression

From: Adrian Klaver
On 5/10/21 2:13 PM, Jonathan Chen wrote:
> Hi,
> 
> I am running Postgresql 13 as a backend for Odoo, and I believe I have
> discovered a regression with the query engine.

See here:

https://wiki.postgresql.org/wiki/Slow_Query_Questions

for the information needed to get an answer to this sort of question.
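
In particular, that page asks for the plan captured with the ANALYZE and BUFFERS options (placeholder query shown here):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- your slow query here
```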

> [remainder of original message snipped]


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Postgresql 13 query engine regression

From: Marc Millas
Hi,

Sorry to jump in, but it looks like I have a similar problem on a 12.6 instance.
On a quite simple query, with LIMIT 1000 it takes 27ms, and with LIMIT 10 it takes 145000ms.
Looking at both EXPLAIN ANALYZE plans, there is a huge difference:
with LIMIT 1000, Postgres optimizes the plan, putting the joins in an efficient order and using a hash left join;
with LIMIT 10, there is no such optimization, so it uses an inefficient join order, reads a huge amount of data, and uses a nested loop left join.

Stats have been updated through VACUUM ANALYZE.

If anyone is interested, I can post the plans (in another thread ...)


Marc MILLAS
Senior Architect
+33607850334



On Mon, May 10, 2021 at 11:13 PM Jonathan Chen <jonc@chen.org.nz> wrote:
[original message quoted in full; snipped]


Re: Postgresql 13 query engine regression

From: Tom Lane
Jonathan Chen <jonc@chen.org.nz> writes:
> I am running Postgresql 13 as a backend for Odoo, and I believe I have
> discovered a regression with the query engine.

As Adrian noted, you haven't provided enough detail to let anyone
offer more than generalities.  However, in the spirit of generalities ...

ORDER BY with a small LIMIT is often a hard problem for the planner.
It has to guess whether a "fast start" plan is likely to win out over
a more straightforward plan.  "Fast start" typically looks like "scan
from the end of an index on the ORDER BY column, immediately emitting
any row that satisfies the WHERE conditions, and stop when you have
enough".  The other way generally involves collecting all the rows
satisfying WHERE, sorting them, and emitting the first few sort
outputs.  For small LIMIT, the "fast start" way can win big, by
not having to visit most of the rows nominally satisfying WHERE.
It can also lose big, if most of the rows near the end of the index
turn out not to satisfy the WHERE, so that it has to keep on scanning.
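
To make that concrete, here is a sketch against a hypothetical table (the table and index names are illustrative only, not from this thread):

```sql
-- Hypothetical setup; names are illustrative only.
CREATE TABLE t (id serial PRIMARY KEY, flag boolean, created_at timestamptz);
CREATE INDEX t_created_at_idx ON t (created_at);

-- The same query can execute either way; EXPLAIN shows which was chosen:
--   "fast start":   Limit -> Index Scan Backward using t_created_at_idx,
--                   stopping at the first row where flag is true
--   "collect+sort": Limit -> Sort -> Seq Scan gathering every flagged row
EXPLAIN SELECT id FROM t WHERE flag ORDER BY created_at DESC LIMIT 1;
```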

With the limited facts you've offered, it appears that the v12 planner
chose the right way and v13 didn't, but we can't tell which way is the
one that was faster.  It's unlikely that this is an actual regression,
in the sense of being anything we could fix.  It's at least as likely
that v12 was just lucky, or was right for the wrong reasons.  With an
example like you have here, with hard-to-estimate WHERE conditions,
there's a lot of luck involved ... especially if the WHERE conditions
are at all correlated with the ORDER BY order.  In this case, with
an "IN sub-SELECT" condition that looks suspiciously correlated with
the outer join condition, it's quite likely that the planner is
arriving at a totally off-base guess about how many rows the IN
eliminates.

Rather than blaming the planner for not being lucky, you should be
thinking about what you can do to help it out.  Can you get rid of
the additional join to account_move?  (Note that the one I'm
suspecting is extra is the LEFT JOIN, not the IN.)  I'd also
counsel losing the ORDER BY in the sub-select, as that's either
going to be ignored altogether or force a probably-useless sort,
not to mention creating an optimization barrier.
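
For illustration only, a hand-simplified version might look like this (a sketch: it assumes account_move.id is unique, and that the journal_id condition already forces the LEFT JOIN to behave as an inner join):

```sql
-- Sketch: one join replaces both the LEFT JOIN and the IN sub-select,
-- and the sub-select's ORDER BY (ignored by the outer query) is gone.
SELECT l.id
FROM account_bank_statement_line AS l
JOIN account_move AS m ON m.id = l.move_id
WHERE m.journal_id = 29
  AND m.state = 'posted'
  AND (m.company_id IS NULL OR m.company_id IN (1))
ORDER BY m.date DESC, l.id DESC
LIMIT 1;
```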

            regards, tom lane



Re: Postgresql 13 query engine regression

From: Jonathan Chen
Hi,

On Tue, 11 May 2021 at 10:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
[...]
> As Adrian noted, you haven't provided enough detail to let anyone
> offer more than generalities.

I'll be providing the output of EXPLAIN (ANALYZE, BUFFERS) in a separate post.

> However, in the spirit of generalities ...
[.. helpful explanation snipped ..]

> Rather than blaming the planner for not being lucky, you should be
> thinking about what you can do to help it out.  Can you get rid of
> the additional join to account_move?  (Note that the one I'm
> suspecting is extra is the LEFT JOIN, not the IN.)  I'd also
> counsel losing the ORDER BY in the sub-select, as that's either
> going to be ignored altogether or force a probably-useless sort,
> not to mention creating an optimization barrier.

Unfortunately, the SQL is generated by Odoo's ORM, and as much as I'd
like to be able to 'help it', their code has too many feature-hooks
for me to easily hack ... Odoo has used PostgreSQL as its
database since its inception back in 2008. It is just unfortunate that
it has hit this stumbling block with the latest version of the
planner.

Cheers.
-- 
Jonathan Chen <jonc@chen.org.nz>



Re: Postgresql 13 query engine regression

From: Jonathan Chen
Hi,

Here's a bit more detail:
select version();
                                       version
--------------------------------------------------------------------------------------
 PostgreSQL 13.2 on amd64-portbld-freebsd12.2, compiled by FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2), 64-bit

For reference, the query is (this is generated by the Odoo ORM, with
all its redundant bits):
SELECT "account_bank_statement_line".id
FROM "account_bank_statement_line"
LEFT JOIN "account_move" AS "account_bank_statement_line__move_id"
  ON ("account_bank_statement_line"."move_id" = "account_bank_statement_line__move_id"."id")
WHERE
(
  ("account_bank_statement_line"."move_id" in
    (
      SELECT "account_move".id
      FROM "account_move"
      WHERE ("account_move"."state" = 'posted')
      AND ("account_move"."company_id" IS NULL OR ("account_move"."company_id" in (1)))
      ORDER BY "account_move"."id"
    )
  )
  AND ("account_bank_statement_line__move_id"."journal_id" = 29)
)
ORDER BY "account_bank_statement_line__move_id"."date" DESC, "account_bank_statement_line"."id" DESC
LIMIT 1;

Query plan:
 Limit  (cost=14226.75..15563.99 rows=1 width=8) (actual time=129162.351..129162.409 rows=0 loops=1)
   Buffers: shared hit=97340
   ->  Nested Loop Semi Join  (cost=14226.75..6134771.07 rows=4577 width=8) (actual time=129162.350..129162.408 rows=0 loops=1)
         Join Filter: (account_bank_statement_line.move_id = account_move.id)
         Rows Removed by Join Filter: 1198285616
         Buffers: shared hit=97340
         ->  Gather Merge  (cost=1010.15..20911.67 rows=4577 width=16) (actual time=3.596..24.065 rows=13453 loops=1)
               Workers Planned: 1
               Workers Launched: 1
               Buffers: shared hit=93061
               ->  Incremental Sort  (cost=10.14..19396.75 rows=2692 width=16) (actual time=0.669..34.507 rows=6726 loops=2)
                     Sort Key: account_bank_statement_line__move_id.date DESC, account_bank_statement_line.id DESC
                     Presorted Key: account_bank_statement_line__move_id.date
                     Full-sort Groups: 29  Sort Method: quicksort  Average Memory: 27kB  Peak Memory: 27kB
                     Pre-sorted Groups: 17  Sort Method: quicksort  Average Memory: 30kB  Peak Memory: 30kB
                     Buffers: shared hit=93061
                     Worker 0:  Full-sort Groups: 185  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
                       Pre-sorted Groups: 56  Sort Method: quicksort  Average Memory: 28kB  Peak Memory: 28kB
                     ->  Nested Loop  (cost=0.58..19299.09 rows=2692 width=16) (actual time=0.284..31.917 rows=6726 loops=2)
                           Buffers: shared hit=92996
                           ->  Parallel Index Scan Backward using account_move_date_index on account_move account_bank_statement_line__move_id  (cost=0.29..13152.37 rows=16040 width=8) (actual time=0.271..14.465 rows=13556 loops=2)
                                 Filter: (journal_id = 29)
                                 Rows Removed by Filter: 40958
                                 Buffers: shared hit=25318
                           ->  Index Scan using account_bank_statement_line_move_id_idx on account_bank_statement_line  (cost=0.29..0.37 rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=27112)
                                 Index Cond: (move_id = account_bank_statement_line__move_id.id)
                                 Buffers: shared hit=67678
         ->  Materialize  (cost=13216.60..14771.30 rows=88840 width=4) (actual time=0.003..3.716 rows=89072 loops=13453)
               Buffers: shared hit=4279
               ->  Sort  (cost=13216.60..13438.70 rows=88840 width=4) (actual time=35.560..39.472 rows=89072 loops=1)
                     Sort Key: account_move.id
                     Sort Method: quicksort  Memory: 6906kB
                     Buffers: shared hit=4279
                     ->  Seq Scan on account_move  (cost=0.00..5914.43 rows=88840 width=4) (actual time=0.007..21.195 rows=89072 loops=1)
                           Filter: (((company_id IS NULL) OR (company_id = 1)) AND ((state)::text = 'posted'::text))
                           Rows Removed by Filter: 19957
                           Buffers: shared hit=4279
 Planning:
   Buffers: shared hit=83
 Planning Time: 0.708 ms
 Execution Time: 129162.830 ms

If I remove the "LIMIT 1", the Query plan changes to:
 Sort  (cost=21589.50..21600.94 rows=4577 width=8) (actual time=46.266..46.269 rows=0 loops=1)
   Sort Key: account_bank_statement_line__move_id.date DESC, account_bank_statement_line.id DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=4390
   ->  Hash Join  (cost=16543.33..21311.21 rows=4577 width=8) (actual time=46.263..46.266 rows=0 loops=1)
         Hash Cond: (account_bank_statement_line__move_id.id = account_bank_statement_line.move_id)
         Buffers: shared hit=4390
         ->  Bitmap Heap Scan on account_move account_bank_statement_line__move_id  (cost=635.74..5255.59 rows=27268 width=8) (actual time=1.258..1.259 rows=1 loops=1)
               Recheck Cond: (journal_id = 29)
               Heap Blocks: exact=1
               Buffers: shared hit=108
               ->  Bitmap Index Scan on account_move_sequence_index2  (cost=0.00..628.93 rows=27268 width=0) (actual time=1.125..1.125 rows=27112 loops=1)
                     Index Cond: (journal_id = 29)
                     Buffers: shared hit=107
         ->  Hash  (cost=15678.84..15678.84 rows=18300 width=12) (actual time=44.992..44.994 rows=0 loops=1)
               Buckets: 32768  Batches: 1  Memory Usage: 256kB
               Buffers: shared hit=4282
               ->  Merge Semi Join  (cost=14323.71..15678.84 rows=18300 width=12) (actual time=44.991..44.992 rows=0 loops=1)
                     Merge Cond: (account_bank_statement_line.move_id = account_move.id)
                     Buffers: shared hit=4282
                     ->  Index Scan using account_bank_statement_line_move_id_idx on account_bank_statement_line  (cost=0.29..901.45 rows=18300 width=8) (actual time=0.004..0.004 rows=1 loops=1)
                           Buffers: shared hit=3
                     ->  Sort  (cost=13216.60..13438.70 rows=88840 width=4) (actual time=35.261..39.169 rows=89072 loops=1)
                           Sort Key: account_move.id
                           Sort Method: quicksort  Memory: 6906kB
                           Buffers: shared hit=4279
                           ->  Seq Scan on account_move  (cost=0.00..5914.43 rows=88840 width=4) (actual time=0.006..20.668 rows=89072 loops=1)
                                 Filter: (((company_id IS NULL) OR (company_id = 1)) AND ((state)::text = 'posted'::text))
                                 Rows Removed by Filter: 19957
                                 Buffers: shared hit=4279
 Planning:
   Buffers: shared hit=40
 Planning Time: 0.471 ms
 Execution Time: 46.321 ms

For comparison, the original query on Postgresql 12:
 select version();
                                        version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

The query plan result is:
 Limit  (cost=21228.94..21228.94 rows=1 width=8) (actual time=75.172..75.172 rows=0 loops=1)
   Buffers: shared hit=10127, temp read=153 written=154
   ->  Sort  (cost=21228.94..21240.31 rows=4547 width=8) (actual time=75.171..75.171 rows=0 loops=1)
         Sort Key: account_bank_statement_line__move_id.date DESC, account_bank_statement_line.id DESC
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=10127, temp read=153 written=154
         ->  Hash Join  (cost=16505.50..21206.20 rows=4547 width=8) (actual time=75.167..75.168 rows=0 loops=1)
               Hash Cond: (account_bank_statement_line__move_id.id = account_bank_statement_line.move_id)
               Buffers: shared hit=10127, temp read=153 written=154
               ->  Bitmap Heap Scan on account_move account_bank_statement_line__move_id  (cost=629.42..5183.52 rows=26968 width=8) (actual time=1.253..5.304 rows=27166 loops=1)
                     Recheck Cond: (journal_id = 29)
                     Heap Blocks: exact=1337
                     Buffers: shared hit=1444
                     ->  Bitmap Index Scan on account_move_sequence_index2  (cost=0.00..622.68 rows=26968 width=0) (actual time=1.116..1.116 rows=27166 loops=1)
                           Index Cond: (journal_id = 29)
                           Buffers: shared hit=107
               ->  Hash  (cost=15646.27..15646.27 rows=18385 width=12) (actual time=66.937..66.937 rows=1 loops=1)
                     Buckets: 32768  Batches: 1  Memory Usage: 257kB
                     Buffers: shared hit=8683, temp read=153 written=154
                     ->  Merge Semi Join  (cost=14283.78..15646.27 rows=18385 width=12) (actual time=66.896..66.933 rows=1 loops=1)
                           Merge Cond: (account_bank_statement_line.move_id = account_move.id)
                           Buffers: shared hit=8683, temp read=153 written=154
                           ->  Index Scan using account_bank_statement_line_move_id_idx on account_bank_statement_line  (cost=0.29..907.70 rows=18385 width=8) (actual time=0.009..3.437 rows=18385 loops=1)
                                 Buffers: shared hit=4466
                           ->  Sort  (cost=13173.22..13395.83 rows=89045 width=4) (actual time=45.424..53.377 rows=89103 loops=1)
                                 Sort Key: account_move.id
                                 Sort Method: external merge  Disk: 1224kB
                                 Buffers: shared hit=4217, temp read=153 written=154
                                 ->  Seq Scan on account_move  (cost=0.00..5852.72 rows=89045 width=4) (actual time=0.007..23.127 rows=89108 loops=1)
                                       Filter: (((company_id IS NULL) OR (company_id = 1)) AND ((state)::text = 'posted'::text))
                                       Rows Removed by Filter: 19940
                                       Buffers: shared hit=4217
 Planning Time: 0.495 ms
 Execution Time: 75.544 ms
-- 
Jonathan Chen <jonc@chen.org.nz>



Re: Postgresql 13 query engine regression

From: Jonathan Chen
Hmm, there's a bit of text-mangling.

Let's use https://explain.depesz.com.

PG13 LIMIT 1 : https://explain.depesz.com/s/GFki
PG13 no LIMIT: https://explain.depesz.com/s/U4vR
PG12 LIMIT 1 : https://explain.depesz.com/s/JAp4
-- 
Jonathan Chen <jonc@chen.org.nz>



Re: Postgresql 13 query engine regression

From: David Rowley
On Tue, 11 May 2021 at 11:34, Jonathan Chen <jonc@chen.org.nz> wrote:
> PG13 LIMIT 1 : https://explain.depesz.com/s/GFki
> PG13 no LIMIT: https://explain.depesz.com/s/U4vR
> PG12 LIMIT 1 : https://explain.depesz.com/s/JAp4

The difference is coming from the fact that PostgreSQL 13 has
incremental sort and can use the account_move_date_index to provide
partially sorted input for the ORDER BY clause. PG12 didn't have this
feature, so there was no index that could help getting pre-sorted
input to the ORDER BY.

You'd get along much better if you got rid of the
account_move_date_index index and replaced it with:

CREATE INDEX account_move_date_id_index ON account_move (date,id);

Or instead of removing account_move_date_index, you could add an index such as:

CREATE INDEX account_move_journal_id_date_index ON account_move (journal_id, date, id);

That should allow the query to run a bit more quickly. However, if the
first of the two is fast enough then it might be better to not add too
many extra indexes.
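
One way to double-check that diagnosis (a session-local experiment, not a recommended production setting) is to disable the incremental sort path and re-run the query:

```sql
-- enable_incremental_sort is a planner GUC introduced in PostgreSQL 13.
SET enable_incremental_sort = off;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- re-run the original slow query
RESET enable_incremental_sort;
```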

David



Re: Postgresql 13 query engine regression

From: Jonathan Chen
On Tue, 11 May 2021 at 12:49, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 11 May 2021 at 11:34, Jonathan Chen <jonc@chen.org.nz> wrote:
> > PG13 LIMIT 1 : https://explain.depesz.com/s/GFki
> > PG13 no LIMIT: https://explain.depesz.com/s/U4vR
> > PG12 LIMIT 1 : https://explain.depesz.com/s/JAp4
>
> The difference is coming from the fact that PostgreSQL 13 has
> incremental sort and can use the account_move_date_index to provide
> partially sorted input for the ORDER BY clause. PG12 didn't have this
> feature, so there was no index that could help getting pre-sorted
> input to the ORDER BY.
>
> You'd get along much better if you got rid of the
> account_move_date_index index and replaced it with:
>
> CREATE INDEX account_move_date_id_index ON account_move (date,id);
>
> Or instead of removing account_move_date_index, you could add an index such as:
>
> CREATE INDEX account_move_journal_id_date_index ON account_move (journal_id, date, id);
>
> That should allow the query to run a bit more quickly. However, if the
> first of the two is fast enough then it might be better to not add too
> many extra indexes.

If I drop the index account_move_date_index, the speed of the query
improves to 0.035s. Adding the replacement index on (date, id) or
(journal_id, date, id) increases the query time to over 47s again.

This gives me a bit of something to play with.

Thanks for the insight.
-- 
Jonathan Chen <jonc@chen.org.nz>