Potential bug introduced in PG17 with query parallelization - plan flip - Mailing list pgsql-hackers

From Jon Jenkins
Subject Potential bug introduced in PG17 with query parallelization - plan flip
Date
Msg-id CAO2j-nrdyx7AbpoRHJ1Tu_NFzCA3gRpqHSu_hN_NwY21K2dB1Q@mail.gmail.com
Whole thread Raw
Responses Re: Potential bug introduced in PG17 with query parallelization - plan flip
List pgsql-hackers
Hello!

My organization GitLab is planning on upgrading to PG17 in the near
future, and we performed a series of automated regression tests
against a set of sample queries. Most of the plans looked identical or
better between 16 and 17, but one that stood out was for the following
query:

SELECT
    COUNT(*)
FROM
    "merge_requests"
WHERE (NOT EXISTS (
        SELECT
            1
        FROM
            "banned_users"
        WHERE (banned_users.user_id = (merge_requests.author_id + 0))))
AND "merge_requests"."target_project_id" = 16567575

(I can provide schemata for the relevant tables if need be.)

In PG16, we the planner suggests a simple index scan:

Aggregate  (cost=2256.86..2256.87 rows=1 width=8) (actual
time=22.899..22.900 rows=1 loops=1)
   Buffers: shared hit=37138
   I/O Timings: read=0.000 write=0.000
   ->  Nested Loop Anti Join  (cost=1.00..2255.90 rows=386 width=0)
(actual time=0.045..22.424 rows=9271 loops=1)
         Buffers: shared hit=37138
         I/O Timings: read=0.000 write=0.000
         ->  Index Scan using index_on_merge_requests_for_latest_diffs
on public.merge_requests  (cost=0.57..1011.36 rows=773 width=4)
(actual time=0.015..10.901 rows=9271 loops=1)
               Index Cond: (merge_requests.target_project_id = 16567575)
               Buffers: shared hit=9322
               I/O Timings: read=0.000 write=0.000
         ->  Index Only Scan using banned_users_pkey on
public.banned_users  (cost=0.43..1.84 rows=1 width=8) (actual
time=0.001..0.001 rows=0 loops=9271)
               Index Cond: (banned_users.user_id =
(merge_requests.author_id + 0))
               Heap Fetches: 0
               Buffers: shared hit=27816
               I/O Timings: read=0.000 write=0.000


However, on 17, a "more expensive" plan is proposed that does a gather
between workers, but largely performs the exact same:

Aggregate  (cost=28112.54..28112.55 rows=1 width=8) (actual
time=17.605..22.876 rows=1 loops=1)
  Buffers: shared hit=36075
  ->  Gather  (cost=28112.42..28112.53 rows=1 width=8) (actual
time=17.347..22.871 rows=2 loops=1)
    Buffers: shared hit=36075
    ->  Aggregate  (cost=27112.42..27112.43 rows=1 width=8) (actual
time=14.414..14.415 rows=1 loops=2)
      Buffers: shared hit=36075
      ->  Nested Loop  (cost=1.00..27102.10 rows=4131 width=0) (actual
time=0.052..14.126 rows=4498 loops=2)
        Buffers: shared hit=36075
        Join Type: Anti
        ->  Index Scan using
index_merge_requests_on_target_project_id_and_created_at_and_id
(cost=0.57..19664.54 rows=8262 width=4) (actual time=0.021..7.890
rows=4498 loops=2)
          Buffers: shared hit=9086
          Index Cond: (merge_requests.target_project_id = 16567575)
        ->  Index Only Scan using banned_users_pkey  (cost=0.43..0.99
rows=1 width=8) (actual time=0.001..0.001 rows=0 loops=8995)
          Buffers: shared hit=26989
          Index Cond: (banned_users.user_id = (merge_requests.author_id + 0))

Planning Time: 0.369 ms
Execution Time: 22.919 ms

(Sorry for the formatting differences, I had to rehydrate this one from JSON)

I don't know enough about planner internals to specifically track this
code, but I did see that this patch has been added to 17 that deals
with parallelizing:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e08d74ca1

Since both queries perform roughly the same in terms of buffers read
and execution time (both DBs have the exact or near-exact same heap
snapshot), I wonder if there's an issue with the cost calculation
method for the more "expensive" plan?

-- 
Jon Jenkins
Senior Backend Engineer | GitLab



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: MinGW compiler warnings in ecpg tests
Next
From: Philip Alger
Date:
Subject: [PATCH] Add pg_get_type_ddl() to retrieve the CREATE TYPE statement