BUG #18177: certain queries under certain contexts take multiple orders of magnitude longer compared to v10 - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18177: certain queries under certain contexts take multiple orders of magnitude longer compared to v10
Msg-id 18177-a282c2eaaf791f21@postgresql.org
Responses Re: BUG #18177: certain queries under certain contexts take multiple orders of magnitude longer compared to v10  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18177
Logged by:          Robert Leach
Email address:      rleach@princeton.edu
PostgreSQL version: 13.12
Operating system:   macOS and Linux
Description:

We upgraded Django on a large project from 3.2 to 4.2 (and PostgreSQL from
10 to 13) and found that our test suite, which typically took about 25-30
minutes, roughly doubled in running time (50 to 75 minutes).  Switching back
to Django 3.2 while keeping PostgreSQL 13 exhibited the same slow-down.
I isolated one test that normally took 90s and now consistently takes 400s,
and further narrowed it down to a single (simple) query.  That test runs
fast when run by itself, but when run after a series of other tests, it
takes 400s.

I extracted the SQL generated for the query and ran `EXPLAIN ANALYZE` on
it, which showed that the actual time was far longer than the estimated
cost implied.  Running a `VACUUM FULL ANALYZE` restored the speed, but the
only place where I was able to insert it to get the speed back reliably
caused weird errors in other tests.
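For context, this is roughly how I could script the workaround from a test's setup.  This is only a minimal sketch (the table names are the ones from the plan below, and the Django `connection` usage is the standard pattern, not code from our suite); a plain `ANALYZE`, unlike `VACUUM FULL ANALYZE`, takes no exclusive lock, which may avoid disturbing other tests:

```python
# Sketch: refresh PostgreSQL planner statistics between tests. A plain ANALYZE
# (without VACUUM FULL) does not take an exclusive lock, so it is less likely
# to interfere with other tests. Table names are taken from the slow plan.
def analyze_statements(tables):
    """Build one ANALYZE statement per (quoted) table name."""
    return ['ANALYZE "{}";'.format(t) for t in tables]

statements = analyze_statements(["DataRepo_peakdata", "DataRepo_peakgroup"])
# → ['ANALYZE "DataRepo_peakdata";', 'ANALYZE "DataRepo_peakgroup";']

# Executing them against Django's test database would look like:
#   from django.db import connection
#   with connection.cursor() as cursor:
#       for stmt in statements:
#           cursor.execute(stmt)
```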

I cannot claim to understand the bug causing this issue, so the best I can
do is provide the `EXPLAIN ANALYZE` output and try to avoid adding
confusing details, because this is outside the realm of my expertise:

```
Aggregate  (cost=45.77..45.78 rows=1 width=8) (actual time=202326.210..202326.214 rows=1 loops=1)
  ->  Unique  (cost=0.88..45.76 rows=1 width=1052) (actual time=10492.963..202326.205 rows=1 loops=1)
        ->  Nested Loop  (cost=0.88..45.75 rows=1 width=1052) (actual time=10492.959..202326.188 rows=8 loops=1)
              Join Filter: ("DataRepo_peakdata".id = "DataRepo_peakdatalabel".peak_data_id)
              Rows Removed by Join Filter: 3076
              ->  Nested Loop  (cost=0.62..37.47 rows=1 width=8) (actual time=10492.455..202323.935 rows=8 loops=1)
                    Join Filter: ("DataRepo_peakgroup".id = "DataRepo_peakdata".peak_group_id)
                    Rows Removed by Join Filter: 7888
                    ->  Index Scan using "DataRepo_peakdata_peak_group_id_4dd87f4a" on "DataRepo_peakdata"  (cost=0.25..8.26 rows=1 width=8) (actual time=0.017..7.149 rows=7896 loops=1)
                    ->  Nested Loop  (cost=0.38..29.20 rows=1 width=4) (actual time=2.942..25.621 rows=1 loops=7896)
                          Join Filter: ("DataRepo_peakgroup".peak_annotation_file_id = "DataRepo_archivefile".id)
                          ->  Nested Loop  (cost=0.38..18.43 rows=1 width=8) (actual time=2.935..25.614 rows=1 loops=7896)
                                Join Filter: ("DataRepo_msrun".sample_id = "DataRepo_sample".id)
                                Rows Removed by Join Filter: 83
                                ->  Nested Loop  (cost=0.12..10.15 rows=1 width=12) (actual time=0.028..2.513 rows=84 loops=7896)
                                      Join Filter: ("DataRepo_peakgroup".msrun_id = "DataRepo_msrun".id)
                                      Rows Removed by Join Filter: 3486
                                      ->  Index Scan using "DataRepo_peakgroup_peak_annotation_file_id_6dc2fc25" on "DataRepo_peakgroup"  (cost=0.12..8.14 rows=1 width=12) (actual time=0.006..0.464 rows=84 loops=7896)
                                            Filter: ((name)::text = 'serine'::text)
                                            Rows Removed by Filter: 1008
                                      ->  Seq Scan on "DataRepo_msrun"  (cost=0.00..2.00 rows=1 width=8) (actual time=0.013..0.017 rows=42 loops=663264)
                                ->  Index Scan using "DataRepo_sample_pkey" on "DataRepo_sample"  (cost=0.25..8.27 rows=1 width=4) (actual time=0.194..0.275 rows=1 loops=663264)
                                      Filter: ((name)::text = 'xzl5_panc'::text)
                                      Rows Removed by Filter: 281
                          ->  Seq Scan on "DataRepo_archivefile"  (cost=0.00..10.75 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=7896)
                                Filter: ((filename)::text = 'alafasted_cor.xlsx'::text)
              ->  Index Scan using "DataRepo_peakdatalabel_peak_data_id_f6151d4a" on "DataRepo_peakdatalabel"  (cost=0.25..8.27 rows=1 width=12) (actual time=0.007..0.217 rows=386 loops=8)
                    Filter: ((element)::text = 'C'::text)
                    Rows Removed by Filter: 180
Planning Time: 0.963 ms
Execution Time: 202327.334 ms
```
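The misestimate is visible in the index scan on "DataRepo_peakdata": the planner estimated 1 row but got 7,896 back, so it chose nested loops whose inner sides then ran hundreds of thousands of times.  A quick sanity check on those numbers (the parsing helper is just an illustrative sketch, not a PostgreSQL tool):

```python
import re

# Illustrative helper: pull estimated vs actual row counts out of a single
# EXPLAIN ANALYZE line, to show the misestimate driving the nested-loop plan.
def rows_est_vs_actual(line):
    est = int(re.search(r"\(cost=[^)]*\brows=(\d+)", line).group(1))
    act = int(re.search(r"\(actual [^)]*\brows=(\d+)", line).group(1))
    return est, act

scan = ('Index Scan using "DataRepo_peakdata_peak_group_id_4dd87f4a" on '
        '"DataRepo_peakdata"  (cost=0.25..8.26 rows=1 width=8) '
        '(actual time=0.017..7.149 rows=7896 loops=1)')
est, act = rows_est_vs_actual(scan)  # est=1, act=7896

# The error then multiplies: 7,896 outer rows times 84 matching peak-group
# rows drive the inner scans on "DataRepo_msrun" and "DataRepo_sample"
# 663,264 times each, matching the loops=663264 shown in the plan.
assert est == 1 and act == 7896
assert 7896 * 84 == 663264
```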

There is a thread on the Django forum with many more details, but at the
outset of that discussion, I didn't even know what was causing the speed
issue.  For all I knew, it was some other change to our code, but with the
help of the folks on the Django forum, I think this is conclusively due
to PostgreSQL 13:

https://forum.djangoproject.com/t/test-time-doubled-after-django-3-2-4-2-and-postgres-10-13-update/24843/16

If this is a known issue, is there a version of PostgreSQL (after v13) that
resolves it?

