The following bug has been logged on the website:
Bug reference: 19449
Logged by: Adrian
Email address: adrian.moennich@cern.ch
PostgreSQL version: 18.3
Operating system: Linux
Description:
In Indico (an open source conference mgmt tool which I maintain and develop)
I noticed that a
certain query to gather statistics became extremely slow on newer Postgres
version on our production
database. And with extremely slow I mean 3 hours instead of a few seconds.
To replicate:
$ podman run -it --rm -p 65432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust
--shm-size 8G docker.io/postgres:XX-alpine
$ createdb -h 127.0.0.1 -p 65432 -U postgres test
$ psql -h 127.0.0.1 -p 65432 -U postgres test -f data.sql
$ psql -h 127.0.0.1 -p 65432 -U postgres test -f stats.sql
Likely works fine with Docker as well, or with a non-containerized setup.
I just used podman/containers because of the convenience to run different
Postgres versions.
XX=14: Works fine, even w/o the increased shm-size of the container
XX=15: Works fine but only with the increased shm-size of the container
XX={16,17,17}: Massive CPU and disk usage (tens of gigabytes)
On these simple reproducers I did not keep the query running on 16+.
However, I ran it on a postgres 16.11 instance on our production setup (with
our real database),
and there the query finished only after over 3 hours(!).
This is extreme both in general and compared to the performance we got on
14/15, where the same
query took just a few seconds.
Here are EXPLAIN ANALYZE outputs from when I tested this a few weeks ago on
14 and 16
using our real production database.
https://explain.depesz.com/s/17Fp
https://explain.depesz.com/s/0dHI
For the reproducer above I created a dumbed down version of my real data
which basically just has
the relevant columns, FKs and indexes but no actual data. I'm sharing a link
to the data.sql file
since it's 250 MB uncompressed and still 50 MB compressed.
Structure + dummy data: https://fd.aeum.net/pgperf/data.sql.bz2
Problematic query: https://fd.aeum.net/pgperf/stats.sql
For the sake of having the query here and not just in an external file:
```
EXPLAIN ANALYZE SELECT count(attachments.attachments.id) AS count_1
FROM attachments.attachments
JOIN attachments.folders ON attachments.folders.id =
attachments.attachments.folder_id
JOIN events.events ON events.events.id = attachments.folders.event_id
LEFT OUTER JOIN events.sessions ON events.sessions.id =
attachments.folders.session_id
LEFT OUTER JOIN events.contributions ON events.contributions.id =
attachments.folders.contribution_id
LEFT OUTER JOIN events.subcontributions ON events.subcontributions.id =
attachments.folders.subcontribution_id
LEFT OUTER JOIN events.contributions AS contributions_1 ON
contributions_1.id = events.subcontributions.contribution_id
WHERE attachments.folders.link_type != 1
AND NOT attachments.attachments.is_deleted
AND NOT attachments.folders.is_deleted
AND NOT events.events.is_deleted
AND NOT coalesce(events.sessions.is_deleted,
events.contributions.is_deleted, events.subcontributions.is_deleted, false)
AND (contributions_1.is_deleted IS NULL
OR NOT contributions_1.is_deleted)
```