
From Jérémy Spriet
Subject Re: BUG #18966: Invalid SQL queries hang indefinitely until server restart
Msg-id CA+JbTT+aS+gM0dicANUfDWwBTka6FXQzyT5VSMmAmj+2wBsrDw@mail.gmail.com
List pgsql-bugs
> What exactly is "invalid" about the troublesome queries?  What does
> pg_stat_activity show them as doing?  When the problem is occurring,
> can you run EXPLAIN (*not* EXPLAIN ANALYZE) on them, or does that
> hang too?
Yes, they hang when I run EXPLAIN on them, too.

Here is the output of 'select * from pg_stat_activity', in CSV format, captured while the problem was occurring:
datid,datname,pid,leader_pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start,xact_start,query_start,state_change,wait_event_type,wait_event,state,backend_xid,backend_xmin,query_id,query,backend_type
16384,mydb,61,,10,role_superadmin,pg_cron scheduler,,,,2025-06-25 08:15:40.365377+00,,,,Extension,Extension,,,,,,pg_cron launcher
,,60,,,,,,,,2025-06-25 08:15:40.365266+00,,,,Activity,AutoVacuumMain,,,,,,autovacuum launcher
,,63,,10,role_superadmin,,,,,2025-06-25 08:15:40.366251+00,,,,Activity,LogicalLauncherMain,,,,,,logical replication launcher
,,62,,10,role_superadmin,TimescaleDB Background Worker Launcher,,,,2025-06-25 08:15:40.365899+00,,,,Extension,Extension,,,,,,TimescaleDB Background Worker Launcher
16384,mydb,69,,10,role_superadmin,,127.0.0.1,,49760,2025-06-25 08:15:40.518269+00,,2025-06-25 08:15:40.539926+00,2025-06-25 08:15:40.539959+00,Client,ClientRead,idle,,,-8817007929064609608,SELECT pg_advisory_unlock($1),client backend
16384,mydb,168,,16386,role_admin,,,,-1,2025-06-25 08:37:46.868834+00,2025-06-25 08:44:11.558469+00,2025-06-25 08:44:11.558469+00,2025-06-25 08:44:11.558472+00,,,active,,,3749380189022910195,ROLLBACK,client backend
5,postgres,78,,16385,role_monitoring,,127.0.0.1,,49766,2025-06-25 08:15:42.344303+00,,2025-06-25 13:08:42.355812+00,2025-06-25 13:08:42.355985+00,Client,ClientRead,idle,,,4452344162736673220,SELECT * FROM pg_stat_database_conflicts;,client backend
5,postgres,79,,16385,role_monitoring,,127.0.0.1,,49770,2025-06-25 08:15:42.345051+00,,2025-06-25 13:08:42.34076+00,2025-06-25 13:08:42.347132+00,Client,ClientRead,idle,,,8133081727313365291,"SELECT pg_database.datname
,pg_database_size(pg_database.datname)
FROM pg_database;",client backend
16384,mydb,207,,16386,role_admin,,,,-1,2025-06-25 08:55:00.037971+00,2025-06-25 08:55:15.378432+00,2025-06-25 08:55:15.378432+00,2025-06-25 08:55:15.378433+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,719,,16386,role_admin,,,,-1,2025-06-25 11:17:57.910472+00,2025-06-25 11:17:57.932789+00,2025-06-25 11:17:57.94765+00,2025-06-25 11:17:57.947651+00,,,active,,323299,1708853975061214229,SELECT pg_cancel_backend(0);,client backend
16384,mydb,320,,16386,role_admin,,,,-1,2025-06-25 09:44:53.568006+00,2025-06-25 09:44:53.597946+00,2025-06-25 09:44:53.939852+00,2025-06-25 09:44:53.939856+00,,,active,,323299,,"WITH infos_base AS (
  SELECT DISTINCT aca, region, departement, libelle_nature
  FROM contexte.comptage
  WHERE uai = 'S105806'
  LIMIT 1
),
derniere_annee AS (
  SELECT MAX(annee_sc) AS annee
  FROM contexte.comptage
  WHERE uai = 'S105806'
)

-- uai_base
SELECT 'uai_base' AS type, e.uai AS id,
       e.type_etab,
       e.libelle_nature,
       e.annee_sc,
       SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
  AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc

UNION

-- uai_comp
SELECT 'uai_comp' AS type, e.uai AS id,
       e.type_etab,
       e.libelle_nature,
       e.annee_sc,
       SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN dernier",client backend
16384,mydb,321,,16386,role_admin,,,,-1,2025-06-25 09:45:01.856679+00,2025-06-25 09:47:47.979227+00,2025-06-25 09:47:48.046105+00,2025-06-25 09:47:48.046109+00,,,active,,323299,,"SELECT 'uai_base' AS type, e.uai AS id,
       e.type_etab,
       e.libelle_nature,
       e.annee_sc,
       SUM(e.nb_peoples) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
  AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc
LIMIT 1001",client backend
16384,mydb,781,,16386,role_admin,,,,-1,2025-06-25 11:46:45.062895+00,2025-06-25 11:46:45.082754+00,2025-06-25 11:46:45.096873+00,2025-06-25 11:46:45.096874+00,,,active,,,6650430479887907887,BEGIN,client backend
16384,mydb,789,,16386,role_admin,,,,-1,2025-06-25 11:50:10.987069+00,2025-06-25 11:50:15.903804+00,2025-06-25 11:50:15.903804+00,2025-06-25 11:50:15.903807+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,914,,10,role_superadmin,psql,,,-1,2025-06-25 12:29:28.685433+00,2025-06-25 12:31:49.087451+00,2025-06-25 12:31:49.087451+00,2025-06-25 12:31:49.087454+00,,,active,,323329,,select * from bad_table;,client backend
,,53,,,,,,,,2025-06-25 08:15:40.338819+00,,,,Activity,BgWriterHibernate,,,,,,background writer
,,52,,,,,,,,2025-06-25 08:15:40.337776+00,,,,,,,,,,,checkpointer
,,59,,,,,,,,2025-06-25 08:15:40.364229+00,,,,Activity,WalWriterMain,,,,,,walwriter
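
In case the full dump is hard to read, this is the shorter query I plan to run the next time it happens, to list only the active backends and how long their current statement has been running (the column choice is just my guess at what is useful here):

SELECT pid, usename, state, wait_event_type, wait_event,
       now() - query_start AS running_for,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
ORDER BY query_start;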

Also, when I queried the locks, I saw this:
SELECT pid, locktype, mode, relation::regclass AS table, virtualtransaction, granted FROM pg_locks WHERE relation IS NOT NULL ORDER BY relation, mode;
 pid | locktype |      mode       |          table           | virtualtransaction | granted
-----+----------+-----------------+--------------------------+--------------------+---------
 914 | relation | AccessShareLock | pg_locks                 | 18/17              | t
 321 | relation | AccessShareLock | contexte.comptage        | 12/43              | t
 320 | relation | AccessShareLock | contexte.comptage        | 11/157             | t
(3 rows)
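
I did not capture it at the time, but next time I will also run the following to check whether any backend is reported as waiting on another one:

SELECT pid, pg_blocking_pids(pid) AS blocked_by,
       wait_event_type, wait_event, state
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;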

> I don't suppose you can take timescaledb out of the mix to see
> if it's contributing to this.
I will check, but it is not possible to do so easily.
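In the meantime, here is the query I can run to report exactly which extensions and versions are installed, in case that helps:

SELECT extname, extversion FROM pg_extension ORDER BY extname;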

> Yes.  One speculation that occurs to me is that the control group settings
> of your container are such that a process that exceeds the memory quota is
> suspended until enough memory becomes free.
The container was using 1 GB of its 4 GB of RAM when the problem occurred. However, as I kept running queries, memory usage rose to the full 4 GB.
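
For reference, the server-side memory settings can be read with the query below (this only covers PostgreSQL's own settings, not the Docker/cgroup limit itself):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem', 'max_connections');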

Jérémy
