Re: BUG #18966: Invalid SQL queries hang indefinitely until server restart - Mailing list pgsql-bugs
From | Jérémy Spriet |
---|---|
Subject | Re: BUG #18966: Invalid SQL queries hang indefinitely until server restart |
Date | |
Msg-id | CA+JbTT+aS+gM0dicANUfDWwBTka6FXQzyT5VSMmAmj+2wBsrDw@mail.gmail.com Whole thread Raw |
List | pgsql-bugs |
> What exactly is "invalid" about the troublesome queries? What does
> pg_stat_activity show them as doing? When the problem is occurring,
> can you run EXPLAIN (*not* EXPLAIN ANALYZE) on them, or does that
> hang too?
Yes, they hang when I run EXPLAIN on them, too.
Here is a 'select * from pg_stat_activity' in CSV format when the problem occurred:
datid,datname,pid,leader_pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start,xact_start,query_start,state_change,wait_event_type,wait_event,state,backend_xid,backend_xmin,query_id,query,backend_type
16384,mydb,61,,10,role_superadmin,pg_cron scheduler,,,,2025-06-25 08:15:40.365377+00,,,,Extension,Extension,,,,,,pg_cron launcher
,,60,,,,,,,,2025-06-25 08:15:40.365266+00,,,,Activity,AutoVacuumMain,,,,,,autovacuum launcher
,,63,,10,role_superadmin,,,,,2025-06-25 08:15:40.366251+00,,,,Activity,LogicalLauncherMain,,,,,,logical replication launcher
,,62,,10,role_superadmin,TimescaleDB Background Worker Launcher,,,,2025-06-25 08:15:40.365899+00,,,,Extension,Extension,,,,,,TimescaleDB Background Worker Launcher
16384,mydb,69,,10,role_superadmin,,127.0.0.1,,49760,2025-06-25 08:15:40.518269+00,,2025-06-25 08:15:40.539926+00,2025-06-25 08:15:40.539959+00,Client,ClientRead,idle,,,-8817007929064609608,SELECT pg_advisory_unlock($1),client backend
16384,mydb,168,,16386,role_admin,,,,-1,2025-06-25 08:37:46.868834+00,2025-06-25 08:44:11.558469+00,2025-06-25 08:44:11.558469+00,2025-06-25 08:44:11.558472+00,,,active,,,3749380189022910195,ROLLBACK,client backend
5,postgres,78,,16385,role_monitoring,,127.0.0.1,,49766,2025-06-25 08:15:42.344303+00,,2025-06-25 13:08:42.355812+00,2025-06-25 13:08:42.355985+00,Client,ClientRead,idle,,,4452344162736673220,SELECT * FROM pg_stat_database_conflicts;,client backend
5,postgres,79,,16385,role_monitoring,,127.0.0.1,,49770,2025-06-25 08:15:42.345051+00,,2025-06-25 13:08:42.34076+00,2025-06-25 13:08:42.347132+00,Client,ClientRead,idle,,,8133081727313365291,"SELECT pg_database.datname
,pg_database_size(pg_database.datname)
FROM pg_database;",client backend
16384,mydb,207,,16386,role_admin,,,,-1,2025-06-25 08:55:00.037971+00,2025-06-25 08:55:15.378432+00,2025-06-25 08:55:15.378432+00,2025-06-25 08:55:15.378433+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,719,,16386,role_admin,,,,-1,2025-06-25 11:17:57.910472+00,2025-06-25 11:17:57.932789+00,2025-06-25 11:17:57.94765+00,2025-06-25 11:17:57.947651+00,,,active,,323299,1708853975061214229,SELECT pg_cancel_backend(0);,client backend
16384,mydb,320,,16386,role_admin,,,,-1,2025-06-25 09:44:53.568006+00,2025-06-25 09:44:53.597946+00,2025-06-25 09:44:53.939852+00,2025-06-25 09:44:53.939856+00,,,active,,323299,,"WITH infos_base AS (
SELECT DISTINCT aca, region, departement, libelle_nature
FROM contexte.comptage
WHERE uai = 'S105806'
LIMIT 1
),
derniere_annee AS (
SELECT MAX(annee_sc) AS annee
FROM contexte.comptage
WHERE uai = 'S105806'
)
-- uai_base
SELECT 'uai_base' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc
UNION
-- uai_comp
SELECT 'uai_comp' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN dernier",client backend
16384,mydb,321,,16386,role_admin,,,,-1,2025-06-25 09:45:01.856679+00,2025-06-25 09:47:47.979227+00,2025-06-25 09:47:48.046105+00,2025-06-25 09:47:48.046109+00,,,active,,323299,,"SELECT 'uai_base' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.nb_peoples) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc
LIMIT 1001",client backend
16384,mydb,781,,16386,role_admin,,,,-1,2025-06-25 11:46:45.062895+00,2025-06-25 11:46:45.082754+00,2025-06-25 11:46:45.096873+00,2025-06-25 11:46:45.096874+00,,,active,,,6650430479887907887,BEGIN,client backend
16384,mydb,789,,16386,role_admin,,,,-1,2025-06-25 11:50:10.987069+00,2025-06-25 11:50:15.903804+00,2025-06-25 11:50:15.903804+00,2025-06-25 11:50:15.903807+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,914,,10,role_superadmin,psql,,,-1,2025-06-25 12:29:28.685433+00,2025-06-25 12:31:49.087451+00,2025-06-25 12:31:49.087451+00,2025-06-25 12:31:49.087454+00,,,active,,323329,,select * from bad_table;,client backend
,,53,,,,,,,,2025-06-25 08:15:40.338819+00,,,,Activity,BgWriterHibernate,,,,,,background writer
,,52,,,,,,,,2025-06-25 08:15:40.337776+00,,,,,,,,,,,checkpointer
,,59,,,,,,,,2025-06-25 08:15:40.364229+00,,,,Activity,WalWriterMain,,,,,,walwriter
> pg_stat_activity show them as doing? When the problem is occurring,
> can you run EXPLAIN (*not* EXPLAIN ANALYZE) on them, or does that
> hang too?
Yes, they hang when I run EXPLAIN on them, too.
Here is a 'select * from pg_stat_activity' in CSV format when the problem occurred:
datid,datname,pid,leader_pid,usesysid,usename,application_name,client_addr,client_hostname,client_port,backend_start,xact_start,query_start,state_change,wait_event_type,wait_event,state,backend_xid,backend_xmin,query_id,query,backend_type
16384,mydb,61,,10,role_superadmin,pg_cron scheduler,,,,2025-06-25 08:15:40.365377+00,,,,Extension,Extension,,,,,,pg_cron launcher
,,60,,,,,,,,2025-06-25 08:15:40.365266+00,,,,Activity,AutoVacuumMain,,,,,,autovacuum launcher
,,63,,10,role_superadmin,,,,,2025-06-25 08:15:40.366251+00,,,,Activity,LogicalLauncherMain,,,,,,logical replication launcher
,,62,,10,role_superadmin,TimescaleDB Background Worker Launcher,,,,2025-06-25 08:15:40.365899+00,,,,Extension,Extension,,,,,,TimescaleDB Background Worker Launcher
16384,mydb,69,,10,role_superadmin,,127.0.0.1,,49760,2025-06-25 08:15:40.518269+00,,2025-06-25 08:15:40.539926+00,2025-06-25 08:15:40.539959+00,Client,ClientRead,idle,,,-8817007929064609608,SELECT pg_advisory_unlock($1),client backend
16384,mydb,168,,16386,role_admin,,,,-1,2025-06-25 08:37:46.868834+00,2025-06-25 08:44:11.558469+00,2025-06-25 08:44:11.558469+00,2025-06-25 08:44:11.558472+00,,,active,,,3749380189022910195,ROLLBACK,client backend
5,postgres,78,,16385,role_monitoring,,127.0.0.1,,49766,2025-06-25 08:15:42.344303+00,,2025-06-25 13:08:42.355812+00,2025-06-25 13:08:42.355985+00,Client,ClientRead,idle,,,4452344162736673220,SELECT * FROM pg_stat_database_conflicts;,client backend
5,postgres,79,,16385,role_monitoring,,127.0.0.1,,49770,2025-06-25 08:15:42.345051+00,,2025-06-25 13:08:42.34076+00,2025-06-25 13:08:42.347132+00,Client,ClientRead,idle,,,8133081727313365291,"SELECT pg_database.datname
,pg_database_size(pg_database.datname)
FROM pg_database;",client backend
16384,mydb,207,,16386,role_admin,,,,-1,2025-06-25 08:55:00.037971+00,2025-06-25 08:55:15.378432+00,2025-06-25 08:55:15.378432+00,2025-06-25 08:55:15.378433+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,719,,16386,role_admin,,,,-1,2025-06-25 11:17:57.910472+00,2025-06-25 11:17:57.932789+00,2025-06-25 11:17:57.94765+00,2025-06-25 11:17:57.947651+00,,,active,,323299,1708853975061214229,SELECT pg_cancel_backend(0);,client backend
16384,mydb,320,,16386,role_admin,,,,-1,2025-06-25 09:44:53.568006+00,2025-06-25 09:44:53.597946+00,2025-06-25 09:44:53.939852+00,2025-06-25 09:44:53.939856+00,,,active,,323299,,"WITH infos_base AS (
SELECT DISTINCT aca, region, departement, libelle_nature
FROM contexte.comptage
WHERE uai = 'S105806'
LIMIT 1
),
derniere_annee AS (
SELECT MAX(annee_sc) AS annee
FROM contexte.comptage
WHERE uai = 'S105806'
)
-- uai_base
SELECT 'uai_base' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc
UNION
-- uai_comp
SELECT 'uai_comp' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.effectif) AS nb_peoples
FROM contexte.comptage e
JOIN dernier",client backend
16384,mydb,321,,16386,role_admin,,,,-1,2025-06-25 09:45:01.856679+00,2025-06-25 09:47:47.979227+00,2025-06-25 09:47:48.046105+00,2025-06-25 09:47:48.046109+00,,,active,,323299,,"SELECT 'uai_base' AS type, e.uai AS id,
e.type_etab,
e.libelle_nature,
e.annee_sc,
SUM(e.nb_peoples) AS nb_peoples
FROM contexte.comptage e
JOIN derniere_annee da ON e.annee_sc = da.annee
WHERE e.uai = 'S105806'
AND e.libelle_nature = (SELECT libelle_nature FROM infos_base LIMIT 1)
GROUP BY e.uai, e.type_etab, e.libelle_nature, e.annee_sc
LIMIT 1001",client backend
16384,mydb,781,,16386,role_admin,,,,-1,2025-06-25 11:46:45.062895+00,2025-06-25 11:46:45.082754+00,2025-06-25 11:46:45.096873+00,2025-06-25 11:46:45.096874+00,,,active,,,6650430479887907887,BEGIN,client backend
16384,mydb,789,,16386,role_admin,,,,-1,2025-06-25 11:50:10.987069+00,2025-06-25 11:50:15.903804+00,2025-06-25 11:50:15.903804+00,2025-06-25 11:50:15.903807+00,,,active,,,2064869707185898531,COMMIT,client backend
16384,mydb,914,,10,role_superadmin,psql,,,-1,2025-06-25 12:29:28.685433+00,2025-06-25 12:31:49.087451+00,2025-06-25 12:31:49.087451+00,2025-06-25 12:31:49.087454+00,,,active,,323329,,select * from bad_table;,client backend
,,53,,,,,,,,2025-06-25 08:15:40.338819+00,,,,Activity,BgWriterHibernate,,,,,,background writer
,,52,,,,,,,,2025-06-25 08:15:40.337776+00,,,,,,,,,,,checkpointer
,,59,,,,,,,,2025-06-25 08:15:40.364229+00,,,,Activity,WalWriterMain,,,,,,walwriter
Also, when I checked the lock queries, I saw this:
SELECT pid, locktype, mode, relation::regclass AS table, virtualtransaction, granted FROM pg_locks WHERE relation IS NOT NULL ORDER BY relation, mode;
pid | locktype | mode | table | virtualtransaction | granted
-----+----------+-----------------+--------------------------+--------------------+---------
914 | relation | AccessShareLock | pg_locks | 18/17 | t
321 | relation | AccessShareLock | contexte.comptage | 12/43 | t
320 | relation | AccessShareLock | contexte.comptage | 11/157 | t
(3 rows)
> I don't suppose you can take timescaledb out of the mix to see
> if it's contributing to this.
I will check, but it is not possible to do so easily.
> Yes. One speculation that occurs to me is that the control group settings
> of your container are such that a process that exceeds the memory quota is
> suspended until enough memory becomes free.
The RAM on my Docker was 1GB/4GB when the problem occurred. However, after I continued to run queries, the memory usage increased to 4GB.
Jérémy
SELECT pid, locktype, mode, relation::regclass AS table, virtualtransaction, granted FROM pg_locks WHERE relation IS NOT NULL ORDER BY relation, mode;
pid | locktype | mode | table | virtualtransaction | granted
-----+----------+-----------------+--------------------------+--------------------+---------
914 | relation | AccessShareLock | pg_locks | 18/17 | t
321 | relation | AccessShareLock | contexte.comptage | 12/43 | t
320 | relation | AccessShareLock | contexte.comptage | 11/157 | t
(3 rows)
> I don't suppose you can take timescaledb out of the mix to see
> if it's contributing to this.
I will check, but it is not possible to do so easily.
> Yes. One speculation that occurs to me is that the control group settings
> of your container are such that a process that exceeds the memory quota is
> suspended until enough memory becomes free.
The RAM on my Docker was 1GB/4GB when the problem occurred. However, after I continued to run queries, the memory usage increased to 4GB.
Jérémy
pgsql-bugs by date: