Thread: XX000: tuple concurrently deleted during DROP STATISTICS
I found this in our logs, and reproduced it under v11-v16. CREATE TABLE t(a int, b int); INSERT INTO t SELECT generate_series(1,999); CREATE STATISTICS t_stats ON a,b FROM t; while :; do psql postgres -qtxc "ANALYZE t"; done & while :; do psql postgres -qtxc "begin; DROP STATISTICS t_stats"; done & It's known that concurrent DDL can hit elog(). But in this case, there's only one DDL operation. (gdb) bt #0 0x00000000009442a0 in pg_re_throw () #1 0x0000000000943504 in errfinish () #2 0x00000000004fcafe in simple_heap_delete () #3 0x0000000000639d3f in RemoveStatisticsDataById () #4 0x0000000000639d79 in RemoveStatisticsById () #5 0x000000000057a428 in deleteObjectsInList () #6 0x000000000057a8f0 in performMultipleDeletions () #7 0x000000000060b5ed in RemoveObjects () #8 0x00000000008099ce in ProcessUtilitySlow.isra.1 () #9 0x0000000000808c71 in standard_ProcessUtility () #10 0x00007efbfed7a508 in pgss_ProcessUtility () from /usr/pgsql-16/lib/pg_stat_statements.so #11 0x000000000080745a in PortalRunUtility () #12 0x0000000000807579 in PortalRunMulti () #13 0x00000000008079dc in PortalRun () #14 0x0000000000803927 in exec_simple_query () #15 0x0000000000803f28 in PostgresMain () #16 0x000000000077bae6 in ServerLoop () #17 0x000000000077cbaa in PostmasterMain () #18 0x00000000004ba788 in main () -- Justin
On 11/8/23 16:10, Justin Pryzby wrote: > I found this in our logs, and reproduced it under v11-v16. > > CREATE TABLE t(a int, b int); > INSERT INTO t SELECT generate_series(1,999); > CREATE STATISTICS t_stats ON a,b FROM t; > > while :; do psql postgres -qtxc "ANALYZE t"; done & > while :; do psql postgres -qtxc "begin; DROP STATISTICS t_stats"; done & > > It's known that concurrent DDL can hit elog(). But in this case, > there's only one DDL operation. > AFAICS this happens because store_statext (after ANALYZE builds the new statistics) does this: ---------------------------- /* * Delete the old tuple if it exists, and insert a new one. It's easier * than trying to update or insert, based on various conditions. */ RemoveStatisticsDataById(statOid, inh); /* form and insert a new tuple */ stup = heap_form_tuple(RelationGetDescr(pg_stextdata), values, nulls); CatalogTupleInsert(pg_stextdata, stup); ---------------------------- So it deletes the tuple first (if there's one), and then inserts the new statistics tuple. We could update the tuple instead, but that would be more complex (as the comment explains), and it doesn't actually fix anything because then simple_heap_delete just fails with TM_Updated instead. I think the only solution would be to lock the statistics tuple before running ANALYZE, or something like that. Or maybe we should even lock the statistics object itself, so that ANALYZE and DROP can't run concurrently on it? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 11/8/23 16:10, Justin Pryzby wrote: >> I found this in our logs, and reproduced it under v11-v16. >> >> CREATE TABLE t(a int, b int); >> INSERT INTO t SELECT generate_series(1,999); >> CREATE STATISTICS t_stats ON a,b FROM t; >> >> while :; do psql postgres -qtxc "ANALYZE t"; done & >> while :; do psql postgres -qtxc "begin; DROP STATISTICS t_stats"; done & >> >> It's known that concurrent DDL can hit elog(). But in this case, >> there's only one DDL operation. > AFAICS this happens because store_statext (after ANALYZE builds the new > statistics) does this: Shouldn't DROP STATISTICS be taking a lock on the associated table that is strong enough to lock out ANALYZE? regards, tom lane
On 11/8/23 16:52, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> On 11/8/23 16:10, Justin Pryzby wrote: >>> I found this in our logs, and reproduced it under v11-v16. >>> >>> CREATE TABLE t(a int, b int); >>> INSERT INTO t SELECT generate_series(1,999); >>> CREATE STATISTICS t_stats ON a,b FROM t; >>> >>> while :; do psql postgres -qtxc "ANALYZE t"; done & >>> while :; do psql postgres -qtxc "begin; DROP STATISTICS t_stats"; done & >>> >>> It's known that concurrent DDL can hit elog(). But in this case, >>> there's only one DDL operation. > >> AFAICS this happens because store_statext (after ANALYZE builds the new >> statistics) does this: > > Shouldn't DROP STATISTICS be taking a lock on the associated table > that is strong enough to lock out ANALYZE? > Yes, I think that's the correct thing to do. I recall having a discussion about this with someone while working on the patch, leading to the current code. But I haven't managed to find that particular bit in the archives :-( Anyway, the attached patch should fix this by getting the lock, I think. - RemoveStatisticsById is what gets called drop DROP STATISTICS (or for dependencies), so that's where we get the AE lock - RemoveStatisticsDataById gets called from ANALYZE, so that already should have a lock (so no need to acquire another one) regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > On 11/8/23 16:52, Tom Lane wrote: >> Shouldn't DROP STATISTICS be taking a lock on the associated table >> that is strong enough to lock out ANALYZE? > Yes, I think that's the correct thing to do. I recall having a > discussion about this with someone while working on the patch, leading > to the current code. But I haven't managed to find that particular bit > in the archives :-( > Anyway, the attached patch should fix this by getting the lock, I think. This looks generally correct, but surely we don't need it to be as strong as AccessExclusiveLock? There seems no reason to conflict with ordinary readers/writers of the table. ANALYZE takes ShareUpdateExclusiveLock, and offhand I think this command should do the same. regards, tom lane
On 11/8/23 20:58, Tom Lane wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: >> On 11/8/23 16:52, Tom Lane wrote: >>> Shouldn't DROP STATISTICS be taking a lock on the associated table >>> that is strong enough to lock out ANALYZE? > >> Yes, I think that's the correct thing to do. I recall having a >> discussion about this with someone while working on the patch, leading >> to the current code. But I haven't managed to find that particular bit >> in the archives :-( >> Anyway, the attached patch should fix this by getting the lock, I think. > > This looks generally correct, but surely we don't need it to be as > strong as AccessExclusiveLock? There seems no reason to conflict with > ordinary readers/writers of the table. > > ANALYZE takes ShareUpdateExclusiveLock, and offhand I think this > command should do the same. > Right. I did copy that from DROP TRIGGER code somewhat mindlessly, but you're right this does not need block readers/writers. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I've pushed a cleaned up version of the fix. I had to make some adjustments in the backbranches, because the way we store the analyzed statistics evolved, and RemoveStatisticsById() used to do everything. I ended up introducing RemoveStatisticsDataById() in the backbranches too, but only as a static function - that makes the code much cleaner. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company