Bug in point releases 9.3.6 and 9.2.10? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Bug in point releases 9.3.6 and 9.2.10?
Date
Msg-id CAM3SWZR_fAQVZ+d8oaJV5a62J+te26S4x4Oe0A82z73E_B02JA@mail.gmail.com
Whole thread Raw
Responses Re: Bug in point releases 9.3.6 and 9.2.10?  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
Heroku Postgres runs provisioning code that performs certain actions
on roles when creating a new "fork" of an existing database. This
often causes the new fork to be on the latest point release, where the
database being forked was not.

We want to create a new role when this happens, for various reasons.
This occurs after recovery ends, but before the database has been
"unfenced". The template code that generates various ALTER ROLE
statements in our internal provisioning system - which has apparently
worked just fine for a long time - is:

db.execute("ALTER ROLE #{old_database_user} RENAME TO #{database_user}")
db.execute("ALTER ROLE #{database_user} PASSWORD '#{database_password}' LOGIN")
db.execute("CREATE ROLE \"#{old_database_user}\" PASSWORD
'#{old_database_password}' IN ROLE \"#{database_user}\" LOGIN")

I've seen multiple reports of apparent corruption, appearing as the
resulting ALTER ROLE statements are executed:

PG::DataCorrupted: ERROR: could not read block 0 in file
"global/12811": read only 0 of 8192 bytes
or:
PG::DataCorrupted: ERROR: could not read block 0 in file
"global/12785": read only 0 of 8192 bytes
or:
PG::DataCorrupted: ERROR: could not read block 0 in file
"global/12811": read only 0 of 8192 bytes


The only common factor is that this occurs on the latest point
releases (either 9.3.6 and 9.2.10, at least so far). In all cases I've
seen so far, the relation in question is the pg_auth_members heap
relation. For example:

redacteddb=#  select pg_relation_filenode(oid), oid, relname, relkind
from pg_class where pg_relation_filenode(oid) = 12785;pg_relation_filenode | oid  |     relname     | relkind
----------------------+------+-----------------+---------               12785 | 1261 | pg_auth_members | r
(1 row)


Running "VACUUM FULL pg_auth_members;" has made the problem go away
(to the extent that the above code doesn't trip up and everything is
at least superficially okay) on at least one occasion. I'm currently
investigating how consistently that works as a short term remediation.

Theories?
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Moving Pivotal's Greenplum work upstream
Next
From: Thom Brown
Date:
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)