Thread: First-draft release notes for next week's releases

First-draft release notes for next week's releases

From: Tom Lane

First-draft release notes are committed, and should be visible at
http://www.postgresql.org/docs/devel/static/release-9-3-4.html
once guaibasaurus does its next buildfarm run a few minutes from
now.  Any suggestions?
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Josh Berkus

On 03/15/2014 01:02 PM, Tom Lane wrote:
> First-draft release notes are committed, and should be visible at
> http://www.postgresql.org/docs/devel/static/release-9-3-4.html
> once guaibasaurus does its next buildfarm run a few minutes from
> now.  Any suggestions?

Hmmm, not sure I like this.  It's complicated without being complete,
and supplies just enough information to get someone into trouble:

Also, the error fixed in the second changelog entry below could have
caused some bloat in statistics data. Users who have done many DROP
DATABASE commands since upgrading to 9.3 may wish to manually remove
files in $PGDATA/pg_stat_tmp (or $PGDATA/pg_stat if the server is not
running) that have old modification times and do not correspond to any
database OID present in $PGDATA/base. If you do this, note that the file
db_0.stat is a valid file even though it does not correspond to any
$PGDATA/base subdirectory.

I kind of think that either we should provide complete instructions
(which would be about 3/4 of a page), or provide limited instructions
and assume the only users who will do this are ones who already
understand pg_stat (a reasonable assumption in my opinion).  My
suggestion is to move the advice paragraph from E.1.1 to the individual
fix entry in E.1.2, and change it to this:


* Remove the correct per-database statistics file during DROP DATABASE
(Tomas Vondra)

This fix prevents a permanent leak of statistics file space.

Users who have done many DROP DATABASE commands in PostgreSQL 9.3 may
wish to examine their statistics directory for files that do not
correspond to any existing database, and delete them.  Please note
that db_0.stat is a needed statistics file.
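
(As an aside for anyone acting on this advice: the per-database files
are named db_<OID>.stat, so the set of valid OIDs can be listed with a
query like the one below and compared against the directory contents.
A minimal sketch, not proposed release-note text:)

    -- List the OIDs of existing databases; a db_<OID>.stat file that
    -- matches none of these is a leftover candidate, except
    -- db_0.stat, which is valid as noted above.
    SELECT oid, datname FROM pg_database ORDER BY oid;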





-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: First-draft release notes for next week's releases

From: Greg Stark

This is not really accurate:

"This error allowed multiple versions of the same row to become
visible to queries, resulting in apparent duplicates. Since the error
is in WAL replay, it would only manifest during crash recovery or on
standby servers."

I think the idea is coming from what the second sentence below is
getting at but it may be too complex to explain in a release note:

The error causes some rows to disappear from indexes resulting in
inconsistent query results on a hot standby depending on whether
indexes are used. If the standby is subsequently activated or if it
occurs during recovery after a crash or backup restore it could result
in unique constraint violations as well.

I would consider adding something like "For the problem to occur a
foreign key from another table must exist and a new row must be added
to that other table around the same time (possibly in the same
transaction) as an update to the referenced row" That would help
people judge whether their databases are vulnerable. If they don't
have foreign keys or if they have a coding pattern that causes this to
happen regularly then they should be able to figure that out and
possibly disable them if they can't update promptly.
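
(A hedged sketch of the pattern described above, with invented table
names; the FK existence check row-locks the referenced row, which is
what sets up the problematic WAL record:)

    CREATE TABLE parent (id int PRIMARY KEY, info text);
    CREATE TABLE child  (id int PRIMARY KEY,
                         parent_id int REFERENCES parent (id));

    -- Session 1: update the referenced row (not yet committed).
    BEGIN;
    UPDATE parent SET info = 'changed' WHERE id = 1;

    -- Session 2, around the same time: insert a referencing row.
    -- The FK check takes a row lock on the already-updated parent row.
    BEGIN;
    INSERT INTO child VALUES (100, 1);

    -- Both sessions then commit.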



Re: First-draft release notes for next week's releases

From: Josh Berkus

On 03/16/2014 12:32 PM, Greg Stark wrote:
> I would consider adding something like "For the problem to occur a
> foreign key from another table must exist and a new row must be added
> to that other table around the same time (possibly in the same
> transaction) as an update to the referenced row" That would help
> people judge whether their databases are vulnerable. If they don't
> have foreign keys or if they have a coding pattern that causes this to
> happen regularly then they should be able to figure that out and
> possibly disable them if they can't update promptly.

I don't think that will actually help people know whether they're
vulnerable without a longer explanation.

It's starting to sound like we need a wiki page for this release?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: First-draft release notes for next week's releases

From: Tom Lane

Greg Stark <stark@mit.edu> writes:
> This is not really accurate:
> "This error allowed multiple versions of the same row to become
> visible to queries, resulting in apparent duplicates. Since the error
> is in WAL replay, it would only manifest during crash recovery or on
> standby servers."

> I think the idea is coming from what the second sentence below is
> getting at but it may be too complex to explain in a release note:

> The error causes some rows to disappear from indexes resulting in
> inconsistent query results on a hot standby depending on whether
> indexes are used. If the standby is subsequently activated or if it
> occurs during recovery after a crash or backup restore it could result
> in unique constraint violations as well.

Hm ... "rows disappearing from indexes" might make people think that
they could fix or mitigate the damage via REINDEX.  That's not really
true though is it?  It looks to me like IndexBuildHeapScan will suffer
an Assert failure in an assert-enabled build, or build a bogus index
if not assert-enabled, when it comes across a "heap-only" tuple that
has no parent.

I'm thinking we'd better promote that Assert to a normal runtime elog.
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Josh Berkus

On 03/17/2014 08:28 AM, Tom Lane wrote:
> Greg Stark <stark@mit.edu> writes:
>> The error causes some rows to disappear from indexes resulting in
>> inconsistent query results on a hot standby depending on whether
>> indexes are used. If the standby is subsequently activated or if it
>> occurs during recovery after a crash or backup restore it could result
>> in unique constraint violations as well.
> 
> Hm ... "rows disappearing from indexes" might make people think that
> they could fix or mitigate the damage via REINDEX.  That's not really
> true though is it?  It looks to me like IndexBuildHeapScan will suffer
> an Assert failure in an assert-enabled build, or build a bogus index
> if not assert-enabled, when it comes across a "heap-only" tuple that
> has no parent.

First, see suggested text in my first-draft release announcement.

Second, if a user has encountered this kind of data corruption on their
master (due to crash recovery), how exactly *do* they fix it?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 10:03:52 -0700, Josh Berkus wrote:
> On 03/17/2014 08:28 AM, Tom Lane wrote:
> > Greg Stark <stark@mit.edu> writes:
> >> The error causes some rows to disappear from indexes resulting in
> >> inconsistent query results on a hot standby depending on whether
> >> indexes are used. If the standby is subsequently activated or if it
> >> occurs during recovery after a crash or backup restore it could result
> >> in unique constraint violations as well.
> > 
> > Hm ... "rows disappearing from indexes" might make people think that
> > they could fix or mitigate the damage via REINDEX.  That's not really
> > true though is it?  It looks to me like IndexBuildHeapScan will suffer
> > an Assert failure in an assert-enabled build, or build a bogus index
> > if not assert-enabled, when it comes across a "heap-only" tuple that
> > has no parent.
> 
> First, see suggested text in my first-draft release announcement.

I don't think that text is any better, it's imo even wrong:
"The bug causes rows to vanish from indexes during recovery due to
simultaneous updates of rows on both sides of a foreign key."

Neither is a foreign key, nor simultaneous updates, nor both sides a
prerequisite.

> Second, if a user has encountered this kind of data corruption on their
> master (due to crash recovery), how exactly *do* they fix it?

Dump/restore is the most obvious candidate.  The next best thing I can
think of is a no-op rewriting ALTER TABLE, which doesn't deal with ctid
chains IIRC, in contrast to CLUSTER/VACUUM FULL.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-15 16:02:19 -0400, Tom Lane wrote:
> First-draft release notes are committed, and should be visible at
> http://www.postgresql.org/docs/devel/static/release-9-3-4.html
> once guaibasaurus does its next buildfarm run a few minutes from
> now.  Any suggestions?

So, the current text is:
"This error allowed multiple versions of the same row to become visible
to queries, resulting in apparent duplicates. Since the error is in WAL
replay, it would only manifest during crash recovery or on standby
servers."

what about:

The most prominent consequence of this bug is that rows can appear not
to exist when accessed via an index, while still being visible in
sequential scans.  This in turn can lead to constraints, including
unique and foreign-key ones, being violated later on.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-03-17 10:03:52 -0700, Josh Berkus wrote:
>> First, see suggested text in my first-draft release announcement.

> I don't think that text is any better, it's imo even wrong:
> "The bug causes rows to vanish from indexes during recovery due to
> simultaneous updates of rows on both sides of a foreign key."

> Neither is a foreign key, nor simultaneous updates, nor both sides a
> prerequisite.

What I've got at the moment is
     This error caused updated rows to disappear from indexes, resulting
     in inconsistent query results depending on whether an index scan was
     used.  Subsequent processing could result in unique-key violations,
     since the previously updated row would not be found by later index
     searches.  Since this error is in WAL replay, it would only manifest
     during crash recovery or on standby servers.  The improperly-replayed
     case can arise when a table row that is referenced by a foreign-key
     constraint is updated concurrently with creation of a referencing row.

OK, or not?  The time window for bikeshedding this is dwindling rapidly.
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 13:42:59 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-03-17 10:03:52 -0700, Josh Berkus wrote:
> >> First, see suggested text in my first-draft release announcement.
> 
> > I don't think that text is any better, it's imo even wrong:
> > "The bug causes rows to vanish from indexes during recovery due to
> > simultaneous updates of rows on both sides of a foreign key."
> 
> > Neither is a foreign key, nor simultaneous updates, nor both sides a
> > prerequisite.
> 
> What I've got at the moment is
> 
>       This error caused updated rows to disappear from indexes, resulting
>       in inconsistent query results depending on whether an index scan was
>       used.  Subsequent processing could result in unique-key violations,
>       since the previously updated row would not be found by later index
>       searches.  Since this error is in WAL replay, it would only manifest
>       during crash recovery or on standby servers.  The improperly-replayed
>       case can arise when a table row that is referenced by a foreign-key
>       constraint is updated concurrently with creation of a
>       referencing row.
> 
> OK, or not?  The time window for bikeshedding this is dwindling rapidly.

That's much better, yes. Two things:

* I'd change the warning about unique key violations into a more
  general one about constraints.  Foreign key and exclusion constraints
  are also affected...

* I wonder if we should make the possible origins a bit more general as
  it's perfectly possible to trigger the problem without foreign keys.
  Maybe: "can arise when a table row that has been updated is row
  locked; that can e.g. happen when foreign keys are used."

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 11:28:45 -0400, Tom Lane wrote:
> Hm ... "rows disappearing from indexes" might make people think that
> they could fix or mitigate the damage via REINDEX.

Good point. I guess in some cases it will end up working because
VACUUM/hot pruning have cleaned up the mess, but that's certainly not
something I'd want to rely upon. They very well could have messed up
things when presented with bogus input data.

>  That's not really
> true though is it?  It looks to me like IndexBuildHeapScan will suffer
> an Assert failure in an assert-enabled build, or build a bogus index
> if not assert-enabled, when it comes across a "heap-only" tuple that
> has no parent.
> 
> I'm thinking we'd better promote that Assert to a normal runtime elog.

I wonder if we also should make rewriteheap.c warn about such
things. Although I don't immediately see a trivial way to do so.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> That's much better, yes. Two things:

> * I'd change the warning about unique key violations into a more general
>   one about constraints.  Foreign key and exclusion constraints are also
>   affected...

I'll see what I can do.

> * I wonder if we should make the possible origins a bit more
>   general as it's perfectly possible to trigger the problem without
>   foreign keys. Maybe: "can arise when a table row that has been updated
>   is row locked; that can e.g. happen when foreign keys are used."

IIUC, this case only occurs when using the new-in-9.3 types of
nonexclusive row locks.  I'm willing to bet that the number of
applications using those is negligible; so I think it's all right to not
mention that case explicitly, as long as the wording doesn't say that
foreign keys are the *only* cause (which I didn't).
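
(For reference, the new-in-9.3 nonexclusive row-lock modes are the two
below; FOR KEY SHARE is also what foreign-key checks acquire
internally.  Table name invented for illustration:)

    SELECT * FROM parent WHERE id = 1 FOR KEY SHARE;     -- new in 9.3
    SELECT * FROM parent WHERE id = 1 FOR NO KEY UPDATE; -- new in 9.3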
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > * I wonder if we should make the possible origins a bit more
> >   general as it's perfectly possible to trigger the problem without
> >   foreign keys. Maybe: "can arise when a table row that has been updated
> >   is row locked; that can e.g. happen when foreign keys are used."
> 
> IIUC, this case only occurs when using the new-in-9.3 types of
> nonexclusive row locks.  I'm willing to bet that the number of
> applications using those is negligible; so I think it's all right to not
> mention that case explicitly, as long as the wording doesn't say that
> foreign keys are the *only* cause (which I didn't).

I actually think the issue could also occur with row locks of other
severities (is that the correct term?). Alvaro probably knows better,
but if I see correctly it's also triggerable if a backend waits for an
updating transaction to finish and follow_updates = true is passed to
heap_lock_tuple(). Which e.g. nodeLockRows.c does...

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
>> IIUC, this case only occurs when using the new-in-9.3 types of
>> nonexclusive row locks.  I'm willing to bet that the number of
>> applications using those is negligible; so I think it's all right to not
>> mention that case explicitly, as long as the wording doesn't say that
>> foreign keys are the *only* cause (which I didn't).

> I actually think the issue could also occur with row locks of other
> severities (is that the correct term?).

The commit log entry says

    We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
    WAL replay of a tuple-lock operation, which is incorrect when the tuple
    is already updated.

    Back-patch to 9.3.  The clearing of both header elements was there
    previously, but since no update could be present on a tuple that was
    being locked, it was harmless.

which I read to mean that the case can't occur with the types of row
locks that were allowed pre-9.3.

> but if I see correctly it's also triggerable if a backend waits for an
> updating transaction to finish and follow_updates = true is passed to
> heap_lock_tuple(). Which e.g. nodeLockRows.c does...

That sounds backwards.  nodeLockRows locks the latest tuple in the chain,
so it can't be subject to this.
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 14:16:41 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
> >> IIUC, this case only occurs when using the new-in-9.3 types of
> >> nonexclusive row locks.  I'm willing to bet that the number of
> >> applications using those is negligible; so I think it's all right to not
> >> mention that case explicitly, as long as the wording doesn't say that
> >> foreign keys are the *only* cause (which I didn't).
> 
> > I actually think the issue could also occur with row locks of other
> > severities (is that the correct term?).
> 
> The commit log entry says
>     
>     We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
>     WAL replay of a tuple-lock operation, which is incorrect when the tuple
>     is already updated.
>     
>     Back-patch to 9.3.  The clearing of both header elements was there
>     previously, but since no update could be present on a tuple that was
>     being locked, it was harmless.
> 
> which I read to mean that the case can't occur with the types of row
> locks that were allowed pre-9.3.

That's not an unreasonable interpretation of the commit message, but I
don't think it's correct with respect to the code :(

> > but if I see correctly it's also triggerable if a backend waits for an
> > updating transaction to finish and follow_updates = true is passed to
> > heap_lock_tuple(). Which e.g. nodeLockRows.c does...
> 
> That sounds backwards.  nodeLockRows locks the latest tuple in the chain,
> so it can't be subject to this.

Hm, I don't see anything in the code preventing it, that's the
lock_tuple() before the EPQ stuff... in ExecLockRows():

    foreach(lc, node->lr_arowMarks)
    {
        test = heap_lock_tuple(erm->relation, &tuple,
                               estate->es_output_cid,
                               lockmode, erm->noWait, true,
                               &buffer, &hufd);
        ReleaseBuffer(buffer);
        switch (test)
        {
            case HeapTupleSelfUpdated:
...

the true passed to heap_lock_tuple() is the follow_updates
parameter.  And then in heap_lock_tuple():

    if (require_sleep)
    {
        if (infomask & HEAP_XMAX_IS_MULTI)
        {
...
            /* if there are updates, follow the update chain */
            if (follow_updates &&
                !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
            {
                HTSU_Result res;

                res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
                                              GetCurrentTransactionId(),
...
        else
        {
            /* wait for regular transaction to end */
            if (nowait)
            {
                if (!ConditionalXactLockTableWait(xwait))
                    ereport(ERROR,
                            (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                             errmsg("could not obtain lock on row in relation \"%s\"",
                                    RelationGetRelationName(relation))));
            }
            else
                XactLockTableWait(xwait);

            /* if there are updates, follow the update chain */
            if (follow_updates &&
                !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
...

    if (RelationNeedsWAL(relation))
    {
        xl_heap_lock xlrec;
        XLogRecPtr    recptr;
        XLogRecData rdata[2];

        xlrec.target.node = relation->rd_node;
        xlrec.target.tid = tuple->t_self;
...

To me that looks sufficient to trigger the bug, because we're issuing a
WAL record about the row that was passed to heap_lock_tuple(), not the
latest one in the ctid chain.  When replaying that record, it will reset
the t_ctid field, thus breaking the chain.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> To me that looks sufficient to trigger the bug, because we're issuing a
> WAL record about the row that was passed to heap_lock_tuple(), not the
> latest one in the ctid chain.  When replaying that record, it will reset
> the t_ctid field, thus breaking the chain.

[ scratches head ... ]  If that's what's happening, isn't it a bug in
itself?  Surely the WAL record ought to point at the tuple that was
locked.
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 14:29:56 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > To me that looks sufficient to trigger the bug, because we're issuing a
> > WAL record about the row that was passed to heap_lock_tuple(), not the
> > latest one in the ctid chain.  When replaying that record, it will reset
> > the t_ctid field, thus breaking the chain.
> 
> [ scratches head ... ]  If that's what's happening, isn't it a bug in
> itself?  Surely the WAL record ought to point at the tuple that was
> locked.

There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple
version, emitted by heap_lock_updated_tuple_rec().  This really is
mind-bendingly complex :(.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-03-17 14:29:56 -0400, Tom Lane wrote:
>> [ scratches head ... ]  If that's what's happening, isn't it a bug in
>> itself?  Surely the WAL record ought to point at the tuple that was
>> locked.

> There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple
> version, emitted by heap_lock_updated_tuple_rec().  This really is
> mind-bendingly complex :(.

Ah, I see; so only the original tuple in the chain is at risk?

How about this:
     This error caused updated rows to not be found by index scans,
     resulting in inconsistent query results depending on whether an
     index scan was used.  Subsequent processing could result in
     constraint violations, since the previously updated row would not
     be found by later index searches, thus possibly allowing
     conflicting rows to be inserted.  Since this error is in WAL
     replay, it would only manifest during crash recovery or on standby
     servers.  The improperly-replayed case most commonly arises when a
     table row that is referenced by a foreign-key constraint is
     updated concurrently with creation of a referencing row; but it
     can also occur when any variant of <command>SELECT FOR UPDATE</>
     is applied to a row that is being concurrently updated.
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 14:52:25 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2014-03-17 14:29:56 -0400, Tom Lane wrote:
> >> [ scratches head ... ]  If that's what's happening, isn't it a bug in
> >> itself?  Surely the WAL record ought to point at the tuple that was
> >> locked.
> 
> > There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple
> > version, emitted by heap_lock_updated_tuple_rec(). This really is mind
> > bendingly complex :(.
> 
> Ah, I see; so only the original tuple in the chain is at risk?

Depending on what you define the "original tuple in the chain" to
be.  No, if you happen to mean the root tuple of a ctid chain or
similar, which I guess you didn't.  Yes, if you mean the tuple passed
to heap_lock_tuple().  heap_xlog_lock_updated() looks (and has looked)
correct.

> How about this:

Sounds good to me.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Alvaro Herrera

Andres Freund wrote:
> On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > * I wonder if we should make the possible origins a bit more
> > >   general as it's perfectly possible to trigger the problem without
> > >   foreign keys. Maybe: "can arise when a table row that has been updated
> > >   is row locked; that can e.g. happen when foreign keys are used."
> > 
> > IIUC, this case only occurs when using the new-in-9.3 types of
> > nonexclusive row locks.  I'm willing to bet that the number of
> > applications using those is negligible; so I think it's all right to not
> > mention that case explicitly, as long as the wording doesn't say that
> > foreign keys are the *only* cause (which I didn't).
> 
> I actually think the issue could also occur with row locks of other
> severities (is that the correct term?). Alvaro probably knows better,
> but if I see correctly it's also triggerable if a backend waits for an
> updating transaction to finish and follow_updates = true is passed to
> heap_lock_tuple(). Which e.g. nodeLockRows.c does...

Uhm.  But at the bottom of that block, right above the "failed:" label
(heapam.c line 4527 in current master), we recheck the tuple for
"locked-only-ness"; and fail the whole operation by returning
HeapTupleUpdated, if it's not locked-only, no?  Which would cause
ExecLockRows to grab the next version via EvalPlanQualFetch.
Essentially that check is a lock-conflict test, and the only thing that
does not conflict with an update is a FOR KEY SHARE lock.

Note the only way to pass that test is that either the tuple is
locked-only (spelled in three different ways), or "!require_sleep".
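
(A hedged illustration of that conflict test, with an invented table:
an UPDATE that touches no key column holds FOR NO KEY UPDATE, so only
FOR KEY SHARE can still be granted alongside it:)

    -- Session 1, not yet committed:
    BEGIN;
    UPDATE parent SET info = 'x' WHERE id = 1;   -- non-key update

    -- Session 2:
    SELECT * FROM parent WHERE id = 1 FOR KEY SHARE;  -- granted
    SELECT * FROM parent WHERE id = 1 FOR UPDATE;     -- blocks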

Am I completely misunderstanding what's being said here?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Tom Lane

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Uhm.  But at the bottom of that block, right above the "failed:" label
> (heapam.c line 4527 in current master), we recheck the tuple for
> "locked-only-ness"; and fail the whole operation by returning
> HeapTupleUpdated, if it's not locked-only, no?  Which would cause
> ExecLockRows to grab the next version via EvalPlanQualFetch.
> Essentially that check is a lock-conflict test, and the only thing that
> does not conflict with an update is a FOR KEY SHARE lock.

Right, the row-lock acquisition has to succeed for there to be a risk.
I wasn't sure whether 9.3 had introduced any such cases for existing
row lock types.
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 16:17:35 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
> > > Andres Freund <andres@2ndquadrant.com> writes:
> > > > * I wonder if we should make the possible origins a bit more
> > > >   general as it's perfectly possible to trigger the problem without
> > > >   foreign keys. Maybe: "can arise when a table row that has been updated
> > > >   is row locked; that can e.g. happen when foreign keys are used."
> > > 
> > > IIUC, this case only occurs when using the new-in-9.3 types of
> > > nonexclusive row locks.  I'm willing to bet that the number of
> > > applications using those is negligible; so I think it's all right to not
> > > mention that case explicitly, as long as the wording doesn't say that
> > > foreign keys are the *only* cause (which I didn't).
> > 
> > I actually think the issue could also occur with row locks of other
> > severities (is that the correct term?). Alvaro probably knows better,
> > but if I see correctly it's also triggerable if a backend waits for an
> > updating transaction to finish and follow_updates = true is passed to
> > heap_lock_tuple(). Which e.g. nodeLockRows.c does...
> 
> Uhm.  But at the bottom of that block, right above the "failed:" label
> (heapam.c line 4527 in current master), we recheck the tuple for
> "locked-only-ness"; and fail the whole operation by returning
> HeapTupleUpdated, if it's not locked-only, no?  Which would cause
> ExecLockRows to grab the next version via EvalPlanQualFetch.
> Essentially that check is a lock-conflict test, and the only thing that
> does not conflict with an update is a FOR KEY SHARE lock.

What I was thinking of is the case where heap_lock_tuple() notices it
needs to sleep and then, in the if (require_sleep) block, does a
heap_lock_updated_tuple().  If the updating transaction aborts while
waiting, heap_lock_updated_tuple_rec() will issue an
XLOG_HEAP2_LOCK_UPDATED for that row and then return
HeapTupleMayBeUpdated, which will make heap_lock_tuple() successfully
lock the row, thereby resetting t_ctid during replay.  What I missed is
that in that case resetting the ctid chain is perfectly fine, since the
pointed-to tuple is actually dead.  I was just confused by the fact
that we do actually issue an XLOG_HEAP2_LOCK_UPDATED for a dead row.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Greg Stark

On Mon, Mar 17, 2014 at 3:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm thinking we'd better promote that Assert to a normal runtime elog.


I wasn't sure about this but on further thought I think it's a really
good idea and should be mentioned in the release notes.  One of the
things that's been bothering me about this bug is that it's really
hard to know if your standby has suffered from the problem, and if
you've promoted it you don't know if your database has a problem.

That said, it would be nice to actually fix the problem, not just
detect it. Eventually vacuum would fix the problem. I think. I'm not
really sure what will happen actually.

-- 
greg



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 21:09:10 +0000, Greg Stark wrote:
> That said, it would be nice to actually fix the problem, not just
> detect it. Eventually vacuum would fix the problem. I think. I'm not
> really sure what will happen actually.

Indexes will quite possibly stay corrupted.  I think.  If there was an
index lookup for an affected row, the kill_prior_tuple logic will quite
possibly have zapped the index entry.

Aside from that, it looks like VACUUM will have a hard time cleaning up
as well. It looks to me like heap_prune_chain() won't remove tuples that
are marked as both HeapTupleHeaderIsHeapOnly() and
HeapTupleHeaderIsHotUpdated(), i.e. intermediate tuples in a HOT
chain. Neither will lazy_scan_heap()...

I think the best way to really cleanup a table is to use something like:
ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
where text is the previous type of the column. That should trigger a
full table rewrite, without any finesse about tracking ctid chains.
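
(To verify that such a command actually rewrote the table rather than
being short-circuited, one can compare relfilenode before and after;
a sketch using the same invented names:)

    SELECT relfilenode FROM pg_class WHERE relname = 'rew';
    ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
    SELECT relfilenode FROM pg_class WHERE relname = 'rew';
    -- A changed relfilenode means the heap really was rewritten.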

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Alvaro Herrera

Andres Freund wrote:
> On 2014-03-17 21:09:10 +0000, Greg Stark wrote:
> > That said, it would be nice to actually fix the problem, not just
> > detect it. Eventually vacuum would fix the problem. I think. I'm not
> > really sure what will happen actually.
> 
> Indexes will quite possibly stay corrupted.  I think.  If there was an
> index lookup for an affected row, the kill_prior_tuple logic will quite
> possibly have zapped the index entry.
> 
> Aside from that, it looks like VACUUM will have a hard time cleaning up
> as well. It looks to me like heap_prune_chain() won't remove tuples that
> are marked as both HeapTupleHeaderIsHeapOnly() and
> HeapTupleHeaderIsHotUpdated(), i.e. intermediate tuples in a HOT
> chain. Neither will lazy_scan_heap()...

Ugh :-(

> I think the best way to really cleanup a table is to use something like:
> ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
> where text is the previous type of the column. That should trigger a
> full table rewrite, without any finesse about tracking ctid chains.

Isn't this what VACUUM FULL does?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-03-17 21:09:10 +0000, Greg Stark wrote:
>> That said, it would be nice to actually fix the problem, not just
>> detect it. Eventually vacuum would fix the problem. I think. I'm not
>> really sure what will happen actually.

> Indexes will quite possibly stay corrupted.  I think.  If there was an
> index lookup for an affected row, the kill_prior_tuple logic will quite
> possibly have zapped the index entry.

Whether it did or not, there's no way for the index entry to reach the
now-live tuple version (if it was a HOT update), so the question is moot.
What seems more interesting is whether REINDEX could fix the problem,
but at least with the current logic in catalog/index.c the answer seems
to be "no".

It's possible that a REINDEX attempt would work to detect whether you have
a problem (ie, see if you get one of the errors I just added).  I'm not
sure that's bulletproof though.

> I think the best way to really cleanup a table is to use something like:
> ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
> where text is the previous type of the column. That should trigger a
> full table rewrite, without any finesse about tracking ctid chains.

Um... don't we have logic in there that's smart enough to short-circuit
that?
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 20:51:31 -0300, Alvaro Herrera wrote:
> > I think the best way to really cleanup a table is to use something like:
> > ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
> > where text is the previous type of the column. That should trigger a
> > full table rewrite, without any finesse about tracking ctid chains.
> 
> Isn't this what VACUUM FULL does?

No, it uses rewriteheap.c via cluster.c, which tries to preserve
visibility information. There's tracking/mapping of t_ctid... I am not
entirely sure how much it preserves, but I am not really hopeful.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 19:55:01 -0400, Tom Lane wrote:
> > I think the best way to really cleanup a table is to use something like:
> > ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
> > where text is the previous type of the column. That should trigger a
> > full table rewrite, without any finesse about tracking ctid chains.
> 
> Um... don't we have logic in there that's smart enough to short-circuit
> that?

Not with USING present I think?

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Tom Lane

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Andres Freund wrote:
>> I think the best way to really cleanup a table is to use something like:
>> ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
>> where text is the previous type of the column. That should trigger a
>> full table rewrite, without any finesse about tracking ctid chains.

> Isn't this what VACUUM FULL does?

AFAIR, both VACUUM FULL and CLUSTER will attempt to preserve update
chains, and thus will probably get confused by this bug (though I've
not looked into exactly what will happen).  I'm not real sure that ALTER
TABLE is any better --- doesn't all that stuff go through rewriteheap.c
now?
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Tom Lane

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-03-17 19:55:01 -0400, Tom Lane wrote:
>>> I think the best way to really cleanup a table is to use something like:
>>> ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);

>> Um... don't we have logic in there that's smart enough to short-circuit
>> that?

> Not with USING present I think?

regression=# create table foo as select generate_series(1,1000000) as data;
SELECT 1000000
Time: 575.113 ms
regression=# alter table foo alter column data type int4 using (data);
ALTER TABLE
Time: 1001.859 ms
regression=# alter table foo alter column data type int4 using (data);
ALTER TABLE
Time: 0.821 ms
regression=# alter table foo alter column data type int8 using (data);
ALTER TABLE
Time: 677.810 ms
regression=# alter table foo alter column data type int8 using (data);
ALTER TABLE
Time: 1.085 ms

Looks like it's smart enough to me.
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Andres Freund

On 2014-03-17 19:58:18 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Andres Freund wrote:
> >> I think the best way to really cleanup a table is to use something like:
> >> ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
> >> where text is the previous type of the column. That should trigger a
> >> full table rewrite, without any finesse about tracking ctid chains.
> 
> > Isn't this what VACUUM FULL does?
> 
> AFAIR, both VACUUM FULL and CLUSTER will attempt to preserve update
> chains, and thus will probably get confused by this bug (though I've
> not looked into exactly what will happen).  I'm not real sure that ALTER
> TABLE is any better --- doesn't all that stuff go through rewriteheap.c
> now?

Nope, it's using ATRewriteTable(), which just does a loop around
heap_getnext() and does a plain heap_insert(). Normally that's rather
annoying because it completely destroys visibility information, but in
this case...

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Josh Berkus

Folks:

So another question, which I've already received from the field, is how
can you detect this kind of corruption in the first place, if it's not
causing a user-visible error?

Got that question from someone who failed over between master and
replica on 9.3.2 last weekend.  They're not seeing PK/FK errors, but
that doesn't mean they're clean.  How can they tell?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: First-draft release notes for next week's releases

From: Josh Berkus

All,

So, I'll ask again (because I didn't see a reply): is there any way
users can *check* if they've been corrupted?  Short of waiting for PK/FK
violations?

Given that all of the fixes we recommend involve extensive downtimes,
nobody is going to want to do them "just in case".  How can they test
for the issue?  I can't see any reasonable way ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: First-draft release notes for next week's releases

From: Tom Lane

Josh Berkus <josh@agliodbs.com> writes:
> So, I'll ask again (because I didn't see a reply): is there any way
> users can *check* if they've been corrupted?  Short of waiting for PK/FK
> violations?

I think it would work to do a REINDEX on each table (doesn't matter which
index you select) and see if it complains about not being able to find any
parent tuples.  This would definitely catch the case as long as the
orphaned HOT tuple is still live.  If it's not, the immediate problem is
gone ... but it's still possible you have follow-on corruption.
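
(A sketch of how that check might be scripted, assuming the runtime
error discussed upthread is in place; the catalog filter is
illustrative and limits the loop to ordinary user tables:)

    DO $$
    DECLARE
        t regclass;
    BEGIN
        FOR t IN SELECT oid::regclass FROM pg_class
                 WHERE relkind = 'r' AND relhasindex
                   AND oid >= 16384   -- skip system catalogs
        LOOP
            -- REINDEX TABLE rebuilds every index on the table;
            -- an error about a missing parent tuple flags corruption.
            EXECUTE 'REINDEX TABLE ' || t;
        END LOOP;
    END $$;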
        regards, tom lane



Re: First-draft release notes for next week's releases

From: Alvaro Herrera

Josh Berkus wrote:
> All,
> 
> So, I'll ask again (because I didn't see a reply): is there any way
> users can *check* if they've been corrupted?  Short of waiting for PK/FK
> violations?

Obviously there are queries you can run to check each FK -- the same
queries that ri_triggers.c would run when you create an FK.  It's
cumbersome to write, but not impossible.  In fact, it can be done
mechanically.
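
(For instance, the list of constraints to re-verify can itself be
pulled from the catalogs; a sketch:)

    -- Enumerate the foreign keys that would need re-checking:
    SELECT conname,
           conrelid::regclass  AS referencing_table,
           confrelid::regclass AS referenced_table
    FROM   pg_constraint
    WHERE  contype = 'f';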

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: First-draft release notes for next week's releases

From: Josh Berkus

On 03/19/2014 02:01 PM, Alvaro Herrera wrote:
> Josh Berkus wrote:
>> All,
>>
>> So, I'll ask again (because I didn't see a reply): is there any way
>> users can *check* if they've been corrupted?  Short of waiting for PK/FK
>> violations?
> 
> Obviously there are queries you can run to check each FK -- the same
> queries that ri_triggers.c would run when you create an FK.  It's
> cumbersome to write, but not impossible.  In fact, it can be done
> mechanically.

Would users with this corruption necessarily have broken FKs that
would show up as such on a simple query?  That is, if I did:

SELECT ref_id FROM referenced WHERE ref_id NOT IN ( SELECT ref_id FROM
referencing )

... or something similar, would that show the issue?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: First-draft release notes for next week's releases

From: Alvaro Herrera

Josh Berkus wrote:
> On 03/19/2014 02:01 PM, Alvaro Herrera wrote:
> > Josh Berkus wrote:
> >> All,
> >>
> >> So, I'll ask again (because I didn't see a reply): is there any way
> >> users can *check* if they've been corrupted?  Short of waiting for PK/FK
> >> violations?

Some notes:

1. If there's been no crash with 9.3 installed on a single system, or on
a master system, corruption cannot have occurred.

2. Replicas are very likely to have gotten corrupted if referenced
tables are updated at all.  Many workloads do not update referenced
tables; those are not at risk.

3. Masters that are failed-over, at-risk replicas are thus very likely
to have been corrupted.

> > Obviously there are queries you can run to check each FK -- the same
> > queries that ri_triggers.c would run when you create an FK.  It's
> > cumbersome to write, but not impossible.  In fact, it can be done
> > mechanically.
> 
> Would users with this corruption necessarily have broken FKs that
> would show up as such on a simple query?  That is, if I did:
> 
> SELECT ref_id FROM referenced WHERE ref_id NOT IN ( SELECT ref_id FROM
> referencing )
> 
> ... or something similar, would that show the issue?

Yes, AFAICT that would show the issue, as long as the query uses an
index.  I assume, without checking, that setting enable_seqscan to OFF
would have that effect on all but the largest tables.  I think it'd be
better to write that as an EXISTS query, though.  You also need to
consider details such as the MATCH mode of the FK, for multicolumn ones.
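
(A sketch of that shape of query, assuming a single-column MATCH
SIMPLE foreign key and Josh's invented table names; with seqscans
disabled the inner probe should go through the referenced table's
index, so rows missing from that index surface as violations:)

    SET enable_seqscan = off;
    SELECT r.ref_id
    FROM   referencing r
    WHERE  r.ref_id IS NOT NULL
      AND  NOT EXISTS (SELECT 1 FROM referenced d
                       WHERE  d.ref_id = r.ref_id);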

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services