Thread: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled

A customer of ours recently hit a problem where after an autovacuum was
cancelled on a table, the app started getting the message in $subject:

ERROR:  could not read block 6 of relation 1663/35078/1761966: read only 0 of 8192 bytes

(block numbers vary from 1 to 6).  Things remained in this state until
another autovacuum came along and cleaned up the table, 4 minutes later
(this is a high traffic table; there are several inserts per second).

The log looks like this:

2009-10-20 04:02:07 PDT [27396]: [1-1]  LOG:  automatic vacuum of table "database.public.tabname": index scans: 1
pages:6 removed, 1 remain       tuples: 755 removed, 2 remain       system usage: CPU 0.00s/0.00u sec elapsed 1.42 sec
 
2009-10-20 04:02:07 PDT [27396]: [2-1]  ERROR:  canceling autovacuum task
2009-10-20 04:02:07 PDT [27396]: [3-1]  CONTEXT:  automatic vacuum of table "database.public.tabname"

What I thought could have happened is that the table was truncated, and
then the sinval message telling that to other backends was not sent due
to the rollback.  When they then try to insert into the page they had
recorded as rd_targblock, they try to read the page, but it's no longer
there.
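
A rough illustration of that hypothesis, as a toy Python model rather than
PostgreSQL code.  ToyRelation, ToyBackend, ToyReadError and demo() are
made-up names; the only things taken from the report are the 8192-byte page
size, the "6 removed, 1 remain" page counts, and the idea of a cached
insertion target block that is only reset when an invalidation arrives.

```python
BLOCK_SIZE = 8192  # page size taken from the error message


class ToyReadError(Exception):
    """Hypothetical error raised when a page past end-of-file is read."""


class ToyRelation:
    """Hypothetical stand-in for a table file: just a list of pages."""

    def __init__(self, nblocks):
        self.pages = [bytes(BLOCK_SIZE) for _ in range(nblocks)]

    def truncate(self, nblocks):
        # Drop every page at or beyond the new length.
        del self.pages[nblocks:]

    def read_block(self, blkno):
        if blkno >= len(self.pages):
            raise ToyReadError(
                f"could not read block {blkno}: read only 0 of {BLOCK_SIZE} bytes"
            )
        return self.pages[blkno]


class ToyBackend:
    """Hypothetical backend that caches its last insertion target block."""

    def __init__(self, rel):
        self.rel = rel
        self.target_block = None  # plays the role of rd_targblock

    def invalidate(self):
        # What receiving the invalidation message would do in this model.
        self.target_block = None

    def insert(self):
        if self.target_block is None:
            self.target_block = len(self.rel.pages) - 1
        self.rel.read_block(self.target_block)
        return self.target_block


def demo(deliver_invalidation):
    rel = ToyRelation(7)      # blocks 0..6
    backend = ToyBackend(rel)
    backend.insert()          # caches block 6 as the insertion target
    rel.truncate(1)           # vacuum: 6 pages removed, 1 remains
    if deliver_invalidation:
        backend.invalidate()  # the commit path: other backends are told
    try:
        return backend.insert()
    except ToyReadError as exc:
        return str(exc)
```

In this model, demo(True) inserts into block 0, while demo(False) keeps
using the stale block 6 and fails with the read error.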

I can reproduce this by adding a sleep and CHECK_FOR_INTERRUPTS after
lazy_vacuum_rel() returns, and before CommitTransactionCommand.

So far as I can see, what we need is to make sure the sinval message is
sent regardless of transaction commit/abort.  How can that be done?  It
is quite ugly to have an untimely autovacuum cancel disrupt the ability
to insert into a table.

Thoughts?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Alvaro Herrera <alvherre@commandprompt.com> writes:
> What I thought could have happened is that the table was truncated, and
> then the sinval message telling that to other backends was not sent due
> to the rollback.

Hmm.

> So far as I can see, what we need is to make sure the sinval message is
> sent regardless of transaction commit/abort.  How can that be done?

I would argue that once we've truncated, it's too late to abort.  The
interrupt facility should be disabled from just before issuing the
truncate till after commit.  It would probably be relatively painless to
do that with some manipulation of the interrupt holdoff stuff.
        regards, tom lane
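
A minimal sketch of the holdoff idea, assuming a simple counter that defers
a pending cancel while it is nonzero.  This is a toy Python model, not the
actual patch: ToyInterrupts, QueryCancelled and run_vacuum() are
hypothetical names, and the model only loosely resembles the backend's
HOLD_INTERRUPTS()/RESUME_INTERRUPTS() mechanism.

```python
class QueryCancelled(Exception):
    """Hypothetical exception standing in for a cancel interrupt."""


class ToyInterrupts:
    """Hypothetical holdoff counter: cancels are deferred while it is > 0."""

    def __init__(self):
        self.holdoff_count = 0
        self.cancel_pending = False

    def hold(self):
        self.holdoff_count += 1

    def resume(self):
        assert self.holdoff_count > 0
        self.holdoff_count -= 1

    def request_cancel(self):
        # Only records the request; nothing happens until check() runs.
        self.cancel_pending = True

    def check(self):
        # Service the cancel only when no holdoff is in effect.
        if self.cancel_pending and self.holdoff_count == 0:
            self.cancel_pending = False
            raise QueryCancelled("canceling autovacuum task")


def run_vacuum(protect):
    """Return the sequence of steps taken when a cancel lands after truncate."""
    intr = ToyInterrupts()
    log = []
    try:
        log.append("scan")
        if protect:
            intr.hold()          # from just before the truncate ...
        log.append("truncate")
        intr.request_cancel()    # the untimely cancel arrives here
        intr.check()
        log.append("commit")     # commit is what notifies other backends
    except QueryCancelled:
        log.append("abort")      # truncated, but nobody was told
        return log
    if protect:
        intr.resume()            # ... until after commit
    try:
        intr.check()
    except QueryCancelled:
        log.append("cancel serviced after commit")
    return log
```

Without the holdoff the model ends in scan, truncate, abort; with it the
cancel is still honoured, but only once the commit has gone through.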


Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:

> > So far as I can see, what we need is to make sure the sinval message is
> > sent regardless of transaction commit/abort.  How can that be done?
>
> I would argue that once we've truncated, it's too late to abort.  The
> interrupt facility should be disabled from just before issuing the
> truncate till after commit.  It would probably be relatively painless to
> do that with some manipulation of the interrupt holdoff stuff.

That cures my (admittedly simplistic) testcase.  The patch is a bit ugly
because the interrupts are held off in lazy_vacuum_rel and need to be
released by its caller.  I don't see any other way around the problem
though.

The attached patch is for 8.4; back branches all need a bit of editing.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I would argue that once we've truncated, it's too late to abort.  The
>> interrupt facility should be disabled from just before issuing the
>> truncate till after commit.  It would probably be relatively painless to
>> do that with some manipulation of the interrupt holdoff stuff.

> That cures my (admittedly simplistic) testcase.  The patch is a bit ugly
> because the interrupts are held off in lazy_vacuum_rel and need to be
> released by its caller.  I don't see any other way around the problem
> though.

I wonder whether we shouldn't extend this into VACUUM FULL too, to
prevent cancel once it's done that internal commit.  It would fix
the "PANIC: can't abort a committed transaction" problem V.F. has.
        regards, tom lane
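
To make the VACUUM FULL case concrete, here is a toy Python sketch of why a
cancel after the internal commit is a problem and how holding interrupts
avoids it.  ToyTransaction, ToyPanic and run_vacuum_full() are hypothetical
names; the real code paths are more involved and this is only a model of
the ordering.

```python
class ToyPanic(Exception):
    """Hypothetical stand-in for a PANIC-level failure."""


class ToyTransaction:
    """Hypothetical transaction that cannot be aborted once committed."""

    def __init__(self):
        self.committed = False

    def record_commit(self):
        self.committed = True

    def abort(self):
        if self.committed:
            raise ToyPanic("can't abort a committed transaction")
        return "aborted"


def run_vacuum_full(hold_after_commit):
    """Model a cancel request that arrives after the internal commit."""
    xact = ToyTransaction()
    interrupts_held = False

    if hold_after_commit:
        # Prevent cancel interrupts from this point on, because the
        # transaction can no longer be aborted once the commit is recorded.
        interrupts_held = True
    xact.record_commit()          # the internal commit

    cancel_pending = True         # cancel arrives during post-commit work
    if cancel_pending and not interrupts_held:
        try:
            return xact.abort()   # error handling tries to abort
        except ToyPanic as exc:
            return f"PANIC: {exc}"

    # With interrupts held, the remaining work runs to completion.
    return "finished"
```

run_vacuum_full(False) ends in the PANIC message, while
run_vacuum_full(True) returns "finished".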


Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> I would argue that once we've truncated, it's too late to abort.  The
> >> interrupt facility should be disabled from just before issuing the
> >> truncate till after commit.  It would probably be relatively painless to
> >> do that with some manipulation of the interrupt holdoff stuff.
>
> > That cures my (admittedly simplistic) testcase.  The patch is a bit ugly
> > because the interrupts are held off in lazy_vacuum_rel and need to be
> > released by its caller.  I don't see any other way around the problem
> > though.
>
> I wonder whether we shouldn't extend this into VACUUM FULL too, to
> prevent cancel once it's done that internal commit.  It would fix
> the "PANIC: can't abort a committed transaction" problem V.F. has.

Hmm, it seems to work.  The attached is for 8.1.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I wonder whether we shouldn't extend this into VACUUM FULL too, to
>> prevent cancel once it's done that internal commit.  It would fix
>> the "PANIC: can't abort a committed transaction" problem V.F. has.

> Hmm, it seems to work.  The attached is for 8.1.

Looks OK, but please update the comment right before the
RecordTransactionCommit, along the lines of "We prevent cancel
interrupts after this point to mitigate the problem that you
can't abort the transaction now".
        regards, tom lane


Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> I wonder whether we shouldn't extend this into VACUUM FULL too, to
> >> prevent cancel once it's done that internal commit.  It would fix
> >> the "PANIC: can't abort a committed transaction" problem V.F. has.
> 
> > Hmm, it seems to work.  The attached is for 8.1.
> 
> Looks OK, but please update the comment right before the
> RecordTransactionCommit, along the lines of "We prevent cancel
> interrupts after this point to mitigate the problem that you
> can't abort the transaction now".

BTW I'm thinking of backpatching this all the way back to 7.4 -- are
we agreed on that?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Looks OK, but please update the comment right before the
>> RecordTransactionCommit, along the lines of "We prevent cancel
>> interrupts after this point to mitigate the problem that you
>> can't abort the transaction now".

> BTW I'm thinking of backpatching this all the way back to 7.4 -- are
> we agreed on that?

Yeah, I would think the problems can manifest all the way back.
        regards, tom lane


Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >> Looks OK, but please update the comment right before the
> >> RecordTransactionCommit, along the lines of "We prevent cancel
> >> interrupts after this point to mitigate the problem that you
> >> can't abort the transaction now".
> 
> > BTW I'm thinking of backpatching this all the way back to 7.4 -- are
> > we agreed on that?
> 
> Yeah, I would think the problems can manifest all the way back.

Done, thanks for the discussion.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


2009/11/10 Alvaro Herrera <alvherre@commandprompt.com>:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> >> Looks OK, but please update the comment right before the
>> >> RecordTransactionCommit, along the lines of "We prevent cancel
>> >> interrupts after this point to mitigate the problem that you
>> >> can't abort the transaction now".
>>
>> > BTW I'm thinking of backpatching this all the way back to 7.4 -- are
>> > we agreed on that?
>>
>> Yeah, I would think the problems can manifest all the way back.
>
> Done, thanks for the discussion.

Hello,

Do you have any ideas about the lazy vacuum locking problem?

Any plans?

Regards
Pavel Stehule



Pavel Stehule wrote:

> Hello,
> 
> Do you have any ideas about the lazy vacuum locking problem?
> 
> Any plans?

Well, I understand the issue and we have an idea on how to attack it,
but I have no concrete plans to fix it ATM ...

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


2009/11/10 Alvaro Herrera <alvherre@commandprompt.com>:
> Pavel Stehule wrote:
>
>> Hello,
>>
>> Do you have any ideas about the lazy vacuum locking problem?
>>
>> Any plans?
>
> Well, I understand the issue and we have an idea on how to attack it,
> but I have no concrete plans to fix it ATM ...

ok
Pavel