Thread: Re: [HACKERS] Aborted Transaction During Vacuum

Re: [HACKERS] Aborted Transaction During Vacuum

From: "G. Anthony Reina"
Date:
Tom Lane wrote:

> However, there should have been an "ERROR" message if something reported
> an error.  You said you only saw "NOTICE: AbortTransaction and not in
> in-progress state" and not any "ERROR" before or after it?  The NOTICE
> presumably came out of AbortTransaction itself, after whatever went
> wrong went wrong...
>

Yes, I have an ERROR message (either I didn't notice it before or it is
new):

NOTICE:  Index pkex_ellipse_opto_proc: Pages 138; Tuples 30535. Elapsed 0/0 sec.
NOTICE:  Index pkex_ellipse_opto_proc: Pages 138; Tuples 30535. Elapsed 0/0 sec.
ERROR:  vacuum: can't destroy lock file!
NOTICE:  AbortTransaction and not in in-progress state
NOTICE:  AbortTransaction and not in in-progress state
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
 
We have lost the connection to the backend, so further processing is
impossible.  Terminating.


It looks like the error is either occurring on table ex_ellipse_opto_proc or
the next table in the list ex_ellipse_proc. However, I think the error is
more general than that. It appears to occur just before the last table in
the database gets vacuumed. Here's the list of my tables:

Database    = db01
+------------------+----------------------------------+----------+
|  Owner           |             Relation             |   Type   |
+------------------+----------------------------------+----------+
| postgres         | center_out                       | table    |
| postgres         | center_out_analog                | table    |
| postgres         | center_out_analog_proc           | table    |
| postgres         | center_out_cell                  | table    |
| postgres         | center_out_cell_proc             | table    |
| postgres         | center_out_opto                  | table    |
| postgres         | center_out_opto_proc             | table    |
| postgres         | center_out_pref_direction        | table    |
| postgres         | center_out_proc                  | table    |
| postgres         | electrode                        | table    |
| postgres         | ellipse                          | table    |
| postgres         | ellipse_analog                   | table    |
| dan              | ellipse_analog_proc              | table    |
| postgres         | ellipse_cell                     | table    |
| dan              | ellipse_cell_proc                | table    |
| postgres         | ellipse_opto                     | table    |
| dan              | ellipse_opto_proc                | table    |
| dan              | ellipse_proc                     | table    |
| dan              | ex_ellipse                       | table    |
| dan              | ex_ellipse_analog_proc           | table    |
| dan              | ex_ellipse_cell                  | table    |
| dan              | ex_ellipse_cell_proc             | table    |
| dan              | ex_ellipse_opto                  | table    |
| dan              | ex_ellipse_opto_proc             | table    |   <---- ERROR occurs somewhere after here
| dan              | ex_ellipse_proc                  | table    |
+------------------+----------------------------------+----------+
 

Yesterday, I was adding tables in one by one from a previous pg_dump. The
error didn't pop up until after I had about 9 or 10 tables restored. I
didn't think about it then, but it may have always occurred after the second
to last table in the list.  But don't hold me to that.

In any case, I'll try to rebuild everything as you've asked to get a
better error message. Maybe if I go through it step by step again, you'll be
able to help me find where the error is taking place.

Thanks Tom and Oliver. I'll get back to you when I finish the rebuild.

-Tony




Re: [HACKERS] Aborted Transaction During Vacuum

From: Tom Lane
Date:
"G. Anthony Reina" <reina@nsi.edu> writes:
> ERROR:  vacuum: can't destroy lock file!
> NOTICE:  AbortTransaction and not in in-progress state
> NOTICE:  AbortTransaction and not in in-progress state
> pqReadData() -- backend closed the channel unexpectedly.

Ah-hah!  Oliver is right, then, at least in part --- that error message
from vacuum suggests that the vc_abort bug *is* biting you.  However,
there may be more going on, because what Oliver and others observed did
not include a coredump (at least not that I heard about).

You can probably suppress the problem by installing the patch I posted
to pgsql-patches a few days ago.  However, I'd appreciate it if you'd
first try to reproduce the problem again with debug/assert turned on,
so that we can get some idea whether there is an additional bug that's
only biting you and not the previous reporters.

BTW, if vc_abort is involved then the occurrence of the problem probably
depends on whether any other backends are using the database and what
they are doing.  (The vc_abort bug escaped notice up till last week
because it doesn't trigger when vacuum is the only thing running.)
Do you have other clients running when you do the vacuum?  What are
they doing?
        regards, tom lane