Thread: BUG #5984: Got FailedAssertion("!(opaque->btpo_prev == target)", File: "nbtpage.c", Line: 1166)

The following bug has been logged online:

Bug reference:      5984
Logged by:          BORSCHNECK Pascal
Email address:      borschneck@hotmail.com
PostgreSQL version: 8.4.4 and 8.4.7
Operating system:   Linux  2.6.18-194.32.1.el5 #1 SMP Mon Dec 20 10:52:42 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
Description:        Got FailedAssertion("!(opaque->btpo_prev == target)", File: "nbtpage.c", Line: 1166)
Details:

I have a PostgreSQL server with several databases on a test VM. I got this
while a backup script was running "VACUUM FULL ANALYZE":
2011-04-05 00:05:07 CEST   pid:19313 LOG:  no left sibling (concurrent deletion?) in "i_cmsttry_dtype"
TRAP: FailedAssertion("!(opaque->btpo_prev == target)", File: "nbtpage.c", Line: 1166)
2011-04-05 00:05:07 CEST   pid:10127 LOG:  server process (PID 19313) was terminated by signal 6: Aborted
2011-04-05 00:05:07 CEST   pid:10127 LOG:  terminating any other active server processes

... and PostgreSQL crashes ...

Why the VM matters:
-=-=-=-=-=-=-=-=-=-=-=-
I noticed that pausing a VM (to make a copy of it, for example) often
produces duplicate key errors in the logs.

So:
- it runs on a VM, which creates duplicates (and the crash also creates new ones ;) )
- there are duplicate key errors such as "ERROR:  duplicate key value
violates unique constraint "pg_XXXXXXX_index""
- autovacuum is on
- an old script runs "VACUUM FULL ANALYZE" on a database as part of its
backup process (I know this shouldn't be done, cf.
http://wiki.postgresql.org/wiki/VACUUM_FULL#When_.28not.29_to_use_VACUUM_FULL
but it's not my script ;) )
- I also tried a plain VACUUM ANALYZE to check whether the database was
inconsistent; same result: crash

(A "reindexdb --all" may fix the duplicate keys and avoid this crash, but
this may happen to other people, so I am reporting it here.)

Regards,
Pascal
On Mon, Apr 18, 2011 at 5:16 AM, BORSCHNECK Pascal
<borschneck@hotmail.com> wrote:
> Because I noticed that doing a pause to a VM (to make a VM copy for example)
> often creates duplicates key problems error in the logs.

That's kind of surprising, but I can't help wondering if it's a bug in
whatever virtualization tool you are using.

Any way to get a stack trace on the assertion failure?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Robert,
Sorry for the late reply, I was away last week.
About your question ("That's kind of surprising, but I can't help wondering
if it's a bug in what virtualization tool you are using."): I noticed the
problem FIRST on a VM, and by pausing/stopping/starting a VM it was easy to
reproduce these errors.
But for some weeks now I have also had duplicates on a physical server, with
the same result if I try to run a VACUUM: a crash with the same assertion.
Which "stack trace on the assertion failure" would you like to have?
Regards,



Pascal BORSCHNECK


Arfy's blog http://www.arfy.fr



On Mon, May 16, 2011 at 8:04 AM, Pascal Borschneck
<borschneck@hotmail.com> wrote:

>  Hi Robert,
>
> Sorry for the late reply, I was not here last week.
>
> About your question: "*That's kind of surprising, but I can't help
> wondering if it's a bug in what virtualization tool you are using.*"
> I noticed the problem FIRST on VM.  And by pausing/stopping/launching a VM
> it was easy to get these error back.
>
> *But since some weeks*, I also had duplicates on real server.  And same
> result if I try to launch a VACUUM: crash with the same assertion.
> Which "*stack trace on the assertion failure*" do you want to have ?
>

Well, when a backend crashes, it would be helpful to know what was going on
when it crashed.  See this wiki page:

http://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

--
Robert Haas



Hi Robert,

I'll try this, if possible, on one of the servers with duplicates.
I found another way to generate duplicates: working with a database while
there is no disk space left.

After that, if I run a VACUUM FULL, same crash.

A general question, as I couldn't find the answer: are "ERROR:  duplicate
key value violates unique constraint" errors fatal? Most of the time they
can be fixed with reindexdb --all.
Someone told me some weeks ago "it's only a notice, the database is not
corrupted" ...

Regards,

Pascal BORSCHNECK



On 23/05/2011 19:12, Robert Haas wrote:
> Well, when a backend crashes, it would be helpful to know what was
> going on when it crashed.  See this wiki page:
>
> http://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
On Wed, May 25, 2011 at 4:06 AM, Pascal Borschneck
<borschneck@hotmail.com> wrote:

>  Global question as I didn't find it: are "ERROR:   duplicate key value
> violates unique constraint" fatal errors ? As they may (most of the time)
> be corrected with reindexdb --all
> Someone told me some weeks ago "*it's only a notice, the database is not
> corrupted*" ...
>

Well, if you get that error, then that either means that the index is
confused (it has duplicate index pointers for the same value when it really
shouldn't) or that the table is confused (it has duplicate rows with the
supposedly unique value when it shouldn't).  In the first case, a REINDEX
will fix it; in the second case, your data is corrupt.
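The two cases above can be told apart by looking at the table contents directly rather than through the suspect index. Below is a minimal Python sketch of that decision, not a PostgreSQL tool: the functions `duplicate_keys` and `diagnose` are hypothetical, and the sketch assumes you have already fetched the supposedly unique key values (single values or tuples for multi-column keys) with a sequential scan, for example after `SET enable_indexscan = off; SET enable_bitmapscan = off;`. It also assumes the error is not simply an application inserting a genuine duplicate.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def duplicate_keys(values: Iterable[Hashable]) -> List[Hashable]:
    """Return each key value that occurs more than once, in first-seen order."""
    counts = Counter(values)  # Counter keeps insertion order (Python 3.7+)
    return [key for key, n in counts.items() if n > 1]


def diagnose(heap_keys: Iterable[Hashable]) -> str:
    """Classify a spurious 'duplicate key value violates unique constraint' error.

    heap_keys: values of the supposedly unique column as stored in the table
    itself (assumed to have been read with a sequential scan, so the suspect
    index was not consulted).

    Returns "table" when the rows themselves are duplicated: the data is
    corrupt and a REINDEX alone will not fix it. Returns "index" otherwise:
    only the index is confused, and a REINDEX should fix it.
    """
    return "table" if duplicate_keys(heap_keys) else "index"


if __name__ == "__main__":
    print(diagnose([10, 11, 12]))  # no duplicated rows
    print(diagnose([10, 11, 11]))  # duplicated rows
```

Either way, as noted in the next paragraph, the corruption itself still has a cause that needs tracking down.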

But even your indexes really shouldn't be getting corrupted if everything is
working properly.  Are you running with fsync=off?  If not, you probably
want to check your drive write caches and the integrity of your hardware.
This kind of thing usually means that corruption is happening somewhere, and
if you don't track it down and fix it, eventually it will probably add up to
something serious.

--
Robert Haas