possible bug - Mailing list pgsql-bugs

From Brent Ewing
Subject possible bug
Msg-id 199910022318.QAA08813@hoh.genome.washington.edu
List pgsql-bugs
If PostgreSQL failed to compile on your computer or you found a bug that
is likely to be specific to one platform then please fill out this form
and e-mail it to pgsql-ports@postgresql.org.

To report any other bug, fill out the form below and e-mail it to
pgsql-bugs@postgresql.org.

If you not only found the problem but solved it and generated a patch
then e-mail it to pgsql-patches@postgresql.org instead.  Please use the
command "diff -c" to generate the patch.

You may also enter a bug report at http://www.postgresql.org/ instead of
e-mail-ing this form.

============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================


Your name        :    Brent Ewing
Your email address    :    bge@u.washington.edu


System Configuration
---------------------
  Architecture (example: Intel Pentium)      :    DEC Alpha

  Operating System (example: Linux 2.0.26 ELF)     :    Digital UNIX 4.0D

  PostgreSQL version (example: PostgreSQL-6.5.2):   PostgreSQL-6.5.2

  Compiler used (example:  gcc 2.8.0)        :    mostly cc


Please enter a FULL description of your problem:
------------------------------------------------

In short, the backend crashes while trying to create certain indexes on a
table. I added some diagnostics in the modules nbtsort.c and bufpage.c
where it's dying. The backend output is

------

hoh> postmaster -d
FindExec: found "/usr/local/pgsql/bin/postgres" using argv[0]

/usr/local/pgsql/bin/postmaster: BackendStartup: pid 26070 user bge db est_db socket 6
FindExec: found "/usr/local/pgsql/bin/postgres" using argv[0]
started: host=localhost user=bge database=est_db
InitPostgres
StartTransactionCommand
ProcessQuery
CommitTransactionCommand
StartTransactionCommand
ProcessUtility


PageAddItem: lower > upper: lower: 920  upper: 912: alignedSize: 32  pageManagerShuffle: 1  shuffled: 1
sizeof_itemiddata:4  pd_lower: 916  pd_upper: 944  offsetNumber: 228  limit: 228 
_bt_buildadd: alloc flag: 1  pgspc_old: 0   btisz_old: 32  PageGetFreeSpace: 24
FATAL 1:  btree: failed to add item to the page in _bt_sort (2)
proc_exit(0) [#0]
shmem_exit(0) [#0]
exit(0)
/usr/local/pgsql/bin/postmaster: reaping dead processes...
/usr/local/pgsql/bin/postmaster: CleanupProc: pid 26070 exited with status 0

------



The problem occurs (is detected) in the function PageAddItem() (in bufpage.c)
in the block that now looks like

------

        if (offsetNumber > limit)
                lower = (Offset) (((char *) (&((PageHeader) page)->pd_linp[offsetNumber])) - ((char *) page));
        else if (offsetNumber == limit || shuffled == true)
                lower = ((PageHeader) page)->pd_lower + sizeof(ItemIdData);
        else
                lower = ((PageHeader) page)->pd_lower;

        alignedSize = DOUBLEALIGN(size);

        upper = ((PageHeader) page)->pd_upper - alignedSize;

        if (lower > upper)
        {
 fprintf( stderr, "PageAddItem: lower > upper: lower: %d  upper: %d: alignedSize: %d  "
          "pageManagerShuffle: %d  shuffled: %d sizeof_itemiddata: %d  "
          "pd_lower: %d  pd_upper: %d  offsetNumber: %d  limit: %d\n",
          (int)lower, (int)upper, (int)alignedSize, (int)PageManagerShuffle, (int)shuffled,
          (int)sizeof( ItemIdData ), (int)((PageHeader) page)->pd_lower,
          (int)((PageHeader) page)->pd_upper, (int)offsetNumber, (int)limit );
          return InvalidOffsetNumber;
        }

------


 The problem is that lower > upper!
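
 To make the arithmetic behind that diagnostic explicit, here is a minimal
sketch that recomputes lower and upper from the logged values. PageBounds,
ITEM_ID_SIZE and the helper functions are hypothetical stand-ins, not
PostgreSQL code. The sketch assumes that DOUBLEALIGN rounds up to a multiple
of 8 on this platform and that PageGetFreeSpace() returns the gap between
pd_lower and pd_upper minus one line pointer, which is consistent with the
logged value of 24 but is not confirmed here.

```c
#include <assert.h>

/* Hypothetical stand-in for the two page-header fields used in the check. */
typedef struct
{
    int pd_lower;   /* offset just past the line-pointer array */
    int pd_upper;   /* offset of the start of the item data */
} PageBounds;

/* Logged value of sizeof(ItemIdData). */
enum { ITEM_ID_SIZE = 4 };

/* Round size up to a multiple of 8 (assumed behaviour of DOUBLEALIGN here). */
int double_align(int size)
{
    assert(size >= 0);
    return (size + 7) & ~7;
}

/* lower after appending one line pointer (the offsetNumber == limit branch). */
int new_lower(PageBounds p)
{
    return p.pd_lower + ITEM_ID_SIZE;
}

/* upper after reserving room for an item of the given size. */
int new_upper(PageBounds p, int size)
{
    return p.pd_upper - double_align(size);
}

/* Mirrors the "lower > upper" test: nonzero when the item still fits. */
int item_fits(PageBounds p, int size)
{
    return new_lower(p) <= new_upper(p, size);
}

/* Free space under the assumption stated above: gap minus one line pointer. */
int free_space(PageBounds p)
{
    int gap = p.pd_upper - p.pd_lower;

    return (gap < ITEM_ID_SIZE) ? 0 : gap - ITEM_ID_SIZE;
}
```

 With pd_lower = 916, pd_upper = 944 and an item size of 32, this gives
lower = 920 and upper = 912, the same values the diagnostic printed. Under
the stated assumption the free space comes out as 24, so a 32-byte item
cannot fit on this page.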

 The bit of output from the calling function, _bt_buildadd in nbtsort.c, shows
the values of pgspc and btisz near the start of the function. The code and my
additions at this point are

------

        nbuf = state->btps_buf;
        npage = state->btps_page;
        first_off = state->btps_firstoff;
        last_off = state->btps_lastoff;
        last_bti = state->btps_lastbti;

        pgspc = PageGetFreeSpace(npage);
        btisz = BTITEMSZ(bti);
        btisz = MAXALIGN(btisz);
        if (pgspc < btisz)
        {
                Buffer          obuf = nbuf;
                Page            opage = npage;
                OffsetNumber o,
                                        n;
                ItemId          ii;
                ItemId          hii;

pgspc_sav = pgspc;
btisz_sav = btisz;

                _bt_blnewpage(index, &nbuf, &npage, flags);

alloc_spc_flag = 1;


------



And the code and my additions at the point where PageAddItem is called
and returns failure look like

------

        /*
         * if this item is different from the last item added, we start a new
         * chain of duplicates.
         */
        off = OffsetNumberNext(last_off);
        if (PageAddItem(npage, (Item) bti, btisz, off, LP_USED) == InvalidOffsetNumber)
        {
fprintf( stderr, "_bt_buildadd: alloc flag: %d  pgspc_old: %d   btisz_old: %d  PageGetFreeSpace: %d\n",
         alloc_spc_flag, pgspc_sav, btisz_sav, (int)PageGetFreeSpace(npage) );
                elog(FATAL, "btree: failed to add item to the page in _bt_sort (2)");
        }
#ifdef NOT_USED
#if defined(FASTBUILD_DEBUG) && defined(FASTBUILD_MERGE)
        {
                bool            isnull;
                Datum           d = index_getattr(&(bti->bti_itup), 1, index->rd_att, &isnull);

                printf("_bt_buildadd: inserted <%x> at offset %d at level %d\n",
                           d, off, state->btps_level);
        }
#endif   /* FASTBUILD_DEBUG && FASTBUILD_MERGE */
#endif


------

Incidentally, I vacuumed several times, without affecting the outcome.
Also, the problem surfaced as I ran PG v6.5. I subsequently installed
v6.5.2 without modifying the database, and tried again with the same
result.


Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------

 I can repeat this on my data set, perfectly consistently. If this is
really a bug, I can add diagnostic code wherever you would like it
added. (The data set is over a Gbyte so it is not easily sent.)



If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------


