Thread: Postgres "invalid page header"

Postgres "invalid page header"

From: Carl Anderson
Hubert,

I too have noticed such an issue.

Would you be willing to try to load a large table (from a SQL file)?
It reliably demonstrates the "invalid page header" behavior after
completion (for me).

It seems to me that while WAL logs are being flushed, a certain type
of failed SQL insert will cause a page to be marked wrong.
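
(If you want to see what actually lands in the bad page, pg_filedump
is handy -- a rough sketch, assuming a 7.4-compatible build; the table
name, path and OID placeholders are illustrative:)

    # find the file that backs the relation ('mytable' is illustrative)
    psql mydb -c "SELECT relfilenode FROM pg_class WHERE relname = 'mytable';"

    # Relations are split into 1 GB segments (131072 blocks of 8 kB), so
    # e.g. block 582024 from Hubert's error message is block 57736 of
    # segment file ".4".  -R first [last] limits the dump to that range.
    pg_filedump -R 57736 57736 $PGDATA/base/<dboid>/<relfilenode>.4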

C.

> Hi list,
>
> I am working with 7.4.1 under Linux (SuSE 8.1). The server is a HP
> ProLiant DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a
> RAID 5 system with ca. 500 GB diskspace (xfs file system).
>
> When doing big transactions or changes (UPDATE several million rows in
> one step) on a database with ca. 50 GB diskspace and ca. 15 million
> entries, PostGIS and lots of indexes, I get errors like
>
> ERROR: invalid page header in block 582024 of relation ...
>
> The error does not occur regularly. It seems to me that the error is
> related to heavy load and heavy I/O. CPU and memory do not seem to be
> the problem, according to my hotsanic tools.
>
> I have worked through
>
> http://archives.postgresql.org/pgsql-general/2003-11/msg01288.php
> http://archives.postgresql.org/pgsql-admin/2003-09/msg00423.php
> http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=20030922162322.E12708%40quartz.newn.cam.ac.uk&rnum=8&prev=/groups%3Fq%3Dpg_filedump%26hl%3Den%26lr%3D%26ie%3DUTF-8%26selm%3D20030922162322.E12708%2540quartz.newn.cam.ac.uk%26rnum%3D8
>
> which suggest to me that I have run into hardware trouble (disk?
> memory?!) But I supposed that my server was not SO bad for this
> database application...
>
> How to fix that? (I mean - not to repair the tables or pages, but to
> get a stable system?) Trying other storage hardware?!
> --
> -------------------------------------------------------------------------------
> Dr.-Ing. Hubert Fröhlich
> Bezirksfinanzdirektion München
> Alexandrastr. 3, D-80538 München, GERMANY
> Tel. :+49 (0)89 / 2190 - 2980
> Fax :+49 (0)89 / 2190 - 2997
> hubert dot froehlich at bvv dot bayern dot de
>


--
Carl Anderson
GIS Manager, Fulton County E&CD
404.730.8026
carl.anderson@co.fulton.ga.us


Re: Postgres "invalid page header"

From: "Florian G. Pflug"
On Thu, Jun 17, 2004 at 06:07:05PM -0400, Carl Anderson wrote:
> >I am working with 7.4.1 under Linux (SuSE 8.1). The server is a HP ProLiant
> >DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and a RAID 5
> >system with ca. 500 GB diskspace (xfs file system)
> >
> >When doing big transactions or changes (UPDATE several million rows in
> >one step) on a database with ca. 50 GB diskspace and ca. 15 million
> >entries, PostGIS and lots of indexes, I get errors like
> >
> >ERROR: invalid page header in block 582024 of relation ...
> >
> >The error does not occur regularly. It seems to me that the error is
> >related to heavy load and heavy I/O. CPU and memory do not seem to be
> >the problem, according to my hotsanic tools.

Hi

I believe that I had a similar problem - but since the database in
question was still running, I thought it was related to the 7.4.1 bug
regarding the wrong alignment information.

Our database is running on a 2x Xeon 2.66GHz, 2GB of RAM, and two 120GB
drives (Seagate ST3120026AS) combined into a software-raid-1 volume
(using the md driver). Postgres has its data on an XFS filesystem
(74GB). We are using kernel 2.6.6 with the "deadline" io-scheduler.
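
(For reference, a sketch of how the scheduler can be selected -- the
boot parameter works on any 2.6 kernel; the runtime sysfs switch only
exists on newer 2.6 kernels, and the device name is illustrative:)

    # at boot, via the kernel command line (lilo.conf / grub menu.lst):
    #   kernel /boot/vmlinuz-2.6.6 root=/dev/md0 elevator=deadline

    # at runtime, on kernels that support per-queue switching:
    cat /sys/block/sda/queue/scheduler    # lists schedulers, [current] marked
    echo deadline > /sys/block/sda/queue/scheduler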

We load about 2-3 million rows into the database daily (in one big
transaction). The first problem that appeared was crashing SELECTs; the
"invalid page header" problem appeared when I tried to upgrade to 7.4.2,
in the ANALYZE step right after fixing the wrong alignment info in the
system tables.

Since we couldn't repair the database (and more and more selects started
to crash), we dumped everything that was still dumpable and
reinitialized the database. The new database has now been running for
about one and a half weeks without any problems so far.
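
(In case anyone has to do the same: a rough sketch of the salvage,
assuming 7.4's zero_damaged_pages developer option; names and paths are
illustrative. The option destroys the bad pages, so work only on a copy
of the data directory:)

    # 1. in postgresql.conf, so it applies to pg_dump's session too:
    #      zero_damaged_pages = true
    #    then reload:  pg_ctl reload
    # 2. dump whatever is still dumpable:
    pg_dump mydb > salvage.sql
    # 3. re-initialize and restore into a fresh cluster:
    #      initdb -D /path/to/new/data
    #      createdb mydb
    #      psql mydb -f salvage.sql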

So - maybe it's really an XFS problem... Which version of Linux are you
running?

greetings, Florian Pflug

Re: Postgres "invalid page header"

From: Hubert Fröhlich
Florian,

my hardware

>>> I am working with 7.4.1 under Linux (SuSE 8.1). The server is a HP
>>> ProLiant DL 380-G3, 2x Intel Pentium4-Xeon, 2.8 GHz, 4 GB memory and
>>> a RAID 5 system with ca. 500 GB diskspace (xfs file system)

seems somewhat similar to yours. I have been using the 2.4.22 kernel. I
wanted to upgrade to 2.6.x, but your experience ... (see below)

>>> When doing big transactions or changes (UPDATE several million rows
>>> in one step) on a database with ca. 50 GB diskspace and ca. 15
>>> million entries, PostGIS and lots of indexes, I get errors like
>>>
>>> ERROR: invalid page header in block 582024 of relation ...
>>>
>>> The error does not occur regularly. It seems to me that the error
>>> is related to heavy load and heavy I/O. CPU and memory do not seem
>>> to be the problem, according to my hotsanic tools.

> I believe that I had a similar problem - but since the database in
> question was still running, I thought it was related to the 7.4.1 bug
> regarding the wrong alignment information.
>
> Our database is running on a 2x Xeon 2.66GHz, 2GB of RAM, and two
> 120GB drives (Seagate ST3120026AS) combined into a software-raid-1
> volume (using the md driver). Postgres has its data on an XFS
> filesystem (74GB). We are using kernel 2.6.6 with the "deadline"
> io-scheduler.
>
> We load about 2-3 million rows into the database daily (in one big
> transaction). The first problem that appeared was crashing SELECTs;
> the "invalid page header" problem appeared when I tried to upgrade to
> 7.4.2, in the ANALYZE step right after fixing the wrong alignment info
> in the system tables.


So, can I conclude that
a) the error happened with 7.4.1 and with 7.4.2 upgraded from 7.4.1, as
described in your posting
http://archives.postgresql.org/pgsql-general/2004-06/msg00647.php .
Are you sure that it no longer occurs in your "clean" install of 7.4.2?
b) this also happens with a 2.6 kernel, so it is not a 2.4 kernel issue?

Reproducing the error is a bit difficult, as it seems to occur only
under high load in big databases.

I'll try to isolate the problem on a smaller scale. Anybody on the air
who has had the problem on a smaller scale?

> Since we couldn't repair the database (and more and more selects
> started to crash), we dumped everything that was still dumpable and
> reinitialized the database. The new database has now been running for
> about one and a half weeks without any problems so far.
>
> So - maybe it's really an XFS problem... Which version of Linux are
> you running?


2.4.22, see above. To isolate filesystem problems I'd like to try with a
NAS if I get one. Do you think this makes sense?
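
(To rule out the usual hardware suspects first, something like the
following might already be telling -- device names are illustrative,
and memtest86 has to be booted from its own floppy/CD:)

    # read-only surface scan of the disks behind the array
    badblocks -sv /dev/sda
    badblocks -sv /dev/sdb

    # SMART self-tests, if smartmontools is installed
    smartctl -t long /dev/sda    # start the long self-test
    smartctl -a /dev/sda         # read the results later

    # RAM cannot be tested reliably from a running system;
    # boot memtest86 and let it run overnight.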

Greetings,

Hubert Fröhlich

--
-------------------------------------------------------------------------------
Dr.-Ing. Hubert Fröhlich
Bezirksfinanzdirektion München
Alexandrastr. 3, D-80538 München, GERMANY
Tel. :+49 (0)89 / 2190 - 2980
Fax  :+49 (0)89 / 2190 - 2997
hubert dot froehlich at bvv dot bayern dot de


Re: Postgres "invalid page header"

From: Hubert Fröhlich
Hi list,
hello Carl, Florian,

maybe you remember my posting from 18.06.2004 and your reactions. The
problem is reproducing my error, which does not happen very often, i.e.
isolating the problem ... As Tom Lane put it: "I would very much like to
see a test case that reliably produces "invalid page header" ..."

Meanwhile, I have been able to reproduce the error (and others) with
some very basic scripts, enclosed below. If you are interested in the
problem, feel free to use them (no liability ...).

The basic procedure is:

1) prepCrashtest.sh sets up SQL files for creating and populating (via
COPY) a database and doing a big update in one transaction. You can
scale it by varying the numberOfRows parameter, which sets the size of
the only table (this may take a while because of the random generation
via openssl). There is only one table, just with characters and integers
and some indexes. No special datatypes, just very basic SQL.

2) multiCrashtest.sh starts working on several databases at the same
time, on each db:
  a) creating, populating, dumping, vacuuming the db and performing a
"big" update operation (via singleCrashtest.sh),
  b) all that in several cycles (5 to 10 seems absolutely enough).

You can scale this by varying the number of databases accessed
simultaneously, e.g. as shown below.
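
(The exact run that crashes for me, matching the usage lines the
scripts print:)

    ./prepCrashtest.sh 300000           # writes crashSetup.sql, crashTable.sql, crashUpdate.sql
    ./multiCrashtest.sh 3 localhost 5   # 3 databases in parallel, 5 cycles each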

E.g. on a client (Athlon 1400+, 512 MB RAM, XFS, SuSE Linux 8.1 with
2.4.19 kernel), I regularly get crashes when using numberOfRows=300000
and ndatabases=3, like

ERROR:  could not access status of transaction 1497515332
ERROR:  could not open segment 1 of relation "idx_vmpnfl_obnr" (...
ERROR:  invalid page header in block 25190 of relation "vmpnfl"
PANIC:  right sibling's left-link doesn't match
ERROR:  could not access status of transaction 188219392

(even PANIC!)
The same holds for an even smaller notebook.

On my big box (see below) I'm still testing with larger values. But as
you can see, the error can be reproduced on a smaller scale with a
smaller box.


The postgresql.conf settings in all cases are somewhat standard, and I
don't believe I can influence the behavior by using different settings.
I am not too familiar with postgres internals, let alone the C coding,
but I guess that there might be some buffer overflow?!

As the hardware used for testing was very different, I meanwhile don't
believe this is a hardware issue; but this is just a "feeling", not an
expert's opinion.

The crashes seem to happen only when *** several *** databases are
accessed simultaneously on the same box - and only when there is heavy
load.

The three scripts are enclosed below in plaintext, to avoid attachments
being cut off (sorry for the bad line formatting). The scripts may be
VERY BASIC, but I hope they can shed some light on the issue.

<snip>
#!/bin/bash
# prepCrashtest.sh
# prepare a SQL file containing a COPY with $1 randomly generated rows
# ---------------------------------------------------------------------
#

THIS=`basename $0`
if [ $# != 1 ]
then
    echo "usage: $THIS numberOfRows"
    exit 1
fi
nrows=$1

# ----------------------------------------------------------------------
# create one randomly generated row for the COPY command
make_copy_row()
{
    obnr=`openssl rand -base64 20 | cut -c1-16`
    pkn=`openssl rand -base64 25 | cut -c1-30`
    echo -e "$obnr\t0\t2147483647\t10\t$pkn\t\\N\t0"
}

# ----------------------------------------------------------------------
# create a COPY command for the table with $nrows randomly generated rows
make_copy_nrows()
{
   echo "COPY vmpnfl (obnr, edat, udat, vma, pkn, dpp, orient) FROM stdin;"
   n=0
   while [[ $n -lt $nrows ]]
   do
     make_copy_row
     n=`expr $n + 1`
   done
   echo "\."
}

echo "...create SQL CREATE TABLE file"

echo "CREATE TABLE vmpnfl (obnr character(16), edat integer DEFAULT 0,
udat integer DEFAULT 2147483647, vma smallint, pkn character(30), dpp
smallint, orient double precision DEFAULT 0 );">crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_obnr ON vmpnfl USING btree
(obnr);">>crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_udat ON vmpnfl USING btree
(udat);">>crashSetup.sql
echo "CREATE INDEX idx_vmpnfl_oid ON vmpnfl USING btree
(oid);">>crashSetup.sql

echo "...create SQL UPDATE file"

echo "begin; update vmpnfl set dpp = 1234; commit" >crashUpdate.sql

echo "...create SQL COPY file with $nrows rows ... this may take a
little while"

make_copy_nrows >crashTable.sql

echo "Finished. Try now:"
echo "multiCrashtest.sh ndatabases servername ncycles"
<snip>
#!/bin/bash
# singleCrashtest.sh
# testing a single database on heavy load
# ----------------------------------------------------------------------
#    setting up a database with one table and some indexes
#    populating the db
#    dumping the db somewhere
#    vacuuming the db
#    performing a big update operation: updating one column in all rows


THIS=`basename $0`
if [ $# != 3 ]
then
    echo "usage: $THIS dbname servername ncycles"
    exit 1
fi


database=$1
server=$2
ncycles=$3
dumpfile=dump$database.sql

echo "=== test on database $database, server $server, $ncycles R/W
cycles: "`date +%Y%m%d%H%M%S`" ==="

cycle=0

while [[ $cycle -lt $ncycles ]]
do
   cycle=`expr $cycle + 1`
   echo "=== cycle $cycle: "`date +%Y%m%d%H%M%S`" ==="


   dropdb -h $server $database 2>&1
   createdb -h $server $database 2>&1
   # grep -v reduces the debug output
   psql -h $server $database -f crashSetup.sql 2>&1 | grep -v CREATE

   psql -h $server $database -f crashTable.sql 2>&1 | grep -v INSERT
   sleep 3
   echo "--- cycle $cycle: Setup       "`date +%Y%m%d%H%M%S`" ---"

   pg_dump -h $server $database >$dumpfile 2>&1
   sleep 3
   echo "--- cycle $cycle: Dump        "`date +%Y%m%d%H%M%S`" ---"
   rm $dumpfile

   echo -n "Tuples: "
   psql $database -h $server -P tuples_only -c "select count(*) from vmpnfl"
   psql $database -h $server -c "vacuum verbose analyze" 2>&1
   sleep 3
   echo "--- cycle $cycle: Vacuum      "`date +%Y%m%d%H%M%S`" ---"

   psql -h $server $database -f crashUpdate.sql 2>&1
   sleep 3
   echo "--- cycle $cycle: BIG Update  "`date +%Y%m%d%H%M%S`" ---"

done


<snip>
#!/bin/bash
# multiCrashtest.sh
# testing databases on heavy load:
# Reading/writing into several databases simultaneously:
# this can be looped several times
# load can be increased
#   by increasing ndb = no. of databases being accessed simultaneously
#   or by increasing the number of rows in prepCrashtest.sh
# ----------------------------------------------------------------------
# ndatabases = number of databases accessed simultaneously
# servername = name of the database server
# ncycles    = number of R/W cycles per database

THIS=`basename $0`
if [ $# != 3 ]
then
    echo "usage: $THIS ndatabases servername ncycles"
    exit 1
fi


ndb=$1
server=$2
ncycles=$3

# ----------------------------------------------------------------------
echo "...loop over $ndb databases on server $servername: $ncycles R/W
cycles per database"

db=0
while [[ $db -lt $ndb ]]
do
   # putting job to background
   singleCrashtest.sh crash${db} $server $ncycles >log_crash${db} &
   echo "   ... process for database crash${db} started"
   sleep 5
   db=`expr $db + 1`
done
exit


<snip>


--
-------------------------------------------------------------------------------
Dr.-Ing. Hubert Fröhlich
Bezirksfinanzdirektion München
Alexandrastr. 3, D-80538 München, GERMANY
Tel. :+49 (0)89 / 2190 - 2980
Fax  :+49 (0)89 / 2190 - 2997
hubert dot froehlich at bvv dot bayern dot de


Re: Postgres "invalid page header"

From: Tom Lane
Hubert Fröhlich <hubert.froehlich@bvv.bayern.de> writes:
> E.g. on a client (Athlon 1400+, 512 MB RAM, XFS, SuSE Linux 8.1 with
> 2.4.19 kernel), I regularly get crashes when using numberOfRows=300000
> and ndatabases=3

FWIW, I could not reproduce any such problem here.  I ran three test
cycles with numberOfRows=300000 and ndatabases=3 under PG 7.4.3 and
another three under CVS tip (amounting to something over six hours
total runtime).  I saw no errors whatsoever.  I was using Red Hat 8.0
with 2.4.18-24.8.0 kernel on a Dell P4 desktop machine.

I'm leaning to the theory that you've got intermittent hardware
problems.

            regards, tom lane

Re: Postgres "invalid page header"

From: Manfred Koizar
On Fri, 23 Jul 2004 14:33:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Hubert Fröhlich <hubert.froehlich@bvv.bayern.de> writes:
>> E.g. on a client (Athlon 1400+, 512 MB RAM, XFS, SuSE Linux 8.1 with
>> 2.4.19 kernel), I regularly get crashes [...]

Hubert, have you ever tested with something other than XFS?  I
volunteer to cross-check any test you want me to, as long as you can
reproduce the problem on an ext2 file system first.
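
(If repartitioning is the obstacle, a loopback ext2 volume is enough
for the test -- a sketch, with illustrative sizes and paths:)

    # create a 2 GB ext2 volume in a plain file and mount it
    dd if=/dev/zero of=/tmp/ext2.img bs=1M count=2048
    mke2fs -F /tmp/ext2.img              # -F: operate on a regular file
    mkdir -p /mnt/ext2test
    mount -o loop /tmp/ext2.img /mnt/ext2test
    chown postgres /mnt/ext2test

    # put a throwaway cluster on it and run the crash test against that
    su - postgres -c "initdb -D /mnt/ext2test/data"
    su - postgres -c "pg_ctl -D /mnt/ext2test/data -l /tmp/pg.log start"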

>FWIW, I could not reproduce any such problem here.  [...]
>  I was using Red Hat 8.0
>with 2.4.18-24.8.0 kernel on a Dell P4 desktop machine.

Tom, do you remember what type of FS you used?

Servus
 Manfred

Re: Postgres "invalid page header"

From: Tom Lane
Manfred Koizar <mkoi-pg@aon.at> writes:
> On Fri, 23 Jul 2004 14:33:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, I could not reproduce any such problem here.  [...]
>> I was using Red Hat 8.0
>> with 2.4.18-24.8.0 kernel on a Dell P4 desktop machine.

> Tom, do you remember what type of FS you used?

That machine is using ext3.

            regards, tom lane