Re: "error with invalid page header" while vacuuming pgbench data - Mailing list pgsql-performance

From John Rouillard
Subject Re: "error with invalid page header" while vacuuming pgbench data
Msg-id 20110525220716.GF31823@renesys.com
In response to Re: "error with invalid page header" while vacuuming pgbench data  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: "error with invalid page header" while vacuuming pgbench data
List pgsql-performance
On Wed, May 25, 2011 at 03:19:59PM -0500, Kevin Grittner wrote:
> John Rouillard <rouilj@renesys.com> wrote:
> > On Mon, May 23, 2011 at 05:21:04PM -0500, Kevin Grittner wrote:
> >> John Rouillard <rouilj@renesys.com> wrote:
> >>
> >> >  I seem to be able to provoke this error:
> >> >
> >> >    vacuum...ERROR:  invalid page header in
> >> >             block 2128910 of relation base/16385/21476
> >>
> >> What version of PostgreSQL?
> >
> > Hmm, I thought I replied to this, but I haven't seen it come back
> > to me on list.  It's postgres version: 8.4.5.
> >
> > rpm -q shows
> >
> >    postgresql84-server-8.4.5-1.el5_5.1
>
> I was hoping someone else would jump in, but I see that your
> previous post didn't copy the list, which solves *that* mystery.
>
> I'm curious whether you might have enabled one of the "it's OK to
> trash my database integrity to boost performance" options.  (People
> with enough replication often feel that this *is* OK.)  Please run
> the query on this page and post the results:
>
> http://wiki.postgresql.org/wiki/Server_Configuration
>
> Basically, if fsync or full_page_writes is turned off and there was
> a crash, that explains it.  If not, it provides more information to
> proceed.

Nope. Neither is turned off. I can't run the query at the moment since
the system is in the middle of a memtest86+ check of 96GB of
memory. The relevant parts of the config file, taken from the
configuration management system, are:

  #fsync = on                             # turns forced synchronization
                                          # on or off
  #synchronous_commit = on                # immediate fsync at commit
  #wal_sync_method = fsync                # the default is the first option

  #full_page_writes = on                  # recover from partial page writes

This is the same setup I use on all my data warehouse systems (with
minor pgtune-type changes based on the amount of memory). Running the
query on another system (using ext3, CentOS 5.5) shows:

 version                        | PostgreSQL 8.4.5 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit
 archive_command                | if test ! -e /var/lib/pgsql/data/ARCHIVE_ENABLED; then exit 0; fi; test ! -f /var/bak/pgsql/%f && cp %p /var/bak/pgsql/%f
 archive_mode                   | on
 checkpoint_completion_target   | 0.9
 checkpoint_segments            | 64
 constraint_exclusion           | on
 custom_variable_classes        | pg_stat_statements
 default_statistics_target      | 100
 effective_cache_size           | 8GB
 lc_collate                     | en_US.UTF-8
 lc_ctype                       | en_US.UTF-8
 listen_addresses               | *
 log_checkpoints                | on
 log_connections                | on
 log_destination                | stderr,syslog
 log_directory                  | pg_log
 log_filename                   | postgresql-%a.log
 log_line_prefix                | %t %u@%d(%p)i:
 log_lock_waits                 | on
 log_min_duration_statement     | 2s
 log_min_error_statement        | warning
 log_min_messages               | notice
 log_rotation_age               | 1d
 log_rotation_size              | 0
 log_temp_files                 | 0
 log_truncate_on_rotation       | on
 logging_collector              | on
 maintenance_work_mem           | 1GB
 max_connections                | 300
 max_locks_per_transaction      | 128
 max_stack_depth                | 2MB
 port                           | 5432
 server_encoding                | UTF8
 shared_buffers                 | 4GB
 shared_preload_libraries       | pg_stat_statements
 superuser_reserved_connections | 3
 tcp_keepalives_count           | 0
 tcp_keepalives_idle            | 0
 tcp_keepalives_interval        | 0
 TimeZone                       | UTC
 wal_buffers                    | 32MB
 work_mem                       | 16MB
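
Note that every durability line in the config fragment above is commented out, so the server falls back to its built-in defaults for them (all "on" in 8.4, as the comments in the file itself indicate). That reading of the file can be sketched in Python. This is only an illustrative sketch: the function name effective_settings and the DEFAULTS table are hypothetical helpers, not part of PostgreSQL or pgtune, and the parsing covers only simple "name = value" lines.

```python
import re

# Documented PostgreSQL 8.4 defaults for the settings discussed above.
# wal_sync_method is left out because its default is platform-dependent.
DEFAULTS = {
    "fsync": "on",
    "synchronous_commit": "on",
    "full_page_writes": "on",
}

# An active (uncommented) "name = value" line; the "=" is optional.
SETTING_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*('[^']*'|[^#\s]+)")


def effective_settings(conf_text, defaults=DEFAULTS):
    """Return {name: (value, source)} for each name in defaults."""
    result = {name: (value, "default") for name, value in defaults.items()}
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented out: the default stays in effect
        match = SETTING_RE.match(line)
        if match and match.group(1).lower() in result:
            # A later active line overrides an earlier one.
            value = match.group(2).strip("'")
            result[match.group(1).lower()] = (value, "configuration file")
    return result


sample = """
#fsync = on                             # turns forced synchronization
                                        # on or off
#synchronous_commit = on                # immediate fsync at commit
#wal_sync_method = fsync                # the default is the first option

#full_page_writes = on                  # recover from partial page writes
"""

for name, (value, source) in sorted(effective_settings(sample).items()):
    print(f"{name:20} {value:4} ({source})")
```

On a live server, the pg_settings view is the authoritative place to check the same thing; a file-based check like this is only useful when the database cannot be queried.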

> You might want to re-start the thread on pgsql-general, though.  Not
> everybody who might be able to help with a problem like this follows
> the performance list.  Or, if you didn't set any of the dangerous
> configuration options, this sounds like a bug -- so pgsql-bugs might
> be even better.

Well, I am also managing to panic the kernel on some runs. So
my guess is that this is not only a postgres bug (if it's a postgres
issue at all).

As gregg mentioned in another followup, ext4 under CentOS 5.x may be an
issue. I'll drop back to ext3 and see if I can replicate the
corruption or crashes once I rule out some potential hardware issues.

If I can replicate with ext3, then I'll follow up on -general or
-bugs.

Ext4 pgbench results complete faster, but if it's not reliable ....

Thanks for your help.

--
                -- rouilj

John Rouillard       System Administrator
Renesys Corporation  603-244-9084 (cell)  603-643-9300 x 111
