Home > mailing lists

Re: Corrupt indexes on slave when using pg_bulkload on master - Mailing list pgsql-general

From	James Cowell
Subject	Re: Corrupt indexes on slave when using pg_bulkload on master
Date	December 10, 2012 17:08:21
Msg-id	1355148494.4511.YahooMailNeo@web186006.mail.ir2.yahoo.com Whole thread Raw
In response to	Re: Corrupt indexes on slave when using pg_bulkload on master ("James Cowell" <jcowell@btinternet.com>)
List	pgsql-general

Tree view

Hello,

Does anyone else have any insight or information around this issue?  I can't find anything out there on the web.

If it's a simple incompatability then fine, but I'd still like to understand why the indexes on the secondary node corrupt when the data seems to be replicated without issue.

Thanks,

James
From: James Cowell <jcowell@btinternet.com>
To: 'Jeff Janes' <jeff.janes@gmail.com> 
Cc: pgsql-general@postgresql.org 
Sent: Wednesday, 5 December 2012, 18:56
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on master

Hi Jeff,

Thanks for your reply.

I can understand that it would be incompatible if you bypassed the WAL logs,
but I've got it configured to use the buffered loader so that WAL logs are
written.  The data replicates fine, the indexes don't, yet a reindex on the
primary fixes the index on the secondary.  I'd like to understand why if
possible.

I've seen mention of bug(s) around corrupt indexes on the secondary node,
there was a fix in 9.1.6, but it doesn't appear to be the same problem I'm
having. 

I've also seen other people with similar errors that aren't using
pg_bulkload, for example:
http://postgresql.1045698.n5.nabble.com/BUG-7562-could-not-read-block-0-in-f
ile-quot-base-16385-16585-quot-read-only-0-of-8192-bytes-td5724738.html

Cheers,

James

-----Original Message-----
From: Jeff Janes [mailto:jeff.janes@gmail.com] 
Sent: 05 December 2012 18:02
To: James Cowell
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Corrupt indexes on slave when using pg_bulkload on
master

On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@btinternet.com> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a 
> postgres database hourly.
>
> This database is replicated to a second node.

pg_bulkload is fundamentally incompatible with PITR, streaming, and forms of
replication that depend on those things.

Cheers,

Jeff

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I'm using pg_bulkload to load large amounts of CSV data into a postgres database hourly.
This database is replicated to a second node.
Whenever a bulk load happens the indexes on the updated tables on the secondary node corrupt and are unusable until a reindex is run on the primary node. I get the error below on node 2:
ERROR: index "tablename" contains unexpected zero page at block 0
SQL state: XX002
Hint: Please REINDEX it.
I'm assuming that this is because of the way pg_bulkload builds the index on the primary, and possibly has something to do with the way pg_bulkload overwrites rows in the event of a constraint violation, but at the same time if something works on the primary shouldn't the replicated node be able to process the WAL log?
I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build 3.1.1-1.pg91.rhel6 and it happens every time.
Does anyone have any experience in this area or advice they could give? If you can point out something stupid I'm doing that would be very welcome :)
Thanks,
James

pgsql-general by date:

From: Henry Drexler
Date: 10 December 2012, 16:26:42
Subject: Re: query performance, though it was timestamps,maybe just table size?

From: Christian Hammers
Date: 10 December 2012, 17:14:54
Subject: logcheck compatible daily logfile rotation?

Re: Corrupt indexes on slave when using pg_bulkload on master - Mailing list pgsql-general

Previous

Next