Thread: PITR Recovery and out-of-sync indexes

PITR Recovery and out-of-sync indexes

From

Brian Wipf

Date:

02 October 2007, 20:11:21

We are running a production server off of a new database that was
synchronized using PITR recovery. We found that many of the btree
indexes were out of sync with the underlying data after bringing the
new server out of recovery mode, but the data itself appeared to be
okay.

Both servers have identical Intel processors and both are running 64-
bit PostgreSQL 8.2.4. The original server is running 64-bit openSUSE
10.2 (Linux 2.6.18.2-34-default #1 SMP Mon Jul 16 01:16:32 GMT 2007
x86_64 x86_64 x86_64 GNU/Linux) and the new server is running Mac OS
X Leopard Server.

The first tip-off that things were amiss was this error in the log:
[2007-10-02 01:12:27 MDT] postgres@ssprod host:192.168.0.54(53976)
ERROR:  duplicate key violates unique constraint
"fed_product__unique__data_feed_id_prod_id_from_src_idx"
[2007-10-02 01:12:27 MDT] postgres@ssprod host:192.168.0.54(53976)
STATEMENT:  UPDATE FED_PRODUCT SET FEEDS_TO_MERCHANT_PRODUCT_ID =
5108509 WHERE (PRODUCT_ID = decode
('0000C0A80036000007D22F00000001155F68741EFE1555FB','hex') AND
DATA_FEED_ID = decode
('0000C0A80012000007D718000000010442E7CCFC929764DE','hex'))

This update threw a duplicate key error that should have been
triggered when the row was inserted. Looking at the row and the
application logs, I verified the conflicting row was inserted in the
new database after it was brought out of recovery mode. (I included
the fed_product's table definition below).

I performed a query qualifying using equals for the data_feed_id and
product_id_from_source to find the original row and the new bogus row
and no rows were returned. I updated the query to qualify using like
'xxxxx%' instead of equals forcing a sequential scan and two, albeit
conflicting, rows were returned.

I ran a query to delete any newly inserted bogus rows (there were 85
in all) and reindexed the fed_product table. Subsequent searches and
inserts against this table work as expected.

I ran queries against other tables and many indexes were returning
zero rows for rows that exist. I have now reindexed the complete
database and everything seems okay.

In the Continuous Archiving Point-In-Time Recovery section of the
docs, one of the caveats listed is:
"Operations on hash indexes are not presently WAL-logged, so replay
will not update these indexes. The recommended workaround is to
manually REINDEX each such index after completing a recovery operation"

Is it possible there are issues with btree indexes being maintained
properly as well? Any other ideas?

Brian Wipf
Clickspace Interactive Inc.
<brian@clickspace.com>

Table "public.fed_product"
  data_feed_id                    | bytea                        | not null
  date_created                    | timestamp without time zone    | not null
  date_modified                    | timestamp without time zone    |
  feeds_to_merchant_product_id    | integer                        |
  feeds_to_product_id                | integer                        |
  product_id                        | bytea                        | not null
  product_id_from_source            | character varying(512)        | not null
Indexes:
     "fed_product_pk" PRIMARY KEY, btree (product_id)
     "fed_product__unique__data_feed_id_prod_id_from_src_idx" UNIQUE,
btree (data_feed_id, product_id_from_source)
     "fed_product__additional_1__idx" btree (product_id_from_source)
     "fed_product__additional_4__idx" btree
(feeds_to_merchant_product_id)
     "fed_product__data_feed_id_fk_idx" btree (data_feed_id)

Re: PITR Recovery and out-of-sync indexes

From

Richard Huxton

Date:

03 October 2007, 04:03:31

Brian Wipf wrote:
> We are running a production server off of a new database that was
> synchronized using PITR recovery. We found that many of the btree
> indexes were out of sync with the underlying data after bringing the new
> server out of recovery mode, but the data itself appeared to be okay.
>
> Both servers have identical Intel processors and both are running 64-bit
> PostgreSQL 8.2.4. The original server is running 64-bit openSUSE 10.2
> (Linux 2.6.18.2-34-default #1 SMP Mon Jul 16 01:16:32 GMT 2007 x86_64
> x86_64 x86_64 GNU/Linux) and the new server is running Mac OS X Leopard
> Server.

This isn't necessarily safe. If your setup isn't *identical* then you
need to do a lot of checking to make sure this will work. In particular
you'd want to make sure that all your ./configure options are compatible
(e.g. --enable-integer-datetimes can change on-disk representations).

You also need to make sure there aren't any differences in behaviour in
any OS libraries used. That's not implausible since you're contrasting
Linux with a BSD-based system.

> In the Continuous Archiving Point-In-Time Recovery section of the docs,
> one of the caveats listed is:
> "Operations on hash indexes are not presently WAL-logged, so replay will
> not update these indexes. The recommended workaround is to manually
> REINDEX each such index after completing a recovery operation"
>
> Is it possible there are issues with btree indexes being maintained
> properly as well? Any other ideas?

If there is a problem then it's a bug. However, it's quite likely that
you're seeing underlying platform differences. Sounds like you want
replication rather than PITR.

--
   Richard Huxton
   Archonet Ltd

Re: PITR Recovery and out-of-sync indexes

From

Tom Lane

Date:

03 October 2007, 11:07:37

Richard Huxton <dev@archonet.com> writes:
> Brian Wipf wrote:
>> Both servers have identical Intel processors and both are running 64-bit
>> PostgreSQL 8.2.4. The original server is running 64-bit openSUSE 10.2
>> (Linux 2.6.18.2-34-default #1 SMP Mon Jul 16 01:16:32 GMT 2007 x86_64
>> x86_64 x86_64 GNU/Linux) and the new server is running Mac OS X Leopard
>> Server.

> This isn't necessarily safe. If your setup isn't *identical* then you
> need to do a lot of checking to make sure this will work.

PG 8.2 does store data in the pg_control file with which it can check
for the most common disk-format-incompatibility problems (to wit,
endiannness, maxalign, and --enable-integer-datetimes).  If Brian has
stumbled on another such foot-gun, it'd be good to identify it so we
can think about adding more checking.

Noting that one of the columns in the corrupted index was varchar,
I am wondering if the culprit could have been a locale/encoding problem
of some sort.  PG tries to enforce the same LC_COLLATE and LC_CTYPE
values (via pg_control entries) but when you are migrating across
widely different operating systems like this, identical spelling of
locale names proves damn near nothing.

What are the settings being used, anyway?  (pg_controldata can tell
you.)  Try using sort(1) to sort the values of product_id_from_source on
both systems, in that locale, and see if you get the same sort ordering.

            regards, tom lane

Re: PITR Recovery and out-of-sync indexes

From

Brian Wipf

Date:

03 October 2007, 15:16:41

On 3-Oct-07, at 8:07 AM, Tom Lane wrote:
> PG 8.2 does store data in the pg_control file with which it can check
> for the most common disk-format-incompatibility problems (to wit,
> endiannness, maxalign, and --enable-integer-datetimes).  If Brian has
> stumbled on another such foot-gun, it'd be good to identify it so we
> can think about adding more checking.
>
> Noting that one of the columns in the corrupted index was varchar,
> I am wondering if the culprit could have been a locale/encoding
> problem
> of some sort.  PG tries to enforce the same LC_COLLATE and LC_CTYPE
> values (via pg_control entries) but when you are migrating across
> widely different operating systems like this, identical spelling of
> locale names proves damn near nothing.
>
> What are the settings being used, anyway?  (pg_controldata can tell
> you.)  Try using sort(1) to sort the values of
> product_id_from_source on
> both systems, in that locale, and see if you get the same sort
> ordering.

PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X, the
value of en_US.utf8 didn't exist, so I created a soft link to
en_US.UTF-8 in the /usr/share/locale/ directory. When I sort the
values of product_id_from_source on both systems using the locales in
this manner I get different orderings:

Linux box:

select product_id_from_source from fed_product order by
product_id_from_source desc limit 5;
  product_id_from_source
------------------------
  ZZring
  ZZR0-70-720
  zzdangle
  ZZC0-68-320 -05
  ZZ538264
(5 rows)

OS X box:

select product_id_from_source from fed_product order by
product_id_from_source desc limit 10;
  product_id_from_source
------------------------
  zzdangle
  zz06
  zz05
  zz04
  zz03
(5 rows)

and all of these rows exist on both databases. The data appears to be
okay. Is it possible the only issue is with indexes?

I can happily live with rebuilding indexes if this is the only
problem I can expect to encounter, and I would still prefer PITR over
replication. We tried PG Pool for replication and the performance was
poor compared to a single standalone server. Slony-I worked better
for us, but it is more difficult to maintain than PG's PITR and a
warm standby is sufficient for us. It would be nice to be able to use
the read-only warm stand-by PITR at some point as well, although with
the different locale orderings, I suppose this wouldn't be possible.

Brian Wipf
ClickSpace Interactive Inc.
<brian@clickspace.com>

Heres the output from pg_controldata on both boxes:

Linux box:
pg_control version number:            822
Catalog version number:               200611241
Database system identifier:           5087840078460068765
Database cluster state:               in production
pg_control last modified:             Wed 03 Oct 2007 11:16:34 AM MDT
Current log file ID:                  1126
Next log file segment:                99
Latest checkpoint location:           466/62000020
Prior checkpoint location:            466/61000020
Latest checkpoint's REDO location:    466/62000020
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/1720940695
Latest checkpoint's NextOID:          506360
Latest checkpoint's NextMultiXactId:  16963
Latest checkpoint's NextMultiOffset:  41383
Time of latest checkpoint:            Wed 03 Oct 2007 11:16:34 AM MDT
Minimum recovery ending location:     0/0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           en_US.utf8
LC_CTYPE:                             en_US.utf8

OS X box:
pg_control version number:            822
Catalog version number:               200611241
Database system identifier:           5087840078460068765
Database cluster state:               in production
pg_control last modified:             Wed Oct  3 11:25:59 2007
Current log file ID:                  1166
Next log file segment:                48
Latest checkpoint location:           48E/2A09A428
Prior checkpoint location:            48E/251024C8
Latest checkpoint's REDO location:    48E/2A086140
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/1750418938
Latest checkpoint's NextOID:          530936
Latest checkpoint's NextMultiXactId:  17655
Latest checkpoint's NextMultiOffset:  43050
Time of latest checkpoint:            Wed Oct  3 11:23:31 2007
Minimum recovery ending location:     42B/701FDB0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           en_US.utf8
LC_CTYPE:                             en_US.utf8

Re: PITR Recovery and out-of-sync indexes

From

Tom Lane

Date:

03 October 2007, 15:32:19

Brian Wipf <brian@clickspace.com> writes:
> PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X, the
> value of en_US.utf8 didn't exist, so I created a soft link to
> en_US.UTF-8 in the /usr/share/locale/ directory. When I sort the
> values of product_id_from_source on both systems using the locales in
> this manner I get different orderings:

Hmph, hadn't remembered that, but indeed it seems that en_US sorting
is ASCII order, or nearly so, on Darwin.  On Linux it's "dictionary
order", which means case-insensitive, spaces are second class citizens,
and some other strange rules.

Linux:

$ LANG=en_US.utf8 sort zzz
ZZ538264
zz barf
zzdangle
zz echo
ZZring
$

Darwin, same data:

$ LANG=en_US.UTF-8 sort zzz
ZZ538264
ZZring
zz barf
zz echo
zzdangle
$

> I can happily live with rebuilding indexes if this is the only
> problem I can expect to encounter, and I would still prefer PITR over
> replication.

The whole notion scares the daylights out of me.  If you really need
to use PITR between these two particular platforms, use a locale
with common behavior --- C/POSIX would work.

            regards, tom lane

Re: PITR Recovery and out-of-sync indexes

From

Richard Huxton

Date:

03 October 2007, 15:46:49

Tom Lane wrote:
> Brian Wipf <brian@clickspace.com> writes:
>> PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X, the
>> value of en_US.utf8 didn't exist, so I created a soft link to
>> en_US.UTF-8 in the /usr/share/locale/ directory. When I sort the
>> values of product_id_from_source on both systems using the locales in
>> this manner I get different orderings:

>> I can happily live with rebuilding indexes if this is the only
>> problem I can expect to encounter, and I would still prefer PITR over
>> replication.
>
> The whole notion scares the daylights out of me.  If you really need
> to use PITR between these two particular platforms, use a locale
> with common behavior --- C/POSIX would work.

Could you run Linux in a virtual-machine in OS X?

--
   Richard Huxton
   Archonet Ltd

Re: PITR Recovery and out-of-sync indexes

From

Alvaro Herrera

Date:

03 October 2007, 15:59:30

Richard Huxton wrote:
> Tom Lane wrote:
>> Brian Wipf <brian@clickspace.com> writes:
>>> PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X, the  value
>>> of en_US.utf8 didn't exist, so I created a soft link to  en_US.UTF-8 in
>>> the /usr/share/locale/ directory. When I sort the  values of
>>> product_id_from_source on both systems using the locales in  this manner
>>> I get different orderings:
>
>>> I can happily live with rebuilding indexes if this is the only  problem I
>>> can expect to encounter, and I would still prefer PITR over  replication.
>> The whole notion scares the daylights out of me.  If you really need
>> to use PITR between these two particular platforms, use a locale
>> with common behavior --- C/POSIX would work.
>
> Could you run Linux in a virtual-machine in OS X?

I think it would be easier (and more performant) to define a new locale
on OS/X (or on Linux) to match the behavior of the other system.
(Perhaps define a new locale on both, with matching name and matching
behavior).

--
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"No hay hombre que no aspire a la plenitud, es decir,
la suma de experiencias de que un hombre es capaz"

Re: PITR Recovery and out-of-sync indexes

From

Brian Wipf

Date:

03 October 2007, 16:00:37

On 3-Oct-07, at 12:46 PM, Richard Huxton wrote:
> Tom Lane wrote:
>> Brian Wipf <brian@clickspace.com> writes:
>>> PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X,
>>> the  value of en_US.utf8 didn't exist, so I created a soft link
>>> to  en_US.UTF-8 in the /usr/share/locale/ directory. When I sort
>>> the  values of product_id_from_source on both systems using the
>>> locales in  this manner I get different orderings:
>
>>> I can happily live with rebuilding indexes if this is the only
>>> problem I can expect to encounter, and I would still prefer PITR
>>> over  replication.
>> The whole notion scares the daylights out of me.  If you really need
>> to use PITR between these two particular platforms, use a locale
>> with common behavior --- C/POSIX would work.
>
> Could you run Linux in a virtual-machine in OS X?

That's an idea. Performance-wise though, I think we'd be better off
wiping OS X and installing Linux. As an added bonus, we'll be able to
get way better performance out of our Infortrend S16F-R/G1430 Fibre
to SAS RAID box, which isn't getting near the I/O its capable of
under OS X.

Brian Wipf
ClickSpace Interactive Inc.
<brian@clickspace.com>

Re: PITR Recovery and out-of-sync indexes

From

Richard Huxton

Date:

03 October 2007, 16:03:16

Brian Wipf wrote:
> On 3-Oct-07, at 12:46 PM, Richard Huxton wrote:
>> Could you run Linux in a virtual-machine in OS X?
>
> That's an idea. Performance-wise though, I think we'd be better off
> wiping OS X and installing Linux. As an added bonus, we'll be able to
> get way better performance out of our Infortrend S16F-R/G1430 Fibre to
> SAS RAID box, which isn't getting near the I/O its capable of under OS X.

Oh, fair enough. I assumed you had some OSX specific app you were
running on it.

--
   Richard Huxton
   Archonet Ltd

Re: PITR Recovery and out-of-sync indexes

From

Tom Lane

Date:

03 October 2007, 19:17:16

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Richard Huxton wrote:
>> Could you run Linux in a virtual-machine in OS X?

> I think it would be easier (and more performant) to define a new locale
> on OS/X (or on Linux) to match the behavior of the other system.
> (Perhaps define a new locale on both, with matching name and matching
> behavior).

Given that the OP doesn't seem to care about the difference in
behavior between Linux and OS/X interpretations of en_US, I'd think
that using C locale on both would suit him just fine.  (Of course,
that would require initdb on both :-()

            regards, tom lane

Re: PITR Recovery and out-of-sync indexes

From

Simon Riggs

Date:

04 October 2007, 11:14:47

On Tue, 2007-10-02 at 17:11 -0600, Brian Wipf wrote:

> Both servers have identical Intel processors and both are running 64-
> bit PostgreSQL 8.2.4. The original server is running 64-bit openSUSE
> 10.2 (Linux 2.6.18.2-34-default #1 SMP Mon Jul 16 01:16:32 GMT 2007
> x86_64 x86_64 x86_64 GNU/Linux) and the new server is running Mac OS
> X Leopard Server.

The First Commandment is Make Thy Servers Identical, which applies to
OS, OS version, disk layouts/config as well as basic hardware. If
they're not then you're going to get some strange results.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: PITR Recovery and out-of-sync indexes

From

Brian Wipf

Date:

04 October 2007, 12:22:00

On 4-Oct-07, at 8:14 AM, Simon Riggs wrote:
> The First Commandment is Make Thy Servers Identical, which applies to
> OS, OS version, disk layouts/config as well as basic hardware. If
> they're not then you're going to get some strange results.

Other than the corrupt indexes on varchar columns, there appear to be
no problems. The UTF-8 data in our case is all okay. Once we use
compatible locales even the indexes will be okay. Although identical
servers guarantees no issues with PITR, testing may show non-
identical servers may work okay too. In our case, an Intel Xserve
running Mac OS X being the backup to an openSUSE Linux box.

The Apple Xserve is easy to maintain and rock solid reliable. If it
had better performance to its Fibre Channel RAID array, it would be a
better main server too. The Linux box is a better performer, but in
our experience at least, more difficult to maintain when things go
wrong.

Brian Wipf
ClickSpace Interactive Inc.
<brian@clickspace.com>

Re: PITR Recovery and out-of-sync indexes

From

Simon Riggs

Date:

04 October 2007, 12:33:11

On Thu, 2007-10-04 at 09:21 -0600, Brian Wipf wrote:
> On 4-Oct-07, at 8:14 AM, Simon Riggs wrote:
> > The First Commandment is Make Thy Servers Identical, which applies to
> > OS, OS version, disk layouts/config as well as basic hardware. If
> > they're not then you're going to get some strange results.
>
> Other than the corrupt indexes on varchar columns, there appear to be
> no problems. The UTF-8 data in our case is all okay. Once we use
> compatible locales even the indexes will be okay. Although identical
> servers guarantees no issues with PITR, testing may show non-
> identical servers may work okay too. In our case, an Intel Xserve
> running Mac OS X being the backup to an openSUSE Linux box.
>
> The Apple Xserve is easy to maintain and rock solid reliable. If it
> had better performance to its Fibre Channel RAID array, it would be a
> better main server too. The Linux box is a better performer, but in
> our experience at least, more difficult to maintain when things go
> wrong.

It is a small gain for much risk. If you have weird problems, remember
that you are doing something we (or at least I) said not to... even if
it works for you it may not do for someone else that tries.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

Re: PITR Recovery and out-of-sync indexes

From

Tom Lane

Date:

04 October 2007, 14:23:59

Simon Riggs <simon@2ndquadrant.com> writes:
> On Thu, 2007-10-04 at 09:21 -0600, Brian Wipf wrote:
>> The Apple Xserve is easy to maintain and rock solid reliable. If it
>> had better performance to its Fibre Channel RAID array, it would be a
>> better main server too. The Linux box is a better performer, but in
>> our experience at least, more difficult to maintain when things go
>> wrong.

> It is a small gain for much risk. If you have weird problems, remember
> that you are doing something we (or at least I) said not to... even if
> it works for you it may not do for someone else that tries.

Well, the other side of the coin is that using a nonidentical backup
server might protect him against some types of common-mode failures.
Consider for example a worm that can penetrate only one of the two OSes.

He needs to get the locale sort orders in sync, but beyond that I
can't think of any hard reason why he can't use this combination.

BTW, the reasons I don't like the "we'll reindex after we bring up
the backup server" theory are:
1. I'm not convinced that the locale mismatch couldn't cause a failure
   while following the WAL log.
2. It's the sort of step that you are certain to forget when it's
   four AM and you're really in urgent need of that backup server.
Fix the locales, instead.

            regards, tom lane