Thread: PITR Recovery and out-of-sync indexes
We are running a production server off of a new database that was synchronized using PITR recovery. We found that many of the btree indexes were out of sync with the underlying data after bringing the new server out of recovery mode, but the data itself appeared to be okay. Both servers have identical Intel processors and both are running 64- bit PostgreSQL 8.2.4. The original server is running 64-bit openSUSE 10.2 (Linux 2.6.18.2-34-default #1 SMP Mon Jul 16 01:16:32 GMT 2007 x86_64 x86_64 x86_64 GNU/Linux) and the new server is running Mac OS X Leopard Server. The first tip-off that things were amiss was this error in the log: [2007-10-02 01:12:27 MDT] postgres@ssprod host:192.168.0.54(53976) ERROR: duplicate key violates unique constraint "fed_product__unique__data_feed_id_prod_id_from_src_idx" [2007-10-02 01:12:27 MDT] postgres@ssprod host:192.168.0.54(53976) STATEMENT: UPDATE FED_PRODUCT SET FEEDS_TO_MERCHANT_PRODUCT_ID = 5108509 WHERE (PRODUCT_ID = decode ('0000C0A80036000007D22F00000001155F68741EFE1555FB','hex') AND DATA_FEED_ID = decode ('0000C0A80012000007D718000000010442E7CCFC929764DE','hex')) This update threw a duplicate key error that should have been triggered when the row was inserted. Looking at the row and the application logs, I verified the conflicting row was inserted in the new database after it was brought out of recovery mode. (I included the fed_product's table definition below). I performed a query qualifying using equals for the data_feed_id and product_id_from_source to find the original row and the new bogus row and no rows were returned. I updated the query to qualify using like 'xxxxx%' instead of equals forcing a sequential scan and two, albeit conflicting, rows were returned. I ran a query to delete any newly inserted bogus rows (there were 85 in all) and reindexed the fed_product table. Subsequent searches and inserts against this table work as expected. I ran queries against other tables and many indexes were returning zero rows for rows that exist. I have now reindexed the complete database and everything seems okay. In the Continuous Archiving Point-In-Time Recovery section of the docs, one of the caveats listed is: "Operations on hash indexes are not presently WAL-logged, so replay will not update these indexes. The recommended workaround is to manually REINDEX each such index after completing a recovery operation" Is it possible there are issues with btree indexes being maintained properly as well? Any other ideas? Brian Wipf Clickspace Interactive Inc. <brian@clickspace.com> Table "public.fed_product" data_feed_id | bytea | not null date_created | timestamp without time zone | not null date_modified | timestamp without time zone | feeds_to_merchant_product_id | integer | feeds_to_product_id | integer | product_id | bytea | not null product_id_from_source | character varying(512) | not null Indexes: "fed_product_pk" PRIMARY KEY, btree (product_id) "fed_product__unique__data_feed_id_prod_id_from_src_idx" UNIQUE, btree (data_feed_id, product_id_from_source) "fed_product__additional_1__idx" btree (product_id_from_source) "fed_product__additional_4__idx" btree (feeds_to_merchant_product_id) "fed_product__data_feed_id_fk_idx" btree (data_feed_id)
Brian Wipf wrote: > We are running a production server off of a new database that was > synchronized using PITR recovery. We found that many of the btree > indexes were out of sync with the underlying data after bringing the new > server out of recovery mode, but the data itself appeared to be okay. > > Both servers have identical Intel processors and both are running 64-bit > PostgreSQL 8.2.4. The original server is running 64-bit openSUSE 10.2 > (Linux 2.6.18.2-34-default #1 SMP Mon Jul 16 01:16:32 GMT 2007 x86_64 > x86_64 x86_64 GNU/Linux) and the new server is running Mac OS X Leopard > Server. This isn't necessarily safe. If your setup isn't *identical* then you need to do a lot of checking to make sure this will work. In particular you'd want to make sure that all your ./configure options are compatible (e.g. --enable-integer-datetimes can change on-disk representations). You also need to make sure there aren't any differences in behaviour in any OS libraries used. That's not implausible since you're contrasting Linux with a BSD-based system. > In the Continuous Archiving Point-In-Time Recovery section of the docs, > one of the caveats listed is: > "Operations on hash indexes are not presently WAL-logged, so replay will > not update these indexes. The recommended workaround is to manually > REINDEX each such index after completing a recovery operation" > > Is it possible there are issues with btree indexes being maintained > properly as well? Any other ideas? If there is a problem then it's a bug. However, it's quite likely that you're seeing underlying platform differences. Sounds like you want replication rather than PITR. -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > Brian Wipf wrote: >> Both servers have identical Intel processors and both are running 64-bit >> PostgreSQL 8.2.4. The original server is running 64-bit openSUSE 10.2 >> (Linux 2.6.18.2-34-default #1 SMP Mon Jul 16 01:16:32 GMT 2007 x86_64 >> x86_64 x86_64 GNU/Linux) and the new server is running Mac OS X Leopard >> Server. > This isn't necessarily safe. If your setup isn't *identical* then you > need to do a lot of checking to make sure this will work. PG 8.2 does store data in the pg_control file with which it can check for the most common disk-format-incompatibility problems (to wit, endiannness, maxalign, and --enable-integer-datetimes). If Brian has stumbled on another such foot-gun, it'd be good to identify it so we can think about adding more checking. Noting that one of the columns in the corrupted index was varchar, I am wondering if the culprit could have been a locale/encoding problem of some sort. PG tries to enforce the same LC_COLLATE and LC_CTYPE values (via pg_control entries) but when you are migrating across widely different operating systems like this, identical spelling of locale names proves damn near nothing. What are the settings being used, anyway? (pg_controldata can tell you.) Try using sort(1) to sort the values of product_id_from_source on both systems, in that locale, and see if you get the same sort ordering. regards, tom lane
On 3-Oct-07, at 8:07 AM, Tom Lane wrote: > PG 8.2 does store data in the pg_control file with which it can check > for the most common disk-format-incompatibility problems (to wit, > endiannness, maxalign, and --enable-integer-datetimes). If Brian has > stumbled on another such foot-gun, it'd be good to identify it so we > can think about adding more checking. > > Noting that one of the columns in the corrupted index was varchar, > I am wondering if the culprit could have been a locale/encoding > problem > of some sort. PG tries to enforce the same LC_COLLATE and LC_CTYPE > values (via pg_control entries) but when you are migrating across > widely different operating systems like this, identical spelling of > locale names proves damn near nothing. > > What are the settings being used, anyway? (pg_controldata can tell > you.) Try using sort(1) to sort the values of > product_id_from_source on > both systems, in that locale, and see if you get the same sort > ordering. PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X, the value of en_US.utf8 didn't exist, so I created a soft link to en_US.UTF-8 in the /usr/share/locale/ directory. When I sort the values of product_id_from_source on both systems using the locales in this manner I get different orderings: Linux box: select product_id_from_source from fed_product order by product_id_from_source desc limit 5; product_id_from_source ------------------------ ZZring ZZR0-70-720 zzdangle ZZC0-68-320 -05 ZZ538264 (5 rows) OS X box: select product_id_from_source from fed_product order by product_id_from_source desc limit 10; product_id_from_source ------------------------ zzdangle zz06 zz05 zz04 zz03 (5 rows) and all of these rows exist on both databases. The data appears to be okay. Is it possible the only issue is with indexes? I can happily live with rebuilding indexes if this is the only problem I can expect to encounter, and I would still prefer PITR over replication. We tried PG Pool for replication and the performance was poor compared to a single standalone server. Slony-I worked better for us, but it is more difficult to maintain than PG's PITR and a warm standby is sufficient for us. It would be nice to be able to use the read-only warm stand-by PITR at some point as well, although with the different locale orderings, I suppose this wouldn't be possible. Brian Wipf ClickSpace Interactive Inc. <brian@clickspace.com> Heres the output from pg_controldata on both boxes: Linux box: pg_control version number: 822 Catalog version number: 200611241 Database system identifier: 5087840078460068765 Database cluster state: in production pg_control last modified: Wed 03 Oct 2007 11:16:34 AM MDT Current log file ID: 1126 Next log file segment: 99 Latest checkpoint location: 466/62000020 Prior checkpoint location: 466/61000020 Latest checkpoint's REDO location: 466/62000020 Latest checkpoint's UNDO location: 0/0 Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 0/1720940695 Latest checkpoint's NextOID: 506360 Latest checkpoint's NextMultiXactId: 16963 Latest checkpoint's NextMultiOffset: 41383 Time of latest checkpoint: Wed 03 Oct 2007 11:16:34 AM MDT Minimum recovery ending location: 0/0 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Date/time type storage: floating-point numbers Maximum length of locale name: 128 LC_COLLATE: en_US.utf8 LC_CTYPE: en_US.utf8 OS X box: pg_control version number: 822 Catalog version number: 200611241 Database system identifier: 5087840078460068765 Database cluster state: in production pg_control last modified: Wed Oct 3 11:25:59 2007 Current log file ID: 1166 Next log file segment: 48 Latest checkpoint location: 48E/2A09A428 Prior checkpoint location: 48E/251024C8 Latest checkpoint's REDO location: 48E/2A086140 Latest checkpoint's UNDO location: 0/0 Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 0/1750418938 Latest checkpoint's NextOID: 530936 Latest checkpoint's NextMultiXactId: 17655 Latest checkpoint's NextMultiOffset: 43050 Time of latest checkpoint: Wed Oct 3 11:23:31 2007 Minimum recovery ending location: 42B/701FDB0 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 131072 WAL block size: 8192 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum columns in an index: 32 Date/time type storage: floating-point numbers Maximum length of locale name: 128 LC_COLLATE: en_US.utf8 LC_CTYPE: en_US.utf8
Brian Wipf <brian@clickspace.com> writes: > PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X, the > value of en_US.utf8 didn't exist, so I created a soft link to > en_US.UTF-8 in the /usr/share/locale/ directory. When I sort the > values of product_id_from_source on both systems using the locales in > this manner I get different orderings: Hmph, hadn't remembered that, but indeed it seems that en_US sorting is ASCII order, or nearly so, on Darwin. On Linux it's "dictionary order", which means case-insensitive, spaces are second class citizens, and some other strange rules. Linux: $ LANG=en_US.utf8 sort zzz ZZ538264 zz barf zzdangle zz echo ZZring $ Darwin, same data: $ LANG=en_US.UTF-8 sort zzz ZZ538264 ZZring zz barf zz echo zzdangle $ > I can happily live with rebuilding indexes if this is the only > problem I can expect to encounter, and I would still prefer PITR over > replication. The whole notion scares the daylights out of me. If you really need to use PITR between these two particular platforms, use a locale with common behavior --- C/POSIX would work. regards, tom lane
Tom Lane wrote: > Brian Wipf <brian@clickspace.com> writes: >> PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X, the >> value of en_US.utf8 didn't exist, so I created a soft link to >> en_US.UTF-8 in the /usr/share/locale/ directory. When I sort the >> values of product_id_from_source on both systems using the locales in >> this manner I get different orderings: >> I can happily live with rebuilding indexes if this is the only >> problem I can expect to encounter, and I would still prefer PITR over >> replication. > > The whole notion scares the daylights out of me. If you really need > to use PITR between these two particular platforms, use a locale > with common behavior --- C/POSIX would work. Could you run Linux in a virtual-machine in OS X? -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Tom Lane wrote: >> Brian Wipf <brian@clickspace.com> writes: >>> PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X, the value >>> of en_US.utf8 didn't exist, so I created a soft link to en_US.UTF-8 in >>> the /usr/share/locale/ directory. When I sort the values of >>> product_id_from_source on both systems using the locales in this manner >>> I get different orderings: > >>> I can happily live with rebuilding indexes if this is the only problem I >>> can expect to encounter, and I would still prefer PITR over replication. >> The whole notion scares the daylights out of me. If you really need >> to use PITR between these two particular platforms, use a locale >> with common behavior --- C/POSIX would work. > > Could you run Linux in a virtual-machine in OS X? I think it would be easier (and more performant) to define a new locale on OS/X (or on Linux) to match the behavior of the other system. (Perhaps define a new locale on both, with matching name and matching behavior). -- Alvaro Herrera http://www.advogato.org/person/alvherre "No hay hombre que no aspire a la plenitud, es decir, la suma de experiencias de que un hombre es capaz"
On 3-Oct-07, at 12:46 PM, Richard Huxton wrote: > Tom Lane wrote: >> Brian Wipf <brian@clickspace.com> writes: >>> PG tried to enforce the same LC_COLLATE and LC_CTYPE. On OS X, >>> the value of en_US.utf8 didn't exist, so I created a soft link >>> to en_US.UTF-8 in the /usr/share/locale/ directory. When I sort >>> the values of product_id_from_source on both systems using the >>> locales in this manner I get different orderings: > >>> I can happily live with rebuilding indexes if this is the only >>> problem I can expect to encounter, and I would still prefer PITR >>> over replication. >> The whole notion scares the daylights out of me. If you really need >> to use PITR between these two particular platforms, use a locale >> with common behavior --- C/POSIX would work. > > Could you run Linux in a virtual-machine in OS X? That's an idea. Performance-wise though, I think we'd be better off wiping OS X and installing Linux. As an added bonus, we'll be able to get way better performance out of our Infortrend S16F-R/G1430 Fibre to SAS RAID box, which isn't getting near the I/O its capable of under OS X. Brian Wipf ClickSpace Interactive Inc. <brian@clickspace.com>
Brian Wipf wrote: > On 3-Oct-07, at 12:46 PM, Richard Huxton wrote: >> Could you run Linux in a virtual-machine in OS X? > > That's an idea. Performance-wise though, I think we'd be better off > wiping OS X and installing Linux. As an added bonus, we'll be able to > get way better performance out of our Infortrend S16F-R/G1430 Fibre to > SAS RAID box, which isn't getting near the I/O its capable of under OS X. Oh, fair enough. I assumed you had some OSX specific app you were running on it. -- Richard Huxton Archonet Ltd
Alvaro Herrera <alvherre@commandprompt.com> writes: > Richard Huxton wrote: >> Could you run Linux in a virtual-machine in OS X? > I think it would be easier (and more performant) to define a new locale > on OS/X (or on Linux) to match the behavior of the other system. > (Perhaps define a new locale on both, with matching name and matching > behavior). Given that the OP doesn't seem to care about the difference in behavior between Linux and OS/X interpretations of en_US, I'd think that using C locale on both would suit him just fine. (Of course, that would require initdb on both :-() regards, tom lane
On Tue, 2007-10-02 at 17:11 -0600, Brian Wipf wrote: > Both servers have identical Intel processors and both are running 64- > bit PostgreSQL 8.2.4. The original server is running 64-bit openSUSE > 10.2 (Linux 2.6.18.2-34-default #1 SMP Mon Jul 16 01:16:32 GMT 2007 > x86_64 x86_64 x86_64 GNU/Linux) and the new server is running Mac OS > X Leopard Server. The First Commandment is Make Thy Servers Identical, which applies to OS, OS version, disk layouts/config as well as basic hardware. If they're not then you're going to get some strange results. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
On 4-Oct-07, at 8:14 AM, Simon Riggs wrote: > The First Commandment is Make Thy Servers Identical, which applies to > OS, OS version, disk layouts/config as well as basic hardware. If > they're not then you're going to get some strange results. Other than the corrupt indexes on varchar columns, there appear to be no problems. The UTF-8 data in our case is all okay. Once we use compatible locales even the indexes will be okay. Although identical servers guarantees no issues with PITR, testing may show non- identical servers may work okay too. In our case, an Intel Xserve running Mac OS X being the backup to an openSUSE Linux box. The Apple Xserve is easy to maintain and rock solid reliable. If it had better performance to its Fibre Channel RAID array, it would be a better main server too. The Linux box is a better performer, but in our experience at least, more difficult to maintain when things go wrong. Brian Wipf ClickSpace Interactive Inc. <brian@clickspace.com>
On Thu, 2007-10-04 at 09:21 -0600, Brian Wipf wrote: > On 4-Oct-07, at 8:14 AM, Simon Riggs wrote: > > The First Commandment is Make Thy Servers Identical, which applies to > > OS, OS version, disk layouts/config as well as basic hardware. If > > they're not then you're going to get some strange results. > > Other than the corrupt indexes on varchar columns, there appear to be > no problems. The UTF-8 data in our case is all okay. Once we use > compatible locales even the indexes will be okay. Although identical > servers guarantees no issues with PITR, testing may show non- > identical servers may work okay too. In our case, an Intel Xserve > running Mac OS X being the backup to an openSUSE Linux box. > > The Apple Xserve is easy to maintain and rock solid reliable. If it > had better performance to its Fibre Channel RAID array, it would be a > better main server too. The Linux box is a better performer, but in > our experience at least, more difficult to maintain when things go > wrong. It is a small gain for much risk. If you have weird problems, remember that you are doing something we (or at least I) said not to... even if it works for you it may not do for someone else that tries. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes: > On Thu, 2007-10-04 at 09:21 -0600, Brian Wipf wrote: >> The Apple Xserve is easy to maintain and rock solid reliable. If it >> had better performance to its Fibre Channel RAID array, it would be a >> better main server too. The Linux box is a better performer, but in >> our experience at least, more difficult to maintain when things go >> wrong. > It is a small gain for much risk. If you have weird problems, remember > that you are doing something we (or at least I) said not to... even if > it works for you it may not do for someone else that tries. Well, the other side of the coin is that using a nonidentical backup server might protect him against some types of common-mode failures. Consider for example a worm that can penetrate only one of the two OSes. He needs to get the locale sort orders in sync, but beyond that I can't think of any hard reason why he can't use this combination. BTW, the reasons I don't like the "we'll reindex after we bring up the backup server" theory are: 1. I'm not convinced that the locale mismatch couldn't cause a failure while following the WAL log. 2. It's the sort of step that you are certain to forget when it's four AM and you're really in urgent need of that backup server. Fix the locales, instead. regards, tom lane