Thread: db replication and errors
Hello everyone, The scene : Currently, we r using, Postgresql v 7.3, on RedHat 9 machines. We provide a hot standby at all times for the Primary server. So, we have to maintain a duplicate of the entire database. The problem arises, when the standby boots. I wanted to know, whenever the standby boots up, what all files shud it copy frm the Primary. All files, under the data directory?? Viz., global, base, clog and xlog? What are those pg_internal.init, pgstat.stat files? What's the difference between global and base directories? If the async feature is used on the primary, shud we copy on the xlog and clog files onto the backup as well? As of now, we shut down postmaster, on the Primary whenever the standby boots up, and then copy all the above said files, from the primary to the standby. Duz this ensure, all data is written onto the disk b4 postmaster shut downs? Quite a few times, I have encountered errors, like, xlogflush is not satisfied, bogus attribute number for <some num , eg. -2>, catalog is missing, cache lookup failed. I am pretty sure, these are related to the copying of files I described above. Any links for me to find more abt these files, and these typical error conditions encountered? The archive lists, did give me some information abt these errors. But cudnt make out much. These and more... one at a time. Regards -- Benjamin Jacob. Disclaimer : ------------------------------------------------------------------------------ If you are not the intended recipient of this transmission to whom it is addressed, or have received this transmission in error, you are hereby notified that any dissemination, distribution or copying of this transmission is strictly prohibited. Please notify us immediately and delete this e-mail from your system. The sender does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission, which cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, arrive at wrong address or contain viruses. If verification is required please request a hard-copy version. This e-mail contains only the personal opinions of the sender and does not represent an official communication from NetYantra of any manner. ------------------------------------------------------------------------------
Benjamin wrote: > > Hello everyone, > > The scene : > Currently, we r using, Postgresql v 7.3, on RedHat 9 > machines. > > We provide a hot standby at all times for the Primary > server. > > So, we have to maintain a duplicate of the entire > database. > The problem arises, when the standby boots. > > I wanted to know, whenever the standby boots up, what > all files shud it copy frm the Primary. > All files, under the data directory?? > Viz., global, base, clog and xlog? Yes - all files. > What are those pg_internal.init, pgstat.stat files? > What's the difference between global and base > directories? The "base" directory is the main data store - you can see what the numbers refer to by using the oid2name utility in contrib/ > If the async feature is used on the primary, shud we > copy on the xlog and clog files onto the backup as > well? What is the "async feature"? > As of now, we shut down postmaster, on the Primary > whenever the standby boots up, and then copy all the above said files, > from the primary to the standby. > > Duz this ensure, all data is written onto the disk b4 > postmaster shut downs? Provided the postmaster shuts down cleanly, and you've synced to disk then all should be OK. > Quite a few times, I have encountered errors, like, xlogflush is not > satisfied, > bogus attribute number for <some num , eg. -2>, catalog is missing, > cache lookup failed. One of 4 things could be at fault: 1. Files aren't being sync'ed to disk 2. You aren't copying the right files 3. The versions of PG don't match 4. The platforms you are running on are different (e.g. Sun-Sparc vs x86) It might be worth looking at "slony" to run a replication setup, rather than copying files. -- Richard Huxton Archonet Ltd
Thanx for the pointers, Richard. >> If the async feature is used on the primary, shud we >> copy on the xlog and clog files onto the backup as >> well? > > > What is the "async feature"? I meant fsync. I meant to ask, if FSYNC is enabled, is all pending data written onto the disk? > >> As of now, we shut down postmaster, on the Primary >> whenever the standby boots up, and then copy all the above said files, >> from the primary to the standby. >> >> Duz this ensure, all data is written onto the disk b4 >> postmaster shut downs? > > > Provided the postmaster shuts down cleanly, and you've synced to disk > then all should be OK. Wot decides this "sync" n how do i check it? >> Quite a few times, I have encountered errors, like, xlogflush is not >> satisfied, >> bogus attribute number for <some num , eg. -2>, catalog is missing, >> cache lookup failed. > > > One of 4 things could be at fault: > 1. Files aren't being sync'ed to disk > 2. You aren't copying the right files > 3. The versions of PG don't match > 4. The platforms you are running on are different (e.g. Sun-Sparc vs > x86) The latter two are not the case, I use Redhat 9 on all the machines, with PG VERSION 7.3 The former two, yes, I agree, cud be the cause of problems. I would like to know, where to look on such errors.eg for cache lookup failure, wot triggers that??how do i get abt tracking down the issue? > > It might be worth looking at "slony" to run a replication setup, > rather than copying files. Did think of slony previously. But slony has the limitation of not being able to replicate large objects, rite? How large are these large objects supposed to be? Run-time replication is not an issue, as I have other mechanisms for that, which are part of this server, and they work fine. The only problem I am facing now, is of the case when the standby is booting up. I have to ensure an absolutely correct copying of files. I want to know how do i go about diagnosing problems, if and when they arise. I have come across pg_filedump. But cant really make out much frm the output that pg_filedump produces. A long way to go... I agree. -- Benjamin Jacob. Disclaimer : ------------------------------------------------------------------------------ If you are not the intended recipient of this transmission to whom it is addressed, or have received this transmission in error, you are hereby notified that any dissemination, distribution or copying of this transmission is strictly prohibited. Please notify us immediately and delete this e-mail from your system. The sender does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission, which cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, arrive at wrong address or contain viruses. If verification is required please request a hard-copy version. This e-mail contains only the personal opinions of the sender and does not represent an official communication from NetYantra of any manner. ------------------------------------------------------------------------------
Benjamin wrote: > Thanx for the pointers, Richard. > >>> If the async feature is used on the primary, shud we >>> copy on the xlog and clog files onto the backup as >>> well? >> >> >> >> What is the "async feature"? > > > I meant fsync. > I meant to ask, if FSYNC is enabled, is all pending data written onto > the disk? Yes. Turn it on if you want your data to survive a power failure. Oh, and make sure your disks aren't write-caching even when you sync. Search the list archives for cache and IDE for plenty of discussion. >> >>> As of now, we shut down postmaster, on the Primary >>> whenever the standby boots up, and then copy all the above said files, >>> from the primary to the standby. >>> >>> Duz this ensure, all data is written onto the disk b4 >>> postmaster shut downs? >> >> Provided the postmaster shuts down cleanly, and you've synced to disk >> then all should be OK. > > > Wot decides this "sync" n how do i check it? > >>> Quite a few times, I have encountered errors, like, xlogflush is not >>> satisfied, >>> bogus attribute number for <some num , eg. -2>, catalog is missing, >>> cache lookup failed. >> >> >> One of 4 things could be at fault: >> 1. Files aren't being sync'ed to disk >> 2. You aren't copying the right files >> 3. The versions of PG don't match >> 4. The platforms you are running on are different (e.g. Sun-Sparc vs >> x86) > > > The latter two are not the case, I use Redhat 9 on all the machines, > with PG VERSION 7.3 > The former two, yes, I agree, cud be the cause of problems. > > I would like to know, where to look on such errors.eg for cache lookup > failure, wot triggers that??how do i get abt tracking down the issue? A cache lookup failure is usually due to the OID of an object changing, where you drop/recreate a temporary table and a function is still referring to its old OID. In your case, I'm not sure what's causing the problem. It could be you've not copied the table definitions over and you've updated your schema on the original machine. >> It might be worth looking at "slony" to run a replication setup, >> rather than copying files. > > Did think of slony previously. But slony has the limitation of not being > able to replicate large objects, rite? > How large are these large objects supposed to be? Um, large as you like. See the manuals for discussion of large object support. I'm guessing you're not using it. > Run-time replication is not an issue, as I have other mechanisms for > that, which are part of this server, and they work fine. > The only problem I am facing now, is of the case when the standby is > booting up. I have to ensure an absolutely correct copying of files. If you've got a replicated version of the database why bother copying the files? Thinking about it, why copy the files at all anyway? If the server is still running why has PG stopped? > I want to know how do i go about diagnosing problems, if and when they > arise. > I have come across pg_filedump. But cant really make out much frm the > output that pg_filedump produces. If you have *any* problems, then the file copy didn't work. Bin it and restore from backup. It's only when you don't have a backup that it's worse messing with pg_filedump. -- Richard Huxton Archonet Ltd
> Run-time replication is not an issue, as I have other mechanisms for > that, which are part of this server, and they work fine. > >> The only problem I am facing now, is of the case when the standby is >> booting up. I have to ensure an absolutely correct copying of files. > > > If you've got a replicated version of the database why bother copying > the files? > Thinking about it, why copy the files at all anyway? If the server is > still running why has PG stopped? Well, the standby has to be a replica. So there would be occasion, say on a machine crash or something, when the standby reboots, or it is installed much later than the Primary. So when it is booted, I have to make sure it is an exact replica of the Primary. > >> I want to know how do i go about diagnosing problems, if and when >> they arise. >> I have come across pg_filedump. But cant really make out much frm the >> output that pg_filedump produces. > > > If you have *any* problems, then the file copy didn't work. Bin it and > restore from backup. It's only when you don't have a backup that it's > worse messing with pg_filedump. > will work on that for sure.. Now, the server as such, and the copying mechanism itself has been sort of cleaned up, and possible error points in the copying of files are being logged. Now things seem a lot brighter!! thanx for the help. -- Benjamin Jacob. Disclaimer : ------------------------------------------------------------------------------ If you are not the intended recipient of this transmission to whom it is addressed, or have received this transmission in error, you are hereby notified that any dissemination, distribution or copying of this transmission is strictly prohibited. Please notify us immediately and delete this e-mail from your system. The sender does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission, which cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, arrive at wrong address or contain viruses. If verification is required please request a hard-copy version. This e-mail contains only the personal opinions of the sender and does not represent an official communication from NetYantra of any manner. ------------------------------------------------------------------------------