Thread: Need Help Recovering from Botched Upgrade Attempt
Despite trying to be careful, I managed to mess up the upgrade from -8.1.4 to -8.3.3 on my Slackware-11.0 server/workstation. I expect that someone here will see my error and point me in the right direction to recover a working dbms. Here's what I did: 1.) As a user, I ran pg_dumpall on version 8.1.4 and had that written to /usr4/postgres-backups/. 2.) Created /usr4/pgsql_old/, and copied all of /var/lib/pgsql/ there ('cp -a /var/lib/pgsql/* .') 3.) In /usr4/pgsql_old/data/postgresql.conf, changed port to 5466. 4.) As root, ran '/etc/rc.d/rc.postgresql stop', which reported that's what it did. 5.) Built postgresql-8.3.3 using the SlackBuild script, then ran 'upgradepkg postgresql-8.3.3*tgz'; other than reporting not finding an expected pid file, that went smoothly. 6.) Tried starting the new version, /etc/rc.d/rc.postgresql start. It reported that the server was already running, but that it started anyway. 7.) A search of the process list could not find the postmaster process, or any other indication of pgsql running. Trying to run pgsql as a user also failed. Tried, as user postgres, to run 'pg_sql start' but that also failed. TIA, Rich -- Richard B. Shepard, Ph.D. | Integrity Credibility Applied Ecosystem Services, Inc. | Innovation <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
Rich Shepard wrote: > Despite trying to be careful, I managed to mess up the upgrade from -8.1.4 > to -8.3.3 on my Slackware-11.0 server/workstation. I expect that someone > here will see my error and point me in the right direction to recover a > working dbms. > > Here's what I did: > > 1.) As a user, I ran pg_dumpall on version 8.1.4 and had that written to > /usr4/postgres-backups/. > > 2.) Created /usr4/pgsql_old/, and copied all of /var/lib/pgsql/ there ('cp > -a /var/lib/pgsql/* .') > > 3.) In /usr4/pgsql_old/data/postgresql.conf, changed port to 5466. > > 4.) As root, ran '/etc/rc.d/rc.postgresql stop', which reported that's > what it did. > You copied the files without stopping the database? move 4 to 2. > 5.) Built postgresql-8.3.3 using the SlackBuild script, then ran > 'upgradepkg postgresql-8.3.3*tgz'; other than reporting not finding an > expected pid file, that went smoothly. > Is there an initdb in here somewhere? Or is the 8.3 server trying to start with an 8.1 file structure? > 6.) Tried starting the new version, /etc/rc.d/rc.postgresql start. It > reported that the server was already running, but that it started anyway. > klint. -- Klint Gore Database Manager Sheep CRC A.G.B.U. University of New England Armidale NSW 2350 Ph: 02 6773 3789 Fax: 02 6773 3266 EMail: kgore4@une.edu.au
On Wed, 18 Jun 2008, Klint Gore wrote: > You copied the files without stopping the database? move 4 to 2. Klint, Yes, actually. There was no activity on any of the databases. > Is there an initdb in here somewhere? Or is the 8.3 server trying to start > with an 8.1 file structure? Ah, yes. I missed stating that. I switched to user postgres and ran: postgres@salmo:/var/lib/pgsql$ initdb -D ./data/ The files belonging to this database system will be owned by user "postgres". This user must also own the server process. The database cluster will be initialized with locales COLLATE: C CTYPE: en_US MESSAGES: en_US MONETARY: en_US NUMERIC: en_US TIME: en_US could not determine encoding for locale "en_US": codeset is "ANSI_X3.4-1968" initdb: could not find suitable encoding for locale en_US Rerun initdb with the -E option. Try "initdb --help" for more information. But, using the -E option with en_US tells me that it's not a valid locale. Thanks, Rich -- Richard B. Shepard, Ph.D. | Integrity Credibility Applied Ecosystem Services, Inc. | Innovation <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Wed, 18 Jun 2008, Klint Gore wrote: >> 5.) Built postgresql-8.3.3 using the SlackBuild script, then ran >> 'upgradepkg postgresql-8.3.3*tgz'; other than reporting not finding an >> expected pid file, that went smoothly. >> > Is there an initdb in here somewhere? Or is the 8.3 server trying to start > with an 8.1 file structure? Klint, Backed up a couple of steps, and tried again. Removed postgresql-8.3.3; deleted all contents of /var/lib/pgsql/data (because initdb is supposed to create the contents, if I correctly read the Postgresql book); re-installed postgresql-8.3.3; ran (as user postgres) 'initdb -D /var/lib/pgsql/data'. Nothing! I've really FUBARed this and don't understand how, or what to do to recover. Thanks, Rich -- Richard B. Shepard, Ph.D. | Integrity Credibility Applied Ecosystem Services, Inc. | Innovation <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Tuesday 17 June 2008 7:18 pm, Rich Shepard wrote: > On Wed, 18 Jun 2008, Klint Gore wrote: > >> 5.) Built postgresql-8.3.3 using the SlackBuild script, then ran > >> 'upgradepkg postgresql-8.3.3*tgz'; other than reporting not finding an > >> expected pid file, that went smoothly. > > > > Is there an initdb in here somewhere? Or is the 8.3 server trying to > > start with an 8.1 file structure? > > Klint, > > Backed up a couple of steps, and tried again. Removed postgresql-8.3.3; > deleted all contents of /var/lib/pgsql/data (because initdb is supposed to > create the contents, if I correctly read the Postgresql book); re-installed > postgresql-8.3.3; ran (as user postgres) 'initdb -D /var/lib/pgsql/data'. > Nothing! Define nothing. When you ran initdb there where no messages? Also when in doubt I use the full path /var/lib/pgsql/bin/initdb as you have an old version of initdb present in the old version directory you copied. When you have two versions present at the same time it is easy to get cross reference problems. > > I've really FUBARed this and don't understand how, or what to do to > recover. > > Thanks, > > Rich > > -- > Richard B. Shepard, Ph.D. | Integrity Credibility > Applied Ecosystem Services, Inc. | Innovation > <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863 -- Adrian Klaver aklaver@comcast.net
Rich Shepard wrote: > On Wed, 18 Jun 2008, Klint Gore wrote: > > >> 5.) Built postgresql-8.3.3 using the SlackBuild script, then ran > >> 'upgradepkg postgresql-8.3.3*tgz'; other than reporting not finding an > >> expected pid file, that went smoothly. > >> > > Is there an initdb in here somewhere? Or is the 8.3 server trying to start > > with an 8.1 file structure? > > Klint, > > Backed up a couple of steps, and tried again. Removed postgresql-8.3.3; > deleted all contents of /var/lib/pgsql/data (because initdb is supposed to > create the contents, if I correctly read the Postgresql book); re-installed > postgresql-8.3.3; ran (as user postgres) 'initdb -D /var/lib/pgsql/data'. > Nothing! > > I've really FUBARed this and don't understand how, or what to do to > recover. > > Thanks, > > Make sure that initdb is the version you want initdb --version then initdb -E UTF8 -D /var/lib/pgsql/data then post the output of that. kllint. -- Klint Gore Database Manager Sheep CRC A.G.B.U. University of New England Armidale NSW 2350 Ph: 02 6773 3789 Fax: 02 6773 3266 EMail: kgore4@une.edu.au
Klint Gore wrote: > Rich Shepard wrote: >> Despite trying to be careful, I managed to mess up the upgrade from >> -8.1.4 >> to -8.3.3 on my Slackware-11.0 server/workstation. I expect that someone >> here will see my error and point me in the right direction to recover a >> working dbms. >> >> Here's what I did: >> >> 1.) As a user, I ran pg_dumpall on version 8.1.4 and had that >> written to >> /usr4/postgres-backups/. >> >> 2.) Created /usr4/pgsql_old/, and copied all of /var/lib/pgsql/ >> there ('cp >> -a /var/lib/pgsql/* .') I hope you mean cp -aR , because you need those subdirectories if you're ever going to try to use the _old copy. Even if you actually did a recursive copy, if you really copied the data directories with the DB server running and without executing: select pg_start_backup('migrate'); or similar before starting the copy then you're going to have problems using that data. You can copy a working postgresql instance's data directories, but only if you've enabled WAL logging and you tell Pg about it so it can write appropriate markers for recovery. It would probably have been better to: - pg_dumpall - STOP THE 8.1 DATABASE SERVER and make sure it's stopped (no postmaster or postgres processes hanging around). - Make the old DB server binaries non-executable with chmod and/or remove them from the PATH - mv /var/lib/pgsql to /usr4/pgsql_old - /path/to/8.3/bin/initdb -D /var/lib/pgsql/data - Adjust postgresql.conf and pg_hba.conf as required - Start the 8.3 server - pg_restore all databases from dumps Right now, you probably need to make REALLY sure you've put a copy of those dumps somewhere safe, because I suspect your _old copy will be useless. Then use 8.3's initdb on a new, empty directory, verify that the config files are correct, and start the 8.3 server. -- Craig Ringer
On Wed, 18 Jun 2008, Craig Ringer wrote: > I hope you mean cp -aR , because you need those subdirectories if you're > ever going to try to use the _old copy. Even if you actually did a > recursive copy, if you really copied the data directories with the DB > server running and without executing: Craig, According to the cp man page here, 'cp -a' is equivalent to 'cp -dpR'. > select pg_start_backup('migrate'); > > or similar before starting the copy then you're going to have problems > using that data. You can copy a working postgresql instance's data > directories, but only if you've enabled WAL logging and you tell Pg about > it so it can write appropriate markers for recovery. This is interesting. I've not read about this before. As I'm the only one using the databases (primarily for the accounting data) I know that nothing was changed during the copy operation. > Right now, you probably need to make REALLY sure you've put a copy of > those dumps somewhere safe, because I suspect your _old copy will be > useless. Then use 8.3's initdb on a new, empty directory, verify that the > config files are correct, and start the 8.3 server. Every file from /var/lib/pgsql/ before I started this is on the weekly backup tape from last Friday night. If need be I can restore from that and start over. Thanks, Rich -- Richard B. Shepard, Ph.D. | Integrity Credibility Applied Ecosystem Services, Inc. | Innovation <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Wed, 18 Jun 2008, Klint Gore wrote: > Make sure that initdb is the version you want > initdb --version Klint, Yes, it is: 8.3. > then > initdb -E UTF8 -D /var/lib/pgsql/data > then post the output of that. Very interesting. While en_US is not accepted, UTF8 is. postgres@salmo:/var/lib/pgsql$ initdb -E UTF8 -D /var/lib/pgsql/data The files belonging to this database system will be owned by user "postgres". This user must also own the server process. The database cluster will be initialized with locales COLLATE: C CTYPE: en_US MESSAGES: en_US MONETARY: en_US NUMERIC: en_US TIME: en_US could not determine encoding for locale "en_US": codeset is "ANSI_X3.4-1968" The default text search configuration will be set to "english". fixing permissions on existing directory /var/lib/pgsql/data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers/max_fsm_pages ... 24MB/153600 creating configuration files ... ok creating template1 database in /var/lib/pgsql/data/base/1 ... ok initializing pg_authid ... ok initializing dependencies ... ok creating system views ... ok loading system objects' descriptions ... ok creating conversions ... ok creating dictionaries ... ok setting privileges on built-in objects ... ok creating information schema ... ok vacuuming database template1 ... ok copying template1 to template0 ... WARNING: could not determine encoding for locale "en_US": codeset is "ANSI_X3.4-1968" DETAIL: Please report this to <pgsql-bugs@postgresql.org>. ok copying template1 to postgres ... WARNING: could not determine encoding for locale "en_US": codeset is "ANSI_X3.4-1968" DETAIL: Please report this to <pgsql-bugs@postgresql.org>. ok WARNING: enabling "trust" authentication for local connections You can change this by editing pg_hba.conf or using the -A option the next time you run initdb. Success. You can now start the database server using: postgres -D /var/lib/pgsql/data or pg_ctl -D /var/lib/pgsql/data -l logfile start Notice the warning detail about us_EN encoding. Is there a protocol for what to include in a report to pgsql-bugs? I will now try a new dumpall using the 8.3.3 version, once I find how to specify a different $PGDATA on the command line. Rich -- Richard B. Shepard, Ph.D. | Integrity Credibility Applied Ecosystem Services, Inc. | Innovation <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Tue, 17 Jun 2008, Adrian Klaver wrote: > Define nothing. When you ran initdb there where no messages? Also when in > doubt I use the full path /var/lib/pgsql/bin/initdb as you have an old > version of initdb present in the old version directory you copied. When > you have two versions present at the same time it is easy to get cross > reference problems. Adrian, No messages, other than that us_EN was not an encoding found in the locale. Re-running initdb specifying UTF8 worked. I've no idea if this affects the existing databases. Thanks, Rich -- Richard B. Shepard, Ph.D. | Integrity Credibility Applied Ecosystem Services, Inc. | Innovation <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
Rich Shepard wrote: > According to the cp man page here, 'cp -a' is equivalent to 'cp -dpR'. You're quite right. I was thinking "aah, a BSD-ism" but no, it's true for Linux too. Sorry. >> select pg_start_backup('migrate'); >> >> or similar before starting the copy then you're going to have problems >> using that data. You can copy a working postgresql instance's data >> directories, but only if you've enabled WAL logging and you tell Pg about >> it so it can write appropriate markers for recovery. > > This is interesting. I've not read about this before. As I'm the only one > using the databases (primarily for the accounting data) I know that nothing > was changed during the copy operation. However, that may not be sufficient. Just because you didn't perform any DB operations does *not* mean that Pg didn't write anything. Autovacuum, for example, could be working away. With 8.1 that may not be such a concern, but it's not the only thing that might be touching the DB files. You need to stop the DB or use WAL archiving and pg_start_backup if you want to be reasonably safe. > Every file from /var/lib/pgsql/ before I started this is on the weekly > backup tape from last Friday night. If need be I can restore from that and > start over. Well, no worries then. I'm sure you can understand that for many people - way TOO many people - that is not the case, so it's well worth stressing the point. -- Craig Ringer
On Wednesday 18 June 2008, Craig Ringer <craig@postnewspapers.com.au> wrote: > > Every file from /var/lib/pgsql/ before I started this is on the > > weekly backup tape from last Friday night. If need be I can restore > > from that and start over. > > Well, no worries then. I'm sure you can understand that for many people > - way TOO many people - that is not the case, so it's well worth > stressing the point. If the database was in use when _that_ backup was taken, it may also not be usable. You can't just backup a live database from the filesystem level and expect it to work ... -- Alan
Alan Hodgson wrote: > On Wednesday 18 June 2008, Craig Ringer <craig@postnewspapers.com.au> wrote: >>> Every file from /var/lib/pgsql/ before I started this is on the >>> weekly backup tape from last Friday night. If need be I can restore >>> from that and start over. >> Well, no worries then. I'm sure you can understand that for many people >> - way TOO many people - that is not the case, so it's well worth >> stressing the point. > > If the database was in use when _that_ backup was taken, it may also not be > usable. > > You can't just backup a live database from the filesystem level and expect > it to work ... It should be OK, if less than ideal, if: - You have fsync enabled (which you do if you care about your data); and - Your filesystem supports consistent snapshots If you can take a point-in-time snapshot at the filesystem level and copy that, it should be OK. Pg will still have to do a bunch of work when started up off the restored data, though. It makes much more sense to just warn Pg about the copy about to be taken, or use pg_dump . Any decent backup system will provide hooks to run pre- and post- scripts to do this sort of thing. -- Craig Ringer
On Wed, Jun 18, 2008 at 11:24:16PM +0800, Craig Ringer wrote: > Alan Hodgson wrote: > >You can't just backup a live database from the filesystem level and expect > >it to work ... > > It should be OK, if less than ideal, if: > > - You have fsync enabled (which you do if you care about your data); I've always wondered about this! Isn't fsync only a side-effect of having a write-back cache between programs and the disk? This means it's only purpose is to ensure that the cache is consistent with what's on disk. Because all programs running within a system are running on top of the cache they don't know or care whether the cache actually matches up to the disk. Therefore, if I understand things correctly, the state of fsync shouldn't matter in this use case. It's equally borken independent to the state of fsync. Sam
On Wed, 18 Jun 2008, Alan Hodgson wrote: > If the database was in use when _that_ backup was taken, it may also not > be usable. > > You can't just backup a live database from the filesystem level and expect > it to work ... Alan, The only database in the cluster that has seen any use recently is the one for SQL-Ledger. When daily incremental and weekly full backups are made, there is no activity by any user. All users are logged out of the system because it's the middle of the night. Also, while I'm glad to learn more than I knew before how to go about making backups and upgrading the PostgreSQL installation, having folks telling me all I did incorrectly is not as helpful to me as guidance on getting the cluster correctly transferred from the 8.1.4 version to the 8.3.3 version. My readings and research obviously did not sufficiently educate me on the proper process. Thanks, Rich -- Richard B. Shepard, Ph.D. | Integrity Credibility Applied Ecosystem Services, Inc. | Innovation <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
On Wed, Jun 18, 2008 at 08:48:41AM -0700, Rich Shepard wrote: > while I'm glad to learn more than I knew before how to go about > making backups and upgrading the PostgreSQL installation, having folks > telling me all I did incorrectly is not as helpful to me as guidance on > getting the cluster correctly transferred from the 8.1.4 version to the > 8.3.3 version. My readings and research obviously did not sufficiently > educate me on the proper process. What I'd try doing is this: find a 8.1 version of PG (8.1.4 or later) and run this against the data you saved off, once this is running you can then run 8.3's version of pg_dump against it, then you can restore this dump into the new version of PG. Sam
On Wed, 18 Jun 2008, Sam Mason wrote: > What I'd try doing is this: find a 8.1 version of PG (8.1.4 or later) and > run this against the data you saved off, once this is running you can then > run 8.3's version of pg_dump against it, then you can restore this dump > into the new version of PG. Sam, OK. I'm downloading 8.1.13. I'll build this to install in /usr/local. If I cd to /usr/local/bin/ and specify 'postgres -D /usr4/pgsql_old/data', then I can specify /usr/bin/pg_dumpall to use the 8.3.3 version against the running 8.1.13 server. (Writing to myself, unless there's an egregious error here. In that case, please let me know!) Results when I have them. Thanks, Rich -- Richard B. Shepard, Ph.D. | Integrity Credibility Applied Ecosystem Services, Inc. | Innovation <http://www.appl-ecosys.com> Voice: 503-667-4517 Fax: 503-667-8863
Re: Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt)
From
Greg Smith
Date:
On Wed, 18 Jun 2008, Sam Mason wrote: > Isn't fsync only a side-effect of having a write-back cache between > programs and the disk? This means it's only purpose is to ensure that > the cache is consistent with what's on disk. Because all programs > running within a system are running on top of the cache they don't know > or care whether the cache actually matches up to the disk. Most programs don't. PostgreSQL writes to the database in two stages: the WAL, followed by an fsync, then later to the main database files. You can't trust the WAL will be around for recovery until the first fsync returns. The checkpoint process makes sure everything that went into the WAL then made it to the main database files, and again it doesn't trust that it's really on disk until the fsync returns. > Therefore, if I understand things correctly, the state of fsync > shouldn't matter in this use case. It's equally borken independent to > the state of fsync. Quote borken indeed, and fsync has nothing to do with it. The theory proposed is that since no writes were done, the backup should be consistant. This is quite wrong. The most obvious case showing that is one where a time-driven checkpoint occured (as happens every 5 minutes by default) while you were in the middle of backing up. Let's say the main database files are backed up before the checkpoint, but the backup is still going on some giant archival table. The checkpoint happens; it updates the earlier files already in the backup. The checkpoint finishes, and erases the WAL logs. Now the backup makes it way to the WAL files. You're screwed when you try and recover this database from the backup. The database doesn't have the latest updates, and the WAL can't recover them because it already cleared its copy of them out thinking they weren't needed anymore. You'll be lucky to get the database to start at all, it's missing data you thought was commited before the backup started, and who knows what subtle corruption you'll find. Now, in reality, even time-driven checkpoints don't do anything if there hasn't been activity, so it may very well be the case that any one database backup is fine. But you can't ignore the requirement to do a pg_start_backup before making a filesystem level backup and expect you'll get that lucky--sooner or later you will get a backup that won't restore if you keep that up. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Re: Understanding fsync (was: Need Help Recovering from Botched Upgrade Attempt)
From
Sam Mason
Date:
On Wed, Jun 18, 2008 at 02:17:00PM -0400, Greg Smith wrote: > On Wed, 18 Jun 2008, Sam Mason wrote: > > >Isn't fsync only a side-effect of having a write-back cache between > >programs and the disk? This means it's only purpose is to ensure that > >the cache is consistent with what's on disk. Because all programs > >running within a system are running on top of the cache they don't know > >or care whether the cache actually matches up to the disk. > > Most programs don't. PostgreSQL writes to the database in two stages: > the WAL, followed by an fsync, then later to the main database files. Sorry, I wasn't being clear. When I said "they don't know or care" I meant that if you've got a PG process writing it's database files and a backup process running on the same machine then the backup process will see the data written by PG independently of whether fsync is called or not. > You can't trust the WAL will be around for recovery until the first fsync > returns. The checkpoint process makes sure everything that went into the > WAL then made it to the main database files, and again it doesn't trust > that it's really on disk until the fsync returns. Yes > >Therefore, if I understand things correctly, the state of fsync > >shouldn't matter in this use case. It's equally borken independent to > >the state of fsync. > > Quote borken indeed, and fsync has nothing to do with it. My original note was mainly in response to Craig's comment that implied fsync doing far more than it actually does. I remember seeing a few comments recently saying similar things about fsync, so sorry for picking specifically on you Craig. Device/filesystem level snapshotting is exactly what's needed and is independent of any fsync settings. Sam
Sam Mason wrote: > My original note was mainly in response to Craig's comment that implied > fsync doing far more than it actually does. I remember seeing a few > comments recently saying similar things about fsync, so sorry for > picking specifically on you Craig. Device/filesystem level snapshotting > is exactly what's needed and is independent of any fsync settings. Good point. I guess fsync would only matter for certain if you were using snapshots on shared storage (a SAN or similar) where the current host's view of the yet-to-be-flushed-to-storage FS state is not known. The main snapshot system I use is LVM, which takes a snapshot of the block device underlying the file system. It's not actually filesystem level at all, but volume/block level. I'm just not sure whether filesystems like ext3 and the underlying Linux VFS will always pass blocks written by applications through to the block layer immediately and in order. After all, they do have tricks like delayed allocation that I'd expect to result in data not being passed to the file system. If they don't always pass writes to the block layer in the order they're issued then fsync would also be significant for LVM snapshots. Any idea? ext3's data=ordered refers to data being written before metadata, not the order in which different data write requests are processed, so that's no help. -- Craig Ringer
On Thu, Jun 19, 2008 at 08:56:20AM +0800, Craig Ringer wrote: > >My original note was mainly in response to Craig's comment that implied > >fsync doing far more than it actually does. I remember seeing a few > >comments recently saying similar things about fsync, so sorry for > >picking specifically on you Craig. Device/filesystem level snapshotting > >is exactly what's needed and is independent of any fsync settings. > > Good point. I guess fsync would only matter for certain if you were > using snapshots on shared storage (a SAN or similar) where the current > host's view of the yet-to-be-flushed-to-storage FS state is not known. Hum that's a fun one! I've also just realized my original statement was wrong---I combined device and filesystem snapshotting together inappropriately. I think users should care about fsync precisely when they care about the underlying block device being in a consistent state. This applies to the normal case that people worry about of power loss, as well as the case of block level backups and modern filesystems that maintain a lot of state independently of the OS's buffer cache. To summarize, fsync can be turned off when the user is unconcerned with power loss and when backups are always performed at the filesystem level. Sam