Thread: pg v. 8.4.5 misses objects and data after restoring from backup using wal
Hi! I did my usual test restoration from backup which leaded me to suspect that some data is missing although database starts up. I did some more testing to reproduce the problem as generally as i could and i wish to present it here and hope that somebody could comment on this. It may well be my bad as i am doing some systematic error or something else. I use debian lenny amd64 with postgresql from backports which is # dpkg -l postgre\* | grep ii ii postgresql-8.4 8.4.5-2~bpo50+1 object-relational SQL database, version 8.4 ii postgresql-client-8.4 8.4.5-2~bpo50+1 front-end programs for PostgreSQL 8.4 ii postgresql-client-common 111~bpo50+1 manager for multiple PostgreSQL client versi ii postgresql-common 111~bpo50+1 PostgreSQL database-cluster manager ii postgresql-contrib-8.4 8.4.5-2~bpo50+1 additional facilities for PostgreSQL ii postgresql-plperl-8.4 8.4.5-2~bpo50+1 PL/Perl procedural language for PostgreSQL 8 ii postgresql-server-dev-8.4 8.4.5-2~bpo50+1 development files for PostgreSQL 8.4 server- In addition i have compiled and installed pljava, plproxy and skytools libraries but in this test i do not use them, i.e. i guess it doest really matter that the are in /usr/lib/postgresql/8.4/lib directory since no languages/functions are defined to use them. PostgreSQL config is default except wal logging is turned on. My procedure consists of the following steps 1. start up database 2. created schema test1 and into it table test like CREATE TABLE test1.test ( test_id serial NOT NULL, pilt bytea, CONSTRAINT test_id_pkey PRIMARY KEY (test_id) ) and started inserting data with this script import psycopg2, cPickle conn = psycopg2.connect("dbname=test host=127.0.0.1 user=postgres password=xxx port=5432") cur = conn.cursor() f = open('/boot/initrd.img-2.6.26-2-amd64', 'rb') binary = f.read() sql = "insert into test1.test (pilt) values (%s)"; cur.execute(sql, (psycopg2.Binary(binary),)) cur.close() conn.commit() conn.close() like $ for i in `seq 1 200`; do echo $i; python pi1.py; sleep 10; done 3. issued pg_start_backup('test') while script is still running 4. created schemas and tables test2.test and test3.test and started insterting data similarly 5. while three scripts are running issued rsync -avH --bwlimit=1024 root@localhost:/var/lib/postgresql/ /mnt/backup/ 6. while rsync is running i created schemas and tables test4.test and test5.test and started insterting data similarly 7. when scripts and rsync have been completed i said pg_stop_backup 8. created schema and table test6.test and started insterting data similarly When everything was finished i had approx 5g used as data cluster filesystem and about 400 wal logs by 16 MB each. Restore went like this 1. mounted /mnt/backup/ under /var/lib/postgresql 2. removed backup_label 3. created recovery.conf with a line restore_command = 'cp /data/backup/postgresql/archive-logs/%f %p' 4. emptied pg_xlog directory while having directory itself 5. started postgresql and it finishes accepting connections And now observations 1. while i have 400 wal log files during recovery it used only about last 20 of them 2. while original working postgresql data cluster filesystem's used size was about 5 gb after restore used space is 2.6 gb 3. looking thru tables, most of them have quarter of half less entries than was actually insterted and is present in original database 4. some tables are missing (test4.test) and some give errors like (test5.test) when i say select count(*) from test5.test; 2010-12-30 12:41:23 EET ERROR: catalog is missing 2 attribute(s) for relid 16586 at character 22 2010-12-30 12:41:23 EET STATEMENT: select count(*) from test5.test; tried it also with compiled version of 8.4.5 but i chose to use lenny + backports because it may be this way more reproducable that my compiled instance. Actually my data size is about 160 gb, and i tested also with sizes about 1g and 4 tables but successful reproduction seems to need database size in filesystem around 5 g and 6 tables. I undestand that this report is not very precise, i.e. i have not fixed how much data was interted into database at the moment rsyncing started but i hope it has some useful points and someone could help me further, i am also ready to do some more testing if it is helpful. Imre
On Dec 30, 2010, at 4:03 AM, Imre Oolberg wrote: > 3. issued pg_start_backup('test') while script is still running Well, yeah. Your procedure is going to get you a consistent snapshot of the state of the database when you issued that command. -- Scott Ribe scott_ribe@elevated-dev.com http://www.elevated-dev.com/ (303) 722-0567 voice
Re: pg v. 8.4.5 misses objects and data after restoring from backup using wal
From
Imre Oolberg
Date:
Hi and Happy New Year! Unfortunately i still have the same problem, in the mean time i tried out with different versions and still get same results. Experimenting on Debian Lenny v 5.0 with v. compiled v. 9.0.2 where default conf is changed only in this wal_level = archive archive_mode = on archive_command = 'test ! -f /data/backup/postgresql/archive-logs/%f && cp %p /data/backup/postgresql/archive-logs' and having as a starting point of the recovery 1. data cluster file sytem copied with rsync between pg_start_backup and pg_stop backup 2. wal logs archived starting earier that pg_start_backup and continuing after pg_stop_backup I guess that my problem is probably that although my .backup file says 000000010000000000000009.00000020.backup START WAL LOCATION: 0/9000020 (file 000000010000000000000009) STOP WAL LOCATION: 1/6325B2E0 (file 000000010000000100000063) CHECKPOINT LOCATION: 0/A2C4908 START TIME: 2011-01-03 11:32:17 EET LABEL: test STOP TIME: 2011-01-03 12:18:27 EET looking at the pg_ctl.log it starts reading wal logs beginning with 000000010000000100000063 and not with 000000010000000000000009 LOG: database system was interrupted; last known up at 2011-01-03 11:58:56 EET LOG: creating missing WAL directory "pg_xlog/archive_status" LOG: starting archive recovery LOG: restored log file "000000010000000100000063" from archive LOG: consistent recovery state reached at 1/6325B2B8 LOG: redo starts at 1/6325B2B8 LOG: restored log file "000000010000000100000064" from archive LOG: restored log file "000000010000000100000065" from archive LOG: restored log file "000000010000000100000066" from archive ... LOG: restored log file "000000010000000100000081" from archive cp: cannot stat `/data/backup/postgresql/archive-logs/000000010000000100000082': No such file or directory LOG: could not open file "pg_xlog/000000010000000100000082" (log file 1, segment 130): No such file or directory LOG: redo done at 1/81530E18 LOG: last completed transaction was at log time 2011-01-03 12:20:31.917695+02 LOG: restored log file "000000010000000100000081" from archive cp: cannot stat `/data/backup/postgresql/archive-logs/00000002.history': No such file or directory LOG: selected new timeline ID: 2 cp: cannot stat `/data/backup/postgresql/archive-logs/00000001.history': No such file or directory LOG: archive recovery complete LOG: autovacuum launcher started LOG: database system is ready to accept connections I tried to follow a pitr procedure as per http://www.postgresql.org/docs/9.0/interactive/continuous-archiving.html and i would be very greatful if somebody comments this and points me to the right direction. Best regards, Imre On 12/30/10 13:03, Imre Oolberg wrote: > Hi! > > I did my usual test restoration from backup which leaded me to suspect > that some data is missing although database starts up. I did some more > testing to reproduce the problem as generally as i could and i wish to > present it here and hope that somebody could comment on this. It may > well be my bad as i am doing some systematic error or something else. > > I use debian lenny amd64 with postgresql from backports which is > > # dpkg -l postgre\* | grep ii > ii postgresql-8.4 8.4.5-2~bpo50+1 > object-relational SQL database, version 8.4 > ii postgresql-client-8.4 8.4.5-2~bpo50+1 front-end > programs for PostgreSQL 8.4 > ii postgresql-client-common 111~bpo50+1 manager for > multiple PostgreSQL client versi > ii postgresql-common 111~bpo50+1 PostgreSQL > database-cluster manager > ii postgresql-contrib-8.4 8.4.5-2~bpo50+1 additional > facilities for PostgreSQL > ii postgresql-plperl-8.4 8.4.5-2~bpo50+1 PL/Perl > procedural language for PostgreSQL 8 > ii postgresql-server-dev-8.4 8.4.5-2~bpo50+1 development > files for PostgreSQL 8.4 server- > > In addition i have compiled and installed pljava, plproxy and skytools > libraries but in this test i do not use them, i.e. i guess it doest > really matter that the are in /usr/lib/postgresql/8.4/lib directory > since no languages/functions are defined to use them. > > PostgreSQL config is default except wal logging is turned on. > > My procedure consists of the following steps > > 1. start up database > 2. created schema test1 and into it table test like > > CREATE TABLE test1.test ( > test_id serial NOT NULL, > pilt bytea, > CONSTRAINT test_id_pkey PRIMARY KEY (test_id) > ) > > and started inserting data with this script > > import psycopg2, cPickle > conn = psycopg2.connect("dbname=test host=127.0.0.1 user=postgres > password=xxx port=5432") > cur = conn.cursor() > > f = open('/boot/initrd.img-2.6.26-2-amd64', 'rb') > binary = f.read() > sql = "insert into test1.test (pilt) values (%s)"; > cur.execute(sql, (psycopg2.Binary(binary),)) > cur.close() > conn.commit() > conn.close() > > like > > $ for i in `seq 1 200`; do echo $i; python pi1.py; sleep 10; done > > 3. issued pg_start_backup('test') while script is still running > 4. created schemas and tables test2.test and test3.test and started > insterting data similarly > 5. while three scripts are running issued rsync -avH --bwlimit=1024 > root@localhost:/var/lib/postgresql/ /mnt/backup/ > 6. while rsync is running i created schemas and tables test4.test and > test5.test and started insterting data similarly > 7. when scripts and rsync have been completed i said pg_stop_backup > 8. created schema and table test6.test and started insterting data > similarly > > When everything was finished i had approx 5g used as data cluster > filesystem and about 400 wal logs by 16 MB each. > > Restore went like this > > 1. mounted /mnt/backup/ under /var/lib/postgresql > 2. removed backup_label > 3. created recovery.conf with a line > > restore_command = 'cp /data/backup/postgresql/archive-logs/%f %p' > > 4. emptied pg_xlog directory while having directory itself > 5. started postgresql and it finishes accepting connections > > And now observations > > 1. while i have 400 wal log files during recovery it used only about > last 20 of them > 2. while original working postgresql data cluster filesystem's used > size was about 5 gb after restore used space is 2.6 gb > 3. looking thru tables, most of them have quarter of half less entries > than was actually insterted and is present in original database > 4. some tables are missing (test4.test) and some give errors like > (test5.test) when i say select count(*) from test5.test; > > 2010-12-30 12:41:23 EET ERROR: catalog is missing 2 attribute(s) for > relid 16586 at character 22 > 2010-12-30 12:41:23 EET STATEMENT: select count(*) from test5.test; > > tried it also with compiled version of 8.4.5 but i chose to use lenny > + backports because it may be this way more reproducable that my > compiled instance. Actually my data size is about 160 gb, and i tested > also with sizes about 1g and 4 tables but successful reproduction > seems to need database size in filesystem around 5 g and 6 tables. > > I undestand that this report is not very precise, i.e. i have not > fixed how much data was interted into database at the moment rsyncing > started but i hope it has some useful points and someone could help me > further, i am also ready to do some more testing if it is helpful. > > > Imre > >
Re: pg v. 8.4.5 misses objects and data after restoring from backup using wal
From
"Kevin Grittner"
Date:
Imre Oolberg <imre@auul.pri.ee> wrote: > archive_command = 'test ! -f > /data/backup/postgresql/archive-logs/%f && > cp %p /data/backup/postgresql/archive-logs' I suppose that will work, but why not specify %f in the copy target? > I guess that my problem is probably that although my .backup file > says > > 000000010000000000000009.00000020.backup > START WAL LOCATION: 0/9000020 (file 000000010000000000000009) > STOP WAL LOCATION: 1/6325B2E0 (file 000000010000000100000063) > CHECKPOINT LOCATION: 0/A2C4908 > START TIME: 2011-01-03 11:32:17 EET > LABEL: test > STOP TIME: 2011-01-03 12:18:27 EET > > looking at the pg_ctl.log it starts reading wal logs beginning > with 000000010000000100000063 and not with > 000000010000000000000009 What did you have in your recovery.conf file? -Kevin
Re: pg v. 8.4.5 misses objects and data after restoring from backup using wal
From
Imre Oolberg
Date:
On 01/03/11 19:36, Kevin Grittner wrote:
I have there exactly one line
restore_command = 'cp /data/backup/postgresql/archive-logs/%f %p'
and with it i expect to restore as much data as i have wal files archived. The most strange thing is that it uses archived wal files which are generated after pg_stop_backup but skips all others.
Imre
Yep, that works as it is since wal files are created althought to use %f should not hurt either.Imre Oolberg <imre@auul.pri.ee> wrote:archive_command = 'test ! -f /data/backup/postgresql/archive-logs/%f && cp %p /data/backup/postgresql/archive-logs'I suppose that will work, but why not specify %f in the copy target?
I guess that my problem is probably that although my .backup file says 000000010000000000000009.00000020.backup START WAL LOCATION: 0/9000020 (file 000000010000000000000009) STOP WAL LOCATION: 1/6325B2E0 (file 000000010000000100000063) CHECKPOINT LOCATION: 0/A2C4908 START TIME: 2011-01-03 11:32:17 EET LABEL: test STOP TIME: 2011-01-03 12:18:27 EET looking at the pg_ctl.log it starts reading wal logs beginning with 000000010000000100000063 and not with 000000010000000000000009What did you have in your recovery.conf file?
I have there exactly one line
restore_command = 'cp /data/backup/postgresql/archive-logs/%f %p'
and with it i expect to restore as much data as i have wal files archived. The most strange thing is that it uses archived wal files which are generated after pg_stop_backup but skips all others.
Imre
-Kevin