Thread: On-line backup
Hi Tom, Is this 2003 advice still relevant with postgresql 8.1.0? Our b-tree indexes corrupt pretty often on our production server running 8.1.0 and we are grasping for a solution. We perform online backups like this: In addition, we archive the transactions and replay them for PITR. /usr/local/pgsql.v810/bin/psql -p 5442 -d postgres -c "SELECT pg_start_backup('share.sqldumps.Linux.pgsql.serf.base.backup');" sleep 20 /bin/tar -zcvf /sprj/sqldumps_Linux.pgsql_serf5442/base.backup.serf.`date '+d%m%d%Y'`.tar.gz /anothershare/Linux.pgsql/serf5442/* sleep 30 /usr/local/pgsql.v810/bin/psql -p 5442 -d postgres -c "SELECT pg_stop_backup();" http://archives.postgresql.org/pgsql-general/2003-04/msg00655.php Thanks, ~DjK
"Mr. Dan" <bitsandbytes88@hotmail.com> writes: > Is this 2003 advice still relevant with postgresql 8.1.0? Our b-tree > indexes corrupt pretty often on our production server running 8.1.0 and we > are grasping for a solution. Corrupt how --- what's the exact symptoms? The *first* bit of advice I'd give you is that you shouldn't be on 8.1.0 anymore. We don't make update releases just for amusement. However, whether this represents an already-fixed problem is impossible to tell with no details. (As for that 2003 discussion, that predates the availability of PITR. Now, you can use a plain tar backup ... as long as you've got WAL logs to go with it.) regards, tom lane
Hi Tom, We do a complete re-index(reindexcb) to the cluster on the weekend. The index corruption is characterized by incorrect result sets returned from a query. What happens is that we have a 'hot' table (one with many many transactions) that gets inserted and deleted often. About once a month now when we do a select from that table the results of the select do not match the where clause, ex. select * from recent_projects where user_id = 139 sometimes produces these results: user_id project_id 139 3 139 1 139 17 754 11 The last record does not belong in that result set. The solution to this problem has been to rebuild the indexes, which makes the query return the correct results. But given that we have customers who run Squish at 2am EST, we wouldn't be able to rebuild the indexes if they have a problem and this could result in 4-5 hours worth of down time for them, which is completely unacceptable. I've always agreed with staying current with minor releases, and forcing everyone to move along with the new minor releases, but that's just me. ###################################################### old ###################################################### > >"Mr. Dan" <bitsandbytes88@hotmail.com> writes: > > Is this 2003 advice still relevant with postgresql 8.1.0? Our b-tree > > indexes corrupt pretty often on our production server running 8.1.0 and >we > > are grasping for a solution. > >Corrupt how --- what's the exact symptoms? > >The *first* bit of advice I'd give you is that you shouldn't be on 8.1.0 >anymore. We don't make update releases just for amusement. However, >whether this represents an already-fixed problem is impossible to tell >with no details. > >(As for that 2003 discussion, that predates the availability of PITR. >Now, you can use a plain tar backup ... as long as you've got WAL logs >to go with it.) > > regards, tom lane
"Mr. Dan" <bitsandbytes88@hotmail.com> writes: > ... What happens is that we have a 'hot' table (one with many many > transactions) that gets inserted and deleted often. About once a month now > when we do a select from that table the results of the select do not match > the where clause, ex. > select * from recent_projects > where user_id = 139 > sometimes produces these results: > user_id project_id > 139 3 > 139 1 > 139 17 > 754 11 Hmmm .... that looks sorta familiar. What is the query plan that's used for this SELECT? regards, tom lane
Hey Tom, Here is the query: DELETE FROM recent_projects WHERE project_id = 3 AND user_id = 139; And here is the query plan: Index Scan using pk_recent_projects on recent_projects (cost=0.00..5.81 rows=1 width=6) Index Cond: ((user_id = 139) AND (project_id = 3)) The table definition is : CREATE TABLE recent_projects ( user_id int4 NOT NULL, project_id int4 NOT NULL, last_viewed timestamp, CONSTRAINT pk_recent_projects PRIMARY KEY (user_id, project_id), CONSTRAINT fk_recent_project_id FOREIGN KEY (project_id) REFERENCES project (project_id) MATCH SIMPLE ON UPDATE RESTRICT ON DELETE CASCADE, CONSTRAINT fk_recent_user_id FOREIGN KEY (user_id) REFERENCES users (user_id) MATCH SIMPLE ON UPDATE RESTRICT ON DELETE RESTRICT ) WITHOUT OIDS; This is a table with a lot of transactions. The behavior we noticed is that we do the delete as specified above and then do a reinsert with a new timestamp and everything else the same (lazy I know, but not my code). What happens some of the time is that the reinsert fails and give a duplicate key failure. What has fixed this in the past is reindexing the table - but we don't want to rely on that forever. We are also have an issue with processes locking up. We can't kill -9 pid because postgres ends up restarting the whole cluster. What can we do? kill -s INT TERM or SIGQUIT don't seem to work either. Should we be root or logged in as postgres when we try to kill these? Thanks! ~DjK >From: Tom Lane <tgl@sss.pgh.pa.us> >To: "Mr. Dan" <bitsandbytes88@hotmail.com> >CC: pgsql-admin@postgresql.org >Subject: Re: [ADMIN] On-line backup Date: Mon, 17 Jul 2006 14:43:30 -0400 > >"Mr. Dan" <bitsandbytes88@hotmail.com> writes: > > ... What happens is that we have a 'hot' table (one with many many > > transactions) that gets inserted and deleted often. About once a month >now > > when we do a select from that table the results of the select do not >match > > the where clause, ex. > > > select * from recent_projects > > where user_id = 139 > > > sometimes produces these results: > > > user_id project_id > > 139 3 > > 139 1 > > 139 17 > > 754 11 > >Hmmm .... that looks sorta familiar. What is the query plan that's used >for this SELECT? > > regards, tom lane