Thread: On-line backup

On-line backup

From

"Mr. Dan"

Date:

17 July 2006, 17:15:23

Hi Tom,

Is this 2003 advice still relevant with postgresql 8.1.0?   Our b-tree
indexes corrupt pretty often on our production server running 8.1.0 and we
are grasping for a solution.  We perform online backups like this:  In
addition, we archive the transactions and replay them for PITR.

/usr/local/pgsql.v810/bin/psql -p 5442 -d postgres -c "SELECT
pg_start_backup('share.sqldumps.Linux.pgsql.serf.base.backup');"
sleep 20
/bin/tar -zcvf /sprj/sqldumps_Linux.pgsql_serf5442/base.backup.serf.`date
'+d%m%d%Y'`.tar.gz /anothershare/Linux.pgsql/serf5442/*
sleep 30
/usr/local/pgsql.v810/bin/psql -p 5442 -d postgres -c "SELECT
pg_stop_backup();"

http://archives.postgresql.org/pgsql-general/2003-04/msg00655.php

Thanks,
~DjK

Re: On-line backup

From

Tom Lane

Date:

17 July 2006, 17:40:56

"Mr. Dan" <bitsandbytes88@hotmail.com> writes:
> Is this 2003 advice still relevant with postgresql 8.1.0?   Our b-tree
> indexes corrupt pretty often on our production server running 8.1.0 and we
> are grasping for a solution.

Corrupt how --- what's the exact symptoms?

The *first* bit of advice I'd give you is that you shouldn't be on 8.1.0
anymore.  We don't make update releases just for amusement.  However,
whether this represents an already-fixed problem is impossible to tell
with no details.

(As for that 2003 discussion, that predates the availability of PITR.
Now, you can use a plain tar backup ... as long as you've got WAL logs
to go with it.)

            regards, tom lane

Re: On-line backup

From

"Mr. Dan"

Date:

17 July 2006, 18:21:39

Hi Tom,

We do a complete re-index(reindexcb) to the cluster on the weekend.

The index corruption is characterized by incorrect result sets returned from
a query. What happens is that we have a 'hot' table (one with many many
transactions) that gets inserted and deleted often. About once a month now
when we do a select from that table the results of the select do not match
the where clause, ex.

select * from recent_projects
where user_id = 139

sometimes produces these results:

user_id    project_id
139        3
139        1
139        17
754        11

The last record does not belong in that result set. The solution to this
problem has been to rebuild the indexes, which makes the query return the
correct results. But given that we have customers who run Squish at 2am EST,
we wouldn't be able to rebuild the indexes if they have a problem and this
could result in 4-5 hours worth of down time for them, which is completely
unacceptable.

I've always agreed with staying current with minor releases, and forcing
everyone to move along with the new minor releases, but that's just me.

######################################################
old
######################################################
>
>"Mr. Dan" <bitsandbytes88@hotmail.com> writes:
> > Is this 2003 advice still relevant with postgresql 8.1.0?   Our b-tree
> > indexes corrupt pretty often on our production server running 8.1.0 and
>we
> > are grasping for a solution.
>
>Corrupt how --- what's the exact symptoms?
>
>The *first* bit of advice I'd give you is that you shouldn't be on 8.1.0
>anymore.  We don't make update releases just for amusement.  However,
>whether this represents an already-fixed problem is impossible to tell
>with no details.
>
>(As for that 2003 discussion, that predates the availability of PITR.
>Now, you can use a plain tar backup ... as long as you've got WAL logs
>to go with it.)
>
>            regards, tom lane

Re: On-line backup

From

Tom Lane

Date:

17 July 2006, 18:44:57

"Mr. Dan" <bitsandbytes88@hotmail.com> writes:
> ... What happens is that we have a 'hot' table (one with many many
> transactions) that gets inserted and deleted often. About once a month now
> when we do a select from that table the results of the select do not match
> the where clause, ex.

> select * from recent_projects
> where user_id = 139

> sometimes produces these results:

> user_id    project_id
> 139        3
> 139        1
> 139        17
> 754        11

Hmmm .... that looks sorta familiar.  What is the query plan that's used
for this SELECT?

            regards, tom lane

Re: On-line backup

From

"Mr. Dan"

Date:

19 July 2006, 15:51:46

Hey Tom,

Here is the query:

DELETE FROM recent_projects WHERE project_id = 3 AND user_id = 139;

And here is the query plan:

Index Scan using pk_recent_projects on recent_projects  (cost=0.00..5.81
rows=1 width=6)
  Index Cond: ((user_id = 139) AND (project_id = 3))

The table definition is :

CREATE TABLE recent_projects
(
  user_id int4 NOT NULL,
  project_id int4 NOT NULL,
  last_viewed timestamp,
  CONSTRAINT pk_recent_projects PRIMARY KEY (user_id, project_id),
  CONSTRAINT fk_recent_project_id FOREIGN KEY (project_id)
      REFERENCES project (project_id) MATCH SIMPLE
      ON UPDATE RESTRICT ON DELETE CASCADE,
  CONSTRAINT fk_recent_user_id FOREIGN KEY (user_id)
      REFERENCES users (user_id) MATCH SIMPLE
      ON UPDATE RESTRICT ON DELETE RESTRICT
)
WITHOUT OIDS;

This is a table with a lot of transactions.

The behavior we noticed is that we do the delete as specified above and then
do a reinsert with a new timestamp and everything else the same (lazy I
know, but not my code). What happens some of the time is that the reinsert
fails and give a duplicate key failure. What has fixed this in the past is
reindexing the table - but we don't want to rely on that forever.

We are also have an issue with processes locking up.   We can't kill -9 pid
because postgres ends up restarting the whole cluster.  What can we do?
kill -s INT TERM or SIGQUIT don't seem to work either.  Should we be root or
logged in as postgres when we try to kill these?

Thanks!
~DjK

>From: Tom Lane <tgl@sss.pgh.pa.us>
>To: "Mr. Dan" <bitsandbytes88@hotmail.com>
>CC: pgsql-admin@postgresql.org
>Subject: Re: [ADMIN] On-line backup Date: Mon, 17 Jul 2006 14:43:30 -0400
>
>"Mr. Dan" <bitsandbytes88@hotmail.com> writes:
> > ... What happens is that we have a 'hot' table (one with many many
> > transactions) that gets inserted and deleted often. About once a month
>now
> > when we do a select from that table the results of the select do not
>match
> > the where clause, ex.
>
> > select * from recent_projects
> > where user_id = 139
>
> > sometimes produces these results:
>
> > user_id    project_id
> > 139        3
> > 139        1
> > 139        17
> > 754        11
>
>Hmmm .... that looks sorta familiar.  What is the query plan that's used
>for this SELECT?
>
>            regards, tom lane