Thread: pg_dump in a production environment

pg_dump in a production environment

From: "Thomas F. O'Connell"

I have a web application backed by a PostgreSQL 7.4.6 database. It's an application with a fairly standard login process verified against the database.

I'd like to use pg_dump to grab a live backup and, based on the documentation, this would seem to be a realistic possibility. When I try, though, during business hours, when people are frequently logging in and otherwise using the application, the application becomes almost unusable (to the point where logins take on the order of minutes).

According to the documentation, pg_dump shouldn't block other operations on the database other than operations that take exclusive locks. Ordinarily, I run pg_autovacuum on the box, so I killed it and tried again, thinking that perhaps substantial vacuum activity might be interfering with pg_dump, but to no avail.

Setting aside the rest of the application, the login process should be completely read-only and shouldn't require any exclusive locks.

Connections don't really pile up excessively, and load on the machine does not get in the red zone. Is there anything else I should be noticing?

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005


Re: pg_dump in a production environment

From: Scott Marlowe

On Mon, 2005-05-23 at 14:54, Thomas F. O'Connell wrote:
> I have a web application backed by a PostgreSQL 7.4.6 database. It's
> an application with a fairly standard login process verified against
> the database.
>
> I'd like to use pg_dump to grab a live backup and, based on the
> documentation, this would seem to be a realistic possibility. When I
> try, though, during business hours, when people are frequently logging
> in and otherwise using the application, the application becomes almost
> unusable (to the point where logins take on the order of minutes).
>
> According to the documentation, pg_dump shouldn't block other
> operations on the database other than operations that operate with
> exclusive locks. Ordinarily, I run pg_autovacuum on the box, so I
> tried again after killing that, thinking that perhaps any substantial
> vacuum activity might affect pg_dump. I tried again to no avail.
>
> Excepting the rest of the application, the login process should be
> completely read-only and shouldn't require any exclusive locks.
>
> Connections don't really pile up excessively, and load on the machine
> does not get in the red zone. Is there anything else I should be
> noticing?

Basically, it sounds like PostgreSQL is doing a lot of very long
sequential scans to do this backup.  Have you done a vacuum full
lately?  It could be that you've got a lot of table bloat that's making
the seq scans take so long.

You could be I/O saturated already, and the backup is just pushing you
over the edge of the performance knee.

I'd do a 'vacuum analyze verbose' and see if you need more FSM set up
for your regular vacuums to keep up.
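
(For reference, on a 7.4-era server that check might look like this; the database name is illustrative:)

    # run a database-wide verbose vacuum and read the free-space-map summary at the end
    psql -d appdb -c "VACUUM ANALYZE VERBOSE;" 2>&1 | grep -iE "free space|fsm"

    # if the reported "pages needed" exceeds the allocation, raise these in
    # postgresql.conf (both require a restart on 7.4; values illustrative):
    #   max_fsm_pages     = 50000
    #   max_fsm_relations = 1000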

Re: pg_dump in a production environment

From: "Matthew T. O'Connor"

Thomas F. O'Connell wrote:

> I have a web application backed by a PostgreSQL 7.4.6 database. It's
> an application with a fairly standard login process verified against
> the database.
>
> I'd like to use pg_dump to grab a live backup and, based on the
> documentation, this would seem to be a realistic possibility. When I
> try, though, during business hours, when people are frequently logging
> in and otherwise using the application, the application becomes almost
> unusable (to the point where logins take on the order of minutes).


Could this be an I/O saturation issue like the one the vacuum delay
settings are supposed to help with?  Perhaps we could either extend the
vacuum delay settings to affect pg_dump, or add a new option to pg_dump
that would have it slow down the dump.
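
(For context: the vacuum delay settings in question are the 8.0 cost-based ones, and they do not currently apply to pg_dump. A minimal postgresql.conf sketch, values illustrative:)

    # PostgreSQL 8.0+
    vacuum_cost_delay = 10    # ms to sleep once the cost limit is reached (0 disables)
    vacuum_cost_limit = 200   # accumulated page-cost budget between sleeps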

BTW, have you tried running pg_dump from a separate machine?  Or even
just making sure that the dump file is being written to a different disk
drive than PostgreSQL is running on.  All that disk write activity is
bound to slow the system down.

Matthew


Re: pg_dump in a production environment

From: Martijn van Oosterhout

What's your pg_dump command? Some options may take a lot of memory.

If you list the processes while this is going on, do you see one
chewing all your memory? I.e., what's really causing the problem?
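
(A quick way to look while the dump is running; GNU ps and top assumed:)

    ps aux --sort=-rss | head -15    # processes by resident memory
    top -b -n 1 | head -25           # one-shot snapshot of CPU and memory use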

Hope this helps,

On Mon, May 23, 2005 at 02:54:46PM -0500, Thomas F. O'Connell wrote:
> I have a web application backed by a PostgreSQL 7.4.6 database. It's
> an application with a fairly standard login process verified against
> the database.
>
> I'd like to use pg_dump to grab a live backup and, based on the
> documentation, this would seem to be a realistic possibility. When I
> try, though, during business hours, when people are frequently
> logging in and otherwise using the application, the application
> becomes almost unusable (to the point where logins take on the order
> of minutes).
>
> According to the documentation, pg_dump shouldn't block other
> operations on the database other than operations that operate with
> exclusive locks. Ordinarily, I run pg_autovacuum on the box, so I
> tried again after killing that, thinking that perhaps any substantial
> vacuum activity might affect pg_dump. I tried again to no avail.
>
> Excepting the rest of the application, the login process should be
> completely read-only and shouldn't require any exclusive locks.
>
> Connections don't really pile up excessively, and load on the machine
> does not get in the red zone. Is there anything else I should be
> noticing?
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


Re: pg_dump in a production environment

From: "Thomas F. O'Connell"

Okay, I collated the three replies I got below for ease in replying.

I vacuum full analyze and reindexdb approximately once a month, but I use pg_autovacuum as a matter of ongoing maintenance, and it seems to hit equilibrium pretty well and seems to prevent bloat. The last time I checked a vacuum analyze verbose, I had plenty of FSM to spare. The data grows, but it doesn't seem to grow so quickly that I'd already be out of FSM space.
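
(For reference, the monthly maintenance described above might look roughly like this from cron; the exact reindexdb invocation depends on whether the 7.4 contrib script or a newer client is used, and the database name is illustrative:)

    # monthly, during a quiet window
    vacuumdb --all --full --analyze
    reindexdb appdb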

I actually run pg_dump from a remote machine, so I/O contention on the partition with $PGDATA shouldn't be an issue.

And here is the actual command:

pg_dump -h <host> -F c <database> > <dumpfile>

Pretty basic, although it is compressing.
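
(A variant that keeps the custom format but dials the client-side compression down and writes the file directly; the option values are illustrative:)

    pg_dump -h <host> -F c -Z 3 -f <dumpfile> <database>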

As far as I can tell, the postmaster handling the dump request takes up quite a bit of CPU, but not by itself enough that the database should be unusable under ordinary circumstances: when a single query/backend eats up that much CPU, it doesn't prevent further access.

I'm suspicious more of something involving locks than of CPU.
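
(One way to check whether anything is actually waiting on a lock while the dump runs; pg_locks is available in 7.4, and the database name is a placeholder:)

    psql -d appdb -c "SELECT relation::regclass, pid, mode, granted FROM pg_locks WHERE NOT granted;"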

Oh, and one other small(ish) detail: the dumping client is using a 7.4.8 installation, whereas the server itself is 7.4.6.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005


From: Scott Marlowe <smarlowe@g2switchworks.com>
Date: May 23, 2005 3:18:33 PM CDT
To: "Thomas F. O'Connell" <tfo@sitening.com>
Cc: PgSQL General <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] pg_dump in a production environment

Basically, it sounds like PostgreSQL is doing a lot of very long
sequential scans to do this backup.  Have you done a vacuum full
lately?  It could be that you've got a lot of table bloat that's making
the seq scans take so long.

You could be I/O saturated already, and the backup is just pushing you
over the edge of the performance knee.

I'd do a 'vacuum analyze verbose' and see if you need more FSM set up
for your regular vacuums to keep up.

From: "Matthew T. O'Connor" <matthew@zeut.net>
Date: May 23, 2005 3:18:18 PM CDT
To: "Thomas F. O'Connell" <tfo@sitening.com>
Cc: PgSQL General <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] pg_dump in a production environment

Could this be an I/O saturation issue like the one the vacuum delay settings are supposed to help with?  Perhaps we could either extend the vacuum delay settings to affect pg_dump, or add a new option to pg_dump that would have it slow down the dump.

BTW, have you tried running pg_dump from a separate machine?  Or even just making sure that the dump file is being written to a different disk drive than PostgreSQL is running on.  All that disk write activity is bound to slow the system down.

Matthew

From: Martijn van Oosterhout <kleptog@svana.org>
Date: May 23, 2005 3:25:23 PM CDT
To: "Thomas F. O'Connell" <tfo@sitening.com>
Cc: PgSQL General <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] pg_dump in a production environment
Reply-To: Martijn van Oosterhout <kleptog@svana.org>


What's your pg_dump command? Some options may take a lot of memory.

If you list the processes while this is going on, do you see one
chewing all your memory? I.e., what's really causing the problem?

Hope this helps,

Re: pg_dump in a production environment

From: "Thomas F. O'Connell"

A note about database design, though: there are thousands of tables
in this database, most of them inherited. I haven't looked at the
internals of pg_dump, but generally, how do the sequential scans
work? Why would these prevent the tables from being accessed by
queries that don't require exclusive locks?

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On May 23, 2005, at 3:18 PM, Scott Marlowe wrote:

> Basically, it sounds like PostgreSQL is doing a lot of very long
> sequential scans to do this backup.  Have you done a vacuum full
> lately?  It could be that you've got a lot of table bloat that's
> making the seq scans take so long.

Re: pg_dump in a production environment

From: Scott Marlowe

The real problem is that with 7.4's buffering algorithm, the sequential
scans blow the other data out of PostgreSQL's internal buffers.  And,
since a backup needs all the data in the tables, it's going to seq scan
them anyway.  The tables can still be accessed; the access is just going
to be slow because your other processes are fighting the backup, AND
nothing left in the buffers is likely to be useful to them except the
one table currently being backed up.
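
(A quick way to confirm the I/O and cache pressure while a dump runs, on a Linux-ish box:)

    vmstat 1        # watch the "wa" (I/O wait) and swap columns
    iostat -x 1     # per-device utilization, if sysstat is installed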

On Mon, 2005-05-23 at 15:58, Thomas F. O'Connell wrote:
> A note about database design, though: there are thousands of tables
> in this database, most of them inherited. I haven't looked at the
> internals of pg_dump, but generally, how do the sequential scans
> work? Why would these prevent the tables from being accessed by
> queries that don't require exclusive locks?
>
> -tfo
>
> On May 23, 2005, at 3:18 PM, Scott Marlowe wrote:
>
> > Basically, it sounds like PostgreSQL is doing a lot of very long
> > sequential scans to do this backup.  Have you done a vacuum full
> > lately?  It could be that you've got a lot of table bloat that's
> > making the seq scans take so long.

Re: pg_dump in a production environment

From: Chris Kratz

Hello Thomas,

We've had (and still have) the exact same issue and have been unable to find a
satisfactory solution to the problem.  Currently we "just live with it".  We
do periodic backups with pg_dump on an hourly basis.  During the dump, other
access to the db is incredibly slow, making our web app feel sluggish for the
5 to 10 minutes the dump takes.

After a lot of research, it appears to be an I/O and memory contention issue.
Basically, the dump procedure has to pull all the data and tables into memory
to dump them, which means any other requests have to pull the data they need
back off of disk (because it got paged out to make room for the dump data),
making them very slow.  This is compounded by the fact that pg_dump usually
saturates your I/O throughput.  Since Postgres doesn't manage the filesystem
buffers (the OS does), there appears to be no easy way to tell it to use only
x amount of memory for the dump, leaving all the other memory available for
the running database.  I have a hunch that the same thing happens with the
shared buffers, though I haven't proven that.  This wasn't a problem for us
while the db fit into RAM, but we've grown far past that point now.

The only solution we have ever found is to use something like a Slony slave
and do dumps from the slave.  The slave takes the performance hit, and your
primary db keeps running at full speed.  Once the dump is done, the slave can
"catch up" if it needs to.  Unfortunately, I believe there are currently
issues with restoring from a dump taken off a slave.
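
(A sketch of that arrangement; host, database, and path names are placeholders, and the caveat about restoring a dump taken from a Slony slave still applies:)

    # run from cron on a machine that can reach the replica
    pg_dump -h replica-host -F c appdb > /backups/appdb-$(date +%Y%m%d%H).dump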

-Chris

On Monday 23 May 2005 04:56 pm, Thomas F. O'Connell wrote:
> Okay, I collated the three replies I got below for ease in replying.
>
> I vacuum full analyze and reindexdb approximately once a month, but I
> use pg_autovacuum as a matter of ongoing maintenance, and it seems to
> hit equilibrium pretty well and seems to prevent bloat. The last time
> I checked a vacuum analyze verbose, I had plenty of FSM to spare. The
> data grows, but it doesn't seem to grow so quickly that I'd already
> be out of FSM space.
>
> I actually run pg_dump from a remote machine, so I/O contention on
> the partition with $PGDATA shouldn't be an issue.
>
> And here is the actual command:
>
> pg_dump -h <host> -F c <database> > <dumpfile>
>
> Pretty basic, although it is compressing.
>
> As far as I can tell, the postmaster handling the dump request takes
> up quite a bit of CPU, but not by itself enough that the database
> should be unusable under ordinary circumstances: when a single query/
> backend eats up that much CPU, it doesn't prevent further access.
>
> I'm suspicious more of something involving locks than of CPU.
>
> Oh, and one other small(ish) detail: the dumping client is using a
> 7.4.8 installation, whereas the server itself is 7.4.6.
>
> -tfo
>

--
Chris Kratz

Re: pg_dump in a production environment

From: Scott Marlowe

On Mon, 2005-05-23 at 16:54, Chris Kratz wrote:
> Hello Thomas,
>
> We've had (and still have) the exact same issue and have been unable to find a
> satisfactory solution to the problem.  Currently we "just live with it".  We
> do periodic backups with pg_dump on an hourly basis.  During the dump, other
> access to the db is incredibly slow, making our web app feel sluggish for the
> 5 to 10 minutes the dump takes.
>
> After a lot of research, it appears to be an I/O and memory contention issue.
> Basically, the dump procedure has to pull all the data and tables into memory
> to dump them, which means any other requests have to pull the data they need
> back off of disk (because it got paged out to make room for the dump data),
> making them very slow.  This is compounded by the fact that pg_dump usually
> saturates your I/O throughput.  Since Postgres doesn't manage the filesystem
> buffers (the OS does), there appears to be no easy way to tell it to use only
> x amount of memory for the dump, leaving all the other memory available for
> the running database.  I have a hunch that the same thing happens with the
> shared buffers, though I haven't proven that.  This wasn't a problem for us
> while the db fit into RAM, but we've grown far past that point now.

Are you folks running 8.0 with its improved caching algorithms?  Just
wondering if that helps or not.

> The only solution we have ever found is to use something like a Slony slave
> and do dumps from the slave.  The slave takes the performance hit, and your
> primary db keeps running at full speed.  Once the dump is done, the slave can
> "catch up" if it needs to.  Unfortunately, I believe there are currently
> issues with restoring from a dump taken off a slave.

Actually, there's a special dump program somewhere in the Slony source
tree, one of the Perl scripts.  That should fix the issues with the
backups.  I ran into it a while back and have to start using it myself.

Re: pg_dump in a production environment

From: Tom Lane

"Thomas F. O'Connell" <tfo@sitening.com> writes:
> I'd like to use pg_dump to grab a live backup and, based on the
> documentation, this would seem to be a realistic possibility. When I
> try, though, during business hours, when people are frequently
> logging in and otherwise using the application, the application
> becomes almost unusable (to the point where logins take on the order
> of minutes).

The pg_dump sources contain some comments about throttling the rate
at which data is pulled from the server, with a statement that this
idea was discussed during July 2000 and eventually dropped.  Perhaps
you can think of a better implementation.

            regards, tom lane

Re: pg_dump in a production environment

From: "Thomas F. O'Connell"

Actually, I would find this to be an interesting project, but we're
on the verge of moving to 8.0 via Slony and will have a replicated
cluster, reducing the need for live dumps on the primary read/write
database.

It's too bad round tuits are so expensive!

I was trying to think of a way today in which pg_dump might be able
to use statistics in almost the opposite way of pg_autovacuum, such
that it steered clear of objects in heavy use, but I'm not familiar
enough with the source to know how this might work.

-tfo

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On May 23, 2005, at 11:12 PM, Tom Lane wrote:

> "Thomas F. O'Connell" <tfo@sitening.com> writes:
>
>> I'd like to use pg_dump to grab a live backup and, based on the
>> documentation, this would seem to be a realistic possibility. When I
>> try, though, during business hours, when people are frequently
>> logging in and otherwise using the application, the application
>> becomes almost unusable (to the point where logins take on the order
>> of minutes).
>>
>
> The pg_dump sources contain some comments about throttling the rate
> at which data is pulled from the server, with a statement that this
> idea was discussed during July 2000 and eventually dropped.  Perhaps
> you can think of a better implementation.
>
>             regards, tom lane
>


Re: (Ideas) pg_dump in a production environment

From: Russell Smith

On Tue, 24 May 2005 02:12 pm, Tom Lane wrote:
> "Thomas F. O'Connell" <tfo@sitening.com> writes:
> > I'd like to use pg_dump to grab a live backup and, based on the
> > documentation, this would seem to be a realistic possibility. When I
> > try, though, during business hours, when people are frequently
> > logging in and otherwise using the application, the application
> > becomes almost unusable (to the point where logins take on the order
> > of minutes).
>
> The pg_dump sources contain some comments about throttling the rate
> at which data is pulled from the server, with a statement that this
> idea was discussed during July 2000 and eventually dropped.  Perhaps
> you can think of a better implementation.

A brief look at the code suggests a couple of possibilities for fixing these problems.
There seem to be at least two different issues here from the user's point of view.

Issue 1: Large Tables cause server slowdown
---
There are two dump cases in my short reading of the pg_dump code.  Case 1 is
the COPY dump, which is done as one command; the server does most of the work.
Case 2 is the INSERT-type dump, where the pg_dump client does most of the work
creating the INSERT statements.  Case 2 is done with a cursor, and it would be easy to
insert a fixed-delay sleep after a certain number of records have been dumped.  I'm sure we
could work out the average size of a tuple in case 2, and even pause after a certain
amount of data has been transferred.

I am unsure how to attack Case 1; as mentioned, it is handled mostly in the backend code,
which we don't really control.  If it could be declared as a CURSOR, you could use the same
principle as in Case 2.  The current throttling suggestions are all based on time.  I think that
a data/counter-based solution would be gentler on the system: when the counter is reached,
just do a usleep for the throttle time.


Issue 2: Full backups of large amounts of data saturate disk I/O (many tables make it slow)
---
If the backup dump is large, and given that all files will be sequentially scanned during the backup,
the server I/O is going to be pushed to the limit.  A pause between dumping tables seems a
simple way to reduce the ongoing I/O load on the server and allow for periods where other
requests can be served.  This would result in a bursty sort of performance improvement.  In environments
with large numbers of reasonably sized tables, this could give a benefit.
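
(A crude external approximation of the per-table pause, with the important caveat that dumping table by table gives up pg_dump's single consistent snapshot; names and paths are placeholders:)

    for t in $(psql -At -d appdb -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"); do
        pg_dump -F c -t "$t" -f "/backups/$t.dump" appdb
        sleep 5    # give other requests some I/O between tables
    done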


---
In releases prior to 8.0, a wait after a certain amount of data would probably keep high-use
data from being evicted: during the wait, the frequently used data would be accessed again, so what
you would evict is the seqscan data read for the previous part of the dump.
In 8.0, or 8.1 with the clock sweep, it's possibly the same situation with regard to the delays, but
you could probably process larger amounts of data before each sleep, as you would keep recycling the same
buffers.  You would use the sleep to reduce disk I/O more than to reduce cache eviction.

The problem with timed waits during any backup is that the database cannot be vacuumed effectively
while the backup's transaction is open.  In some limited circumstances (like mine), if you have a long-running
transaction that keeps vacuum from reclaiming space in certain small, high-update tables, you lose performance
as the tables bloat, and you can only fix it with a vacuum full.

Both of these suggestions may be totally bogus.  So I suppose I'm asking for feedback on them to see if
they would be worthwhile implementing.

Regards

Russell Smith

Re: pg_dump in a production environment

From: Chris Kratz

On Monday 23 May 2005 06:09 pm, Scott Marlowe wrote:
> On Mon, 2005-05-23 at 16:54, Chris Kratz wrote:
> Are you folks running 8.0 with its improved caching algorithms?  Just
> wondering if that helps or not.

I should have noted that we are still using 7.4 on our production servers.  We
are planning an upgrade to 8.x later this year, so it is very plausible that
the new caching algorithms would help immensely.  We are planning on moving
to backups off of a Slony slave at that point as well, so the issue will
probably be moot for us, though it would still be nice to be able to take a
backup off of a running production machine.

> Actually, there's a special dump program somewhere in the Slony source
> tree, one of the Perl scripts.  That should fix the issues with the
> backups.  I ran into it a while back and have to start using it myself.

Interesting, I didn't know that, thanks for the pointer.  We will look into
it.

-Chris
--
Chris Kratz