Thread: pg_dump / pg_dumpall / memory issues

pg_dump / pg_dumpall / memory issues

From: Ericson Smith
Hi,

We have a nightly backup going on our db server. We use pg_dumpall,
which, when it's done, generates a text dump of around 9GB.

Most nights the backup runs at around a load of 2.5 -- against a normal
load of around 2 -- (this on a dual 2.4GHz Xeon machine with 6GB RAM).
Our schema has a huge table (about 5 million tuples) which gets queried
about 30 times per second. These queries fetch one record at a time,
spread pretty evenly throughout this large table, so I would imagine
this table dominates the shared RAM (currently set at 320MB).

As you can imagine, at times the backup process (or in fact any large
query that dominates the cache) tends to spike the load pretty
severely. At some point we experimented with more shared memory, but
that actually decreased overall performance, as was discussed here
earlier.

What can we do to alleviate this problem? It's going to be difficult
to avoid querying the large table at any given time (24/7 service and
all).

Are there any strategies we can take with pg_dump/pg_dumpall? My
dump command is:
> pg_dumpall -c > /tmp/backupfile.sql
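
(A compressed variant of the same command -- the filename here is just
an example -- would be:

pg_dumpall -c | gzip > /tmp/backupfile.sql.gz

mentioned only in case compressing the output changes the answer.)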

Help!!!

--
Ericson Smith <eric@did-it.com>


Re: pg_dump / pg_dumpall / memory issues

From: Tom Lane
Ericson Smith <eric@did-it.com> writes:
> As you can imagine, at times the backup process (or in fact any large
> query that dominates the cache), tends to spike up the load pretty
> severely. At some point, we experimented with more shared memory, but
> that actually decreased overall performance, as was discussed here
> earlier.

> What can we do to alleviate this problem?

There was a great deal of discussion back in July 2000 about throttling
the rate at which pg_dump pulls down data.  You might check the
archives, and also look in pg_dump.c for the comments about it therein.
No one seemed to be able to come up with a great answer that time
around, but perhaps you can find a way that works.

            regards, tom lane


Re: pg_dump / pg_dumpall / memory issues

From: "scott.marlowe"
On 9 Apr 2003, Ericson Smith wrote:

> Hi,
>
> We have a nightly backup going on our db server. We use pg_dumpall,
> which, when it's done, generates a text dump of around 9GB.
>
> Most nights the backup runs at around a load of 2.5 -- against a normal
> load of around 2 -- (this on a dual 2.4GHz Xeon machine with 6GB RAM).
> Our schema has a huge table (about 5 million tuples) which gets queried
> about 30 times per second. These queries fetch one record at a time,
> spread pretty evenly throughout this large table, so I would imagine
> this table dominates the shared RAM (currently set at 320MB).
>
> As you can imagine, at times the backup process (or in fact any large
> query that dominates the cache) tends to spike the load pretty
> severely. At some point we experimented with more shared memory, but
> that actually decreased overall performance, as was discussed here
> earlier.
>
> What can we do to alleviate this problem? It's going to be difficult
> to avoid querying the large table at any given time (24/7 service and
> all).
>
> Are there any strategies we can take with pg_dump/pg_dumpall? My
> dump command is:
> > pg_dumpall -c > /tmp/backupfile.sql

What version of pgsql are you running?  I've gotten MUCH better backup
and restore performance on the 7.3 series than I did with 7.2.  I can
back up 1 gig of data in about 10 minutes across a 100BaseTX network
with a pipe like so:

pg_dump -h hostname databasename | psql databasename

It took something like 30 minutes to an hour to do this in 7.2.x.

(My box is a dual PIII 750 with 1.5 gig of RAM, and a 10kRPM UWSCSI
drive for the database separate from the system.)
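
One note on that pipe: the target database has to exist on the
receiving side first, since plain pg_dump output doesn't include a
CREATE DATABASE unless you pass -C.  So in practice it's more like
this (names are just placeholders):

createdb databasename && pg_dump -h hostname databasename | psql databasename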


Re: pg_dump / pg_dumpall / memory issues

From: "scott.marlowe"
Oh, as a follow-up to that last message I sent off: I ran the backup by
hand, which basically runs a PHP script on a postgresql 7.3 machine to
back up a postgresql 7.2 machine.  Since 7.3's pg_dumpall isn't so good
at talking to the 7.2 machine, I had to write my own; that's OK, there
were other things to do as well -- it's a backup script after all.

Anyway, the backup script gets a list of all the databases on the 7.2
server, initializes an empty cluster with 'initdb --locale=C', then
promptly fires off line after line like this:

createdb postgres; pg_dump -O -h mainbox postgres | psql postgres

In fact it fires it off 62 times for our system.  With the backup
script running, the load factor on the mainbox was about 1.09, while
the %CPU for the postmaster doing the backup was 50%.  When I ran
pgbench -c 4 -t 1000000 to provide some PTL (parallel thrash load :-),
the usage of the postmasters running the pgbench was about 24%, while
the postmaster running the backup was about 35 to 40%.  Note that this
was a dual PIII-750, so the totals can add up to 200% in RH Linux.
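
(A plain-shell sketch of that loop -- not the actual PHP script, and
assuming passwordless access and that template databases are skipped --
would look something like:

for db in $(psql -h mainbox -At -c "select datname from pg_database where not datistemplate" template1); do
    createdb "$db"                                  # target db has to exist before psql can load into it
    pg_dump -O -h mainbox "$db" | psql "$db"        # -O skips the ownership commands, same as above
done
)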

The responsiveness of the main box is about the same during just the
backup; the pgbench, on the other hand, was a killer -- with or without
the backup, that slows the machine down a lot more for me.

So I'm wondering if the simple solution might be to either use a slower
box / network connection / throttled port on the backup box, or just
back up into another database, since the copies into the other machine
probably slow things down enough to put less of a load on the server
being backed up.

Both servers in this test are identical, except that the mainline box
is still running 7.2.4 while the backup/test box is running 7.3.2.  Both
have dual 750MHz CPUs and 1.5 gig of RAM, with 10kRPM UWSCSI drives --
one for the system, one for postgresql.


Re: pg_dump / pg_dumpall / memory issues

From: Ericson Smith
We're running the 7.3 series, and we are indeed getting better backup
performance than with 7.2.  Looking at pg_dump.c, it seems to use the
COPY command, so perhaps throttling in the code may not be the best
solution.  I did see the notes about this that Tom mentioned.
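
(One crude way to throttle without touching the code -- assuming the pv
utility is available -- would be to rate-limit the dump at the pipe, so
the server-side COPY gets back-pressured:

pg_dumpall -c | pv -q -L 5m > /tmp/backupfile.sql    # cap output at roughly 5MB/s

Whether that actually eases the read load on the server is something
we'd have to test.)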

> So I'm wondering if the simple solution might be to either use a slower
> box / network connection / throttled port on the backup box, or just
> back up into another database, since the copies into the other machine
> probably slow things down enough to put less of a load on the server
> being backed up.

Hmmm... that might be an interesting solution. We do have a slower
standby DB that would be excellent for that purpose. It would be an
added incentive too, because the standby DB would be hot after the
backup. I'm gonna give this a shot and report back. I guess we can do a
pg_dumpall from the standby DB as soon as the main DB has finished
backing up, too!
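
Roughly, the sequence I have in mind -- host names and paths are
placeholders, and the refresh step assumes nothing is connected to the
standby's databases while they get dropped and recreated -- would be:

pg_dumpall -h mainbox -c | psql -h standbybox template1    # refresh the standby from the main DB
pg_dumpall -h standbybox -c > /backup/nightly.sql          # then take the file dump off the standby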

Regards
- Ericson Smith
eric@did-it.com

scott.marlowe wrote:

>What version of pgsql are you running?  I've gotten MUCH better backup
>and restore performance on the 7.3 series than I did with 7.2.  I can
>back up 1 gig of data in about 10 minutes across a 100BaseTX network
>with a pipe like so:
>
>pg_dump -h hostname databasename | psql databasename
>
>It took something like 30 minutes to an hour to do this in 7.2.x.
>
>(My box is a dual PIII 750 with 1.5 gig of RAM, and a 10kRPM UWSCSI
>drive for the database separate from the system.)
>


Re: pg_dump / pg_dumpall / memory issues

From: Shridhar Daithankar
On Thursday 10 April 2003 11:14, you wrote:
> > So I'm wondering if the simple solution might be to either use a slower
> > box / network connection / throttled port on the backup box, or just
> > back up into another database, since the copies into the other machine
> > probably slow things down enough to put less of a load on the server
> > being backed up.
>
> Hmmm... that might be an interesting solution. We do have a slower
> standby DB, that would be excellent for that purpose. It would be an
> added incentive too, because the standby DB would be hot after backup.
> I'm gonna give this a shot and report back. I guess we can do a
> pg_dumpall from the standby DB as soon as the main DB has finished
> backing up too!

If you have an OS that ties the nice value of a process to its I/O
priority, you can lower the priority of pg_dump so that it runs very
slowly, so to speak.

I believe FreeBSD does this. I'm not sure whether Linux does as well.

HTH

 Shridhar


Re: pg_dump / pg_dumpall / memory issues

From: Lincoln Yeoh
At 01:44 AM 4/10/2003 -0400, Ericson Smith wrote:

>Hmmm... that might be an interesting solution. We do have a slower standby
>DB, that would be excellent for that purpose. It would be an added
>incentive too, because the standby DB would be hot after backup. I'm gonna
>give this a shot and report back. I guess we can do a pg_dumpall from the
>standby DB as soon as the main DB has finished backing up too!

With tee you can load into a staging DB (and only switch it to standby
if the dump is OK) and write to a file at the same time. It could be
better if the backup file is on different disk/storage hardware.

I normally pipe a pg_dump through gzip to a file - need to justify
the >>1GHz cpus ;).
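
Concretely, something like this (host names and paths are just
examples):

pg_dump dbname | tee /backup/dbname.sql | psql -h stagingbox dbname    # load staging and keep a file copy
pg_dump dbname | gzip > /backup/dbname.sql.gz                          # compressed file-only variant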

Link.

Re: pg_dump / pg_dumpall / memory issues

From: Kyle
Shridhar has a good point about using nice.  I know RedHat supports
nice because I have installed a totally background program that uses
"nice 19" as part of its command line.

I just successfully tried this on a RedHat 7.3 box:

nice -n 15 pg_dumpall > testdump.sql

It seems to work.  However, my SQL server is so lightly loaded that I
cannot verify that it actually *helps*.

-Kyle

Shridhar Daithankar wrote:
> If you have an OS that ties the nice value of a process to its I/O
> priority, you can lower the priority of pg_dump so that it runs very
> slowly, so to speak.
>
> I believe FreeBSD does this. I'm not sure whether Linux does as well.
>
> HTH
>
>  Shridhar


Re: pg_dump / pg_dumpall / memory issues

From: Murthy Kambhampaty
If you have a standby database server and you are looking for speedy
backups, you will want to look at:
http://marc.theaimsgroup.com/?l=postgresql-admin&w=2&r=1&s=lvm+snapshots&q=b

The thread discusses filesystem-level backup of the database cluster off
an [LVM/EVMS/NetApp] filer snapshot, with implementation details for
Linux LVM snapshots.
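
For anyone who hasn't read that thread, the rough shape of the LVM
approach is below; volume, mount point, and archive names are
placeholders, and the thread covers the consistency caveats:

lvcreate --snapshot --size 2G --name pgsnap /dev/vg0/pgdata    # snapshot the volume holding the cluster
mount /dev/vg0/pgsnap /mnt/pgsnap
tar czf /backup/pgdata-snap.tar.gz -C /mnt/pgsnap .            # copy the whole cluster off the snapshot
umount /mnt/pgsnap
lvremove -f /dev/vg0/pgsnap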

Cheers,
    Murthy




Re: pg_dump / pg_dumpall / memory issues

From: Doug McNaught
Kyle <kyle@ccidomain.com> writes:

> Shridhar has a good point using nice.  I know RedHat supports nice
> because I have installed a totally background program that uses "nice
> 19" as part of its command line.

He's talking about honoring 'nice' for I/O scheduling, which is
in addition to the standard modification of CPU priority.  Linux
doesn't currently do this in 2.4--it was discussed for 2.5 but I don't
know if it has gone in yet.

-Doug

Re: pg_dump / pg_dumpall / memory issues

From: Dennis Gearon
If it does, then the first PC-based multimedia system will finally be
born. Actually, the reverse of nice is needed for that: "I need X amount
of MIPS and Y amount of I/O to run" should be a valid API call. THEN
multimedia apps can be what they're supposed to be.

Doug McNaught wrote:
> Kyle <kyle@ccidomain.com> writes:
>
>
>>Shridhar has a good point using nice.  I know RedHat supports nice
>>because I have installed a totally background program that uses "nice
>>19" as part of its command line.
>
>
> He's talking about honoring 'nice' for I/O scheduling, which is
> in addition to the standard modification of CPU priority.  Linux
> doesn't currently do this in 2.4--it was discussed for 2.5 but I don't
> know if it has gone in yet.
>
> -Doug