Thread: pg_dump makes our system unusable - any way to pg_dump in the middle of the day? (postgres 8.4.4)

I read in the pg_dump man page that pg_dump does not block other users
accessing the database (readers or writers).

In practice, if I pg_dump our 100 GB database, our application, which
is half Web front end and half OLTP, at a certain point, slows to a
crawl and the Web interface becomes unresponsive.  I start getting
check_postgres complaints about number of locks and query lengths.  I
see locks around for over 5 minutes.

I've had to abort pg_dump as it made the system unusable.

Can I pg_dump without breaking the system?

Best,
-at

I'm using cruisecontrol and ant to back up our database at certain times on certain days of the week. Cruisecontrol sends out completion emails when it's done. I don't think pgsql has a good built in way to schedule backups. Cruisecontrol offers supreme flexibility with relatively little work.
 
I'll post my config files from ant and cruisecontrol that handle the backup in case you're interested. Cruisecontrol runs as a service on windows and linux and periodically checks if it should run its job. It can check a file for modifications or simply run the job at certain times. I have it doing the backup job
At 6am on Wed and Sat. On those times it fires off an Ant job which does the actual db backup using pg_dump ( see ant config ). Cruisecontrol comes packaged with the Ant program. This might be overkill but it's a valuable skill to be able to use cruisecontrol and ant to automate just about anything.
 
 
Cruisecontrol config:
 
<?xml version="1.0" encoding="UTF-8"?>
 
<cruisecontrol>
        <property name="machine.name" value="meqdbserver-dt" />
        <property name="webbase" value="http://${machine.name}:8080/" />
        <property name="buildroot.dir" value="C:\systest\" />
        <property name="antroot.dir" value="${buildroot.dir}ant\" />
        <property name="anthome.dir" value="${antroot.dir}apache-ant-1.7.0" />
        <property name="antupdaterbuild.file" value="systest-ant-updater.build" />
        <property name="antfilesupdater.dir" value="${buildroot.dir}updater\" />
        <property name="build.file" value="${antroot.dir}dbmaintenance-ant.build" />
        <property name="cruise.dir" value="C:\Program Files (x86)\CruiseControl\" />
        <property name="xsl.dir" value="${buildroot.dir}cruisecontrol\dbbackupxsl\" />
    <property name="css.dir" value="${buildroot.dir}cruisecontrol\css\" />
       
        <project name="Backup Regression Testing Database" buildafterfailed="false" requireModification="false">               
                <schedule>
                        <ant
                                anthome="${anthome.dir}"
                                time="0600"
                                day="wednesday"
                                buildfile="${build.file}"
                                target="dump"
                                uselogger="true" />            
                        <ant
                                anthome="${anthome.dir}"
                                time="0600"
                                day="saturday"
                                buildfile="${build.file}"
                                target="dump"
                                uselogger="true" />
                </schedule>    
 
                <publishers>
                        <htmlemail mailhost="smtpemail.host.com"
                                           returnaddress="${machine.name}@myemail.com"
                                           buildresultsurl="${webbase}cruisecontrol/buildresults/${project.name}"
                                           logdir="${cruise.dir}logs\${project.name}"
                                           xsldir="${xsl.dir}"
                                           failasimportant="true"
                                           css="${css.dir}cruisecontrol.css"
                                           defaultsuffix="@myemail.com">
                           <ignore user="User" />
                           <always address="user1" />
                           <always address="user2" />                     
                        </htmlemail>
                       
                        <artifactspublisher
                                dir="${antroot.dir}backup_dir"
                                dest="artifacts/${project.name}"
                                moveInsteadOfCopy="true"
                        />                     
                </publishers>          
        </project>
</cruisecontrol>
 
Ant config:
 
<?xml version="1.0" encoding="UTF-8"?>
 
<project name="system-orion" default="test">
        <property name="pgdump.exe" value="C:\Program Files\PostgreSQL\9.0\bin\pg_dump.exe" />
        <property name="db.name" value="testruns" />
        <property name="backup.format" value="Fc" />
        <property name="temp.dir" value="backup_dir" />
       
        <taskdef resource="net/sf/antcontrib/antlib.xml">
          <classpath>
                <pathelement location="antcontrib\ant-contrib-1.0b3.jar"/>
          </classpath>
        </taskdef>
       
        <target name="echo-filesize">
                <length file="${file.name}" property="file.length.bytes" />
 
                <script language="javascript">
                <![CDATA[
                        var length_bytes = project.getProperty("file.length.bytes");
                        var length_kbytes = Math.round((length_bytes / 1024) * Math.pow(10,2))
                                                          / Math.pow(10,2);
                        var length_mbytes = Math.round((length_kbytes / 1024) * Math.pow(10,2))
                                                          / Math.pow(10,2);
                        project.setNewProperty("fs.length.kb", length_kbytes);
                        project.setNewProperty("fs.length.mb", length_mbytes);
                ]]>
                </script>
               
                <echo message="${file.name} is ${fs.length.kb} kilobtyes ( ${fs.length.mb} megabytes )"/>
        </target>
       
        <target name="dump" >
                <tstamp><format property="time.stamp" pattern="MM-dd-yyyy_hh-mm-ss" /></tstamp>
                <property name="backup.name" value="${backup.format}_DB_${db.name}_ON_${time.stamp}.backup" />
                <delete dir="${temp.dir}" />
                <mkdir dir="${temp.dir}" />
                <property name="file.name" value="${temp.dir}\${backup.name}" />
                <exec executable="${pgdump.exe}" failonerror="true">
                        <arg value="-${backup.format}" />
                        <arg value="--username=username" />
                        <arg value="--file=${file.name}" />
                        <arg value="${db.name}" />
                </exec>
                <antcall target="echo-filesize" />
        </target>
</project>
 
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Aleksey Tsalolikhin
Sent: Thursday, February 24, 2011 5:39 PM
To: pgsql-general
Subject: [GENERAL] pg_dump makes our system unusable - any way to pg_dump in the middle of the day? (postgres 8.4.4)
 
I read in the pg_dump man page that pg_dump does not block other users accessing the database (readers or writers).
 
In practice, if I pg_dump our 100 GB database, our application, which is half Web front end and half OLTP, at a certain point, slows to a crawl and the Web interface becomes unresponsive.  I start getting check_postgres complaints about number of locks and query lengths.  I see locks around for over 5 minutes.
 
I've had to abort pg_dump as it made the system unusable.
 
Can I pg_dump without breaking the system?
 
Best,
-at
 
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
 
On 02/24/11 7:09 PM, Adam Bruss wrote:
> I'm using cruisecontrol and ant to back up our database at certain
> times on certain days of the week. Cruisecontrol sends out completion
> emails when it's done. I don't think pgsql has a good built in way to
> schedule backups. Cruisecontrol offers supreme flexibility with
> relatively little work.
>

wow, thats a complicated lotta stuff to do what crontab would do with a
oneliner

0 6 * * 3,6 pg_dump <args to dump>

(06:00 on wed,sat)

or the windows task scheduler equivalent.



I like being emailed when my job runs. It tells me how big the backup is and whether it passed or failed. I use
cruisecontroland ant on a regular basis so it was a natural choice. The most time consuming part was learning how to
usepg_dump. 

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce
Sent: Thursday, February 24, 2011 10:22 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] pg_dump makes our system unusable - any way to pg_dump in the middle of the day? (postgres
8.4.4)

On 02/24/11 7:09 PM, Adam Bruss wrote:
> I'm using cruisecontrol and ant to back up our database at certain
> times on certain days of the week. Cruisecontrol sends out completion
> emails when it's done. I don't think pgsql has a good built in way to
> schedule backups. Cruisecontrol offers supreme flexibility with
> relatively little work.
>

wow, thats a complicated lotta stuff to do what crontab would do with a oneliner

0 6 * * 3,6 pg_dump <args to dump>

(06:00 on wed,sat)

or the windows task scheduler equivalent.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

On Thu, Feb 24, 2011 at 6:38 PM, Aleksey Tsalolikhin
<atsaloli.tech@gmail.com> wrote:
> In practice, if I pg_dump our 100 GB database, our application, which
> is half Web front end and half OLTP, at a certain point, slows to a
> crawl and the Web interface becomes unresponsive.  I start getting
> check_postgres complaints about number of locks and query lengths.  I
> see locks around for over 5 minutes.
>

I'd venture to say your system does not have enough memory and/or disk
bandwidth, or your Pg is not tuned to make use of enough of your
memory.  The most likely thing is that you're saturating your disk
I/O.

Check the various system statistics from iostat and vmstat to see what
your baseline load is, then compare that when pg_dump is running.  Are
you dumping over the network or to the local disk as well?

On 2/25/2011 7:26 AM, Vick Khera wrote:
> On Thu, Feb 24, 2011 at 6:38 PM, Aleksey Tsalolikhin
> <atsaloli.tech@gmail.com>  wrote:
>> In practice, if I pg_dump our 100 GB database, our application, which
>> is half Web front end and half OLTP, at a certain point, slows to a
>> crawl and the Web interface becomes unresponsive.  I start getting
>> check_postgres complaints about number of locks and query lengths.  I
>> see locks around for over 5 minutes.
>>
>
> I'd venture to say your system does not have enough memory and/or disk
> bandwidth, or your Pg is not tuned to make use of enough of your
> memory.  The most likely thing is that you're saturating your disk
> I/O.
>
> Check the various system statistics from iostat and vmstat to see what
> your baseline load is, then compare that when pg_dump is running.  Are
> you dumping over the network or to the local disk as well?
>

Agreed...  additionally, how much of that 100GB is actually changing?
You are probably backing up the same thing over and over.  Maybe some
replication or differential backup would make your backup's smaller and
easier on your IO.

-Andy

I'd also add: run pg_tune on your server. Made a *dramatic* difference for us.

On Friday, February 25, 2011 05:26:56 am Vick Khera wrote:

> On Thu, Feb 24, 2011 at 6:38 PM, Aleksey Tsalolikhin

>

> <atsaloli.tech@gmail.com> wrote:

> > In practice, if I pg_dump our 100 GB database, our application, which

> > is half Web front end and half OLTP, at a certain point, slows to a

> > crawl and the Web interface becomes unresponsive.  I start getting

> > check_postgres complaints about number of locks and query lengths.  I

> > see locks around for over 5 minutes.

>

> I'd venture to say your system does not have enough memory and/or disk

> bandwidth, or your Pg is not tuned to make use of enough of your

> memory. The most likely thing is that you're saturating your disk

> I/O.

>

> Check the various system statistics from iostat and vmstat to see what

> your baseline load is, then compare that when pg_dump is running. Are

> you dumping over the network or to the local disk as well?


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Our disk service times and % utilization (according to sar -d) while
running pg_dump are low.  For example:

01:23:08 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz
avgqu-sz     await     svctm     %util
01:23:09 AM       sda   1473.00      0.00  98128.00     66.62
0.41      0.28      0.03      3.70
01:23:09 AM      sda1      0.00      0.00      0.00      0.00
0.00      0.00      0.00      0.00
01:23:09 AM      sda2   1473.00      0.00  98128.00     66.62
0.41      0.28      0.03      3.70
01:23:09 AM       sdb      1.00      0.00     16.00     16.00
0.00      3.00      3.00      0.30
01:23:09 AM      sdb1      1.00      0.00     16.00     16.00
0.00      3.00      3.00      0.30
01:23:09 AM       sdc      0.00      0.00      0.00      0.00
0.00      0.00      0.00      0.00
01:23:09 AM       sdd    678.00 161584.00      0.00    238.32
0.34      0.49      0.45     30.20
01:23:09 AM      sdd1    678.00 161584.00      0.00    238.32
0.34      0.49      0.45     30.20
01:23:09 AM     nodev  12266.00      0.00  98128.00      8.00
4.30      0.35      0.00      3.70
01:23:09 AM     nodev      0.00      0.00      0.00      0.00
0.00      0.00      0.00      0.00



On Fri, Feb 25, 2011 at 3:50 PM, Benjamin Smith
<lists@benjamindsmith.com> wrote:
> I'd also add: run pg_tune on your server. Made a *dramatic* difference for
> us.

Well, that's interesting.  I am going to try that, thanks!!

Aleksey.