Re: backups blocking everything - Mailing list pgsql-performance

From: Merlin Moncure
Subject: Re: backups blocking everything
Date:
Msg-id: CAHyXU0w8HWqWAvcTDckhkyMoQsVCadKofw4qLDdmxgh-2hSxgQ@mail.gmail.com
In response to: Re: backups blocking everything (Samuel Gendler <sgendler@ideasculptor.com>)
Responses: Re: backups blocking everything (Samuel Gendler <sgendler@ideasculptor.com>)
List: pgsql-performance
On Thu, Oct 27, 2011 at 6:29 PM, Samuel Gendler
<sgendler@ideasculptor.com> wrote:
>
> On Thu, Oct 27, 2011 at 2:15 PM, Samuel Gendler <sgendler@ideasculptor.com>
> wrote:
>>
>> There are definitely no bloated tables.  The large tables are all
>> insert-only, and old data is aggregated up and then removed by dropping
>> whole partitions.  There should be no bloat whatsoever.  The OLTP side of
>> things is pretty minimal, and I can pg_dump those schemas in seconds, so
>> they aren't the problem, either.  I don't know what the I/O utilization is
>> during the dump, offhand. I'll be doing a more thorough investigation
>> tonight, though I suppose I could go look at the monitoring graphs if I
> weren't in the middle of 6 other things at the moment.  The joys of startup
> life.
>
>
> Does pg_dump use work_mem, maintenance_work_mem, or both?  I'm seeing a huge
> spike in swap-in during the period when I can't get into the db, then a
> little bit of swap out toward the end.  We've got very little OLTP traffic -
> like one or two users logged in and interacting with the system at a time,
> at most, so I've got work_mem set pretty high, as most of our reporting
> queries do large aggregations that grind to a halt if they go to disk.
> Besides, we've got nearly 200GB of RAM.  But it would seem that pg_dump is
> allocating a large number of work_mem (or maintenance_work_mem) segments.
> # show work_mem;
>  work_mem
> ----------
>  512MB
> (1 row)
> # show maintenance_work_mem;
>  maintenance_work_mem
> ----------------------
>  2GB
> To be honest, I'm not entirely certain how to interpret some of the graphs
> I'm looking at in this context.
> Here are some pictures of what is going on. The db monitoring itself goes
> away when it eats all of the connections, but you can see what direction
> they are headed and the values when it finally manages to get a connection
> again at the end of the period.  All of the other numbers are just host
> monitoring, so they are continuous through the shutout.
> Memory usage on the host (shared buffers is set to 8GB):
> http://photos.smugmug.com/photos/i-sQ4hVCz/0/L/i-sQ4hVCz-L.png
> Swap Usage:
> http://photos.smugmug.com/photos/i-T25vcZ2/0/L/i-T25vcZ2-L.png
> Swap rate:
> http://photos.smugmug.com/photos/i-WDDcN9W/0/L/i-WDDcN9W-L.png
> CPU utilization:
> http://photos.smugmug.com/photos/i-4xkGqjB/0/L/i-4xkGqjB-L.png
> Load Average:
> http://photos.smugmug.com/photos/i-p4n94X4/0/L/i-p4n94X4-L.png
> disk IO for system disk (where the backup is being written to):
> http://photos.smugmug.com/photos/i-gbCxrnq/0/M/i-gbCxrnq-M.png
> disk IO for WAL volume:
> http://photos.smugmug.com/photos/i-5wNwrDX/0/M/i-5wNwrDX-M.png
> disk IO for data volume:
> http://photos.smugmug.com/photos/i-r7QGngG/0/M/i-r7QGngG-M.png
> Various postgres monitors - the graph names are self explanatory:
> http://photos.smugmug.com/photos/i-23sTvLP/0/M/i-23sTvLP-M.png
> http://photos.smugmug.com/photos/i-73rphrf/0/M/i-73rphrf-M.png
> http://photos.smugmug.com/photos/i-rpKvrVJ/0/L/i-rpKvrVJ-L.png
> http://photos.smugmug.com/photos/i-QbNQFJM/0/L/i-QbNQFJM-L.png

Hrm -- it doesn't look like you are I/O bound -- Postgres is
definitely the bottleneck.  Taking a dump off of production is
throwing something else out of whack, which is affecting your other
processes.

Band-aid solutions might be:
*) as noted above, implement hot standby and move the dumps to the
standby (rough sketch below)
*) consider adding connection pooling so your system doesn't
accumulate N processes during the dump
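
A rough sketch of what I mean by moving the dump to the standby
(hostnames and paths here are made up, and on the standby you may
need to relax query cancellation so the long-running dump isn't
killed by replay conflicts):

    # run the nightly dump against the hot standby instead of the primary
    pg_dump -h standby.example.com -Fc -f /backups/mydb.dump mydb

    # in the standby's postgresql.conf, something like:
    #   max_standby_streaming_delay = -1   # don't cancel long-running queries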

A better diagnosis might involve:
*) strace of one of your non-dump processes to see where the blocking
is happening (example below)
*) profiling one of your user processes and comparing a good run
against a bad one
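
For the strace part, something along these lines (the PID is made up;
attach to one of the stuck client backends while the dump is running):

    strace -p 12345 -tt -T -o /tmp/backend.trace
    # -tt/-T add timestamps and per-syscall durations, so you can see
    # whether the backend is parked in semop/poll/read etc.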

Is there anything out of the ordinary about your application that's
worth mentioning?  Using lots of subtransactions?  Prepared
transactions?  Tablespaces?  A huge number of tables?  Etc.?

Have you checked syslogs/dmesg/etc. for out-of-the-ordinary system events?
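
Given the swap spikes in your graphs, the oom-killer is one thing
worth ruling out -- e.g. (log paths vary by distro):

    dmesg | tail -n 50
    grep -i -E 'oom|out of memory|killed process' /var/log/messages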

merlin
