Thread: pg_dumpall affecting performance
Hello, I was under the impression that pg_dumpall didn't affect database performance when dumping while the db is live. However I have evidence to the contrary now - queries that are run during the pg_dumpall time take 10 to a 100 times longer to execute than normal while pg_dumpall is running. The strange thing is that this started after my database grew by about 25% after a large influx of data due to user load. I'm wonder if there is a tipping point or a config setting I need to change now that the db is larger that is causing all this to happen. Thanks, Mark
I was always under the impression that pg_dump and pg_dumpall cause all data to be read in to the buffers and then out, (ofcourse squeezing out whatever may be active). That is the big advantage to using PITR backups and using a tar or cpiomethod of backing up active containers and shipping off to another system, disk or api to tape system. -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Mark Mikulec Sent: Tuesday, February 15, 2011 12:41 PM To: pgsql-performance@postgresql.org Subject: [PERFORM] pg_dumpall affecting performance Hello, I was under the impression that pg_dumpall didn't affect database performance when dumping while the db is live. HoweverI have evidence to the contrary now - queries that are run during the pg_dumpall time take 10 to a 100 times longerto execute than normal while pg_dumpall is running. The strange thing is that this started after my database grew by about 25% after a large influx of data due to user load.I'm wonder if there is a tipping point or a config setting I need to change now that the db is larger that is causingall this to happen. Thanks, Mark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
On 02/15/2011 10:41 AM, Mark Mikulec wrote: > Hello, > > I was under the impression that pg_dumpall didn't affect database > performance when dumping while the db is live. However I have evidence to > the contrary now - queries that are run during the pg_dumpall time take 10 > to a 100 times longer to execute than normal while pg_dumpall is running. > The strange thing is that this started after my database grew by about 25% > after a large influx of data due to user load. I'm wonder if there is a > tipping > point or a config setting I need to change now that the db is larger that > is > causing all this to happen. > Don't know where that impression came from. It is true that you can continue to *use* your database normally while running a dump but you are reading the entire database and either transmitting it over the network or writing it to a local drive so it shouldn't be surprising that performance is impacted. There are tipping points - one big one is when you move from having all your data in RAM to needing to read disk. And it can be a whopper. If all your information, through PG or OS caching is in RAM then your dumps may run very quickly. The moment you cross the point that things don't quite fit you can see a sharp decline. Consider a least-recently-used algorithm and a very simplistic scenario. You read the "start" data. It isn't cached so you go to disk *and* you put those blocks into cache pushing others than you would need later out of cache. This continues and you potentially end up having to read everything from disk plus incur the overhead of checking and updating the cache. Meanwhile, the data you needed for your query may have been pushed out of cache so there is more contention for disk. Admittedly an over-simplified example but you see the problem. Cheers, Steve
Mark Mikulec <mark@anticentertainment.com> wrote: > The strange thing is that this started after my database grew by > about 25% after a large influx of data due to user load In addition to the issues already mentioned, there is the fact that to maintain consistency an entire database must be dumped in a single database transaction with one snapshot. This means that garbage collection can't run, which may lead to bloat under some circumstances. This may be why your database grew by 25%. If that bloat is concentrated in a small number of tables, you may want to schedule aggressive maintenance (like CLUSTER) on those tables. One other factor which can affect running applications is the table locks which the dump must hold. You might want to look into PITR backup techniques, or streaming replication on 9.0 -Kevin