Re: pg_dump & performance degradation - Mailing list pgsql-hackers

From Philip Warner
Subject Re: pg_dump & performance degradation
Date 2000-07-29 15:58:17
Msg-id 3.0.5.32.20000729155817.0203a210@mail.rhyme.com.au
In response to Re: pg_dump & performance degradation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
At 00:57 29/07/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> The plan was for the user to specify a single number that was the ratio of
>> time spent sleeping to the time spent 'working' (ie. reading COPY lines).
>
>> In the ordinary case this value would be 0 (no sleep), and for a very low
>> load model it might be as high as 10 - for every 100ms spent working it
>> spends 1000ms sleeping.
>
>> This was intended to handle the arbitrary speed variations that occur when
>> reading, eg, large toasted rows and reading lots of small normal rows.
>
>But ... but ... you have no idea at all how much time the backend has
>expended to provide you with those rows, nor how much of the elapsed
>time was used up by unrelated processes.

True & true.

But where the time was used is less important to me; whether it went to
PG or to another process, it still means there was a consumer I was
competing with. All I am trying to do is prevent pg_dump from consuming
all available resources. I realize that this is totally opposed to the
notion of a good scheduler, but it does produce good results for me:
when I put delays in pg_dump, I couldn't really tell (from the system
performance) that a backup was running.
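
To make that concrete, here is a minimal sketch of the ratio-based
throttle described above. The names (sleep_ratio, read_batch_throttled)
and the select()-based sleep are my illustration of the idea, not the
actual pg_dump patch:

    #include <stdlib.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Sleep-to-work ratio requested by the user: 0 = no throttling,
     * 10 = sleep ten times as long as we worked. */
    static double sleep_ratio = 0.0;

    /* Portable sub-second sleep via select() with no descriptors. */
    static void
    sleep_usec(long usec)
    {
        struct timeval delay;

        delay.tv_sec = usec / 1000000L;
        delay.tv_usec = usec % 1000000L;
        select(0, NULL, NULL, NULL, &delay);
    }

    /* Time one batch of work (e.g. reading COPY lines), then sleep
     * in proportion to the measured work time. */
    static void
    read_batch_throttled(void)
    {
        struct timeval t0, t1;
        long work_usec;

        gettimeofday(&t0, NULL);
        /* ... fetch and process a batch of COPY lines here ... */
        gettimeofday(&t1, NULL);

        work_usec = (t1.tv_sec - t0.tv_sec) * 1000000L
                  + (t1.tv_usec - t0.tv_usec);
        if (sleep_ratio > 0.0)
            sleep_usec((long) (work_usec * sleep_ratio));
    }

Note that this throttles on elapsed client-side time, which is exactly
Tom's point: it says nothing about what the backend or anyone else did
with that time.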


> It's pointless to suppose
>that you are regulating system load this way --- 

That is true, but what I am regulating is consumption of available
resources (except in the case of delays caused by excessive lock contention).

For the most part, my backups drive the machine to 100% CPU and huge
numbers of I/Os. Most importantly, this hurts web server response times
as well as the time taken by 'production' database queries (usually
issued via the web).

With a backup process that sleeps, I get free CPU time & I/Os for
opportunistic processes (web servers & db queries), and a backup that takes
more time. This seems like a Good Thing.

Since backups on VMS never cause this sort of problem, I assume I am
just battling the Linux scheduler rather than a deficiency in Postgres.
Maybe things would be different if I could set the priority on the
backend from the client... that might bear thinking about, but for R/W
transactions allowing backend priorities to be set would be a disaster:
a niced backend that still holds locks can stall every higher-priority
backend waiting on them (classic priority inversion).
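
For what it's worth, the client can at least discover which process to
renice: libpq reports the backend's PID via PQbackendPID(). A
hypothetical sketch (renice_backend is an invented name; it assumes the
client runs on the server host with permission to renice the postgres
process, i.e. same user or root):

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <libpq-fe.h>

    /* Hypothetical: lower the scheduling priority of the backend
     * serving this connection. */
    static int
    renice_backend(PGconn *conn, int niceval)
    {
        int pid = PQbackendPID(conn);   /* PID of our backend */

        if (pid <= 0)
            return -1;
        if (setpriority(PRIO_PROCESS, pid, niceval) != 0)
        {
            perror("setpriority");
            return -1;
        }
        return 0;
    }

Even where that works, it only lowers CPU priority; the I/O load, and
the lock-inversion hazard above, are untouched.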


>and I maintain that
>system load is what the dbadmin would really like to regulate.

In my case, because the scheduler does not cope well at 100% load, I
think I need to keep some resources in reserve. But I agree in
principle.


>You may as well keep it simple and not introduce unpredictable
>dependencies into the behavior of the feature.

This is certainly still an option; I might base the choice on some
empirical tests. I get very different results between a large table with
many columns and a large table with a small number of columns. I'll have to
keep investigating the causes.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

