Thread: Implementing cost limit/delays for insert/delete/update/select

Implementing cost limit/delays for insert/delete/update/select

From: Peter Schuller

Hello,

I'd like to have a stab at implementing cost delays for regular
INSERT/DELETE/UPDATE/SELECT. The motivation is roughly the same as for
VACUUM and the autovacuum limits; one may have application specific
bulk operations that need executing without adversely affecting
latency/throughput of other operations.

I tentatively call this executing statements "nicely". A better naming
scheme might be a good idea...

The idea would be to introduce two GUC variables:

  - nice_cost_limit
  - nice_cost_delay

Which would be equivalent to their vacuum_* counterparts.
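
As a rough sketch of what I have in mind (patterned on the existing
vacuum_cost_* entries in src/backend/utils/misc/guc.c; NiceCostLimit and
NiceCostDelay would be hypothetical new backing variables), the entries
might look roughly like:

    /* Sketch only -- patterned after the vacuum_cost_* entries in guc.c */
    {
        {"nice_cost_limit", PGC_USERSET, RESOURCES,
            gettext_noop("\"Nice\" statement cost amount available before napping."),
            NULL
        },
        &NiceCostLimit,
        200, 1, 10000, NULL, NULL
    },

    {
        {"nice_cost_delay", PGC_USERSET, RESOURCES,
            gettext_noop("\"Nice\" statement cost delay in milliseconds."),
            NULL,
            GUC_UNIT_MS
        },
        &NiceCostDelay,
        0, 0, 1000, NULL, NULL
    },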

Upon executing an INSERT, UPDATE, DELETE or SELECT, one would
optionally specify a "NICELY" modifier to enable nice cost limits for
that statement. For example:
  DELETE NICELY FROM large_table WHERE id < 50000000

In the future I foresee also specifying a nice multiplier of some
kind, thus supporting variable niceness on a per-statement basis.

To avoid complexity, the initial version would not contain anything
similar to the autovacuum balancing of costs between separate vacuum
processes. The cost accounting would be entirely local to each
backend.

Firstly, any opinions on whether this is a good idea, or on whether
the above suggestions sound sensible?

Implementation:

Although the current cost delay support, in terms of variable naming,
suggests it is specific to vacuuming, I haven't found anything that is
really specific to this. As far as I can tell, an implementation would
entail:

* Adding the GUC variables
* Modifying the parser slightly to support the NICELY "modifier" (terminology?)
* Modify ExecPlan in backend/executor/execMain.c to contain accounting
  initialization and cleanup like backend/commands/vacuum.c's vacuum().
* Create an equivalent of the vacuum_delay_point() and call it in each loop
  iteration in ExecPlan() (see the sketch below).
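
To make the last point concrete, here is a rough sketch of such a delay
point, copied in spirit from vacuum_delay_point() in
backend/commands/vacuum.c (NiceCostActive, NiceCostBalance, NiceCostLimit
and NiceCostDelay being hypothetical analogues of the VacuumCost* globals):

    #include "postgres.h"
    #include "miscadmin.h"

    /* Hypothetical globals, analogous to VacuumCostActive and friends */
    extern bool NiceCostActive;
    extern int  NiceCostBalance;
    extern int  NiceCostLimit;
    extern int  NiceCostDelay;

    /* Sketch of a nice_delay_point(), modeled on vacuum_delay_point() */
    void
    nice_delay_point(void)
    {
        /* Always check for interrupts first */
        CHECK_FOR_INTERRUPTS();

        /* Nap once enough cost has accumulated since the last nap */
        if (NiceCostActive && !InterruptPending &&
            NiceCostBalance >= NiceCostLimit)
        {
            int     msec;

            msec = NiceCostDelay * NiceCostBalance / NiceCostLimit;
            if (msec > NiceCostDelay * 4)
                msec = NiceCostDelay * 4;

            pg_usleep(msec * 1000L);

            NiceCostBalance = 0;

            /* We might have received an interrupt while asleep */
            CHECK_FOR_INTERRUPTS();
        }
    }

(The autovacuum-style balancing between backends is deliberately omitted,
as described above.)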

And then cosmetically, probably rename various things so that
non-vacuum specific cost accounting is no longer named as if it were.

Does this sound vaguely sensible? Is there an obvious show-stopper I
am missing?

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org


Re: Implementing cost limit/delays for insert/delete/update/select

From: Gregory Stark

"Peter Schuller" <peter.schuller@infidyne.com> writes:

> Hello,
>
> I'd like to have a stab at implementing cost delays for regular
> INSERT/DELETE/UPDATE/SELECT. The motivation is roughly the same as for
> VACUUM and the autovacuum limits; one may have application specific
> bulk operations that need executing without adversely affecting
> latency/throughput of other operations.
>
> I tentatively call this executing statements "nicely". A better naming
> scheme might be a good idea...
>
> The idea would be to introduce two GUC variables:
>
>   - nice_cost_limit
>   - nice_cost_delay

I think the experience with vacuum was that cost_delay was a mistake. The only
parameter users really ought to be messing with is cost_limit. Every time a
user has posted about vacuum taking interminably long it was because they set
*_cost_delay to something unreasonable. I suppose this could be selection bias
since we would never hear about users who didn't set it unreasonably high.

But I think we should consider removing the {auto,}vacuum_cost_delay parameter
or at least hiding and undocumenting it. It's a foot-gun and serves no useful
purpose that merely lowering the {auto,}vacuum_cost_limit can't serve equally
well.

> Which would be equivalent to their vacuum_* counterparts.
>
> Upon executing an INSERT, UPDATE, DELETE or SELECT, one would
> optionally specify a "NICELY" modifier to enable nice cost limits for
> that statement. For example:
>
>    DELETE NICELY FROM large_table WHERE id < 50000000

Why not just have the GUC and leave it at that?

SET nice_cost_limit = ...
DELETE FROM ...
SET nice_cost_limit = ...
UPDATE ...
...

> In the future I foresee also specifying a nice multiplier of some
> kind, thus supporting variable niceness on a per-statement basis.

Yeah, actually I think both the vacuum and this parameter need some further
thought. They don't represent any sort of real world parameter that the user
has any hope of knowing how to set except by trial-and-error.

I think we would be better off with something like a vacuum_io_bandwidth_cap
or something like that. Then the user has a hope of understanding what kind of
numbers make sense.
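
For what it's worth, a bandwidth number can at least be related to the
current model: the limiter allows roughly cost_limit/cost_page_miss page
reads per cost_delay interval, which with 200/10 and a 20ms delay works out
to an upper bound of about 8MB/s of 8kB pages. A hypothetical helper doing
the conversion the other way (none of these names exist in the backend)
could be as simple as:

    /*
     * Illustration only: convert a desired I/O cap (MB/s) into the
     * cost_limit that would approximate it under the existing model,
     * assuming a fixed cost_delay and that every page access is charged
     * as a miss.
     */
    static int
    bandwidth_cap_to_cost_limit(double cap_mb_per_sec, int cost_delay_ms,
                                int cost_page_miss, int block_size)
    {
        double  pages_per_sec = cap_mb_per_sec * 1024.0 * 1024.0 / block_size;

        /* pages allowed per delay interval, times the cost charged per page */
        return (int) (pages_per_sec * cost_delay_ms / 1000.0 * cost_page_miss);
    }

    /*
     * e.g. 8 MB/s, 8192-byte blocks, 20ms delay, page_miss cost 10 => ~205,
     * which is close to the default vacuum_cost_limit of 200.
     */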

> * Adding the GUC variables

> * Modifying the parser slightly to support the NICELY "modifier"
>   (terminology?)

As I mentioned I don't think this is necessary.

> * Modify ExecPlan in backend/executor/execMain.c to contain accounting
>   initialization and cleanup like backend/commands/vacuum.c's vacuum().
> * Create an equivalent of the vacuum_delay_point() and call it in each
>   loop iteration in ExecPlan().

ExecutePlan? That's not often enough. You can easily construct plans that do
massive sequential scans on the inner side of a join or in a subquery -- all
of which happens before a single record is returned from ExecutePlan.
You would have to test for whether it's time to sleep much more often.
Possibly before every ExecProcNode call would be enough.
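
Concretely, something like the following near the top of ExecProcNode(),
where nice_delay_point() is whatever the delay function ends up being
called; the per-call overhead is then just an integer comparison unless it
actually decides to sleep:

    /*
     * Hypothetical fragment near the top of ExecProcNode() in
     * src/backend/executor/execProcnode.c.  The check is only a couple of
     * integer comparisons; the sleep happens once the accumulated balance
     * exceeds the limit.
     */
    if (NiceCostActive)
        nice_delay_point();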

Even then you have to worry about the i/o and cpu resources used by
tuplesort. And there are degenerate cases where a single ExecProcNode could do
a lot of i/o such as a large scan looking for a single matching record.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


Re: Implementing cost limit/delays for insert/delete/update/select

From: Joshua Drake

On Mon, 25 Aug 2008 22:39:54 +0100
Gregory Stark <stark@enterprisedb.com> wrote:
> But I think we should consider removing the {auto,}vacuum_cost_delay
> parameter or at least hiding and undocumenting it. It's a foot-gun
> and serves no useful purpose that merely lowering the
> {auto,}vacuum_cost_limit can't serve equally well.

I thought we were already considering some kind of IO tweak for Vacuum,
e.g., you may use up to 5Mb/s or something like that?

Sincerely,

Joshua D. Drake
-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: Implementing cost limit/delays for insert/delete/update/select

From: Peter Schuller

Btw, I forgot to mention in my original post that one interesting use
case that is not application specific is to apply cost limits to
pg_dump. That would be one of the final goals for me.

> I think the experience with vacuum was that cost_delay was a mistake. The only
> parameter users really ought to be messing with is cost_limit. Every time a
> user has posted about vacuum taking interminably long it was because they set
> *_cost_delay to something unreasonable. I suppose this could be selection bias
> since we would never hear about users who didn't set it unreasonably high.
>
> But I think we should consider removing the {auto,}vacuum_cost_delay parameter
> or at least hiding and undocumenting it. It's a foot-gun and serves no useful
> purpose that merely lowering the {auto,}vacuum_cost_limit can't serve equally
> well.

Sounds sensible to me. I included nice_cost_delay in this case to
remain consistent with the others.

> >    DELETE NICELY FROM large_table WHERE id < 50000000
>
> Why not just have the GUC and leave it at that?
>
> SET nice_cost_limit = ...
> DELETE FROM ...
> SET nice_cost_limit = ...
> UPDATE ...
> ...

Sounds a lot cleaner than introducing new syntax, yes.

Leaving it as a GUC only does mean the submitter must choose a value,
and cannot just indicate "whatever the administrator chose as
sensible". Perhaps have a separate boolean flag that would allow just
turning cost limiting on, without specifying actual limits?

> I think we would be better off with something like a vacuum_io_bandwidth_cap
> or something like that. Then the user has a hope of understanding what kind of
> numbers make sense.

Another option might be to give page_miss and friends an actual unit
that is meaningful, such as the expected worst-case "device time"
required to perform the I/O operation (tweaked on a per-device
basis). One could then specify the maximum vacuum cost in terms of the
percentage of real time spent on vacuum-related I/O.
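
As a back-of-the-envelope illustration of what I mean (everything below is
hypothetical): if costs were charged directly in estimated milliseconds of
device time and the knob were a duty-cycle fraction, the required sleep
would fall out directly:

    /*
     * Hypothetical: costs are charged in estimated milliseconds of device
     * time, and the limit is a duty cycle -- the fraction of wall-clock
     * time the statement may keep the device busy.  After charging
     * charged_ms of device time, sleep long enough that
     * charged_ms / (charged_ms + sleep_ms) == duty_cycle.
     */
    static int
    required_sleep_ms(double charged_ms, double duty_cycle)
    {
        return (int) (charged_ms * (1.0 - duty_cycle) / duty_cycle);
    }

    /* e.g. 40ms of charged device time at a 20% duty cycle => sleep 160ms */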

> ExecutePlan? That's not often enough. You can easily construct plans that do
> massive sequential scans on the inner side of a join or in a subquery -- all
> of which happens before a single record is returned from ExecutePlan.
> You would have to test for whether it's time to sleep much more often.
> Possibly before every ExecProcNode call would be enough.
>
> Even then you have to worry about the i/o and cpu resources used by
> tuplesort. And there are degenerate cases where a single ExecProcNode could do
> a lot of i/o such as a large scan looking for a single matching record.

Ok - I obviously need to look at these parts more carefully. Thanks
for the feedback!

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org


Re: Implementing cost limit/delays for insert/delete/update/select

From: Tom Lane

Gregory Stark <stark@enterprisedb.com> writes:
> You would have to test for whether it's time to sleep much more often.
> Possibly before every ExecProcNode call would be enough.

That would have overhead comparable to EXPLAIN ANALYZE, which is a lot.

I'm fairly dubious about this whole proposal: it's not clear to me that
the vacuum delay stuff works very well at all, and to the extent that it
does work it's because vacuum has such stylized, predictable behavior.
The same can't be said of general SQL queries.  For one thing, it's
not apparent that rate-limiting I/O would be sufficient, because
although vacuum is nearly always I/O bound, general queries often are
CPU bound; or their system impact might be due to other factors like
contention for shared-memory data structures.  Priority inversion is
a pretty serious concern as well (ie, a sleeping "low priority" query
might be blocking other queries).

I think this proposal is likely to lead to a large, invasive patch that
isn't actually very useful in the real world.

Oh, and I think the suggestion to modify SQL syntax for this is right
out.  A GUC setting might be a sane API.
        regards, tom lane


Re: Implementing cost limit/delays for insert/delete/update/select

From: Gregory Stark

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> You would have to test for whether it's time to sleep much more often.
>> Possibly before every ExecProcNode call would be enough.
>
> That would have overhead comparable to EXPLAIN ANALYZE, which is a lot.

I don't think the overhead would be anywhere near as much. He wouldn't have to
call gettimeofday() (except just after sleeping) which is what hurts EXPLAIN
ANALYZE so much. Just count i/o operations.
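
This is essentially how the existing vacuum accounting already works: the
charges in bufmgr.c are plain counter increments along the lines of
VacuumCostBalance += VacuumCostPageMiss, with no clock reads anywhere. A
generalized version would mostly amount to renaming/widening those
counters; a hypothetical fragment for the buffer-miss path:

    /*
     * Hypothetical fragment for the buffer-miss path in
     * src/backend/storage/buffer/bufmgr.c, generalizing the existing
     * vacuum-only charge.  CostAccountingActive, CostBalance and
     * CostPageMiss are made-up names.
     */
    if (CostAccountingActive)
        CostBalance += CostPageMiss;    /* plain integer add, no gettimeofday() */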

In any case, if the point is to call sleep, how much do we care about
overhead? Do you think it would consume so much extra cpu that even after
sleeping for 10ms it would be a big resource drain for other processes?

The point about the invasiveness remains. If you want to count i/o operations
it means touching a lot of lower-level code. I'm interested in doing that
anyways to get better monitoring and profiling data though.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


Re: Implementing cost limit/delays for insert/delete/update/select

From: Simon Riggs

On Mon, 2008-08-25 at 22:39 +0200, Peter Schuller wrote:

> Does this sound vaguely sensible? Is there an obvious show-stopper I
> am missing?

This was a well structured proposal.

The main problem is where you put the delay_point() calls. If you put
them at the top of the executor then you will get a delay proportional
to the number of rows retrieved. For many queries, such as count(*), this
might be just one row even though the query may have run for hours. There's
no point having a
priority scheme if it doesn't apply to all queries equally.

If you put them at each call of each node then you will get an
unacceptable overhead as Tom suggests.

Not sure what to suggest, if anything, apart from just writing your own
delay() function and using it in your query.
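
A trivial version of that, as a C-language function you could expose with
CREATE FUNCTION ... LANGUAGE C and then call from the query, might be no
more than the following (the name my_delay is made up):

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(my_delay);

    /* Sleep for the given number of milliseconds, then return true. */
    Datum
    my_delay(PG_FUNCTION_ARGS)
    {
        int32   ms = PG_GETARG_INT32(0);

        pg_usleep((long) ms * 1000L);
        PG_RETURN_BOOL(true);
    }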

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Implementing cost limit/delays for insert/delete/update/select

From: Peter Schuller

Hello,

[ I have not yet had the time to look at code again in response to
some of the points raised by several people; but I still wanted to
follow up somewhat on other bits. ]

> > You would have to test for whether it's time to sleep much more often.
> > Possibly before every ExecProcNode call would be enough.
>
> That would have overhead comparable to EXPLAIN ANALYZE, which is a lot.
>
> I'm fairly dubious about this whole proposal: it's not clear to me that
> the vacuum delay stuff works very well at all, and to the extent that it
> does work it's because vacuum has such stylized, predictable behavior.

Well, it definitely works well enough to make a large difference in my
use cases, in particular with respect to the amount of write activity
generated, which easily causes latency problems. That said, it remains
to be seen how much of an issue heavy write activity will be once
upgraded to 8.3 and after tweaking the Linux buffer cache.

Right now, I do not expect the database to be even useful in one of my
use cases, if it were not for delay points during vacuuming.

So although I make no argument as to whether it works better due to
the limited and understood nature of vacuuming, it is definitely an
appreciated feature for my use cases.

> The same can't be said of general SQL queries.  For one thing, it's
> not apparent that rate-limiting I/O would be sufficient, because
> although vacuum is nearly always I/O bound, general queries often are
> CPU bound; or their system impact might be due to other factors like
> contention for shared-memory data structures.

In my case I mostly care about I/O. I believe that this is going to be
a fairly common situation for anyone whose primary concern is latency.

The good part about CPU contention is that it is handled quite well by
modern operating systems/hardware. Even on a single-core machine, a
single CPU bound query should still only have a percentage-wise
throughput impact on other traffic (normally; of course you might have
some particularly bad contention on some resource, etc). If your
database is very sensitive to latency, you are likely running it at
far below full throughput, meaning that there should be quite a bit of
margin in terms of CPU.

This would be especially true on multi-core machines where the impact
of a single backend is even less.

The problem I have with I/O is that saturating I/O, in particular with
writes, has all sorts of indirect effects that are difficult to
predict, and are not at all guaranteed to translate into a simple
percentage-wise slow-down. For example, I've seen stalls lasting
several *minutes* due to a bulk DELETE of a million rows or so. With
a mix of random-access and streaming writes passing through the PG buffer
cache, the operating system buffer cache, and the RAID controller's cache,
it is not at all unlikely that you will have significant latency problems
when saturating the system with writes.

So recognizing that I am not likely to ever have very good behavior
while saturating the storage system with writes, I instead want to
limit the write activity generated to a sensible amount (preferably
such that individual bursts are small enough to e.g. fit in a RAID
controller cache). This reduces the problem to that of ensuring good
behavior with respect to short bursts of writes and their interaction with
checkpoints, which is a much easier problem than somehow ensuring
"fairness" under write-saturated load.

So that is where my motivation comes from; in more or less all my use
cases, limiting disk I/O is massively more important than limiting CPU
usage.

On this topic, I have started thinking again about direct I/O. I asked
about this on -performance a while back in a different context and it
seemed there was definitely no clear consensus that one "should" have
direct I/O. That seemed to be mostly due to a perceived lack of
increase in throughput. However my gut feel is that by bypassing the
OS buffer cache, you could significantly improve real-time/latency
sensitive aspects in PostgreSQL in cases where throughput is not your
primary concern.

Perhaps something like that would be a more effective approach.

> Priority inversion is
> a pretty serious concern as well (ie, a sleeping "low priority" query
> might be blocking other queries).

I presume this is in reference to bulk modifications (rather than
selects) blocking other transactions with conflicting updates?

If so, yes I see that. The feature would be difficult to use reliably
for writes except in very controlled situations (in my particular
use case that I am tunnel-visioning on, it is guaranteed that there
is no conflict due to the nature of the application).

But is my understanding correct that there is no reason to believe
there are such issues for read-only queries, or queries that do not
actually conflict (at the SQL level) with concurrent transactions?

(Ignoring the impact it might have on old transactions hanging around
for a longer time.)

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org