Re: Implementing cost limit/delays for insert/delete/update/select - Mailing list pgsql-hackers

From: Peter Schuller
Subject: Re: Implementing cost limit/delays for insert/delete/update/select
Date: 2008-08-25 21:59:36
Msg-id: 20080825215936.GA2658@hyperion.scode.org
In response to: Re: Implementing cost limit/delays for insert/delete/update/select (Gregory Stark <stark@enterprisedb.com>)
List: pgsql-hackers
Btw, I forgot to mention in my original post one interesting use case
that is not application specific: applying cost limits to pg_dump. That
would be one of the final goals for me.

> I think the experience with vacuum was that cost_delay was a mistake. The only
> parameter users really ought to be messing with is cost_limit. Every time a
> user has posted about vacuum taking interminably long it was because they set
> *_cost_delay to something unreasonable. I suppose this could be selection bias
> since we would never hear about users who didn't set it unreasonably high.
>
> But I think we should consider removing the {auto,}vacuum_cost_delay parameter
> or at least hiding and undocumenting it. It's a foot-gun and serves no useful
> purpose that merely lowering the {auto,}vacuum_cost_limit can't serve equally
> well.

Sounds sensible to me. I included nice_cost_delay in this case to
remain consistent with the others.
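
For what it's worth, the interchangeability is easy to see from the
arithmetic: the backend naps for roughly cost_delay milliseconds each
time it has accumulated cost_limit cost units, so the sustained rate is
at most about cost_limit / cost_delay units per millisecond of wall
clock time. A rough illustration (the numbers are just the stock
defaults, not recommendations):

    #include <stdio.h>

    int
    main(void)
    {
        double cost_limit = 200.0;  /* vacuum_cost_limit default */
        double cost_delay = 20.0;   /* autovacuum_vacuum_cost_delay default, ms */

        /* Baseline: ~10 cost units per millisecond of wall clock time. */
        printf("rate = %.1f units/ms\n", cost_limit / cost_delay);

        /* Halving the limit throttles as much as doubling the delay. */
        printf("rate = %.1f units/ms\n", (cost_limit / 2.0) / cost_delay);
        printf("rate = %.1f units/ms\n", cost_limit / (cost_delay * 2.0));

        return 0;
    }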

> >    DELETE NICELY FROM large_table WHERE id < 50000000
>
> Why not just have the GUC and leave it at that?
>
> SET nice_cost_limit = ...
> DELETE FROM ...
> SET nice_cost_limit = ...
> UPDATE ...
> ...

Sounds a lot cleaner than introducing new syntax, yes.

Going with the GUC only does mean the submitter must choose a value and
cannot simply indicate "whichever value the administrator chose to be
sensible". Perhaps a separate boolean flag could allow just turning
cost limiting on, without specifying actual limits?
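
Roughly what I mean, as a sketch (all of these names are hypothetical;
nothing like them exists today): a boolean GUC to enable throttling,
with the per-session limit falling back to whatever the administrator
configured when the submitter does not pick a value.

    #include <stdbool.h>

    /* Hypothetical sketch only -- none of these symbols exist today. */
    static bool nice_enabled = false;          /* boolean GUC: throttle at all? */
    static int  nice_cost_limit = 0;           /* per-session limit; 0 = use default */
    static int  default_nice_cost_limit = 200; /* administrator's choice */

    static int
    effective_nice_cost_limit(void)
    {
        if (!nice_enabled)
            return 0;                       /* throttling disabled */
        if (nice_cost_limit > 0)
            return nice_cost_limit;         /* submitter picked a value */
        return default_nice_cost_limit;     /* "whichever the admin chose" */
    }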

> I think we would be better off with something like a vacuum_io_bandwidth_cap
> or something like that. Then the user has a hope of understanding what kind of
> numbers make sense.

Another option might be to give page_miss and friends an actual
meaningful unit, such as the expected worst-case "device time" required
to perform the I/O operation (tweaked on a per-device basis). One could
then specify the maximum vacuum cost as a percentage of real time spent
on vacuum-related I/O.
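
A sketch of the accounting I have in mind (all names hypothetical, and
using plain gettimeofday()/usleep() here just to keep the example
self-contained; the backend would of course use its own primitives):

    #include <sys/time.h>
    #include <unistd.h>

    static double io_msec_charged = 0.0;  /* estimated device time used so far */
    static struct timeval work_start;     /* set with gettimeofday() when the
                                           * throttled statement starts */
    static int nice_io_percent = 10;      /* hypothetical GUC: max % of real time */

    static double
    elapsed_msec(void)
    {
        struct timeval now;

        gettimeofday(&now, NULL);
        return (now.tv_sec - work_start.tv_sec) * 1000.0 +
               (now.tv_usec - work_start.tv_usec) / 1000.0;
    }

    /* Charge one I/O, e.g. a page miss estimated at 8 ms on this device. */
    static void
    charge_io(double device_msec)
    {
        io_msec_charged += device_msec;

        /* Over budget?  Sleep until the percentage drops back under the cap. */
        if (io_msec_charged > elapsed_msec() * nice_io_percent / 100.0)
        {
            double target = io_msec_charged * 100.0 / nice_io_percent;

            usleep((useconds_t) ((target - elapsed_msec()) * 1000.0));
        }
    }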

> ExecutePlan? That's not often enough. You can easily construct plans that do
> massive sequential scans on the inner side of a join or in a subquery -- all
> of which happens before a single record is returned from ExecutePlan.
> You would have to test for whether it's time to sleep much more often.
> Possibly before every ExecProcNode call would be enough.
>
> Even then you have to worry about the i/o and cpu resources used by
> tuplesort. And there are degenerate cases where a single ExecProcNode could do
> a lot of i/o such as a large scan looking for a single matching record.

Ok - I obviously need to look at these parts more carefully. Thanks
for the feedback!
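
To make sure I understand the suggestion: something along the lines of
a per-statement analogue of vacuum_delay_point(), called at the top of
ExecProcNode() (and from tuplesort and the index AM scan loops) rather
than only from ExecutePlan()? A rough sketch, with all the Nice* names
being hypothetical:

    #include "postgres.h"
    #include "miscadmin.h"

    /* Hypothetical GUC-backed state, maintained by the buffer manager
     * the same way VacuumCostBalance is today. */
    extern bool NiceCostActive;
    extern int  NiceCostBalance;
    extern int  NiceCostLimit;
    extern int  NiceCostDelay;      /* milliseconds */

    /*
     * nice_delay_point() -- sketch, modeled on vacuum_delay_point().
     * Calling this from ExecProcNode() would mean that even plans doing
     * large amounts of work before returning their first tuple (sorts,
     * inner-side seqscans, ...) get throttled.
     */
    void
    nice_delay_point(void)
    {
        CHECK_FOR_INTERRUPTS();

        if (NiceCostActive && NiceCostBalance >= NiceCostLimit)
        {
            int     msec = NiceCostDelay * NiceCostBalance / NiceCostLimit;

            pg_usleep(msec * 1000L);
            NiceCostBalance = 0;

            /* We may have received an interrupt while asleep. */
            CHECK_FOR_INTERRUPTS();
        }
    }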

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org

