From: Tomas Vondra
Subject: Re: XPRS
Msg-id: 20190903103105.gvijxkwl5z4rk22t@development
In response to: Re: XPRS (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers

On Tue, Sep 03, 2019 at 11:04:43AM +1200, Thomas Munro wrote:
>On Tue, Sep 3, 2019 at 5:20 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> FWIW it's not clear to me why the cost would need to be recomputed after
>> constructing the parallel version of the plan? My understanding is that
>> the idea is to do cost-based planning for the serial plan, and then just
>> "mechanically" construct a parallel plan. Although, maybe there could be
>> multiple parallel alternatives ...
>
>Presumably you still need to choose between the serial and parallel
>plans by comparing costs.  You lose some by adding exchange operators,
>but you win some by dividing cardinality estimates.
>

Oh, right. Silly me.
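
FWIW the trade-off is already visible in plain EXPLAIN output (big_table
being just a placeholder for any table large enough to get a parallel plan):

  SET max_parallel_workers_per_gather = 0;
  EXPLAIN SELECT count(*) FROM big_table;  -- serial plan, full row estimate
  SET max_parallel_workers_per_gather = 2;
  EXPLAIN SELECT count(*) FROM big_table;  -- Gather on top, per-worker row
                                           -- estimates divided accordingly

The Gather adds parallel_setup_cost / parallel_tuple_cost, while the scan
under it is charged for only a fraction of the rows.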

>> >This is sort of fudging together of ideas from conversations with
>> >Kevin Grittner (who talked about admission control a few years back),
>> >Peter Geoghegan (who mentioned opportunistically asking for more), and
>> >things I've heard of on SQL Server ("memory grants").  I think it
>> >would provide some relief from the problems we see today: it's hard to
>> >set work_mem so that you never get OOM but you can still use a decent
>> >amount of your precious memory, especially with mixed parallel and
>> >non-parallel query workloads thanks to our current
>> >work_mem-multiplying design.
>>
>> I think this is probably the simplest and most realistic first step.
>>
>> Whenever I was thinking about memory acquisition, I assumed we'd
>> monitor how much memory the plan is expected to use while we're
>> constructing it. My main problem was what to do when we reach the
>> per-query limit - whether to (a) simply reject the plan, (b) go back and
>> see if we can replan with lower work_mem (but how much and for which
>> nodes?), or (c) just continue.
>
>Yeah, it's all quite tricky and circular.  But I'm pretty sure that we
>need caps at execution time, anyway, so I think it's OK to start at
>that end of the problem and then later try to improve the way the
>planner works.
>

True.

>> The proposed plan deals with this by not limiting the per-query (or rather
>> per-session) budget directly, and instead requesting additional
>> budget. Which is nice.
>>
>> I suspect we should also keep an additional plan that is expected to meet
>> the session_work_mem limit, aside from the regular cheapest plan, and use
>> it if it's not much worse. Imagine you have a plan with cost 1000 that
>> needs (global_work_mem/2 + 1kB) memory, essentially serializing executions
>> of this query. And then there's an alternative plan with cost 1100 that
>> can run with session_work_mem. It seems better to just accept the second
>> plan, because it won't need to wait.
>
>Hmm.  I wonder if it's worth it.  You could also just replan as you
>said, but I'm wondering if just rejecting the query would be OK.
>

I think we should not reject queries unnecessarily when there's a workable
execution plan. Memory use is just another optimization criterion, and
erroring out right after planning is essentially saying "can't find a plan".
When there is a plan we could use, rejecting the query seems like a bad idea.

>> Another challenge with work_mem is that anyone can modify it arbitrarily,
>> i.e. a user can do
>>
>>   SET work_mem = '1TB';
>>
>> and use as much memory as they wish, or even crash the system. I wonder if
>> we could define the new GUCs (session_work_mem and global_work_mem) in a
>> way to prevent this. We probably don't want to make them PGC_POSTMASTER
>> (it seems useful to allow overriding them in ALTER USER/DATABASE), but I
>> don't think we have a good way to do that at the moment. Any ideas in this
>> direction?
>
>How about something giving the superuser the following GUCs:
>
>global_work_mem = 16GB
>session_min_work_mem = 0.5%  -- the amount of quota sessions keep, for
>fast small queries
>session_max_work_mem = 20% -- the maximum quota any one session is allowed
>session_extra_work_mem = 5% -- opportunistic execution-time boost
>
>Users are free to plan queries with work_mem = 1TB, and if you do that
>and it estimates that it wants 512GB, it will be rejected if you try
>to execute it because it exceeds session_max_work_mem, with a hint
>telling you to turn down work_mem.  Otherwise it either runs or joins
>the queue if it can't get the quota it needs immediately.
>

Seems reasonable, certainly for v1. I'd keep it as simple as possible.
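
Just to spell out what I imagine the knobs might look like (these GUCs
obviously don't exist yet, and whether percentages are the right unit is an
open question), the superuser would do something like:

  ALTER SYSTEM SET global_work_mem = '16GB';
  ALTER SYSTEM SET session_min_work_mem = '0.5%';
  ALTER SYSTEM SET session_max_work_mem = '20%';
  ALTER SYSTEM SET session_extra_work_mem = '5%';
  SELECT pg_reload_conf();

with regular users unable to override the limits directly.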

>Eventually we could try to figure out how to set work_mem to automatic
>(I don't want to propose a concrete rule, but maybe something based on
>session_max_work_mem / njoins, with various fudge factors, and some
>accounting for parallel workers; it's probably good to low-ball it and
>rely on session_extra_work_mem).
>

Hmm, so you'd tweak work_mem for individual queries? Not sure that's
something I'd do at this point - it may seem simple, but I think it's
actually way harder to get right.

For example, let's say you have two views that each get planned nicely, but
when you join them the plan suddenly gets much worse, because the derived
work_mem per node drops sharply. That's not great.
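
To put toy numbers on it - assuming, purely for illustration, that the
automatic value were the session quota divided by the number of
memory-hungry nodes, with a 4GB quota:

  SELECT pg_size_pretty((4 * 1024^3)::bigint / 2) AS view_alone,   -- 2048 MB
         pg_size_pretty((4 * 1024^3)::bigint / 8) AS after_join;   -- 512 MB

A hash that fit comfortably when the view was planned on its own may have to
spill once the join bumps the node count.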

Of course, if it's just optional behavior, and the current approach with an
explicit work_mem value remains the default, then this is not an issue.

Anyway, I'd focus on an MVP that does the bare minimum - simply enforcing a
session limit - and leave this for the future.

>Yeah, I think you'd want to be able to set session_XXX on databases
>and roles so that you can say your regular users can't eat more than
>10% of memory each, but a big reporting thing is allowed more.
>

Yeah, something like that.
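
E.g. something like this, with made-up role/database names and the
hypothetical session_max_work_mem from above:

  ALTER ROLE app_user      SET session_max_work_mem = '10%';
  ALTER ROLE reporting     SET session_max_work_mem = '50%';
  ALTER DATABASE analytics SET session_max_work_mem = '30%';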

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



