From: Thomas Munro
Subject: Re: XPRS
Msg-id: CA+hUKGKx6LLK-bf6ayMvNaXSSw4Wt19O1qO-uzGqPUZPqsc-Lw@mail.gmail.com
In response to: Re: XPRS (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers

On Tue, Sep 3, 2019 at 5:20 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> FWIW it's not clear to me why the cost would need to be recomputed after
> constructing the parallel version of the plan? My understanding is that
> the idea is to do cost-based planning for the serial plan, and then just
> "mechanically" construct a parallel plan. Although, maybe there could be
> multiple parallel alternatives ...

Presumably you still need to choose between the serial and parallel
plans by comparing costs.  You lose some by adding exchange operators,
but you win some by dividing cardinality estimates.
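
To make the comparison concrete, here's a toy calculation (completely
made-up constants, nothing to do with the real cost model): each worker
sees only a fraction of the rows, but every row has to pay for the
exchange.

#include <stdio.h>

int main(void)
{
    double rows = 1e6;              /* estimated input rows */
    double cost_per_row = 0.01;     /* made-up per-row processing cost */
    double exchange_per_row = 0.1;  /* made-up cost of pushing a row
                                     * through an exchange (Gather) */
    int    workers = 4;

    double serial_cost = rows * cost_per_row;

    /* each worker handles rows/workers, then every row pays the exchange */
    double parallel_cost = (rows / workers) * cost_per_row
                           + rows * exchange_per_row;

    printf("serial   = %.0f\n", serial_cost);    /* 10000 */
    printf("parallel = %.0f\n", parallel_cost);  /* 102500 */

    return 0;
}

Turn the exchange cost down far enough and the parallel side wins
instead, which is exactly why the two costs still have to be compared.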

> >This is sort of fudging together of ideas from conversations with
> >Kevin Grittner (who talked about admission control a few years back),
> >Peter Geoghegan (who mentioned opportunistically asking for more), and
> >things I've heard of on SQL Server ("memory grants").  I think it
> >would provide some relief from the problems we see today: it's hard to
> >set work_mem so that you never get OOM but you can still use a decent
> >amount of your precious memory, especially with mixed parallel and
> >non-parallel query workloads thanks to our current
> >work_mem-multiplying design.
>
> I think this is probably the simplest and most realistic first step.
>
> Whenever I was thinking about memory acquisition, I've assumed we'd
> monitor how much memory the plan is expected to use while we're
> constructing it. My main problem was what to do when we reach the
> per-query limit - whether to (a) simply reject the plan, (b) go back and
> see if we can replan with lower work_mem (but how much and for which
> nodes?), or (c) just continue.

Yeah, it's all quite tricky and circular.  But I'm pretty sure that we
need caps at execution time anyway, so I think it's OK to start at
that end of the problem and then later try to improve the planner side
of it.
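
To sketch what I mean by an execution-time cap (hypothetical names,
nothing like this exists today): a memory-hungry node would ask a
per-session accounting object before growing, and spill when the
answer is no.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical structure and function, purely illustrative. */
typedef struct SessionMemQuota
{
    size_t limit;               /* execution-time cap for this session */
    size_t used;
} SessionMemQuota;

static bool
quota_try_grow(SessionMemQuota *quota, size_t bytes)
{
    if (quota->used + bytes > quota->limit)
        return false;           /* caller spills (or waits) instead */
    quota->used += bytes;
    return true;
}

int main(void)
{
    SessionMemQuota quota = { .limit = 64UL * 1024 * 1024, .used = 0 };
    size_t batch = 16UL * 1024 * 1024;

    for (int i = 0; i < 6; i++)
    {
        if (quota_try_grow(&quota, batch))
            printf("batch %d kept in memory, used = %zu\n", i, quota.used);
        else
            printf("batch %d spilled: cap reached\n", i);
    }
    return 0;
}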

> The proposed plan deals with this by not limiting the per-query (or rather
> per-session) budget directly, and instead requesting additional
> budget. Which is nice.
>
> I suspect we should also keep an additional plan that is expected to meet
> the session_work_mem limit, aside from the regular cheapest plan, and use
> it if it's not much worse. Imagine you have a plan with cost 1000 that
> needs (global_work_mem/2 + 1kB) memory, essentially serializing executions
> of this query. And then there's an alternative plan with cost 1100 that
> can run with session_work_mem. It seems better to just accept the second
> plan, because it won't need to wait.

Hmm.  I wonder if it's worth it.  You could also just replan as you
said, but I'm wondering if just rejecting the query would be OK.

> Another challenge with work_mem is that anyone can modify it arbitrarily,
> i.e. a user can do
>
>   SET work_mem = '1TB';
>
> and use as much memory as they wish, or even crash the system. I wonder if
> we could define the new GUCs (session_work_mem and global_work_mem) in a
> way to prevent this. We probably don't want to make them PGC_POSTMASTER
> (it seems useful to allow overriding them in ALTER USER/DATABASE), but I
> don't think we have a good way to do that at the moment. Any ideas in this
> direction?

How about giving the superuser something like the following GUCs:

global_work_mem = 16GB
session_min_work_mem = 0.5%  -- quota each session keeps, for fast small queries
session_max_work_mem = 20%   -- the maximum quota any one session is allowed
session_extra_work_mem = 5%  -- opportunistic execution-time boost

Users are free to plan queries with work_mem = 1TB, but if the
resulting plan is estimated to need 512GB, it will be rejected at
execution time because it exceeds session_max_work_mem, with a hint
telling you to turn down work_mem.  Otherwise it either runs, or joins
the queue if it can't get the quota it needs immediately.
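
In toy C, the admission decision I have in mind is roughly this
(hypothetical names; the percentages above would be resolved against
global_work_mem first):

#include <stdio.h>

typedef enum { ADMIT_RUN, ADMIT_QUEUE, ADMIT_REJECT } AdmitResult;

/*
 * estimated:   memory the plan is estimated to need
 * session_max: session_max_work_mem resolved to bytes
 * global_free: global quota not currently claimed by other sessions
 */
static AdmitResult
admit_query(double estimated, double session_max, double global_free)
{
    if (estimated > session_max)
        return ADMIT_REJECT;    /* error + HINT: turn down work_mem */
    if (estimated > global_free)
        return ADMIT_QUEUE;     /* wait for others to release quota */
    return ADMIT_RUN;
}

int main(void)
{
    const char *names[] = { "run", "queue", "reject" };
    double GB = 1024.0 * 1024 * 1024;
    double global_work_mem = 16 * GB;
    double session_max = 0.20 * global_work_mem;   /* 3.2GB */
    double global_free = 5 * GB;

    printf("512GB plan: %s\n",
           names[admit_query(512 * GB, session_max, global_free)]);
    printf("2GB plan:   %s\n",
           names[admit_query(2 * GB, session_max, global_free)]);
    /* pretend other sessions currently hold most of the quota */
    printf("2GB plan, 1GB free: %s\n",
           names[admit_query(2 * GB, session_max, 1 * GB)]);
    return 0;
}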

Eventually we could try to figure out how to make work_mem automatic
(I don't want to propose a concrete rule, but maybe something based on
session_max_work_mem / njoins, with various fudge factors and some
accounting for parallel workers; it's probably good to low-ball it and
rely on session_extra_work_mem).
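
For what it's worth, the shape of rule I'm imagining is something like
this (the 0.75 fudge factor and the worker handling are just guesses
for illustration, not a concrete proposal):

#include <stdio.h>

/*
 * Sketch of an "automatic" per-node work_mem: divide the session quota
 * among the joins, counting each parallel worker as an extra copy, and
 * low-ball it so session_extra_work_mem can absorb underestimates.
 */
static double
auto_work_mem(double session_max_work_mem, int njoins, int nworkers)
{
    int consumers = njoins * (nworkers + 1);    /* leader + workers */

    if (consumers < 1)
        consumers = 1;
    return 0.75 * session_max_work_mem / consumers;
}

int main(void)
{
    double MB = 1024.0 * 1024;
    double session_max = 3276.8 * MB;   /* 20% of 16GB */

    /* e.g. 3 joins, 4 workers -> ~164MB per node copy */
    printf("%.0fMB\n", auto_work_mem(session_max, 3, 4) / MB);
    return 0;
}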

Yeah, I think you'd want to be able to set session_XXX on databases
and roles so that you can say your regular users can't eat more than
10% of memory each, but a big reporting thing is allowed more.

-- 
Thomas Munro
https://enterprisedb.com


