Re: [HACKERS] parallelize queries containing initplans - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] parallelize queries containing initplans
Date
Msg-id CA+TgmoZLmoSF_8HVC6A2a2vE9efGm9BPDFfi7VFohHvuKGu4WQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] parallelize queries containing initplans  (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
Responses Re: [HACKERS] parallelize queries containing initplans  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Nov 16, 2017 at 5:23 AM, Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
> I've tested the above-mentioned scenario with this patch and it is
> working fine. Also, I've created a text column named 'vartext',
> inserted some random length texts(max length 100) and tweaked the
> above query as follows:
> select ten,count(*) from tenk1 a where a.ten in (select
> b.ten from tenk1 b where (select a.vartext from tenk1 c where c.ten =
> a.ten limit 1) = b.vartext limit 1) group by a.ten;
> This query is equivalent to select ten,count(*) from tenk1 group by
> a.ten. It also produced the expected result without throwing any
> error.

Great!  I have committed the patch; thanks for testing.

As I said in the commit message, there's a lot more work that could be
done here.  I think we should consider trying to revise this whole
system so that instead of serializing the values and passing them to
the workers, we allocate an array of slots where each slot has a Datum
flag, an isnull flag, and a dsa_pointer (which maybe could be union'd
to the Datum?).  If we're passing a parameter by value, we could just
store it in the Datum field; if it's null, we just set isnull.  If
it's being passed by reference, we dsa_allocate() space for it, copy
it into that space, and then store the dsa_pointer.

The advantage of this is that it would be possible to imagine the
contents of a slot changing while parallelism is running, which
doesn't really work with the current serialized-blob representation.
That would in turn allow us to imagine letting parallel-safe InitPlans
being evaluated by the first participant that needs the value rather
than before launching workers, which would be good, not only because
of the possibility of deferring work for InitPlans attached at or
above the Gather but also because it could be used for InitPlans below
the Gather (as long as they don't depend on any parameters computed
below the Gather).  I thought about trying to refactor this patch to
use a slot concept like what I just mentioned even before committing
it, but after poking at it a bit I decided that was going to require
too much new development to justify cramming it into the same patch.
Still, it seems like a good idea to do it at some point (when we have
lots of free time?).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Fabrízio de Royes Mello
Date:
Subject: Re: pgsql: Add hooks for session start and session end
Next
From: Antonin Houska
Date:
Subject: Re: [HACKERS] Proposal: generic WAL compression