Re: pg_background (and more parallelism infrastructure patches) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: pg_background (and more parallelism infrastructure patches)
Msg-id CA+Tgmobr9u15pE9d6gTD0DLwnjLwSDyW72PeaKc9-kROB0=Duw@mail.gmail.com
In response to Re: pg_background (and more parallelism infrastructure patches)  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: pg_background (and more parallelism infrastructure patches)  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Mon, Nov 10, 2014 at 1:29 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> That's an issue to which we may need to engineer some solution, but
>> not in this patch.  Overall, the patch's architecture is modeled
>> closely on the way we synchronize GUCs to new backends when using
>> EXEC_BACKEND, and I don't think we're going to do any better than
>> starting with that and working to file down the rough edges as we go.  So I'd
>> like to go ahead and commit this.
>
> Did you check out whether PGC_BACKEND GUCs work? Other than that I agree
> that we can just solve further issues as we notice them. This isn't
> particularly intrusive.

Is the scenario you're concerned about this one:

1. Start postmaster.
2. Start a user session.
3. Change a PGC_BACKEND GUC in postgresql.conf.
4. Reload.
5. Launch a background worker that uses this code.

It should end up with the same values as the active session, not the
current ones from postgresql.conf.  But we want to verify that's
the behavior we actually get.  Right?

>> -- It doesn't handle types without send/receive functions.  I thought
>> that was tolerable since the types without such functions seem like
>> mostly edge cases, but Andres doesn't like it.  Apparently, BDR
>> makes provisions to either ship the tuple as one big blob - if all
>> built-in types - or else use binary format where supported and text
>> format otherwise.  Since this issue is common to BDR, parallelism, and
>> pg_background, we might want to introduce some common infrastructure
>> for it, but it's not too clear to me right now what that should look
>> like.
>
> I think we definitely need to solve this - but I'm also not at all
> sure how.
>
> For something like pg_background it's pretty trivial to fall back to
> in/out when there's no send/recv. It's just a couple of lines, and the
> portal code has the necessary provisions. So solving it for
> pg_background itself is pretty trivial.

That's not really the interesting part of the problem, I think.  Yeah,
pg_background can be made to speak text format if we really care.  But
the real issue is that even with the binary format, converting a tuple
in on-disk format into a DataRow message so that the other side can
reconstruct the tuple and shove it into an executor slot (or wherever
it wants to shove it) seems like it might be expensive.  You might
have data on that; if it's not expensive, stop here.  If it is
expensive, what we really want to do is figure out some way to make it
safe to copy the tuple into the shared memory queue, or send it out
over the socket, and have the process on the other end use it without
having to revalidate the whole thing column-by-column.

> But, as you say, pg_background
> itself isn't a primary goal (although a nice demonstration, with some
> real world applications).  There's enough differences between the
> parallelism and the replication cases that I'm not entirely sure how
> much common ground there is. Namely replication has the additional
> concerns of version differences, trust (don't use blobs if you don't
> fully trust the other side), and differences in the allocated oids
> (especially relevant for arrays which, very annoyingly, embed the oid in
> the send/recv format).

Yeah.  It would be nice to use the same code for deciding what to do
in a particular case.  It seems like it ought to be possible to have
one rule for whether a tuple with a given set of data types is safe to
ship in the on-disk format.  For pg_background/parallelism, it's
enough if that function returns true; for BDR, there might be
additional criteria applied before deciding to do it that way.  But
they could use the same decision tree for part of it.

What would be even better is to find some way to MAKE IT SAFE to send
the undecoded tuple.  I'm not sure what that would look like.  Locking
the types against concurrent changes?  Marking dropped types as
dropped without actually dropping them, and garbage collecting them
later?  Providing safer ways to deconstruct tuples?  It might help to
start by enumerating what exactly can go wrong here.  As far as I can
see, the main risks for pg_background and parallelism are (1) that the
type might get concurrently dropped, leaving us unable to interpret
the received tuple and (2) that the type might get concurrently
modified in some way that leaves us confused about how to interpret
the tuple, as by having the type size change.  The chances of a
problem in practice seem remote, since you can't do either of those
things if a table column with that type exists, but I can't convince
myself that there's no way for the type to be modified under us in
such a way that two different backends have different ideas about
whether it exists or how wide it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


