Re: pg_background (and more parallelism infrastructure patches) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: pg_background (and more parallelism infrastructure patches)
Date
Msg-id 20140728175051.GQ17793@alap3.anarazel.de
In response to Re: pg_background (and more parallelism infrastructure patches)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_background (and more parallelism infrastructure patches)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2014-07-26 12:20:34 -0400, Robert Haas wrote:
> On Sat, Jul 26, 2014 at 4:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-07-25 14:11:32 -0400, Robert Haas wrote:
> >> Attached is a contrib module that lets you launch arbitrary command in
> >> a background worker, and supporting infrastructure patches for core.
> >
> > Cool.
> >
> > I assume this 'fell out' of the work towards parallelism? Do you think
> > all of the patches (except the contrib one) are required for that or is
> > some, e.g. 3), only required to demonstrate the others?
>
> I'm fairly sure that patches 3, 4, and 5 are all required in some form
> as building blocks for parallelism.  Patch 1 contains two functions,
> one of which (shm_mq_set_handle) I think is generally useful for
> people using background workers, but not absolutely required; and one
> of which is infrastructure for patch 3 which might not be necessary
> with different design choices (shm_mq_sendv).  Patch 2 is only
> included because pg_background can benefit from it; we could instead
> use an eoxact callback, at the expense of doing cleanup at
> end-of-transaction rather than end-of-query.  But it's a mighty small
> patch and seems like a reasonable extension to the API, so I lean
> toward including it.

Don't get me wrong, I don't object to anything in here. It's just that
the bigger picture can help in giving sensible feedback.

> >> Patch 3 adds the ability for a backend to request that the protocol
> >> messages it would normally send to the frontend get redirected to a
> >> shm_mq.  I did this by adding a couple of hook functions.  The best
> >> design is definitely arguable here, so if you'd like to bikeshed, this
> >> is probably the patch to look at.
> >
> > Uh. This doesn't sound particularly nice. Shouldn't this rather be
> > clearly layered by making reading/writing from the client a proper API
> > instead of adding hook functions here and there?
>
> I don't know exactly what you have in mind here.  There is an API for
> writing to the client that is used throughout the backend, but right
> now "the client" always has to be a socket.  Hooking a couple of parts
> of that API lets us write someplace else instead.  If you've got
> another idea how to do this, suggest away...

What I'm thinking of is providing an actual API for the writes instead
of hooking into the socket API in a couple places. I.e. have something
like

typedef struct DestIO DestIO;

struct DestIO
{
    void (*flush)(struct DestIO *io);
    int (*putbytes)(struct DestIO *io, const char *s, size_t len);
    int (*getbytes)(struct DestIO *io, const char *s, size_t len);
    ...
}

and do everything through it. I haven't thought much about the specific
API we want, but abstracting the communication properly instead of
adding hooks here and there is imo much more likely to succeed in the
long term.
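
To make the idea above a bit more concrete, here is a minimal sketch of how callers could go through such a callback table. It is only an illustration under stated assumptions, not code from any of the patches: MemDestIO, mem_destio_init and send_message are hypothetical names, the in-memory buffer merely stands in for a socket or a shm_mq, and getbytes takes a non-const buffer here because it has to write into it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct DestIO DestIO;

/* Callback table; each destination (socket, shm_mq, ...) would fill it in. */
struct DestIO
{
    void (*flush)(DestIO *io);
    int  (*putbytes)(DestIO *io, const char *s, size_t len);
    int  (*getbytes)(DestIO *io, char *s, size_t len);   /* assumption: non-const */
};

/* Hypothetical in-memory destination standing in for a socket or shm_mq. */
typedef struct MemDestIO
{
    DestIO  base;       /* must be first so a DestIO * can be cast back */
    char    buf[256];
    size_t  wpos;       /* next write offset */
    size_t  rpos;       /* next read offset */
    int     nflushes;   /* number of flush calls, for demonstration only */
} MemDestIO;

static void
mem_flush(DestIO *io)
{
    ((MemDestIO *) io)->nflushes++;
}

/* Append len bytes; returns 0 on success, -1 if the buffer would overflow. */
static int
mem_putbytes(DestIO *io, const char *s, size_t len)
{
    MemDestIO *m = (MemDestIO *) io;

    if (len > sizeof(m->buf) - m->wpos)
        return -1;
    memcpy(m->buf + m->wpos, s, len);
    m->wpos += len;
    return 0;
}

/* Consume len bytes; returns 0 on success, -1 if fewer bytes are buffered. */
static int
mem_getbytes(DestIO *io, char *s, size_t len)
{
    MemDestIO *m = (MemDestIO *) io;

    if (len > m->wpos - m->rpos)
        return -1;
    memcpy(s, m->buf + m->rpos, len);
    m->rpos += len;
    return 0;
}

void
mem_destio_init(MemDestIO *m)
{
    memset(m, 0, sizeof(*m));
    m->base.flush = mem_flush;
    m->base.putbytes = mem_putbytes;
    m->base.getbytes = mem_getbytes;
}

/* Caller side: sees only DestIO and never learns what is behind it. */
int
send_message(DestIO *io, char type, const char *payload, size_t len)
{
    assert(io != NULL);
    if (io->putbytes(io, &type, 1) != 0)
        return -1;
    if (io->putbytes(io, payload, len) != 0)
        return -1;
    io->flush(io);
    return 0;
}
```

With something along these lines, redirecting output would mean handing send_message() a different DestIO rather than hooking individual socket routines, and a two-way destination would just supply a working getbytes as well.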

> > Also, you seem to have only touched receiving from the client, and not
> > sending back to the subprocess. Is that actually sufficient? I'd expect
> > that for this facility to be fully useful it'd have to be two way
> > communication. But perhaps I'm overestimating what it could be used for.
>
> Well, the basic shm_mq infrastructure can be used to send any kind of
> messages you want between any pair of processes that care to establish
> them.  But in general I expect that data is going to flow mostly in
> one direction - the user backend will launch workers and give them an
> initial set of instructions, and then results will stream back from
> the workers to the user backend.  Other messaging topologies are
> certainly possible, and probably useful for something, but I don't
> really know exactly what those things will be yet, and I'm not sure
> the FEBE protocol will be the right tool for the job anyway.

It's imo not particularly unreasonable to e.g. COPY to/from a bgworker. Which
would require the ability to both read/write from the other side.

> But
> error propagation, which is the main thrust of this, seems like a need
> that will likely be pretty well ubiquitous.

Agreed.

> >> This patch also adds a function to
> >> help you parse an ErrorResponse or NoticeResponse and re-throw the
> >> error or notice in the originating backend.  Obviously, parallelism is
> >> going to need this kind of functionality, but I suspect a variety of
> >> other applications people may develop using background workers may
> >> want it too; and it's certainly important for pg_background itself.
> >
> > I would have had use for it previously.
>
> Cool.  I know Petr was interested as well (possibly for the same project?).

Well, I was aware of Petr's project, but I also have my own pet project
I'd been playing with :). Nothing real.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


