Re: Tips/advice for implementing integrated RESTful HTTP API - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Tips/advice for implementing integrated RESTful HTTP API
Date
Msg-id 5403EBF4.6010105@2ndquadrant.com
Whole thread Raw
In response to Tips/advice for implementing integrated RESTful HTTP API  (Dobes Vandermeer <dobesv@gmail.com>)
Responses Re: Tips/advice for implementing integrated RESTful HTTP API
List pgsql-hackers
On 08/31/2014 12:40 PM, Dobes Vandermeer wrote:
> 1. Connecting to multiple databases
> 
> The background workers can apparently only connect to a single database
> at a time, but I want to expose all the databases via the API. 

bgworkers are assigned a database at launch time (if SPI is enabled),
and this database may not change during the worker's lifetime, same as a
normal backend.

Sometimes frustrating, but that's how it is.

> I think I could use libpq to connect to PostgreSQL on localhost but this
> might have weird side-effects in terms of authentication, pid use, stuff
> like that.

If you're going to do that, why use a bgworker at all?

In general, what do you gain from trying to do this within the database
server its self, not as an app in front of the DB?

> I could probably manage a pool of dynamic workers (as of 9.4), one per
> user/database combination or something along those lines.  Even one per
> request?  Is there some kind of IPC system in place to help shuttle the
> requests and responses between dynamic workers?  Or do I need to come up
> with my own?

The dynamic shmem code apparently has some queuing functionality. I
haven't used it yet.

> It seems like PostgreSQL itself has a way to shuttle requests out to
> workers, is it possible to tap into that system instead?  Basically some
> way to send the requests to a PostgreSQL backend from the background worker?

It does?

It's not the SPI, that executes work directly within the bgworker,
making it behave like a normal backend for the purpose of query execution.

> Or perhaps I shouldn't do this as a worker but rather modify PostgreSQL
> itself and do it in a more integrated/destructive manner?

Or just write a front-end.

The problem you'd have attempting to modify PostgreSQL its self for this
is that connection dispatch occurs via the postmaster, which is a
single-threaded process that already needs to do a bit of work to keep
an eye on how things are running. You don't want it constantly busy
processing and dispatching millions of tiny HTTP requests. It can't just
hand a connection off to a back-end immediately after accepting it,
either; it'd have to read the HTTP headers to determine what database to
connect to. Then launch a new backend for the connection, which is
horribly inefficient when doing tiny short-lived connections. The
postmaster has no concept of a pool of backends (unfortunately, IMO) to
re-use.

I imagine (it's not something I've investigated, really) that you'd want
a connection accepter process that watched the listening http request
socket. It'd hand connections off to dispatcher processes that read the
message content to get the target DB and dispatch the request to a
worker backend for the appropriate user/db combo, then collect the
results and return them on the connection. Hopefully at this point
you're thinking "that sounds a lot like a connection pool"... because it
is. An awfully complicated one, probably, as you'd have to manage
everything using shared memory segments and latches.

In my view it's unwise to try to do this in the DB with PostgreSQL's
architecture. Hack PgBouncer or PgPool to do what you want. Or write a
server with Tomcat/Jetty using JAX-RS and PgJDBC and the built in
connection pool facilities - you won't *believe* how easy it is.

> 3. Parallelism
> 
> The regular PostgreSQL server can run many queries in parallel

Well, one PostgreSQL instance (postmaster) may have many backends, each
of which may run queries in series but not in parallel. Any given
process may only run one query at once.

> but it
> seems like if I am using SPI I could only run one query at a time - it's
> not an asynchronous API.

Correct.

> Any help, sage advice, tips, and suggestions how to move forward in
> these areas would be muchly appreciated!

Don't do it with bgworkers.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: David Johnston
Date:
Subject: Re: Built-in binning functions
Next
From: Fabrízio de Royes Mello
Date:
Subject: Re: COPY and heap_sync