From: Merlin Moncure
Subject: Re: RFC: Async query processing
Msg-id: CAHyXU0yZ6+uy1b+pw_tZg5TbBxq2S+nNPb1RRgBRY+LqCTFuew@mail.gmail.com
In response to: Re: RFC: Async query processing (Claudio Freire <klaussfreire@gmail.com>)
List: pgsql-hackers

On Fri, Jan 3, 2014 at 11:06 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Claudio Freire <klaussfreire@gmail.com> writes:
>>> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>>> Loading data into the database isn't such an uncommon task.  Not everything
>>>> is OLTP.
>>
>>> Truly, but a sustained insert stream of 10 Mbps is certainly way
>>> beyond common non-OLTP loads. This is far more specific than non-OLTP.
>>
>> I think Florian has a good point there, and the reason is this: what
>> you are talking about will be of exactly zero use to applications that
>> want to see the results of one query before launching the next.  Which
>> eliminates a whole lot of apps.  I suspect that almost the *only*
>> common use case in which a stream of queries can be launched without
>> feedback is going to be bulk data loading.  It's not clear at all
>> that pipelining the PQexec code path is the way to better performance
>> for that --- why not use COPY, instead?
>
> You're forgetting ORM workloads.
>
> ORMs can usually plan the inserts in a sequence that both requires
> no feedback (beyond knowing that they succeeded) and does not
> violate constraints.
>
> Flushing a whole object hierarchy, for instance, can be done without
> feedback. Not even serial columns need feedback, since many ORMs
> (SQLAlchemy, Hibernate) support allocating ID sequences in batches
> (by issuing an appropriate SELECT nextval).
>
> I agree that, with the proposed API, it's too error-prone to be
> useful. But I also think that, if the API is simple and fool-proof
> enough, it could be a case of "build it and they will come". I know
> I'll be happy to implement support for it in SQLAlchemy (since it
> will benefit me), if the API resembles the proposition below (at
> least in simplicity).
>
> Per-query expectations could be such a thing. And it can even work with PQexec:
>
> PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> ... 9 times...
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
> do {
>    // do something useful
> } while (PQflush());
>
> Here, the PQASYNC flag would temporarily switch to non-blocking I/O,
> and buffer what cannot be sent. PQASYNC_CORK would only buffer (only
> sending if the buffer is full). After any async call, PQflush would
> be necessary (to flush the send queue and to consume the expected
> responses), but I can imagine that any synchronous call (PQexec,
> PQsendQuery or whatever) could detect a non-empty buffer and simply
> do a blocking flush right there.
>
> This can benefit many useful patterns. ORM flush is one, provided
> IDs can be preallocated (which I know at least SQLAlchemy and
> Hibernate both support).
>
> Execute-many of prepared statements is another one, quite common.
>
> I'm not sure what would happen if one of the queries returned an
> error. If in a transaction, all the following queries would error
> out, I'd imagine. If not, they would simply be executed blindly...
> am I correct?
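
For concreteness: the corked-send pattern above can be approximated
with the libpq API that exists today, by batching several statements
into one PQsendQuery call and draining one PGresult per statement.
The sketch below is an illustration, not code from the thread; the
tables and the connection string are hypothetical.

#include <stdio.h>
#include <libpq-fe.h>

/* Send several semicolon-separated commands in one round trip.  With
 * the simple-query protocol, PQgetResult returns one result per
 * command; note that they run in a single implicit transaction, a
 * constraint the proposed PQASYNC_CORK would not impose. */
static int flush_batch(PGconn *conn, const char *batch_sql)
{
    PGresult *res;
    int rc = 0;

    if (!PQsendQuery(conn, batch_sql))
        return -1;

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            rc = -1;            /* one of the statements failed */
        PQclear(res);
    }
    return rc;
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");   /* hypothetical */

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    if (flush_batch(conn,
                    "INSERT INTO a VALUES (1);"
                    "INSERT INTO b VALUES (1);"
                    "INSERT INTO a VALUES (2);"
                    "INSERT INTO b VALUES (2);") != 0)
        fprintf(stderr, "batch failed: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}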

Long term, I'd rather see an optimized 'ORM flush' assemble the data
into a structured data set (perhaps a JSON document) and pass it to
some receiving routine that decomposes it into records.  This is a
better way to do things on so many levels.  Maybe I'm a cranky old
guy yelling at pigeons, but I don't think the current approach that
many ORMs take is going to withstand the test of time.
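
A minimal sketch of that "structured flush" idea (an illustration,
not code from the thread): the client ships the whole pending object
set as a single JSON array parameter and the server decomposes it
into rows with json_populate_recordset().  It assumes an open
PGconn *conn and a hypothetical table: CREATE TABLE a (id int, val text).

/* One round trip for the whole flush: the JSON document is bound as
 * one parameter and exploded into rows server-side. */
const char *json =
    "[{\"id\":1,\"val\":\"x\"},{\"id\":2,\"val\":\"y\"}]";
const char *sql =
    "INSERT INTO a (id, val) "
    "SELECT id, val FROM json_populate_recordset(NULL::a, $1::json)";
PGresult *res = PQexecParams(conn, sql, 1, NULL, &json,
                             NULL, NULL, 0);

if (PQresultStatus(res) != PGRES_COMMAND_OK)
    fprintf(stderr, "flush failed: %s", PQerrorMessage(conn));
PQclear(res);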

merlin


