Re: RFC: Async query processing - Mailing list pgsql-hackers
From: Merlin Moncure
Subject: Re: RFC: Async query processing
Date:
Msg-id: CAHyXU0yZ6+uy1b+pw_tZg5TbBxq2S+nNPb1RRgBRY+LqCTFuew@mail.gmail.com
In response to: Re: RFC: Async query processing (Claudio Freire <klaussfreire@gmail.com>)
Responses: Re: RFC: Async query processing
List: pgsql-hackers
On Fri, Jan 3, 2014 at 11:06 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Claudio Freire <klaussfreire@gmail.com> writes:
>>> On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>>> Loading data into the database isn't such an uncommon task.  Not
>>>> everything is OLTP.
>>
>>> Truly, but a sustained insert stream of 10 Mbps is certainly way
>>> beyond common non-OLTP loads.  This is far more specific than non-OLTP.
>>
>> I think Florian has a good point there, and the reason is this: what
>> you are talking about will be of exactly zero use to applications that
>> want to see the results of one query before launching the next.  Which
>> eliminates a whole lot of apps.  I suspect that almost the *only*
>> common use case in which a stream of queries can be launched without
>> feedback is going to be bulk data loading.  It's not clear at all
>> that pipelining the PQexec code path is the way to better performance
>> for that --- why not use COPY, instead?
>
> You're forgetting ORM workloads.
>
> ORMs can usually plan the inserts in a sequence that both requires no
> feedback (beyond the knowledge that they were successful) and does not
> violate constraints.
>
> Flushing a whole object hierarchy, for instance, can be done without
> feedback.  Not even serial columns need feedback, since many ORMs
> (SQLAlchemy, Hibernate) support allocation of ID sequences in batches
> (by issuing a proper select nextval).
>
> I agree that, with the proposed API, it's too error prone to be
> useful.  But I also think that if the API is simple and fool-proof
> enough, it could be "build it and they will come".  I know I'll be
> happy to implement support for it in SQLAlchemy (since it will
> benefit me), if the API resembles the proposition below (at least in
> simplicity).
>
> Per-query expectations could be such a thing.  And it can even work
> with PQexec:
>
> PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);");
> --read--
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> ... 9 times...
> PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK);
> PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC);
> do {
>     // do something useful
> } while (PQflush(con));
>
> Here, the PQASYNC flag would temporarily switch to non-blocking I/O
> and buffer what cannot be sent.  PQASYNC_CORK would only buffer (only
> send if the buffer is full).  After any ASYNC call, PQflush would be
> necessary (to flush the send queue and to consume the expected
> responses), but I can imagine any synchronous call (PQexec,
> PQsendQuery or whatever) could detect a non-empty buffer and just
> blockingly flush right there.
>
> This can benefit many useful patterns.  ORM flush is one, if there
> can be preallocation of IDs (which I know at least SQLAlchemy and
> Hibernate both support).
>
> Execute-many of prepared statements is another one, quite common.
>
> I'm not sure what would happen if one of the queries returned an
> error.
> If in a transaction, all the following queries would error out, I'd
> imagine.  If not, they would simply be executed blindly... am I
> correct?

Long term, I'd rather see an optimized 'ORM flush' assemble the data
into a structured data set (perhaps a JSON document) and pass it to
some receiving routine that decomposes it into records.  This is a
better way to do things on so many levels.  Maybe I'm an old cranky
guy yelling at pigeons, but I don't think the current approach that
many ORMs take is going to withstand the test of time.
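To make that concrete, here is a minimal sketch (not a worked-out
design) of what the receiving side might look like through plain
libpq, assuming 9.3's json_populate_recordset(); the table "a" and
its columns are invented for illustration:

#include <stdio.h>
#include <libpq-fe.h>

/* Sketch: the whole batch travels as one JSON parameter, and a single
 * statement decomposes it into rows server-side. */
int flush_batch_as_json(PGconn *con, const char *json_batch)
{
    const char *sql =
        "INSERT INTO a (id, payload) "
        "SELECT id, payload "
        "FROM json_populate_recordset(NULL::a, $1::json)";
    const char *params[1];
    PGresult   *res;

    params[0] = json_batch;  /* e.g. [{"id":1,"payload":"x"}, ...] */
    res = PQexecParams(con, sql, 1, NULL, params, NULL, NULL, 0);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "batch insert failed: %s", PQerrorMessage(con));
        PQclear(res);
        return -1;
    }
    PQclear(res);
    return 0;
}

One round trip per batch, ordinary synchronous calls, and the ORM
still learns whether the whole flush succeeded.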
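And for the pure bulk-loading case Tom mentions upthread, COPY already
streams rows without per-statement feedback.  A rough sketch (error
handling abbreviated, same invented table):

#include <stdio.h>
#include <libpq-fe.h>

int copy_rows(PGconn *con)
{
    PGresult *res = PQexec(con, "COPY a (id, payload) FROM STDIN");

    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(con));
        PQclear(res);
        return -1;
    }
    PQclear(res);

    /* Stream rows (tab-separated, newline-terminated); nothing is
     * read back until the COPY is ended. */
    if (PQputCopyData(con, "1\tx\n", 4) != 1 ||
        PQputCopyData(con, "2\ty\n", 4) != 1)
        return -1;

    if (PQputCopyEnd(con, NULL) != 1)
        return -1;

    res = PQgetResult(con);  /* overall COPY result */
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(con));
        PQclear(res);
        return -1;
    }
    PQclear(res);
    return 0;
}

merlin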