Re: RFC: Async query processing - Mailing list pgsql-hackers

From:           Claudio Freire
Subject:        Re: RFC: Async query processing
Msg-id:         CAGTBQpZ7sRObEhTZ6ouXkA=ZgZDOfXoK7Vimy1MA_PMoVmDN8Q@mail.gmail.com
In response to: Re: RFC: Async query processing (Florian Weimer <fweimer@redhat.com>)
List:           pgsql-hackers
On Wed, Dec 18, 2013 at 1:50 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/04/2013 02:51 AM, Claudio Freire wrote:
>> On Sun, Nov 3, 2013 at 3:58 PM, Florian Weimer <fweimer@redhat.com> wrote:
>>> I would like to add truly asynchronous query processing to libpq,
>>> enabling command pipelining. The idea is to allow applications to
>>> auto-tune to the bandwidth-delay product and reduce the number of
>>> context switches when running against a local server.
>> ...
>>> If the application is not interested in intermediate query results,
>>> it would use something like this:
>> ...
>>> If there is no need to exit from the loop early (say, because errors
>>> are expected to be extremely rare), the PQgetResultNoWait call can
>>> be left out.
>>
>> It doesn't seem wise to me to make such a distinction. It sounds like
>> you're oversimplifying, and that's why you need "modes": to overcome
>> the evidently restrictive limits of the simplified interface. It
>> would only be a matter of (a short) time before some other limitation
>> requires some other mode.
>
> I need modes because I want to avoid unbounded buffering, which means
> that result data has to be consumed in the order queries are issued.
...
> In any case, I don't want to change the wire protocol; I just want to
> enable libpq clients to use more of its capabilities.

I believe you will at least need TCP_CORK or some advanced socket
options if you intend to decrease the number of packets without
changing the protocol. Due to the interactive, synchronized nature of
the protocol, TCP will immediately send the first query in its own
packet, since the connection is idle and ready to do so. Buffering
will only happen from the second query onwards, and this won't benefit
a two-query loop like the one in your sample.

As for expectations, they can be part of the connection object rather
than the wire protocol, if you wish.
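To make the "corking" idea concrete, here is a minimal sketch, in the same Python used for the loopback demo below. `CorkedSender` is a name invented here, not a real API: instead of relying on the platform's TCP_CORK (Linux-only), the application buffers outgoing messages itself and hands the kernel one contiguous payload, so even the first query of a burst can share a packet with the ones that follow.

```python
import socket

class CorkedSender:
    """Batch several writes into a single send() call.

    Hypothetical sketch of application-side corking: queue() only
    accumulates bytes in userspace; flush() hands TCP the whole burst
    at once, so the kernel can emit it as few packets as possible.
    """

    def __init__(self, sock):
        self.sock = sock
        self.buf = bytearray()

    def queue(self, message: bytes):
        # Just accumulate; nothing touches the network yet.
        self.buf += message

    def flush(self):
        # One sendall() gives TCP the entire burst in one go.
        if self.buf:
            self.sock.sendall(bytes(self.buf))
            self.buf.clear()
```

With two queued writes followed by one flush, the peer sees the data arrive as a single contiguous write, unlike the two back-to-back send() calls in the demo below.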
The point I was making is that the expectation should be part of the
query call, since that's less error-prone than setting a "discard
results" mode. Think of it as PQsendQueryParams with an extra "async"
argument that defaults to PQASYNC_NOT (ie, sync). There you can tell
libpq to expect no results, to expect and discard them, or whatever.

The benefit here is simplified usage: your example code would be part
of libpq itself, and all this complexity would be hidden from users.
Furthermore, libpq could do the small sanity check of verifying that
the server actually returns no results when no results are expected.

>>> PGAsyncMode oldMode = PQsetsendAsyncMode(conn, PQASYNC_RESULT);
>>> bool more_data;
>>> do {
>>>     more_data = ...;
>>>     if (more_data) {
>>>         int ret = PQsendQueryParams(conn,
>>>             "INSERT ... RETURNING ...", ...);
>>>         if (ret == 0) {
>>>             // handle low-level error
>>>         }
>>>     }
>>>     // Consume all pending results.
>>>     while (1) {
>>>         PGresult *res;
>>>         if (more_data) {
>>>             res = PQgetResultNoWait(conn);
>>>         } else {
>>>             res = PQgetResult(conn);
>>>         }
>>
>> Somehow, that code looks backwards. I mean, really backwards.
>> Wouldn't that be !more_data?
>
> No, if more data is available to transfer to the server, the no-wait
> variant has to be used to avoid a needless synchronization with the
> server.

Ok, yeah. Now I get it. It's client-side more_data.

>> In any case, pipelining like that, without a clear distinction in
>> the wire protocol of which results pertain to which query, could be
>> a recipe for trouble when subtle bugs, either in lib usage or
>> implementation, mistakenly treat one query's result as another's.
>
> We already use pipelining in libpq (see pqFlush, PQsendQueryGuts and
> pqParseInput3), the server is supposed to support it, and there is a
> lack of a clear tit-for-tat response mechanism anyway because of
> NOTIFY/LISTEN and the way certain errors are reported.
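To illustrate the per-query expectation idea, here is a small Python model. All names here (PQASYNC_*, Connection, handle_result) are invented for the sketch; nothing below is actual libpq API. The point it demonstrates is that when the expectation travels with each query, the library can match results to expectations in issue order and perform the sanity check itself.

```python
# Hypothetical per-query result expectations (names invented here).
PQASYNC_NOT = "sync"          # wait for and return results
PQASYNC_DISCARD = "discard"   # expect results, throw them away
PQASYNC_NONE = "none"         # expect no results; error if any arrive

class Connection:
    def __init__(self):
        self.pending = []     # (query, expectation), in issue order

    def send_query_params(self, query, params, async_mode=PQASYNC_NOT):
        # The expectation is recorded per call, not as connection state.
        self.pending.append((query, async_mode))

    def handle_result(self, result_rows):
        # Results arrive in the order queries were issued.
        query, mode = self.pending.pop(0)
        if mode == PQASYNC_NONE and result_rows:
            # The sanity check libpq could do on the user's behalf.
            raise RuntimeError("query %r promised no results" % query)
        if mode == PQASYNC_DISCARD:
            return None
        return result_rows
```

A mode set on the connection object, by contrast, silently applies to every query until someone remembers to reset it, which is exactly the error-proneness being argued against.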
pqFlush doesn't seem overly related, since the API specifically states
that you cannot queue multiple PQsendQuery calls. It looks more like
low-level buffering: ie, when the command itself is larger than the OS
buffer and nonblocking operation requires multiple send() calls for
one PQsendQuery. Am I wrong?

>>> Instead of buffering the results, we could buffer the encoded
>>> command messages in PQASYNC_RESULT mode. This means that
>>> PQsendQueryParams would not block when it cannot send the
>>> (complete) command message, but store it in the connection object
>>> so that the subsequent PQgetResultNoWait and PQgetResult would send
>>> it. This might work better with single-tuple result mode. We cannot
>>> avoid buffering either multiple queries or multiple responses if we
>>> want to utilize the link bandwidth, or we'd risk deadlocks.
>>
>> This is a non-solution. Such an implementation, at least as
>> described, would remove neither network latency nor context
>> switches; it would be a purely API change with no externally visible
>> behavior change.
>
> Ugh, why?

Oh, sorry. I had this elaborate answer prepared, but I just noticed
it's wrong: you do say "if it cannot send it right away". So yes, I
guess that's quite similar to the kind of buffering I was talking
about anyway.

Still, I'd suggest using TCP_CORK when expecting this kind of usage
pattern, or the first call in your example won't buffer at all. It's
essentially the TCP slow-start issue: unless you've got a great many
queries to pipeline, you won't see the benefit without careful use of
TCP_CORK. And since TCP_CORK is quite platform-dependent, I'd
recommend "corking" on the library side rather than trusting the
network stack.

>> An effective solution must include multi-command packets.
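The deferred-send scheme being discussed, where a query call buffers the encoded message if the socket would block and a later result-retrieval call flushes it, can be sketched roughly as follows. This is a simplified Python model with invented names (`PipelinedConn`, `send_query`, `get_result`), not libpq's actual implementation; it assumes the socket has been put in nonblocking mode by the caller.

```python
import socket

class PipelinedConn:
    """Rough model of buffering encoded command messages.

    If the kernel won't accept the whole message right away, the
    remainder stays buffered on the connection instead of blocking
    the caller; result retrieval doubles as a flush point.
    """

    def __init__(self, sock):
        self.sock = sock          # assumed nonblocking
        self.outbuf = bytearray()

    def send_query(self, encoded_msg: bytes):
        self.outbuf += encoded_msg
        self._try_flush()         # never blocks

    def _try_flush(self):
        # Send as much of the buffer as the kernel will take.
        try:
            n = self.sock.send(bytes(self.outbuf))
            del self.outbuf[:n]
        except BlockingIOError:
            pass                  # keep the rest; retried later

    def get_result(self, bufsize=65536):
        # Flushing here is what keeps the pipeline moving.
        self._try_flush()
        try:
            return self.sock.recv(bufsize)
        except BlockingIOError:
            return None           # nothing from the server yet
```

Note that this removes blocking from the client's perspective, but by itself does nothing about packet counts; that is the TCP_CORK point above.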
>> Without knowing the wire protocol in detail, something like:
>>
>>     PARSE: INSERT blah
>>     BIND: args
>>     EXECUTE with DISCARD
>>     PARSE: INSERT blah
>>     BIND: args
>>     EXECUTE with DISCARD
>>     PARSE: SELECT blah
>>     BIND: args
>>     EXECUTE with FETCH ALL
>>
>> all in one packet, would be efficient and error-free (IMO).
>
> No, because this doesn't scale automatically with the bandwidth-delay
> product. It also requires that the client buffer queries and their
> parameters even though the network has to do that anyway.

Why not? I'm talking about transport-level packets, btw, not libpq
frames/whatever.

Yes, the network stack will sometimes do that buffering. But it
doesn't have to; it does it only sometimes, which is not the same. And
buffering algorithms are quite platform-dependent anyway, so it's not
the best idea to make libpq highly reliant on them.

But yes, you would get the benefit for a large number of queries.
Launch a tcpdump and test it. Here is a simple test on the loopback
interface, with Python.

On the server:

>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.bind(('', 8000))
>>> s.listen(10)
>>> s2 = s.accept()[0]
>>> s2.recv(256)
'hola mundo\n'

On the client:

>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(('127.0.0.1', 8000))
>>> s.send('hola') ; s.send(' mundo\n')
4
7

Tcpdump output:

15:33:16.112991 IP localhost.49138 > localhost.irdmi: Flags [S], seq 2768629731, win 43690, options [mss 65495,sackOK,TS val 3152304 ecr 0,nop,wscale 7], length 0
15:33:16.113004 IP localhost.irdmi > localhost.49138: Flags [S.], seq 840184739, ack 2768629732, win 43690, options [mss 65495,sackOK,TS val 3152304 ecr 3152304,nop,wscale 7], length 0
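The multi-command batch above can be sketched with PostgreSQL v3-style framing: each message is a type byte followed by a big-endian int32 length that counts itself plus the payload. The message types and payloads below are simplified stand-ins, not real protocol encodings; the point is only that concatenating the frames lets a single send() carry the whole Parse/Bind/Execute sequence for every query in the burst.

```python
import struct

def frame(msg_type: bytes, payload: bytes) -> bytes:
    # v3-style frame: type byte + int32 length (self-inclusive) + payload.
    return msg_type + struct.pack("!I", 4 + len(payload)) + payload

def batch(commands):
    # One contiguous buffer for the entire burst: one send(), and the
    # kernel is free to pack it into as few packets as the MTU allows.
    return b"".join(frame(t, p) for t, p in commands)

wire = batch([
    (b"P", b"INSERT ..."),   # Parse   (payloads simplified)
    (b"B", b"args"),         # Bind
    (b"E", b""),             # Execute
    (b"P", b"SELECT ..."),
    (b"B", b"args"),
    (b"E", b""),
])
```

The receiver walks the buffer frame by frame using the embedded lengths, so no packet boundary needs to coincide with a message boundary.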
15:33:16.113016 IP localhost.49138 > localhost.irdmi: Flags [.], ack 1, win 342, options [nop,nop,TS val 3152304 ecr 3152304], length 0
15:34:32.843626 IP localhost.49138 > localhost.irdmi: Flags [P.], seq 1:5, ack 1, win 342, options [nop,nop,TS val 3229034 ecr 3152304], length 4    "hola"
15:34:32.843675 IP localhost.irdmi > localhost.49138: Flags [.], ack 5, win 342, options [nop,nop,TS val 3229035 ecr 3229034], length 0
15:34:32.843696 IP localhost.49138 > localhost.irdmi: Flags [P.], seq 5:12, ack 1, win 342, options [nop,nop,TS val 3229035 ecr 3229035], length 7    " mundo\n"
15:34:32.843701 IP localhost.irdmi > localhost.49138: Flags [.], ack 12, win 342, options [nop,nop,TS val 3229035 ecr 3229035], length 0

See how there are two data packets and two acks. On eth, it's the
same, except the server doesn't even get the whole "hola mundo" on the
first recv, but just the first "hola", because of network delay.

So, trusting the network stack to do the buffering won't work. For
steady streams of queries it will, but not for short bursts, which I
believe will be the most heavily used case (most apps create short
bursts of inserts, not continuous streams at full bandwidth).