Thread: [RFC] libpq extensions - followup
Hello everyone,

A while back I suggested that it would be useful if asynchronous
connections could support multiple queries "in flight", especially if
you're working on event driven applications (see topic "rfc - libpq
extensions"). I've since started work on a simple library, which I
call libpqx, that I think could potentially provide a nice alternative
to libpq. I was hoping the folks on the list could discuss whether such
a library would be useful and, if so, provide some feedback regarding
its design.

The library code is very much in its infancy, but I believe it provides
a reasonable foundation that can be easily extended. The library source
tarball can be found on my home page:

    http://www.arizmendi.org

Please keep in mind that the current code is only meant to demonstrate
the feasibility of its design - it's still full of "TODO"s, "assert"s
and "abort"s.

Cheers,
Iker

DESIGN CRITERIA:
===========

Here are some of the things I thought the library should do, and that
I tried to address:

1) Support for event driven and threaded programs.

2) Multiple "in flight" queries on asynchronous connections must be
   permitted. For event driven applications this is key.

3) Support connection pooling for event-driven and threaded programs.
   After getting started on (2), I realized this was also necessary.
   One of the reasons to write event driven servers is to support a
   larger number of clients than you otherwise would with a
   process-per-client or thread-per-client architecture. Without
   connection pooling an event driven architecture doesn't help - it
   only moves the problem to the DB server (which will spawn a process
   for each connection).

4) Easy to use interface. I used libpq as a starting model but made
   several changes and added two new abstractions - connection pool
   (PGXpool) and query (PGXquery). The latter is necessary to support
   (2), and it also provides a foundation for parameterized queries
   (eg, "DELETE FROM TableA WHERE x= ?"). It might also serve to hide
   the details of bytea encoding.

5) Thread safe. For event driven applications this isn't an issue
   (except on SMP machines - this needs to be addressed). In threaded
   programs, multiple threads can share a common connection pool
   (although they must each have their own PGXconn object).

IMPLEMENTATION:
===========

Most of the coding is pretty straightforward (at least I tried to make
it so), but there are three implementation details that should be noted:

1) Connection pooling and event driven servers. When implementing an
   event driven server on *NIX you typically have a poll/select event
   loop that listens for activity on a range of file descriptors.
   Without connection pooling, one establishes a connection to the
   backend and listens on its socket descriptor. However, in a pooled
   implementation a connection may not actually be backed by a socket
   until the pool connection count goes positive. In this case the
   client will have nothing to register in its event loop. To deal
   with this issue, the PGXconn object uses a pipe to the connection
   pool to receive availability notifications - an "eventable"
   descriptor is thus always available to give to clients.

2) Explicit state machine implementation. The PGXconn object relies on
   the use of explicit machine states to get all its work done
   (synchronous operations get their own special state). Each machine
   state implements a common set of methods (defined in a struct of
   function pointers) in a separate file, which keeps the code layout
   clean. At the moment I have implemented states that mimic the way
   libpq does its thing, but I think there's potential to cleanly
   provide additional functionality (non-blocking query writes, for
   instance) by simply adding new states.

   P.S. My implementation of a state machine is probably pretty naive
   since I only recently started using them aggressively - any
   suggestions here would be much appreciated.

3) Wrapping of libpq. During this first cut, I leveraged libpq as much
   as possible, but in the future it may be worthwhile for libpqx to
   wean itself off the libpq API and instead use more of its
   underlying code.
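
The pipe-based availability notification described under implementation detail (1) can be illustrated with a small sketch. This is not code from libpqx: the type `notify_pipe` and the functions `notify_pipe_open`, `notify_pipe_signal`, `notify_pipe_wait`, `notify_pipe_consume`, and `notify_pipe_close` are hypothetical names chosen for this example, and the one-byte token format is an assumption. Only the POSIX calls `pipe`, `poll`, `read`, `write`, and `close` are real.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the pool-to-connection notification channel. */
typedef struct {
    int read_fd;   /* handed to the client's poll/select event loop */
    int write_fd;  /* kept by the connection pool */
} notify_pipe;

/* Create the pipe. Returns 0 on success, -1 on failure. */
static int notify_pipe_open(notify_pipe *np)
{
    int fds[2];

    assert(np != NULL);
    if (pipe(fds) != 0)
        return -1;
    np->read_fd = fds[0];
    np->write_fd = fds[1];
    return 0;
}

/* Pool side: announce that a backend connection has become available. */
static int notify_pipe_signal(const notify_pipe *np)
{
    const char token = 'A';  /* assumed token; any single byte would do */

    return write(np->write_fd, &token, 1) == 1 ? 0 : -1;
}

/*
 * Client side: wait up to timeout_ms for a notification.
 * Returns 1 if the descriptor is readable, 0 on timeout, -1 on error.
 */
static int notify_pipe_wait(const notify_pipe *np, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = np->read_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Client side: consume one notification so the descriptor goes quiet again. */
static int notify_pipe_consume(const notify_pipe *np)
{
    char token;

    return read(np->read_fd, &token, 1) == 1 ? 0 : -1;
}

/* Release both ends of the pipe. */
static void notify_pipe_close(notify_pipe *np)
{
    close(np->read_fd);
    close(np->write_fd);
}
```

In an event loop, `read_fd` would sit in the same `pollfd` array as any other descriptor, so the client always has something to register even while the pool has no free backend socket to hand out.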

Iker Arizmendi <iker@research.att.com> writes:
> A while back I suggested that it would be useful if asynchronous
> connections could support multiple queries "in flight", especially if
> you're working on event driven applications (see topic "rfc - libpq
> extensions"). I've since started work on a simple library,

I must be missing something fundamental here. The backend doesn't
support multiple parallel queries, so how can you have "multiple
queries in flight" on the same connection?

regards, tom lane

Although multiple queries can't be executed concurrently, they can be
queued up by a connection object. For instance:

    PGXquery* q1 = PQXcreateQuery(...);
    PGXquery* q2 = PQXcreateQuery(...);
    ...
    PGXquery* qN = PQXcreateQuery(...);

    PQXexecute(conn1, q1);
    PQXexecute(conn1, q2);
    ...
    PQXexecute(conn1, qN);

    while (1)
    {
        /* event loop code */
    }

In this example, queries q1 through qN are "in flight" simultaneously
as far as the client is concerned, even though the connection conn1 is
really queueing them up. During execution of the event loop, the
connection object manages the job of executing each query sequentially
(which, with respect to the client, is an implementation detail).

Cheers,
Iker

On Sun, 26 Jan 2003 00:47:20 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Iker Arizmendi <iker@research.att.com> writes:
> > A while back I suggested that it would be useful if asynchronous
> > connections could support multiple queries "in flight", especially
> > if you're working on event driven applications (see topic "rfc -
> > libpq extensions"). I've since started work on a simple library,
>
> I must be missing something fundamental here. The backend doesn't
> support multiple parallel queries, so how can you have "multiple
> queries in flight" on the same connection?
>
> regards, tom lane
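
The queueing behaviour described in this message can be sketched without touching libpq at all. The following is only an illustration under stated assumptions: `sketch_query`, `sketch_conn`, `sketch_conn_init`, `sketch_execute`, and `sketch_on_result` are hypothetical names, not the libpqx API, and the fixed-size ring buffer is an arbitrary choice. The point it shows is that at most one query is ever handed to the backend, while the rest wait in FIFO order on the client side.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 8  /* arbitrary capacity chosen for this sketch */

/* Hypothetical query object: just the SQL text and a completion flag. */
typedef struct {
    const char *sql;
    int done;
} sketch_query;

/* Hypothetical connection object holding one active query plus a FIFO. */
typedef struct {
    sketch_query *pending[QUEUE_CAP];  /* ring buffer of waiting queries */
    size_t head;                       /* index of the oldest waiting query */
    size_t count;                      /* number of waiting queries */
    sketch_query *active;              /* the single query sent to the backend */
} sketch_conn;

static void sketch_conn_init(sketch_conn *c)
{
    c->head = 0;
    c->count = 0;
    c->active = NULL;
}

/*
 * Submit a query. If the connection is idle the query becomes active
 * immediately; otherwise it is appended to the queue.
 * Returns 0 on success, -1 if the queue is full.
 */
static int sketch_execute(sketch_conn *c, sketch_query *q)
{
    assert(c != NULL && q != NULL);
    q->done = 0;
    if (c->active == NULL) {
        c->active = q;
        return 0;
    }
    if (c->count == QUEUE_CAP)
        return -1;
    c->pending[(c->head + c->count) % QUEUE_CAP] = q;
    c->count++;
    return 0;
}

/*
 * Called by the event loop when the backend finishes the active query.
 * Marks it done, promotes the next waiting query (if any), and returns
 * the finished query, or NULL if nothing was active.
 */
static sketch_query *sketch_on_result(sketch_conn *c)
{
    sketch_query *finished = c->active;

    if (finished == NULL)
        return NULL;
    finished->done = 1;
    if (c->count > 0) {
        c->active = c->pending[c->head];
        c->head = (c->head + 1) % QUEUE_CAP;
        c->count--;
    } else {
        c->active = NULL;
    }
    return finished;
}
```

A real implementation would send the promoted query's SQL to the backend at the point where `active` changes; that step is omitted here to keep the sketch free of any database dependency.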

BTW, the source tarball contains an example program that shows how
multiple "in flight" queries are simulated. It also contains a
multi-threaded example (where each thread waits on the connection
pool).

Cheers,
Iker

--
Iker Arizmendi
AT&T Labs - Research
Speech and Image Processing Lab
e: iker@research.att.com
w: http://research.att.com
p: 973-360-8516

Iker Arizmendi <iker@research.att.com> writes:
> Although multiple queries can't be executed concurrently they can be
> queued up by a connection object.

Hmm. This seems to presume that every query will be executed as a
separate transaction. That's pretty limiting (unless you suppose all
the queries are read-only).

regards, tom lane

> Hmm. This seems to presume that every query will be executed as a
> separate transaction. That's pretty limiting (unless you suppose all
> the queries are read-only).

True, but only at this early stage. There are (at least) two ways to
provide transaction support:

1) Explicit transaction start/end methods. Eg.

    PQXbeginTx(pConn1);
    PQXexecute(pConn1, q1);
    PQXexecute(pConn1, q2);
    PQXexecute(pConn1, q3);
    PQXcommitTx(pConn1);

2) Or implicit transaction support. Eg.

    PGXquery* q1 = PQXcreateQuery("BEGIN... UPDATE...");
    PGXquery* q2 = PQXcreateQuery("UPDATE");
    PGXquery* q3 = PQXcreateQuery("INSERT");
    PGXquery* q4 = PQXcreateQuery("COMMIT");

In this latter case the PGXconn object would have to inspect the SQL
queries in search of transaction delimiters.

As far as uncommitted transactions go, the PGXconn object has a flag
that indicates whether it's currently in a transaction - this flag can
be checked (it currently isn't) upon transitioning to the closed state.
At that point, whatever policy governs uncommitted transactions can be
applied before returning the connection to the pool.

Cheers,
Iker

On Sun, 26 Jan 2003 13:38:02 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Iker Arizmendi <iker@research.att.com> writes:
> > Although multiple queries can't be executed concurrently they can
> > be queued up by a connection object.
>
> Hmm. This seems to presume that every query will be executed as a
> separate transaction. That's pretty limiting (unless you suppose all
> the queries are read-only).
>
> regards, tom lane
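
For the implicit approach in option (2), the connection object has to spot transaction delimiters in the SQL text and keep its in-transaction flag current. The sketch below is one naive way to do that, offered only as an illustration: `tx_marker`, `starts_with_keyword`, `classify_statement`, and `update_tx_flag` are hypothetical names, not part of libpqx. It looks only at the leading keyword of a string, so it deliberately ignores multi-statement strings beyond their first statement, SQL comments, and forms such as ROLLBACK TO SAVEPOINT; a real implementation would need a proper lexer. `strncasecmp` is the POSIX function from `<strings.h>`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* What a statement does to the transaction state (hypothetical enum). */
typedef enum { TX_NONE, TX_BEGIN, TX_END } tx_marker;

/*
 * Case-insensitive check that sql starts with kw and that kw is followed
 * by end of string, a semicolon, or whitespace (so "BEGINNING" is not
 * mistaken for "BEGIN").
 */
static int starts_with_keyword(const char *sql, const char *kw)
{
    size_t n = strlen(kw);
    unsigned char next;

    if (strncasecmp(sql, kw, n) != 0)
        return 0;
    next = (unsigned char) sql[n];
    return next == '\0' || next == ';' || isspace(next);
}

/* Classify a statement by its leading keyword only. */
static tx_marker classify_statement(const char *sql)
{
    assert(sql != NULL);
    while (isspace((unsigned char) *sql))
        sql++;
    if (starts_with_keyword(sql, "BEGIN") ||
        starts_with_keyword(sql, "START TRANSACTION"))
        return TX_BEGIN;
    if (starts_with_keyword(sql, "COMMIT") ||
        starts_with_keyword(sql, "END") ||
        starts_with_keyword(sql, "ROLLBACK") ||
        starts_with_keyword(sql, "ABORT"))
        return TX_END;
    return TX_NONE;
}

/* Return the new value of the in-transaction flag after running sql. */
static int update_tx_flag(int in_tx, const char *sql)
{
    switch (classify_statement(sql)) {
    case TX_BEGIN:
        return 1;
    case TX_END:
        return 0;
    default:
        return in_tx;
    }
}
```

With a flag maintained this way, the check on transition to the closed state reduces to testing whether the flag is still set before the connection goes back to the pool.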

BTW, the source tarball contains two example programs - an event-driven
version which demonstrates simulating multiple "in-flight" queries and
a multi-threaded version.

Cheers,
Iker

----- Original Message -----
From: "Iker Arizmendi" <iker@research.att.com>
Newsgroups: comp.databases.postgresql.general,comp.databases.postgresql.questions
Sent: Sunday, January 26, 2003 2:02 AM
Subject: Re: [RFC] libpq extensions - followup

> Although multiple queries can't be executed concurrently they can be
> queued up by a connection object. For instance:
>
>     PGXquery* q1 = PQXcreateQuery(...);
>     PGXquery* q2 = PQXcreateQuery(...);
>     ...
>     PGXquery* qN = PQXcreateQuery(...);
>
>     PQXexecute(conn1, q1);
>     PQXexecute(conn1, q2);
>     ...
>     PQXexecute(conn1, qN);
>
>     while (1)
>     {
>         /* event loop code */
>     }
>
> In this example, queries q1 through qN are "in flight" simultaneously as
> far as the client is concerned even though the connection conn1 is
> really queueing them up. During execution of the event loop, the
> connection object manages the job of executing each query
> sequentially (which, with respect to the client, is an implementation
> detail).
>
> Cheers,
> Iker
>
> On Sun, 26 Jan 2003 00:47:20 -0500
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Iker Arizmendi <iker@research.att.com> writes:
> > > A while back I suggested that it would be useful if asynchronous
> > > connections could support multiple queries "in flight", especially
> > > if you're working on event driven applications (see topic "rfc -
> > > libpq extensions"). I've since started work on a simple library,
> >
> > I must be missing something fundamental here. The backend doesn't
> > support multiple parallel queries, so how can you have "multiple
> > queries in flight" on the same connection?
> >
> > regards, tom lane