Thread: about client-side cursors
Hi,

I'd like to discuss the concept of client-side cursor as defined in psycopg. This is a concept I've always been quite uncomfortable with; I'll try to explain why.

First, from a practical point of view:

    cur = connection.cursor()
    cur.execute(...)
    result = cur.fetchxxx()

This typical pattern seems awkward to me. Where do IO operations actually take place? (on execute()? on fetch()? on both?)

Then, and besides the terminology, the fact that named cursors are very different from client-side cursors while being constructed by the same API is confusing. I understand that this may come from the DB-API:

https://www.python.org/dev/peps/pep-0249/#cursor-objects

> [Cursor] objects represent a database cursor. [...]

But this does not seem quite right to me since a client-side cursor is actually not a PostgreSQL cursor. (The documentation on psycopg2 is clear on the difference, though.)

In asyncio context, while playing around with psycopg3, the situation is even worse because of the additional async/await keywords; for instance:

    conn = await psycopg3.AsyncConnection.connect()
    async with conn:
        cur = await conn.execute("SELECT * FROM test")
        r = await cur.fetchall()

Why do we need two 'await' (i.e. two IO operations) on conn.execute() and cur.fetchall() if 'cur' is a client-side cursor? (Back to my first point about where IO happens, this appears to depend on whether the connection has 'autocommit' or not...)

All in all, my main point is: (why) do we need client-side cursors?

Thanks,
Denis
> On Feb 3, 2021, at 08:15, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>
> All in all, my main point is: (why) do we need client-side cursors?

Well, the original source is:

https://www.python.org/dev/peps/pep-0249/

There are many arguments that can be made against this API, but I think we're a bit stuck at this point.

--
-- Christophe Pettus
   xof@thebuild.com
Am Wed, Feb 03, 2021 at 08:27:49AM -0800 schrieb Christophe Pettus:

> > All in all, my main point is: (why) do we need client-side cursors?
>
> Well, the original source is:
>
> https://www.python.org/dev/peps/pep-0249/
>
> There are many arguments that can be made against this API, but I think we're a bit stuck at this point.

Perhaps client-side cursors were invented to make it possible to code to the same API regardless of whether real (server-side) cursors are used or not?

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On Wed, 3 Feb 2021 at 17:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote:

> I'd like to discuss the concept of client-side cursor as defined
> in psycopg.

Welcome Denis!

> This is a concept I've always been quite uncomfortable with;
> I'll try to explain why.
>
> First, from a practical point of view:
>
>     cur = connection.cursor()
>     cur.execute(...)
>     result = cur.fetchxxx()
>
> This typical pattern seems awkward to me. Where do IO operations
> actually take place? (on execute()? on fetch()? on both?)

For a client-side cursor, it happens on execute, but for a server-side cursor it happens in both (execute creates the cursor; I have no idea as to when the server decides to hit the disks, but for the client every fetch is a roundtrip with the server).

> Then, and besides the terminology, the fact that named cursors are very
> different from client-side cursors while being constructed by the same
> API is confusing. I understand that this may come from the DB-API:
>
> https://www.python.org/dev/peps/pep-0249/#cursor-objects
>
> > [Cursor] objects represent a database cursor. [...]
>
> But this does not seem quite right to me since a client-side cursor is
> actually not a PostgreSQL cursor. (The documentation on psycopg2 is clear
> on the difference, though.)

Yes, that's correct: the name comes from the dbapi. The dbapi cursor does two different things indeed: it runs queries and manages results. I understand that the overlap is sort of weird, and everyone who has said "psycopg sucks: let me write my own driver" has used the mental model - much more familiar - of "I ask a query to a connection which gives me a result". So usually what is created is something which can be used as:

    result = conn.execute(query, vars)
    for record in result:
        ...
This is understandable, so much so that in psycopg3 I've added conn.execute(), which is implemented as:

    def execute(self, query, vars):
        cur = self.cursor()
        cur.execute(query, vars)
        return cur

and allows you to use the cursor only to iterate on the data returned, ignoring the execute*() methods.

> In asyncio context, while playing around with psycopg3, the situation is
> even worse because of the additional async/await keywords; for instance:
>
>     conn = await psycopg3.AsyncConnection.connect()
>     async with conn:
>         cur = await conn.execute("SELECT * FROM test")
>         r = await cur.fetchall()
>
> Why do we need two 'await' (i.e. two IO operations) on conn.execute()
> and cur.fetchall() if 'cur' is a client-side cursor? (Back to my first
> point about where IO happens, this appears to depend on whether the
> connection has 'autocommit' or not...)

I am going a bit back and forth on this mentally, although not in the code. The forces pulling in different directions are:

1) if a method may possibly do I/O activity, then it should be async. I don't know where I have read it or how strong this guideline is; however it makes sense: it's better to have a function to call with await where the await is actually not needed than a function which should have been async but now you are stuck with a blocking interface.

2) if we want to have sync and async cursor, it would be better if they had the same interface

Point 2 is pretty much true in psycopg2, because there is a single cursor class and every method is if'ed like:

    def fetchmany(self, nrecs):
        if not self.name:
            # client-side cursor
            res = self.result[self.current : self.current + nrecs]
            self.current += nrecs
            return res
        else:
            # server-side cursor
            result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}")
            return result

I don't think this is a good implementation, so in psycopg3 I would like to have a Cursor interface with a ClientCursor and a ServerCursor implementation, without the if for each method.
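[The conn.execute() shortcut described above can be sketched database-free. The following is a toy illustration, not psycopg itself: FakeConnection and FakeCursor are hypothetical stand-ins, with canned rows in place of a server round-trip.]

```python
# Hypothetical in-memory sketch (not the psycopg API) of the conn.execute()
# shortcut: the connection builds a cursor, runs the query on it, and hands
# the cursor back so it can be used purely as a result container.
class FakeCursor:
    def __init__(self, conn):
        self._conn = conn
        self._rows = []
        self._pos = 0

    def execute(self, query, vars=None):
        # Pretend this is the single round-trip to the server.
        self._rows = self._conn._run(query, vars)
        self._pos = 0

    def fetchall(self):
        # Client-side "fetch": just unpack the buffered result.
        rows, self._pos = self._rows[self._pos:], len(self._rows)
        return rows

    def __iter__(self):
        return iter(self.fetchall())


class FakeConnection:
    def _run(self, query, vars):
        return [(1, "one"), (2, "two")]  # canned result set

    def cursor(self):
        return FakeCursor(self)

    def execute(self, query, vars=None):
        # The psycopg3-style shortcut: cursor() + execute() in one call.
        cur = self.cursor()
        cur.execute(query, vars)
        return cur


conn = FakeConnection()
rows = conn.execute("SELECT * FROM test").fetchall()
print(rows)  # [(1, 'one'), (2, 'two')]
```

[The point of the sketch is only the shape of the API: the object returned by execute() is usable as a result, whether or not one calls it a "cursor".]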
In async mode, AsyncClientCursor.fetchmany() doesn't actually need to be async, but AsyncServerCursor.fetchmany() does I/O activity. However, would two different implementation classes demand two different interfaces?

> All in all, my main point is: (why) do we need client-side cursors?

I think client-side cursors are the bread and butter of querying the database. They are a lightweight and stateless way to receive a result: you ask a question, you receive an answer, and all is over. You can put caches between client and server, connection pools... Server-side cursors imply a state persisted across calls and shared between client and server, which is much more heavyweight, scales less for the server, and makes the identity of the connection precious.

If anything, I would do without the server-side cursor, but that would be abdicating a responsibility that the driver should handle IMO, so although I'm not a fan I will definitely implement them in psycopg3.

There are choices to be made, and it's one of those points for which I'd be happy to hear other experiences; I haven't asked around yet because I first wanted to implement a sketch of server-side cursors and then look at the 4 classes side to side (to side to side) and decide what to do. Let's call them:

- SyncClientCursor
- AsyncClientCursor
- SyncServerCursor
- AsyncServerCursor

The Server versions have I/O done on fetch*() and on close(). So the first three objects might have non-async fetch(). I was preferring to give them all a uniform interface, so the documentation would be "on async communication, always use async when fetching", as opposed to "on async use `await execute()`, no await on fetch() for client-side cursors, but `await fetch()` for server-side ones".
In practice, I am envisioning a SyncCursor object interface, with SyncClient and SyncServer being two different implementations, and an AsyncCursor interface, with AsyncClient and AsyncServer being two different implementations:

    @abstract
    class SyncCursor:
        def execute()
        def fetch()

    class SyncClientCursor(SyncCursor):
        def execute()
        def fetch()

    class SyncServerCursor(SyncCursor):
        def execute()
        def fetch()

    @abstract
    class AsyncCursor:
        async def execute()
        async def fetch()

    class AsyncClientCursor(AsyncCursor):
        async def execute()
        async def fetch()

    class AsyncServerCursor(AsyncCursor):
        async def execute()
        async def fetch()

Doing without await on AsyncClientCursor.fetch() creates an asymmetry:

    class AsyncClientCursor:
        async def execute()
        def fetch()

    class AsyncServerCursor:
        async def execute()
        async def fetch()

so you couldn't write a function "processResults(acur)" independent of whether the cursor is client or server, whereas you can do it no problem for sync cursors.

That's my brain dump! @Denis: this is just more to explain the current implementation and mental processes; however in pg3 there is still space to make changes and I'm happy to hear your and others' input. I think that using await uniformly makes things easier to think about and code to compose, but I'm happy to hear if there are different opinions.

-- Daniele
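[The symmetry argument above can be demonstrated with a runnable toy asyncio example. All names here are hypothetical, not psycopg classes: because both cursor flavours expose `async def fetchall()`, one generic consumer works with either, even though only the "server" one truly needs to await anything.]

```python
import asyncio

# Toy demonstration of the uniform-interface argument: both cursor
# flavours expose `async def fetchall()`, so a generic consumer works
# with either, even though only the "server" one simulates real I/O.
class ToyAsyncClientCursor:
    def __init__(self, rows):
        self._rows = rows

    async def fetchall(self):
        # No real I/O: the result is already buffered client-side.
        return self._rows


class ToyAsyncServerCursor:
    def __init__(self, rows):
        self._rows = rows

    async def fetchall(self):
        # Simulate the round-trip a FETCH would cost.
        await asyncio.sleep(0)
        return self._rows


async def process_results(acur):
    # Written once, independent of the cursor's client/server nature.
    return [row for row in await acur.fetchall()]


async def main():
    data = [(1,), (2,)]
    a = await process_results(ToyAsyncClientCursor(data))
    b = await process_results(ToyAsyncServerCursor(data))
    return a, b


print(asyncio.run(main()))  # ([(1,), (2,)], [(1,), (2,)])
```

[With the asymmetric design, process_results would need to know which flavour it received; with the uniform one, the extra await on the client cursor costs essentially nothing.]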
> On Feb 3, 2021, at 09:41, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
>
> Perhaps client-side cursors were invented to make it possible
> to code to the same API regardless of whether real
> (server-side) cursors are used or not?

That's definitely part of it. The name "cursor" for the client-side object is a bit unfortunate, but we need *something* there. The operations are:

1. Send query to server and execute it.
2. Retrieve results from server.

Those are both asynchronous operations, and we need something to encapsulate the state to pass from #1 to #2. You could just have the connection object do that, but then we're declaring forever that we can't run two queries at the same time on the same connection, and that seems unwise.

--
-- Christophe Pettus
   xof@thebuild.com
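[The per-query-state point can be made concrete with toy classes (hypothetical names, not the psycopg API): each cursor carries its own buffered result and read position, so two live result sets coexist on one shared connection object.]

```python
# Toy illustration (not the psycopg API) of why per-query state lives on
# the cursor rather than the connection: each cursor keeps its own
# buffered result and read position, so two result sets coexist on the
# same connection object without trampling each other.
class ToyConnection:
    def run(self, query):
        # Canned per-query results standing in for server round-trips.
        return {"a": [1, 2, 3], "b": ["x", "y"]}[query]


class ToyCursor:
    def __init__(self, conn):
        self._conn = conn
        self._rows, self._pos = [], 0

    def execute(self, query):
        self._rows, self._pos = self._conn.run(query), 0

    def fetchone(self):
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row


conn = ToyConnection()
cur1, cur2 = ToyCursor(conn), ToyCursor(conn)
cur1.execute("a")
cur2.execute("b")
interleaved = [cur1.fetchone(), cur2.fetchone(), cur1.fetchone()]
print(interleaved)  # [1, 'x', 2]
```

[If the connection itself held the single result buffer, the second execute() would clobber the first query's pending rows.]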
On Wed, 3 Feb 2021 at 18:44, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:

> 2) if we want to have sync and async cursor, it would be better if
> they had the same interface

Sorry, this should have been "if we want to have client and server cursors, it would be better if they had the same interface".
Hi,

Daniele Varrazzo wrote:

> On Wed, 3 Feb 2021 at 17:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>
> > I'd like to discuss the concept of client-side cursor as defined
> > in psycopg.
>
> Welcome Denis!
>
> > This is a concept I've always been quite uncomfortable with;
> > I'll try to explain why.
> >
> > First, from a practical point of view:
> >
> >     cur = connection.cursor()
> >     cur.execute(...)
> >     result = cur.fetchxxx()
> >
> > This typical pattern seems awkward to me. Where do IO operations
> > actually take place? (on execute()? on fetch()? on both?)
>
> For a client-side cursor, it happens on execute, but for a server-side
> cursor it happens in both (execute creates the cursor, I have no idea
> as to when the server decides to hit the disks, but for the client
> every fetch is a roundtrip with the server).

Then, having fetch*() methods on client-side cursors is also misleading, because nothing actually gets fetched: it's just result unpacking.

> > Then, and besides the terminology, the fact that named cursors are very
> > different from client-side cursors while being constructed by the same
> > API is confusing. I understand that this may come from the DB-API:
> >
> > https://www.python.org/dev/peps/pep-0249/#cursor-objects
> >
> > > [Cursor] objects represent a database cursor. [...]
> >
> > But this does not seem quite right to me since a client-side cursor is
> > actually not a PostgreSQL cursor. (The documentation on psycopg2 is clear
> > on the difference, though.)
>
> Yes, that's correct: the name comes from the dbapi. The dbapi cursor
> does two different things indeed: it runs queries and manages results.
> I understand that the overlap is sort of weird and everyone who has
> said "psycopg sucks: let me write my own driver" has used the mental
> model - much more familiar - of "I ask a query to a connection which
> gives me a result".
> So usually what is created is something which can
> be used as:
>
>     result = conn.execute(query, vars)
>     for record in result:
>         ...
>
> This is understandable, so much that in psycopg3 I've added
> conn.execute, which is implemented as:
>
>     def execute(self, query, vars):
>         cur = self.cursor()
>         cur.execute(query, vars)
>         return cur
>
> and allows you to use the cursor only to iterate on the data returned,
> ignoring the execute*() methods.

I am aware of that, and it's indeed a nice addition. As I mentioned in the thread introducing this feature, I still think that returning a (client-side) cursor here is confusing.

It seems to me that there is something wrong in either (or both) the DB-API or the way it got implemented in psycopg:

* PEP-249 states that "cursor objects represent a database cursor",

* it also states that "if the database does not provide a direct cursor concept, the module will have to emulate cursors using other means to the extent needed by this specification".

But, unless I missed it, the PEP does not state how to implement query execution and result fetch operations if the database does not need a cursor. It seems to me that connection.execute() provides an implementation for that case, except that it'd be better if it did not return a misnamed cursor but some kind of "result set" or "rows" object. As you mentioned, some other Python drivers already do this.

Alternatively (or in complement), how about renaming connection.cursor() to connection.client() for client-side cursors (while keeping cursor() for server-side ones)? That would fulfill the second statement above, I think. (Or is it something closer to a "portal", as defined in the Extended Query protocol? Maybe not...)

Looking around in other programming languages, the situation is similar:

* Golang's libpq https://godocs.io/github.com/lib/pq follows https://github.com/golang/go/wiki/SQLInterface : queries are executed on the connection object and Rows objects are returned.
* In Rust, there is https://docs.rs/crate/postgres/0.19.0 (sync), which is quite similar, and https://docs.rs/tokio-postgres/0.7.0/tokio_postgres/ (async), which exposes a Client type to execute queries.

With psycopg3 coming up, I believe it'd be a good opportunity to improve that.

> > In asyncio context, while playing around with psycopg3, the situation is
> > even worse because of the additional async/await keywords; for instance:
> >
> >     conn = await psycopg3.AsyncConnection.connect()
> >     async with conn:
> >         cur = await conn.execute("SELECT * FROM test")
> >         r = await cur.fetchall()
> >
> > Why do we need two 'await' (i.e. two IO operations) on conn.execute()
> > and cur.fetchall() if 'cur' is a client-side cursor? (Back to my first
> > point about where IO happens, this appears to depend on whether the
> > connection has 'autocommit' or not...)
>
> I am going a bit back and forth on this mentally, although not in the
> code. The forces pulling in different directions are:
>
> 1) if a method may possibly do I/O activity, then it should be async.
> I don't know where I have read it or how strong this guideline is;
> however it makes sense: it's better to have a function to call with
> await where the await is actually not needed than a function which
> should have been async but now you are stuck with a blocking
> interface.

I gently disagree here. This is essentially the same issue as having cursor.fetch*() not fetching anything, but it's worse than "just" a naming problem because it's about the control flow.
For instance, consider the following examples, with and without autocommit on the connection, to demonstrate where IO takes place:

    conn = await psycopg3.AsyncConnection.connect(autocommit=False)
    async with conn:
        cur = await conn.execute("SELECT 1")
        rows = await cur.fetchall()  # IO happens here

    conn = await psycopg3.AsyncConnection.connect(autocommit=True)
    async with conn:
        cur = await conn.execute("SELECT 1")  # IO happens here
        rows = await cur.fetchall()

This is quite unexpected, and behaves differently from sync cursors (where IO always happens on execute()).

> 2) if we want to have client and server cursors, it would be better if
> they had the same interface

(Unless we drop the client-side cursor, or rename it.)

> The point 2 is pretty true in psycopg2 because there is a single
> cursor class, and every method is if'ed like:
>
>     def fetchmany(self, nrecs):
>         if not self.name:
>             # client-side cursor
>             res = self.result[self.current : self.current + nrecs]
>             self.current += nrecs
>             return res
>         else:
>             # server-side cursor
>             result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}")
>             return result
>
> I don't think this is a good implementation, so in psycopg3 I would
> like to have a Cursor interface with a ClientCursor and a ServerCursor
> implementation, without the if for each method. In async mode,
> AsyncClientCursor.fetchmany() doesn't actually need to be async, but
> AsyncServerCursor.fetchmany() does I/O activity. However, would two
> different implementation classes demand two different interfaces?

These are not just two different implementations; these are two different things. So, yes, different interfaces (and names) would be better, IMHO.

> > All in all, my main point is: (why) do we need client-side cursors?
>
> I think client-side cursors are the bread and butter of querying the
> database. They are a lightweight and stateless way to receive a
> result: you ask a question, you receive an answer, and all is over.

Ok.
But then, they are just a proxy to the actual results already fetched upon execute(), with no added value, aren't they?

> You can put caches between client and server, connection pools...

How is this related to client-side cursors?

> Server-side cursors imply a state persisted across calls and shared
> between client and server, which is much more heavyweight, scales
> less for the server, and makes the identity of the connection
> precious.
>
> If anything, I would do without the server-side cursor, but it would be
> abdicating a responsibility that the driver should handle IMO, so
> although I'm not a fan I will definitely implement them in psycopg3.
>
> There are choices to be made, and it's one of those points for which
> I'd be happy to hear other experiences; I haven't asked around yet
> because I first wanted to implement a sketch of server-side cursors
> and then look at the 4 classes side to side (to side to side) and decide
> what to do. Let's call them:
>
> - SyncClientCursor
> - AsyncClientCursor
> - SyncServerCursor
> - AsyncServerCursor
>
> The Server versions have I/O done on fetch*() and on close(). So the
> first three objects might have non-async fetch(). I was preferring to
> give them all a uniform interface, so the documentation would be "on
> async communication, always use async when fetching", as opposed to
> "on async use `await execute()`, no await on fetch() for client-side
> cursors, but `await fetch()` for server-side ones".
> In practice, I am envisioning a SyncCursor object interface, with
> SyncClient and SyncServer being two different implementations, and an
> AsyncCursor interface, with AsyncClient and AsyncServer being two
> different implementations:
>
>     @abstract
>     class SyncCursor:
>         def execute()
>         def fetch()
>
>     class SyncClientCursor(SyncCursor):
>         def execute()
>         def fetch()
>
>     class SyncServerCursor(SyncCursor):
>         def execute()
>         def fetch()
>
>     @abstract
>     class AsyncCursor:
>         async def execute()
>         async def fetch()
>
>     class AsyncClientCursor(AsyncCursor):
>         async def execute()
>         async def fetch()
>
>     class AsyncServerCursor(AsyncCursor):
>         async def execute()
>         async def fetch()
>
> Doing without await on AsyncClientCursor.fetch() creates an asymmetry:
>
>     class AsyncClientCursor:
>         async def execute()
>         def fetch()
>
>     class AsyncServerCursor:
>         async def execute()
>         async def fetch()
>
> so you couldn't write a function "processResults(acur)" independent of
> whether the cursor is client or server, whereas you can do it no
> problem for sync cursors.
>
> That's my brain dump! @Denis: this is just more to explain the current
> implementation and mental processes; however in pg3 there is still
> space to make changes and I'm happy to hear your and others' input. I
> think that using await uniformly makes things easier to think about
> and code to compose, but I'm happy to hear if there are different
> opinions.

So my humble opinion is that naming and control flow should be explicit. For instance, we could drop the fetch*() methods from client-side cursors (to be renamed) and add result access methods:

    class AsyncClient:
        async def execute()
        def first()  # return first row
        def all()    # return all rows

while keeping the cursor term for real cursors:

    class AsyncCursor:
        async def execute()
        async def fetch()

Thanks for your detailed and kind reply, Daniele.
Also, besides discussing here (kind of a self-introduction), I'd be happy to help out code-wise on this particular aspect or another.

--
Denis
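[Denis's proposed split could look roughly like the following toy asyncio sketch. All names are hypothetical and the result is canned; the point is only that the client-side object awaits once, on execute(), while result access stays plain and synchronous.]

```python
import asyncio

# Toy sketch (names hypothetical, not a psycopg class) of the proposal:
# the client-side object awaits only on execute(); result access is
# plain unpacking of the already-buffered rows, hence no await.
class ToyAsyncClient:
    def __init__(self):
        self._rows = []

    async def execute(self, query):
        # The single await: send the query and buffer the whole result.
        await asyncio.sleep(0)
        self._rows = [(1,), (2,), (3,)]  # canned answer
        return self

    def first(self):
        # Pure unpacking, no I/O, hence no await.
        return self._rows[0] if self._rows else None

    def all(self):
        return list(self._rows)


async def main():
    client = await ToyAsyncClient().execute("SELECT generate_series(1, 3)")
    return client.first(), client.all()


print(asyncio.run(main()))  # ((1,), [(1,), (2,), (3,)])
```

[Server-side cursors would keep the `async def fetch()` shape, since every fetch really is a round-trip; the cost of this explicitness is the asymmetry discussed earlier in the thread.]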
> On Feb 4, 2021, at 03:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>
> But, unless I missed it, the PEP does not state how to implement query
> execution and result fetch operations if the database does not need a
> cursor.

First, any change like this would have to maintain the current API essentially forever, unless psycopg3 represents a completely incompatible break with the psycopg2 interface. There is an enormous body of code out there that uses the current cursor() interface for client-side cursors.

Second, it would be very unwise to make guarantees about when these operations interact with the database. I think that any client application should expect that both cursor.execute() and cursor.fetchone()/.fetchall() are asynchronous operations, and code appropriately.

--
-- Christophe Pettus
   xof@thebuild.com
Christophe Pettus wrote:

> > But, unless I missed it, the PEP does not state how to implement query
> > execution and result fetch operations if the database does not need a
> > cursor.
>
> First, any change like this would have to maintain the current API
> essentially forever, unless psycopg3 represents a completely
> incompatible break with the psycopg2 interface. There is an enormous
> body of code out there that uses the current cursor() interface for
> client-side cursors.

Sure, that's a valid point. But maintaining this backwards compatibility does not mean we can't hide the details and advertise a cleaner API for newcomers or people willing to migrate.

> Second, it would be very unwise to make guarantees about when these
> operations interact with the database. I think that any client
> application should expect that both cursor.execute() and
> cursor.fetchone()/.fetchall() are asynchronous operations, and code
> appropriately.

If "cursor" is a real database cursor, I agree.
> On Feb 4, 2021, at 07:02, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>
> If "cursor" is a real database cursor, I agree.

But it's not just CURSORs that have this behavior. libpq allows the client to send the query, and then make separate requests for each row, even without a database cursor; this maps almost exactly to .execute() and .fetchone(). It doesn't seem a good idea to guarantee forever that .execute() will *never* do I/O without a database-side cursor.

Having a single convenience method on the connection object that does the equivalent of a .execute() and a .fetchall() might be useful, though.

--
-- Christophe Pettus
   xof@thebuild.com
On Thu, 4 Feb 2021 at 16:17, Christophe Pettus <xof@thebuild.com> wrote:

> Having a single convenience method on the connection object that does
> the equivalent of a .execute() and a .fetchall() might be useful, though.

You can already do it (in psycopg3):

    for record in conn.execute("query"):
        # do stuff

or

    records = conn.execute("query").fetchall()

or

    record = conn.execute("query").fetchone()

The latter examples are two methods, not one, but I think they compose nicely enough, no? They are pretty orthogonal.

-- Daniele
Christophe Pettus wrote:

> But it's not just CURSORs that have this behavior. libpq allows the
> client to send the query, and then make separate requests for each
> row, even without a database cursor; this maps almost exactly to
> .execute() and .fetchone().

Is this related to prepared statements in the extended query protocol? (Then, I'd argue that both preparation and execution steps would involve IO. But if it's not a cursor, we should use a different name, as the postgresql doc does.)

> It doesn't seem a good idea to guarantee
> forever that .execute() will *never* do I/O without a database-side
> cursor.

Currently, it seems to me that .execute() always does IO, with or without a database cursor.
> On Feb 4, 2021, at 08:38, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>
> Is this related to prepared statements in the extended query protocol?
> (Then, I'd argue that both preparation and execution steps would involve
> IO. But if it's not a cursor, we should use a different name, as the
> postgresql doc does.)

No, this is separate from prepared statements or the extended protocol. You can do single-row returns with either the basic or extended protocol.

It's unfortunate that "cursor" has become overloaded, but I think that's something we just have to accept at this point. It sounds like the real issue is that it would be convenient to have a single async function that does the semantic equivalent of:

    curs = connection.cursor()
    curs.execute(query, parameters)
    for row in curs.fetchall():
        yield row
    curs.close()

--
-- Christophe Pettus
   xof@thebuild.com
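[Christophe's semantic equivalent can be sketched as a runnable generator over stand-in classes. ToyConnection and ToyCursor here are hypothetical, with canned rows instead of a server; only the wrapper's shape matters.]

```python
# Toy stand-ins (not psycopg itself) for the convenience wrapper
# described above: open a cursor, execute, stream the rows, close.
class ToyCursor:
    def __init__(self, rows):
        self._rows, self.closed = rows, False

    def execute(self, query, params=None):
        pass  # the canned rows stand in for the server's answer

    def fetchall(self):
        return list(self._rows)

    def close(self):
        self.closed = True


class ToyConnection:
    def cursor(self):
        return ToyCursor([(1,), (2,)])


def run_query(connection, query, params=None):
    # The one-call equivalent of cursor()/execute()/fetchall()/close().
    curs = connection.cursor()
    curs.execute(query, params)
    try:
        for row in curs.fetchall():
            yield row
    finally:
        curs.close()  # runs even if the consumer abandons the generator


rows = list(run_query(ToyConnection(), "SELECT 1"))
print(rows)  # [(1,), (2,)]
```

[The try/finally is the one subtlety worth keeping in a real async version too: the cursor should be closed even when iteration stops early.]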
On Thu, 4 Feb 2021 at 12:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote:

> Then, having fetch*() methods on client-side cursors is also misleading
> because nothing gets fetched actually, it's just result unpacking.

Arguably the naming is not correct, and they have been called that way to fulfill the requirement of "emulating the cursor" as the dbapi requires.

> > This is understandable, so much that in psycopg3 I've added
> > conn.execute, which is implemented as:
>
> I am aware of that and that's indeed a nice addition. As I mentioned in
> the thread introducing this feature, I still think that returning a
> (client-side) cursor here is confusing.

You can think about the cursor as a result in that case. I understand your dissatisfaction with it is a) the names, b) the async. Let's see if there is something we can do around these points.

> It seems to me that there is something wrong in either (or both) the DB-API
> or the way it got implemented in psycopg:
>
> * PEP-249 states that "cursor objects represent a database cursor",
>
> * it also states that "if the database does not provide a direct cursor
> concept, the module will have to emulate cursors using other means to
> the extent needed by this specification".
>
> But, unless I missed it, the PEP does not state how to implement query
> execution and result fetch operations if the database does not need a
> cursor.

I think that wasn't the point of the DBAPI. I believe that the DB-SIG only wanted to create an interface which would be shared by different adapters, either hiding the detail (requiring e.g. to emulate client-side a missing feature) or providing a description of the differences. I wasn't around when things were fleshed out, but I believe it was the late 90s, and the leading database communication models were much more heavyweight (DAO, ADO, ODBC?); programs would have consisted of relatively few desktop clients (hundreds? thousands?)
which would "open a cursor", going up and down in the results using DB cursors. Written in Visual Basic and the likes... So I think the DBAPI doesn't specify an execution model and when the IO has to happen because that would have worked for certain DBs and not for others. But I believe that if a standard were drafted now, it would be a much more stateless one and wouldn't focus on cursors. Fast-forward 20 years...

> It seems to me that connection.execute() provides an implementation for
> that case, except that it'd be better if it did not return a misnamed
> cursor but some kind of "result set" or "rows" object. As you mentioned,
> some other Python drivers already do this.
>
> Alternatively (or in complement), how about renaming connection.cursor()
> to connection.client() for client-side cursors (while keeping cursor() for
> server-side ones)? That would fulfill the second statement above, I
> think. (Or is it something closer to a "portal", as defined in the
> Extended Query protocol? Maybe not...)

... while I want to use the psycopg3 chance to improve what can be done better than in psycopg2, given the ways Python and Postgres have changed over several years, I don't want to introduce gratuitous changes. I would still like psycopg3 to be a drop-in replacement for psycopg2, and while there are differences enough to require testing your application, I don't want to require the users to change *every single call* to psycopg2. I understand that getone(), getmany(), getall() would be more apt names than fetch*() in case their implementation says they are client-side, but is it really worth asking the users to grep their codebases to replace all the occurrences?
As things stand now, a lot of relatively simple code can run unmodified with either pg2 or pg3.

With the async objects we can have more freedom because there is no precedent code (I believe the interface is the same exposed by aiopg, the async psycopg2 wrapper, but maintaining compatibility with it is not something I feel required to do).

> Looking around in other programming languages, the situation is similar:
>
> * Golang's libpq https://godocs.io/github.com/lib/pq follows
> https://github.com/golang/go/wiki/SQLInterface : queries are executed
> on the connection object and Rows objects are returned.
>
> * In Rust, there is https://docs.rs/crate/postgres/0.19.0 (sync), which
> is quite similar, and https://docs.rs/tokio-postgres/0.7.0/tokio_postgres/
> (async), which exposes a Client type to execute queries.
>
> With psycopg3 coming up, I believe it'd be a good opportunity to improve
> that.

tokio_postgres makes me curious about pipelining... However, those all seem to be interfaces which don't hide the nature of cursorless querying. If we have an adapter which can work both with cursors and cursorless, I think it is good that they offer the same interface to obtain the results, even if the way they obtain them has a different implementation.

> > 1) if a method may possibly do I/O activity, then it should be async.
> > I don't know where I have read it or how strong this guideline is;
> > however it makes sense: it's better to have a function to call with
> > await where the await is actually not needed than a function which
> > should have been async but now you are stuck with a blocking
> > interface.
>
> I gently disagree here. This is essentially the same issue as having
> cursor.fetch*() not fetching anything, but it's worse than "just" a
> naming problem because it's about the control flow.
> For instance, consider the following examples with and without
> autocommit on the connection to demonstrate where IO takes place:
>
>     conn = await psycopg3.AsyncConnection.connect(autocommit=False)
>     async with conn:
>         cur = await conn.execute("SELECT 1")
>         rows = await cur.fetchall()  # IO happens here
>
>     conn = await psycopg3.AsyncConnection.connect(autocommit=True)
>     async with conn:
>         cur = await conn.execute("SELECT 1")  # IO happens here
>         rows = await cur.fetchall()
>
> This is quite unexpected, and behaves differently from sync cursors
> (where IO always happens on execute()).

I think that's wrong: I don't believe it changes with autocommit. No-autocommit fires an extra BEGIN together with the execute but doesn't change the fetch behaviour. IO on execute happens regardless of the type of connection and cursor; IO on fetch happens only for server-side cursors.

> > 2) if we want to have client and server cursors, it would be better if
> > they had the same interface
>
> (Unless we drop the client-side cursor, or rename it.)
>
> > The point 2 is pretty true in psycopg2 because there is a single
> > cursor class, and every method is if'ed like:
> >
> >     def fetchmany(self, nrecs):
> >         if not self.name:
> >             # client-side cursor
> >             res = self.result[self.current : self.current + nrecs]
> >             self.current += nrecs
> >             return res
> >         else:
> >             # server-side cursor
> >             result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}")
> >             return result
> >
> > I don't think this is a good implementation, so in psycopg3 I would
> > like to have a Cursor interface with a ClientCursor and a ServerCursor
> > implementation, without the if for each method. In async mode,
> > AsyncClientCursor.fetchmany() doesn't actually need to be async, but
> > AsyncServerCursor.fetchmany() does I/O activity. However, would two
> > different implementation classes demand two different interfaces?
>
> These are not just two different implementations, these are two
> different things.
> So, yes, different interfaces (and names) would be better, IMHO.

But for a user who is not overly concerned with when I/O happens (assuming they know where it happens and are fine with it; the point is not to keep them in the dark), having the same interface means that their application can be written without regard to whether a server or client cursor is used.

You seem to want to implement two diverging and leaky interfaces, modeled on what effectively happens in I/O and not providing any leeway. The price to pay for the uniform interface is to have the async keyword on methods which, in a specific implementation, don't actually do I/O, but 1) it's cheap: you don't pay much for it, since it doesn't really do I/O, and 2) it leaves open the possibility to write a cursor subclass which actually does I/O on input, for instance a catalog lookup on fetch(). But even logging to a remote logger is an I/O operation, so if you don't have `async fetchone()` you cannot write a subclass of these objects to do it: you have to write a wrapper instead, with a necessarily different interface. I think we would be painting ourselves into a corner.

> > > All in all, my main point is: (why) do we need client-side cursors?
> >
> > I think client-side cursors are the bread and butter of querying the database. They are a lightweight and stateless way to receive a result: you ask a question, you receive an answer, and all is over.
>
> Ok. But then, they are just a proxy to the actual results already fetched upon execute(), with no added value, aren't they?

Yes, they are just a useful container for a resultset. It's a useful object to have, regardless of its name.

> > You can put caches between client and server, connection pools...
>
> How is this related to client-side cursors?

They are things you cannot do with a server-side cursor, and in my experience much more frequently used. When I say "cursor" I mean "result" here, really, as in "you can have these things if you fetch your result entirely".
> So my humble opinion is that naming and control flow should be explicit. For instance, we could drop the fetch*() methods from client-side cursors (to be renamed) and add result access methods:
>
>     class AsyncClient:
>         async def execute()
>         def first()  # return first row
>         def all()  # return all rows
>
> while keeping the cursor term for real cursors:
>
>     class AsyncCursor:
>         async def execute()
>         async def fetch()

This requires everyone to rewrite their program in order to use psycopg3. I think the adoption of the adapter would be zero if there weren't a frictionless way to use the new version, and I am trying to design the adapter to be easy to upgrade to. This might imply some non-trivial work internally, but I think it's necessary if we want a non-zero chance of the adapter being considered by the industry.

The names and concepts you propose are surely better connected to the reality of the implementation of a cursor-less query, but I don't think that's a strong enough reason to lose the whole user base. asyncpg uses the same concept: it is a much thinner interface, designed around the fe-be protocol rather than around the interface that, as imprecise as it is, people have used for 20 years. But, as it requires a complete rewrite of all the queries and of all the code interacting with the database, it doesn't seem to have attracted many users.

> Thanks for your detailed and kind reply, Daniele. Also, besides discussing here (kind of a self-introduction), I'd be happy to help out code-wise on this particular aspect or another.

Your help would be very appreciated, thank you :) If you, or others, would like to help, a few open areas of work are listed here: https://github.com/psycopg/psycopg3/projects/1

Let me write a separate message with the state of the works...

--
Daniele
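For reference, Denis's sketch above can be fleshed out into runnable form. This is a toy under stated assumptions (AsyncClient is the hypothetical name from the proposal; fake_run_query stands in for the driver's real network round-trip): execute() is the only awaitable, and the result accessors are plain synchronous methods.

```python
import asyncio

class AsyncClient:
    """Hypothetical 'cursorless' result holder: only execute() awaits."""
    def __init__(self, run_query):
        self._run_query = run_query
        self._rows = []

    async def execute(self, query):
        self._rows = await self._run_query(query)  # the only I/O entry point

    def first(self):
        """Return the first row, or None if the result is empty."""
        return self._rows[0] if self._rows else None

    def all(self):
        """Return all rows as a list."""
        return list(self._rows)

async def fake_run_query(query):
    await asyncio.sleep(0)  # stands in for the network round-trip
    return [(1, "a"), (2, "b")]

async def demo():
    client = AsyncClient(fake_run_query)
    await client.execute("SELECT * FROM test")
    return client.first(), client.all()

print(asyncio.run(demo()))
```

This makes Denis's point concrete: with this shape, an `await` in user code always marks a real I/O boundary, which is exactly the explicitness he is arguing for and the compatibility break Daniele objects to.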
Daniele Varrazzo a écrit :
> On Thu, 4 Feb 2021 at 12:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>
> > Daniele Varrazzo a écrit :
> > > This is understandable, so much that in psycopg3 I've added conn.execute, which is implemented as:
> >
> > I am aware of that and that's indeed a nice addition. As I mentioned in the thread introducing this feature, I still think that returning a (client-side) cursor here is confusing.
>
> You can think about the cursor as a result in that case. I understand your dissatisfaction with it is a) the names, b) the async. Let's see if there is something we can do around these points.
>
> [...]
>
> > It seems to me that connection.execute() provides an implementation of that use case, except that it'd be better if it did not return a misnamed cursor but some kind of "result set" or "rows" object. As you mentioned, some other Python drivers already do this.
> >
> > Alternatively (or in complement), how about renaming connection.cursor() to connection.client() for client-side cursors (while keeping cursor() for server-side ones)? That would fulfill the second statement above, I think. (Or is it something closer to a "portal", as defined in the Extended Query protocol? Maybe not...)
>
> ... while I want to use the psycopg3 opportunity to improve on what psycopg2 did, given the ways Python and Postgres have changed over the years, I don't want to introduce gratuitous changes. I would still like psycopg3 to be a drop-in replacement for psycopg2, and while there are differences enough to require testing your application, I don't want to require users to change *every single call* to psycopg2. I understand that getone(), getmany(), getall() would be more apt names than fetch*() given that their implementation is client-side, but is it really worth asking users to grep their codebases to replace all the occurrences?
> As things stand now, a lot of relatively simple code can run unmodified with either pg2 or pg3.

I understand that; keeping the user base by providing a smooth migration path is highly important. (As I said in another message, I'm not suggesting entirely dropping the compatibility.) So renaming client-side cursors to something else is not an option (too much of a "gratuitous" change); that's a fair point.

However, could we have connection.execute() not return a cursor (with fetch*() methods)? It seems like a thin adaptation to me, apparently not needing backwards compatibility, and a good step towards "cursorless querying".

> With the async objects we can have more freedom because there is no precedent code (I believe the interface is the same exposed by aiopg, the async psycopg2 wrapper, but maintaining compatibility with it is not something I feel required to do).
>
> > Looking around in other programming languages, the situation is similar:
> >
> > * Golang's libpq https://godocs.io/github.com/lib/pq follows https://github.com/golang/go/wiki/SQLInterface : queries are executed on the connection object and Rows objects are returned.
> >
> > * In Rust, there is https://docs.rs/crate/postgres/0.19.0 (sync), which is quite similar, and https://docs.rs/tokio-postgres/0.7.0/tokio_postgres/ (async), which exposes a Client type to execute queries.
> >
> > With psycopg3 coming up, I believe it'd be a good opportunity to improve that.
>
> tokio_postgres makes me curious about pipelining...
>
> However, those all seem to be interfaces which don't hide the nature of cursorless querying. If we have an adapter which can work both with and without cursors, I think it is good that they offer the same interface to obtain the results, even if the way they obtain them has a different implementation.

This concept of "cursorless querying" is new to me, and I somehow see the situation reversed: why would we expose an artificial thing (the client-side cursor)?
(Rhetorical question, feel free to ignore.) We should make "cursorless querying" more explicit! (Building on connection.execute().)

> > > 1) if a method may possibly do I/O activity, then it should be async. I don't know where I have read it or how strong this guideline is, but it makes sense: it's better to have a function called with await where the await is actually not needed than a function which should have been awaitable but leaves you stuck with a blocking interface.
> >
> > I gently disagree here. This is essentially the same issue as having cursor.fetch*() not fetching anything, but it's worse than "just" a naming problem because it affects the control flow.
> >
> > For instance, consider the following examples with and without autocommit on the connection to demonstrate where IO takes place:
> >
> >     conn = await psycopg3.AsyncConnection.connect(autocommit=False)
> >     async with conn:
> >         cur = await conn.execute("SELECT 1")
> >         rows = await cur.fetchall()  # IO happens here
> >
> >     conn = await psycopg3.AsyncConnection.connect(autocommit=True)
> >     async with conn:
> >         cur = await conn.execute("SELECT 1")  # IO happens here
> >         rows = await cur.fetchall()
> >
> > This is quite unexpected, and behaves differently on sync cursors (IO always happens on execute()).
>
> I think that's wrong, I don't believe it changes with autocommit. No-autocommit fires an extra BEGIN together with the execute but doesn't change the fetch behaviour. IO on execute happens regardless of the type of connection and cursor; IO on fetch happens only for server-side cursors.

Well, maybe I'm missing something... In the examples above (written down explicitly to understand where IO happens), if I shut down postgres between 'await conn.execute()' and 'await cur.fetchall()', the first example breaks but the second doesn't.
Perhaps the autocommit mention was misleading; it's enough to insert 'await conn.commit()' before 'await cur.fetchall()' to reproduce. So (and again, unless I'm missing something), if this is not "by design", maybe this is a bug?

> > > 2) if we want to have client and server cursors, it would be better if they had the same interface
> >
> > (Unless we drop the client-side cursor, or rename it.)
> >
> > > The point 2 is pretty true in psycopg2 because there is a single cursor class, and every method is if'ed like:
> > >
> > >     def fetchmany(self, nrecs):
> > >         if not self.name:
> > >             # client-side cursor
> > >             res = self.result[self.current:self.current + nrecs]
> > >             self.current += nrecs
> > >             return res
> > >         else:
> > >             # server-side cursor
> > >             result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}")
> > >             return result
> > >
> > > I don't think this is a good implementation, so in psycopg3 I would like to have a Cursor interface with a ClientCursor and a ServerCursor implementation, without the if for each method. In async mode, AsyncClientCursor.fetchmany() doesn't actually need to be async, but AsyncServerCursor.fetchmany() does I/O activity. However, would two different implementation classes demand two different interfaces?
> >
> > These are not just two different implementations, these are two different things. So, yes, different interfaces (and names) would be better, IMHO.
>
> But for a user who is not overly concerned with when I/O happens (assuming they know where it happens and are fine with it; the point is not to keep them in the dark), having the same interface means that their application can be written without regard to whether a server or client cursor was used.
>
> You seem to want to implement two diverging and leaky interfaces, modeled on what effectively happens in I/O and not providing any leeway.
> The price to pay for it is to have the async keyword on methods which in a specific implementation don't actually do I/O, but 1) it's cheap, you don't pay much for it, it doesn't really do I/O, and 2) it allows the possibility to write a cursor subclass which actually does I/O on input, for instance a catalog lookup on fetch(). But even logging to a remote logger is an I/O operation, so if you don't have `async fetchone()` you cannot write a subclass of these objects to do it: you have to write a wrapper instead, with a necessarily different interface. I think we would be painting ourselves into a corner.

Sorry, I don't buy that point. To me, slightly exaggerating, it sounds like we should make every method of an interface 'async' just in case someone wants to fire some I/O in a subclass. In other words, having I/O entry points properly identified follows the principle of least surprise (or "explicit is better than implicit", to take over from the famous zen).

> > > > All in all, my main point is: (why) do we need client-side cursors?
> > >
> > > I think client-side cursors are the bread and butter of querying the database. They are a lightweight and stateless way to receive a result: you ask a question, you receive an answer, and all is over.
> >
> > Ok. But then, they are just a proxy to the actual results already fetched upon execute(), with no added value, aren't they?
>
> Yes, they are just a useful container for a resultset. It's a useful object to have, regardless of its name.
>
> > > You can put caches between client and server, connection pools...
> >
> > How is this related to client-side cursors?
>
> They are things you cannot do with a server-side cursor, and in my experience much more frequently used. When I say "cursor" I mean "result" here, really, as in "you can have these things if you fetch your result entirely".

> > So my humble opinion is that naming and control flow should be explicit.
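Daniele's subclassing argument, that a uniformly async fetchone() leaves room for subclasses that really do I/O on fetch, can be made concrete with a toy sketch (all names here are hypothetical; the "remote log sink" is just an in-memory coroutine standing in for a network call):

```python
import asyncio

class AsyncClientCursor:
    """Toy client-side cursor: fetchone() is async even though it does no I/O."""
    def __init__(self, rows):
        self._rows = list(rows)

    async def fetchone(self):
        return self._rows.pop(0) if self._rows else None

class LoggingCursor(AsyncClientCursor):
    """Subclass that does I/O on fetch (shipping a log record per row).

    Overriding with an await is only possible because the base method
    is declared async in the first place.
    """
    def __init__(self, rows, log_sink):
        super().__init__(rows)
        self._log_sink = log_sink

    async def fetchone(self):
        row = await super().fetchone()
        await self._log_sink(f"fetched {row!r}")  # remote logging = I/O
        return row

async def demo():
    logged = []

    async def sink(msg):
        logged.append(msg)  # a real sink would write to a socket

    cur = LoggingCursor([(1,), (2,)], sink)
    rows = [await cur.fetchone(), await cur.fetchone()]
    return rows, logged

print(asyncio.run(demo()))
```

Denis's counterpoint stands regardless: the same effect can be obtained with a wrapper object around a synchronous cursor, at the cost of a different interface; the disagreement is about which cost is worth paying.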
> > For instance, we could drop the fetch*() methods from client-side cursors (to be renamed) and add result access methods:
> >
> >     class AsyncClient:
> >         async def execute()
> >         def first()  # return first row
> >         def all()  # return all rows
> >
> > while keeping the cursor term for real cursors:
> >
> >     class AsyncCursor:
> >         async def execute()
> >         async def fetch()
>
> This requires everyone to rewrite their program in order to use psycopg3. I think the adoption of the adapter would be zero if there weren't a frictionless way to use the new version, and I am trying to design the adapter to be easy to upgrade to. This might imply some non-trivial work internally, but I think it's necessary if we want a non-zero chance of the adapter being considered by the industry.

As far as the async interface is concerned, I think there is no adoption issue because there's no precedent use from psycopg2. So we could expose two APIs: cursorless querying ('await conn.execute()') and a single server-side cursor class.

Cheers,
Denis
> On Feb 4, 2021, at 09:21, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>
> Well, maybe I'm missing something... In the examples above (written down explicitly to understand where IO happens), if I shut down postgres between 'await conn.execute()' and 'await cur.fetchall()', the first example breaks but the second doesn't. Perhaps the autocommit mention was misleading; it's enough to insert 'await conn.commit()' before 'await cur.fetchall()' to reproduce. So (and again, unless I'm missing something), if this is not "by design", maybe this is a bug?

You're relying on private knowledge, not an API guarantee, as to "where I/O happens" here. Like any expectation based on private knowledge, you can get tripped up by that.

We shouldn't write into the API contract that either cursor.execute() or cursor.fetch*() is guaranteed not to fail with an I/O error. Any time you interact with the cursor object, that can be assumed to be an asynchronous operation that can fail due to the remote server not being available. Knowing that, you can reasonably code defensively no matter what.

Adding those explicit guarantees about I/O to the API hugely limits underlying implementation changes in the future, for as far as I can tell no real gain.

If it comes down to "cursor isn't a good name for this class," that's probably true, but we're a decade past making that decision.

> As far as the async interface is concerned, I think there is no adoption issue because there's no precedent use from psycopg2. So we could expose two APIs: cursorless querying ('await conn.execute()') and a single server-side cursor class.

Right now, switching from a client-side cursor object to a server-side one largely preserves the API. I think that's a valuable feature worth retaining.

--
-- Christophe Pettus
   xof@thebuild.com
Christophe Pettus a écrit :
>> On Feb 4, 2021, at 09:21, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>>
>> Well, maybe I'm missing something... In the examples above (written down explicitly to understand where IO happens), if I shut down postgres between 'await conn.execute()' and 'await cur.fetchall()', the first example breaks but the second doesn't. Perhaps the autocommit mention was misleading; it's enough to insert 'await conn.commit()' before 'await cur.fetchall()' to reproduce. So (and again, unless I'm missing something), if this is not "by design", maybe this is a bug?
>
> You're relying on private knowledge, not an API guarantee, as to "where I/O happens" here. Like any expectation based on private knowledge, you can get tripped up by that.

Well, that's just an exercise for me to understand where things happen. Of course, I'd expect an 'await' expression to possibly involve I/O in real life.

> If it comes down to "cursor isn't a good name for this class," that's probably true, but we're a decade past making that decision.

That, and the fact that fetch*() methods do not actually fetch. (For the synchronous case, I'm quite convinced now that it is too late to change.)

>> As far as the async interface is concerned, I think there is no adoption issue because there's no precedent use from psycopg2. So we could expose two APIs: cursorless querying ('await conn.execute()') and a single server-side cursor class.
>
> Right now, switching from a client-side cursor object to a server-side one largely preserves the API. I think that's a valuable feature worth retaining.

Worth retaining, maybe. But is it worth imposing the 'await cursor.fetch*()' pattern on everybody when it does not appear needed? I'm not so sure, as this would come at the cost of readability and clarity.
(On the other hand, if I were to migrate some code to use server-side cursors, I would probably do it on a case-by-case basis; so if the cursor wasn't there in the first place, adding it would be little trouble compared to deciding where a server-side cursor is needed.)
Am Wed, Feb 03, 2021 at 06:44:33PM +0100 schrieb Daniele Varrazzo:

> Yes, that's correct: the name comes from the dbapi. The dbapi cursor does two different things indeed: it runs queries and manages results. I understand that the overlap is sort of weird, and everyone who has said "psycopg sucks: let me write my own driver" has used the much more familiar mental model of "I ask a query of a connection, which gives me a result". So usually what is created is something which can be used as:
>
>     result = conn.execute(query, vars)
>     for record in result:
>         ...
>
> This is understandable, so much that in psycopg3 I've added conn.execute, which is implemented as:
>
>     def execute(self, query, vars):
>         cur = self.cursor()
>         cur.execute(query, vars)
>         return cur
>
> and allows you to use the cursor only to iterate on the data returned, ignoring the execute*() methods.

Given that people interested in using conn.execute() don't seem to want to concern themselves with cursors at all (until there's an explicit need, at which point they would seem to want a server-side cursor, and use conn.cursor()), and the fact that conn.execute() is outside the DB-API anyway, I wonder whether this

    class connection:
        def execute(self, query, vars):
            cur = self.cursor()
            cur.execute(query, vars)
            return cur.fetchall()

makes even more sense?

Perhaps even reconsider naming it "execute".

> The Server versions have I/O done on fetch*() and on close(). So the first three objects might have non-async fetch(). I was preferring to give them all a uniform interface.

+1

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
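Karsten's sketch above can be made runnable with a minimal stub in place of a real database (the lowercase class names mirror psycopg2's `connection`/`cursor` convention; the canned resultset in `cursor.execute()` is an assumption replacing real query I/O). It shows the behavioural difference under discussion: this variant of execute() hands back plain rows rather than a cursor object.

```python
class cursor:
    """Minimal stand-in for a DB-API cursor; a canned resultset
    replaces real query I/O for the purpose of this sketch."""
    def __init__(self):
        self._rows = []

    def execute(self, query, vars=None):
        self._rows = [(1, "a"), (2, "b")]  # pretend the server answered

    def fetchall(self):
        return self._rows

class connection:
    def cursor(self):
        return cursor()

    def execute(self, query, vars=None):
        # The variant discussed above: return the rows, not the cursor.
        cur = self.cursor()
        cur.execute(query, vars)
        return cur.fetchall()

conn = connection()
rows = conn.execute("SELECT * FROM test")
print(rows)
```

The trade-off Daniele raises in his reply is visible here: `rows` is just a list, so the caller has lost access to cursor attributes such as description and nextset().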
On Fri, 5 Feb 2021 at 10:41, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:

> Given that people interested in using conn.execute() don't seem to want to concern themselves with cursors at all (until there's an explicit need, at which point they would seem to want a server-side cursor, and use conn.cursor()), and the fact that conn.execute() is outside the DB-API anyway, I wonder whether this
>
>     class connection:
>         def execute(self, query, vars):
>             cur = self.cursor()
>             cur.execute(query, vars)
>             return cur.fetchall()
>
> makes even more sense?
>
> Perhaps even reconsider naming it "execute".

If it didn't return a cursor, it would make sense to reconsider calling it execute(). As it is now, it returns the same thing that cursor() returns; it's pretty much just a contraption of a chain of methods, hence the same name.

If you returned just the fetchall() list, you would lose access to the result metadata (description), nextset(), and someone would come asking "can I have executefetchone() please" the next minute :)

I'll play a bit more with it in the test suite (which is currently the main body of code using psycopg3) and think about it.

--
Daniele
Am Fri, Feb 05, 2021 at 11:40:31AM +0100 schrieb Daniele Varrazzo:

> >     class connection:
> >         def execute(self, query, vars):
> >             cur = self.cursor()
> >             cur.execute(query, vars)
> >             return cur.fetchall()
> >
> > makes even more sense?
> >
> > Perhaps even reconsider naming it "execute".
>
> If it didn't return a cursor, it would make sense to reconsider calling it execute(). As it is now, it returns the same thing that cursor() returns; it's pretty much just a contraption of a chain of methods, hence the same name.
>
> If you returned just the fetchall() list, you would lose access to the result metadata (description), nextset(), and someone would come asking "can I have executefetchone() please" the next minute :)

Yeah, I know. That's why I thought maybe "conn.run_query()" or some such, because execute() is already loaded with meaning. If one wants access to what a cursor provides, as defined by the DB-API, one should _use_ a cursor, as per DB-API :-)

Or, perhaps, it could return a generator over the fetchall() results list?

Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B
On Wed, 3 Feb 2021 at 17:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote:

> Why do we need two 'await' (i.e. two IO operations) on conn.execute() and cur.fetchall() if 'cur' is a client-side cursor? (Back to my first point about where IO happens, this appears to depend on whether the connection has 'autocommit' or not...)

Around this point, one arguably not useful `await` is the one on `connection.cursor()`. I wanted to put some flesh around server-side cursors before making up my mind about it. Now I have, as I am working on a server-side cursor branch, and I don't think there are reasonable cases where `connection.cursor()` might do I/O.

So, in the server-side branch, I've made the function non-async on AsyncConnection too. This also makes the context manager nicer to use, as now you can do:

    async with aconn.cursor() as cur:
        ...  # use it

whereas previously it would have taken an `async with await aconn.cursor()`.

I assume the change is welcome, but please let me know if this is not the case.

Cheers

--
Daniele
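The mechanics behind this change can be sketched with toy classes (AsyncConnection/AsyncCursor here are hypothetical stand-ins, with a no-op coroutine replacing real server I/O): because creating the cursor object needs no I/O, cursor() can be a plain method returning an async context manager, and only close() awaits.

```python
import asyncio

class AsyncCursor:
    """Toy cursor: creation is pure Python; only close() does (fake) I/O."""
    def __init__(self, conn):
        self.conn = conn
        self.closed = False

    async def close(self):
        await asyncio.sleep(0)  # stands in for sending CLOSE to the server
        self.closed = True

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        await self.close()

class AsyncConnection:
    def cursor(self):
        # Non-async: no I/O is needed to create the cursor object, so
        # `async with aconn.cursor()` works without an extra await.
        return AsyncCursor(self)

async def demo():
    aconn = AsyncConnection()
    async with aconn.cursor() as cur:
        open_inside = not cur.closed
    return open_inside, cur.closed

print(asyncio.run(demo()))
```

Had cursor() been a coroutine, the caller would need `async with await aconn.cursor()`, which is the awkward form the change removes.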
Daniele Varrazzo a écrit :
> So, in the server-side branch, I've made the function non-async on AsyncConnection too. This also makes the context manager nicer to use, as now you can do:
>
>     async with aconn.cursor() as cur:
>         ...  # use it
>
> whereas previously it would have taken an `async with await aconn.cursor()`.
>
> I assume the change is welcome, but please let me know if this is not the case.

Yes, that change is welcome. Thanks!