Re: about client-side cursors - Mailing list psycopg
From | Daniele Varrazzo |
---|---|
Subject | Re: about client-side cursors |
Date | |
Msg-id | CA+mi_8bUaY0kGj0Oy6HA4CYPoFs6jG2Hi=p2Joh4hgnAAGcR4A@mail.gmail.com Whole thread Raw |
In response to | Re: about client-side cursors (Denis Laxalde <denis.laxalde@dalibo.com>) |
List | psycopg |
On Thu, 4 Feb 2021 at 12:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote: > Daniele Varrazzo a écrit : > Then, having fetch*() methods on client-side cursors is also misleading > because nothing gets fetched actually, it's just result unpacking. Arguably the naming is not correct, and they have been called that way to fulfill the requirement of "emulating the cursor" as the dbapi requires. > > This is understandable, so much that in psycopg3 I've added > > conn.execute, which is implemented as: > I am aware of that and that's indeed a nice addition. As I mentioned in > the thread introducing this feature, I still think that returning a > (client-side) cursor here is confusing. You can think about the cursor as a result in that case. I understand your dissatisfaction with it is a) the names b) the async. Let's see if there is something we can do around these points. > It seems to me that there something wrong in either (or both) the DB-API > or the way it got implemented in psycopg: > > * PEP-249 states that "cursor objects represent a database cursor", > > * it also states that "if the database does not provide a direct cursor > concept, the module will have to emulate cursors using other means to > the extent needed by this specification". > > But, unless I missed it, the PEP does not state how to implement query > execution and result fetch operations if the database does not need a > cursor. I think that wasn't the point of the DBAPI. I believe that the DB-SIG only wanted to create an interface which would be shared by different adapters, either hiding the detail (requiring e.g. to emulate client-side a missing feature) or providing description for the differences. I wasn't around when things were fleshed out but I believe it was the late 90s, and the leading database communication models were much more heavyweight (DAO, ADO, ODBC?); programs would have consisted of relatively few desktop (hundreds? thousands?) which would "open a cursor" going up and down in the results using DB cursors. Written in Visual Basic and the likes... So I think the DBAPI doesn't specify an execution model and when the IO has to happen because that would have worked for certain DBs and not for others. But I believe that if a standard would be drafted now would be a much more stateless one and wouldn't focus on cursors. Fast-forward 20 years... > It seems to me that connection.execute() provides an implementation to > that case, except that it'd better if it did not return a misnamed > cursor but some kind of "result set" or "rows" object. As you mentioned, > some other Python drivers already do this. > > Alternatively (or in complement), how about renaming connection.cursor() > as connection.client() for client-side cursors (while keep cursor() for > server-side ones)? That would fulfill the second statement above, I > think. (Or is it something closer to a "portal", as defined in the > Extended Query protocol? Maybe not...) ... while I want to use the psycopg3 chance to improve what can be done better than psycopg2, given the ways Python and Postgres have changed in several years, I don't want to introduce gratuitous changes. I would still like for psycopg3 to be a drop-in replacement for psycopg2, and while there are differences enough to require testing your application, I don't want to require the users to change *every single call* to psycopg2. I understand that getone(), getmany(), getall() would be more apt names than fetch*() in case their implementation says they are client-side, but is it really worth to ask the users to grep their codebases to replace all the occurrences? As things stand now, a lot of relatively simple code can run unmodified with either pg2 and 3 With the async objects we can have more freedom because there is no precedent code (I believe the interface is the same exposed by aiopg, the async psycopg2 wrapper, but maintaining compatibility with it is not a thing I feel required). > Looking around in other programming languages, the situation is similar: > > * Golang's libpq https://godocs.io/github.com/lib/pq follows > https://github.com/golang/go/wiki/SQLInterface : queries are executed > on the connection object and Rows object are returned. > > * In Rust, there is https://docs.rs/crate/postgres/0.19.0 (sync) which > is quite similar and https://docs.rs/tokio-postgres/0.7.0/tokio_postgres/ > (async), which exposes a Client type to execute queries. > > With psycopg3 coming up, I believe it'd be a good opportunity to improve > that. tokio_postgres makes me curious about pipelining... However those seem all interfaces which don't hide the nature of cursorless querying. If we have an adapter which can work both cursor and cursorless I think it is good that they offer the same interface to obtain the results, even if the way they obtain has a different implementation. > > 1) if a method may possibly do I/O activity, then it should be async. > > I don't know where I have read it and how strong is this guideline, > > however it makes sense: it's better to have a function to call with > > await where the await is actually not needed than a function which > > should have been await but now you are stuck with a blocking > > interface. > > I gently disagree here. This is essentially the same issue that having > cursor.fetch*() not fetching anything, but it's worse than "just" a > naming problem because it's on the control flow. > > For instance, consider the following examples with and without > autocommit on the connection to demonstrate where IO takes place: > > conn = await psycopg3.AsyncConnection.connect(autocommit=False) > async with conn: > cur = await conn.execute("SELECT 1") > rows = await cur.fetchall() # IO happens here > > conn = await psycopg3.AsyncConnection.connect(autocommit=True) > async with conn: > cur = await conn.execute("SELECT 1") # IO happens here > rows = await cur.fetchall() > > This is quite unexpected, and behaves differently on sync cursors (IO > always happens on execute()). I think that's wrong, I don't believe it changes with autocommit. No-autocommit fires an extra begin together with the execute but doesn't change the fetch behaviour. IO on execute happens regardless of the type of connection and cursor, IO on fetch happens only for server-side cursors. > > 2) if we want to have client and server cursor, it would be better if > > they had the same interface > > (Unless we drop the client-side cursor, or rename it.) > > > The point 2 is pretty true in psycopg2 because there is a single > > cursor class, and every method is if'ed like: > > > > def fetchmany(self, nrecs): > > if not self.name: > > # client-side cursor > > res = self.result[self.current, self.current + nrecs] > > self.current += nrecs > > return res > > else: > > # server-side cursor > > result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}") > > return result > > > > I don't think this is a good implementation, so in psycopg3 I would > > like to have a Cursor interface with a ClientCursor and a ServerCursor > > implementation, without the if for each method. In async mode, > > AsyncClientCursor.fetchmany() doesn't actually need to be async, but > > AsyncServerCursor.fetchmany() does I/O activity. However, would two > > different implementation classes demand two different interfaces? > > These are not just two different implementations, these are two > different things. So, yes, different interfaces (and names) would be > better, IMHO. But for an user who is not overly concerned with when I/O happen (assuming that they know where it happens and that they are fine with it, not wanting to keep them in the dark) having the same interface means that their application can be written without being concerned by whether a server or client cursor was used. You seem to want to implement two diverging and leaky interfaces, modeled on what effectively happens in I/O and not providing any leeway. The price to pay for it is to have the async keyword on methods which in a specific implementation don't actually do I/O but 1) it's cheap, you don't pay much for it, it doesn't really do I/O and 2) it allows the possibility to write a cursor subclass which actually does IO on input, for instance a catalog lookup on fetch(). But even logging to a remote logger is an I/O operation, so if you don't have `async fetchone()` you cannot write a subclass of these objects to do it: you have to write a wrapper instead, with a necessarily different interface. I think we would be painting ourselves in a corner. > > > All in all, my main point is: (why) do we need client-side cursors? > > > > I think client-side cursors are the bread and butter of querying the > > database. They are a lightweight and stateless way to receive a > > result: you ask a question, you receive an answer, and all is over. > > Ok. But then, they are just a proxy to the actual results already > fetched upon execute() with no added value, aren't they? Yes, they are just a useful container for a resultset. It's a useful object to have, regardless of its name. > > You can put caches between client server, connection pools... > > How is this related to client-side cursors? They are things you cannot do with a server-side cursor, and in my experience much more frequently used. When I say "cursor" I mean "result" here, really, as in "you can have these things if you fetch your result entirely". > So my humble opinion is that naming and control flow should be explicit. > For instance, we could drop the fetch*() method from client-side cursors > (to be renamed) and add result access methods: > > class AsyncClient: > await def execute() > def first() # return first row > def all() # return all rows > > while keeping the cursor term for real cursors: > > class AsyncCursor: > await def execute() > await def fetch() This requires everyone to rewrite their program in order to use psycopg3. I think that the adoption of the adapter would be zero if there isn't a frictionless way to use the new version, and I am trying to design the adapter to be easy to upgrade. This might imply some non trivial work internally but I think it's necessary if we want a non-zero chance of the adapter to be considered by the industry. The names and concepts you propose are surely better connected to the reality of the implementation of a cursor-less query, but I think that's not a strong enough reason to lose the whole user base. asyncpg uses the same concept: it is a much thinner interface, it is designed around the fe-be implementation rather than to offer an interface that, as imprecise as it is, it's what people have used for 20 years but, as it requires a complete rewrite of all the queries and of all the code interacting with the database, it doesn't seem it has attracted many users. > Thanks for you detailed and kind reply, Daniele. Also, besides > discussing here (kind of a self-introduction), I'd be happy to help out > code-wise on this particular aspect or another. Your help would be very appreciated, thank you :) If you, or others, would like to help, a few open areas of work are listed here: https://github.com/psycopg/psycopg3/projects/1 Let me write a separate message with the state of the works... -- Daniele