Re: about client-side cursors - Mailing list psycopg

From Denis Laxalde
Subject Re: about client-side cursors
Date
Msg-id 20210204172135.bl2boudpyw4vfahv@dalibo.com
Whole thread Raw
In response to about client-side cursors  (Denis Laxalde <denis.laxalde@dalibo.com>)
Responses Re: about client-side cursors  (Christophe Pettus <xof@thebuild.com>)
List psycopg
Daniele Varrazzo a écrit :
> On Thu, 4 Feb 2021 at 12:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote:
> 
> > Daniele Varrazzo a écrit :
> > > This is understandable, so much that in psycopg3 I've added
> > > conn.execute, which is implemented as:
> 
> > I am aware of that and that's indeed a nice addition. As I mentioned in
> > the thread introducing this feature, I still think that returning a
> > (client-side) cursor here is confusing.
> 
> You can think about the cursor as a result in that case. I understand
> your dissatisfaction with it is a) the names b) the async. Let's see
> if there is something we can do around these points.
> 
> [...]
> > It seems to me that connection.execute() provides an implementation to
> > that case, except that it'd better if it did not return a misnamed
> > cursor but some kind of "result set" or "rows" object. As you mentioned,
> > some other Python drivers already do this.
> >
> > Alternatively (or in complement), how about renaming connection.cursor()
> > as connection.client() for client-side cursors (while keep cursor() for
> > server-side ones)? That would fulfill the second statement above, I
> > think. (Or is it something closer to a "portal", as defined in the
> > Extended Query protocol? Maybe not...)
> 
> ... while I want to use the psycopg3 chance to improve what can be
> done better than psycopg2, given the ways Python and Postgres have
> changed in several years, I don't want to introduce gratuitous
> changes. I would still like for psycopg3 to be a drop-in replacement
> for psycopg2, and while there are differences enough to require
> testing your application, I don't want to require the users to change
> *every single call* to psycopg2. I understand that getone(),
> getmany(), getall() would be more apt names than fetch*() in case
> their implementation says they are client-side, but is it really worth
> to ask the users to grep their codebases to replace all the
> occurrences? As things stand now, a lot of relatively simple code can
> run unmodified with either pg2 and 3

I understand that; keeping the user base by providing a smooth migration
path is highly important. (As I said in another message, I'm not
suggesting to entirely drop the compatibility.)

So renaming client-side cursors to something else is not an option
(too much of a "gratuitous" change), that's a fair point.

However, could we have connection.execute() not return a cursor (with
fetch*() methods)? It seems like a thin adaptation to me, apparently not
needing bw compat, and a good step forwards "cursorless querying".

> With the async objects we can have more freedom because there is no
> precedent code (I believe the interface is the same exposed by aiopg,
> the async psycopg2 wrapper, but maintaining compatibility with it is
> not a thing I feel required).
> 
> 
> > Looking around in other programming languages, the situation is similar:
> >
> > * Golang's libpq https://godocs.io/github.com/lib/pq follows
> >   https://github.com/golang/go/wiki/SQLInterface : queries are executed
> >   on the connection object and Rows object are returned.
> >
> > * In Rust, there is https://docs.rs/crate/postgres/0.19.0 (sync) which
> >   is quite similar and https://docs.rs/tokio-postgres/0.7.0/tokio_postgres/
> >   (async), which exposes a Client type to execute queries.
> >
> > With psycopg3 coming up, I believe it'd be a good opportunity to improve
> > that.
> 
> tokio_postgres makes me curious about pipelining...
> 
> However those seem all interfaces which don't hide the nature of
> cursorless querying. If we have an adapter which can work both cursor
> and cursorless I think it is good that they offer the same interface
> to obtain the results, even if the way they obtain has a different
> implementation.

This concept of "cursorless querying" is new to me and I somehow see the
situation reversed: why would we expose an artificial thing (the
client-side cursor)? (Rhetorical question, feel free to ignore.)

We should make "cursorless querying" more explicit! (Building on
connection.execute().)

> > > 1) if a method may possibly do I/O activity, then it should be async.
> > > I don't know where I have read it and how strong is this guideline,
> > > however it makes sense: it's better to have a function to call with
> > > await where the await is actually not needed than a function which
> > > should have been await but now you are stuck with a blocking
> > > interface.
> >
> > I gently disagree here. This is essentially the same issue that having
> > cursor.fetch*() not fetching anything, but it's worse than "just" a
> > naming problem because it's on the control flow.
> >
> > For instance, consider the following examples with and without
> > autocommit on the connection to demonstrate where IO takes place:
> >
> >     conn = await psycopg3.AsyncConnection.connect(autocommit=False)
> >     async with conn:
> >         cur = await conn.execute("SELECT 1")
> >         rows = await cur.fetchall()  # IO happens here
> >
> >     conn = await psycopg3.AsyncConnection.connect(autocommit=True)
> >     async with conn:
> >         cur = await conn.execute("SELECT 1")  # IO happens here
> >         rows = await cur.fetchall()
> >
> > This is quite unexpected, and behaves differently on sync cursors (IO
> > always happens on execute()).
> 
> I think that's wrong, I don't believe it changes with autocommit.
> No-autocommit fires an extra begin together with the execute but
> doesn't change the fetch behaviour. IO on execute happens regardless
> of the type of connection and cursor, IO on fetch happens only for
> server-side cursors.

Well, maybe I'm missing something... In the examples above, (written
down explicitly to understand where IO happens), if I shut down postgres
between 'await conn.execute()' and 'await cur.fetchall()', the first
example breaks but the second doesn't. Perhaps the autocommit mention
was misleading; it's enough to insert 'await conn.commit()' before
'await cur.fetchall()' to reproduce. So (and again, unless I'm missing
something), if this is not "by design", maybe this is bug?

> > > 2) if we want to have client and server cursor, it would be better if
> > > they had the same interface
> >
> > (Unless we drop the client-side cursor, or rename it.)
> >
> > > The point 2 is pretty true in psycopg2 because there is a single
> > > cursor class, and every method is if'ed like:
> > >
> > >     def fetchmany(self, nrecs):
> > >         if not self.name:
> > >             # client-side cursor
> > >             res = self.result[self.current, self.current + nrecs]
> > >             self.current += nrecs
> > >             return res
> > >         else:
> > >             # server-side cursor
> > >             result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}")
> > >             return result
> > >
> > > I don't think this is a good implementation, so in psycopg3 I would
> > > like to have a Cursor interface with a ClientCursor and a ServerCursor
> > > implementation, without the if for each method. In async mode,
> > > AsyncClientCursor.fetchmany() doesn't actually need to be async, but
> > > AsyncServerCursor.fetchmany() does I/O activity. However, would two
> > > different implementation classes demand two different interfaces?
> >
> > These are not just two different implementations, these are two
> > different things. So, yes, different interfaces (and names) would be
> > better, IMHO.
> 
> But for an user who is not overly concerned with when I/O happen
> (assuming that they know where it happens and that they are fine with
> it, not wanting to keep them in the dark) having the same interface
> means that their application can be written without being concerned by
> whether a server or client cursor was used.
> 
> You seem to want to implement two diverging and leaky interfaces,
> modeled on what effectively happens in I/O and not providing any
> leeway. The price to pay for it is to have the async keyword on
> methods which in a specific implementation don't actually do I/O but
> 1) it's cheap, you don't pay much for it, it doesn't really do I/O and
> 2) it allows the possibility to write a cursor subclass which actually
> does IO on input, for instance a catalog lookup on fetch(). But even
> logging to a remote logger is an I/O operation, so if you don't have
> `async fetchone()` you cannot write a subclass of these objects to do
> it: you have to write a wrapper instead, with a necessarily different
> interface. I think we would be painting ourselves in a corner.

Sorry, I don't buy that point. To me, slightly exaggerating, it seems
like we should make any methods of an interface 'async' just in case
someones wants to fire some IO in a subclass. In other words, having IO
entry-points properly identified follows the principle of least
surprise (or "explicit is better than implicit", to take over from the
famous zen).

> > > > All in all, my main point is: (why) do we need client-side cursors?
> > >
> > > I think client-side cursors are the bread and butter of querying the
> > > database. They are a lightweight and stateless way to receive a
> > > result: you ask a question, you receive an answer, and all is over.
> >
> > Ok. But then, they are just a proxy to the actual results already
> > fetched upon execute() with no added value, aren't they?
> 
> Yes, they are just a useful container for a resultset. It's a useful
> object to have, regardless of its name.
> 
> 
> > > You can put caches between client server, connection pools...
> >
> > How is this related to client-side cursors?
> 
> They are things you cannot do with a server-side cursor, and in my
> experience much more frequently used. When I say "cursor" I mean
> "result" here, really, as in "you can have these things if you fetch
> your result entirely".
> 
> 
> > So my humble opinion is that naming and control flow should be explicit.
> > For instance, we could drop the fetch*() method from client-side cursors
> > (to be renamed) and add result access methods:
> >
> >     class AsyncClient:
> >         await def execute()
> >         def first()  # return first row
> >         def all()  # return all rows
> >
> > while keeping the cursor term for real cursors:
> >
> >     class AsyncCursor:
> >         await def execute()
> >         await def fetch()
> 
> This requires everyone to rewrite their program in order to use
> psycopg3. I think that the adoption of the adapter would be zero if
> there isn't a frictionless way to use the new version, and I am trying
> to design the adapter to be easy to upgrade. This might imply some non
> trivial work internally but I think it's necessary if we want a
> non-zero chance of the adapter to be considered by the industry.

As far as the async interface is concerned, I think there is no adoption
issue because there's no precedent use from psycopg2. So we could
expose two API: cursorless querying ('await conn.execute()') and have a
single server-side cursor class.

Cheers,
Denis



psycopg by date:

Previous
From: Daniele Varrazzo
Date:
Subject: Re: about client-side cursors
Next
From: Christophe Pettus
Date:
Subject: Re: about client-side cursors