Re: about client-side cursors - Mailing list psycopg

From Daniele Varrazzo
Subject Re: about client-side cursors
Date
Msg-id CA+mi_8bUaY0kGj0Oy6HA4CYPoFs6jG2Hi=p2Joh4hgnAAGcR4A@mail.gmail.com
Whole thread Raw
In response to Re: about client-side cursors  (Denis Laxalde <denis.laxalde@dalibo.com>)
List psycopg
On Thu, 4 Feb 2021 at 12:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote:

> Daniele Varrazzo a écrit :

> Then, having fetch*() methods on client-side cursors is also misleading
> because nothing gets fetched actually, it's just result unpacking.

Arguably the naming is not correct, and they have been called that way
to fulfill the requirement of "emulating the cursor" as the dbapi
requires.


> > This is understandable, so much that in psycopg3 I've added
> > conn.execute, which is implemented as:

> I am aware of that and that's indeed a nice addition. As I mentioned in
> the thread introducing this feature, I still think that returning a
> (client-side) cursor here is confusing.

You can think about the cursor as a result in that case. I understand
your dissatisfaction with it is a) the names b) the async. Let's see
if there is something we can do around these points.


> It seems to me that there something wrong in either (or both) the DB-API
> or the way it got implemented in psycopg:
>
> * PEP-249 states that "cursor objects represent a database cursor",
>
> * it also states that "if the database does not provide a direct cursor
>   concept, the module will have to emulate cursors using other means to
>   the extent needed by this specification".
>
> But, unless I missed it, the PEP does not state how to implement query
> execution and result fetch operations if the database does not need a
> cursor.

I think that wasn't the point of the DBAPI. I believe that the DB-SIG
only wanted to create an interface which would be shared by different
adapters, either hiding the detail (requiring e.g. to emulate
client-side a missing feature) or providing description for the
differences. I wasn't around when things were fleshed out but I
believe it was the late 90s, and the leading database communication
models were much more heavyweight (DAO, ADO, ODBC?); programs would
have consisted of relatively few desktop (hundreds? thousands?) which
would "open a cursor" going up and down in the results using DB
cursors. Written in Visual Basic and the likes...

So I think the DBAPI doesn't specify an execution model and when the
IO has to happen because that would have worked for certain DBs and
not for others. But I believe that if a standard would be drafted now
would be a much more stateless one and wouldn't focus on cursors.
Fast-forward 20 years...


> It seems to me that connection.execute() provides an implementation to
> that case, except that it'd better if it did not return a misnamed
> cursor but some kind of "result set" or "rows" object. As you mentioned,
> some other Python drivers already do this.
>
> Alternatively (or in complement), how about renaming connection.cursor()
> as connection.client() for client-side cursors (while keep cursor() for
> server-side ones)? That would fulfill the second statement above, I
> think. (Or is it something closer to a "portal", as defined in the
> Extended Query protocol? Maybe not...)

... while I want to use the psycopg3 chance to improve what can be
done better than psycopg2, given the ways Python and Postgres have
changed in several years, I don't want to introduce gratuitous
changes. I would still like for psycopg3 to be a drop-in replacement
for psycopg2, and while there are differences enough to require
testing your application, I don't want to require the users to change
*every single call* to psycopg2. I understand that getone(),
getmany(), getall() would be more apt names than fetch*() in case
their implementation says they are client-side, but is it really worth
to ask the users to grep their codebases to replace all the
occurrences? As things stand now, a lot of relatively simple code can
run unmodified with either pg2 and 3

With the async objects we can have more freedom because there is no
precedent code (I believe the interface is the same exposed by aiopg,
the async psycopg2 wrapper, but maintaining compatibility with it is
not a thing I feel required).


> Looking around in other programming languages, the situation is similar:
>
> * Golang's libpq https://godocs.io/github.com/lib/pq follows
>   https://github.com/golang/go/wiki/SQLInterface : queries are executed
>   on the connection object and Rows object are returned.
>
> * In Rust, there is https://docs.rs/crate/postgres/0.19.0 (sync) which
>   is quite similar and https://docs.rs/tokio-postgres/0.7.0/tokio_postgres/
>   (async), which exposes a Client type to execute queries.
>
> With psycopg3 coming up, I believe it'd be a good opportunity to improve
> that.

tokio_postgres makes me curious about pipelining...

However those seem all interfaces which don't hide the nature of
cursorless querying. If we have an adapter which can work both cursor
and cursorless I think it is good that they offer the same interface
to obtain the results, even if the way they obtain has a different
implementation.



> > 1) if a method may possibly do I/O activity, then it should be async.
> > I don't know where I have read it and how strong is this guideline,
> > however it makes sense: it's better to have a function to call with
> > await where the await is actually not needed than a function which
> > should have been await but now you are stuck with a blocking
> > interface.
>
> I gently disagree here. This is essentially the same issue that having
> cursor.fetch*() not fetching anything, but it's worse than "just" a
> naming problem because it's on the control flow.
>
> For instance, consider the following examples with and without
> autocommit on the connection to demonstrate where IO takes place:
>
>     conn = await psycopg3.AsyncConnection.connect(autocommit=False)
>     async with conn:
>         cur = await conn.execute("SELECT 1")
>         rows = await cur.fetchall()  # IO happens here
>
>     conn = await psycopg3.AsyncConnection.connect(autocommit=True)
>     async with conn:
>         cur = await conn.execute("SELECT 1")  # IO happens here
>         rows = await cur.fetchall()
>
> This is quite unexpected, and behaves differently on sync cursors (IO
> always happens on execute()).

I think that's wrong, I don't believe it changes with autocommit.
No-autocommit fires an extra begin together with the execute but
doesn't change the fetch behaviour. IO on execute happens regardless
of the type of connection and cursor, IO on fetch happens only for
server-side cursors.


> > 2) if we want to have client and server cursor, it would be better if
> > they had the same interface
>
> (Unless we drop the client-side cursor, or rename it.)
>
> > The point 2 is pretty true in psycopg2 because there is a single
> > cursor class, and every method is if'ed like:
> >
> >     def fetchmany(self, nrecs):
> >         if not self.name:
> >             # client-side cursor
> >             res = self.result[self.current, self.current + nrecs]
> >             self.current += nrecs
> >             return res
> >         else:
> >             # server-side cursor
> >             result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}")
> >             return result
> >
> > I don't think this is a good implementation, so in psycopg3 I would
> > like to have a Cursor interface with a ClientCursor and a ServerCursor
> > implementation, without the if for each method. In async mode,
> > AsyncClientCursor.fetchmany() doesn't actually need to be async, but
> > AsyncServerCursor.fetchmany() does I/O activity. However, would two
> > different implementation classes demand two different interfaces?
>
> These are not just two different implementations, these are two
> different things. So, yes, different interfaces (and names) would be
> better, IMHO.

But for an user who is not overly concerned with when I/O happen
(assuming that they know where it happens and that they are fine with
it, not wanting to keep them in the dark) having the same interface
means that their application can be written without being concerned by
whether a server or client cursor was used.

You seem to want to implement two diverging and leaky interfaces,
modeled on what effectively happens in I/O and not providing any
leeway. The price to pay for it is to have the async keyword on
methods which in a specific implementation don't actually do I/O but
1) it's cheap, you don't pay much for it, it doesn't really do I/O and
2) it allows the possibility to write a cursor subclass which actually
does IO on input, for instance a catalog lookup on fetch(). But even
logging to a remote logger is an I/O operation, so if you don't have
`async fetchone()` you cannot write a subclass of these objects to do
it: you have to write a wrapper instead, with a necessarily different
interface. I think we would be painting ourselves in a corner.


> > > All in all, my main point is: (why) do we need client-side cursors?
> >
> > I think client-side cursors are the bread and butter of querying the
> > database. They are a lightweight and stateless way to receive a
> > result: you ask a question, you receive an answer, and all is over.
>
> Ok. But then, they are just a proxy to the actual results already
> fetched upon execute() with no added value, aren't they?

Yes, they are just a useful container for a resultset. It's a useful
object to have, regardless of its name.


> > You can put caches between client server, connection pools...
>
> How is this related to client-side cursors?

They are things you cannot do with a server-side cursor, and in my
experience much more frequently used. When I say "cursor" I mean
"result" here, really, as in "you can have these things if you fetch
your result entirely".


> So my humble opinion is that naming and control flow should be explicit.
> For instance, we could drop the fetch*() method from client-side cursors
> (to be renamed) and add result access methods:
>
>     class AsyncClient:
>         await def execute()
>         def first()  # return first row
>         def all()  # return all rows
>
> while keeping the cursor term for real cursors:
>
>     class AsyncCursor:
>         await def execute()
>         await def fetch()

This requires everyone to rewrite their program in order to use
psycopg3. I think that the adoption of the adapter would be zero if
there isn't a frictionless way to use the new version, and I am trying
to design the adapter to be easy to upgrade. This might imply some non
trivial work internally but I think it's necessary if we want a
non-zero chance of the adapter to be considered by the industry.

The names and concepts you propose are surely better connected to the
reality of the implementation of a cursor-less query, but I think
that's not a strong enough reason to lose the whole user base. asyncpg
uses the same concept: it is a much thinner interface, it is designed
around the fe-be implementation rather than to offer an interface
that, as imprecise as it is, it's what people have used for 20 years
but, as it requires a complete rewrite of all the queries and of all
the code interacting with the database, it doesn't seem it has
attracted many users.

> Thanks for you detailed and kind reply, Daniele. Also, besides
> discussing here (kind of a self-introduction), I'd be happy to help out
> code-wise on this particular aspect or another.

Your help would be very appreciated, thank you :)

If you, or others, would like to help, a few open areas of work are
listed here: https://github.com/psycopg/psycopg3/projects/1 Let me
write a separate message with the state of the works...

-- Daniele



psycopg by date:

Previous
From: Daniele Varrazzo
Date:
Subject: Latest developments in psycopg3
Next
From: Denis Laxalde
Date:
Subject: Re: about client-side cursors