Re: about client-side cursors - Mailing list psycopg

From Daniele Varrazzo
Subject Re: about client-side cursors
Date
Msg-id CA+mi_8a1ery2xvb+AdxMtVXydOdGC7wRMnfXNAvk8v73HsLFyw@mail.gmail.com
Whole thread Raw
In response to about client-side cursors  (Denis Laxalde <denis.laxalde@dalibo.com>)
Responses Re: about client-side cursors  (Daniele Varrazzo <daniele.varrazzo@gmail.com>)
Re: about client-side cursors  (Denis Laxalde <denis.laxalde@dalibo.com>)
Re: about client-side cursors  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
List psycopg
On Wed, 3 Feb 2021 at 17:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote:

> I'd like to discuss about the concept of client-side cursor as defined
> in psycopg.

Welcome Denis!


> This is a concept I've always been quite uncomfortable with;
> I'll try to explain why.
>
> First, from a practical point of view:
>
>    cur = connection.cursor()
>    cur.execute(...)
>    result = cur.fetchxxx()
>
> This typical pattern seems awkward to me. Where do IO operations
> actually take place? (on execute()? on fetch()? on both?)

For a client-side cursor, it happens on execute, but for a server-side
cursor it happens in both (execute creates the cursor, I have no idea
as to when the server decides to hit the disks, but for the client
every fetch is a roundtrip with the server).


> Then, and besides the terminology, the fact that named cursors are very
> different from client-side cursors while being constructed by the same
> API is confusing. I understand that this may come from the DB-API:
>
>   https://www.python.org/dev/peps/pep-0249/#cursor-objects
>   > [Cursor] objects represent a database cursor. [...]
>
> But this does not seem quite right to me since a client-side cursor is
> actually not a PostgreSQL cursor. (The documention on psycopg2 is clear
> on the difference, though.)

Yes, that's correct: the name comes from the dbapi. The dbapi cursor
does two different things indeed: it runs queries and manages results.
I understand that the overlap is sort of weird and everyone who has
said "psycopg sucks: let me write my own driver" have used the mental
model - much more familiar - of "I ask a query to a connection which
gives me a result". So usually what is created is something which can
be used as:

    result = conn.execute(query, vars)
    for record in result:
        ...

This is understandable, so much that in psycopg3 I've added
conn.execute, which is implemented as:

    def execute(self, query, vars)
        cur = self.cursor()
        cur.execute(query, vars)
        return cur

and allow you to use the cursor only to iterate on the data returned,
ignoring the execute*() methods.


> In asyncio context, while playing around with psycopg3, the situation is
> even worse because of the additional async/await keywords; for instance:
>
>    conn = await psycopg3.AsyncConnection.connect()
>    async with conn:
>        cur = await conn.execute("SELECT * FROM test")
>        r = await cur.fetchall()
>
> Why do we need two 'await' (i.e. two IO operations) on conn.execute()
> and cur.fetchall() if 'cur' is a client-side cursor? (Back to my first
> point about where IO happen, this appears to depend on whether the
> connection has 'autocommit' or not...)

I am going a bit back and forth on this mentally, although not in the
code. The forces pulling in different directions are:

1) if a method may possibly do I/O activity, then it should be async.
I don't know where I have read it and how strong is this guideline,
however it makes sense: it's better to have a function to call with
await where the await is actually not needed than a function which
should have been await but now you are stuck with a blocking
interface.

2) if we want to have sync and async cursor, it would be better if
they had the same interface

The point 2 is pretty true in psycopg2 because there is a single
cursor class, and every method is if'ed like:

    def fetchmany(self, nrecs):
        if not self.name:
            # client-side cursor
            res = self.result[self.current, self.current + nrecs]
            self.current += nrecs
            return res
        else:
            # server-side cursor
            result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}")
            return result

I don't think this is a good implementation, so in psycopg3 I would
like to have a Cursor interface with a ClientCursor and a ServerCursor
implementation, without the if for each method. In async mode,
AsyncClientCursor.fetchmany() doesn't actually need to be async, but
AsyncServerCursor.fetchmany() does I/O activity. However, would two
different implementation classes demand two different interfaces?


> All in all, my main point is: (why) do we need client-side cursors?

I think client-side cursors are the bread and butter of querying the
database. They are a lightweight and stateless way to receive a
result: you ask a question, you receive an answer, and all is over.
You can put caches between client server, connection pools...
Server-side cursors impy a state persisted across calls and shared
between clients and server, which is much more heavyweight, scales
less for the server, and makes the identity of the connection
precious.

If any I would do without the server-side cursor, but it would be
abdicating a responsibility that the driver should handle IMO, so
although I'm not a fan I will definitely implement them from psycopg3.


There are choices to be made, and it's one of those points for which
I'd be happy to hear other experiences; I haven't asked around yet
because I first wanted to implement a sketch of server-side cursors
and then look at the 4 class side to side (to side to side) and decide
what to do. Let's call them:

- SyncClientCursor
- AsyncClientCursor
- SyncServerCursor
- AsyncServerCursor

The Server versions have I/O done on fetch*() and on close(). So the
first three objects might have non-async fetch(). I was preferring to
give all them an uniform interface so the documentation would be "on
async communication, always use async when fetching", as opposite as
"on async use "await execute()", no await on fetch() for client-side
cursor, but "await fetch()` for server-side ones". In practice, I am
envisioning a SyncCursor object interface, with SyncClient and
SyncServer being two different implementations, and an AsyncCursor
interface, with AsyncClient and AsyncServer being two different
implementations:

    @abstract
    class SyncCursor:
        def execute()
        def fetch()

    class SyncClientCursor(SyncCursor):
        def execute()
        def fetch()

    class SyncClientCursor(SyncCursor):
        def execute()
        def fetch()

    @abstract
    class AsyncCursor:
        async def execute()
        async def fetch()

    class AsyncClientCursor(AsyncCursor):
        await def execute()
        await def fetch()

    class AsyncClientCursor(AsyncCursor):
        await def execute()
        await def fetch()

Doing without await on AsyncClientCursor.fetch() creates an asymmetry:

    class AsyncClientCursor:
        await def execute()
        def fetch()

    class AsyncClientCursor:
        await def execute()
        await def fetch()

so you couldn't write a function "processResults(acur)" independent of
whether the cursor is client or server, whereas you can do it no
problem for sync cursors.

That's my brain dump! @Denis: it's just more to explain current
implementation and mental processes, however in pg3 there is still
space to make changes and I'm happy to hear yours and others input. I
think that using await uniformly makes things easier to think about
and code to compose but I'm happy to hear if there are different
opinions.


-- Daniele



psycopg by date:

Previous
From: Karsten Hilbert
Date:
Subject: Re: about client-side cursors
Next
From: Christophe Pettus
Date:
Subject: Re: about client-side cursors