Re: [PERFORM] Correct use of cursors for very large result sets in Postgres - Mailing list pgsql-performance

From Vitalii Tymchyshyn
Subject Re: [PERFORM] Correct use of cursors for very large result sets in Postgres
Date
Msg-id CABWW-d3xywucp_rCnTOe4aoUknL0TqmLB1ucDf7v1Khniynt+A@mail.gmail.com
Whole thread Raw
In response to Re: [PERFORM] Correct use of cursors for very large result sets inPostgres  (John Gorman <jgorman@eldocomp.com>)
Responses Re: [PERFORM] Correct use of cursors for very large result sets inPostgres
List pgsql-performance
For JDBC there are certain prerequisites for setFetchSize to work, e.g. using forward only result sets and transactions.

вт, 21 лют. 2017 о 09:06 John Gorman <jgorman@eldocomp.com> пише:

My experience with cursors in PostgreSQL with Java has been to stay away from them. We support 2 databases with our product, PostgreSQL (default) and SQL Server. While re-encrypting data in a database the application used cursors with a fetch size of 1000.

 

Worked perfectly on SQL Server and on PostgreSQL until we got to a PostgreSQL table with more than 11 million rows. After spending weeks trying to figure out what was happening, I realized that when it gets to a table with more than 10 million rows for some reason, the cursor functionality just silently stopped working and it was reading the entire table. I asked another very senior architect to look at it and he came to the same conclusion. Because of limited time, I ended up working around it using limit/offset.

 

Again we are using Java, so the problem could just be in the PostgreSQL JDBC driver. Also we were on 9.1 at the time.

 

Regards

John

 

From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Mike Beaton
Sent: Tuesday, February 21, 2017 6:49 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

 

Thanks, Tom.

Wouldn't this mean that cursors are noticeably non-optimal even for normal data sizes, since the entire data to be streamed from the table is always duplicated into another buffer and then streamed?

 

if you want the whole query result at once, why are you bothering with a cursor?

 

The PostgreSQL docs (https://www.postgresql.org/docs/9.6/static/plpgsql-cursors.html#AEN66551) clearly recommend cursors as a way to return a reference to a large result set from a function (as I understood, this is recommended precisely as a way to avoid tuple-based buffering of the data).

 

So following that advice, it's not unreasonable that I would actually have a cursor to a large dataset.

 

Then, I would ideally want to be able to fetch the data from that cursor without the entire data getting duplicated (even if only a bit at a time instead of all at once, which seems to be the best case behaviour) as I go.

 

Additionally, I thought that if I had a streaming use-case (which I do), and a streaming data-access layer (which I do), then since `SELECT * FROM large` is absolutely fine, end-to-end, in that situation, then by symmetry and the principle of least astonishment `FETCH ALL FROM cursor` might be fine too.

 

pgsql-performance by date:

Previous
From: "Wes Vaske (wvaske)"
Date:
Subject: Re: [PERFORM] Suggestions for a HBA controller (6 x SSDs + madamRAID10)
Next
From: John Gorman
Date:
Subject: Re: [PERFORM] Correct use of cursors for very large result sets inPostgres