Thread: [PERFORM] Correct use of cursors for very large result sets in Postgres

[PERFORM] Correct use of cursors for very large result sets in Postgres

From: Mike Beaton

**Short version of my question:**

If I hold a cursor reference to an astronomically huge result set in my client code, would it be ridiculous (i.e. would it completely defeat the point of cursors) to issue "FETCH ALL FROM cursorname" as my next command? Or would this slowly stream the data back to me as I consume it (at least in principle, assuming that I have a well-written driver sitting between me and Postgres)?

**More detail**

If I understand things at all correctly, then Postgres cursors are REALLY for dealing with the following problem [even though they can be used (abused?) for other things, such as returning multiple different result sets from one function]:

> Note: The current implementation of RETURN NEXT and RETURN QUERY
> stores the entire result set before returning from the function, as
> discussed above. That means that if a PL/pgSQL function produces a
> very large result set, performance might be poor: data will be written
> to disk to avoid memory exhaustion, but the function itself will not
> return until the entire result set has been generated.


But (again, if I understand correctly) when you write a function which returns a cursor, the whole result set is NOT buffered into memory (and disk) before the user of the function can start to consume anything; instead, the results can be consumed bit by bit. (There is more overhead in setting up and using the cursor, but it's worth it to avoid a massive buffer allocation for very large result sets.)
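For concreteness, the contrast I have in mind is roughly between these two shapes of function (table and function names here are just placeholders):

````
-- buffers the entire result set before returning, per the docs note quoted above
CREATE OR REPLACE FUNCTION all_rows_buffered() RETURNS SETOF huge_table
   LANGUAGE plpgsql AS
$$BEGIN
   RETURN QUERY SELECT * FROM huge_table;
END;$$;

-- returns only a cursor reference; the caller can then FETCH from it bit by bit
CREATE OR REPLACE FUNCTION all_rows_cursor() RETURNS refcursor
   LANGUAGE plpgsql AS
$$DECLARE
   c refcursor := 'all_rows_cur';
BEGIN
   OPEN c FOR SELECT * FROM huge_table;
   RETURN c;
END;$$;
````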


I would like to understand how this relates to SELECTs and FETCHes over the wire to a Postgres server.

In all cases, I'm talking about consuming results from client code which is communicating with Postgres on a socket behind the scenes (using the Npgsql library, in my case).

Q1: What if I try to execute "SELECT * FROM AstronomicallyLargeTable" as my only command over the wire to Postgres? Will that allocate all the memory for the entire select and then start to send data back to me? Or will it (effectively) generate its own cursor and stream the data back a little at a time (with no huge additional buffer allocation on the server)?

Q2: What if I already have a cursor reference to an astronomically large result set (say because I've already done one round trip, and got back the cursor reference from some function), and then I execute "FETCH ALL FROM cursorname" over the wire to Postgres? Is that stupid, because it will allocate ALL the memory for all the results *on the Postgres server* before sending anything back to me? Or will "FETCH ALL FROM cursorname" actually work as I'd like it to, streaming the data back slowly as I consume it, without any massive buffer allocation happening on the Postgres server?
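To spell out the wire-level sequence I mean for Q2 (function and cursor names are placeholders, and I'm assuming the function opens its cursor under a fixed name):

````
BEGIN;
SELECT get_cursor_to_astronomically_large_result_set();  -- round trip 1: returns refcursor 'huge_cur'
FETCH ALL FROM huge_cur;                                  -- round trip 2: the command I'm asking about
COMMIT;
````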

Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Mike Beaton

I asked the same question at the same time on Stack Overflow (sincere apologies if this is a breach of etiquette - I really needed an answer, and I thought the two communities might not overlap).

Stack Overflow now has an answer, by me: http://stackoverflow.com/q/42292341/#42297234 - which is based on accumulating the most consistent, coherent information from the answers and comments given there so far.

I think this is right, and I happily repeat it below, for anyone finding my question on this list. But I would still *love* to find official PostgreSQL documentation of all this. And of course to be told - quickly! - if anyone knows it is still wrong.

***The answer is:***

**Q1:** For `SELECT * FROM AstronomicallyHugeTable` sent over the wire, PostgreSQL will *not* generate a huge buffer; it will stream the data efficiently, starting quickly, to the client.

**Q2:** For `FETCH ALL FROM CursorToAstronomicallyHugeTable` sent over the wire, PostgreSQL will likewise *not* generate a huge buffer, and will stream the data efficiently, starting quickly, to the client.

**Implications of this for `FETCH ALL FROM cursor`**

IF (and this is a big if) you have client software which is NOT going to store all the fetched data anywhere, but is just trying to do something with it row by row (and this presupposes that your data access layer supports this, which Npgsql does), then there is nothing wrong with `FETCH ALL FROM cursor`. No huge buffers anywhere. No long setup time. Processing huge data this way will certainly run for a very long time - or at least until the user or some other condition aborts the process, and the cursor can be closed. But it will start to run quickly, and its usage of resources will be efficient.

**WARNINGS**

It would *never* make sense to do `FETCH ALL FROM cursor` for astronomically large data, if your client-side code (including your data access layer) has any bottleneck at all which means that all the data from a command is fetched before any processing can be done. Many data access layers (and especially data access wrappers) are like this. So beware. But it is also true that not all client-side code is made this way.

Returning huge data using a `TABLE` or `SETOF` return type from within a PostgreSQL function will *always* be broken (i.e. will create a huge buffer and take a very long time to start). This will be so whether the function is called from within SQL or called over the wire. The bottleneck is before the function returns. For efficient returns of very large data sets you must use a cursor return from a function (or else do `SELECT *` directly over the wire), in every case.

Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Tom Lane

Mike Beaton <mjsbeaton@gmail.com> writes:
> [ generally accurate information ]

> **WARNINGS**

> It would *never* make sense to do `FETCH ALL FROM cursor` for
> astronomically large data, if your client-side code (including your data
> access layer) has any bottleneck at all which means that all the data
> from a command is fetched before any processing can be done. Many data
> access layers (and especially data access wrappers) are like this. So
> beware. But it is also true that not all client-side code is made this way.

It would probably be good to point out that most client-side libraries
will do it that way, including libpq, because then they can make success
or failure of the query look atomic to the application.  If you use an
API that lets you see rows as they come off the wire, it's up to you
to recover properly from a query failure that occurs after some/many rows
have already been returned.

> Returning huge data using a `TABLE` or `SETOF` return type from within a
> PostgreSQL function will *always* be broken (i.e. will create a huge buffer
> and take a very long time to start). This will be so whether the function
> is called from within SQL or called over the wire.

I believe this is false in general.  I think it's probably true for all
the standard PL languages, because they don't want to bother with
suspending/resuming execution, so they make "RETURN NEXT" add the row to
a tuplestore not return it immediately.  But it's definitely possible to
write a C function that returns a row at a time, and depending on what the
calling SQL statement looks like, that could get streamed back to the
client live rather than being buffered first.

As a trivial example, if you do
    select generate_series(1,100000000);
in psql and watch what's happening with "top", you'll see psql's memory
usage going through the roof (because libpq tries to buffer the result)
but the connected backend's memory usage is steady as a rock --- nor
does it dump the data into a temporary file.  On the other hand,
    select * from generate_series(1,100000000);
does dump the data into a temp file, something we ought to work on
improving.

            regards, tom lane


Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Mike Beaton

Dear Tom,

This is very helpful, thank you.

You make a very useful point that the limitation is basically on PL/pgSQL and the other PL languages. And someone on SO already pointed out that an inline SQL function with an enormous TABLE return value also doesn't have any buffering problems. So that's a very convenient option, whenever SQL alone is powerful enough.
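For the record, the SQL-language version of that looks something like this (table and column names are placeholders); because it can be inlined into the calling query, it behaves like the underlying SELECT rather than building up a tuplestore first:

````
CREATE OR REPLACE FUNCTION all_rows_inline() RETURNS TABLE (id integer)
   LANGUAGE sql AS
$$SELECT id FROM huge_table$$;

SELECT * FROM all_rows_inline();  -- streams to the client much like SELECT id FROM huge_table
````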

You make the further very helpful point that any library which is written using `libpq` won't work as desired on `FETCH ALL FROM HugeCursor`. But I don't know whether that's 'most' libraries. I think that depends on your programming milieu! I'm working in the world of ADO.NET (but the same seems to apply to JDBC), where most low-level drivers are not written using `libpq` but rather talk to the database directly over sockets - which makes sense, because a streaming data reader is part of the contract which those drivers have to implement.

It's definitely worth noting that the `FETCH 100000 FROM cursor` until exhausted pattern will *always* be safe. But most fundamentally I did, very specifically, want to know whether the `FETCH ALL FROM CursorToAstronomicallyLargeData` pattern can *ever* work sensibly. It seems it clearly can and does, if certain assumptions are met - assumptions which I actually know *are* met in the case where I potentially wanted to use it!
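Spelled out, the always-safe pattern I mean is just this (cursor name, table name and batch size here are arbitrary):

````
BEGIN;
DECLARE huge_cur CURSOR FOR SELECT * FROM huge_table;
FETCH 10000 FROM huge_cur;   -- process this batch client-side...
FETCH 10000 FROM huge_cur;   -- ...and repeat until a batch comes back short
CLOSE huge_cur;
COMMIT;
````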

One outstanding question I have. Based on a lot of helpful responses given to the SO question I can now test and see what disk buffers are generated (by setting `log_temp_files` to `0` and then `tail -f log`), as well as how long it takes for results to start arriving.
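(For anyone wanting to reproduce this, the relevant setting is just the following, run as superuser in the session being watched, before tailing the server log:)

````
-- log the creation of every temporary file, whatever its size (SUSET setting)
SET log_temp_files = 0;
````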

With a large (10,000,000 row) test table, if I do `SELECT * FROM table` on psql it starts to return results immediately with no disk buffer. If I do `FETCH ALL FROM cursortotable` on psql it takes about 7.5 seconds to start returning results, and generates a 14MB buffer. If I do `SELECT * FROM table` on a correctly coded streaming client, it also starts to return results immediately with no disk buffer. But if I do `FETCH ALL FROM cursortotable` from my streaming client, it takes about 1.5 seconds for results to start coming... but again with no disk buffer, as hoped.

I was kind of hoping that the 'it creates a buffer' and the 'it takes a while to start' issues would be pretty much directly aligned, but it's clearly not as simple as that! I don't know if you can offer any more helpful insight on this last aspect?

Many thanks,

Mike

Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Mike Beaton

I meant to say: "the `FETCH 10000 FROM cursor` until exhausted pattern will always be safe". Nasty typo, sorry!

Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Tom Lane

Mike Beaton <mjsbeaton@gmail.com> writes:
> One outstanding question I have. Based on a lot of helpful responses given
> to the SO question I can now test and see what disk buffers are generated
> (by setting `log_temp_files` to `0` and then `tail -f log`), as well as how
> long it takes for results to start arriving.

> With a large (10,000,000 row) test table, if I do `SELECT * FROM table` on
> psql it starts to return results immediately with no disk buffer. If I do
> `FETCH ALL FROM cursortotable` on psql it takes about 7.5 seconds to start
> returning results, and generates a 14MB buffer. If I do `SELECT * FROM
> table` on a correctly coded streaming client, it also starts to return
> results immediately with no disk buffer. But if I do `FETCH ALL FROM
> cursortotable` from my streaming client, it takes about 1.5 seconds for
> results to start coming... but again with no disk buffer, as hoped.

Seems odd.  Is your cursor just on "SELECT * FROM table", or is there
some processing in there you're not mentioning?  Maybe it's a cursor
WITH HOLD and you're exiting the source transaction?

            regards, tom lane


Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Mike Beaton

> Seems odd.  Is your cursor just on "SELECT * FROM table", or is there
> some processing in there you're not mentioning?  Maybe it's a cursor
> WITH HOLD and you're exiting the source transaction?

Hi Tom,

I've deleted my own Stack Overflow answer in favour of Laurenz Albe's one.

New TL;DR (I'm afraid): PostgreSQL is always generating a huge buffer file on `FETCH ALL FROM CursorToHuge`.

The test data is created by:

`SELECT * INTO large FROM generate_series(1, 10000000) id;`

The test function to generate the cursor is:

````
CREATE OR REPLACE FUNCTION lump() RETURNS refcursor
   LANGUAGE plpgsql AS
$$DECLARE
   c CURSOR FOR SELECT id FROM large;
BEGIN
   c := 'c';
   OPEN c;
   RETURN c;
END;$$;
````

The two tests are:

`SELECT * FROM large;`

Result: no buffer file.

And:

````
BEGIN;
SELECT lump();
FETCH ALL FROM c;
COMMIT;
````

Result: 14MB buffer, every time.

The buffer file appears in `base\pgsql_tmp` while the data is streaming but only appears in the Postgres log file at the point when it is released (which makes sense, as its final size is part of the log row).

This has the additional, confusing result that the buffer file is reported in the Postgres logs just before the user sees the first row of data on `psql` (and on anything using `libpq`), but just after the user sees the last row of data on any client program which is streaming the data via a streaming data access layer (such as `Npgsql`, or `JDBC` with the right configuration).

Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Mike Beaton

The generated buffer is 140MB, not 14MB. At 14 bytes per row, that makes sense.

I have done another test.

If I execute `FETCH ALL FROM cursor` I get a 140MB disk buffer file, on the PostgreSQL server, reported in its log.

If I execute `FETCH 5000000 FROM cursor` (exactly half the rows), I see a 70MB disk buffer file.

This is regardless of how many rows I actually stream from the connection before closing the cursor.

Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Tom Lane

Mike Beaton <mjsbeaton@gmail.com> writes:
> New TL;DR (I'm afraid): PostgreSQL is always generating a huge buffer file
> on `FETCH ALL FROM CursorToHuge`.

I poked into this and determined that it's happening because pquery.c
executes FETCH statements the same as it does with any other
tuple-returning utility statement, ie "run it to completion and put
the results in a tuplestore, then send the tuplestore contents to the
client".  I think the main reason nobody worried about that being
non-optimal was that we weren't expecting people to FETCH very large
amounts of data in one go --- if you want the whole query result at
once, why are you bothering with a cursor?

This could probably be improved, but it would (I think) require inventing
an additional PortalStrategy specifically for FETCH, and writing
associated code paths in pquery.c.  Don't know when/if someone might get
excited enough about it to do that.

            regards, tom lane


Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Mike Beaton

Thanks, Tom.

Wouldn't this mean that cursors are noticeably non-optimal even for normal data sizes, since the entire data to be streamed from the table is always duplicated into another buffer and then streamed?

> if you want the whole query result at once, why are you bothering with a cursor?

The PostgreSQL docs (https://www.postgresql.org/docs/9.6/static/plpgsql-cursors.html#AEN66551) clearly recommend cursors as a way to return a reference to a large result set from a function (as I understood it, this is recommended precisely as a way to avoid tuple-based buffering of the data).

So following that advice, it's not unreasonable that I would actually have a cursor to a large dataset.

Then, I would ideally want to be able to fetch the data from that cursor without the entire data getting duplicated (even if only a bit at a time instead of all at once, which seems to be the best case behaviour) as I go.

Additionally, I thought that if I had a streaming use-case (which I do), and a streaming data-access layer (which I do), then since `SELECT * FROM large` is absolutely fine, end-to-end, in that situation, then by symmetry and the principle of least astonishment `FETCH ALL FROM cursor` might be fine too.

Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: John Gorman

My experience with cursors in PostgreSQL with Java has been to stay away from them. We support two databases with our product, PostgreSQL (default) and SQL Server. While re-encrypting data in a database, the application used cursors with a fetch size of 1000.

This worked perfectly on SQL Server, and on PostgreSQL until we got to a PostgreSQL table with more than 11 million rows. After spending weeks trying to figure out what was happening, I realized that once it got to a table with more than 10 million rows, for some reason the cursor functionality just silently stopped working and it was reading the entire table. I asked another very senior architect to look at it and he came to the same conclusion. Because of limited time, I ended up working around it using limit/offset.

Again, we are using Java, so the problem could just be in the PostgreSQL JDBC driver. Also, we were on 9.1 at the time.

Regards

John

 

Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Vitalii Tymchyshyn

For JDBC there are certain prerequisites for setFetchSize to work, e.g. using forward-only result sets and transactions.


Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: John Gorman

Yes, of course - that's all verified and taken into account during code initialization.