Re: [PERFORM] Correct use of cursors for very large result sets in Postgres - Mailing list pgsql-performance

From Mike Beaton
Subject Re: [PERFORM] Correct use of cursors for very large result sets in Postgres
Date
Msg-id CAHzAAWTRCdbNkQqfS74jAtoT8ThBnQAuDm+=Uw-JOfgADQ3jtQ@mail.gmail.com
Whole thread Raw
In response to Re: [PERFORM] Correct use of cursors for very large result sets in Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PERFORM] Correct use of cursors for very large result sets in Postgres  (Mike Beaton <mjsbeaton@gmail.com>)
Re: [PERFORM] Correct use of cursors for very large result sets in Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
> Seems odd.  Is your cursor just on "SELECT * FROM table", or is there
> some processing in there you're not mentioning?  Maybe it's a cursor
> WITH HOLD and you're exiting the source transaction?

Hi Tom,

I've deleted my own Stack Overflow answer in favour of Laurenz Albe's one.

New TL;DR (I'm afraid): PostgreSQL is always generating a huge buffer file on `FETCH ALL FROM CursorToHuge`.

The test data is created by:

`SELECT * INTO large FROM generate_series(1, 10000000) id;`

The test function to generate the cursor is:

````
CREATE OR REPLACE FUNCTION lump() RETURNS refcursor
   LANGUAGE plpgsql AS
$$DECLARE
   c CURSOR FOR SELECT id FROM large;
BEGIN
   c := 'c';
   OPEN c;
   RETURN c;
END;$$;
````

The two tests are:

`SELECT * FROM large;`

Result: no buffer file.

And:

````
BEGIN;
SELECT lump();
FETCH ALL FROM c;
COMMIT;
````

Result: 14MB buffer, every time.

The buffer file appears in `base\pgsql_tmp` while the data is streaming but only appears in the Postgres log file at the point when it is released (which makes sense, as its final size is part of the log row).

This has the additionally confusing result that the buffer file is reported in the Postgres logs just before the user sees the first row of data on `psql` (and on anything using `libpq`), but just after the user sees the last row of data, on any client program which is streaming the data via a streaming data access layer (such as `Npgsql`, or `JDBC` with the right configuration).

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres
Next
From: Diego Vargas
Date:
Subject: [PERFORM] Query Performance