[BUGS] BUG #14561: Unexpected disk buffer in FETCH handling - Mailing list pgsql-bugs

From mjsbeaton@gmail.com
Subject [BUGS] BUG #14561: Unexpected disk buffer in FETCH handling
Date
Msg-id 20170221132020.1265.40782@wrigleys.postgresql.org
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      14561
Logged by:          Mike Beaton
Email address:      mjsbeaton@gmail.com
PostgreSQL version: 9.6.2
Operating system:   All
Description:

I have what I believe is a bug with `FETCH` on PostgreSQL.

To reproduce this, I first generate a cursor to a 10,000,000 row test table
(all test code below).

If I then do `FETCH ALL FROM cursor` this generates a 140MB disk buffer on
the PostgreSQL server itself.

If I do `FETCH 5000000 FROM cursor` instead, this generates a 70MB disk
buffer on PostgreSQL.

If I execute a plain `SELECT * FROM large` (no cursor), this generates no
disk buffer.

In all cases the correct data streams back to me. Note that the size of the
generated buffer from the `FETCH` statements is not dependent on how much of
the potential data I actually stream before closing the cursor.

The cursor is generated within a PL/pgSQL function (see below), but it is a
cursor over a simple SELECT, and the data is not returned via RETURN or
RETURN QUERY.

This seems to imply that, for any cursor FETCH on PostgreSQL, the entire
dataset which could be returned is first buffered on disk and the results
are then streamed from this buffer to the client; or at least that, for
some other reason, a buffer proportional in size to the full FETCH row
count is created. Neither is what I would expect.
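
For reference, the per-database temp-file counters give a way to quantify
this from SQL without watching the data directory; a minimal sketch,
assuming the `temp_files` / `temp_bytes` columns of `pg_stat_database`
(present since 9.2) and allowing a moment for the statistics to be flushed:

````
-- Snapshot the counters before and after each of the tests below; the
-- FETCH tests should increase temp_bytes by roughly the buffer sizes
-- reported here, while the plain SELECT should not.
SELECT temp_files, temp_bytes
  FROM pg_stat_database
 WHERE datname = current_database();
````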

It is not directly relevant, but in all cases I am testing this with
streaming client code (process each row, then discard it), built on a
streaming client library (Npgsql, in fact) rather than on the
non-streaming libpq, so there is no issue of huge memory use on the
client side, just a long execution time. I had believed that `FETCH ALL
FROM cursor` would be a reasonable pattern on such a client, even for
cursors over huge data sets, but this issue means that it is not.

(I initially raised this issue on
http://stackoverflow.com/questions/42292341/ - the test code below was
produced by Laurenz Albe in response, in order to show me how to answer my
own question.)

--

Test data table generated by:

`SELECT * INTO large FROM generate_series(1, 10000000) id;`

The test function, which returns a cursor over the test table:

````
CREATE OR REPLACE FUNCTION lump() RETURNS refcursor
   LANGUAGE plpgsql AS
$$DECLARE
   c CURSOR FOR SELECT id FROM large;  -- bound cursor over the test table
BEGIN
   c := 'c';   -- give the underlying portal a fixed name ('c'),
               -- so that the caller can simply FETCH ... FROM c
   OPEN c;
   RETURN c;   -- return the cursor (i.e. the portal name) to the caller
END;$$;
````

Test 1:

`SELECT * FROM large;`

Result: no buffer file.

Test 2:

````
BEGIN;
SELECT lump();
FETCH ALL FROM c;
COMMIT;
````

Result: 140MB buffer.

Test 3:

````
BEGIN;
SELECT lump();
FETCH 5000000 FROM c;
COMMIT;
````

Result: 70MB buffer.

The size of the buffer file is NOT dependent on how many rows are actually
streamed before closing the cursor.

The buffer file appears in `base/pgsql_tmp` while the data is streaming,
but it only appears in the Postgres log file at the point when the cursor
is released (which makes sense, as its final size is part of the logged
information).

This has the additionally confusing result that the buffer file is
reported in the Postgres logs just before the user sees the first row of
data when running the above tests in `psql` (or anything else built on
`libpq`), but just after the user sees the last row of data in any client
program which streams the data via a streaming data access layer (such as
Npgsql, or JDBC with the right configuration).
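
For reference, these log entries presumably come from temp-file logging;
below is a minimal sketch of enabling it explicitly, assuming a superuser
session (`log_temp_files` is not settable by ordinary users) and relying
on the fact that each temporary file is logged, with its size, when it is
deleted:

````
-- Log every temporary file, whatever its size; the default of -1
-- disables this logging, and a positive value logs only files of at
-- least that many kilobytes.
SET log_temp_files = 0;

-- Re-running test 2 should now produce a "temporary file: ... size ..."
-- line in the server log once the buffer file is released.
BEGIN;
SELECT lump();
FETCH ALL FROM c;
COMMIT;
````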


