Re: [PERFORM] Correct use of cursors for very large result sets in Postgres

From: Mike Beaton
Subject: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres
Msg-id: CAHzAAWRScAvevHu6AyK0ZZxnWo+SehXO9u51OSFvJEj9=61fXg@mail.gmail.com
In response to: [PERFORM] Correct use of cursors for very large result sets in Postgres (Mike Beaton <mjsbeaton@gmail.com>)
Responses: Re: [PERFORM] Correct use of cursors for very large result sets in Postgres (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
I asked the same question at the same time on Stack Overflow (sincere apologies if this is a breach of etiquette - I really needed an answer, and I thought the two communities might not overlap).

Stack Overflow now has an answer, by me: http://stackoverflow.com/q/42292341/#42297234 - based on accumulating the most consistent, coherent information from the answers and comments given there so far.

I think this is right, and I happily repeat it below, for anyone finding my question on this list. But I would still *love* to find official PostgreSQL documentation of all this. And, of course, to be told - quickly! - if any of it is still wrong.

***The answer is:***

**Q1:** For `SELECT * FROM AstronomicallyHugeTable` sent over the wire, PostgreSQL will *not* generate a huge buffer, and will stream the data efficiently, starting quickly, to the client.

**Q2:** For `FETCH ALL FROM CursorToAstronomicallyHugeTable` sent over the wire, PostgreSQL will likewise *not* generate a huge buffer, and will also stream the data efficiently, starting quickly, to the client.
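
For concreteness, here is a minimal sketch of the Q2 pattern (the table and cursor names are just the hypothetical ones from the question):

```sql
BEGIN;

-- A cursor lives inside a transaction (unless declared WITH HOLD).
DECLARE CursorToAstronomicallyHugeTable NO SCROLL CURSOR FOR
    SELECT * FROM AstronomicallyHugeTable;

-- Rows stream to the client as they are produced; no huge buffer
-- of the whole result set is built on the server first.
FETCH ALL FROM CursorToAstronomicallyHugeTable;

COMMIT;
```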

**Implications of this for `FETCH ALL FROM cursor`**

IF (and this is a big if) you have client software which is NOT going to store all the fetched data anywhere, but is just trying to do something with it row by row (and this presupposes that your data access layer supports this, which Npgsql does), then there is nothing wrong with `FETCH ALL FROM cursor`. No huge buffers anywhere. No long setup time. Processing huge data this way will certainly run for a very long time - or at least until the user or some other condition aborts the process and the cursor can be closed. But it will start to run quickly, and its resource usage will be efficient.

**WARNINGS**

It would *never* make sense to do `FETCH ALL FROM cursor` for astronomically large data if your client-side code (including your data access layer) has any bottleneck at all which means that all the data from a command is fetched before any processing can be done. Many data access layers (and especially data access wrappers) are like this, so beware. But it is also true that not all client-side code is made this way.
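
If you are stuck with such a layer, one standard workaround (my hedged sketch, not something from this thread) is to fetch in batches of a size the client can comfortably hold:

```sql
BEGIN;

DECLARE cur NO SCROLL CURSOR FOR SELECT * FROM AstronomicallyHugeTable;

-- Each FETCH buffers only one batch on the client; repeat it from
-- application code until it returns no rows.
FETCH FORWARD 10000 FROM cur;

CLOSE cur;
COMMIT;
```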

Returning huge data using a `TABLE` or `SETOF` return type from within a PostgreSQL function will *always* be broken (i.e. it will create a huge buffer and take a very long time to start). This is so whether the function is called from SQL or over the wire. The bottleneck is before the function returns. For efficient returns of very large data sets you must, in every case, use a cursor return from a function (or else do `SELECT *` directly over the wire).
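
A sketch of the contrast (the function names and cursor name are hypothetical; the cursor-returning function is the standard PL/pgSQL pattern):

```sql
-- Buffered: per the warning above, the whole result set is built
-- before the first row reaches the caller.
CREATE FUNCTION get_huge_rows_buffered()
RETURNS SETOF AstronomicallyHugeTable AS $$
    SELECT * FROM AstronomicallyHugeTable;
$$ LANGUAGE sql;

-- Streaming: open and return a cursor instead, then FETCH from it.
CREATE FUNCTION get_huge_rows_cursor() RETURNS refcursor AS $$
DECLARE
    cur refcursor := 'huge_rows_cur';  -- fixes the portal name for FETCH
BEGIN
    OPEN cur FOR SELECT * FROM AstronomicallyHugeTable;
    RETURN cur;
END;
$$ LANGUAGE plpgsql;

-- Usage (must be inside one transaction):
BEGIN;
SELECT get_huge_rows_cursor();   -- returns 'huge_rows_cur'
FETCH ALL FROM huge_rows_cur;    -- rows now stream, as in Q2 above
COMMIT;
```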
