Thread: Is the PL/pgSQL refcursor useful in a modern three-tier app?
Section "43.7. Cursors” in the PL/pgSQL chapter of the doc (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS)starts with this: « Rather than executing a whole query at once, it is possible to set up a cursor that encapsulates the query, and then readthe query result a few rows at a time. One reason for doing this is to avoid memory overrun when the result containsa large number of rows. (However, PL/pgSQL users do not normally need to worry about that, since FOR loops automaticallyuse a cursor internally to avoid memory problems.) A more interesting usage is to return a reference to a cursorthat a function has created, allowing the caller to read the rows. This provides an efficient way to return large rowsets from functions. » On its face, it seems to make sense. And I’ve written a few proof-of-concept tests. For example, I wrote a “security definer”function that's owned by a role that can select from the relevant table(s) that returns refcursor. And I called itfrom a subprogram that's owned by a role that cannot select from the relevant table(s) to loop through the rows. But Ican't convince myself that this division of labor is useful. And especially I can't convince myself that the "pipeling"capability is relevant in a three-tier app with a stateless browser UI. Here, the paradigm has the client-sideapp checking out a connection from the pool, generating the entire response to the end-user's request, releasingthe connection, and sending the response back to the browser. This paradigm isn't consistent with allowing the enduser to navigate forwards and backwards in a scrollable cursor that is somehow held in its open state in in the severby the the middle tier client on behalf of a browser session that comes back time and again to its dedicated middletier client and thence yo its dedicated database server session. (Anyway, without anything like Oracle PL/SQL's packages,you have no mechanism to hold the opened cursor variable between successive server calls.) Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in very special use-cases?
On 3/14/23 17:50, Bryn Llewellyn wrote: > Section "43.7. Cursors” in the PL/pgSQL chapter of the doc (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS)starts with this: > > « > Rather than executing a whole query at once, it is possible to set up a cursor that encapsulates the query, and then readthe query result a few rows at a time. One reason for doing this is to avoid memory overrun when the result containsa large number of rows. (However, PL/pgSQL users do not normally need to worry about that, since FOR loops automaticallyuse a cursor internally to avoid memory problems.) A more interesting usage is to return a reference to a cursorthat a function has created, allowing the caller to read the rows. This provides an efficient way to return large rowsets from functions. > » > > On its face, it seems to make sense. And I’ve written a few proof-of-concept tests. For example, I wrote a “security definer”function that's owned by a role that can select from the relevant table(s) that returns refcursor. And I called itfrom a subprogram that's owned by a role that cannot select from the relevant table(s) to loop through the rows. But Ican't convince myself that this division of labor is useful. And especially I can't convince myself that the "pipeling"capability is relevant in a three-tier app with a stateless browser UI. Here, the paradigm has the client-sideapp checking out a connection from the pool, generating the entire response to the end-user's request, releasingthe connection, and sending the response back to the browser. This paradigm isn't consistent with allowing the enduser to navigate forwards and backwards in a scrollable cursor that is somehow held in its open state in in the severby the the middle tier client on behalf of a browser session that comes back time and again to its dedicated middletier client and thence yo its dedicated database server session. (Anyway, without anything like Oracle PL/SQL's packages,you have no mechanism to hold the opened cursor variable between successive server calls.) I guess that would depend on how you define a server call: https://www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING "Once a cursor has been opened, it can be manipulated with the statements described here. These manipulations need not occur in the same function that opened the cursor to begin with. You can return a refcursor value out of a function and let the caller operate on the cursor. (Internally, a refcursor value is simply the string name of a so-called portal containing the active query for the cursor. This name can be passed around, assigned to other refcursor variables, and so on, without disturbing the portal.) All portals are implicitly closed at transaction end. Therefore a refcursor value is usable to reference an open cursor only until the end of the transaction." > > Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in very special use-cases? > -- Adrian Klaver adrian.klaver@aklaver.com
adrian.klaver@aklaver.com wrote:bryn@yugabyte.com wrote:Section "43.7. Cursors” in the PL/pgSQL chapter of the doc (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) starts with this:
«
Rather than executing a whole query at once, it is possible to set up a cursor that encapsulates the query, and then read the query result a few rows at a time. One reason for doing this is to avoid memory overrun when the result contains a large number of rows. (However, PL/pgSQL users do not normally need to worry about that, since FOR loops automatically use a cursor internally to avoid memory problems.) A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions.
»
On its face, it seems to make sense. And I’ve written a few proof-of-concept tests. For example, I wrote a “security definer” function that's owned by a role that can select from the relevant table(s) that returns refcursor. And I called it from a subprogram that's owned by a role that cannot select from the relevant table(s) to loop through the rows. But I can't convince myself that this division of labor is useful. And especially I can't convince myself that the "pipeling" capability is relevant in a three-tier app with a stateless browser UI. Here, the paradigm has the client-side app checking out a connection from the pool, generating the entire response to the end-user's request, releasing the connection, and sending the response back to the browser. This paradigm isn't consistent with allowing the end user to navigate forwards and backwards in a scrollable cursor that is somehow held in its open state in in the sever by the the middle tier client on behalf of a browser session that comes back time and again to its dedicated middle tier client and thence yo its dedicated database server session. (Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls.)
I guess that would depend on how you define a server call:
www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING
"Once a cursor has been opened, it can be manipulated with the statements described here.
These manipulations need not occur in the same function that opened the cursor to begin with. You can return a refcursor value out of a function and let the caller operate on the cursor. (Internally, a refcursor value is simply the string name of a so-called portal containing the active query for the cursor. This name can be passed around, assigned to other refcursor variables, and so on, without disturbing the portal.)
All portals are implicitly closed at transaction end. Therefore a refcursor value is usable to reference an open cursor only until the end of the transaction."Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in very special use-cases?
Ah… I see. I had read this wrongly:
« Internally, a refcursor value is simply the string name of a so-called portal containing the active query for the cursor. This name can be passed around, assigned to other refcursor variables, and so on, without disturbing the portal. »
I thought that it was an "under the hood" explanation and that the only thing that you could see after entry into the PL/pgSQL subprogram that will be the top of the stack would be an opaque value (sometimes called a handle in other contexts) that was accessible only from PL/pgSQL.
I hadn't yet tried this in psql:
create schema s;
create table s.t(k int primary key, v text not null);
insert into s.t(k, v) values (1, 'cat'), (2, 'dog');
create function s.f(k_in in int)
returns refcursor
language plpgsql
as $body$
declare
s_f_cur cursor(k_in int) for select v from s.t where k = k_in;
begin
open s_f_cur(k_in);
return s_f_cur;
end;
$body$;
create function s.g(cur in refcursor)
returns text
language plpgsql
as $body$
declare
v text;
begin
fetch cur into v;
return v;
end;
$body$;
begin;
select ''''||s.f(1)||'''' as cur
\gset
select s.g(:cur) as result;
end;
create table s.t(k int primary key, v text not null);
insert into s.t(k, v) values (1, 'cat'), (2, 'dog');
create function s.f(k_in in int)
returns refcursor
language plpgsql
as $body$
declare
s_f_cur cursor(k_in int) for select v from s.t where k = k_in;
begin
open s_f_cur(k_in);
return s_f_cur;
end;
$body$;
create function s.g(cur in refcursor)
returns text
language plpgsql
as $body$
declare
v text;
begin
fetch cur into v;
return v;
end;
$body$;
begin;
select ''''||s.f(1)||'''' as cur
\gset
select s.g(:cur) as result;
end;
I just did. And the result of "select s.g(:cur)" was the expected "cat".
It requires a leap of imagination, or help from the pgsql-general list, to get to this. So thanks!
Might the doc add an example like this?
Of course, it all falls into place now. I can see how I could write a client app in, say, Python to write a humongous report to a file by fetching manageably-sized chunks, time and again until done with a function like my "g()" here, from a cursor that I'd opened using a function like my "f()".
B.t.w., when I said "top-level call", I meant the SQL statement that a client issues—in this case most likely "select my_plpgsql_function()" or "call my_plpgsql_procedure()". That top-of-stack subprogram can invoke other subprograms and so on ad infinitum. But eventually the whole stack empties and control passes back to the client. But all that falls away now with the exampe I showed in place.
On 3/14/23 20:29, Bryn Llewellyn wrote: >> adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> wrote: >> >>> bryn@yugabyte.com <mailto:bryn@yugabyte.com> wrote: >>> >>> Section "43.7. Cursors” in the PL/pgSQL chapter of the doc >>> (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR- >>> <http://www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR->DECLARATIONS) starts with this: >>> « >>> Rather than executing a whole query at once, it is possible to set up >>> a cursor that encapsulates the query, and then read the query result >>> a few rows at a time. One reason for doing this is to avoid memory >>> overrun when the result contains a large number of rows. (However, >>> PL/pgSQL users do not normally need to worry about that, since FOR >>> loops automatically use a cursor internally to avoid memory >>> problems.) A more interesting usage is to return a reference to a >>> cursor that a function has created, allowing the caller to read the >>> rows. This provides an efficient way to return large row sets from >>> functions. >>> » >>> On its face, it seems to make sense. And I’ve written a few >>> proof-of-concept tests. For example, I wrote a “security definer” >>> function that's owned by a role that can select from the relevant >>> table(s) that returns refcursor. And I called it from a subprogram >>> that's owned by a role that cannot select from the relevant table(s) >>> to loop through the rows. But I can't convince myself that this >>> division of labor is useful. And especially I can't convince myself >>> that the "pipeling" capability is relevant in a three-tier app with a >>> stateless browser UI. Here, the paradigm has the client-side app >>> checking out a connection from the pool, generating the entire >>> response to the end-user's request, releasing the connection, and >>> sending the response back to the browser. This paradigm isn't >>> consistent with allowing the end user to navigate forwards and >>> backwards in a scrollable cursor that is somehow held in its open >>> state in in the sever by the the middle tier client on behalf of a >>> browser session that comes back time and again to its dedicated >>> middle tier client and thence yo its dedicated database server >>> session. (Anyway, without anything like Oracle PL/SQL's packages, you >>> have no mechanism to hold the opened cursor variable between >>> successive server calls.) >> >> I guess that would depend on how you define a server call: >> >> www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING <http://www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-USING> >> >> "Once a cursor has been opened, it can be manipulated with the >> statements described here. >> >> These manipulations need not occur in the same function that opened >> the cursor to begin with. You can return a refcursor value out of a >> function and let the caller operate on the cursor. (Internally, a >> refcursor value is simply the string name of a so-called portal >> containing the active query for the cursor. This name can be passed >> around, assigned to other refcursor variables, and so on, without >> disturbing the portal.) >> >> All portals are implicitly closed at transaction end. Therefore a >> refcursor value is usable to reference an open cursor only until the >> end of the transaction." >> >>> Is it fair to say that the PL/pgSQL refcursor is useful, at best, >>> only in very special use-cases? > > Ah… I see. I had read this wrongly: > >> « Internally, a refcursor value is simply the string name of a >> so-called portal containing the active query for the cursor. This name >> can be passed around, assigned to other refcursor variables, and so >> on, without disturbing the portal. » > > I thought that it was an "under the hood" explanation and that the only > thing that you could see after entry into the PL/pgSQL subprogram that > will be the top of the stack would be an opaque value (sometimes called > a handle in other contexts) that was accessible only from PL/pgSQL. > I am not sure how this: "These manipulations need not occur in the same function that opened the cursor to begin with. You can return a refcursor value out of a function and let the caller operate on the cursor. ..." could be any clearer. > I just did. And the result of "select s.g(:cur)" was the expected "cat". > > It requires a leap of imagination, or help from the pgsql-general list, > to get to this. So thanks! > > Might the doc add an example like this? 43.7.3.5. Returning Cursors "PL/pgSQL functions can return cursors to the caller. This is useful to return multiple rows or columns, especially with very large result sets. To do this, the function opens the cursor and returns the cursor name to the caller (or simply opens the cursor using a portal name specified by or otherwise known to the caller). The caller can then fetch rows from the cursor. The cursor can be closed by the caller, or it will be closed automatically when the transaction closes." And then a series of examples on how to do that. I have a hard time fathoming why someone who writes documentation does not actually read documentation. > > Of course, it all falls into place now. I can see how I could write a > client app in, say, Python to write a humongous report to a file by > fetching manageably-sized chunks, time and again until done with a > function like my "g()" here, from a cursor that I'd opened using a > function like my "f()". > > B.t.w., when I said "top-level call", I meant the SQL statement that a > client issues—in this case most likely "select my_plpgsql_function()" or > "call my_plpgsql_procedure()". That top-of-stack subprogram can invoke > other subprograms and so on ad infinitum. But eventually the whole stack > empties and control passes back to the client. But all that falls away > now with the exampe I showed in place. > -- Adrian Klaver adrian.klaver@aklaver.com
On Tue, 2023-03-14 at 17:50 -0700, Bryn Llewellyn wrote: > Section "43.7. Cursors” in the PL/pgSQL chapter of the doc > (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) starts with this: > > « > [...] > A more interesting usage is to return a reference to a cursor that a function has created, > allowing the caller to read the rows. This provides an efficient way to return large row > sets from functions. > » > > I can't convince myself that this division of labor is useful. And especially I can't convince > myself that the "pipeling" capability is relevant in a three-tier app with a stateless browser UI. You seem to think that a client request corresponds to a single database request, but that doesn't have to be the case. Satisfying a client request can mean iterating through a result set. Cursors shine wherever you need procedural processing of query results, or where you don't need the complete result set, but cannot tell in advance how much you will need, or where you need to scroll and move forward and backward through a result set. Yours, Laurenz Albe
> laurenz.albe@cybertec.at wrote: > >> bryn@yugabyte.com wrote: >> >> Section "43.7. Cursors” in the PL/pgSQL chapter of the doc >> (www.postgresql.org/docs/current/plpgsql-cursors.html#PLPGSQL-CURSOR-DECLARATIONS) starts with this: >> >> « >> [...] >> A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to readthe rows. This provides an efficient way to return large row sets from functions. >> » >> >> I can't convince myself that this division of labor is useful. And especially I can't convince myself that the "pipeling"capability is relevant in a three-tier app with a stateless browser UI. > > You seem to think that a client request corresponds to a single database request, but that doesn't have to be the case. Satisfying a client request can mean iterating through a result set. > > Cursors shine wherever you need procedural processing of query results, or where you don't need the complete result set,but cannot tell in advance how much you will need, or where you need to scroll and move forward and backward througha result set. Re « You seem to think that a client request corresponds to a single database request », I meant no more than what psql modelswhen you hit "return"after terminating an ordinary SQL statement with semi-colon (i.e. not a meta-command). I used"client" to mean a program that connects directly to the PostgreSQL server (using TCP/IP)—and not the ultimate humanclient who's using, say, a browser. Perhaps I should have said "top-level SQL call". (To keep this simple, let's assumethat I don't use "backslash-semicolon" to send two or more top-level SQL calls in one round trip.) So in this example: begin; declare cur cursor for select k, v from s.t order by k; fetch forward 10 in cur; fetch absolute 90 in cur; fetch forward 10 in cur; commit; where I hit "return" after each semicolon, I'd say that I made six top-level SQL calls—and that lines up with the fact thateach of "begin". "declare", "fetch", and "commit" has its own page in the "SQL Commands" section of the PG doc. It sounds like you prefer "database request" for this. Is that right? I was trying to get at the distinction between issuing a particular SQL statement as a top-level SQL call and issuing itfrom a PL/pgSQL subprogram. I've read wording like what you wrote following "Cursors shine wherever you need …" elsewhere. But I can't picture a concreteuse case where, not withstanding the "where" restriction that my "select" used, I can't tell how much of the resultset I'll need or where reading result #n1 informs me that I next need to scroll and read result #n2. So I was lookingfor a convincing example.
On 3/15/23 13:37, Bryn Llewellyn wrote: >> laurenz.albe@cybertec.at wrote: >> > Re « You seem to think that a client request corresponds to a single database request », I meant no more than what psqlmodels when you hit "return"after terminating an ordinary SQL statement with semi-colon (i.e. not a meta-command). Iused "client" to mean a program that connects directly to the PostgreSQL server (using TCP/IP)—and not the ultimate humanclient who's using, say, a browser. Perhaps I should have said "top-level SQL call". (To keep this simple, let's assumethat I don't use "backslash-semicolon" to send two or more top-level SQL calls in one round trip.) > > So in this example: > > begin; > declare cur cursor for select k, v from s.t order by k; > fetch forward 10 in cur; > fetch absolute 90 in cur; > fetch forward 10 in cur; > commit; > > where I hit "return" after each semicolon, I'd say that I made six top-level SQL calls—and that lines up with the factthat each of "begin". "declare", "fetch", and "commit" has its own page in the "SQL Commands" section of the PG doc. > > It sounds like you prefer "database request" for this. Is that right? > > I was trying to get at the distinction between issuing a particular SQL statement as a top-level SQL call and issuing itfrom a PL/pgSQL subprogram. > > I've read wording like what you wrote following "Cursors shine wherever you need …" elsewhere. But I can't picture a concreteuse case where, not withstanding the "where" restriction that my "select" used, I can't tell how much of the resultset I'll need or where reading result #n1 informs me that I next need to scroll and read result #n2. So I was lookingfor a convincing example. Huh? You provided your own example earlier: "Of course, it all falls into place now. I can see how I could write a client app in, say, Python to write a humongous report to a file by fetching manageably-sized chunks, time and again until done with a function like my "g()" here, from a cursor that I'd opened using a function like my "f()"." > > > > > > -- Adrian Klaver adrian.klaver@aklaver.com
> adrian.klaver@aklaver.com wrote: > >> bryn@yugabyte.com wrote: >> >>> laurenz.albe@cybertec.at wrote: >>> >>> You seem to think that a client request corresponds to a single database request >> >> …I can’t picture a concrete use case where, not withstanding the "where" restriction that my "select" used, I can't tellhow much of the result set I'll need or where reading result #n1 informs me that I next need to scroll and read result#n2. So I was looking for a convincing example. > > Huh? > > You provided your own example earlier: > > "Of course, it all falls into place now. I can see how I could write a client app in, say, Python to write a humongousreport to a file by fetching manageably-sized chunks, time and again until done with a function like my "g()" here,from a cursor that I'd opened using a function like my "f()"." My “Humongous report via client-side Python” example doesn’t call for me to abandon it part way through. Nor does it callfor me to leap forwards as I discover facts along the way that make me realize that I need immediately to see a far distantfact by scrolling to where it is (and especially by scrolling backwards to what I’ve already seen). It was an exampleof this that I was asking for. The bare ability to do controlled piecewise materialization and fetch is clear.
adrian.klaver@aklaver.com wrote:I have a hard time fathoming why someone who writes documentation does not actually read documentation.
Ouch. In fact, I had read the whole of the "43.7. Cursors" section in the "PL/pgSQL" chapter (www.postgresql.org/docs/15/plpgsql-cursors.html). And the sections in the "SQL Commands" chapter for "declare", "fetch" and "close". But several of the key concepts didn't sink in and this prevented me not only from understanding what some of the examples showed but, worse, from being able to use the right vocabulary to express what confused me.
It's very much clearer now than when I started this thread, about twenty-four hours ago. Here's (some of) what I believe that I now understand.
"refcursor" is a base type, listed in pg_type. This sentence seems to be key:
«
A refcursor value is simply the string name of a so-called portal containing the active query for the cursor. This name can be passed around, assigned to other refcursor variables, and so on, without disturbing the portal.
»
Maybe it's better to say that a portal has a defining "select" statement and acts as a pointer to the potential result set that its select statement defines. A portal also represents the position of the current (next-to-be-fetched) row it that set. The doc that I've found doesn't make it clear how much of the entire result set is materialized at a time. But the implication is that it's materialized only in portions and that one portion is purged to make room for another.
You can create a portal instance using either top-level SQL (with the "declare" statement) or using PL/pgSQL by declaring a refcursor variable for its name and then using that as the argument of "open". Only in top-level SQL, the "with hold" option for "declare" lets you create a portal instance outside of a transaction block. This has session duration. (Or you can pre-empt this with the "close" statement.) Otherwise, you must use the "declare" statement within an ongoing transaction. With this choice, it vanishes when the transaction ends. You can also create a portal instance by using PL/pgSQL. (There's no "with hold" option here.)
A portal instance exists within the session as a whole, even though you can declare the refcursor to denote it as a PL/pgSQL subprogram's formal parameter or as a PL/pgSQL local variable. This means that you can create a portal instance using PL/pgSQL and (when you know its name) fetch from it using top-level SQL
The open portal instances in a particular session are listed in pg_cursors. (Why not pg_portals?) When the instance was created with the SQL "declare" statement, pg_cursors.statement shows the verbatim text that follows the "declare" keyword. (In other words, not a legal SQL statement.) When the instance was created using PL/pgSQL, pg_cursors.statement shows the verbatim text that follows (in one creation approach variant) "open <identifier> for" in the defining block statement's executable section. (In other words, and with a caveat about placeholders, this is a legal SQL statement.)
A portal instance is uniquely identified by just its name. (You cannot use a schema-qualified identifier to create it or to refer to it.) And (just like a prepared statement) the name must be unique only within a particular session.
A portal instance is uniquely identified by just its name. (You cannot use a schema-qualified identifier to create it or to refer to it.) And (just like a prepared statement) the name must be unique only within a particular session.
There are many ways to set the name of a portal instance. Here are some examples. First top-level SQL:
begin;
declare "My Refcursor" cursor for select k, v from s.t order by k;
select name, statement from pg_cursors;
fetch forward 5 in "My Refcursor";
end;
declare "My Refcursor" cursor for select k, v from s.t order by k;
select name, statement from pg_cursors;
fetch forward 5 in "My Refcursor";
end;
I noticed that 'select pg_typeof("My Refcursor")' within the ongoing txn fails with '42703: column "My Refcursor" does not exist'.
Now, PL/pgSQL:
create function s.f(cur in refcursor = 'cur')
returns refcursor
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
open cur for select k, v from s.t order by k;
return cur;
end;
$body$;
returns refcursor
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
open cur for select k, v from s.t order by k;
return cur;
end;
$body$;
begin;
select s.f('My Cursor');
select name, statement from pg_cursors;
fetch forward 5 in "My Cursor";
end;
select s.f('My Cursor');
select name, statement from pg_cursors;
fetch forward 5 in "My Cursor";
end;
Arguably, it's pointless to use a function to return the name of the portal instance that you supplied as an input—and you might just as well write this:
create procedure s.p(cur in refcursor = 'cur')
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
open cur for select k, v from s.t order by k;
end;
$body$;
begin;
call s.p('My Cursor');
select name, statement from pg_cursors;
fetch forward 5 in "My Cursor";
end;
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
begin
open cur for select k, v from s.t order by k;
end;
$body$;
begin;
call s.p('My Cursor');
select name, statement from pg_cursors;
fetch forward 5 in "My Cursor";
end;
You could sacrifice the ability to name the portal instance at runtime like this:
create procedure s.p()
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor := 'My Cursor';
begin
open cur for select k, v from s.t order by k;
end;
$body$;
begin;
call s.p();
select name, statement from pg_cursors;
fetch forward 5 in "My Cursor";
end;
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor := 'My Cursor';
begin
open cur for select k, v from s.t order by k;
end;
$body$;
begin;
call s.p();
select name, statement from pg_cursors;
fetch forward 5 in "My Cursor";
end;
You can even let the runtime system make up a name for you. But you need to go back to the function encapsulation to learn what was chosen:
create function s.f()
returns refcursor
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor;
begin
open cur for select k, v from s.t order by k;
return cur;
end;
$body$;
begin;
select s.f();
select name, statement from pg_cursors;
fetch forward 5 in "<unnamed portal 1>";
end;
returns refcursor
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor;
begin
open cur for select k, v from s.t order by k;
return cur;
end;
$body$;
begin;
select s.f();
select name, statement from pg_cursors;
fetch forward 5 in "<unnamed portal 1>";
end;
Here's yet another variant:
create procedure s.p()
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
"My Refcursor" cursor for select k, v from s.t order by k;
begin
open "My Refcursor";
$body$;
begin;
call s.p();
select name, statement from pg_cursors;
fetch forward 5 in "My Refcursor";
end;
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
"My Refcursor" cursor for select k, v from s.t order by k;
begin
open "My Refcursor";
raise info '%', pg_typeof("My Refcursor")::text;
end;$body$;
begin;
call s.p();
select name, statement from pg_cursors;
fetch forward 5 in "My Refcursor";
end;
(I included "pg_typeof()" just here to make the point that it reports "refcursor" and not the plain "cursor" that the declaration might lead you to expect. It reports "refcursor" in all the other PL/pgSQL examples too.
With all these variants (and there may be more), and with only some of the exemplified, I don't feel too stupid for getting confused.
Well, it is simple. As we wrote, some of us think that cursors are useful, and we tried to explain why we think that. If you don't think that cursors are useful, don't use them. We are not out to convince you otherwise. Yours, Laurenz Albe
On Thu, Mar 16, 2023 at 6:48 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
As we wrote, some of us think that cursors are useful, and we tried to
explain why we think that. If you don't think that cursors are useful,
don't use them. We are not out to convince you otherwise.
Perhaps OT (I only skimed this thread) but when I compared Cursors to regular Statements / Queries
from a LIBPQ client application perspective, on the same "streamable" queries (i.e. w/o a sort), Cursor
shined in terms of time-to-first-row, compared to waiting for the whole ResultSet, but getting the full result
OTOH was 2x as long with Cursor, compared to the regular SELECT Statement.
Thus in my mind, it really depends on what you value in a particular situation, latency or throughput. --DD
PS: In my testing, I used forward-only cursors
PPS: I don't recall the ResultSet cardinality or byte size, nor the batching used with the Cursor.
čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <ddevienne@gmail.com> napsal:
On Thu, Mar 16, 2023 at 6:48 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:As we wrote, some of us think that cursors are useful, and we tried to
explain why we think that. If you don't think that cursors are useful,
don't use them. We are not out to convince you otherwise.Perhaps OT (I only skimed this thread) but when I compared Cursors to regular Statements / Queriesfrom a LIBPQ client application perspective, on the same "streamable" queries (i.e. w/o a sort), Cursorshined in terms of time-to-first-row, compared to waiting for the whole ResultSet, but getting the full resultOTOH was 2x as long with Cursor, compared to the regular SELECT Statement.Thus in my mind, it really depends on what you value in a particular situation, latency or throughput. --DD
cursors are optimized for minimal cost of first row, queries are optimized for minimal cost of last row
Regards
Pavel
PS: In my testing, I used forward-only cursorsPPS: I don't recall the ResultSet cardinality or byte size, nor the batching used with the Cursor.
On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <ddevienne@gmail.com> napsal:[...] depends on what you value in a particular situation, latency or throughput. --DDcursors are optimized for minimal cost of first row, queries are optimized for minimal cost of last row
That's a nice way to put it Pavel.
And to have it both ways, use COPY in binary protocol? That way the rows are streamed
to you in arbitrary chunks as soon as available (I hope), and the burden is on you the
client to decode and use those rows in parallel as they are "streamed" to you.
I've yet to test that (thus the 'i hope' above). I used COPY binary for INSERTs,
and COPY text/json for SELECTs, not yet COPY binary for SELECTs. I'm hoping
the latency of COPY will be small compared to a regular SELECT where I have to
wait for LIBPQ to assemble the whole ResultSet. Are my hopes unfounded? --DD
čt 16. 3. 2023 v 9:55 odesílatel Dominique Devienne <ddevienne@gmail.com> napsal:
On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <ddevienne@gmail.com> napsal:[...] depends on what you value in a particular situation, latency or throughput. --DDcursors are optimized for minimal cost of first row, queries are optimized for minimal cost of last rowThat's a nice way to put it Pavel.And to have it both ways, use COPY in binary protocol? That way the rows are streamedto you in arbitrary chunks as soon as available (I hope), and the burden is on you theclient to decode and use those rows in parallel as they are "streamed" to you.I've yet to test that (thus the 'i hope' above). I used COPY binary for INSERTs,and COPY text/json for SELECTs, not yet COPY binary for SELECTs. I'm hopingthe latency of COPY will be small compared to a regular SELECT where I have towait for LIBPQ to assemble the whole ResultSet. Are my hopes unfounded? --DD
COPY is a different creature - it has no execution plan, and it is not interpreted by the executor.
Using COPY SELECT instead SELECT looks like premature optimization. The performance benefit will be minimal (maybe there can be exceptions depending on data, network properties or interface). Cursors, queries can use binary protocol, if the client can support it.
Regards
Pavel
On Thu, Mar 16, 2023 at 11:15 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
čt 16. 3. 2023 v 9:55 odesílatel Dominique Devienne <ddevienne@gmail.com> napsal:On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <ddevienne@gmail.com> napsal:[...] depends on what you value in a particular situation, latency or throughput. --DDcursors are optimized for minimal cost of first row, queries are optimized for minimal cost of last rowThat's a nice way to put it Pavel.And to have it both ways, use COPY in binary protocol?COPY is a different creature - it has no execution plan, and it is not interpreted by the executor.
OK. Not sure what that means exactly. There's still a SELECT, with possibly WHERE clauses and/or JOINs, no?
Doesn't that imply an execution plan? I'm a bit confused.
Using COPY SELECT instead SELECT looks like premature optimization.
Possible. But this is not an e-commerce web-site with a PostgreSQL backend here.
This is classical client-server with heavy weight desktop apps loading heavy weight data
(in number and size) from PostgreSQL. So performance (throughput) does matter a lot to us.
And I measure that performance in both rows/sec and MB/sec, not (itsy bitsy) transactions / sec.
The performance benefit will be minimal ([...]).
COPY matters on INSERT for sure performance-wise.
So why wouldn't COPY matter for SELECTs too?
Cursors, queries can use binary protocol, if the client can support it.
I already do. But we need all the speed we can get.
In any case, I'll have to try and see/test for myself eventually.
We cannot afford to leave any performance gains on the table.
čt 16. 3. 2023 v 11:52 odesílatel Dominique Devienne <ddevienne@gmail.com> napsal:
On Thu, Mar 16, 2023 at 11:15 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:čt 16. 3. 2023 v 9:55 odesílatel Dominique Devienne <ddevienne@gmail.com> napsal:On Thu, Mar 16, 2023 at 9:23 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:čt 16. 3. 2023 v 9:18 odesílatel Dominique Devienne <ddevienne@gmail.com> napsal:[...] depends on what you value in a particular situation, latency or throughput. --DDcursors are optimized for minimal cost of first row, queries are optimized for minimal cost of last rowThat's a nice way to put it Pavel.And to have it both ways, use COPY in binary protocol?COPY is a different creature - it has no execution plan, and it is not interpreted by the executor.OK. Not sure what that means exactly. There's still a SELECT, with possibly WHERE clauses and/or JOINs, no?Doesn't that imply an execution plan? I'm a bit confused.Using COPY SELECT instead SELECT looks like premature optimization.Possible. But this is not an e-commerce web-site with a PostgreSQL backend here.This is classical client-server with heavy weight desktop apps loading heavy weight data(in number and size) from PostgreSQL. So performance (throughput) does matter a lot to us.And I measure that performance in both rows/sec and MB/sec, not (itsy bitsy) transactions / sec.The performance benefit will be minimal ([...]).COPY matters on INSERT for sure performance-wise.So why wouldn't COPY matter for SELECTs too?
Please, can you show some benchmarks :-) I don't believe it.
The protocol is already designed for massive reading by queries. If COPY SELECT is significantly faster than SELECT, then some should be wrong on some side (server or client).
Regards
Pavel
Cursors, queries can use binary protocol, if the client can support it.I already do. But we need all the speed we can get.In any case, I'll have to try and see/test for myself eventually.We cannot afford to leave any performance gains on the table.
On 3/15/23 18:41, Bryn Llewellyn wrote: >> adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> wrote: >> >> I have a hard time fathoming why someone who writes documentation does >> not actually read documentation. > > Ouch. In fact, I had read the whole of the "43.7. Cursors" section in > the "PL/pgSQL" chapter (www.postgresql.org/docs/15/plpgsql-cursors.html > <http://www.postgresql.org/docs/15/plpgsql-cursors.html>). And the > sections in the "SQL Commands" chapter for "declare", "fetch" and > "close". But several of the key concepts didn't sink in and this > prevented me not only from understanding what some of the examples > showed but, worse, from being able to use the right vocabulary to > express what confused me. Given this from your original question: " (Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls.)" What part of this: CREATE TABLE test (col text); INSERT INTO test VALUES ('123'); CREATE FUNCTION reffunc(refcursor) RETURNS refcursor AS ' BEGIN OPEN $1 FOR SELECT col FROM test; RETURN $1; END; ' LANGUAGE plpgsql; BEGIN; SELECT reffunc('funccursor'); FETCH ALL IN funccursor; COMMIT; did not make sense in that context? > The open portal instances in a particular session are listed in > pg_cursors. (Why not pg_portals?) When the instance was created with the Why are tables also known as relations and you can look them up in pg_class or pg_tables? Answer: It is the rules of the game. > > create procedure s.p() > set search_path = pg_catalog, pg_temp > language plpgsql > as $body$ > declare > "My Refcursor" cursor for select k, v from s.t order by k; > begin > open "My Refcursor"; > * raise info '%', pg_typeof("My Refcursor")::text;* > end; > $body$; > > begin; > call s.p(); > select name, statement from pg_cursors; > fetch forward 5 in "My Refcursor"; > end; > > (I included "pg_typeof()" just here to make the point that it reports > "refcursor" and not the plain "cursor" that the declaration might lead > you to expect. It reports "refcursor" in all the other PL/pgSQL examples > too. https://www.postgresql.org/docs/current/plpgsql-cursors.html "All access to cursors in PL/pgSQL goes through cursor variables, which are always of the special data type refcursor. One way to create a cursor variable is just to declare it as a variable of type refcursor. Another way is to use the cursor declaration syntax, which in general is: name [ [ NO ] SCROLL ] CURSOR [ ( arguments ) ] FOR query; " Again, I would like to know how that is confusing? > > With all these variants (and there may be more), and with only some of > the exemplified, I don't feel too stupid for getting confused. > Where you get confused is in moving the goal posts. What starts out with: "(Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls.) Is it fair to say that the PL/pgSQL refcursor is useful, at best, only in very special use-cases?" evolves into deep dive into all thing cursors. -- Adrian Klaver adrian.klaver@aklaver.com
On 2023-03-16 11:52:47 +0100, Dominique Devienne wrote: > On Thu, Mar 16, 2023 at 11:15 AM Pavel Stehule <pavel.stehule@gmail.com> wrote: > čt 16. 3. 2023 v 9:55 odesílatel Dominique Devienne <ddevienne@gmail.com> > napsal: > That's a nice way to put it Pavel. > And to have it both ways, use COPY in binary protocol? [...] > The performance benefit will be minimal ([...]). > > COPY matters on INSERT for sure performance-wise. > So why wouldn't COPY matter for SELECTs too? COPY is faster than a bunch of INSERTs because each INSERT has some overhead: It needs to be parsed (if you PREPAREd the INSERT you need to parse the EXECUTE command instead) and planned. But most importantly you have a round trip time between the client and the server. With COPY you incur that overhead only once. (Which reminds me that I should benchmark INSERT with lots of VALUES against COPY some time.) With COPYing the output of a SELECT I don't see any savings. On the contrary, it's an extra step. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
> adrian.klaver@aklaver.com wrote: > >> bryn@yugabyte.com wrote: >> >>> adrian.klaver@aklaver.com wrote: >>> >>> I have a hard time fathoming why someone who writes documentation does not actually read documentation. >> >> >> Ouch. In fact, I had read the whole of the "43.7. Cursors" section in the "PL/pgSQL" chapter. And the sections in the"SQL Commands" chapter for "declare", "fetch" and "close". But several of the key concepts didn't sink in and this preventedme not only from understanding what some of the examples showed but, worse, from being able to use the right vocabularyto express what confused me. > > Given this from your original question: > > « Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable betweensuccessive server calls. » > > What part of this [a particular code example] did not make sense in that context? First off, despite the fact that I've clearly annoyed you (for which I apologize), I have found these exchanges very helpful.So thank you very much. Your questions can be summarized as "Why couldn't you understand the doc? And why did you mix questions about use-cases withquestions about the mechanics?" The answer has to do with psychology. I probably can't explain this convincingly. That'swhy it's taken me a long time to respond. I also had to do lots of testing before responding to make sure that themental model that I've formed for myself is consistent with these. You may not be interested in what follows. But, anyway, here goes. — I happen to have many years of experience with Oracle Database and PL/SQL. The latter has analogous features to PL/pgSQL's"refcursor". But the differences between the notions in the two environments are enormous. My attempt to understandthe latter was hindered by my understanding of the former. I accept that this is *my* problem and that I couldnever expect that the PG doc would cater for such a reader. — Oracle Database doesn't expose scrollability for PL/SQL's equivalent of "refcursor". So I had never come across use caseswhere this was beneficial. I wanted, therefore, to hear about some. I thought that insights here would help me understandthe mechanics. But I didn't get anything beyond "Scrollability is what it is. If you don't need it, don't use it." Anyway, never mind all that now. Here's what I now (think that) I understand—with some comments on what made it hard forme to grasp. — The key notion is what is referred to sometimes as "portal" and sometimes as "cursor". This is the thing that's globalwithin, and private to, a session, that's uniquely identified by a bare name, that, and that's listed in "pg_cursors".I believe that in typical use, a cursor has only transaction duration. But (and only when you use the SQL API)you can create a cursor with (up to) session duration — The doc pages for the "declare", "fetch", and "close" SQL statements don't mention "portal" and use only "cursor". Theyuse the term to mean the underlying phenomenon and use wording like: "DECLARE allows a user to create cursors"; "Youcan see all available cursors by querying the pg_cursors system view"; "FETCH retrieves rows using a previously-createdcursor"; "CLOSE frees the resources associated with an open cursor. After the cursor is closed, no subsequentoperations are allowed on it. A cursor should be closed when it is no longer needed." However, these pages leavethe term "open" undefined, though it's used. It seems that it has no meaning. Rather, a cursor with a particular nameeither exists or not. You create it with "declare" and drop it with "close". And that's it. If "open" means anything,it's just another word for "exists". (The fact that "pg_cursors" doesn't have a boolean column called "open" supportsthis understanding.) The sentence "After the cursor is closed, no subsequent operations are allowed on it." is equivalentto "After a table is dropped, no subsequent operations are allowed on it." But who would bother to say that? Noticethat "pg_cursors" has a column called "creation_time" — and not "declaration time". — On the other hand, the doc page "43.7. Cursors" uses "portal" a lot—and never says that it means exactly the same as "cursor"qua term of art (and not qua keyword). It does say "...a so-called portal containing the active query for the cursor."This suggests a subtle difference in meaning between "portal" and "cursor" and a notion of containment. I can't makeany sense of that. It says things like "Before a cursor can be used to retrieve rows, it must be opened. (This is theequivalent action to the SQL command DECLARE CURSOR.)" This is the closest that it comes to saying that the SQL API andthe PL/pgSQL API both manipulate the same thing—what you see in "pg_cursors". The sentence that I quoted is equivalentto saying "Before you can insert a row into a table, the table has to exist." In other words, an unhelpful tautology.I believe that the sense is this: « A variable with the data type "refcursor" holds a bare name (which is governedby the usual rules for a SQL name). The name might be found in "pg_cursors" or it might not be. When, and only when,it is found in "pg_cursors", the refcursor variable acts as a handle to the denoted cursor and supports operations uponit using various PL/pgSQL statements that use the identifier for the refcursor variable's name. — I (but maybe only I) would have appreciated being able to read a single generic account that explained the underlying concepts.This would have prepared me for understanding the operations that the SQL and PL/pgSQL APIs expose—and especiallythat they are interoperable. So I'd've liked to see a note at the start of the four relevant sections ("43.7. Cursors"and the "declare", "fetch", and "close" SQL statements) that x-ref'd to the generic account and said "read this first". Here's some more detail of how I'd state the mental model that I've deduced. Please tell me if you think that some, or all,of my account is wrong. (When I say "cursor", I always mean what's listed in "pg_cursors". And I'll never mention "portal"because the term seems to means exactly the same as "cursor".) (1) A cursor must have a defining "select" statement. It also always has a pointer to the next-to-be-fetched row in the resultset that the "select" defines. The rows are (implicitly) numbered from 1 through N where N is the number of rows thatthe cursor's "select" defines. However (as you can see from "fetch :x" in SQL, where :x is less than 1 or more than N)the pointer can point outside of the result set and not cause an error. (2) A cursor defines a read-consistent snapshot, as of its "pg_cursors.creation_time". (From the "declare" doc, « In PostgreSQL,all cursors are insensitive. ») The complete set of rows that the "select" defines may not all be concurrentlymaterialized in the cursor. This implies some kind of aging out and replacement implementation. The details aren'tdescribed because they have no semantic significance. (3) In top-level SQL, you create a cursor with the "declare" statement. This lets you name it, specify its “select”, andspecify a few other boolean attributes like "[ no ] scroll" and "{ with | without } hold". (4) In PL/pgSQL, you create a cursor with "open". The operand is the identifier for the refcursor variable that holds thecursor's name. You can test your mental model by using the equivalent SQL statements with the "execute" PL/SQL statement. (5) In top-level SQL, you drop a cursor with "close" where the operand is the identifier for the cursor's name. In PL/pgSQL,you drop a cursor with "close" where the operand is the identifier for the refcursor variable that holds the cursor'sname. (6) In PL/pgSQL, the value of a refcursor variable is an ordinary "text" value. It might be null. It you assign the nameof a cursor that's listed in "pg_cursors" to a refcursor variable, then you can fetch from it or close it. And as longas the name isn't currently found in "pg_cursors", you can create a new row with that name with the "open" statement,specifying any "select" that you want. (7) I found the terms "bound cursor" and "unbound cursor" (as in the section "43.7.2.3. Opening A Bound Cursor") initiallyvery confusing because the wording connotes a property of a cursor—and "pg_cursors" has no column for such a notion.But I presently came to understand that this was a careless shorthand for "[un]bound cursor variable" — which phrasesare also used on the same page. (8) I found it initially hard to understand that the "bound" property of a refcursor variable is not part of its value. (Andnor, for that matter, is the SQL statement that you specify with the "open" statement.) I reasoned, eventually, thatthe "bound" property must be an annotation of the variable in the AST for the block statement where the variable is declared.(Here, "declare" is used in the PL/pgSQL sense, and not the SQL sense, of the term). This explains why, when a functionreturns a refcursor value where the variable was declared as "bound", it can only be seen as "unbound" in a subprogramthat has a refcursor formal argument. The same reasoning applies if you assign a bound refcursor variable to anunbound refcursor variable. (But I can't see that you'd have a reason to do that unless, like I was, you were testing yourmental model.) It's the fact that the value that a refcursor variable holds is nothing other than the text of a (potential) cursor's name(and that the SQL text and "bound" status are represented elsewhere) that lead me to write « without anything like OraclePL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls ». WhenI wrote that, I thought, wrongly as I now see, that a refcursor variable held a composite, opaque value (or an opaquepointer to such) like it does in Oracle. (9) The upshot of #8 is that the "FOR recordvar IN bound_cursorvar" construct can be used only in the block statement thatdeclares the bound cursor variable. And this seems to defeat the point. You may just as well use an ordinary "for" loopthat has the SQL statement right after the "in" keyword. (10) I discovered that this construct: for ... in select ... from pg_cursors order by name loop ... end loop; sees a cursor with an automatically generated name like "<unnamed portal N>" for the loop itself. I suppose that this makesgood sense. But it does seem to undermine the value of declaring and using a bound cursor variable—esp as the nominalvalue of the "cursor" concept is the scrollability and the ability to fetch a smallish set of rows from anywhere ina huge result set.
On 3/20/23 1:46 PM, Bryn Llewellyn wrote: >> adrian.klaver@aklaver.com wrote: >> >>> bryn@yugabyte.com wrote: >>> >>>> adrian.klaver@aklaver.com wrote: >>>> >>>> I have a hard time fathoming why someone who writes documentation does not actually read documentation. >>> >>> >>> Ouch. In fact, I had read the whole of the "43.7. Cursors" section in the "PL/pgSQL" chapter. And the sections in the"SQL Commands" chapter for "declare", "fetch" and "close". But several of the key concepts didn't sink in and this preventedme not only from understanding what some of the examples showed but, worse, from being able to use the right vocabularyto express what confused me. >> >> Given this from your original question: >> >> « Anyway, without anything like Oracle PL/SQL's packages, you have no mechanism to hold the opened cursor variable betweensuccessive server calls. » >> >> What part of this [a particular code example] did not make sense in that context? > > First off, despite the fact that I've clearly annoyed you (for which I apologize), I have found these exchanges very helpful.So thank you very much. > > Your questions can be summarized as "Why couldn't you understand the doc? And why did you mix questions about use-caseswith questions about the mechanics?" The answer has to do with psychology. I probably can't explain this convincingly.That's why it's taken me a long time to respond. I also had to do lots of testing before responding to makesure that the mental model that I've formed for myself is consistent with these. > > You may not be interested in what follows. But, anyway, here goes. > > — I happen to have many years of experience with Oracle Database and PL/SQL. The latter has analogous features to PL/pgSQL's"refcursor". But the differences between the notions in the two environments are enormous. My attempt to understandthe latter was hindered by my understanding of the former. I accept that this is *my* problem and that I couldnever expect that the PG doc would cater for such a reader. > > — Oracle Database doesn't expose scrollability for PL/SQL's equivalent of "refcursor". So I had never come across use caseswhere this was beneficial. I wanted, therefore, to hear about some. I thought that insights here would help me understandthe mechanics. But I didn't get anything beyond "Scrollability is what it is. If you don't need it, don't use it." > > Anyway, never mind all that now. Here's what I now (think that) I understand—with some comments on what made it hard forme to grasp. > > — The key notion is what is referred to sometimes as "portal" and sometimes as "cursor". This is the thing that's globalwithin, and private to, a session, that's uniquely identified by a bare name, that, and that's listed in "pg_cursors".I believe that in typical use, a cursor has only transaction duration. But (and only when you use the SQL API)you can create a cursor with (up to) session duration > > — The doc pages for the "declare", "fetch", and "close" SQL statements don't mention "portal" and use only "cursor". Theyuse the term to mean the underlying phenomenon and use wording like: "DECLARE allows a user to create cursors"; "Youcan see all available cursors by querying the pg_cursors system view"; "FETCH retrieves rows using a previously-createdcursor"; "CLOSE frees the resources associated with an open cursor. After the cursor is closed, no subsequentoperations are allowed on it. A cursor should be closed when it is no longer needed." However, these pages leavethe term "open" undefined, though it's used. It seems that it has no meaning. Rather, a cursor with a particular nameeither exists or not. You create it with "declare" and drop it with "close". And that's it. If "open" means anything,it's just another word for "exists". (The fact that "pg_cursors" doesn't have a boolean column called "open" supportsthis understanding.) The sentence "After the cursor is closed, no subsequent operations are allowed on it." is equivalentto "After a table is dropped, no subsequent operations are allowed on it." But who would bother to say that? Noticethat "pg_cursors" has a column called "creation_time" — and not "declaration time". > > — On the other hand, the doc page "43.7. Cursors" uses "portal" a lot—and never says that it means exactly the same as"cursor" qua term of art (and not qua keyword). It does say "...a so-called portal containing the active query for thecursor." This suggests a subtle difference in meaning between "portal" and "cursor" and a notion of containment. I can'tmake any sense of that. It says things like "Before a cursor can be used to retrieve rows, it must be opened. (Thisis the equivalent action to the SQL command DECLARE CURSOR.)" This is the closest that it comes to saying that the SQLAPI and the PL/pgSQL API both manipulate the same thing—what you see in "pg_cursors". The sentence that I quoted is equivalentto saying "Before you can insert a row into a table, the table has to exist." In other words, an unhelpful tautology.I believe that the sense is this: « A variable with the data type "refcursor" holds a bare name (which is governedby the usual rules for a SQL name). The name might be found in "pg_cursors" or it might not be. When, and only when,it is found in "pg_cursors", the refcursor variable acts as a handle to the denoted cursor and supports operations uponit using various PL/pgSQL statements that use the identifier for the refcursor variable's name. > > — I (but maybe only I) would have appreciated being able to read a single generic account that explained the underlyingconcepts. This would have prepared me for understanding the operations that the SQL and PL/pgSQL APIs expose—andespecially that they are interoperable. So I'd've liked to see a note at the start of the four relevant sections("43.7. Cursors" and the "declare", "fetch", and "close" SQL statements) that x-ref'd to the generic account andsaid "read this first". > > Here's some more detail of how I'd state the mental model that I've deduced. Please tell me if you think that some, orall, of my account is wrong. (When I say "cursor", I always mean what's listed in "pg_cursors". And I'll never mention"portal" because the term seems to means exactly the same as "cursor".) > > (1) A cursor must have a defining "select" statement. It also always has a pointer to the next-to-be-fetched row in theresult set that the "select" defines. The rows are (implicitly) numbered from 1 through N where N is the number of rowsthat the cursor's "select" defines. However (as you can see from "fetch :x" in SQL, where :x is less than 1 or more thanN) the pointer can point outside of the result set and not cause an error. > > (2) A cursor defines a read-consistent snapshot, as of its "pg_cursors.creation_time". (From the "declare" doc, « In PostgreSQL,all cursors are insensitive. ») The complete set of rows that the "select" defines may not all be concurrentlymaterialized in the cursor. This implies some kind of aging out and replacement implementation. The details aren'tdescribed because they have no semantic significance. > > (3) In top-level SQL, you create a cursor with the "declare" statement. This lets you name it, specify its “select”, andspecify a few other boolean attributes like "[ no ] scroll" and "{ with | without } hold". > > (4) In PL/pgSQL, you create a cursor with "open". The operand is the identifier for the refcursor variable that holds thecursor's name. You can test your mental model by using the equivalent SQL statements with the "execute" PL/SQL statement. > > (5) In top-level SQL, you drop a cursor with "close" where the operand is the identifier for the cursor's name. In PL/pgSQL,you drop a cursor with "close" where the operand is the identifier for the refcursor variable that holds the cursor'sname. > > (6) In PL/pgSQL, the value of a refcursor variable is an ordinary "text" value. It might be null. It you assign the nameof a cursor that's listed in "pg_cursors" to a refcursor variable, then you can fetch from it or close it. And as longas the name isn't currently found in "pg_cursors", you can create a new row with that name with the "open" statement,specifying any "select" that you want. > > (7) I found the terms "bound cursor" and "unbound cursor" (as in the section "43.7.2.3. Opening A Bound Cursor") initiallyvery confusing because the wording connotes a property of a cursor—and "pg_cursors" has no column for such a notion.But I presently came to understand that this was a careless shorthand for "[un]bound cursor variable" — which phrasesare also used on the same page. > > (8) I found it initially hard to understand that the "bound" property of a refcursor variable is not part of its value.(And nor, for that matter, is the SQL statement that you specify with the "open" statement.) I reasoned, eventually,that the "bound" property must be an annotation of the variable in the AST for the block statement where the variableis declared. (Here, "declare" is used in the PL/pgSQL sense, and not the SQL sense, of the term). This explains why,when a function returns a refcursor value where the variable was declared as "bound", it can only be seen as "unbound"in a subprogram that has a refcursor formal argument. The same reasoning applies if you assign a bound refcursorvariable to an unbound refcursor variable. (But I can't see that you'd have a reason to do that unless, like I was,you were testing your mental model.) > > It's the fact that the value that a refcursor variable holds is nothing other than the text of a (potential) cursor's name(and that the SQL text and "bound" status are represented elsewhere) that lead me to write « without anything like OraclePL/SQL's packages, you have no mechanism to hold the opened cursor variable between successive server calls ». WhenI wrote that, I thought, wrongly as I now see, that a refcursor variable held a composite, opaque value (or an opaquepointer to such) like it does in Oracle. > > (9) The upshot of #8 is that the "FOR recordvar IN bound_cursorvar" construct can be used only in the block statement thatdeclares the bound cursor variable. And this seems to defeat the point. You may just as well use an ordinary "for" loopthat has the SQL statement right after the "in" keyword. > > (10) I discovered that this construct: > > for ... in select ... from pg_cursors order by name loop > ... > end loop; > > sees a cursor with an automatically generated name like "<unnamed portal N>" for the loop itself. I suppose that this makesgood sense. But it does seem to undermine the value of declaring and using a bound cursor variable—esp as the nominalvalue of the "cursor" concept is the scrollability and the ability to fetch a smallish set of rows from anywhere ina huge result set. Pretty much all of the above can be explained by: https://www.postgresql.org/docs/current/sql-declare.html "Note This page describes usage of cursors at the SQL command level. If you are trying to use cursors inside a PL/pgSQL function, the rules are different —" "The SQL standard only makes provisions for cursors in embedded SQL. The PostgreSQL server does not implement an OPEN statement for cursors; a cursor is considered to be open when it is declared. However, ECPG, the embedded SQL preprocessor for PostgreSQL, supports the standard SQL cursor conventions, including those involving DECLARE and OPEN statements." https://www.postgresql.org/docs/current/plpgsql-cursors.html "Rather than executing a whole query at once, it is possible to set up a cursor that encapsulates the query, and then read the query result a few rows at a time. One reason for doing this is to avoid memory overrun when the result contains a large number of rows. (However, PL/pgSQL users do not normally need to worry about that, since FOR loops automatically use a cursor internally to avoid memory problems.) A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions." As to portal, entering it in the documentation search leads to a first result of: https://www.postgresql.org/docs/current/protocol-flow.html Do a page search for portal. -- Adrian Klaver adrian.klaver@aklaver.com
I may have been one of the respondents who showed some annoyance, and I am sorry for that. I understand that you ask questions to gain deeper understanding. On Mon, 2023-03-20 at 13:46 -0700, Bryn Llewellyn wrote: > Oracle Database doesn't expose scrollability for PL/SQL's equivalent of "refcursor". > So I had never come across use cases where this was beneficial. I wanted, therefore, > to hear about some. I thought that insights here would help me understand the mechanics. > But I didn't get anything beyond "Scrollability is what it is. If you don't need it, > don't use it." I recently used cursor scrollability, so I can show you a use case: https://github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49 The goal is to get the query result count right away, without having to run a second query for it: you declare the cursor, move to the end of the result set, fetch the ROW_COUNT, then move back to the beginning of the result set and start fetching the result rows. About your description of the difficulties with the terms "cursor", "portal", "bound" vs. "unbound" cursors in PL/pgSQL etc: I understand your confusion, and I believe that the documentation could be improved. The way I understand it, "portal" is PostgreSQL jargon. A portal is a cursor. The documentation tries to avoid "portal" as an implementation detail. PL/pgSQL cursors and "refcursor"s are not the same as SQL cursors: they are variables that hold a cursor name. That is confusing. I personally find that reading the PostgreSQL documentation gets you far, but only so far: for deep understanding, you have to read the code. It is usually well documented and readable, and I have come to see it as an extension of the documentation that covers the details. Yours, Laurenz Albe
laurenz.albe@cybertec.at wrote:
...I understand that you ask questions to gain deeper understanding.bryn@yugabyte.com wrote:...I had never come across use cases where [scrollability] was beneficial. I wanted, therefore, to hear about some. I thought that insights here would help me understand the mechanics.
I recently used cursor scrollability, so I can show you a use case:github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49
The goal is to get the query result count right away, without having to run a second query for it: you declare the cursor, move to the end of the result set, fetch the ROW_COUNT, then move back to the beginning of the result set and start fetching the result rows.
...I personally find that reading the PostgreSQL documentation gets you far, but only so far: for deep understanding, you have to read the code. It is usually well
documented and readable, and I have come to see it as an extension of the documentation that covers the details.
Thanks for the link to your SQL file at the line where you get the row count in the way that you describe. I saw that this is in the PL/pgSQL source text for function "materialize_foreign_table()" (~200 lines). And I saw that you use the cursor mechanism that we're discussing here in only one other function, "db_migrate_refresh()" (~480 lines). But this second one doesn't use the "move forward all" device to get a row count. I looked only at these two functions.
I noted that neither of these functions has a refcursor formal argument and that, rather, you open (i.e. create) each of the three cursors that you use within the two functions that uses them. I noted, too, that for the three "select" statements that you use to open your refcursors, none of these has an "order by". I noted that your code implements "create table destination" and "insert into destination... select from source..." where order doesn't matter.
However, source code famously reveals only what it does and not what the author's intention, and overall design philosophy, is. I looked at the README accounts for these two functions here:
But these accounts say nothing about the design of their implementations. The accounts, and the bigger picture, seem to imply that read consistency in the presence of concurrent activity from other sessions is not a concern. I'm guessing that this is accounted for upstream from how the code that I'm looking at operates—i.e. that the source database is extracted to staging tables like, say, export does so that your code operates as the only session that reads from, creates, and populates the tables that it references.
The upshot, therefore, is that I'm afraid that I can only guess at why you use "open, fetch, close" on a refcursor rather than an ordinary cursor for loop. After all, you use the technique only to traverse metadata tables about partitions, subpartitions, and columns. I'd be astonished if such tables have humongous numbers of rows (like hundreds of millions).
As a sanity test, I did this:
create function s.t_count()
returns int
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor := 'cur';
n int not null := 0;
begin
open cur for select v from s.t;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;
return n;
end;
$body$;
returns int
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
cur refcursor := 'cur';
n int not null := 0;
begin
open cur for select v from s.t;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;
return n;
end;
$body$;
Then I timed these two alternatives on a test table with ten million rows.
select count(*) from s.t;
select s.t_count();
select s.t_count();
They both got the same answer. But my function took about twice as long as the native count(*). Mind you, with "only" 10 million rows (and using a PG cluster in a VM on my Mac Book) the times were tiny: ~600ms versus ~300ms.
Then I tried this:
create procedure s.p(tab in text, mode in text)
set search_path = pg_catalog, pg_tempsecurity definer
language plpgsql
as $body$
declare
cur constant refcursor not null := 'cur';
stmt constant text not null := format('select k, v from s.%I', tab);
cnt_stmt constant text not null := format('select count(*) from s.%I', tab);
kk int not null := 0;
vv int not null := 0;
k int;
v int;
n int not null := 0;
cnt int not null := 0;
begin
case mode
when 'naive' then
execute cnt_stmt into cnt;
for kk, vv in execute stmt loop
n := n + 1;
assert kk = n and vv = n*2;
end loop;
assert n = cnt;
when 'refcursor' then
open cur for execute stmt;
move forward all in cur;
get diagnostics cnt = row_count;
move absolute 0 in cur;
loop
fetch cur into k, v;
exit when not found;
n := n + 1;
assert k = n and v = n*2;
end loop;
close cur; -- Just as a formality.
assert n = cnt;
end case;
end;
$body$
$body$
I created my 10 million row table like this:
with g(v) as (select generate_series(1, 10*1000*1000))
insert into s.t(k, v) select g.v, (g.v)*2 from g;
insert into s.t(k, v) select g.v, (g.v)*2 from g;
And I timed each mode a few times to get a reasonable average. The naive mode took ~7.5 sec; and the refcursor mode took ~13 sec. I don't s'pose that the difference matters much (and anyway, in your use case, all those "create table" and "insert select" statements would dominate the times). But it seems safe to say that the refcursor mode doesn't _help_ performance. Moreover, because the naive mode uses the cursor mechanism under the covers, the memory consumption discussion is the same for both approaches.
I appreciate that, in the face of concurrent sessions making changes to the content of "s.t", and using the default "read committed" isolation level, the before-the-fact separate count(*) might disagree with the after-the-fact value from counting the rows that are traversed. And you need the count at the start of the loop so that you can create the right number of (sub)partitions. So I do see (because other considerations, including code complexity, are so similar) that the refcursor mode is more stylish. Maybe that's it.
Certainly, it was very useful to think about this and to try the tests that I did. So thank you very much for showing me your code.
————————————————————
Finally, I see how an understanding of internals helps the understanding of performance-related question. But I find it very hard to accept that I should read the C implementation of PostgreSQL in order to get the proper mental model, and the proper terms of art, that I need to understand it semantics (and the corresponding SQL and PL/pgSQL syntax). Having said that, I did a little test an saw that this:
move last in cur;
get diagnostics n = row_count;reported just 1—in contrast to what you used:
move forward all in cur;
I've no idea how it's possible to navigate to the last result in the set without knowing how many there are. Maybe that fact is there internally—but with no explicit SQL or PL/pgSQL exposure of the value.
On Wed, 2023-03-22 at 12:09 -0700, Bryn Llewellyn wrote: > > laurenz.albe@cybertec.at wrote: > > I recently used cursor scrollability, so I can show you a use case: > > > > github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49 > > However, source code famously reveals only what it does and not what the author's intention, > and overall design philosophy, is. I looked at the README accounts for these two functions here: > > github.com/cybertec-postgresql/db_migrator#materialize_foreign_table > github.com/cybertec-postgresql/db_migrator#db_migrate_refresh > > But these accounts say nothing about the design of their implementations. The documentation describes the purpose of the function and how to use it, not how I went about implementing it. You are interpreting too much. I just wanted to show you a case where scrollable cursors can be useful. > The upshot, therefore, is that I'm afraid that I can only guess at why you use "open, fetch, close" > on a refcursor rather than an ordinary cursor for loop. After all, you use the technique only > to traverse metadata tables about partitions, subpartitions, and columns. I'd be astonished > if such tables have humongous numbers of rows (like hundreds of millions). No, I don't expect large result sets, and concurrency won't be a problem. I explained why I used scrollable cursors: to save myself an extra query for the total result set count. Here is another account of how cursors can be useful: https://www.cybertec-postgresql.com/en/pagination-problem-total-result-count/ Yours, Laurenz Albe
On 3/22/23 12:09, Bryn Llewellyn wrote: >> laurenz.albe@cybertec.at <mailto:laurenz.albe@cybertec.at> wrote: >> >> ...I understand that you ask questions to gain deeper understanding. >> >>> bryn@yugabyte.com <mailto:bryn@yugabyte.com> wrote: >>> >>> ...I had never come across use cases where [scrollability] was >>> beneficial. I wanted, therefore, to hear about some. I thought that >>> insights here would help me understand the mechanics. >> >> I recently used cursor scrollability, so I can show you a use case: >> >> github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49 <http://github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49> >> > Thanks for the link to your SQL file at the line where you get the row > count in the way that you describe. I saw that this is in the PL/pgSQL > source text for function "materialize_foreign_table()" (~200 lines). And > I saw that you use the cursor mechanism that we're discussing here in > only one other function, "db_migrate_refresh()" (~480 lines). But this > second one doesn't use the "move forward all" device to get a row count. > I looked only at these two functions. > > I noted that neither of these functions has a refcursor formal argument > and that, rather, you open (i.e. create) each of the three cursors that > you use within the two functions that uses them. I noted, too, that for > the three "select" statements that you use to open your refcursors, none > of these has an "order by". I noted that your code implements "create > table destination" and "insert into destination... select > from source..." where order doesn't matter. CREATE FUNCTION materialize_foreign_table( schema name, table_name name, with_data boolean DEFAULT TRUE, pgstage_schema name DEFAULT NAME 'pgsql_stage' ) RETURNS boolean LANGUAGE plpgsql VOLATILE STRICT SET search_path = pg_catalog AS $$DECLARE ft name; stmt text; errmsg text; detail text; cur_partitions refcursor; cur_subpartitions refcursor; ... CREATE FUNCTION db_migrate_refresh( plugin name, staging_schema name DEFAULT NAME 'fdw_stage', pgstage_schema name DEFAULT NAME 'pgsql_stage', only_schemas name[] DEFAULT NULL ) RETURNS integer LANGUAGE plpgsql VOLATILE CALLED ON NULL INPUT SET search_path = pg_catalog AS $$DECLARE extschema text; old_msglevel text; v_plugin_schema text; v_create_metadata_views regproc; v_translate_datatype regproc; v_translate_identifier regproc; v_translate_expression regproc; c_col refcursor ... What is not formal about the above? Though it does not matter as, back to the docs again: https://www.postgresql.org/docs/current/plpgsql-cursors.html "All access to cursors in PL/pgSQL goes through cursor variables, which are always of the special data type refcursor. One way to create a cursor variable is just to declare it as a variable of type refcursor. Another way is to use the cursor declaration syntax, which in general is: name [ [ NO ] SCROLL ] CURSOR [ ( arguments ) ] FOR query; " Though maybe you are trying to differentiate between bound and unbound refcursor's, where the ones defined in the functions are unbound. Also order by is not relevant for getting a count. > Finally, I see how an understanding of internals helps the understanding > of performance-related question. But I find it very hard to accept that > I should read the C implementation of PostgreSQL in order to get the > proper mental model, and the proper terms of art, that I need to > understand it semantics (and the corresponding SQL and PL/pgSQL syntax). > Having said that, I did a little test an saw that this: > > move last in cur; > get diagnostics n = row_count; > > reported just 1—in contrast to what you used: > > move forward all in cur; Because they are different actions. The first directly moves to the last row in the cursor and counts just that row The second moves through all the rows left in the cursor and counts all the rows it traversed. It is spelled out here: https://www.postgresql.org/docs/current/sql-move.html "The count is the number of rows that a FETCH command with the same parameters would have returned (possibly zero)." > > I've no idea how it's possible to navigate to the last result in the set > without knowing how many there are. Maybe that fact is there > internally—but with no explicit SQL or PL/pgSQL exposure of the value. > > > > -- Adrian Klaver adrian.klaver@aklaver.com
laurenz.albe@cybertec.at wrote:bryn@yugabyte.com wrote:I recently used cursor scrollability, so I can show you a use case:
github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49
However, source code famously reveals only what it does and not what the author's intention,
and overall design philosophy, is. I looked at the README accounts for these two functions here:
github.com/cybertec-postgresql/db_migrator#materialize_foreign_table
github.com/cybertec-postgresql/db_migrator#db_migrate_refresh
But these accounts say nothing about the design of their implementations.
The documentation describes the purpose of the function and how to use it, not how I went about implementing it. You are interpreting too much. I just wanted to show you a case where scrollable cursors can be useful.
The upshot, therefore, is that I'm afraid that I can only guess at why you use "open, fetch, close" on a refcursor rather than an ordinary cursor for loop. After all, you use the technique only to traverse metadata tables about partitions, subpartitions, and columns. I'd be astonished if such tables have humongous numbers of rows (like hundreds of millions).
No, I don't expect large result sets, and concurrency won't be a problem. I explained why I used scrollable cursors: to save myself an extra query for the total result set count.
It's only natural to want to understand the design philosophy and things like data volumes and possible concurrency concerns when it's recommended that I look at the 200 line source code for a function. I wanted only to assure you that I'd done my due diligence to get some answers. Thanks for confirming my guess about the data volumes and concurrency concerns. Right, I do see that the approach that you used very naturally expresses what needs to be done.
— Create a txn-duration "pg_cursors" row for the outer "select" of interest.
— Traverse it to get the row count, using "move" so's not to see the rows.
— Fetch the first row and do something special that needs the row count as well as the row.
— Loop over the remaining rows and do more pre-row actions.
— Similar logic in an inner loop for a different "select".
— Close the cursors explicitly.
It seems to me (given what you promised me about data volumes and concurrency) that using explicit "select count(*)" with ordinary loops that have the "select" statements in the "for" header (and that would use the cursor mechanisms under the hood) would work correctly and without harming performance. But I'll agree that such code might seem to be less expressive of its purpose then your approach.
Here is another account of how cursors can be useful:
www.cybertec-postgresql.com/en/pagination-problem-total-result-count/
Thanks! It's a very interesting and helpful read.
You wrote « a WITH HOLD cursor... fetches the complete result set at the end of the transaction and materializes it on the server. »
I checked the current doc for the 'declare' statement. It says:
«
In the current implementation, the rows represented by a held cursor are copied into a temporary file or memory area so that they remain available for subsequent transactions.
»
I tested this with "table t(k int primary key, v int not null)" with 100 Million rows. (I'm using psql's default mode for "AUTOCOMMIT" to ask for the server's behavior "as is"—so no explicit "start transaction" unless I need this. And I'm using a PostgreSQL server in a Ubuntu VM on my Mac Book with a psql client in the native macOS. First I timed a "with hold" cursor creation.
-- Took ~30 sec.
declare "With Hold" scroll cursor with hold for select k, v from t order by k;
Then I timed a "without hold" cursor creation for the same "select":
start transaction;
-- Took less than ~20 ms
declare "Without Hold" scroll cursor without hold for select k, v from t;
-- Took less than ~20 ms
declare "Without Hold" scroll cursor without hold for select k, v from t;
This certainly bears out what you and the doc said about "with hold".
W.r.t. this thread's "Subject", how do you fetch successive batches from a session duration cursor in successive requests from a browser when these go through a connection pool and the general philosophy is "check out connection, use it, release connection" so that successive browser requests will each use a different session? You refer to this with your « transaction level connection pooling won’t work with WITH HOLD cursors, since they are bound to the database connection ». (In this context, your point about showing an approximate result is well-taken.) Does anybody use interactive client-server apps these days? Or are you thinking of non-interactive report generation running on a client machine dedicated to that purpose?
W.r.t. your « With WITH HOLD cursor pagination, the whole result set is calculated in the beginning, so the result set count comes for free. », I timed this:
-- Took ~ 6.5 sec. (Doesn't speed up with repeats.)
do $body$
declare
cur constant refcursor not null := 'With Hold';
n integer not null := 0;
begin
move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;
raise info 'row_count: %', n;
end;
$body$;
declare
cur constant refcursor not null := 'With Hold';
n integer not null := 0;
begin
move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;
raise info 'row_count: %', n;
end;
$body$;
And this:
-- Took ~6 sec. Speeds up to ~3 sec with repeats.
select count(*) from s.t;
select count(*) from s.t;
I don't s'pose that what I saw about "count(*)" getting quicker with repeats matters 'cos you'd do it only once. But I remain to be convinced that there's a performance benefit to be got from "comes for free". On the other hand, I do see that (in the face of concurrent DML to the table of interest from other sessions, there's a clear semantic benefit when you get the count from a "with hold" cursor. This would matter in a use case like the migration one that you showed me where you didn't have the luxury of reading from staging tables in single-user mode.
adrian.klaver@aklaver.com wrote:Thanks for the link to your SQL file at the line where you get the row count in the way that you describe... I noted that neither of these functions has a refcursor formal argument and that, rather, you open (i.e. create) each of the three cursors that you use within the two functions that uses them.
CREATE FUNCTION materialize_foreign_table(...)
CREATE FUNCTION db_migrate_refresh(...)
What is not formal about the above?
I used the term of art "formal argument" ordinarily to denote what's listed in parentheses at the start of a subprogram definition. The term stands in contrast to "actual argument"—meaning the expression that's used in a subprogram invocation to provide a value for the corresponding formal argument. Go to this page:
and search in it for "Example: formal and actual function arguments in python". The blurb that comes just before the example says what I just said. It seems to be more common to talk about formal and actual "parameters". But the PG doc prefers "argument".
I didn't at all mean that "formal" is good and that Laurenz's code is not formal and therefore is bad! So sorry if you took it to mean this.
Though it does not matter as, back to the docs again:
www.postgresql.org/docs/current/plpgsql-cursors.html
"All access to cursors in PL/pgSQL goes through cursor variables, which are always of the special data type refcursor. One way to create a cursor variable is just to declare it as a variable of type refcursor..." Though maybe you are trying to differentiate between bound and unbound refcursor's...
Ah... you missed my point entirely. Looks like my prose was opaque. I was referring to this paradigm as described at the start of the page that you referenced:
«
A more interesting usage is to return a reference to a cursor that a function has created, allowing the caller to read the rows. This provides an efficient way to return large row sets from functions.
»
<aside> This seems to me to imply that the producer function with a "refcursor" return is "security definer" and that the consumer subprogram cannot access the tables of interest with explicit "select”. I thinks that's nice. </aside>
I needed to know that the definition and use of each "refcursor" variable, in Laurentz's code, are in the same subprogram because it's only when this is the case that using an ordinary "select count(*)" and an ordinary "for" loop, where the "select" is written in place within the "for" clause, are viable alternatives.
Also order by is not relevant for getting a count.
Yes, of course. I know that. (But it does matter in many cases of results presentation.) I was simply noting that Laurenz’s use-case didn’t need ordering and that the "select" statements will be faster than if ordering were needed. Sorry if I implied something that I didn’t mean too. (The example in Laurenz's post, www.cybertec-postgresql.com/en/pagination-problem-total-result-count/, does use "order by".)
I did a little test an saw that this:move last in cur;
get diagnostics n = row_count;
reported just 1—in contrast to what you used:move forward all in cur;
Because they are different actions. The first directly moves to the last row in the cursor and counts just that row. The second moves through all the rows left in the cursor and counts all the rows it traversed. It is spelled out here:
https://www.postgresql.org/docs/current/sql-move.html
"The count is the number of rows that a FETCH command with the same parameters would have returned (possibly zero)."
Yes, I'm afraid that I expressed myself poorly again. I meant to say that while I know nothing about the internals, I can't see how "move last in cur" can know where the last row is unless it traverses the entire result set from its current position until it finds no more rows. And while its doing this, it may just as well count the rows it traverses. It would seem, therefore, that a single operation that moves to where you asked for and that tells you how many rows it traversed would be enough—rather than two that both achieve the same effect in the same time where one gives you the count of rows traversed and the other doesn't. But never mind. The simple way to see it is that the operations simply do what they do—and I won't worry about how they might have been conceived differently.
I did some timing tests. I’ve copied the results and the code at the end—just for completeness. Here’s what I concluded.
(1) (and this is unremarkable), if you can find a way to meet your requirement with just a single SQL statement, then this is bound to be best. I chose a simple requirement: read from a table and insert half the total number of rows into one table and the remainder into another. This single SQL statement meets that requirement:
with
chunk(n) as (select count(*)/2 from s.t0),
i1 as (insert into s.t1(k, v) select k, v from s.t0 where k < (select n from chunk) returning k)
insert into s.t2(k, v) select k, v from s.t0 where k >= (select n from chunk);
chunk(n) as (select count(*)/2 from s.t0),
i1 as (insert into s.t1(k, v) select k, v from s.t0 where k < (select n from chunk) returning k)
insert into s.t2(k, v) select k, v from s.t0 where k >= (select n from chunk);
Internet search shows that this CTE approach seems to be the popular PG pattern to achieve what Oracle’s dedicated multi-table "insert" achieves. On the assumption that the source table suffers constant concurrent changes from other sessions, the PL/pgSQL loop that meets this requirement _must_ use the cursor-based approach for getting the count that we’ve been discussing. I’m glad to have learned all this because some requirements (e.g. when the source table traversal has to do DDLs) can’t be met with a single SQL statement, and so a PL/pgSQL loop must be used.
(2) Even though the declaration (in a txn that you commit) materializes the entire result set, this manages to be a lot faster than doing this yourself with a temp table like this:
with
c(n) as (
select count(*) from s.t),
i1(n) as (
insert into pg_temp.t_count select n from c returning n)
insert into pg_temp.t(k, v, r) select k, v, row_number() over(order by k) from s.t;
create unique index temp_t_k_unq on pg_temp.t(k asc);
c(n) as (
select count(*) from s.t),
i1(n) as (
insert into pg_temp.t_count select n from c returning n)
insert into pg_temp.t(k, v, r) select k, v, row_number() over(order by k) from s.t;
create unique index temp_t_k_unq on pg_temp.t(k asc);
The index is meant to help "move". Maybe some use cases wouldn’t need it. Same goes for recording the count in its own one-row, one-column table.
(3) The times to do this:
move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;
and this:
move absolute 0 in cur;
move last in cur;
move absolute 0 in cur;
move last in cur;
move absolute 0 in cur;
are the same (within the limits of my measurements). But, of course, there’s no need to use the second alternative.
(4) The time to get the count using "move forward all in cur" is about the same as doing an ordinary "count(*)"—but "move forward" is a little slower. I'm therefore puzzled by Laurentz's "the result set count comes for free [with the cursor approach]" in his blog post (x-ref above). Its appeal seems to me to be limited to its rock-solid semantics.
(5) It's somewhat quicker to move around in a holdable cursor than in a non-holdable one. But, of course, you pay a price when you create the former.
Here are my timings—all in seconds and all with three digits of precision. I used PG 15.2 in a Ubuntu VM on my Mac Book. I run psql in the bare macOS and connect into the VM. On some runs, some times are higher than usual. I expect that the way the VM handles files has something to do with this. I chose a run where the timings were pretty stable to copy here.
Insert 10,000,000 rows time: 13.7
Populate temp tables time: 10.6
Native count(*) time: 0.633
0.637
0.677
0.781
Declare "Is-Holdable" time: 1.91
Is-Holdable move fwd count time: 0.703
0.700
0.700
0.700
Declare "Not-Holdable" time: 0.000
Not-Holdable move fwd count time: 0.877
0.866
0.864
0.872
Declare "Is-Holdable" time: 1.78
Is-Holdable move back & fore time: 0.691
0.695
0.709
0.708
Declare "Not-Holdable" time: 0.000
Not-Holdable move back & fore time: 0.872
0.871
0.874
0.888
Populate temp tables time: 10.6
Native count(*) time: 0.633
0.637
0.677
0.781
Declare "Is-Holdable" time: 1.91
Is-Holdable move fwd count time: 0.703
0.700
0.700
0.700
Declare "Not-Holdable" time: 0.000
Not-Holdable move fwd count time: 0.877
0.866
0.864
0.872
Declare "Is-Holdable" time: 1.78
Is-Holdable move back & fore time: 0.691
0.695
0.709
0.708
Declare "Not-Holdable" time: 0.000
Not-Holdable move back & fore time: 0.872
0.871
0.874
0.888
Here's the code. If you want to run it, just copy it into a single file and start it at the psql prompt. You need just to connect as a regular user that has "create" on the database that you use.
--------------------------------------------------------------------------------
-- Create the objects.
\c :db :u
set search_path = pg_catalog, pg_temp;
drop schema if exists s cascade;
create schema s;
create table s.t(k int primary key, v int not null);
/*
Format the elapsed time since t0 in seconds
with three digits of precision.
*/;
create function s.elapsed_time(t0 in double precision)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
e constant double
precision not null := extract(epoch from clock_timestamp()) - t0;
t constant text not null :=
case
when e >= 100000.0 then '>= 100K' ||' '
when e >= 10000.0 then to_char(round(e/100.0)*100.0, '99,999' )||' '
when e >= 1000.0 then to_char(round(e/10.0)* 10.0, '9,999' )||' '
when e >= 100.0 then to_char(e, '999' )||' '
when e >= 10.0 then to_char(e, '99.9' )||' '
when e >= 1.0 then to_char(e, '9.99' )||' '
else to_char(e, '0.999')
end;
begin
return lpad(t, 12);
end;
$body$;
create procedure s.insert_table_time(no_of_rows int, t inout text)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
n_text constant text not null := ltrim(to_char(no_of_rows, '999,999,999,999'));
caption constant text not null := rpad('Insert '||n_text||' rows time:', 35);
t0 double
precision not null := 0.0;
begin
truncate table s.t;
t0 := extract(epoch from clock_timestamp());
with g(v) as (select generate_series(1, no_of_rows))
insert into s.t(k, v) select g.v, (g.v)*2 from g;
t := caption||s.elapsed_time(t0);
end;
$body$;
create procedure s.populate_temp_tables_time(t inout text)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
caption constant text not null := rpad('Populate temp tables time:', 35);
t0 double
precision not null := 0.0;
begin
create table pg_temp.t_count(n int);
create table pg_temp.t(k int, v int, r int);
t0 := extract(epoch from clock_timestamp());
with
c(n) as (
select count(*) from s.t),
i1(n) as (
insert into pg_temp.t_count select n from c returning n)
insert into pg_temp.t(k, v, r) select k, v, row_number() over(order by k) from s.t;
create unique index temp_t_k_unq on pg_temp.t(k asc);
t := caption||s.elapsed_time(t0);
end;
$body$;
create procedure s.close_cursor(cur in refcursor)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
begin
close cur;
end;
$body$;
create procedure s.declare_cursor_time(cur_name in text, holdable in boolean, t inout text)
-- set search_path = pg_catalog, pg_temp :: Incomparible with "commit".
language plpgsql
as $body$
declare
hld constant text not null := case holdable
when true then 'with hold'
else 'without hold'
end;
slct constant text not null := 'select k, v from s.t order by k';
decl constant text not null := format('declare %I scroll cursor %s for '||slct, cur_name, hld);
caption constant text not null := rpad('Declare "'||cur_name||'" time:', 35);
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
execute decl;
if holdable then
commit;
end if;
t := caption||s.elapsed_time(t0);
end;
$body$;
create function s.native_count_time(expected_n in int, show_caption in boolean = false)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
n int not null := 0;
caption constant text not null := case show_caption
when true then rpad('Native count(*) time:', 35)
else rpad('', 35)
end;
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
n := (select count(*) from s.t);
assert n = expected_n;
return caption||s.elapsed_time(t0);
end;
$body$;
drop function if exists s.count_by_move_time(refcursor, int, boolean) cascade;
create function s.count_by_move_time(cur in refcursor, expected_n in int, show_caption in boolean = false)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
n int not null := 0;
caption constant text not null := case show_caption
when true then rpad(cur::text||' move fwd count time:', 35)
else rpad('', 35)
end;
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;
assert n = expected_n;
return caption||s.elapsed_time(t0);
end;
$body$;
create function s.move_back_and_fore_time(cur in refcursor, show_caption in boolean = false)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
caption constant text not null := case show_caption
when true then rpad(cur::text||' move back & fore time:', 35)
else rpad('', 35)
end;
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
move absolute 0 in cur;
move last in cur;
move absolute 0 in cur;
return caption||s.elapsed_time(t0);
end;
$body$;
--------------------------------------------------------------------------------
-- Do the timimg.
-- Ten thousand rows.
\set no_of_rows 10000000
\set Is_Holdable ''''Is-Holdable''''
\set Not_Holdable ''''Not-Holdable''''
\t on
\o results.txt
call s.insert_table_time(:no_of_rows, null::text);
call s.populate_temp_tables_time(null::text);
select s.native_count_time(:no_of_rows, true);
select s.native_count_time(:no_of_rows);
select s.native_count_time(:no_of_rows);
select s.native_count_time(:no_of_rows);
call s.declare_cursor_time(:Is_Holdable, true, null::text);
select s.count_by_move_time(:Is_Holdable, :no_of_rows, true);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
start transaction;
call s.declare_cursor_time(:Not_Holdable, false, null::text);
select s.count_by_move_time(:Not_Holdable, :no_of_rows, true);
select s.count_by_move_time(:Not_Holdable, :no_of_rows);
select s.count_by_move_time(:Not_Holdable, :no_of_rows);
select s.count_by_move_time(:Not_Holdable, :no_of_rows);
rollback;
call s.close_cursor(:Is_Holdable);
call s.declare_cursor_time(:Is_Holdable, true, null::text);
select s.move_back_and_fore_time(:Is_Holdable, true);
select s.move_back_and_fore_time(:Is_Holdable);
select s.move_back_and_fore_time(:Is_Holdable);
select s.move_back_and_fore_time(:Is_Holdable);
start transaction;
call s.declare_cursor_time(:Not_Holdable, false, null::text);
select s.move_back_and_fore_time(:Not_Holdable, true);
select s.move_back_and_fore_time(:Not_Holdable);
select s.move_back_and_fore_time(:Not_Holdable);
select s.move_back_and_fore_time(:Not_Holdable);
rollback;
\o
\t off
-- Create the objects.
\c :db :u
set search_path = pg_catalog, pg_temp;
drop schema if exists s cascade;
create schema s;
create table s.t(k int primary key, v int not null);
/*
Format the elapsed time since t0 in seconds
with three digits of precision.
*/;
create function s.elapsed_time(t0 in double precision)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
e constant double
precision not null := extract(epoch from clock_timestamp()) - t0;
t constant text not null :=
case
when e >= 100000.0 then '>= 100K' ||' '
when e >= 10000.0 then to_char(round(e/100.0)*100.0, '99,999' )||' '
when e >= 1000.0 then to_char(round(e/10.0)* 10.0, '9,999' )||' '
when e >= 100.0 then to_char(e, '999' )||' '
when e >= 10.0 then to_char(e, '99.9' )||' '
when e >= 1.0 then to_char(e, '9.99' )||' '
else to_char(e, '0.999')
end;
begin
return lpad(t, 12);
end;
$body$;
create procedure s.insert_table_time(no_of_rows int, t inout text)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
n_text constant text not null := ltrim(to_char(no_of_rows, '999,999,999,999'));
caption constant text not null := rpad('Insert '||n_text||' rows time:', 35);
t0 double
precision not null := 0.0;
begin
truncate table s.t;
t0 := extract(epoch from clock_timestamp());
with g(v) as (select generate_series(1, no_of_rows))
insert into s.t(k, v) select g.v, (g.v)*2 from g;
t := caption||s.elapsed_time(t0);
end;
$body$;
create procedure s.populate_temp_tables_time(t inout text)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
caption constant text not null := rpad('Populate temp tables time:', 35);
t0 double
precision not null := 0.0;
begin
create table pg_temp.t_count(n int);
create table pg_temp.t(k int, v int, r int);
t0 := extract(epoch from clock_timestamp());
with
c(n) as (
select count(*) from s.t),
i1(n) as (
insert into pg_temp.t_count select n from c returning n)
insert into pg_temp.t(k, v, r) select k, v, row_number() over(order by k) from s.t;
create unique index temp_t_k_unq on pg_temp.t(k asc);
t := caption||s.elapsed_time(t0);
end;
$body$;
create procedure s.close_cursor(cur in refcursor)
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
begin
close cur;
end;
$body$;
create procedure s.declare_cursor_time(cur_name in text, holdable in boolean, t inout text)
-- set search_path = pg_catalog, pg_temp :: Incomparible with "commit".
language plpgsql
as $body$
declare
hld constant text not null := case holdable
when true then 'with hold'
else 'without hold'
end;
slct constant text not null := 'select k, v from s.t order by k';
decl constant text not null := format('declare %I scroll cursor %s for '||slct, cur_name, hld);
caption constant text not null := rpad('Declare "'||cur_name||'" time:', 35);
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
execute decl;
if holdable then
commit;
end if;
t := caption||s.elapsed_time(t0);
end;
$body$;
create function s.native_count_time(expected_n in int, show_caption in boolean = false)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
n int not null := 0;
caption constant text not null := case show_caption
when true then rpad('Native count(*) time:', 35)
else rpad('', 35)
end;
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
n := (select count(*) from s.t);
assert n = expected_n;
return caption||s.elapsed_time(t0);
end;
$body$;
drop function if exists s.count_by_move_time(refcursor, int, boolean) cascade;
create function s.count_by_move_time(cur in refcursor, expected_n in int, show_caption in boolean = false)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
n int not null := 0;
caption constant text not null := case show_caption
when true then rpad(cur::text||' move fwd count time:', 35)
else rpad('', 35)
end;
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
move absolute 0 in cur;
move forward all in cur;
get diagnostics n = row_count;
move absolute 0 in cur;
assert n = expected_n;
return caption||s.elapsed_time(t0);
end;
$body$;
create function s.move_back_and_fore_time(cur in refcursor, show_caption in boolean = false)
returns text
set search_path = pg_catalog, pg_temp
language plpgsql
as $body$
declare
caption constant text not null := case show_caption
when true then rpad(cur::text||' move back & fore time:', 35)
else rpad('', 35)
end;
t0 constant double
precision not null := extract(epoch from clock_timestamp());
begin
move absolute 0 in cur;
move last in cur;
move absolute 0 in cur;
return caption||s.elapsed_time(t0);
end;
$body$;
--------------------------------------------------------------------------------
-- Do the timimg.
-- Ten thousand rows.
\set no_of_rows 10000000
\set Is_Holdable ''''Is-Holdable''''
\set Not_Holdable ''''Not-Holdable''''
\t on
\o results.txt
call s.insert_table_time(:no_of_rows, null::text);
call s.populate_temp_tables_time(null::text);
select s.native_count_time(:no_of_rows, true);
select s.native_count_time(:no_of_rows);
select s.native_count_time(:no_of_rows);
select s.native_count_time(:no_of_rows);
call s.declare_cursor_time(:Is_Holdable, true, null::text);
select s.count_by_move_time(:Is_Holdable, :no_of_rows, true);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
select s.count_by_move_time(:Is_Holdable, :no_of_rows);
start transaction;
call s.declare_cursor_time(:Not_Holdable, false, null::text);
select s.count_by_move_time(:Not_Holdable, :no_of_rows, true);
select s.count_by_move_time(:Not_Holdable, :no_of_rows);
select s.count_by_move_time(:Not_Holdable, :no_of_rows);
select s.count_by_move_time(:Not_Holdable, :no_of_rows);
rollback;
call s.close_cursor(:Is_Holdable);
call s.declare_cursor_time(:Is_Holdable, true, null::text);
select s.move_back_and_fore_time(:Is_Holdable, true);
select s.move_back_and_fore_time(:Is_Holdable);
select s.move_back_and_fore_time(:Is_Holdable);
select s.move_back_and_fore_time(:Is_Holdable);
start transaction;
call s.declare_cursor_time(:Not_Holdable, false, null::text);
select s.move_back_and_fore_time(:Not_Holdable, true);
select s.move_back_and_fore_time(:Not_Holdable);
select s.move_back_and_fore_time(:Not_Holdable);
select s.move_back_and_fore_time(:Not_Holdable);
rollback;
\o
\t off