Thread: get row count from a "cursor resultset"

get row count from a "cursor resultset"

From
Maik Wiege
Date:
Hello!
I'm querying my database and sometimes get a very big resultset back, so
I needed to use a resultset with a cursor to avoid a OutOfMemoryError.

Now, because I have to use a TYPE_FORWARD_ONLY resultset for that, I can
not use the result.last() function anymore to retrieve the amount of
rows the query returns.

The query is a little bit complicated and that for can take some time,
so I don't want to do the "SELECT COUNT(*) FROM (SELECT...)" afterward,
what would result in doing the query twice and there for would double
the time needed.

I didn't find a solution for this, but there must be, because the
pgAdmin III - tool does this. After starting the query in the SQL-Dialog
it asks wether to retrieve just the first 100 results or all results
displaying the corect amount of found rows. Or is this just not possible
with jdbc.

Thanks for any help

   Maik

Re: get row count from a "cursor resultset"

From
Oliver Jowett
Date:
Maik Wiege wrote:
> Hello!
> I'm querying my database and sometimes get a very big resultset back, so
> I needed to use a resultset with a cursor to avoid a OutOfMemoryError.
>
> Now, because I have to use a TYPE_FORWARD_ONLY resultset for that, I can
> not use the result.last() function anymore to retrieve the amount of
> rows the query returns.
>
> The query is a little bit complicated and that for can take some time,
> so I don't want to do the "SELECT COUNT(*) FROM (SELECT...)" afterward,
> what would result in doing the query twice and there for would double
> the time needed.

Have you tried using COUNT directly in your original query (not as a
separate query)? I wouldn't expect that to affect total execution time
much if at all; the main difference will be that the full query has to
complete before you get any results, while without the COUNT you might
start getting results immediately.

There's not really any other option that the current driver code will do
for you. You could DECLARE CURSOR / MOVE / FETCH yourself perhaps; I'd
expect this to have similar startup characteristics to the COUNT(*) case
though. Even if the driver supported cursor-based random-access
resultsets, it'd be using MOVE / FETCH under the covers in the same way..

> I didn't find a solution for this, but there must be, because the pgAdmin III - tool does this. After starting the
queryin the SQL-Dialog it asks wether to retrieve just the first 100 results or all results displaying the corect
amountof found rows. Or is this just not possible with jdbc. 

I'd point out that this does not require knowing exactly how many rows
there are -- only that there are more than 100. You can do the
equivalent in JDBC easily by setting fetchsize to 100 or so and calling
next() 100 times.. if you haven't hit the end of the resultset by that
point, it has more than 100 rows.

-O

Re: get row count from a "cursor resultset"

From
Oliver Jowett
Date:
Oliver Jowett wrote:
> Maik Wiege wrote:
>
>>I didn't find a solution for this, but there must be, because the pgAdmin III - tool does this. After starting the
queryin the SQL-Dialog it asks wether to retrieve just the first 100 results or all results displaying the corect
amountof found rows. Or is this just not possible with jdbc. 
>
>
> I'd point out that this does not require knowing exactly how many rows
> there are -- only that there are more than 100. You can do the
> equivalent in JDBC easily by setting fetchsize to 100 or so and calling
> next() 100 times.. if you haven't hit the end of the resultset by that
> point, it has more than 100 rows.

Oh, sorry, my mistake -- I misread your description. Perhaps they are
using either COUNT(*) in the same query or MOVE to get the number of rows.

-O

Re: get row count from a "cursor resultset"

From
Oliver Jowett
Date:
Oliver Jowett wrote:
> Oliver Jowett wrote:
>
>>Maik Wiege wrote:
>>
>>
>>>I didn't find a solution for this, but there must be, because the pgAdmin III - tool does this. After starting the
queryin the SQL-Dialog it asks wether to retrieve just the first 100 results or all results displaying the corect
amountof found rows. Or is this just not possible with jdbc. 
>>
>>
>>I'd point out that this does not require knowing exactly how many rows
>>there are -- only that there are more than 100. You can do the
>>equivalent in JDBC easily by setting fetchsize to 100 or so and calling
>>next() 100 times.. if you haven't hit the end of the resultset by that
>>point, it has more than 100 rows.
>
>
> Oh, sorry, my mistake -- I misread your description. Perhaps they are
> using either COUNT(*) in the same query or MOVE to get the number of rows.

Ok, some digging in pgAdmin's svn web interface seems to indicate that
pgadmin is not actually doing anything special here -- it does the full
query via libpq's PQexec(), retrieving all the tuples (and hence can
count them) but only does the work to construct a user interface
displaying them all after that "lots of rows, retrieve anyway?" dialog
has been answered.

You could check this by looking at pgadmin's memory footprint while
retrieving a big resultset.

-O

Re: get row count from a "cursor resultset"

From
Maik Wiege
Date:
Oliver Jowett wrote:
> Ok, some digging in pgAdmin's svn web interface seems to indicate that
> pgadmin is not actually doing anything special here -- it does the full
> query via libpq's PQexec(), retrieving all the tuples (and hence can
> count them) but only does the work to construct a user interface
> displaying them all after that "lots of rows, retrieve anyway?" dialog
> has been answered.
>
> You could check this by looking at pgadmin's memory footprint while
> retrieving a big resultset.
Thank you for your answers! Yes, this explains it, it is realy using
much memory. But what can I do about my problem? Isn't this a very
common problem? Can't imagine any user querying a database and does not
want to be informed how many results he got...

Thanks for any help
   Maik

Re: get row count from a "cursor resultset"

From
Oliver Jowett
Date:
Maik Wiege wrote:

> But what can I do about my problem? Isn't this a very
> common problem? Can't imagine any user querying a database and does not
> want to be informed how many results he got...

Well, as I suggested before, use DECLARE CURSOR / MOVE / FETCH, or
include a COUNT(*) column in your original query.

-O

Re: get row count from a "cursor resultset"

From
Maik Wiege
Date:
Oliver Jowett schrieb:
> Maik Wiege wrote:
>
>
>>But what can I do about my problem? Isn't this a very
>>common problem? Can't imagine any user querying a database and does not
>>want to be informed how many results he got...
>
>
> Well, as I suggested before, use DECLARE CURSOR / MOVE / FETCH, or
> include a COUNT(*) column in your original query.
>
Hi! Could you give me an example how to do it? I can't fugure it out. If
I put the COUNT(*) in my query:
SELECT DISTINCT REGISTER.pd.pdkz, COUNT(*) FROM REGISTER.pd JOIN
REGISTER.vornamen ON REGISTER.vornamen.pdkz = REGISTER.pd.pdkz
the sql error comes up:
ERROR:  column "pd.pdkz" must appear in the GROUP BY clause or be used
in an aggregate function

Thanks for any help
   Maik

Re: get row count from a "cursor resultset"

From
Maik Wiege
Date:
Oliver Jowett schrieb:
> Try COUNT(distinct REGISTER.pd.pdkz) perhaps
Hi!
SELECT COUNT(DISTINCT REGISTER.pd.pdkz)
FROM REGISTER.pd JOIN
REGISTER.vornamen ON REGISTER.vornamen.pdkz = REGISTER.pd.pdkz

works, but with that of cource I don't get the data, just the amount of
rows, so I would have to call it another time without COUNT to get the data.

A query like that
SELECT COUNT(DISTINCT REGISTER.pd.pdkz), DISTINCT REGISTER.pd.pdkz
FROM REGISTER.pd JOIN
REGISTER.vornamen ON REGISTER.vornamen.pdkz = REGISTER.pd.pdkz
gives the same mentioned error: column "pd.pdkz" must appear in the
GROUP BY clause or be used in an aggregate function

Any ideas anyone?

Re: get row count from a "cursor resultset"

From
Dave Cramer
Date:
Maik,

Your only choice may be to do two separate queries. This isn't quite
as bad as you think, as the count will load the buffers with your query
If you need exactly the right amount, make sure you do this in a
transaction.

Dave
On 7-Jul-05, at 7:29 AM, Maik Wiege wrote:

> Oliver Jowett schrieb:
>
>> Try COUNT(distinct REGISTER.pd.pdkz) perhaps
>>
> Hi!
> SELECT COUNT(DISTINCT REGISTER.pd.pdkz)
> FROM REGISTER.pd JOIN
> REGISTER.vornamen ON REGISTER.vornamen.pdkz = REGISTER.pd.pdkz
>
> works, but with that of cource I don't get the data, just the
> amount of rows, so I would have to call it another time without
> COUNT to get the data.
>
> A query like that
> SELECT COUNT(DISTINCT REGISTER.pd.pdkz), DISTINCT REGISTER.pd.pdkz
> FROM REGISTER.pd JOIN
> REGISTER.vornamen ON REGISTER.vornamen.pdkz = REGISTER.pd.pdkz
> gives the same mentioned error: column "pd.pdkz" must appear in the
> GROUP BY clause or be used in an aggregate function
>
> Any ideas anyone?
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>               http://archives.postgresql.org
>
>