Thread: How to get the total number of rows returned by query when using a cursor?
How to get the total number of rows returned by query when using a cursor?
From
"Egor Shipovalov"
Date:
I'm implementing paging through search results using cursors. Is there a better way to know total number of rows under a cursor than running a separate COUNT(*) query? I think PostgreSQL is bound to know this number after the first FETCH, isn't it? On a side note, why queries using LIMIT are SO terribly slow, compared to cursors and sometimes even ones without LIMIT? Shouldn't LIMIT be internally implemented using cursor mechanism then? Best regards, Egor Shipovalov.
Egor Shipovalov wrote: > I'm implementing paging through search results using cursors. Is there a > better way to know total number of rows under a cursor than running a > separate COUNT(*) query? I think PostgreSQL is bound to know this number > after the first FETCH, isn't it? Using libpq, yes. You can use PQntuples. See http://developer.postgresql.org/docs/postgres/libpq-exec.html#LIBPQ-EXEC-SELECT-INFO Unfortunately this will retrieve the entire resultselt before returning the control. When libpq will have chuncking of data, this will be much more flexible, as you are expecting it now, I guess. Shridhar
Thanks. In fact, I've thought about that too, but it doesn't seem to work. I'm using Perl Pg module which, I'm guessing, is just a wrapper around libpq. This is the code I used to test it: --------------- use Pg; $conn = Pg::connectdb('user=my_user dbname=my_db'); $conn->exec('BEGIN'); $r = $conn->exec("DECLARE my_cursor CURSOR FOR SELECT nick FROM users ORDER BY nick DESC"); print $r->ntuples; # prints 0 $r = $conn->exec('FETCH 10 FROM my_cursor'); print $r->ntuples; # prints 10 -------------- Too bad we seem to be stuck with COUNT(*) for now. Are there any fundamental problems getting to that cursor count data? Judging from the lack of absolute cursor positioning it appears somewhat like so ;( Best regards, Egor Shipovalov. > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Shridhar > Daithankar > Sent: Tuesday, September 23, 2003 18:59 > To: Pgsql-General > Subject: Re: [GENERAL] How to get the total number of rows > returned by query > > > Egor Shipovalov wrote: > > > I'm implementing paging through search results using cursors. Is there a > > better way to know total number of rows under a cursor than running a > > separate COUNT(*) query? I think PostgreSQL is bound to know this number > > after the first FETCH, isn't it? > > Using libpq, yes. You can use PQntuples. See > http://developer.postgresql.org/docs/postgres/libpq-exec.html#LIBP > Q-EXEC-SELECT-INFO > > Unfortunately this will retrieve the entire resultselt before > returning the > control. When libpq will have chuncking of data, this will be much more > flexible, as you are expecting it now, I guess. > > Shridhar > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
"Egor Shipovalov" <pgsql_list@eonline.ru> writes: > I'm implementing paging through search results using cursors. Is there a > better way to know total number of rows under a cursor than running a > separate COUNT(*) query? I think PostgreSQL is bound to know this number > after the first FETCH, isn't it? Why would you think that? In general Postgres doesn't know the number of rows until it reaches the end of the query. > On a side note, why queries using LIMIT are SO terribly slow, compared to > cursors and sometimes even ones without LIMIT? Generally they're not particularly slow. Perhaps you should show us the EXPLAIN ANALYZE results for your problem case. regards, tom lane
Re: How to get the total number of rows returned by query when using a cursor?
From
"Egor Shipovalov"
Date:
> > I'm implementing paging through search results using cursors. Is there a > > better way to know total number of rows under a cursor than running a > > separate COUNT(*) query? I think PostgreSQL is bound to know this number > > after the first FETCH, isn't it? > > Why would you think that? In general Postgres doesn't know the number > of rows until it reaches the end of the query. Sorry, I thought that because my query was sorted. For unsorted queries, yes, there's usually no way to know until you reach the end. > > On a side note, why queries using LIMIT are SO terribly slow, > compared to > > cursors and sometimes even ones without LIMIT? > > Generally they're not particularly slow. Perhaps you should show us the > EXPLAIN ANALYZE results for your problem case. I did just that, but the plan was too complicated for me to comprehend ;) The interesting thing is that some queries are tens (!) of times slower with LIMIT that without. These contain a number of joins on subselects. If you're interested, I'll post examples. Best regards, Egor Shipovalov.
"Egor Shipovalov" <pgsql_list@eonline.ru> writes: > The interesting thing is that some queries are tens (!) of times slower with > LIMIT that without. These contain a number of joins on subselects. If you're > interested, I'll post examples. Sounds like the planner is misestimating costs and choosing the wrong plan. Please show examples (EXPLAIN ANALYZE for same query with and without LIMIT would be useful). regards, tom lane
Re: How to get the total number of rows returned by query when using a cursor?
From
"Egor Shipovalov"
Date:
> Sounds like the planner is misestimating costs and choosing the > wrong plan. > Please show examples (EXPLAIN ANALYZE for same query with and without > LIMIT would be useful). Here it goes. This query is 6 times slower with LIMIT 10 added (I had slower ones, but those are complex). I must say I rewrote it using subselects, thus speeding it up in general and getting rid of the problem, but this is the form I used on MySQL and it worked fast. The purpose is to find users that list users 'asta' and 'furrr' as their friends and are themselves listed as friends by the user 'chalky'. SELECT users.* FROM users, friends AS f0, friends AS f1, friends AS fo0 WHERE f0.nick = users.nick AND f1.nick = f0.nick AND fo0.friend_nick = f1.nick AND f0.friend_nick = 'asta' AND f1.friend_nick = 'furrr' AND fo0.nick = 'chalky' ; Without LIMIT: --------------------- Merge Join (cost=43404.20..45067.61 rows=1460751 width=660) (actual time=4617.53..4633.82 rows=5 loops=1) Merge Cond: ("outer".nick = "inner".nick) -> Merge Join (cost=26432.36..27412.27 rows=75288 width=38) (actual time=125.43..136.41 rows=17 loops=1) Merge Cond: ("outer".nick = "inner".friend_nick) -> Sort (cost=13216.18..13225.88 rows=3880 width=19) (actual time=115.29..115.98 rows=388 loops=1) Sort Key: f0.nick -> Index Scan using friend_nick__nick on friends f0 (cost=0.00..12984.87 rows=3880 width=19) (actual time=1.31..102.35 rows=391 loops=1) Index Cond: (friend_nick = 'asta'::character varying) -> Sort (cost=13216.18..13225.88 rows=3880 width=19) (actual time=9.76..9.94 rows=87 loops=1) Sort Key: fo0.friend_nick -> Index Scan using nick__friend_nick on friends fo0 (cost=0.00..12984.87 rows=3880 width=19) (actual time=1.42..7.18 rows=87 loops=1) Index Cond: (nick = 'chalky'::character varying) -> Materialize (cost=17089.36..17089.36 rows=3880 width=622) (actual time=4491.16..4491.82 rows=337 loops=1) -> Merge Join (cost=16971.84..17089.36 rows=3880 width=622) (actual time=4037.07..4485.64 rows=360 loops=1) Merge Cond: ("outer".nick = "inner".nick) -> Sort (cost=13216.18..13225.88 rows=3880 width=19) (actual time=133.72..135.02 rows=448 loops=1) Sort Key: f1.nick -> Index Scan using friend_nick__nick on friends f1 (cost=0.00..12984.87 rows=3880 width=19) (actual time=1.30..118.64 rows=448 loops=1) Index Cond: (friend_nick = 'furrr'::character varying) -> Sort (cost=3755.66..3780.47 rows=9923 width=603) (actual time=3902.67..4109.18 rows=9921 loops=1) Sort Key: users.nick -> Seq Scan on users (cost=0.00..1134.23 rows=9923 width=603) (actual time=14.61..1839.57 rows=9923 loops=1) Total runtime: 4643.07 msec (23 rows) --------------------- With LIMIT 10: Limit (cost=0.00..24.79 rows=10 width=660) (actual time=8380.84..23224.93 rows=5 loops=1) -> Merge Join (cost=0.00..3620915.28 rows=1460751 width=660) (actual time=8380.82..23224.88 rows=5 loops=1) Merge Cond: ("outer".nick = "inner".nick) -> Nested Loop (cost=0.00..2820642.36 rows=75288 width=38) (actual time=234.64..15567.35 rows=17 loops=1) -> Index Scan using nick__friend_nick on friends f0 (cost=0.00..2509966.71 rows=3880 width=19) (actual time=5.92..15360.73 rows=391 loops=1) Filter: (friend_nick = 'asta'::character varying) -> Index Scan using nick__friend_nick on friends fo0 (cost=0.00..79.82 rows=19 width=19) (actual time=0.48..0.48 rows=0 loops=391) Index Cond: ((fo0.nick = 'chalky'::character varying) AND (fo0.friend_nick = "outer".nick)) -> Materialize (cost=799561.12..799561.12 rows=3880 width=622) (actual time=7650.62..7651.35 rows=337 loops=1) -> Nested Loop (cost=0.00..799561.12 rows=3880 width=622) (actual time=10.42..7640.60 rows=360 loops=1) -> Index Scan using journals_0_pkey on users (cost=0.00..5102.15 rows=9923 width=603) (actual time=1.03..2548.42 rows=9923 loops=1) -> Index Scan using nick__friend_nick on friends f1 (cost=0.00..79.82 rows=19 width=19) (actual time=0.47..0.47rows=0 loops=9923) Index Cond: ((f1.nick = "outer".nick) AND (f1.friend_nick = 'furrr'::character varying)) Total runtime: 23227.58 msec (14 rows) ------------------- Relevant table descriptions: Table "public.users" Column | Type | Modifiers -------------------+-----------------------------+-------------------- id | integer | nick | character varying(15) | not null name | character varying | website | character varying(1024) | website_name | character varying | country | character varying | city | character varying | birth_date | date | email | character varying(50) | icq | bigint | bio | text | account_type | character varying | is_community | boolean | not null friends_num | integer | not null default 0 friend_of_num | integer | not null default 0 member_of_num | integer | not null default 0 memories_num | integer | not null default 0 time_created | timestamp without time zone | not null time_updated | timestamp without time zone | journal_entries | integer | not null default 0 comments_received | integer | not null default 0 comments_posted | integer | not null default 0 picture | character varying | interests_num | integer | not null default 0 memored_num | integer | not null default 0 age | smallint | rank_pos | integer | rank | numeric(10,3) | location | character varying(255) | Indexes: journals_0_pkey primary key btree (nick), age btree (age), country btree (country), is_community btree (is_community), location_info btree (country, city), picture btree (picture), rank btree (rank), time_updated btree (time_updated) Table "public.friends" Column | Type | Modifiers -------------+-----------------------+----------- nick | character varying(15) | not null friend_nick | character varying(15) | not null Indexes: friends2_pkey primary key btree (nick, friend_nick), friend_nick__nick unique btree (friend_nick, nick), nick__friend_nick unique btree (nick, friend_nick) > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Tuesday, September 23, 2003 19:43 > To: Egor Shipovalov > Cc: Pgsql-General > Subject: Re: [GENERAL] How to get the total number of rows returned by > query when using a cursor? > > > "Egor Shipovalov" <pgsql_list@eonline.ru> writes: > > The interesting thing is that some queries are tens (!) of > times slower with > > LIMIT that without. These contain a number of joins on > subselects. If you're > > interested, I'll post examples. > > > regards, tom lane
Egor Shipovalov wrote: > Limit (cost=0.00..24.79 rows=10 width=660) (actual time=8380.84..23224.93 > rows=5 loops=1) >[...] > -> Index Scan using nick__friend_nick on friends f0 > (cost=0.00..2509966.71 rows=3880 width=19) (actual time=5.92..15360.73 > rows=391 loops=1) > Filter: (friend_nick = 'asta'::character varying) AFAICS this estimated rows=3880 vs. actual rows=391 misestimation is the source of your problem. If you can convince the planner that friend_nick='asta' gives only 400 rows, it probably will switch to an index scan using friend_nick__nick with an estimated cost of ~ 1600 and an actual time of ~ 100. > -> Nested Loop (cost=0.00..799561.12 rows=3880 width=622) > (actual time=10.42..7640.60 rows=360 loops=1) > -> Index Scan using journals_0_pkey on users > (cost=0.00..5102.15 rows=9923 width=603) (actual time=1.03..2548.42 > rows=9923 loops=1) > -> Index Scan using nick__friend_nick on friends f1 > (cost=0.00..79.82 rows=19 width=19) (actual time=0.47..0.47rows=0 > loops=9923) > Index Cond: ((f1.nick = "outer".nick) AND > (f1.friend_nick = 'furrr'::character varying)) With better statistics this might change to a much cheaper -> Nested Loop -> Index Scan using friend_nick__nick on f1 (cost=..1600 rows=400) -> Index Scan using u_pkey on users ( ... loops=448) ... unless the planner finds an ever faster plan. So try ALTER TABLE friends ALTER COLUMN friend_nick SET STATISTICS 100; ANALYSE friends; and let us know how this affects your query. > Table "public.friends" > Column | Type | Modifiers > -------------+-----------------------+----------- > nick | character varying(15) | not null > friend_nick | character varying(15) | not null > Indexes: friends2_pkey primary key btree (nick, friend_nick), > friend_nick__nick unique btree (friend_nick, nick), > nick__friend_nick unique btree (nick, friend_nick) BTW, this last index is useless because it duplicates the primary key. Servus Manfred