Thread: SELECT MIN, MAX took longer than SELECT COUNT, MIN, MAX
The following query took 17 seconds:

  select count(LogSN), min(LogSN), max(LogSN) from Log where create_time < '2005/10/19';

Figuring that getting the count would involve scanning the database, I took it out, but the new query took 200 seconds:

  select min(LogSN), max(LogSN) from Log where create_time < '2005/10/19';

Is it because the planner is using index pk_log instead of idx_logtime? Any way to avoid that?

I can get instant replies with 2 separate queries for min(LogSN) and max(LogSN) using order by create_time limit 1, but I can't get both values within 1 query using the limit 1 construct. Any suggestions?

I am running pg 8.1.2 on Windows 2000. The queries are done immediately after a vacuum analyze.

Best regards,
KC.

----------------------

esdt=> \d log;
 create_time | character varying(23) | default '1970/01/01~00:00:00.000'::character varying
 logsn       | integer               | not null
...
Indexes:
    "pk_log" PRIMARY KEY, btree (logsn)
    "idx_logtime" btree (create_time, logsn)
...

esdt=> vacuum analyze log;
VACUUM

esdt=> explain analyze select count(LogSN), min(LogSN), max(LogSN) from Log
       where create_time < '2005/10/19';

 Aggregate  (cost=57817.74..57817.75 rows=1 width=4) (actual time=17403.381..17403.384 rows=1 loops=1)
   ->  Bitmap Heap Scan on log  (cost=1458.31..57172.06 rows=86089 width=4) (actual time=180.368..17039.262 rows=106708 loops=1)
         Recheck Cond: ((create_time)::text < '2005/10/19'::text)
         ->  Bitmap Index Scan on idx_logtime  (cost=0.00..1458.31 rows=86089 width=0) (actual time=168.777..168.777 rows=106708 loops=1)
               Index Cond: ((create_time)::text < '2005/10/19'::text)
 Total runtime: 17403.787 ms

esdt=> explain analyze select min(LogSN), max(LogSN) from Log
       where create_time < '2005/10/19';

 Result  (cost=2.51..2.52 rows=1 width=0) (actual time=200051.507..200051.510 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual time=18.541..18.544 rows=1 loops=1)
           ->  Index Scan using pk_log on log  (cost=0.00..108047.11 rows=86089 width=4) (actual time=18.533..18.533 rows=1 loops=1)
                 Filter: (((create_time)::text < '2005/10/19'::text) AND (logsn IS NOT NULL))
     ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual time=200032.928..200032.931 rows=1 loops=1)
           ->  Index Scan Backward using pk_log on log  (cost=0.00..108047.11 rows=86089 width=4) (actual time=200032.920..200032.920 rows=1 loops=1)
                 Filter: (((create_time)::text < '2005/10/19'::text) AND (logsn IS NOT NULL))
 Total runtime: 200051.701 ms

esdt=> explain analyze select LogSN from Log where create_time < '2005/10/19'
       order by create_time limit 1;

 Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.071..0.073 rows=1 loops=1)
   ->  Index Scan using idx_logtime on log  (cost=0.00..84649.94 rows=86089 width=31) (actual time=0.063..0.063 rows=1 loops=1)
         Index Cond: ((create_time)::text < '2005/10/19'::text)
 Total runtime: 0.182 ms

esdt=> explain analyze select LogSN from Log where create_time < '2005/10/19'
       order by create_time desc limit 1;

 Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.058..0.061 rows=1 loops=1)
   ->  Index Scan Backward using idx_logtime on log  (cost=0.00..84649.94 rows=86089 width=31) (actual time=0.051..0.051 rows=1 loops=1)
         Index Cond: ((create_time)::text < '2005/10/19'::text)
 Total runtime: 0.186 ms
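[Editor's note: one way to answer KC's "both values within 1 query" question, not taken from the thread, is to wrap each fast LIMIT 1 lookup in a scalar subquery. This is a sketch against the Log table above; each subquery can use idx_logtime (create_time, logsn), so neither side touches pk_log:]

```sql
-- Sketch: combine the two instant LIMIT 1 lookups as scalar subqueries
-- in a single statement. Ordering by both index columns makes each
-- subquery an exact match for a forward/backward scan of idx_logtime.
select
  (select LogSN from Log
    where create_time < '2005/10/19'
    order by create_time, logsn
    limit 1)  as min_logsn,
  (select LogSN from Log
    where create_time < '2005/10/19'
    order by create_time desc, logsn desc
    limit 1)  as max_logsn;
```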
On Fri, Jan 20, 2006 at 12:35:36PM +0800, K C Lau wrote:

Here's the problem... the estimate for the backwards index scan is *way* off:

>      ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual time=200032.928..200032.931 rows=1 loops=1)
>            ->  Index Scan Backward using pk_log on log  (cost=0.00..108047.11 rows=86089 width=4) (actual time=200032.920..200032.920 rows=1 loops=1)
>                  Filter: (((create_time)::text < '2005/10/19'::text) AND (logsn IS NOT NULL))
>  Total runtime: 200051.701 ms

BTW, these queries below are meaningless; they are not equivalent to min(logsn).

> esdt=> explain analyze select LogSN from Log where create_time <
> '2005/10/19' order by create_time limit 1;
>
>  Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.071..0.073 rows=1 loops=1)
>    ->  Index Scan using idx_logtime on log  (cost=0.00..84649.94 rows=86089 width=31) (actual time=0.063..0.063 rows=1 loops=1)
>          Index Cond: ((create_time)::text < '2005/10/19'::text)
>  Total runtime: 0.182 ms
>
> esdt=> explain analyze select LogSN from Log where create_time <
> '2005/10/19' order by create_time desc limit 1;
>
>  Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.058..0.061 rows=1 loops=1)
>    ->  Index Scan Backward using idx_logtime on log  (cost=0.00..84649.94 rows=86089 width=31) (actual time=0.051..0.051 rows=1 loops=1)
>          Index Cond: ((create_time)::text < '2005/10/19'::text)
>  Total runtime: 0.186 ms

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Fri, Jan 20, 2006 at 12:35:36PM +0800, K C Lau wrote:
> Here's the problem... the estimate for the backwards index scan is *way*
> off:
>>      ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual time=200032.928..200032.931 rows=1 loops=1)
>>            ->  Index Scan Backward using pk_log on log  (cost=0.00..108047.11 rows=86089 width=4) (actual time=200032.920..200032.920 rows=1 loops=1)
>>                  Filter: (((create_time)::text < '2005/10/19'::text) AND (logsn IS NOT NULL))
>>  Total runtime: 200051.701 ms

It's more subtle than you think.  The estimated rowcount is the
estimated number of rows fetched if the indexscan were run to
completion, which it isn't because the LIMIT cuts it off after the
first returned row.  That estimate is not bad (we can see from the
aggregate plan that the true value would have been 106708, assuming
that the "logsn IS NOT NULL" condition isn't filtering anything).

The real problem is that it's taking quite a long time for the scan
to reach the first row with create_time < 2005/10/19, which is not
too surprising if logsn is strongly correlated with create_time ...
but in the absence of any cross-column statistics the planner has
no very good way to know that.  (Hm ... but both of them probably
also show a strong correlation to physical order ... we could look
at that maybe ...)  The default assumption is that the two columns
aren't correlated and so it should not take long to hit the first
such row, which is why the planner likes the indexscan/limit plan.

			regards, tom lane
>> Hi,
>>
>> Will simple queries such as "SELECT * FROM blah_table WHERE tag='x';"
>> work any faster by putting them into a stored procedure?
>
> IMHO no, why do you think so? You can use PREPARE instead, if you have
> many selects like this.

I thought that creating stored procedures in a database meant storing their execution plans (actually, storing them like compiled objects). Well, that's what I learned a couple of years ago in college ;) What are the advantages of parsing SP functions every time they are called?

My position is that preparing stored procedures for execution solves more problems than it creates. The most important one is optimizing access to queries from multiple connections (which is one of the most important reasons for using stored procedures in the first place).

Best regards,
Rikard
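[Editor's note: for reference, the PREPARE approach suggested above might look like the following sketch. It reuses the hypothetical blah_table from the question; in PostgreSQL the prepared plan lives only for the current session:]

```sql
-- Sketch: PREPARE parses and plans the statement once per session;
-- EXECUTE then reuses that plan with a new parameter value.
PREPARE get_by_tag (text) AS
  SELECT * FROM blah_table WHERE tag = $1;

EXECUTE get_by_tag('x');
EXECUTE get_by_tag('y');

-- Optional: release the prepared statement when done.
DEALLOCATE get_by_tag;
```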
At 01:20 06/01/21, Jim C. Nasby wrote:
>BTW, these queries below are meaningless; they are not equivalent to
>min(logsn).
>
> > esdt=> explain analyze select LogSN from Log where create_time <
> > '2005/10/19' order by create_time limit 1;

Thank you for pointing it out. It actually returns the min(logsn), as the index is on (create_time, logsn). To be more explicit, I have changed the query to:

  explain analyze select LogSN from Log where create_time < '2005/10/19'
  order by create_time, logsn limit 1;

esdt=> \d log;
 create_time | character varying(23) | default '1970/01/01~00:00:00.000'::character varying
 logsn       | integer               | not null
...
Indexes:
    "pk_log" PRIMARY KEY, btree (logsn)
    "idx_logtime" btree (create_time, logsn)

Best regards,
KC.
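[Editor's note: by the same reasoning, a sketch not taken from the thread, the max(logsn) side would need both sort keys descending so it remains an exact match for a backward scan of idx_logtime:]

```sql
-- Sketch: the max-side counterpart of the corrected query. Ordering
-- by both columns descending lets a backward scan of idx_logtime
-- (create_time, logsn) return the true max(logsn) even when several
-- rows share the same create_time.
explain analyze
select LogSN from Log
 where create_time < '2005/10/19'
 order by create_time desc, logsn desc
 limit 1;
```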
I have worked around the issue by using 2 separate queries with the LIMIT construct.

LogSN and create_time are indeed directly correlated, both monotonically increasing, occasionally with multiple LogSNs sharing the same create_time.

What puzzles me is why the query with COUNT, MIN, MAX uses idx_logtime for the scan, but the query without the COUNT uses pk_log and takes much longer. If it had chosen idx_logtime instead, it should have returned immediately for both MIN and MAX.

Best regards,
KC.

At 02:51 06/01/21, Tom Lane wrote:
>"Jim C. Nasby" <jnasby@pervasive.com> writes:
>> On Fri, Jan 20, 2006 at 12:35:36PM +0800, K C Lau wrote:
>> Here's the problem... the estimate for the backwards index scan is *way*
>> off:
>>>      ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual time=200032.928..200032.931 rows=1 loops=1)
>>>            ->  Index Scan Backward using pk_log on log  (cost=0.00..108047.11 rows=86089 width=4) (actual time=200032.920..200032.920 rows=1 loops=1)
>>>                  Filter: (((create_time)::text < '2005/10/19'::text) AND (logsn IS NOT NULL))
>>>  Total runtime: 200051.701 ms
>
>It's more subtle than you think.  The estimated rowcount is the
>estimated number of rows fetched if the indexscan were run to
>completion, which it isn't because the LIMIT cuts it off after the
>first returned row.  That estimate is not bad (we can see from the
>aggregate plan that the true value would have been 106708, assuming
>that the "logsn IS NOT NULL" condition isn't filtering anything).
>
>The real problem is that it's taking quite a long time for the scan
>to reach the first row with create_time < 2005/10/19, which is not
>too surprising if logsn is strongly correlated with create_time ...
>but in the absence of any cross-column statistics the planner has
>no very good way to know that.  (Hm ... but both of them probably
>also show a strong correlation to physical order ... we could look
>at that maybe ...)  The default assumption is that the two columns
>aren't correlated and so it should not take long to hit the first
>such row, which is why the planner likes the indexscan/limit plan.
>
>			regards, tom lane