Thread: SELECT MIN, MAX took longer time than SELECT COUNT, MIN, MAX

From: K C Lau
The following query took 17 seconds:
select count(LogSN), min(LogSN), max(LogSN) from Log where create_time <
'2005/10/19';

Figuring that getting the count would involve scanning the table, I took
it out, but the new query took 200 seconds:
select min(LogSN), max(LogSN) from Log where create_time < '2005/10/19';

Is it because the planner is using index pk_log instead of idx_logtime?
Any way to avoid that?

I can get instant replies with 2 separate queries for min(LogSN) and
max(LogSN) using order by create_time limit 1, but I can't get both values
within 1 query using the limit 1 construct. Any suggestions?

I am running pg 8.1.2 on Windows 2000. The queries are done immediately
after a vacuum analyze.

Best regards,
KC.

----------------------

esdt=> \d log;
  create_time | character varying(23)   | default '1970/01/01~00:00:00.000'::character varying
  logsn       | integer                 | not null
  ...
Indexes:
     "pk_log" PRIMARY KEY, btree (logsn)
     "idx_logtime" btree (create_time, logsn)
     ...

esdt=> vacuum analyze log;
VACUUM

esdt=> explain analyze select count(LogSN), min(LogSN), max(LogSN) from Log
where create_time < '2005/10/19';

  Aggregate  (cost=57817.74..57817.75 rows=1 width=4) (actual
time=17403.381..17403.384 rows=1 loops=1)
    ->  Bitmap Heap Scan on log  (cost=1458.31..57172.06 rows=86089
width=4) (actual time=180.368..17039.262 rows=106708 loops=1)
          Recheck Cond: ((create_time)::text < '2005/10/19'::text)
          ->  Bitmap Index Scan on idx_logtime  (cost=0.00..1458.31
rows=86089 width=0) (actual time=168.777..168.777 rows=106708 loops=1)
                Index Cond: ((create_time)::text < '2005/10/19'::text)
  Total runtime: 17403.787 ms

esdt=> explain analyze select min(LogSN), max(LogSN) from Log where
create_time < '2005/10/19';

  Result  (cost=2.51..2.52 rows=1 width=0) (actual
time=200051.507..200051.510 rows=1 loops=1)
    InitPlan
      ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual
time=18.541..18.544 rows=1 loops=1)
            ->  Index Scan using pk_log on log  (cost=0.00..108047.11
rows=86089
width=4) (actual time=18.533..18.533 rows=1 loops=1)
                  Filter: (((create_time)::text < '2005/10/19'::text) AND
(logsn IS NOT NULL))
      ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual
time=200032.928..200032.931 rows=1 loops=1)
            ->  Index Scan Backward using pk_log on
log  (cost=0.00..108047.11 rows=86089 width=4) (actual
time=200032.920..200032.920 rows=1 loops=1)
                  Filter: (((create_time)::text < '2005/10/19'::text) AND
(logsn IS NOT NULL))
  Total runtime: 200051.701 ms

esdt=> explain analyze select LogSN from Log where create_time <
'2005/10/19' order by create_time limit 1;

  Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.071..0.073 rows=1
loops=1)
    ->  Index Scan using idx_logtime on log  (cost=0.00..84649.94
rows=86089 width=31) (actual time=0.063..0.063 rows=1 loops=1)
          Index Cond: ((create_time)::text < '2005/10/19'::text)
  Total runtime: 0.182 ms

esdt=> explain analyze select LogSN from Log where create_time <
'2005/10/19' order by create_time desc limit 1;
  Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.058..0.061 rows=1
loops=1)
    ->  Index Scan Backward using idx_logtime on log  (cost=0.00..84649.94
rows=86089 width=31) (actual time=0.051..0.051 rows=1 loops=1)
          Index Cond: ((create_time)::text < '2005/10/19'::text)
  Total runtime: 0.186 ms


Re: SELECT MIN, MAX took longer time than SELECT COUNT, MIN, MAX

From: "Jim C. Nasby"
On Fri, Jan 20, 2006 at 12:35:36PM +0800, K C Lau wrote:

Here's the problem... the estimate for the backwards index scan is *way*
off:

>      ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual
> time=200032.928..200032.931 rows=1 loops=1)
>            ->  Index Scan Backward using pk_log on
> log  (cost=0.00..108047.11 rows=86089 width=4) (actual
> time=200032.920..200032.920 rows=1 loops=1)
>                  Filter: (((create_time)::text < '2005/10/19'::text) AND
> (logsn IS NOT NULL))
>  Total runtime: 200051.701 ms

BTW, these queries below are meaningless; they are not equivalent to
min(logsn).

> esdt=> explain analyze select LogSN from Log where create_time <
> '2005/10/19' order by create_time limit 1;
>
>  Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.071..0.073 rows=1
> loops=1)
>    ->  Index Scan using idx_logtime on log  (cost=0.00..84649.94
> rows=86089 width=31) (actual time=0.063..0.063 rows=1 loops=1)
>          Index Cond: ((create_time)::text < '2005/10/19'::text)
>  Total runtime: 0.182 ms
>
> esdt=> explain analyze select LogSN from Log where create_time <
> '2005/10/19' order by create_time desc limit 1;
>  Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.058..0.061 rows=1
> loops=1)
>    ->  Index Scan Backward using idx_logtime on log  (cost=0.00..84649.94
> rows=86089 width=31) (actual time=0.051..0.051 rows=1 loops=1)
>          Index Cond: ((create_time)::text < '2005/10/19'::text)
>  Total runtime: 0.186 ms

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: SELECT MIN, MAX took longer time than SELECT COUNT, MIN, MAX

From: Tom Lane
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Fri, Jan 20, 2006 at 12:35:36PM +0800, K C Lau wrote:
> Here's the problem... the estimate for the backwards index scan is *way*
> off:

>> ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual
>> time=200032.928..200032.931 rows=1 loops=1)
>> ->  Index Scan Backward using pk_log on
>> log  (cost=0.00..108047.11 rows=86089 width=4) (actual
>> time=200032.920..200032.920 rows=1 loops=1)
>> Filter: (((create_time)::text < '2005/10/19'::text) AND
>> (logsn IS NOT NULL))
>> Total runtime: 200051.701 ms

It's more subtle than you think.  The estimated rowcount is the
estimated number of rows fetched if the indexscan were run to
completion, which it isn't because the LIMIT cuts it off after the
first returned row.  That estimate is not bad (we can see from the
aggregate plan that the true value would have been 106708, assuming
that the "logsn IS NOT NULL" condition isn't filtering anything).

The real problem is that it's taking quite a long time for the scan
to reach the first row with create_time < 2005/10/19, which is not
too surprising if logsn is strongly correlated with create_time ...
but in the absence of any cross-column statistics the planner has
no very good way to know that.  (Hm ... but both of them probably
also show a strong correlation to physical order ... we could look
at that maybe ...)  The default assumption is that the two columns
aren't correlated and so it should not take long to hit the first such
row, which is why the planner likes the indexscan/limit plan.

            regards, tom lane

Re: Stored procedures

From: Rikard Pavelic
>
>> Hi,
>>
>> Will simple queries such as "SELECT * FROM blah_table WHERE tag='x';"
>> work any faster by putting them into a stored procedure?
>
> IMHO no, why do you think so? You can use PREPARE instead, if you have
> many selects like this.
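
For reference, the PREPARE approach mentioned above would look roughly
like this (a sketch only; the table and column names are taken from the
quoted question):

  -- parsed and planned once per session:
  PREPARE blah_by_tag (text) AS
      SELECT * FROM blah_table WHERE tag = $1;
  -- then executed repeatedly, with only the parameter changing:
  EXECUTE blah_by_tag('x');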


I thought that creating stored procedures in the database meant
storing their execution plans (well, actually storing them like
compiled objects). At least that's what I learned a couple of years
ago in college ;)

What is the advantage of parsing an SP function every time it's called?

My position is that preparing stored procedures for execution solves
more problems than it creates, the most important one being optimized
access to queries from multiple connections (which is one of the main
reasons for using stored procedures in the first place).

Best regards,
    Rikard



Re: SELECT MIN, MAX took longer time than SELECT

From: K C Lau
At 01:20 06/01/21, Jim C. Nasby wrote:

>BTW, these queries below are meaningless; they are not equivalent to
>min(logsn).
>
> > esdt=> explain analyze select LogSN from Log where create_time <
> > '2005/10/19' order by create_time limit 1;

Thank you for pointing it out.

It actually returns the min(logsn), as the index is on (create_time,
logsn). To be more explicit, I have changed the query to:
explain analyze select LogSN from Log where create_time < '2005/10/19'
order by create_time, logsn limit 1;
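
and correspondingly for the max, a sketch along the same lines (the
descending ORDER BY on both columns should let a backward scan on
idx_logtime satisfy the LIMIT; not timed here):

explain analyze select LogSN from Log where create_time < '2005/10/19'
order by create_time desc, logsn desc limit 1;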

esdt=> \d log;
  create_time | character varying(23)   | default '1970/01/01~00:00:00.000'::character varying
  logsn       | integer                 | not null
  ...
Indexes:
     "pk_log" PRIMARY KEY, btree (logsn)
     "idx_logtime" btree (create_time, logsn)

Best regards,
KC.


Re: SELECT MIN, MAX took longer time than SELECT

From: K C Lau
I have worked around the issue by using 2 separate queries with the LIMIT
construct.
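
For the record, the two lookups can presumably also be combined into a
single statement using scalar subqueries, each with an ORDER BY that
matches idx_logtime (a sketch, not timed here):

select
   (select LogSN from Log where create_time < '2005/10/19'
     order by create_time, logsn limit 1)           as min_logsn,
   (select LogSN from Log where create_time < '2005/10/19'
     order by create_time desc, logsn desc limit 1) as max_logsn;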

LogSN and create_time are indeed directly correlated, both monotonically
increasing, occasionally with multiple LogSNs having the same create_time.

What puzzles me is why the query with COUNT, MIN, MAX uses idx_logtime for
the scan, but the query without the COUNT uses pk_log and takes much
longer. If it had chosen idx_logtime instead, then it should have returned
immediately for both MIN and MAX.

Best regards,
KC.

At 02:51 06/01/21, Tom Lane wrote:
>"Jim C. Nasby" <jnasby@pervasive.com> writes:
> > On Fri, Jan 20, 2006 at 12:35:36PM +0800, K C Lau wrote:
> > Here's the problem... the estimate for the backwards index scan is *way*
> > off:
>
> >> ->  Limit  (cost=0.00..1.26 rows=1 width=4) (actual
> >> time=200032.928..200032.931 rows=1 loops=1)
> >> ->  Index Scan Backward using pk_log on
> >> log  (cost=0.00..108047.11 rows=86089 width=4) (actual
> >> time=200032.920..200032.920 rows=1 loops=1)
> >> Filter: (((create_time)::text < '2005/10/19'::text) AND
> >> (logsn IS NOT NULL))
> >> Total runtime: 200051.701 ms
>
>It's more subtle than you think.  The estimated rowcount is the
>estimated number of rows fetched if the indexscan were run to
>completion, which it isn't because the LIMIT cuts it off after the
>first returned row.  That estimate is not bad (we can see from the
>aggregate plan that the true value would have been 106708, assuming
>that the "logsn IS NOT NULL" condition isn't filtering anything).
>
>The real problem is that it's taking quite a long time for the scan
>to reach the first row with create_time < 2005/10/19, which is not
>too surprising if logsn is strongly correlated with create_time ...
>but in the absence of any cross-column statistics the planner has
>no very good way to know that.  (Hm ... but both of them probably
>also show a strong correlation to physical order ... we could look
>at that maybe ...)  The default assumption is that the two columns
>aren't correlated and so it should not take long to hit the first such
>row, which is why the planner likes the indexscan/limit plan.
>
>                         regards, tom lane