Thread: database performance and query performance question
Our database has slowed right down. We are not getting any performance from our
biggest table "forecastelement". The table has 93,218,671 records in it and
climbing. The index is on 4 columns; originally it was on 3. I added another to
see if it would improve performance. It did not.

Should there be less columns in the index?
How can we improve database performance?
How should I improve my query?

PWFPM_DEV=# \d forecastelement
            Table "public.forecastelement"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 version        | character varying(99)       | not null
 origin         | character varying(10)       | not null
 timezone       | character varying(99)       | not null
 region_id      | character varying(20)       | not null
 wx_element     | character varying(99)       | not null
 value          | character varying(99)       | not null
 flag           | character(3)                | not null
 units          | character varying(99)       | not null
 valid_time     | timestamp without time zone | not null
 issue_time     | timestamp without time zone | not null
 next_forecast  | timestamp without time zone | not null
 reception_time | timestamp without time zone | not null
Indexes:
    "forecastelement_vrwi_idx" btree (valid_time, region_id, wx_element, issue_time)

explain analyze
select DISTINCT ON (valid_time)
       to_char(valid_time,'YYYYMMDDHH24MISS') as valid_time, value
from (select valid_time, value, "time"(valid_time) as hour, reception_time, issue_time
      from forecastelement
      where valid_time between '2002-09-02 04:00:00' and '2002-09-07 03:59:59'
        and region_id = 'PU-REG-WTO-00200'
        and wx_element = 'TEMP_VALEUR1'
        and issue_time between '2002-09-02 05:00:00' and '2002-09-06 05:00:00'
        and origin = 'REGIONAL'
        and "time"(issue_time) = '05:00:00'
      order by issue_time, reception_time DESC, valid_time) as foo
where (date(valid_time) = date(issue_time)+1 -1
       or date(valid_time) = date(issue_time)+1
       or (valid_time between '2002-09-07 00:00:00' and '2002-09-07 03:59:59'
           and issue_time = '2002-09-06 05:00:00'))
order by valid_time, issue_time DESC;

USING INDEX "forecastelement_vrwi_idx" btree (valid_time, region_id, wx_element, issue_time)

 Unique  (cost=116.75..116.76 rows=1 width=83) (actual time=9469.088..9470.002 rows=115 loops=1)
   ->  Sort  (cost=116.75..116.75 rows=1 width=83) (actual time=9469.085..9469.308 rows=194 loops=1)
         Sort Key: to_char(valid_time, 'YYYYMMDDHH24MISS'::text), issue_time
         ->  Subquery Scan foo  (cost=116.72..116.74 rows=1 width=83) (actual time=9465.979..9467.735 rows=194 loops=1)
               ->  Sort  (cost=116.72..116.73 rows=1 width=30) (actual time=9440.756..9440.981 rows=194 loops=1)
                     Sort Key: issue_time, reception_time, valid_time
                     ->  Index Scan using forecastelement_vrwi_idx on forecastelement  (cost=0.00..116.71 rows=1 width=30) (actual time=176.510..9439.470 rows=194 loops=1)
                           Index Cond: ((valid_time >= '2002-09-02 04:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone) AND ((region_id)::text = 'PU-REG-WTO-00200'::text) AND ((wx_element)::text = 'TEMP_VALEUR1'::text) AND (issue_time >= '2002-09-02 05:00:00'::timestamp without time zone) AND (issue_time <= '2002-09-06 05:00:00'::timestamp without time zone))
                           Filter: (((origin)::text = 'REGIONAL'::text) AND ("time"(issue_time) = '05:00:00'::time without time zone) AND ((date(valid_time) = ((date(issue_time) + 1) - 1)) OR (date(valid_time) = (date(issue_time) + 1)) OR ((valid_time >= '2002-09-07 00:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone) AND (issue_time = '2002-09-06 05:00:00'::timestamp without time zone))))
 Total runtime: 9470.404 ms

We are running postgresql-7.4-0.5PGDG.i386.rpm on a Dell PowerEdge 6650.
OS:     RHAS 3.0
CPUs:   4
Memory: 3.6 GB
Disk:   270 GB, RAID 5

postgresql.conf:
max_connections = 64
shared_buffers = 4000
vacuum_mem = 32768
effective_cache_size = 312500
random_page_cost = 2
Dan,

> Should there be less columns in the index?
> How can we improve database performance?
> How should I improve my query?

Your query plan isn't the problem. It's a good plan, and a reasonably
efficient query. Under other circumstances, the SELECT DISTINCT with the
to_char could be a performance-killer, but it's not in that result set.

Overall, you're taking 9 seconds to scan 93 million records. Is this the time
the first time you run the query, or the 2nd and successive times?

When did you last run VACUUM ANALYZE on the table? Have you tried increasing
the ANALYZE statistics on the index columns to, say, 500?

Your disks are RAID 5. How many drives? In RAID 5, more drives improve the
speed of large scans.

And what's your sort_mem setting? You didn't mention it.

Why is your effective cache size only 300 MB when you have 3 GB of RAM? It's
not affecting this query, but it could affect others.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
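[For readers following along: raising the ANALYZE statistics the way Josh
suggests is done per column. A sketch, using the four indexed columns from the
table above and the target of 500 he mentions; the value to pick depends on
your data distribution.]

```sql
-- Raise the planner's statistics target on the indexed columns,
-- then re-gather statistics so the new target takes effect.
ALTER TABLE forecastelement ALTER COLUMN valid_time SET STATISTICS 500;
ALTER TABLE forecastelement ALTER COLUMN region_id  SET STATISTICS 500;
ALTER TABLE forecastelement ALTER COLUMN wx_element SET STATISTICS 500;
ALTER TABLE forecastelement ALTER COLUMN issue_time SET STATISTICS 500;
ANALYZE forecastelement;
```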
Dan,

> Why is your effective cache size only 300mb when you have 3 GB of RAM? It's
> not affecting this query, but it could affect others.

Ignore this last question, I dropped a zero from my math. Sorry!

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com]
Sent: Thursday, January 22, 2004 3:01 PM
To: Shea,Dan [CIS]; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] database performance and query performance question

Dan,

> Should there be less columns in the index?
> How can we improve database performance?
> How should I improve my query?

>> Your query plan isn't the problem. It's a good plan, and a reasonably
>> efficient query. Under other circumstances, the SELECT DISTINCT with the
>> to_char could be a performance-killer, but it's not in that result set.
>> Overall, you're taking 9 seconds to scan 93 million records. Is this the
>> time the first time you run the query, or the 2nd and successive times?

This is actually the second time. The first query took more time. Concerning
the number of columns for an index, I switched the index to have only one
column and tried the same query. It is below.

>> When did you last run VACUUM ANALYZE on the table? Have you tried
>> increasing the ANALYZE statistics on the index columns to, say, 500?

It is run nightly, but last night's did not complete. It was taking quite
some time (over 4 hours), so I cancelled it. I will try increasing the
ANALYZE statistics to 500.

>> Your disks are RAID 5. How many drives? In RAID5, more drives improves the
>> speed of large scans.

There are 4 drives in this RAID 5. We are using LVM with an ext3 filesystem.
We will be moving the database to a SAN within the next month.

>> And what's your sort_mem setting? You didn't mention it.

The sort_mem is the default:

PWFPM_DEV=# show sort_mem;
 sort_mem
----------
 1024

>> Why is your effective cache size only 300mb when you have 3 GB of RAM?
>> It's not affecting this query, but it could affect others.

Oh, I thought I had it set for 2.5 GB of RAM. 312500 * 8k = 2.5 GB

QUERY WITH 1 column in index.
 Unique  (cost=717633.28..717633.29 rows=1 width=83) (actual time=62922.399..62923.334 rows=115 loops=1)
   ->  Sort  (cost=717633.28..717633.29 rows=1 width=83) (actual time=62922.395..62922.615 rows=194 loops=1)
         Sort Key: to_char(valid_time, 'YYYYMMDDHH24MISS'::text), issue_time
         ->  Subquery Scan foo  (cost=717633.26..717633.27 rows=1 width=83) (actual time=62918.232..62919.989 rows=194 loops=1)
               ->  Sort  (cost=717633.26..717633.26 rows=1 width=30) (actual time=62902.378..62902.601 rows=194 loops=1)
                     Sort Key: issue_time, reception_time, valid_time
                     ->  Index Scan using forecastelement_v_idx on forecastelement  (cost=0.00..717633.25 rows=1 width=30) (actual time=1454.974..62900.752 rows=194 loops=1)
                           Index Cond: ((valid_time >= '2002-09-02 04:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone))
                           Filter: (((region_id)::text = 'PU-REG-WTO-00200'::text) AND ((wx_element)::text = 'TEMP_VALEUR1'::text) AND (issue_time >= '2002-09-02 05:00:00'::timestamp without time zone) AND (issue_time <= '2002-09-06 05:00:00'::timestamp without time zone) AND ((origin)::text = 'REGIONAL'::text) AND ("time"(issue_time) = '05:00:00'::time without time zone) AND ((date(valid_time) = ((date(issue_time) + 1) - 1)) OR (date(valid_time) = (date(issue_time) + 1)) OR ((valid_time >= '2002-09-07 00:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone) AND (issue_time = '2002-09-06 05:00:00'::timestamp without time zone))))
 Total runtime: 62923.723 ms
(10 rows)

A second run of the same query:

PWFPM_DEV=# explain analyze select DISTINCT ON (valid_time)
PWFPM_DEV-# to_char(valid_time,'YYYYMMDDHH24MISS') as valid_time,value from
PWFPM_DEV-# (select valid_time,value,"time"(valid_time) as hour,reception_time,
PWFPM_DEV(# issue_time from forecastelement where
PWFPM_DEV(# valid_time between '2002-09-02 04:00:00' and
PWFPM_DEV(# '2002-09-07 03:59:59' and region_id = 'PU-REG-WTO-00200'
PWFPM_DEV(# and wx_element = 'TEMP_VALEUR1' and issue_time between
PWFPM_DEV(# '2002-09-02 05:00:00' and '2002-09-06 05:00:00'
PWFPM_DEV(# and origin = 'REGIONAL' and "time"(issue_time) = '05:00:00'
PWFPM_DEV(# order by issue_time,reception_time DESC,valid_time) as foo where
PWFPM_DEV-# (date(valid_time) = date(issue_time)+1 -1 or date(valid_time) = date(issue_time)+1 or
PWFPM_DEV(# (valid_time between '2002-09-07 00:00:00' and '2002-09-07 03:59:59'
PWFPM_DEV(# and issue_time = '2002-09-06 05:00:00')) order by valid_time ,issue_time DESC;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Unique  (cost=717633.28..717633.29 rows=1 width=83) (actual time=21468.227..21469.164 rows=115 loops=1)
   ->  Sort  (cost=717633.28..717633.29 rows=1 width=83) (actual time=21468.223..21468.452 rows=194 loops=1)
         Sort Key: to_char(valid_time, 'YYYYMMDDHH24MISS'::text), issue_time
         ->  Subquery Scan foo  (cost=717633.26..717633.27 rows=1 width=83) (actual time=21465.274..21467.006 rows=194 loops=1)
               ->  Sort  (cost=717633.26..717633.26 rows=1 width=30) (actual time=21465.228..21465.452 rows=194 loops=1)
                     Sort Key: issue_time, reception_time, valid_time
                     ->  Index Scan using forecastelement_v_idx on forecastelement  (cost=0.00..717633.25 rows=1 width=30) (actual time=1479.649..21463.779 rows=194 loops=1)
                           Index Cond: ((valid_time >= '2002-09-02 04:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone))
                           Filter: (((region_id)::text = 'PU-REG-WTO-00200'::text) AND ((wx_element)::text = 'TEMP_VALEUR1'::text) AND (issue_time >= '2002-09-02 05:00:00'::timestamp without time zone) AND (issue_time <= '2002-09-06 05:00:00'::timestamp without time zone) AND ((origin)::text = 'REGIONAL'::text) AND ("time"(issue_time) = '05:00:00'::time without time zone) AND ((date(valid_time) = ((date(issue_time) + 1) - 1)) OR (date(valid_time) = (date(issue_time) + 1)) OR ((valid_time >= '2002-09-07 00:00:00'::timestamp without time zone) AND (valid_time <= '2002-09-07 03:59:59'::timestamp without time zone) AND (issue_time = '2002-09-06 05:00:00'::timestamp without time zone))))
 Total runtime: 21469.485 ms
(10 rows)

PWFPM_DEV=#

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
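[For readers following along: the `show sort_mem` output above is the 7.4
default of 1024, i.e. 1 MB per sort, which is small for a table this size.
A sketch of raising it, per session or server-wide; the 16384 (16 MB) figure
is illustrative, not a recommendation from the thread.]

```sql
-- Raise sort memory for the current session only (value is in KB):
SET sort_mem = 16384;

-- Or make it the default for all sessions by editing postgresql.conf:
--   sort_mem = 16384
-- and reloading the postmaster.
```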
Something that I do not understand is why, if you use valid_time =
'2004-01-22 00:00:00', the query will use the index, but if you do
valid_time > '2004-01-22 00:00:00' it does not use the index?

PWFPM_DEV=# explain analyze select * from forecastelement where valid_time >
date '2004-01-23'::date limit 10;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.82 rows=10 width=129) (actual time=199550.388..199550.783 rows=10 loops=1)
   ->  Seq Scan on forecastelement  (cost=0.00..2722898.40 rows=7131102 width=129) (actual time=199550.382..199550.757 rows=10 loops=1)
         Filter: (valid_time > '2004-01-23 00:00:00'::timestamp without time zone)
 Total runtime: 199550.871 ms
(4 rows)

PWFPM_DEV=# explain analyze select * from forecastelement where valid_time =
date '2004-01-23'::date limit 10;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=0.00..18.76 rows=10 width=129) (actual time=176.141..276.577 rows=10 loops=1)
   ->  Index Scan using forecastelement_vrwi_idx on forecastelement  (cost=0.00..160770.98 rows=85707 width=129) (actual time=176.133..276.494 rows=10 loops=1)
         Index Cond: (valid_time = '2004-01-23 00:00:00'::timestamp without time zone)
 Total runtime: 276.721 ms
(4 rows)

-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com]
Sent: Thursday, January 22, 2004 3:01 PM
To: Shea,Dan [CIS]; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] database performance and query performance question

Dan,

> Should there be less columns in the index?
> How can we improve database performance?
> How should I improve my query?

Your query plan isn't the problem. It's a good plan, and a reasonably
efficient query.
Dan,

> Something that I do not understand is why if you use a valid_time =
> '2004-01-22 00:00:00' the query will use the index but if you do a
> valid_time > '2004-01-22 00:00:00' it does not use the index?

Because of the expected number of rows to be returned. Take a look at the row
estimates on the forecastelement scans. You can improve these estimates by
increasing the ANALYZE stats and/or running ANALYZE more often. Of course,
increasing the stats makes ANALYZE run slower ...

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
Shea,Dan [CIS] wrote on Thu, 22.01.2004 at 22:35:
> Something that I do not understand is why if you use a valid_time =
> '2004-01-22 00:00:00' the query will use the index but if you do a
> valid_time > '2004-01-22 00:00:00' it does not use the index?

It probably can't tell if > is selective enough to justify using the index.

Together with "limit 10" it may be.

You could try

explain analyze select * from forecastelement where valid_time between
'2004-01-22'::date and '2004-01-22'::date limit 10;

to see if this is considered good enough.

--------------
Hannu
Hannu Krosing wrote on Thu, 22.01.2004 at 22:46:
> Shea,Dan [CIS] wrote on Thu, 22.01.2004 at 22:35:
> > Something that I do not understand is why if you use a valid_time =
> > '2004-01-22 00:00:00' the query will use the index but if you do a
> > valid_time > '2004-01-22 00:00:00' it does not use the index?
>
> It probably can't tell if > is selective enough to justify using index.
>
> Together with "limit 10" it may be.
>
> You could try
>
> explain analyze select * from forecastelement where valid_time between
> '2004-01-22'::date and '2004-01-22'::date limit 10;

Sorry, that should have been:

between '2004-01-22'::date and '2004-01-23'::date

> to see if this is considered good enough.
>
> --------------
> Hannu

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
This sure speeds up the query; it is fast.

PWFPM_DEV=# explain analyze select * from forecastelement where valid_time
between '2004-01-12'::date and '2003-01-12'::date;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Index Scan using forecastelement_v_idx on forecastelement  (cost=0.00..159607.11 rows=466094 width=129) (actual time=49.504..49.504 rows=0 loops=1)
   Index Cond: ((valid_time >= '2004-01-12 00:00:00'::timestamp without time zone) AND (valid_time <= '2003-01-12 00:00:00'::timestamp without time zone))
 Total runtime: 49.589 ms
(3 rows)

-----Original Message-----
From: Hannu Krosing [mailto:hannu@tm.ee]
Sent: Thursday, January 22, 2004 3:54 PM
To: Shea,Dan [CIS]
Cc: 'josh@agliodbs.com'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] database performance and query performance question

Hannu Krosing wrote on Thu, 22.01.2004 at 22:46:
> Shea,Dan [CIS] wrote on Thu, 22.01.2004 at 22:35:
> > Something that I do not understand is why if you use a valid_time =
> > '2004-01-22 00:00:00' the query will use the index but if you do a
> > valid_time > '2004-01-22 00:00:00' it does not use the index?
>
> It probably can't tell if > is selective enough to justify using index.
>
> Together with "limit 10" it may be.
>
> You could try
>
> explain analyze select * from forecastelement where valid_time between
> '2004-01-22'::date and '2004-01-22'::date limit 10;

Sorry, that should have been:

between '2004-01-22'::date and '2004-01-23'::date

> to see if this is considered good enough.
>
> --------------
> Hannu
The end date in the previous example was actually invalid:
between '2004-01-12'::date and '2003-01-12'::date;

There have been multiple inserts since I recreated the index, but it took
quite some time to complete the following:

PWFPM_DEV=# explain analyze select * from forecastelement where valid_time
between '2004-01-12'::date and '2004-01-13'::date;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Index Scan using forecastelement_v_idx on forecastelement  (cost=0.00..832139.81 rows=2523119 width=129) (actual time=0.519..467159.658 rows=2940600 loops=1)
   Index Cond: ((valid_time >= '2004-01-12 00:00:00'::timestamp without time zone) AND (valid_time <= '2004-01-13 00:00:00'::timestamp without time zone))
 Total runtime: 472627.148 ms
(3 rows)

-----Original Message-----
From: Shea,Dan [CIS]
Sent: Thursday, January 22, 2004 4:10 PM
To: 'Hannu Krosing'; Shea,Dan [CIS]
Cc: 'josh@agliodbs.com'; pgsql-performance@postgresql.org
Subject: RE: [PERFORM] database performance and query performance question

This sure speeds up the query; it is fast.
Dan,

Of course it took forever. You're retrieving 2.9 million rows!

> Index Scan using forecastelement_v_idx on forecastelement
> (cost=0.00..832139.81 rows=2523119 width=129) (actual
> time=0.519..467159.658 rows=2940600 loops=1)

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
Shea,Dan [CIS] wrote on Thu, 22.01.2004 at 23:32:
> The end date in the previous example was actually invalid: between
> '2004-01-12'::date and '2003-01-12'::date;
> There have been multiple inserts since I recreated the index but it took
> quite some time to complete the following
> PWFPM_DEV=# explain analyze select * from forecastelement where valid_time
> between '2004-01-12'::date and '2004-01-13'::date;

You could try ORDER BY to bias the optimiser towards using an index:

explain analyze select * from forecastelement where valid_time >
'2004-01-12'::date order by valid_time limit 10;

This may also be closer to what you are expecting :)

------------------
Hannu
"Shea,Dan [CIS]" <Dan.Shea@ec.gc.ca> writes: > Indexes: > "forecastelement_vrwi_idx" btree (valid_time,region_id.wx_element.issue_time) > > explain analyze > SELECT DISTINCT ON (valid_time) > to_char(valid_time,'YYYYMMDDHH24MISS') AS valid_time, > value > from ( > SELECT valid_time,value, "time"(valid_time) AS hour, reception_time, issue_time > FROM forecastelement > WHERE valid_time BETWEEN '2002-09-02 04:00:00' AND '2002-09-07 03:59:59' > AND region_id = 'PU-REG-WTO-00200' > AND wx_element = 'TEMP_VALEUR1' > AND issue_time BETWEEN '2002-09-02 05:00:00' AND '2002-09-06 05:00:00' > AND origin = 'REGIONAL' > AND "time"(issue_time) = '05:00:00' > ORDER BY issue_time,reception_time DESC,valid_time > ) AS foo > WHERE > ( date(valid_time) = date(issue_time)+1 -1 > OR date(valid_time) = date(issue_time)+1 > OR ( valid_time BETWEEN '2002-09-07 00:00:00' AND '2002-09-07 03:59:59' > AND issue_time = '2002-09-06 05:00:00' > ) > ) > ORDER BY valid_time ,issue_time DESC; Incidentally, I find it easier to analyze queries when they've been formatted well. This makes what's going on much clearer. From this it's clear your index doesn't match the query. Adding more columns will be useless because only the leading column "valid_time" will be used at all. Since you're fetching a whole range of valid_times the remaining columns are all irrelevant. They only serve to bloat the index and require reading a lot more data. You could either try creating an index just on valid_time, or create an index on (region_id,wx_element,valid_time) or (region_id,wx_element,issue_time) whichever is more selective. You could put wx_element first if it's more selective than region_id. Moreover, what purpose does the inner ORDER BY clause serve? It's only going to be re-sorted again by the outer ORDER BY. -- greg