Thread: table configuration tweak for performance gain.

table configuration tweak for performance gain.

From
"Harvey, Allan AC"
Date:
Hi all,
I was hoping someone might be able to set me straight
so that I can get some select performance improvements.

The table in question has several thousand rows currently,
planning several million, it has a date time stamp column.
All selects will be on ranges of the date time column.

I created an index on the dt column, ran ANALYSE then,
EXPLAIN for some queries.
The returned plan was always sequential search.
The SELECT time also indicates sequential search.

Is there anything I can do to improve SELECT performance
on what is essentially an ordered table?

Thanks in advance.
Allan

Re: table configuration tweak for performance gain.

From
Michael Fuhr
Date:
On Mon, Nov 15, 2004 at 04:25:59PM +1100, Harvey, Allan AC wrote:

> The table in question has several thousand rows currently,
> planning several million, it has a date time stamp column.
> All selects will be on ranges of the date time column.
>
> I created an index on the dt column, ran ANALYSE then,
> EXPLAIN for some queries.
> The returned plan was always sequential search.
> The SELECT time also indicates sequential search.
>
> Is there anything I can do to improve SELECT performance
> on what is essentially an ordered table?

Could you post a query and the EXPLAIN ANALYZE output?  We could
probably give better advice if we could see what's happening.

Have you experimented with lowering random_page_cost?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: table configuration tweak for performance gain.

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> On Mon, Nov 15, 2004 at 04:25:59PM +1100, Harvey, Allan AC wrote:
>> I created an index on the dt column, ran ANALYSE then,
>> EXPLAIN for some queries.
>> The returned plan was always sequential search.

> Could you post a query and the EXPLAIN ANALYZE output?  We could
> probably give better advice if we could see what's happening.

Also, let's see EXPLAIN ANALYZE results after setting enable_seqscan to
OFF.  If that doesn't force it into an indexscan, then you have got
more fundamental issues (perhaps a datatype mismatch).  Note that I
don't recommend turning off enable_seqscan as a production solution;
but it's a useful tool for debugging.

            regards, tom lane

Re: table configuration tweak for performance gain.

From
"Harvey, Allan AC"
Date:
Tom, Michael,

Thanks for your interests. My original post was scant
on detail as I was unsure if I had found the right place.
It appears I have, so...

Version is 7.4.5

Table size these tests were carried out on:-
mill2=> select count(*) from history\g
 count
--------
 258606
(1 row)

Before index:-
mill2=> \d history
               Table "public.history"
  Column   |            Type             | Modifiers
-----------+-----------------------------+-----------
 pointname | character varying(32)       | not null
 parameter | character varying(8)        | not null
 value     | double precision            | not null
 dt        | timestamp without time zone | not null

snip.....
WARNING:  skipping "pg_conversion" --- only table or database owner can analyze it
WARNING:  skipping "pg_depend" --- only table or database owner can analyze it
ANALYZE
mill2=> explain select value from history where pointname = 'MILL2-SPEED' and dt < now() - interval '5 minutes'\g
                                                         QUERY PLAN
     

-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on history  (cost=0.00..8276.82 rows=8982 width=8)
   Filter: (((pointname)::text = 'MILL2-SPEED'::text) AND ((dt)::timestamp with time zone < (now() -
'00:05:00'::interval)))
(2 rows)

After index:-
mill2=> create index dtindex on history( dt )\g
CREATE INDEX
mill2=> \d history
               Table "public.history"
  Column   |            Type             | Modifiers
-----------+-----------------------------+-----------
 pointname | character varying(32)       | not null
 parameter | character varying(8)        | not null
 value     | double precision            | not null
 dt        | timestamp without time zone | not null
Indexes:
    "dtindex" btree (dt)

snip....
WARNING:  skipping "pg_conversion" --- only table or database owner can analyze it
WARNING:  skipping "pg_depend" --- only table or database owner can analyze it
ANALYZE
mill2=> explain select value from history where pointname = 'MILL2-SPEED' and dt < now() - interval '5 minutes'\g
                                                         QUERY PLAN
     

-----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on history  (cost=0.00..8263.19 rows=9342 width=8)
   Filter: (((pointname)::text = 'MILL2-SPEED'::text) AND ((dt)::timestamp with time zone < (now() -
'00:05:00'::interval)))
(2 rows)

>don't recommend turning off enable_seqscan as a production solution
On your advise I did not go there.

On using BETWEEN:-
mill2=> select value from history where pointname = 'MILL2-SPEED' and dt between now() and now() - interval '5
minutes'\g
 value
-------
(0 rows)

mill2=> select value from history where pointname = 'MILL2-SPEED' and dt < now() - interval '5 minutes'\g
  value
---------
 85606.9
 85606.9
 85606.9
 85606.9
 85606.9
 85606.9
etc.....

I have obviously used it wrong but cannot see how/why.

Thanks again.

Allan




> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, 16 November 2004 2:26
> To: Michael Fuhr
> Cc: Harvey, Allan AC; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] table configuration tweak for
> performance gain.
>
>
> Michael Fuhr <mike@fuhr.org> writes:
> > On Mon, Nov 15, 2004 at 04:25:59PM +1100, Harvey, Allan AC wrote:
> >> I created an index on the dt column, ran ANALYSE then,
> >> EXPLAIN for some queries.
> >> The returned plan was always sequential search.
>
> > Could you post a query and the EXPLAIN ANALYZE output?  We could
> > probably give better advice if we could see what's happening.
>
> Also, let's see EXPLAIN ANALYZE results after setting
> enable_seqscan to
> OFF.  If that doesn't force it into an indexscan, then you have got
> more fundamental issues (perhaps a datatype mismatch).  Note that I
> don't recommend turning off enable_seqscan as a production solution;
> but it's a useful tool for debugging.
>
>             regards, tom lane
>

Re: table configuration tweak for performance gain.

From
Michael Fuhr
Date:
On Tue, Nov 16, 2004 at 10:39:10AM +1100, Harvey, Allan AC wrote:

> mill2=> explain select value from history where pointname = 'MILL2-SPEED' and dt < now() - interval '5 minutes'\g

We were looking for the output from "explain analyze select ...."
With EXPLAIN ANALYZE we can see how realistic the planner's estimates
were.

>  Seq Scan on history  (cost=0.00..8263.19 rows=9342 width=8)
>    Filter: (((pointname)::text = 'MILL2-SPEED'::text) AND ((dt)::timestamp with time zone < (now() -
'00:05:00'::interval)))

You declared dt to be TIMESTAMP WITHOUT TIME ZONE, so its index
won't be used because the filter's type is TIMESTAMP WITH TIME ZONE.
Try casting the filter to dt's type by using now()::TIMESTAMP (this
won't be necessary in 8.0).

Aside from the type issue, the planner estimates that the query
will return 9342 rows, so even if it could use an index it might
think a sequential scan will be faster.  The output from EXPLAIN
ANALYZE would tell us if that guess is correct.

Please show us the output of EXPLAIN ANALYZE after you've modified
the query to use now()::TIMESTAMP.  If the query still does a
sequential scan then execute "SET enable_seqscan TO off", run EXPLAIN
ANALYZE again, and show us that output as well.

> On using BETWEEN:-
> mill2=> select value from history where pointname = 'MILL2-SPEED' and dt between now() and now() - interval '5
minutes'\g
>  value
> -------
> (0 rows)

"a BETWEEN x AND y" is equivalent to "a >= x AND a <= y", so
if x and y aren't chosen correctly then the expression will
always evaluate to false:

SELECT 5 BETWEEN 1 AND 10;
 ?column?
----------
 t

SELECT 5 BETWEEN 10 AND 1;
 ?column?
----------
 f

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: table configuration tweak for performance gain.

From
"Harvey, Allan AC"
Date:
Michael,

I don't mind at all.

3.244 ms compared to 15706.179 ms.
A sizeable difference.

Screen scraps follow.

Allan

mill2=> set enable_seqscan=off\g
SET
mill2=> explain analyse select count(*) from history where pointname = 'MILL2-SPEED' and dt > (now() - interval '5
minutes')::timestamp\g
                                                                QUERY PLAN
                  

------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=34815.05..34815.05 rows=1 width=0) (actual time=2.267..2.275 rows=1 loops=1)
   ->  Index Scan using dtindex on history  (cost=0.00..34783.32 rows=12690 width=0) (actual time=1.931..1.931 rows=0
loops=1)
         Index Cond: (((pointname)::text = 'MILL2-SPEED'::text) AND (dt > ((now() - '00:05:00'::interval))::timestamp
withouttime zone)) 
 Total runtime: 3.244 ms
(4 rows)


mill2=> set enable_seqscan=on\g
SET
mill2=> select count(*) from history where pointname = 'MILL2-SPEED' and dt > (now() - interval '5
minutes')::timestamp\g
 count
-------
     0
(1 row)

mill2=> explain analyse select count(*) from history where pointname = 'MILL2-SPEED' and dt > (now() - interval '5
minutes')::timestamp\g
                                                              QUERY PLAN
              

--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10163.50..10163.50 rows=1 width=0) (actual time=15705.388..15705.395 rows=1 loops=1)
   ->  Seq Scan on history  (cost=0.00..10131.77 rows=12690 width=0) (actual time=15705.286..15705.286 rows=0 loops=1)
         Filter: (((pointname)::text = 'MILL2-SPEED'::text) AND (dt > ((now() - '00:05:00'::interval))::timestamp
withouttime zone)) 
 Total runtime: 15706.179 ms
(4 rows)




> -----Original Message-----
> From: Michael Fuhr [mailto:mike@fuhr.org]
> Sent: Wednesday, 17 November 2004 12:19
> To: Harvey, Allan AC
> Cc: Tom Lane
> Subject: Re: [GENERAL] table configuration tweak for performance gain.
>
>
> On Wed, Nov 17, 2004 at 09:39:11AM +1100, Harvey, Allan AC wrote:
>
> > The solution then was:-
> >     an index of the right columns.
> >     explicit, exact type casting ( I'm a casual ingres user, type
> >       casting is something I never need or think you can do)
>
> PostgreSQL 8.0 will allow cross-type index usage, making the explicit
> cast unnecessary.
>
> >     turning off enable_seqscan for specific queries seemed
> to help a bit too.
>
> Performance in general might improve if you address the planner's
> reasons for chosing an inefficient plan.  Even though you're satisifed
> with performance now, would you mind posting the output of "EXPLAIN
> ANALYZE select ..." with enable_seqscan on and then with it off?
> If nothing else, an analysis might be educational for others.
>
> --
> Michael Fuhr
> http://www.fuhr.org/~mfuhr/
>

Re: table configuration tweak for performance gain.

From
Michael Fuhr
Date:
On Wed, Nov 17, 2004 at 12:41:20PM +1100, Harvey, Allan AC wrote:

> mill2=> explain analyse select count(*) from history where pointname = 'MILL2-SPEED' and dt > (now() - interval '5
minutes')::timestamp\g

How typical is this query?  How many distinct values does pointname
have?  Have you considered adding an index on pointname or a multicolumn
index on pointname and dt?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/