Why date index is not used

From: "Andrus"
Why does Postgres 8.1 not use the makse_kuupaev_idx index in the following
query?

How can I speed this query up?

explain analyze select * from makse order by kuupaev desc, kellaaeg desc
limit 100

"Limit  (cost=62907.94..62908.19 rows=100 width=876) (actual
time=33699.551..33701.001 rows=100 loops=1)"
"  ->  Sort  (cost=62907.94..63040.49 rows=53022 width=876) (actual
time=33699.534..33700.129 rows=100 loops=1)"
"        Sort Key: kuupaev, kellaaeg"
"        ->  Seq Scan on makse  (cost=0.00..2717.22 rows=53022 width=876)
(actual time=0.020..308.502 rows=53028 loops=1)"
"Total runtime: 37857.177 ms"


CREATE TABLE makse(
  kuupaev date,
  kellaaeg char(6) NOT NULL DEFAULT ''::bpchar,
  guid char(36) NOT NULL,
  CONSTRAINT makse_pkey PRIMARY KEY (guid)
);


CREATE INDEX makse_kuupaev_idx ON makse USING btree (kuupaev);


Andrus.



Re: Why date index is not used

From: Tomas Vondra
If you want to benefit from an index, the query has to contain some
WHERE conditions (on the indexed columns). This is a 'select all'
query - there is no way to speed it up using an index.

Tomas


Re: Why date index is not used

From: Tomas Vondra
More precisely - Postgres could use the index to speed up the
sorting, but in this case the sorting is very fast (less than one
second according to the output), so Postgres probably decided not
to use the index because it would be slower.

Btw. have you run ANALYZE on the table recently? What is the number
of distinct values in the 'kuupaev' column?

Tomas
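The checks Tomas asks about can be run directly; a sketch, using the table and column names from the thread (pg_stats is the standard statistics view):

```sql
-- Refresh the planner's statistics for the table
ANALYZE makse;

-- See what the planner knows about the kuupaev column
SELECT n_distinct, correlation
  FROM pg_stats
 WHERE tablename = 'makse'
   AND attname = 'kuupaev';
```

A strongly positive correlation value here would mean the dates are stored roughly in insertion order, which makes an index scan on kuupaev cheaper.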



Re: Why date index is not used

From: Tom Lane
"Andrus" <eetasoft@online.ee> writes:
> Why Postgres 8.1 does not use makse_kuupaev_idx index in the following query
> ?

Because it doesn't help --- the system still has to do the sort.
You'd need a two-column index on both of the ORDER BY columns to avoid
sorting.

            regards, tom lane
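The two-column index Tom describes would look like this (the index name is illustrative). Note that 8.1 has no DESC index columns, but the planner can scan a plain ascending btree backward to produce the `ORDER BY ... DESC` ordering:

```sql
CREATE INDEX makse_kuupaev_kellaaeg_idx
    ON makse USING btree (kuupaev, kellaaeg);

-- With this index, the LIMIT query can walk the index backward
-- and stop after 100 rows instead of sorting the whole table:
--   explain analyze select * from makse
--     order by kuupaev desc, kellaaeg desc limit 100
```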

Re: Why date index is not used

From: "Dave Dutcher"
Actually it looks to me like the sorting is the slow part of this query.
Maybe if you created an index on both kuupaev and kellaaeg it would
make the sorting faster.  Or maybe you could try increasing the server's
work_mem.  The sort will be much slower if the server can't do the whole
thing in RAM.
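For example, work_mem can be raised for a single session just before the big sort (on 8.1 the value is in kilobytes; the figure below is illustrative, not a recommendation):

```sql
-- Raise sort memory for this session only (value in kB on 8.1;
-- unit suffixes like '20MB' only work on 8.2+)
SET work_mem = 20000;   -- roughly 20 MB

-- ... run the ORDER BY ... LIMIT query here ...

RESET work_mem;         -- back to the server default
```

Setting it per session avoids raising the limit for every backend at once.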





Re: Why date index is not used

From: "Jim C. Nasby"
On Thu, Jun 08, 2006 at 03:20:55PM -0400, Tom Lane wrote:
> "Andrus" <eetasoft@online.ee> writes:
> > Why Postgres 8.1 does not use makse_kuupaev_idx index in the following query
> > ?
>
> Because it doesn't help --- the system still has to do the sort.
> You'd need a two-column index on both of the ORDER BY columns to avoid
> sorting.

And even then you'd better have a pretty high correlation on the first
column; otherwise you'll still get a seqscan.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Why date index is not used

From: Tom Lane
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> And even then you better have a pretty high correlation on the first
> column, otherwise you'll still get a seqscan.

Not with the LIMIT.  (If he were fetching the whole table, very possibly
the sort would be the right plan anyway.)

            regards, tom lane

Re: Why date index is not used

From: "Andrus"
> Btw. have you run ANALYZE on the table recently?

I have autovacuum running with default statistics settings, so I expect
that it is analyzed.

> What is the number
> of distinct values in the 'kuupaev' column?

select count(distinct kuupaev) from makse

returns 61

kuupaev is sales date.

So this can contain 365 distinct values per year, and with a database kept
at most 10 years the total can be about 3650 distinct values.

Andrus



Re: Why date index is not used

From: "Andrus"
> Actually It looks to me like the sorting is the slow part of this query.
> Maybe if you did create an index on both kuupaev and kellaaeg it might
> make the sorting faster.

Thank you. It makes query fast.

> Or maybe you could try increasing the server's
> work mem.  The sort will be much slower if the server can't do the whole
> thing in ram.

I have a W2K server with 0.5 GB RAM.
There are only 6 connections open (6 points of sale) to this server.
shared_buffers is 10000.
I see approx 10 postgres processes in Task Manager, each taking about 30 MB
of RAM.

Server performance is very slow: the Windows swap file size is 1 GB.

For each sale a new row will be inserted into this table, so the file size
grows rapidly every day.
Increasing work_mem by 1 MB increases the memory requirement by 10 MB, since
I may have 10 processes running. Sorting this table in memory would require
a very large amount of work_mem in each process's address space.

I think that if I increase work_mem then the swap file will become bigger
and performance will decrease even more.

How can I increase performance?

Andrus.




Re: Why date index is not used

From: "Andrus"
Tom,

> Because it doesn't help --- the system still has to do the sort.

It can help a lot in this case.

kuupaev is sales date
kellaaeg is sales time

Postgres can use the kuupaev index to fetch the first 100 rows, plus however
many more rows have the same kuupaev value as the 100th row. I have 500
sales per day, so it can fetch about 600 rows using the index on the kuupaev
column.

After that it can sort those 600 rows quickly.
Currently it blindly sorts all 54000 rows in the table.
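The strategy described above can also be written by hand as a bounded query (a sketch; it assumes the newest rows have non-NULL kuupaev values):

```sql
-- Find the kuupaev of the 100th-newest row, then sort only the
-- rows from that date onward instead of the whole table:
SELECT *
  FROM makse
 WHERE kuupaev >= (SELECT kuupaev
                     FROM makse
                    ORDER BY kuupaev DESC
                    LIMIT 1 OFFSET 99)
 ORDER BY kuupaev DESC, kellaaeg DESC
 LIMIT 100;
```

The subquery can use the single-column kuupaev index, and the outer sort then touches only the ~600 qualifying rows.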

> You'd need a two-column index on both of the ORDER BY columns to avoid
> sorting.

Thank you. It works.

Andrus.



Re: Why date index is not used

From: "Jim C. Nasby"
On Fri, Jun 09, 2006 at 12:40:26PM +0300, Andrus wrote:
> [...]
> How to increase perfomance ?

Do you have effective_cache_size set correctly? You might try dropping
random_page_cost down to 2 or so.

Of course you could just put more memory in the machine, too.
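A postgresql.conf sketch along those lines for a 0.5 GB machine (the numbers are illustrative, not tuned recommendations):

```
# PostgreSQL 8.1 settings; both values below are in 8 kB pages
shared_buffers = 10000           # as currently set (~80 MB)
effective_cache_size = 32768     # ~256 MB, if the OS cache is that large
random_page_cost = 2             # down from the default of 4
```

effective_cache_size only tells the planner how much of the data is likely cached; it allocates nothing, so it is safe to raise if the OS filesystem cache really is that big.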
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461