Thread: Index problem

Index problem

From
"Rigmor Ukuhe"
Date:
Hi,

I have a table containing columns:

  "END_DATE" timestamptz NOT NULL
  "REO_ID" int4 NOT NULL

and i am indexed "REO_ID" coulumn.
I have a query:

select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where "REO_ID" IN
('112851'  ,'112859'  ,'112871'  ,'112883'  ,'112891'  ,'112904'  ,'112915'
,'112924'  ,'112939'  ,'112947'  ,'112960'  ,'112984'  ,'112999'  ,'113013'
,'113032'  ,'113059'  ,'113067'  ,'113084'  ,'113096'  ,'113103'  ,'113110'
,'113117'  ,'113125'  ,'113132'  ,'113139'  ,'113146'  ,'113153'  ,'113160'
,'113167'  ,'113174'  ,'113181'  ,'113188'  ,'113195'  ,'113204'  ,'113268'
,'113279'  ,'113294'  ,'113302'  ,'113317'  ,'113340'  ,'113358'  ,'113385'
,'113404'  ,'113412'  ,'113419'  ,'113429'  ,'113436'  ,'113443'  ,'113571'
,'113636'  ,'113649'  ,'113689'  ,'113705'  ,'113744'  ,'113755'  ,'113724'
,'113737' ,'113812'  ,'113828'  ,'113762'  ,'113842'  ,'113869'  ,'113925'
,'113976'  ,'114035'  ,'114044'  ,'114057'  ,'114070'  ,'114084'  ,'114094'
,'114119' )

and it is _not_ using that index

But following query (notice there are less id-s in WHERE clause, but rest is
same)

select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where  "REO_ID" IN
('112851'  ,'112859'  ,'112871'  ,'112883'  ,'112891'  ,'112904'  ,'112915'
,'112924'  ,'112939'  ,'112947'  ,'112960'  ,'112984'  ,'112999'  ,'113013'
,'113032'  ,'113059'  ,'113067'  ,'113084'  ,'113096'  ,'113103'  ,'113110'
,'113117'  ,'113125'  ,'113132'  ,'113139'  ,'113146'  ,'113153'  ,'113160'
,'113167'  ,'113174'  ,'113181'  ,'113188'  ,'113195'  ,'113204'  ,'113268'
,'113279'  ,'113294'  ,'113302'  ,'113317'  ,'113340'  ,'113358'  ,'113385'
,'113404'  ,'113412'  ,'113419'  ,'113429'  ,'113436'  ,'113443'  ,'113571'
,'113636'  ,'113649'  ,'113689'  ,'113705'  ,'113744'  ,'113755'  ,'113724'
,'113737' )

will _is_ using index:

Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS  (cost=0.00..394.06
rows=102 width=12)

What causes this behaviour? is there any workaround? Suggestions?

best,
Rigmor Ukuhe
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.515 / Virus Database: 313 - Release Date: 01.09.2003


Re: Index problem

From
Tomasz Myrta
Date:
> Hi,
>
> I have a table containing columns:
>
>   "END_DATE" timestamptz NOT NULL
>   "REO_ID" int4 NOT NULL
>
> and i am indexed "REO_ID" coulumn.
> I have a query:
>
> select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where "REO_ID" IN
> ('112851'  ,'112859'  ,'112871'  ,'112883'  ,'112891'  ,'112904'  ,'112915'
> ,'112924'  ,'112939'  ,'112947'  ,'112960'  ,'112984'  ,'112999'  ,'113013'
> ,'113032'  ,'113059'  ,'113067'  ,'113084'  ,'113096'  ,'113103'  ,'113110'
> ,'113117'  ,'113125'  ,'113132'  ,'113139'  ,'113146'  ,'113153'  ,'113160'
> ,'113167'  ,'113174'  ,'113181'  ,'113188'  ,'113195'  ,'113204'  ,'113268'
> ,'113279'  ,'113294'  ,'113302'  ,'113317'  ,'113340'  ,'113358'  ,'113385'
> ,'113404'  ,'113412'  ,'113419'  ,'113429'  ,'113436'  ,'113443'  ,'113571'
> ,'113636'  ,'113649'  ,'113689'  ,'113705'  ,'113744'  ,'113755'  ,'113724'
> ,'113737' ,'113812'  ,'113828'  ,'113762'  ,'113842'  ,'113869'  ,'113925'
> ,'113976'  ,'114035'  ,'114044'  ,'114057'  ,'114070'  ,'114084'  ,'114094'
> ,'114119' )
>
> and it is _not_ using that index
>
> But following query (notice there are less id-s in WHERE clause, but rest is
> same)
>
> select "REO_ID", "END_DATE" from "PRIORITY_STATISTICS" where  "REO_ID" IN
> ('112851'  ,'112859'  ,'112871'  ,'112883'  ,'112891'  ,'112904'  ,'112915'
> ,'112924'  ,'112939'  ,'112947'  ,'112960'  ,'112984'  ,'112999'  ,'113013'
> ,'113032'  ,'113059'  ,'113067'  ,'113084'  ,'113096'  ,'113103'  ,'113110'
> ,'113117'  ,'113125'  ,'113132'  ,'113139'  ,'113146'  ,'113153'  ,'113160'
> ,'113167'  ,'113174'  ,'113181'  ,'113188'  ,'113195'  ,'113204'  ,'113268'
> ,'113279'  ,'113294'  ,'113302'  ,'113317'  ,'113340'  ,'113358'  ,'113385'
> ,'113404'  ,'113412'  ,'113419'  ,'113429'  ,'113436'  ,'113443'  ,'113571'
> ,'113636'  ,'113649'  ,'113689'  ,'113705'  ,'113744'  ,'113755'  ,'113724'
> ,'113737' )
>
> will _is_ using index:

Why not. It's just because the second query is more selective. Probably
you don't have too many rows in your table and Postgres thinks it's
better (faster) to use sequential scan than index one.

Regards,
Tomasz Myrta


Re: Index problem

From
"Matt Clark"
Date:
> What causes this behaviour? is there any workaround? Suggestions?
>

How many rows are there in the table, and can you post the 'explain analyze' for both queries after doing a 'vacuum
verboseanalyze 
[tablename]'?

Cheers

Matt



Re: Index problem

From
"Rigmor Ukuhe"
Date:
> > What causes this behaviour? is there any workaround? Suggestions?
> >
>
> How many rows are there in the table, and can you post the
> 'explain analyze' for both queries after doing a 'vacuum verbose analyze
> [tablename]'?

There are about 2500 rows in that table.

1st query explain analyze: Seq Scan on PRIORITY_STATISTICS
(cost=0.00..491.44 rows=127 width=12) (actual time=98.58..98.58 rows=0
loops=1)
Total runtime: 98.74 msec

2nd query explain analyze: NOTICE:  QUERY PLAN:

Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS  (cost=0.00..394.06
rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1)
Total runtime: 21.59 msec

Any help?

Rigmor


>
> Cheers
>
> Matt
>
>
>
> ---
> Incoming mail is certified Virus Free.
> Checked by AVG anti-virus system (http://www.grisoft.com).
> Version: 6.0.515 / Virus Database: 313 - Release Date: 01.09.2003
>
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.515 / Virus Database: 313 - Release Date: 01.09.2003


Re: Index problem

From
"Matt Clark"
Date:
> There are about 2500 rows in that table.
>
> 1st query explain analyze: Seq Scan on PRIORITY_STATISTICS
> (cost=0.00..491.44 rows=127 width=12) (actual time=98.58..98.58 rows=0
> loops=1)
> Total runtime: 98.74 msec
>
> 2nd query explain analyze: NOTICE:  QUERY PLAN:
>
> Index Scan using PRIORITY_STATISTICS_reo_id, PRIORITY_STATISTICS_reo_id,
[snip]
> PRIORITY_STATISTICS_reo_id on PRIORITY_STATISTICS  (cost=0.00..394.06
> rows=102 width=12) (actual time=20.93..20.93 rows=0 loops=1)
> Total runtime: 21.59 msec

With only 2500 rows the planner could be deciding that it's going to have to read every disk block to do an index scan
anyway,so it 
might as well do a sequential scan.  If the pages are in fact in the kernel cache then the compute time will dominate,
notthe IO 
time, so it ends up looking like a bad plan, but it's probably not really such a bad plan...

Is your effective_cache_size set to something sensibly large?

You could also try decreasing cpu_index_tuple_cost and cpu_tuple_cost.  These will affect all your queries though, so
whatyou gain 
on one might be lost on another.

Matt



Re: Index problem

From
Tom Lane
Date:
"Rigmor Ukuhe" <rigmor.ukuhe@finestmedia.com> writes:
>>> What causes this behaviour? is there any workaround? Suggestions?

At some point the planner is going to decide that one seqscan is cheaper
than repeated indexscans.  At some point it'll be right ... but in this
case it seems its relative cost estimates are off a bit.  You might try
reducing random_page_cost to bring them more into line with reality.
(But keep in mind that the reality you are measuring appears to be
small-table-already-fully-cached reality.  On a large table you might
find that small random_page_cost isn't such a hot idea after all.)

            regards, tom lane