Thread: Or selection on index versus union

Or selection on index versus union

From
han.holl@informationslogik.nl
Date:
Hello

I've got a table with an index, let's call it fase.

The following query is fine: 'select something from table where fase = '1';

However, this is disastrously slow:
select something from table where fase = '1' or fase = '2';

The reason is that the query planner decides to ignore the index, and goes for
a sequential scan of the table (with a couple of million records).

If I do:
select something from table where fase = '1'
union
select something from table where fase = '2';

it's fine again, but it's a lot of typing, and the first formulation has a
more natural feel to it.

Is there a way to convince the planner to use the fase index for this type of
query, or is there a hook somewhere that I missed that allows me to rewrite
a query like the above with a server-side function ?

Thanks in advance,

Han Holl

Re: Or selection on index versus union

From
kevin.kempter@dataintellect.com
Date:
Just a guess but have you tried using an in clause?

select something from table where fase in  ('1' , '2');




On Tuesday 04 October 2005 13:32, han.holl@informationslogik.nl wrote:
> Hello
>
> I've got a table with an index, let's call it fase.
>
> The following query is fine: 'select something from table where fase = '1';
>
> However, this is disastrously slow:
> select something from table where fase = '1' or fase = '2';
>
> The reason is that the query planner decides to ignore the index, and goes
> for a sequential scan of the table (with a couple of million records).
>
> If I do:
> select something from table where fase = '1'
> union
> select something from table where fase = '2';
>
> it's fine again, but it's a lot of typing, and the first formulation has a
> more natural feel to it.
>
> Is there a way to convince the planner to use the fase index for this type
> of query, or is there a hook somewhere that I missed that allows me to
> rewrite a query like the above with a server-side function ?
>
> Thanks in advance,
>
> Han Holl
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

Re: Or selection on index versus union

From
Michael Fuhr
Date:
On Tue, Oct 04, 2005 at 09:32:41PM +0200, han.holl@informationslogik.nl wrote:
> I've got a table with an index, let's call it fase.
>
> The following query is fine: 'select something from table where fase = '1';
>
> However, this is disastrously slow:
> select something from table where fase = '1' or fase = '2';

Could we see some EXPLAIN ANALYZE output?  What version of PostgreSQL
are you using?  Have you run VACUUM ANALYZE on the table to remove
dead tuples and update the statistics?  Have you considered clustering
the table on fase's index?

It might be useful to see the output of the following queries,
assuming the table name is foo:

SET enable_seqscan TO on;
SET enable_indexscan TO off;
EXPLAIN ANALYZE SELECT something FROM foo WHERE fase = '1';
EXPLAIN ANALYZE SELECT something FROM foo WHERE fase = '2';
EXPLAIN ANALYZE SELECT something FROM foo WHERE fase = '1' OR fase = '2';

SET enable_seqscan TO off;
SET enable_indexscan TO on;
EXPLAIN ANALYZE SELECT something FROM foo WHERE fase = '1';
EXPLAIN ANALYZE SELECT something FROM foo WHERE fase = '2';
EXPLAIN ANALYZE SELECT something FROM foo WHERE fase = '1' OR fase = '2';

SHOW random_page_cost;
SHOW effective_cache_size;
SELECT version();

\x
SELECT * FROM pg_stats WHERE tablename = 'foo' AND attname = 'fase';

BTW, pgsql-performance is a list dedicated to performance issues,
so you might want to ask performance-related questions there.

--
Michael Fuhr

Re: Or selection on index versus union

From
han.holl@informationslogik.nl
Date:
On Tuesday 04 October 2005 22:26, kevin.kempter@dataintellect.com wrote:
> Just a guess but have you tried using an in clause?
>
> select something from table where fase in  ('1' , '2');
>
Not surprisingly, this also does a sequential scan on the table. (It would be
_very_ diffcult to see that this is a union, I guess).

Thanks,

Han

Re: Or selection on index versus union

From
han.holl@informationslogik.nl
Date:

On Tuesday 04 October 2005 23:08, Michael Fuhr wrote:

> On Tue, Oct 04, 2005 at 09:32:41PM +0200, han.holl@informationslogik.nl wrote:

> > I've got a table with an index, let's call it fase.

> >

> > The following query is fine: 'select something from table where fase =

> > '1';

> >

> > However, this is disastrously slow:

> > select something from table where fase = '1' or fase = '2';

>

> Could we see some EXPLAIN ANALYZE output? What version of PostgreSQL

> are you using? Have you run VACUUM ANALYZE on the table to remove

> dead tuples and update the statistics? Have you considered clustering

> the table on fase's index?

>

Oh, well, thanks. I hadn't realized that a newly loaded database needs a vacuum analyze to begin with. And what's worse, I had the impression that vacuum full would include analyze, wich I see now it doesn't.

I'm not a database administrator, and I'm afraid it shows. I'm going to read a lot of docs in the coming months, because real people depend on reasonable performance of our databases.

Cheers, and thanks again,

Han Holl

PS We still have to be careful how to formulate conditions:

where fase in ('1','2')

is ok, and uses the index, but the logically identical:

where position(fase in '12') >= 1

does a sequential scan.

Re: Or selection on index versus union

From
Douglas McNaught
Date:
han.holl@informationslogik.nl writes:

> PS We still have to be careful how to formulate conditions:
>
> where fase in ('1','2')
>
> is ok, and uses the index, but the logically identical:
>
> where position(fase in '12') >= 1
>
> does a sequential scan.

Right, you would need a functional index to fix that.  The query
optimizer, in general, doesn't know what specific functions
do--they're black boxes.  So there would be no way for the optimizer
to figure out that the two expressions mean the same thing.

-Doug

Re: Or selection on index versus union

From
"Jim C. Nasby"
Date:
On Wed, Oct 05, 2005 at 11:13:58AM +0200, han.holl@informationslogik.nl wrote:
> On Tuesday 04 October 2005 23:08, Michael Fuhr wrote:
> > On Tue, Oct 04, 2005 at 09:32:41PM +0200, han.holl@informationslogik.nl wrote:
> > > I've got a table with an index, let's call it fase.
> > >
> > > The following query is fine: 'select something from table where fase =
> > > '1';
> > >
> > > However, this is disastrously slow:
> > > select something from table where fase = '1' or fase = '2';
> >
> > Could we see some EXPLAIN ANALYZE output?  What version of PostgreSQL
> > are you using?  Have you run VACUUM ANALYZE on the table to remove
> > dead tuples and update the statistics?  Have you considered clustering
> > the table on fase's index?
> >
>
> Oh, well, thanks. I hadn't realized that a newly loaded database needs a vacuum analyze to begin with. And what's
worse,I had the impression that vacuum full would include analyze, wich I see now it doesn't. 

When first loaded, there are no statistics for a table, so the database
has no idea what to expect. You don't need to vacuum a brand-new table,
but you do need to analyze it.

We've actually talked about changing the name of vacuum full so as not
to confuse people. But you really want to stay away from vacuum full
unless you have no choice, because of the exclusive lock on the table it
grabs.

> I'm not a database administrator, and I'm afraid it shows. I'm going to read a lot of docs in the coming months,
becausereal people depend on reasonable performance of our databases. 

If you need immediate help you can get commercial support from a number
of companies.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461