Thread: OR vs UNION

OR vs UNION

From
Scott Cain
Date:
Hello,

I have a query that uses a series of ORs and I have heard that sometimes
this type of query can be rewritten to use UNION instead and be more
efficient.  Are there any rules of thumb for when this might be the
case?  As an example here is a query of the type I am discussing:
    select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id    from feature f, featureloc fl    where
  (f.type_id = 219 OR       f.type_id = 368 OR       f.type_id = 514 OR       f.type_id = 475 OR       f.type_id = 426
OR      f.type_id = 456 OR       f.type_id = 461 OR       f.type_id = 553 OR       f.type_id = 89) and
fl.srcfeature_id= 1 and     f.feature_id  = fl.feature_id and     fl.fmin <= 2491413 and fl.fmax >= 2485521
 

which could easily be rewritten as a set of select statements for each
type_id and then union them together.  For this particular query,
explain analyze indicates that this is the more efficient form, but I
could easily see that at other times/for other parameters, a set unioned
together would be better.  Are there any guidelines for this?

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



Re: OR vs UNION

From
Josh Berkus
Date:
Scott,

> I have a query that uses a series of ORs and I have heard that sometimes
> this type of query can be rewritten to use UNION instead and be more
> efficient.

I'd be interested to know where you heard that; as far as I know, it could
only apply to conditional left outer joins.

>      select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id
>      from feature f, featureloc fl
>      where
>        (f.type_id = 219 OR
>         f.type_id = 368 OR
>         f.type_id = 514 OR
>         f.type_id = 475 OR
>         f.type_id = 426 OR
>         f.type_id = 456 OR
>         f.type_id = 461 OR
>         f.type_id = 553 OR
>         f.type_id = 89) and
>       fl.srcfeature_id = 1 and
>       f.feature_id  = fl.feature_id and
>       fl.fmin <= 2491413 and fl.fmax >= 2485521

Certainly a query of the above form would not benefit from being a union.

For readability, you could use an IN() statement rather than a bunch of ORs
... this would not help performance, but would make your query easier to
type/read.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: OR vs UNION

From
Scott Cain
Date:
On Thu, 2003-07-17 at 15:00, Josh Berkus wrote:
> Scott,
> 
> > I have a query that uses a series of ORs and I have heard that sometimes
> > this type of query can be rewritten to use UNION instead and be more
> > efficient.  
> 
> I'd be interested to know where you heard that; as far as I know, it could 
> only apply to conditional left outer joins.

Hmm, don't know for sure where I heard it, however I do know from
previous experience that unioned queries worked better in a somewhat
similar query, though it was a different schema, so it is hard to
compare directly.

One way in which I thought it might make a difference is if I build
partial indexes on feature_id for each of the type_ids of interest
(there are several thousand in the database, but only 15 or 20 that I am
interested in querying).  That way, when I write the separate queries
for each type_id, the query planner would have access to the partial
indexes for each type, and therefore may be able to complete the
individual queries very quickly.
> 
> >      select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id
> >      from feature f, featureloc fl
> >      where
> >        (f.type_id = 219 OR
> >         f.type_id = 368 OR
> >         f.type_id = 514 OR
> >         f.type_id = 475 OR
> >         f.type_id = 426 OR
> >         f.type_id = 456 OR
> >         f.type_id = 461 OR
> >         f.type_id = 553 OR
> >         f.type_id = 89) and
> >       fl.srcfeature_id = 1 and
> >       f.feature_id  = fl.feature_id and
> >       fl.fmin <= 2491413 and fl.fmax >= 2485521
> 
> Certainly a query of the above form would not benefit from being a union.
> 
> For readability, you could use an IN() statement rather than a bunch of ORs 
> ... this would not help performance, but would make your query easier to 
> type/read. 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain@cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



Re: OR vs UNION

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Certainly a query of the above form would not benefit from being a union.

Actually we used to have code in the planner that would automatically
transform an OR query to a UNION ALL construct (the old "ksqo" option).
It fell into disfavor, partly because it didn't really preserve
semantics exactly --- IIRC, should you mention the same field value more
than once in the OR, the UNION ALL version would generate duplicate
output rows.  At the time it was put in, it offered significant
performance wins, but subsequent planner improvements narrowed the gap
to nearly nothing, and we eventually took it out.  I'm not sure whether
there's any possible win given the current state of the planner.

If you dig for "ksqo" in the archives you can find all the details
(at least on days when the search engine is working ;-)).
        regards, tom lane


Re: OR vs UNION

From
Date:
Actually, I have used a UNION to replace OR's, the case (simpliefied to)
something like this:

Sample 1:
WHERE (f1 = 'v1' OR f1 = '') AND (f2 = 'v2' OR f2 = '')

Changed to Sample 2:
WHERE (f1 = 'v1') AND (f2 = 'v2')
UNION
WHERE (f1 = 'v1') AND (f2 = '')
UNION
WHERE (f1 = '') AND (f2 = '')


Note that Sample 1 is actually a simplified version, the queries are not
exactly equivalent.

The point is that sample 2 ran MUCH faster because:
a)  The table was *very* large
b)  The OR clauses of sample 1 prevented the use of an INDEX,

Reason:  It is faster to scan an index 3 times then scan this very large
table once.

I do not know if there is a proof to say that one can *always* replace OR's
with a union, but sometimes certainly, and in this case it made things much
better...

Terry Fielder
Manager Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com
Fax: (416) 441-9085


> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Josh Berkus
> Sent: Thursday, July 17, 2003 3:00 PM
> To: Scott Cain; pgsql-sql@postgresql.org
> Subject: Re: [SQL] OR vs UNION
>
>
> Scott,
>
> > I have a query that uses a series of ORs and I have heard
> that sometimes
> > this type of query can be rewritten to use UNION instead and be more
> > efficient.
>
> I'd be interested to know where you heard that; as far as I
> know, it could
> only apply to conditional left outer joins.
>
> >      select distinct
> f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id
> >      from feature f, featureloc fl
> >      where
> >        (f.type_id = 219 OR
> >         f.type_id = 368 OR
> >         f.type_id = 514 OR
> >         f.type_id = 475 OR
> >         f.type_id = 426 OR
> >         f.type_id = 456 OR
> >         f.type_id = 461 OR
> >         f.type_id = 553 OR
> >         f.type_id = 89) and
> >       fl.srcfeature_id = 1 and
> >       f.feature_id  = fl.feature_id and
> >       fl.fmin <= 2491413 and fl.fmax >= 2485521
>
> Certainly a query of the above form would not benefit from
> being a union.
>
> For readability, you could use an IN() statement rather than
> a bunch of ORs
> ... this would not help performance, but would make your
> query easier to
> type/read.
>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org
>



Re: OR vs UNION

From
Bruce Momjian
Date:
Gavin reported UNION faster than OR in some case when doing fts queries
two years ago at O'Reilly.

---------------------------------------------------------------------------

terry@ashtonwoodshomes.com wrote:
> Actually, I have used a UNION to replace OR's, the case (simpliefied to)
> something like this:
> 
> Sample 1:
> WHERE (f1 = 'v1' OR f1 = '')
>   AND (f2 = 'v2' OR f2 = '')
> 
> Changed to Sample 2:
> WHERE (f1 = 'v1')
>   AND (f2 = 'v2')
> UNION
> WHERE (f1 = 'v1')
>   AND (f2 = '')
> UNION
> WHERE (f1 = '')
>   AND (f2 = '')
> 
> 
> Note that Sample 1 is actually a simplified version, the queries are not
> exactly equivalent.
> 
> The point is that sample 2 ran MUCH faster because:
> a)  The table was *very* large
> b)  The OR clauses of sample 1 prevented the use of an INDEX,
> 
> Reason:  It is faster to scan an index 3 times then scan this very large
> table once.
> 
> I do not know if there is a proof to say that one can *always* replace OR's
> with a union, but sometimes certainly, and in this case it made things much
> better...
> 
> Terry Fielder
> Manager Software Development and Deployment
> Great Gulf Homes / Ashton Woods Homes
> terry@greatgulfhomes.com
> Fax: (416) 441-9085
> 
> 
> > -----Original Message-----
> > From: pgsql-sql-owner@postgresql.org
> > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Josh Berkus
> > Sent: Thursday, July 17, 2003 3:00 PM
> > To: Scott Cain; pgsql-sql@postgresql.org
> > Subject: Re: [SQL] OR vs UNION
> >
> >
> > Scott,
> >
> > > I have a query that uses a series of ORs and I have heard
> > that sometimes
> > > this type of query can be rewritten to use UNION instead and be more
> > > efficient.
> >
> > I'd be interested to know where you heard that; as far as I
> > know, it could
> > only apply to conditional left outer joins.
> >
> > >      select distinct
> > f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id
> > >      from feature f, featureloc fl
> > >      where
> > >        (f.type_id = 219 OR
> > >         f.type_id = 368 OR
> > >         f.type_id = 514 OR
> > >         f.type_id = 475 OR
> > >         f.type_id = 426 OR
> > >         f.type_id = 456 OR
> > >         f.type_id = 461 OR
> > >         f.type_id = 553 OR
> > >         f.type_id = 89) and
> > >       fl.srcfeature_id = 1 and
> > >       f.feature_id  = fl.feature_id and
> > >       fl.fmin <= 2491413 and fl.fmax >= 2485521
> >
> > Certainly a query of the above form would not benefit from
> > being a union.
> >
> > For readability, you could use an IN() statement rather than
> > a bunch of ORs
> > ... this would not help performance, but would make your
> > query easier to
> > type/read.
> >
> > --
> > -Josh Berkus
> >  Aglio Database Solutions
> >  San Francisco
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> >
> >                http://archives.postgresql.org
> >
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073