Thread: OR vs UNION
Hello, I have a query that uses a series of ORs and I have heard that sometimes this type of query can be rewritten to use UNION instead and be more efficient. Are there any rules of thumb for when this might be the case? As an example here is a query of the type I am discussing: select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id from feature f, featureloc fl where (f.type_id = 219 OR f.type_id = 368 OR f.type_id = 514 OR f.type_id = 475 OR f.type_id = 426 OR f.type_id = 456 OR f.type_id = 461 OR f.type_id = 553 OR f.type_id = 89) and fl.srcfeature_id= 1 and f.feature_id = fl.feature_id and fl.fmin <= 2491413 and fl.fmax >= 2485521 which could easily be rewritten as a set of select statements for each type_id and then union them together. For this particular query, explain analyze indicates that this is the more efficient form, but I could easily see that at other times/for other parameters, a set unioned together would be better. Are there any guidelines for this? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
Scott, > I have a query that uses a series of ORs and I have heard that sometimes > this type of query can be rewritten to use UNION instead and be more > efficient. I'd be interested to know where you heard that; as far as I know, it could only apply to conditional left outer joins. > select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id > from feature f, featureloc fl > where > (f.type_id = 219 OR > f.type_id = 368 OR > f.type_id = 514 OR > f.type_id = 475 OR > f.type_id = 426 OR > f.type_id = 456 OR > f.type_id = 461 OR > f.type_id = 553 OR > f.type_id = 89) and > fl.srcfeature_id = 1 and > f.feature_id = fl.feature_id and > fl.fmin <= 2491413 and fl.fmax >= 2485521 Certainly a query of the above form would not benefit from being a union. For readability, you could use an IN() statement rather than a bunch of ORs ... this would not help performance, but would make your query easier to type/read. -- -Josh BerkusAglio Database SolutionsSan Francisco
On Thu, 2003-07-17 at 15:00, Josh Berkus wrote: > Scott, > > > I have a query that uses a series of ORs and I have heard that sometimes > > this type of query can be rewritten to use UNION instead and be more > > efficient. > > I'd be interested to know where you heard that; as far as I know, it could > only apply to conditional left outer joins. Hmm, don't know for sure where I heard it, however I do know from previous experience that unioned queries worked better in a somewhat similar query, though it was a different schema, so it is hard to compare directly. One way in which I thought it might make a difference is if I build partial indexes on feature_id for each of the type_ids of interest (there are several thousand in the database, but only 15 or 20 that I am interested in querying). That way, when I write the separate queries for each type_id, the query planner would have access to the partial indexes for each type, and therefore may be able to complete the individual queries very quickly. > > > select distinct f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id > > from feature f, featureloc fl > > where > > (f.type_id = 219 OR > > f.type_id = 368 OR > > f.type_id = 514 OR > > f.type_id = 475 OR > > f.type_id = 426 OR > > f.type_id = 456 OR > > f.type_id = 461 OR > > f.type_id = 553 OR > > f.type_id = 89) and > > fl.srcfeature_id = 1 and > > f.feature_id = fl.feature_id and > > fl.fmin <= 2491413 and fl.fmax >= 2485521 > > Certainly a query of the above form would not benefit from being a union. > > For readability, you could use an IN() statement rather than a bunch of ORs > ... this would not help performance, but would make your query easier to > type/read. -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain@cshl.org GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
Josh Berkus <josh@agliodbs.com> writes: > Certainly a query of the above form would not benefit from being a union. Actually we used to have code in the planner that would automatically transform an OR query to a UNION ALL construct (the old "ksqo" option). It fell into disfavor, partly because it didn't really preserve semantics exactly --- IIRC, should you mention the same field value more than once in the OR, the UNION ALL version would generate duplicate output rows. At the time it was put in, it offered significant performance wins, but subsequent planner improvements narrowed the gap to nearly nothing, and we eventually took it out. I'm not sure whether there's any possible win given the current state of the planner. If you dig for "ksqo" in the archives you can find all the details (at least on days when the search engine is working ;-)). regards, tom lane
Actually, I have used a UNION to replace OR's, the case (simpliefied to) something like this: Sample 1: WHERE (f1 = 'v1' OR f1 = '') AND (f2 = 'v2' OR f2 = '') Changed to Sample 2: WHERE (f1 = 'v1') AND (f2 = 'v2') UNION WHERE (f1 = 'v1') AND (f2 = '') UNION WHERE (f1 = '') AND (f2 = '') Note that Sample 1 is actually a simplified version, the queries are not exactly equivalent. The point is that sample 2 ran MUCH faster because: a) The table was *very* large b) The OR clauses of sample 1 prevented the use of an INDEX, Reason: It is faster to scan an index 3 times then scan this very large table once. I do not know if there is a proof to say that one can *always* replace OR's with a union, but sometimes certainly, and in this case it made things much better... Terry Fielder Manager Software Development and Deployment Great Gulf Homes / Ashton Woods Homes terry@greatgulfhomes.com Fax: (416) 441-9085 > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Josh Berkus > Sent: Thursday, July 17, 2003 3:00 PM > To: Scott Cain; pgsql-sql@postgresql.org > Subject: Re: [SQL] OR vs UNION > > > Scott, > > > I have a query that uses a series of ORs and I have heard > that sometimes > > this type of query can be rewritten to use UNION instead and be more > > efficient. > > I'd be interested to know where you heard that; as far as I > know, it could > only apply to conditional left outer joins. > > > select distinct > f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id > > from feature f, featureloc fl > > where > > (f.type_id = 219 OR > > f.type_id = 368 OR > > f.type_id = 514 OR > > f.type_id = 475 OR > > f.type_id = 426 OR > > f.type_id = 456 OR > > f.type_id = 461 OR > > f.type_id = 553 OR > > f.type_id = 89) and > > fl.srcfeature_id = 1 and > > f.feature_id = fl.feature_id and > > fl.fmin <= 2491413 and fl.fmax >= 2485521 > > Certainly a query of the above form would not benefit from > being a union. > > For readability, you could use an IN() statement rather than > a bunch of ORs > ... this would not help performance, but would make your > query easier to > type/read. > > -- > -Josh Berkus > Aglio Database Solutions > San Francisco > > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
Gavin reported UNION faster than OR in some case when doing fts queries two years ago at O'Reilly. --------------------------------------------------------------------------- terry@ashtonwoodshomes.com wrote: > Actually, I have used a UNION to replace OR's, the case (simpliefied to) > something like this: > > Sample 1: > WHERE (f1 = 'v1' OR f1 = '') > AND (f2 = 'v2' OR f2 = '') > > Changed to Sample 2: > WHERE (f1 = 'v1') > AND (f2 = 'v2') > UNION > WHERE (f1 = 'v1') > AND (f2 = '') > UNION > WHERE (f1 = '') > AND (f2 = '') > > > Note that Sample 1 is actually a simplified version, the queries are not > exactly equivalent. > > The point is that sample 2 ran MUCH faster because: > a) The table was *very* large > b) The OR clauses of sample 1 prevented the use of an INDEX, > > Reason: It is faster to scan an index 3 times then scan this very large > table once. > > I do not know if there is a proof to say that one can *always* replace OR's > with a union, but sometimes certainly, and in this case it made things much > better... > > Terry Fielder > Manager Software Development and Deployment > Great Gulf Homes / Ashton Woods Homes > terry@greatgulfhomes.com > Fax: (416) 441-9085 > > > > -----Original Message----- > > From: pgsql-sql-owner@postgresql.org > > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of Josh Berkus > > Sent: Thursday, July 17, 2003 3:00 PM > > To: Scott Cain; pgsql-sql@postgresql.org > > Subject: Re: [SQL] OR vs UNION > > > > > > Scott, > > > > > I have a query that uses a series of ORs and I have heard > > that sometimes > > > this type of query can be rewritten to use UNION instead and be more > > > efficient. > > > > I'd be interested to know where you heard that; as far as I > > know, it could > > only apply to conditional left outer joins. > > > > > select distinct > > f.name,fl.fmin,fl.fmax,fl.strand,f.type_id,f.feature_id > > > from feature f, featureloc fl > > > where > > > (f.type_id = 219 OR > > > f.type_id = 368 OR > > > f.type_id = 514 OR > > > f.type_id = 475 OR > > > f.type_id = 426 OR > > > f.type_id = 456 OR > > > f.type_id = 461 OR > > > f.type_id = 553 OR > > > f.type_id = 89) and > > > fl.srcfeature_id = 1 and > > > f.feature_id = fl.feature_id and > > > fl.fmin <= 2491413 and fl.fmax >= 2485521 > > > > Certainly a query of the above form would not benefit from > > being a union. > > > > For readability, you could use an IN() statement rather than > > a bunch of ORs > > ... this would not help performance, but would make your > > query easier to > > type/read. > > > > -- > > -Josh Berkus > > Aglio Database Solutions > > San Francisco > > > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 6: Have you searched our list archives? > > > > http://archives.postgresql.org > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073