Thread: More efficient OR
Hi All, In several of my SQL statements I have to use a WHERE clause that contains mutiple ORs. i.e. WHERE column1 = 'A' OR column1 = 'B' OR column1 = 'C' Is there a more efficient SQL statement that accomplishes the same limiting functionality? Kind Regards, Keith
On Wed, Feb 16, 2005 at 11:02:59AM -0500, Keith Worthington wrote: > Hi All, > > In several of my SQL statements I have to use a WHERE clause that contains > mutiple ORs. i.e. > > WHERE column1 = 'A' OR > column1 = 'B' OR > column1 = 'C' > > Is there a more efficient SQL statement that accomplishes the same limiting > functionality? I do not know wheter it is more efficient in terms of execution, but I can read this more efficiently: WHERE column1 in ('A', 'B', 'C') -- __________________________________________________ "Nothing is as subjective as reality" Reinoud van Leeuwen reinoud.v@n.leeuwen.net http://www.xs4all.nl/~reinoud __________________________________________________
On Wed, 2005-02-16 at 10:02, Keith Worthington wrote: > Hi All, > > In several of my SQL statements I have to use a WHERE clause that contains > mutiple ORs. i.e. > > WHERE column1 = 'A' OR > column1 = 'B' OR > column1 = 'C' > > Is there a more efficient SQL statement that accomplishes the same limiting > functionality? The in() construct is (nowadays) basically the same as ORing multiple columns; where column1 in ('A','B','C')
Could 'in' or 'between' do what you want? I know that using 'in' is equivalent to what you have below. Could 'between' be more efficient--you could do explain analyze on various options to see what the actual plan would be. Sean On Feb 16, 2005, at 11:02 AM, Keith Worthington wrote: > Hi All, > > In several of my SQL statements I have to use a WHERE clause that > contains > mutiple ORs. i.e. > > WHERE column1 = 'A' OR > column1 = 'B' OR > column1 = 'C' > > Is there a more efficient SQL statement that accomplishes the same > limiting > functionality? > > Kind Regards, > Keith > > ---------------------------(end of > broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings
> > Hi All, > > > > In several of my SQL statements I have to use a WHERE clause > > that contains mutiple ORs. i.e. > > > > WHERE column1 = 'A' OR > > column1 = 'B' OR > > column1 = 'C' > > > > Is there a more efficient SQL statement that accomplishes the > > same limiting functionality? > > > > Kind Regards, > > Keith > > > > Scott wrote: > The in() construct is (nowadays) basically the same as > ORing multiple columns; > > where column1 in ('A','B','C') > > > Sean Davis wrote > Could 'in' or 'between' do what you want? I know that using 'in' > is equivalent to what you have below. Could 'between' be more > efficient--you could do explain analyze on various options to see > what the actual plan would be. > > Sean Thanks Scott and Sean for the post. It sounds like IN will save some typing and code space but not decrease the execution time. BETWEEN won't work for my real life query because the limiting values are quite disparate. Kind Regards, Keith
At least for between, I read that pgSQL rewrites it to a (a<x) AND (x<b). And my experience shows that IN is translated into a series of ORs. I may be wrong > -----Original Message----- > From: Keith Worthington [mailto:keithw@narrowpathinc.com] > Sent: Mittwoch, 16. Februar 2005 17:36 > To: PostgreSQL SQL > Cc: Sean Davis; Scott Marlowe > Subject: Re: [SQL] More efficient OR > > > > > Hi All, > > > > > > In several of my SQL statements I have to use a WHERE clause > > > that contains mutiple ORs. i.e. > > > > > > WHERE column1 = 'A' OR > > > column1 = 'B' OR > > > column1 = 'C' > > > > > > Is there a more efficient SQL statement that accomplishes the > > > same limiting functionality? > > > > > > Kind Regards, > > > Keith > > > > > > > Scott wrote: > > The in() construct is (nowadays) basically the same as > > ORing multiple columns; > > > > where column1 in ('A','B','C') > > > > > > Sean Davis wrote > > Could 'in' or 'between' do what you want? I know that using 'in' > > is equivalent to what you have below. Could 'between' be more > > efficient--you could do explain analyze on various options to see > > what the actual plan would be. > > > > Sean > > Thanks Scott and Sean for the post. > > It sounds like IN will save some typing and code space but > not decrease the > execution time. > > BETWEEN won't work for my real life query because the > limiting values are > quite disparate. > > Kind Regards, > Keith > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
You sound like you don't like the performance you get with OR or IN, from this I deduce that you have a very large list of values to OR from. These make huge queries which are not necessarily very fast ; also they are un-preparable by their very nature (ie. the planner has to look at each value, ponder its stats, think about it...) Basically a query with, say, 'column IN (100 values)' will make postgres work a lot more than a query with 'column in (SELECT something which yields 100 values)'. I have tested the following with good results : - Write a very simple set returning function which takes an array as a parameter and RETURN NEXT each array element in turn. It's just a FOR... RETURN NEXT. Say you call it array_flatten( INTEGER[] ) or something. - Then, instead of doing SELECT * FROM table WHERE id IN (1,4,77,586,1025) do: SELECT * FROM table WHERE id IN (SELECT * FROM array_flatten( '{1,4,77,586,1025}' ) ); or : SELECT t.* FROM table t, (SELECT * FROM array_flatten( '{1,4,77,586,1025}' ) ) foo WHERE t.id=foo.id; The first one will do a uniqu'ing on the array, the second one will not. You can also LEFT JOIN against your SRF to get the id's of the rows that were not in the table (you cannot do this with IN) And you can PREPARE the statement to something that will take an array as a parameter and won't have to be parsed everytime. Sometimes it can be a big performance boost. Try it ! However, if some value in your array matches a lot of rows in the table, it will be slower than the seq scan which would have been triggered by the planner actually seeing that value in the IN list and acting on it. But if you KNOW your column is unique, there is no point in forcing the planner to ponder each value in your list ! For an additional performance boost (likely negligible), you could sort your array in the function (or even in your application code) to ease the work of the index scanner, which will get a better cache hit rate. If you have, say, 20.000 values to get, this is the only way. Note that you could ask yourself why you need to get a lot of values. Are you fetching stuff from the database, computing a list of rows to get, then SELECTing them ? Then maybe you put something in the application that should really be in the database ? As a side note, it would be nice : - if that set returning function was a fast C built-in (I think there's one in contrib/intagg but obviously it works only for integers) because it's a useful tool and building brick ; same for array_accum, and some other commonly used five-lines aggregates and functions that everybody recodes once. - if postgresql used this kind of optimization for the SELECT * FROM table WHERE id =ANY( array ) which currently uses a seq scan. However, what IS nice from pg is that you can actually do the array SRF trick and pull a lot of rows by id's, at blazing speed, just by writing a three line function and tweaking your wuery. > It sounds like IN will save some typing and code space but not decrease > the > execution time. > > BETWEEN won't work for my real life query because the limiting values are > quite disparate. > > Kind Regards, > Keith > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >