Re: More efficient OR - Mailing list pgsql-sql
From | PFC |
---|---|
Subject | Re: More efficient OR |
Date | |
Msg-id | opsmaz1bxath1vuj@musicbox Whole thread Raw |
In response to | Re: More efficient OR ("Keith Worthington" <keithw@narrowpathinc.com>) |
List | pgsql-sql |
You sound like you don't like the performance you get with OR or IN, from this I deduce that you have a very large list of values to OR from. These make huge queries which are not necessarily very fast ; also they are un-preparable by their very nature (ie. the planner has to look at each value, ponder its stats, think about it...) Basically a query with, say, 'column IN (100 values)' will make postgres work a lot more than a query with 'column in (SELECT something which yields 100 values)'. I have tested the following with good results : - Write a very simple set returning function which takes an array as a parameter and RETURN NEXT each array element in turn. It's just a FOR... RETURN NEXT. Say you call it array_flatten( INTEGER[] ) or something. - Then, instead of doing SELECT * FROM table WHERE id IN (1,4,77,586,1025) do: SELECT * FROM table WHERE id IN (SELECT * FROM array_flatten( '{1,4,77,586,1025}' ) ); or : SELECT t.* FROM table t, (SELECT * FROM array_flatten( '{1,4,77,586,1025}' ) ) foo WHERE t.id=foo.id; The first one will do a uniqu'ing on the array, the second one will not. You can also LEFT JOIN against your SRF to get the id's of the rows that were not in the table (you cannot do this with IN) And you can PREPARE the statement to something that will take an array as a parameter and won't have to be parsed everytime. Sometimes it can be a big performance boost. Try it ! However, if some value in your array matches a lot of rows in the table, it will be slower than the seq scan which would have been triggered by the planner actually seeing that value in the IN list and acting on it. But if you KNOW your column is unique, there is no point in forcing the planner to ponder each value in your list ! For an additional performance boost (likely negligible), you could sort your array in the function (or even in your application code) to ease the work of the index scanner, which will get a better cache hit rate. If you have, say, 20.000 values to get, this is the only way. Note that you could ask yourself why you need to get a lot of values. Are you fetching stuff from the database, computing a list of rows to get, then SELECTing them ? Then maybe you put something in the application that should really be in the database ? As a side note, it would be nice : - if that set returning function was a fast C built-in (I think there's one in contrib/intagg but obviously it works only for integers) because it's a useful tool and building brick ; same for array_accum, and some other commonly used five-lines aggregates and functions that everybody recodes once. - if postgresql used this kind of optimization for the SELECT * FROM table WHERE id =ANY( array ) which currently uses a seq scan. However, what IS nice from pg is that you can actually do the array SRF trick and pull a lot of rows by id's, at blazing speed, just by writing a three line function and tweaking your wuery. > It sounds like IN will save some typing and code space but not decrease > the > execution time. > > BETWEEN won't work for my real life query because the limiting values are > quite disparate. > > Kind Regards, > Keith > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >