Thread: Optimizing queries

Optimizing queries

From
Patrice Beliveau
Date:
Hi,

I have a query that use a function and some column test to select row.
It's in the form of:

SELECT * FROM TABLE
   WHERE TABLE.COLUMN1=something
      AND TABLE.COLUMN2=somethingelse
      AND function(TABLE.COLUMN3,TABLE.COLUMN4) > 0;

The result of the function does NOT depend only from the table, but also
from some other tables.

Since it's long to process, I've add some output to see what's going on.
I find out that the function process every row even if the row should be
rejected as per the first or the second condition. Then , my question
is: Is there a way to formulate a query that wont do all the check if it
does not need to do it ? Meaning that, if condition1 is false then it
wont check condition2 and that way the function will only be called when
it's really necessary.

Thanks

Re: Optimizing queries

From
Scott Marlowe
Date:
On Tue, 2006-08-08 at 12:49, Patrice Beliveau wrote:
> Hi,
>
> I have a query that use a function and some column test to select row.
> It's in the form of:
>
> SELECT * FROM TABLE
>    WHERE TABLE.COLUMN1=something
>       AND TABLE.COLUMN2=somethingelse
>       AND function(TABLE.COLUMN3,TABLE.COLUMN4) > 0;
>
> The result of the function does NOT depend only from the table, but also
> from some other tables.
>
> Since it's long to process, I've add some output to see what's going on.
> I find out that the function process every row even if the row should be
> rejected as per the first or the second condition. Then , my question
> is: Is there a way to formulate a query that wont do all the check if it
> does not need to do it ? Meaning that, if condition1 is false then it
> wont check condition2 and that way the function will only be called when
> it's really necessary.

What version of postgresql are you running?  It might be better in later
versions.  The standard fix for such things is to use a subquery...

select * from (
   select * from table where
   col1='something'
   and col2='somethingelse'
) as a
where function(a.col3,a.col4) > 0;

Re: Optimizing queries

From
Patrice Beliveau
Date:
Scott Marlowe wrote:
> On Tue, 2006-08-08 at 12:49, Patrice Beliveau wrote:
>
>> Hi,
>>
>> I have a query that use a function and some column test to select row.
>> It's in the form of:
>>
>> SELECT * FROM TABLE
>>    WHERE TABLE.COLUMN1=something
>>       AND TABLE.COLUMN2=somethingelse
>>       AND function(TABLE.COLUMN3,TABLE.COLUMN4) > 0;
>>
>> The result of the function does NOT depend only from the table, but also
>> from some other tables.
>>
>> Since it's long to process, I've add some output to see what's going on.
>> I find out that the function process every row even if the row should be
>> rejected as per the first or the second condition. Then , my question
>> is: Is there a way to formulate a query that wont do all the check if it
>> does not need to do it ? Meaning that, if condition1 is false then it
>> wont check condition2 and that way the function will only be called when
>> it's really necessary.
>>
>
> What version of postgresql are you running?  It might be better in later
> versions.  The standard fix for such things is to use a subquery...
>
> select * from (
>    select * from table where
>    col1='something'
>    and col2='somethingelse'
> ) as a
> where function(a.col3,a.col4) > 0;
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>
>
>
Thanks for the answer, but it does not work, maybe I did something wrong

First, I'm using version 8.1.3

This is what I did:

select * from (
    select * from sales_order_delivery
        where sales_order_id in (
                select sales_order_id from sales_order
                where closed=false
        )
  ) as a where outstandingorder(sales_order_id, sales_order_item,
date_due) > 0;

Some output that I've create look like
INFO:  so:03616 soi:1 date:1993-12-23
INFO:  so:09614 soi:1 date:1998-06-04

which are the three arguments passed to the function "outstandingorder",
but sales_order 03616 and 09614 are closed.

What's wrong ??

Thanks


Re: Optimizing queries

From
Tom Lane
Date:
Patrice Beliveau <pbeliveau@avior.ca> writes:
>>> SELECT * FROM TABLE
>>> WHERE TABLE.COLUMN1=something
>>> AND TABLE.COLUMN2=somethingelse
>>> AND function(TABLE.COLUMN3,TABLE.COLUMN4) > 0;

> I find out that the function process every row even if the row should be
> rejected as per the first or the second condition.
> ... I'm using version 8.1.3

PG 8.1 will not reorder WHERE clauses for a single table unless it has
some specific reason to do so (and AFAICT no version back to 7.0 or so
has done so either...)  So there's something you are not telling us that
is relevant.  Let's see the exact table schema (psql \d output is good),
the exact query, and EXPLAIN output for that query.

            regards, tom lane

Re: Optimizing queries

From
Patrice Beliveau
Date:
Tom Lane wrote:
> Patrice Beliveau <pbeliveau@avior.ca> writes:
>
>>>> SELECT * FROM TABLE
>>>> WHERE TABLE.COLUMN1=something
>>>> AND TABLE.COLUMN2=somethingelse
>>>> AND function(TABLE.COLUMN3,TABLE.COLUMN4) > 0;
>>>>
>
>
>> I find out that the function process every row even if the row should be
>> rejected as per the first or the second condition.
>> ... I'm using version 8.1.3
>>
>
> PG 8.1 will not reorder WHERE clauses for a single table unless it has
> some specific reason to do so (and AFAICT no version back to 7.0 or so
> has done so either...)  So there's something you are not telling us that
> is relevant.  Let's see the exact table schema (psql \d output is good),
> the exact query, and EXPLAIN output for that query.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>
>
>
Hi,

here is my query, and the query plan that result

explain select * from (
   select * from sales_order_delivery
       where sales_order_id in (
               select sales_order_id from sales_order
               where closed=false
       )
 ) as a where outstandingorder(sales_order_id, sales_order_item,
date_due) > 0;


                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=498.89..8348.38 rows=34612 width=262)
   Hash Cond: (("outer".sales_order_id)::text =
("inner".sales_order_id)::text)
   ->  Seq Scan on sales_order_delivery  (cost=0.00..6465.03 rows=69223
width=262)
         Filter: (outstandingorder((sales_order_id)::text,
(sales_order_item)::text, date_due) > 0::double precision)
   ->  Hash  (cost=484.90..484.90 rows=5595 width=32)
         ->  Seq Scan on sales_order  (cost=0.00..484.90 rows=5595 width=32)
               Filter: (NOT closed)
(7 rows)


Re: Optimizing queries

From
Ruben Rubio
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

If subquerys are not working I think you should try to create a view
with the subquery.

Maybe it will work.

Patrice Beliveau wrote:
> Tom Lane wrote:
>> Patrice Beliveau <pbeliveau@avior.ca> writes:
>>
>>>>> SELECT * FROM TABLE
>>>>> WHERE TABLE.COLUMN1=something
>>>>> AND TABLE.COLUMN2=somethingelse
>>>>> AND function(TABLE.COLUMN3,TABLE.COLUMN4) > 0;
>>>>>
>>
>>
>>> I find out that the function process every row even if the row should
>>> be rejected as per the first or the second condition.
>>> ... I'm using version 8.1.3
>>>
>>
>> PG 8.1 will not reorder WHERE clauses for a single table unless it has
>> some specific reason to do so (and AFAICT no version back to 7.0 or so
>> has done so either...)  So there's something you are not telling us that
>> is relevant.  Let's see the exact table schema (psql \d output is good),
>> the exact query, and EXPLAIN output for that query.
>>
>>             regards, tom lane
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>        choose an index scan if your joining column's datatypes do not
>>        match
>>
>>
>>
> Hi,
>
> here is my query, and the query plan that result
>
> explain select * from (
>   select * from sales_order_delivery
>       where sales_order_id in (
>               select sales_order_id from sales_order
>               where closed=false
>       )
> ) as a where outstandingorder(sales_order_id, sales_order_item,
> date_due) > 0;
>
>
>                                                      QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
>
> Hash IN Join  (cost=498.89..8348.38 rows=34612 width=262)
>   Hash Cond: (("outer".sales_order_id)::text =
> ("inner".sales_order_id)::text)
>   ->  Seq Scan on sales_order_delivery  (cost=0.00..6465.03 rows=69223
> width=262)
>         Filter: (outstandingorder((sales_order_id)::text,
> (sales_order_item)::text, date_due) > 0::double precision)
>   ->  Hash  (cost=484.90..484.90 rows=5595 width=32)
>         ->  Seq Scan on sales_order  (cost=0.00..484.90 rows=5595 width=32)
>               Filter: (NOT closed)
> (7 rows)
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE2dMTIo1XmbAXRboRAhbIAJwJGZ+ITP0gl38A3qROrzIeNbTtUwCcDOIW
eZ9NJqjL+58gyMfO95jwZSw=
=4Zxj
-----END PGP SIGNATURE-----

Re: Optimizing queries

From
Patrice Beliveau
Date:
I've create a view, same query plan (some number vary a bit, but nothing
significant) and same result, closed sales_order are processed

Ruben Rubio wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> If subquerys are not working I think you should try to create a view
> with the subquery.
>
> Maybe it will work.
>
> Patrice Beliveau wrote:
>
>> Tom Lane wrote:
>>
>>> Patrice Beliveau <pbeliveau@avior.ca> writes:
>>>
>>>
>>>>>> SELECT * FROM TABLE
>>>>>> WHERE TABLE.COLUMN1=something
>>>>>> AND TABLE.COLUMN2=somethingelse
>>>>>> AND function(TABLE.COLUMN3,TABLE.COLUMN4) > 0;
>>>>>>
>>>>>>
>>>
>>>
>>>> I find out that the function process every row even if the row should
>>>> be rejected as per the first or the second condition.
>>>> ... I'm using version 8.1.3
>>>>
>>>>
>>> PG 8.1 will not reorder WHERE clauses for a single table unless it has
>>> some specific reason to do so (and AFAICT no version back to 7.0 or so
>>> has done so either...)  So there's something you are not telling us that
>>> is relevant.  Let's see the exact table schema (psql \d output is good),
>>> the exact query, and EXPLAIN output for that query.
>>>
>>>             regards, tom lane
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>>        choose an index scan if your joining column's datatypes do not
>>>        match
>>>
>>>
>>>
>>>
>> Hi,
>>
>> here is my query, and the query plan that result
>>
>> explain select * from (
>>   select * from sales_order_delivery
>>       where sales_order_id in (
>>               select sales_order_id from sales_order
>>               where closed=false
>>       )
>> ) as a where outstandingorder(sales_order_id, sales_order_item,
>> date_due) > 0;
>>
>>
>>                                                      QUERY PLAN
>>
----------------------------------------------------------------------------------------------------------------------
>>
>> Hash IN Join  (cost=498.89..8348.38 rows=34612 width=262)
>>   Hash Cond: (("outer".sales_order_id)::text =
>> ("inner".sales_order_id)::text)
>>   ->  Seq Scan on sales_order_delivery  (cost=0.00..6465.03 rows=69223
>> width=262)
>>         Filter: (outstandingorder((sales_order_id)::text,
>> (sales_order_item)::text, date_due) > 0::double precision)
>>   ->  Hash  (cost=484.90..484.90 rows=5595 width=32)
>>         ->  Seq Scan on sales_order  (cost=0.00..484.90 rows=5595 width=32)
>>               Filter: (NOT closed)
>> (7 rows)
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 3: Have you checked our extensive FAQ?
>>
>>               http://www.postgresql.org/docs/faq
>>
>>
>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFE2dMTIo1XmbAXRboRAhbIAJwJGZ+ITP0gl38A3qROrzIeNbTtUwCcDOIW
> eZ9NJqjL+58gyMfO95jwZSw=
> =4Zxj
> -----END PGP SIGNATURE-----
>
>
>


Re: Optimizing queries

From
Tom Lane
Date:
Patrice Beliveau <pbeliveau@avior.ca> writes:
> Tom Lane wrote:
>> PG 8.1 will not reorder WHERE clauses for a single table unless it has
>> some specific reason to do so (and AFAICT no version back to 7.0 or so
>> has done so either...)  So there's something you are not telling us that
>> is relevant.

> here is my query, and the query plan that result

> explain select * from (
>    select * from sales_order_delivery
>        where sales_order_id in (
>                select sales_order_id from sales_order
>                where closed=false
>        )
>  ) as a where outstandingorder(sales_order_id, sales_order_item,
> date_due) > 0;

So this isn't a simple query, but a join.  PG will generally push
single-table restrictions down to the individual tables in order to
reduce the number of rows that have to be processed at the join.
In this case that's not a win, but the planner doesn't know enough
about the outstandingorder() function to realize that.

I think what you need is an "optimization fence" to prevent the subquery
from being flattened:

explain select * from (
   select * from sales_order_delivery
       where sales_order_id in (
               select sales_order_id from sales_order
               where closed=false
       )
   OFFSET 0
 ) as a where outstandingorder(sales_order_id, sales_order_item,
date_due) > 0;

Any LIMIT or OFFSET in a subquery prevents WHERE conditions from being
pushed down past it (since that might change the results).  OFFSET 0 is
otherwise a no-op, so that's what people usually use.

            regards, tom lane

Re: Optimizing queries

From
Simon Riggs
Date:
On Tue, 2006-08-08 at 16:42 -0400, Tom Lane wrote:
> Patrice Beliveau <pbeliveau@avior.ca> writes:
> >>> SELECT * FROM TABLE
> >>> WHERE TABLE.COLUMN1=something
> >>> AND TABLE.COLUMN2=somethingelse
> >>> AND function(TABLE.COLUMN3,TABLE.COLUMN4) > 0;
>
> > I find out that the function process every row even if the row should be
> > rejected as per the first or the second condition.
> > ... I'm using version 8.1.3
>
> PG 8.1 will not reorder WHERE clauses for a single table unless it has
> some specific reason to do so (and AFAICT no version back to 7.0 or so
> has done so either...)  So there's something you are not telling us that
> is relevant.  Let's see the exact table schema (psql \d output is good),
> the exact query, and EXPLAIN output for that query.

Is WHERE clause re-ordering done for 8.2, or is that still a TODO item?
(Don't remember seeing that at all).

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com