Thread: Query Optimizer Failure / Possible Bug

Query Optimizer Failure / Possible Bug

From
Hannes Dorbath
Date:
The query and the corresponding EXPLAIN is at

http://hannes.imos.net/query.txt

I'd like to use the column q.replaced_serials for multiple calculations
in the SELECT clause, but every time it is referenced there in some way
the whole query in the FROM clause returning q is executed again.

This doesn't make sense to me at all and eats performance.

If this wasn't clear enough, for every

q.replaced_serials <insert_random_calculation> AS some_column

in the SELECT clause there is new block of

---------------------------------------------------------------
->  Aggregate  (cost=884.23..884.23 rows=1 width=0)
       ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
             ->  Index Scan using ix_rma_ticket_serials_replace on

                         rma_ticket_serials rts  (cost=0.00..122.35
                         rows=190 width=4)
                   Index Cond: ("replace" = false)
             ->  Index Scan using pk_serials on serials s
                           (cost=0.00..3.51 rows=1 width=4)
                   Index Cond: (s.serial_id = "outer".serial_id)
                   Filter: ((article_no = $0) AND (delivery_id = $1))
---------------------------------------------------------------

in the EXPLAIN result.

For those who wonder why I do this FROM (SELECT...). I was searching for
a way to use the result of an subselect for multiple calculations in the
SELECT clause and return that calculation results as individual columns.

I tested a bit further and found out that PG behaves the same in case q
is a view. This makes me wonder how efficient the optimizer can work
with views - or even worse - nested views.

Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.


Thanks in advance,
Hannes Dorbath

Re: Query Optimizer Failure / Possible Bug

From
Josh Berkus
Date:
Hannes,

> The query and the corresponding EXPLAIN is at
>
> http://hannes.imos.net/query.txt

The problem is that you're using a complex corellated sub-select in the SELECT
clause:

    SELECT
      d.delivery_id,
      da.article_no,
      da.amount,
      (
          SELECT
            COUNT(*)
          FROM
            serials s
            INNER JOIN rma_ticket_serials rts ON (
                s.serial_id = rts.serial_id
            )
          WHERE
            s.article_no  = da.article_no AND
            s.delivery_id = d.delivery_id AND
            rts.replace   = FALSE
      ) AS replaced_serials

This means that the planner pretty much has to iterate over the subquery,
running it once for each row in the result set.   If you want the optimizer
to use a JOIN structure instead, put the subselect in the FROM clause.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: Query Optimizer Failure / Possible Bug

From
Hannes Dorbath
Date:
Thank you very much for your reply. I'll try to modify it.


Josh Berkus wrote:
> Hannes,
>
>
>>The query and the corresponding EXPLAIN is at
>>
>>http://hannes.imos.net/query.txt
>
>
> The problem is that you're using a complex corellated sub-select in the SELECT
> clause:
>
>     SELECT
>       d.delivery_id,
>       da.article_no,
>       da.amount,
>       (
>           SELECT
>             COUNT(*)
>           FROM
>             serials s
>             INNER JOIN rma_ticket_serials rts ON (
>                 s.serial_id = rts.serial_id
>             )
>           WHERE
>             s.article_no  = da.article_no AND
>             s.delivery_id = d.delivery_id AND
>             rts.replace   = FALSE
>       ) AS replaced_serials
>
> This means that the planner pretty much has to iterate over the subquery,
> running it once for each row in the result set.   If you want the optimizer
> to use a JOIN structure instead, put the subselect in the FROM clause.
>

--
imos  Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen  // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net

Re: Query Optimizer Failure / Possible Bug

From
Hannes Dorbath
Date:
hm, a few days and not a single reply :|

any more information needed? test data? simplified test case? anything?


thanks


Hannes Dorbath wrote:
> The query and the corresponding EXPLAIN is at
>
> http://hannes.imos.net/query.txt
>
> I'd like to use the column q.replaced_serials for multiple calculations
> in the SELECT clause, but every time it is referenced there in some way
> the whole query in the FROM clause returning q is executed again.
>
> This doesn't make sense to me at all and eats performance.
>
> If this wasn't clear enough, for every
>
> q.replaced_serials <insert_random_calculation> AS some_column
>
> in the SELECT clause there is new block of
>
> ---------------------------------------------------------------
> ->  Aggregate  (cost=884.23..884.23 rows=1 width=0)
>       ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
>             ->  Index Scan using ix_rma_ticket_serials_replace on
>                         rma_ticket_serials rts  (cost=0.00..122.35
>                         rows=190 width=4)
>                   Index Cond: ("replace" = false)
>             ->  Index Scan using pk_serials on serials s
>                           (cost=0.00..3.51 rows=1 width=4)
>                   Index Cond: (s.serial_id = "outer".serial_id)
>                   Filter: ((article_no = $0) AND (delivery_id = $1))
> ---------------------------------------------------------------
>
> in the EXPLAIN result.
>
> For those who wonder why I do this FROM (SELECT...). I was searching for
> a way to use the result of an subselect for multiple calculations in the
> SELECT clause and return that calculation results as individual columns.
>
> I tested a bit further and found out that PG behaves the same in case q
> is a view. This makes me wonder how efficient the optimizer can work
> with views - or even worse - nested views.
>
> Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.
>
>
> Thanks in advance,
> Hannes Dorbath

--
imos  Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen  // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net

Re: Query Optimizer Failure / Possible Bug

From
PFC
Date:
    Noticed this problem,too.
    You can always make the calculation you want done once inside a set
returning function so it'll behave like a table, but that's ugly.

On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath
<> wrote:

> hm, a few days and not a single reply :|
>
> any more information needed? test data? simplified test case? anything?
>
>
> thanks
>
>
> Hannes Dorbath wrote:
>> The query and the corresponding EXPLAIN is at
>>  http://hannes.imos.net/query.txt
>>  I'd like to use the column q.replaced_serials for multiple calculations
>> in the SELECT clause, but every time it is referenced there in some way
>> the whole query in the FROM clause returning q is executed again.
>>  This doesn't make sense to me at all and eats performance.
>>  If this wasn't clear enough, for every
>>  q.replaced_serials <insert_random_calculation> AS some_column
>>  in the SELECT clause there is new block of
>>  ---------------------------------------------------------------
>> ->  Aggregate  (cost=884.23..884.23 rows=1 width=0)
>>       ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
>>             ->  Index Scan using ix_rma_ticket_serials_replace on
>>                         rma_ticket_serials rts  (cost=0.00..122.35
>>                         rows=190 width=4)
>>                   Index Cond: ("replace" = false)
>>             ->  Index Scan using pk_serials on serials s
>>                           (cost=0.00..3.51 rows=1 width=4)
>>                   Index Cond: (s.serial_id = "outer".serial_id)
>>                   Filter: ((article_no = $0) AND (delivery_id = $1))
>> ---------------------------------------------------------------
>>  in the EXPLAIN result.
>>  For those who wonder why I do this FROM (SELECT...). I was searching
>> for
>> a way to use the result of an subselect for multiple calculations in the
>> SELECT clause and return that calculation results as individual columns.
>>  I tested a bit further and found out that PG behaves the same in case q
>> is a view. This makes me wonder how efficient the optimizer can work
>> with views - or even worse - nested views.
>>  Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.
>>   Thanks in advance,
>> Hannes Dorbath
>



Re: Query Optimizer Failure / Possible Bug

From
Hannes Dorbath
Date:
Mhh. I have no clue about the internals of PostgreSQL and query planing,
but to me as user this should really be a thing the optimizer has to
work out..


On 03.04.2005 10:01, PFC wrote:
>
>     Noticed this problem,too.
>     You can always make the calculation you want done once inside a set
> returning function so it'll behave like a table, but that's ugly.
>
> On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath
> <> wrote:
>
>> hm, a few days and not a single reply :|
>>
>> any more information needed? test data? simplified test case? anything?
>>
>>
>> thanks
>>
>>
>> Hannes Dorbath wrote:
>>
>>> The query and the corresponding EXPLAIN is at
>>>  http://hannes.imos.net/query.txt
>>>  I'd like to use the column q.replaced_serials for multiple calculations
>>> in the SELECT clause, but every time it is referenced there in some way
>>> the whole query in the FROM clause returning q is executed again.
>>>  This doesn't make sense to me at all and eats performance.
>>>  If this wasn't clear enough, for every
>>>  q.replaced_serials <insert_random_calculation> AS some_column
>>>  in the SELECT clause there is new block of
>>>  ---------------------------------------------------------------
>>> ->  Aggregate  (cost=884.23..884.23 rows=1 width=0)
>>>       ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
>>>             ->  Index Scan using ix_rma_ticket_serials_replace on
>>>                         rma_ticket_serials rts  (cost=0.00..122.35
>>>                         rows=190 width=4)
>>>                   Index Cond: ("replace" = false)
>>>             ->  Index Scan using pk_serials on serials s
>>>                           (cost=0.00..3.51 rows=1 width=4)
>>>                   Index Cond: (s.serial_id = "outer".serial_id)
>>>                   Filter: ((article_no = $0) AND (delivery_id = $1))
>>> ---------------------------------------------------------------
>>>  in the EXPLAIN result.
>>>  For those who wonder why I do this FROM (SELECT...). I was
>>> searching  for
>>> a way to use the result of an subselect for multiple calculations in the
>>> SELECT clause and return that calculation results as individual columns.
>>>  I tested a bit further and found out that PG behaves the same in case q
>>> is a view. This makes me wonder how efficient the optimizer can work
>>> with views - or even worse - nested views.
>>>  Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.
>>>   Thanks in advance,
>>> Hannes Dorbath
>>
>>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>

Re: Query Optimizer Failure / Possible Bug

From
Hannes Dorbath
Date:
Some people on the #postgresql irc channel pointed out that it's a known
issue.

http://www.qaix.com/postgresql-database-development/246-557-select-based-on-function-result-read.shtml

A more simple testcase is below. Adding OFFSET 0 to the inner query does
indeed fix it in my case.


SELECT
   tmp.user_id AS foo,
   tmp.user_id AS bar,
   tmp.user_id AS baz
FROM
   (
     SELECT
       u.user_id
     FROM
       users u
   ) AS tmp;



Seq Scan on users  (cost=0.00..1.53 rows=53 width=4) (actual
time=0.230..0.233 rows=1 loops=1)
Total runtime: 0.272 ms


---------------------------


SELECT
   tmp.user_id AS foo,
   tmp.user_id AS bar,
   tmp.user_id AS baz
FROM
   (
     SELECT
       (SELECT 1) AS user_id
     FROM
       users u
   ) AS tmp;



   Seq Scan on users u  (cost=0.03..1.56 rows=53 width=0) (actual
time=0.216..0.219 rows=1 loops=1)
   InitPlan
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.004..0.006 rows=1 loops=1)
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.002..0.004 rows=1 loops=1)
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.002..0.003 rows=1 loops=1)
Total runtime: 0.270 ms


---------------------------


SELECT
   tmp.user_id AS foo,
   tmp.user_id AS bar,
   tmp.user_id AS baz
FROM
   (
     SELECT
       (SELECT 1) AS user_id
     FROM
       users u
     OFFSET 0
   ) AS tmp;


Subquery Scan tmp  (cost=0.01..1.03 rows=1 width=4) (actual
time=0.032..0.042 rows=1 loops=1)
   ->  Limit  (cost=0.01..1.02 rows=1 width=0) (actual time=0.026..0.033
rows=1 loops=1)
         InitPlan
           ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual
time=0.003..0.004 rows=1 loops=1)
         ->  Seq Scan on users u  (cost=0.00..1.01 rows=1 width=0)
(actual time=0.022..0.027 rows=1 loops=1)
Total runtime: 0.090 ms




On 04.04.2005 17:18, Hannes Dorbath wrote:
> Mhh. I have no clue about the internals of PostgreSQL and query planing,
> but to me as user this should really be a thing the optimizer has to
> work out..
>
>
> On 03.04.2005 10:01, PFC wrote:
>
>>
>>     Noticed this problem,too.
>>     You can always make the calculation you want done once inside a
>> set  returning function so it'll behave like a table, but that's ugly.
>>
>> On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath
>> <> wrote:
>>
>>> hm, a few days and not a single reply :|
>>>
>>> any more information needed? test data? simplified test case? anything?
>>>
>>>
>>> thanks
>>>
>>>
>>> Hannes Dorbath wrote:
>>>
>>>> The query and the corresponding EXPLAIN is at
>>>>  http://hannes.imos.net/query.txt
>>>>  I'd like to use the column q.replaced_serials for multiple
>>>> calculations
>>>> in the SELECT clause, but every time it is referenced there in some way
>>>> the whole query in the FROM clause returning q is executed again.
>>>>  This doesn't make sense to me at all and eats performance.
>>>>  If this wasn't clear enough, for every
>>>>  q.replaced_serials <insert_random_calculation> AS some_column
>>>>  in the SELECT clause there is new block of
>>>>  ---------------------------------------------------------------
>>>> ->  Aggregate  (cost=884.23..884.23 rows=1 width=0)
>>>>       ->  Nested Loop  (cost=0.00..884.23 rows=1 width=0)
>>>>             ->  Index Scan using ix_rma_ticket_serials_replace on
>>>>                         rma_ticket_serials rts  (cost=0.00..122.35
>>>>                         rows=190 width=4)
>>>>                   Index Cond: ("replace" = false)
>>>>             ->  Index Scan using pk_serials on serials s
>>>>                           (cost=0.00..3.51 rows=1 width=4)
>>>>                   Index Cond: (s.serial_id = "outer".serial_id)
>>>>                   Filter: ((article_no = $0) AND (delivery_id = $1))
>>>> ---------------------------------------------------------------
>>>>  in the EXPLAIN result.
>>>>  For those who wonder why I do this FROM (SELECT...). I was
>>>> searching  for
>>>> a way to use the result of an subselect for multiple calculations in
>>>> the
>>>> SELECT clause and return that calculation results as individual
>>>> columns.
>>>>  I tested a bit further and found out that PG behaves the same in
>>>> case q
>>>> is a view. This makes me wonder how efficient the optimizer can work
>>>> with views - or even worse - nested views.
>>>>  Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.
>>>>   Thanks in advance,
>>>> Hannes Dorbath
>>>
>>>
>>>
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: Have you checked our extensive FAQ?
>>
>>               http://www.postgresql.org/docs/faq
>>