Re: Join the same row - Mailing list pgsql-performance

From Edison Azzi
Subject Re: Join the same row
Date
Msg-id 43972DB8.90400@terra.com.br
Whole thread Raw
In response to Re: Join the same row  (Richard Huxton <dev@archonet.com>)
Responses Re: Join the same row
List pgsql-performance
Richard Huxton escreveu:

> Edison Azzi wrote:
>
>> Hi,
>>
>> I´m trying to optimize some selects between 2 tables and the best way
>> I found was
>> alter the first table and add the fields of the 2nd table. I adjusted
>> the contents and
>> now a have only one table with all info that I need. Now resides my
>> problem, because
>> of legacy queries I decided to make a Rule that replace the 2nd table.
>>
>> Until now all worked well, but I found when I make a join between de
>> result
>> table and de Rule, even tought is the same row in the same table, the
>> optimizer
>> generete two access for the same row:
>> cta_pag is the table and ctapag_adm is the  rule.
>>
>> CREATE OR REPLACE RULE "_RETURN" AS
>>    ON SELECT TO ctapag_adm DO INSTEAD  SELECT cta_pag.nrlancto,
>> cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai
>>   FROM cta_pag
>>  WHERE cta_pag.origem = 'A'::bpchar;
>>
>> This is one of the legacy queries:
>>
>> select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto
>> and p.nrlancto = 21861;
>
>
> OK - and you get a self-join (which is what you asked for, but you'd
> like the planner to notice that it might not be necessary).
>
>>    Resulting in twice the time for accessing.
>>
>> Acessing just on time the same row:
>>
>> select * from cta_pag p where  p.nrlancto = 21861
>
>
> This isn't the same query though. Your rule has an additional
> condition origem='A'. This means it wouldn't be correct to eliminate
> the self-join even if the planner could.
>
>>    Is there a way to force the optimizer to understand that is the
>> same row?
>
>
> However, even if you removed the condition on origem, I don't think
> the planner will notice that it can eliminate the join. It's just too
> unusual a case for the planner to have a rule for it.
>
> I might be wrong about the planner - I'm just another user. One of the
> developers may correct me.


You are rigth, the planner will not eliminate the join, see:

select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and
p.nrlancto = 21861;

EXPLAIN:
Nested Loop  (cost=0.00..11.48 rows=1 width=816)
  ->  Index Scan using cta_pag_pk on cta_pag a  (cost=0.00..5.74 rows=1
width=408)
        Index Cond: (21861::numeric = nrlancto)
  ->  Index Scan using cta_pag_pk on cta_pag p  (cost=0.00..5.74 rows=1
width=408)
        Index Cond: (nrlancto = 21861::numeric)


I know that this is too unusual case, but I hoped that the planner could
deal
with this condition. I´m trying to speed up without have to rewrite a
bunch of
queries. Now I'll have to think another way to work around this issue.

Thanks,

    Edison.




pgsql-performance by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Re: High context switches occurring
Next
From: Pål Stenslet
Date:
Subject: Should Oracle outperform PostgreSQL on a complex multidimensional query?