Thread: Join the same row
Hi, I´m trying to optimize some selects between 2 tables and the best way I found was alter the first table and add the fields of the 2nd table. I adjusted the contents and now a have only one table with all info that I need. Now resides my problem, because of legacy queries I decided to make a Rule that replace the 2nd table. Until now all worked well, but I found when I make a join between de result table and de Rule, even tought is the same row in the same table, the optimizer generete two access for the same row: cta_pag is the table and ctapag_adm is the rule. CREATE OR REPLACE RULE "_RETURN" AS ON SELECT TO ctapag_adm DO INSTEAD SELECT cta_pag.nrlancto, cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai FROM cta_pag WHERE cta_pag.origem = 'A'::bpchar; This is one of the legacy queries: select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto and p.nrlancto = 21861; EXPLAIN: Nested Loop (cost=0.00..11.49 rows=1 width=443) (actual time=0.081..0.088 rows=1 loops=1) -> Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1 width=408) (actual time=0.044..0.046 rows=1 loops=1) Index Cond: (nrlancto = 21861::numeric) -> Index Scan using cta_pag_pk on cta_pag (cost=0.00..5.74 rows=1 width=35) (actual time=0.023..0.025 rows=1 loops=1) Index Cond: (21861::numeric = nrlancto) Filter: (origem = 'A'::bpchar) Total runtime: 0.341 ms Resulting in twice the time for accessing. Acessing just on time the same row: select * from cta_pag p where p.nrlancto = 21861 EXPLAIN: Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1 width=408) (actual time=0.044..0.047 rows=1 loops=1) Index Cond: (nrlancto = 21861::numeric) Total runtime: 0.161 ms Is there a way to force the optimizer to understand that is the same row? Thanks, Edison -- Edison Azzi <edisonazzi (at ) terra ( dot ) com ( dot ) br>
Edison Azzi wrote: > Hi, > > I´m trying to optimize some selects between 2 tables and the best way I > found was > alter the first table and add the fields of the 2nd table. I adjusted > the contents and > now a have only one table with all info that I need. Now resides my > problem, because > of legacy queries I decided to make a Rule that replace the 2nd table. > > Until now all worked well, but I found when I make a join between de result > table and de Rule, even tought is the same row in the same table, the > optimizer > generete two access for the same row: > cta_pag is the table and ctapag_adm is the rule. > > CREATE OR REPLACE RULE "_RETURN" AS > ON SELECT TO ctapag_adm DO INSTEAD SELECT cta_pag.nrlancto, > cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai > FROM cta_pag > WHERE cta_pag.origem = 'A'::bpchar; > > This is one of the legacy queries: > > select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto and > p.nrlancto = 21861; OK - and you get a self-join (which is what you asked for, but you'd like the planner to notice that it might not be necessary). > Resulting in twice the time for accessing. > > Acessing just on time the same row: > > select * from cta_pag p where p.nrlancto = 21861 This isn't the same query though. Your rule has an additional condition origem='A'. This means it wouldn't be correct to eliminate the self-join even if the planner could. > Is there a way to force the optimizer to understand that is the same > row? However, even if you removed the condition on origem, I don't think the planner will notice that it can eliminate the join. It's just too unusual a case for the planner to have a rule for it. I might be wrong about the planner - I'm just another user. One of the developers may correct me. -- Richard Huxton Archonet Ltd
Edison Azzi wrote: > Richard Huxton escreveu: >> However, even if you removed the condition on origem, I don't think >> the planner will notice that it can eliminate the join. It's just too >> unusual a case for the planner to have a rule for it. >> >> I might be wrong about the planner - I'm just another user. One of the >> developers may correct me. > > > You are rigth, the planner will not eliminate the join, see: > > select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and > p.nrlancto = 21861; > > EXPLAIN: > Nested Loop (cost=0.00..11.48 rows=1 width=816) > -> Index Scan using cta_pag_pk on cta_pag a (cost=0.00..5.74 rows=1 > width=408) > Index Cond: (21861::numeric = nrlancto) > -> Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1 > width=408) > Index Cond: (nrlancto = 21861::numeric) > > > I know that this is too unusual case, but I hoped that the planner could > deal > with this condition. I´m trying to speed up without have to rewrite a > bunch of > queries. Now I'll have to think another way to work around this issue. Is the performance really so bad? All the data is guaranteed to be cached for the second index-scan. -- Richard Huxton Archonet Ltd
Richard Huxton escreveu: > Edison Azzi wrote: > >> Hi, >> >> I´m trying to optimize some selects between 2 tables and the best way >> I found was >> alter the first table and add the fields of the 2nd table. I adjusted >> the contents and >> now a have only one table with all info that I need. Now resides my >> problem, because >> of legacy queries I decided to make a Rule that replace the 2nd table. >> >> Until now all worked well, but I found when I make a join between de >> result >> table and de Rule, even tought is the same row in the same table, the >> optimizer >> generete two access for the same row: >> cta_pag is the table and ctapag_adm is the rule. >> >> CREATE OR REPLACE RULE "_RETURN" AS >> ON SELECT TO ctapag_adm DO INSTEAD SELECT cta_pag.nrlancto, >> cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai >> FROM cta_pag >> WHERE cta_pag.origem = 'A'::bpchar; >> >> This is one of the legacy queries: >> >> select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto >> and p.nrlancto = 21861; > > > OK - and you get a self-join (which is what you asked for, but you'd > like the planner to notice that it might not be necessary). > >> Resulting in twice the time for accessing. >> >> Acessing just on time the same row: >> >> select * from cta_pag p where p.nrlancto = 21861 > > > This isn't the same query though. Your rule has an additional > condition origem='A'. This means it wouldn't be correct to eliminate > the self-join even if the planner could. > >> Is there a way to force the optimizer to understand that is the >> same row? > > > However, even if you removed the condition on origem, I don't think > the planner will notice that it can eliminate the join. It's just too > unusual a case for the planner to have a rule for it. > > I might be wrong about the planner - I'm just another user. One of the > developers may correct me. You are rigth, the planner will not eliminate the join, see: select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and p.nrlancto = 21861; EXPLAIN: Nested Loop (cost=0.00..11.48 rows=1 width=816) -> Index Scan using cta_pag_pk on cta_pag a (cost=0.00..5.74 rows=1 width=408) Index Cond: (21861::numeric = nrlancto) -> Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1 width=408) Index Cond: (nrlancto = 21861::numeric) I know that this is too unusual case, but I hoped that the planner could deal with this condition. I´m trying to speed up without have to rewrite a bunch of queries. Now I'll have to think another way to work around this issue. Thanks, Edison.
Edison Azzi <edisonazzi@terra.com.br> writes: > You are rigth, the planner will not eliminate the join, see: > select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and > p.nrlancto = 21861; > EXPLAIN: > Nested Loop (cost=0.00..11.48 rows=1 width=816) > -> Index Scan using cta_pag_pk on cta_pag a (cost=0.00..5.74 rows=1 > width=408) > Index Cond: (21861::numeric = nrlancto) > -> Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1 > width=408) > Index Cond: (nrlancto = 21861::numeric) But do you care? That second fetch of the same row isn't going to cost much of anything, since everything it needs to touch will have been sucked into cache already. I don't really see the case for adding logic to the planner to detect this particular flavor of badly-written query. Notice that the planner *is* managing to propagate the constant comparison to both relations. regards, tom lane