Re: full join in view - Mailing list pgsql-sql

From Tambet Matiisen
Subject Re: full join in view
Date
Msg-id 81132473206F3A46A72BD6116E1A06AE1B14C2@black.aprote.com
Whole thread Raw
In response to full join in view  ("Tambet Matiisen" <t.matiisen@aprote.ee>)
Responses Re: full join in view  (jasiek@klaster.net)
List pgsql-sql
First I would like to say, that I'm quite amazed. You even guessed table names right! :) I did not expect such an
in-depthanalysis in such a short time. Thanks, Tomasz! 

> -----Original Message-----
> From: Tomasz Myrta [mailto:jasiek@klaster.net]
> Sent: Tuesday, January 14, 2003 11:51 AM
> To: Tambet Matiisen
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] full join in view
>
>
> Tambet Matiisen wrote:
>
> > You are right. After disabling seq_scan, it uses indexes
> just as you
> > described. Unfortunately my view happens to use subquery:
>
> Don't disable seq_scan - sometimes it is better than indexscan.

I just did it for testing and for one session. I never disable it on production server.

...
>
> There is one more problem in your query - coalesce, which possibly
> disables any indexing in your view. Try to rewrite your view
> - subquery
> shouldn't return dor_kst_id and dor_mat_id null.
>
Coalesce did not pose any problems. Unless I tried to filter using one of the coalesce-fields, which does not use
indecesof course. 

> Is dor_kst_id the same as kdt_kst_id and as mat_id? After
> some database
> practicing I found, that using the same name in all tables is
> much more
> comfortably
>
This way I can refer most columns without prefixing them with table alias. But it's anyway good habit to use table
aliases,so this is not that important. I think in next project I try it in your way. 

> For each material (materjalid) and koostud (koostud) you want to find
> some current value (koostude_detaild) and compare it to some sum
> (documentid...)?
> I'm not sure if I understand well your view, but here is my
> version of
> this view - without subquery:
>

I tried to save few scans by not including "koostud" and "materjalid" in my original query. Based on yours, I created a
newversion: 

CREATE OR REPLACE VIEW v_tegelikud_kulud2 AS
SELECT   kst.kst_id,   mat.mat_id,   max(kdt.detaili_nr) AS detaili_nr,   max(kdt.arv) AS arv,   max(kdt.kulu) AS kulu,
 max(kdt.yhik) AS yhik,   max(kdt.koefitsent) AS koefitsent,   max(kdt.eeldatav_hind) AS eeldatav_hind,   sum(dor.kogus
*dor.koefitsent::numeric) AS kogus,   sum(dor.kokku) AS kokku 
FROM koostud kst
CROSS JOIN materjalid mat
LEFT JOIN koostude_detailid kdt   ON kst.kst_id = kdt.kdt_kst_id AND mat.mat_id = kdt.kdt_mat_id
LEFT JOIN dokumentide_read dor   ON kst.kst_id = dor.dor_kst_id AND mat.mat_id = dor.dor_mat_id AND EXISTS       (
SELECT 1       FROM dokumendid dok        WHERE dor.dor_dok_id = dok.dok_id AND dok.tyyp = 30 AND dok.kinnitaja IS NOT
NULL      ) 
WHERE kdt.kdt_id IS NOT NULL AND dor.dor_id IS NOT NULL
GROUP BY kst.kst_id, mat.mat_id;

I do a cross join between "koostud" and "materjalid", because "FROM koostud kst, materjalid mat" gave me syntax errors.
AlsoI had to move "dokumendid" table to EXISTS subquery, to get equivalent results with the original query. Just LEFT
JOIN-ingit is not enough and subquery troubled optimizer. There is also a WHERE condition to show only those materials,
thatappear in one of the tables "koostude_detailid" or "dokumentide_read". Here is the execution plan: 

explain select count(1) from v_tegelikud_kulud2 where kst_id = 1125;
    QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=92.60..92.60 rows=1 width=107)  ->  Subquery Scan v_tegelikud_kulud2  (cost=69.58..92.58 rows=8 width=107)
-> Aggregate  (cost=69.58..92.58 rows=8 width=107)              ->  Group  (cost=69.58..91.00 rows=79 width=107)
           ->  Merge Join  (cost=69.58..90.61 rows=79 width=107)                          Merge Cond: (("outer".kst_id
="inner".dor_kst_id) AND ("outer".mat_id = "inner".dor_mat_id))                          Join Filter: (subplan)
                Filter: ("inner".dor_id IS NOT NULL)                          ->  Merge Join  (cost=17.54..18.52
rows=79width=66)                                Merge Cond: (("outer".kst_id = "inner".kdt_kst_id) AND ("outer".mat_id
="inner".kdt_mat_id))                                Filter: ("inner".kdt_id IS NOT NULL)
->  Sort  (cost=10.56..10.76 rows=79 width=8)                                      Sort Key: kst.kst_id, mat.mat_id
                                ->  Nested Loop  (cost=0.00..8.07 rows=79 width=8)
     ->  Index Scan using kst_pk on koostud kst  (cost=0.00..4.49 rows=1 width=4)
          Index Cond: (kst_id = 1125)                                            ->  Seq Scan on materjalid mat
(cost=0.00..2.79rows=79 width=4)                                ->  Sort  (cost=6.98..7.27 rows=113 width=58)
                          Sort Key: kdt.kdt_kst_id, kdt.kdt_mat_id                                      ->  Seq Scan on
koostude_detailidkdt  (cost=0.00..3.13 rows=113 width=58)                          ->  Sort  (cost=52.03..53.93
rows=761width=41)                                Sort Key: dor.dor_kst_id, dor.dor_mat_id
->  Seq Scan on dokumentide_read dor  (cost=0.00..15.61 rows=761 width=41)                          SubPlan
              ->  Index Scan using dok_pk on dokumendid dok  (cost=0.00..3.47 rows=1 width=0)
      Index Cond: ($0 = dok_id)                                  Filter: ((tyyp = 30) AND (kinnitaja IS NOT NULL)) 
(27 rows)

But there are still few things that worry me:
1. Cross join between koostud and materjalid. Table "materjalid" may have up to 10000 rows and only 20-30 of them are
actuallyneeded. 
2. Indeces on "koostude_detailid" and "dokumentide_read" are not used. Probably my tables do not contain enough rows.
MaybeI should generate more test data first. 
3. The cost of this query is twice as big, as my original query. It seems to me, that SubPlan is causing this. I tried
tomove it to subquery, but then the optimizer chose a totally different execution plan and seemingly was not able to
useindeces of "dokumentide_read" table. The version with subquery: 

CREATE OR REPLACE VIEW v_tegelikud_kulud3 AS
SELECT   kst.kst_id,   mat.mat_id,   max(kdt.detaili_nr) AS detaili_nr,   max(kdt.arv) AS arv,   max(kdt.kulu) AS kulu,
 max(kdt.yhik) AS yhik,   max(kdt.koefitsent) AS koefitsent,   max(kdt.eeldatav_hind) AS eeldatav_hind,   sum(dor.kogus
*dor.koefitsent::numeric) AS kogus,   sum(dor.kokku) AS kokku 
FROM koostud kst
CROSS JOIN materjalid mat
LEFT JOIN koostude_detailid kdt   ON kst.kst_id = kdt.kdt_kst_id AND mat.mat_id = kdt.kdt_mat_id
LEFT JOIN    (   SELECT dor.*   FROM dokumentide_read dor   JOIN dokumendid dok ON dor.dor_dok_id = dok.dok_id    WHERE
dok.tyyp= 30 AND dok.kinnitaja IS NOT NULL   ) dor   ON kst.kst_id = dor.dor_kst_id AND mat.mat_id = dor.dor_mat_id 
WHERE kdt.kdt_id IS NOT NULL AND dor.dor_id IS NOT NULL
GROUP BY kst.kst_id, mat.mat_id;

And execution plan:

hekotek=# explain select count(1) from v_tegelikud_kulud3 where kst_id = 1125;
                  QUERY PLAN 

---------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=56.52..56.52 rows=1 width=111)  ->  Subquery Scan v_tegelikud_kulud3  (cost=53.31..56.50 rows=8 width=111)
-> Aggregate  (cost=53.31..56.50 rows=8 width=111)              ->  Group  (cost=53.31..54.92 rows=79 width=111)
           ->  Merge Join  (cost=53.31..54.53 rows=79 width=111)                          Merge Cond: (("outer".kst_id
="inner".dor_kst_id) AND ("outer".mat_id = "inner".dor_mat_id))                          Filter: ("inner".dor_id IS NOT
NULL)                         ->  Merge Join  (cost=17.53..18.50 rows=79 width=66)                                Merge
Cond:(("outer".kst_id = "inner".kdt_kst_id) AND ("outer".mat_id = "inner".kdt_mat_id))
Filter:("inner".kdt_id IS NOT NULL)                                ->  Sort  (cost=10.54..10.74 rows=79 width=8)
                             Sort Key: kst.kst_id, mat.mat_id                                      ->  Nested Loop
(cost=0.00..8.05rows=79 width=8)                                            ->  Index Scan using kst_pk on koostud kst
(cost=0.00..4.47rows=1 width=4)                                                  Index Cond: (kst_id = 1125)
                               ->  Seq Scan on materjalid mat  (cost=0.00..2.79 rows=79 width=4)
       ->  Sort  (cost=6.98..7.27 rows=113 width=58)                                      Sort Key: kdt.kdt_kst_id,
kdt.kdt_mat_id                                     ->  Seq Scan on koostude_detailid kdt  (cost=0.00..3.13 rows=113
width=58)                         ->  Sort  (cost=35.78..35.99 rows=84 width=45)                                Sort
Key:dor.dor_kst_id, dor.dor_mat_id                                ->  Merge Join  (cost=9.04..33.09 rows=84 width=45)
                                  Merge Cond: ("outer".dor_dok_id = "inner".dok_id)
->  Index Scan using dor_dok_fk_i on dokumentide_read dor  (cost=0.00..20.91 rows=761 width=41)
            ->  Sort  (cost=9.04..9.13 rows=36 width=4)                                            Sort Key: dok.dok_id
                                          ->  Seq Scan on dokumendid dok  (cost=0.00..8.10 rows=36 width=4)
                                    Filter: ((tyyp = 30) AND (kinnitaja IS NOT NULL)) 
(28 rows)

Uh.. Mails are getting very lengthy.
 Tambet


pgsql-sql by date:

Previous
From: "Pedro Igor"
Date:
Subject: Data between different databases
Next
From: "mdc@keko.com.ar"
Date:
Subject: database broken ?