Thread: To use a VIEW or not to use a View.....

To use a VIEW or not to use a View.....

From

"Ries van Twisk"

Date:

22 January 2003, 10:13:11

Dear PostgreSQL users,

I have a view and a table,

I understand that when a frontend accesses a VIEW that PostgreSQL cannot use
a index on that view.
For example when I do this: SELECT * FROM full_cablelist WHERE
projectocode=5; Correct?

Now I just want to make sure for myself if the VIEW I created is the right
way to go, or is it better
to contruct a SQL in my application that looks like the view and send it to
postgreSQL so it will
use all indexes correctly. I use postgreSQL 7.2.1

I beliefe there is a change in postgreSQL 7.3.x on which I can cache a
view??? Not sure what the issue was.

I ask this because I expect pore performance in feature when the cablelist
table holds up to around 20.000 rows.
Each query to full_cablelist will return around 1200 rows.

best regards,
Ries van Twisk



-- CABLE LIST
CREATE TABLE cablelist (   id             SERIAL,   cableno        VARCHAR(8),   projectcodeid    INTEGER CONSTRAINT
cablelist_projectcodes_conNOT NULL
 
REFERENCES projectcodes(id) ON DELETE CASCADE,   fromconnid        INTEGER CONSTRAINT cablelist_fromconnid_con NOT
NULL
REFERENCES libconnections(id) ON DELETE CASCADE,   toconnid        INTEGER CONSTRAINT cablelist_toconnid_con NOT NULL
REFERENCES
libconnections(id) ON DELETE CASCADE,   fromshiplocationid    INTEGER CONSTRAINT cablelist_fromshiplocationid_con
NOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE,   toshiplocationid    INTEGER CONSTRAINT
cablelist_toshiplocationid_conNOT
 
NULL REFERENCES shiplocations(id) ON DELETE CASCADE,   marktypesid        INTEGER CONSTRAINT cablelist_tomarktypeid_con
NOTNULL
 
REFERENCES marktypes(id) ON DELETE CASCADE,   cabletypeid        INTEGER CONSTRAINT cablelist_cabletypeid_con NOT NULL
REFERENCES cabletypes(id) ON DELETE CASCADE,   cut            BOOLEAN DEFAULT 'false' NOT NULL,   placed        BOOLEAN
DEFAULT'false' NOT NULL,   ok            BOOLEAN DEFAULT 'false'
 
);


-- CABLE LIST VIEW
CREATE VIEW full_cablelist AS
SELECT cl.id,
cl.cableno AS cableno,
pc.projectcode AS projectcode,
pc.id AS projectcodeid,
lcf.name AS fconnection, lct.name AS tconnection,
lif.name AS fitem, lit.name AS titem,
slf.rib AS frib,slt.rib AS trib,
slf.name AS fname, slt.name AS tname,
ct.cabletype AS cabletype, ct.coretype AS coretype,
cl.cut,
cl.placed,
cl.ok

FROM cablelist AS cl,
libconnections AS lcf, libconnections AS lct,
libitems AS lif, libitems AS lit,
shiplocations AS slf, shiplocations AS slt,
projectcodes AS pc,
cabletypes AS ct

WHERE
pc.id=cl.projectcodeid AND
lcf.id=cl.fromconnid AND lct.id=cl.toconnid AND
lif.id=lcf.libitemid AND lit.id=lct.libitemid AND
slf.id=cl.fromshiplocationid AND slt.id=cl.toshiplocationid AND
ct.id=cl.cabletypeid

Re: To use a VIEW or not to use a View.....

From

Bruno Wolff III

Date:

22 January 2003, 10:36:02

On Wed, Jan 22, 2003 at 16:12:52 +0100, Ries van Twisk <ries@jongert.nl> wrote:
> Dear PostgreSQL users,
> 
> I understand that when a frontend accesses a VIEW that PostgreSQL cannot use
> a index on that view.
> For example when I do this: SELECT * FROM full_cablelist WHERE
> projectocode=5; Correct?

For the most part views work like macros for selects and indexes should be
usable.
You can test this yourself using EXPLAIN to compare plans both using and
not using a view on a table.

Re: To use a VIEW or not to use a View.....

From

Jan Wieck

Date:

22 January 2003, 11:00:51

Ries van Twisk wrote:
> 
> Dear PostgreSQL users,
> 
> I have a view and a table,
> 
> I understand that when a frontend accesses a VIEW that PostgreSQL cannot use
> a index on that view.
> For example when I do this: SELECT * FROM full_cablelist WHERE
> projectocode=5; Correct?

No. 

> 
> Now I just want to make sure for myself if the VIEW I created is the right
> way to go, or is it better
> to contruct a SQL in my application that looks like the view and send it to
> postgreSQL so it will
> use all indexes correctly. I use postgreSQL 7.2.1

Views in PostgreSQL aren't materialized. They are implemented as query
rewrite rules that combine your query with the view definition. This is
done before planning and optimizing, so what the query planner is
chewing on (the internal parsetree representation of a query) is the
same as if your application would have sent down the complicated query
over the base tables. 

There are a few exceptions where an application could construct a better
WHERE clause, resulting in a different join order or better scan
qualifications. As long as we're not talking about gigabytes here, you
shouldn't worry.

Use tables, views and views over views, it's all fine and your indexes
will be used.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: To use a VIEW or not to use a View.....

From

Tomasz Myrta

Date:

22 January 2003, 11:19:24

Ries van Twisk wrote:

>Dear PostgreSQL users,
>
>I have a view and a table,
>
>I understand that when a frontend accesses a VIEW that PostgreSQL cannot use
>a index on that view.
>For example when I do this: SELECT * FROM full_cablelist WHERE
>projectocode=5; Correct?
>
>Now I just want to make sure for myself if the VIEW I created is the right
>way to go, or is it better
>to contruct a SQL in my application that looks like the view and send it to
>postgreSQL so it will
>use all indexes correctly. I use postgreSQL 7.2.1
>
>I beliefe there is a change in postgreSQL 7.3.x on which I can cache a
>view??? Not sure what the issue was.
>
>I ask this because I expect pore performance in feature when the cablelist
>table holds up to around 20.000 rows.
>Each query to full_cablelist will return around 1200 rows.
>
>best regards,
>Ries van Twisk
>
>
>
>-- CABLE LIST
>CREATE TABLE cablelist (
>    id             SERIAL,
>    cableno        VARCHAR(8),
>    projectcodeid    INTEGER CONSTRAINT cablelist_projectcodes_con NOT NULL
>REFERENCES projectcodes(id) ON DELETE CASCADE,
>    fromconnid        INTEGER CONSTRAINT cablelist_fromconnid_con NOT NULL
>REFERENCES libconnections(id) ON DELETE CASCADE,
>    toconnid        INTEGER CONSTRAINT cablelist_toconnid_con NOT NULL REFERENCES
>libconnections(id) ON DELETE CASCADE,
>    fromshiplocationid    INTEGER CONSTRAINT cablelist_fromshiplocationid_con
>NOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE,
>    toshiplocationid    INTEGER CONSTRAINT cablelist_toshiplocationid_con NOT
>NULL REFERENCES shiplocations(id) ON DELETE CASCADE,
>    marktypesid        INTEGER CONSTRAINT cablelist_tomarktypeid_con NOT NULL
>REFERENCES marktypes(id) ON DELETE CASCADE,
>    cabletypeid        INTEGER CONSTRAINT cablelist_cabletypeid_con NOT NULL
>REFERENCES cabletypes(id) ON DELETE CASCADE,
>    cut            BOOLEAN DEFAULT 'false' NOT NULL,
>    placed        BOOLEAN DEFAULT 'false' NOT NULL,
>    ok            BOOLEAN DEFAULT 'false'
>);
>
>
>-- CABLE LIST VIEW
>CREATE VIEW full_cablelist AS
>SELECT cl.id,
>cl.cableno AS cableno,
>pc.projectcode AS projectcode,
>pc.id AS projectcodeid,
>lcf.name AS fconnection, lct.name AS tconnection,
>lif.name AS fitem, lit.name AS titem,
>slf.rib AS frib,slt.rib AS trib,
>slf.name AS fname, slt.name AS tname,
>ct.cabletype AS cabletype, ct.coretype AS coretype,
>cl.cut,
>cl.placed,
>cl.ok
>
>FROM cablelist AS cl,
>libconnections AS lcf, libconnections AS lct,
>libitems AS lif, libitems AS lit,
>shiplocations AS slf, shiplocations AS slt,
>projectcodes AS pc,
>cabletypes AS ct
>
>WHERE
>pc.id=cl.projectcodeid AND
>lcf.id=cl.fromconnid AND lct.id=cl.toconnid AND
>lif.id=lcf.libitemid AND lit.id=lct.libitemid AND
>slf.id=cl.fromshiplocationid AND slt.id=cl.toshiplocationid AND
>ct.id=cl.cabletypeid

How can we help you with table indexing, if you didn't write anything
about indexes you have already created on your tables? Anyway you don't need
indexes on a view, but indexes on your tables. You need also a well constructed
view.

For your query:
- make sure, you have index on projectcodes(projectcode) - if you have many projectcodes and index on
cablelist(projectcodeid)
- make sure, you did "vacuum analyze" before you test your query.
- send result of "explain analyze SELECT * FROM full_cablelist WHERE
projectocode=5" to us.

Anyway result can't be too fast, because query returns >1000 rows which is rather
a big amount of data.

Regards,
Tomasz Myrta

Re: To use a VIEW or not to use a View.....

From

Stephan Szabo

Date:

22 January 2003, 11:21:19

On Wed, 22 Jan 2003, Ries van Twisk wrote:

> Dear PostgreSQL users,
>
> I have a view and a table,
>
> I understand that when a frontend accesses a VIEW that PostgreSQL cannot use
> a index on that view.
> For example when I do this: SELECT * FROM full_cablelist WHERE
> projectocode=5; Correct?

In general, no.  There are some exceptions, for example views using
EXCEPT I believe will not push conditions down.  In 7.2.x, views using
any of the set ops (INTERSECT, UNION, EXCEPT) wouldn't push conditions
down.  There are a few other such conditions, but for your view, I think
this isn't going to be an issue.

Re: To use a VIEW or not to use a View.....

From

Tomasz Myrta

Date:

22 January 2003, 11:52:28

Jan Wieck wrote:

>Use tables, views and views over views, it's all fine and your indexes
>will be used.

I can't agree with using views over views. It has some limitations.
I asked about it on this list several months ago
and Tom Lane's conclusion was:


>> Tomasz Myrta <jasiek@klaster.net> writes:
>> I'd like to split queries into views, but I can't join them - planner 
>> search all of records instead of using index. It works very slow.


I think this is the same issue that Stephan identified in his response
to your other posting ("sub-select with aggregate").  When you writeFROM x join y using (col) WHERE x.col = const
the WHERE-restriction is only applied to x.  I'm afraid you'll need
to writeFROM x join y using (col) WHERE x.col = const AND y.col = const
Ideally you should be able to write justFROM x join y using (col) WHERE col = const
but I think that will be taken the same as "x.col = const" :-(
        regards, tom lane


I don't know if anything changed on 7.3.

Regards,
Tomasz Myrta

Re: To use a VIEW or not to use a View.....

From

Stephan Szabo

Date:

22 January 2003, 12:33:20

On Wed, 22 Jan 2003, Tomasz Myrta wrote:

> >> Tomasz Myrta <jasiek@klaster.net> writes:
> >> I'd like to split queries into views, but I can't join them - planner
> >> search all of records instead of using index. It works very slow.
>
>
> I think this is the same issue that Stephan identified in his response
> to your other posting ("sub-select with aggregate").  When you write
>     FROM x join y using (col) WHERE x.col = const
> the WHERE-restriction is only applied to x.  I'm afraid you'll need
> to write
>     FROM x join y using (col) WHERE x.col = const AND y.col = const
> Ideally you should be able to write just
>     FROM x join y using (col) WHERE col = const
> but I think that will be taken the same as "x.col = const" :-(


> I don't know if anything changed on 7.3.

I don't think so, but this is a general transitivity constraint AFAIK, not
one actually to do with views (ie, if you wrote out the query without a
view, you can run into the same issue).  It's somewhat easier to run into
the case with views and the effect may be exasperated by views, but it's
a general condition.

For example:
create table a(a int);
create table c(a int);

sszabo=# explain select * from a join c using (a) where a=3;                        QUERY PLAN
-------------------------------------------------------------Hash Join  (cost=1.01..26.08 rows=6 width=8)  Hash Cond:
("outer".a= "inner".a)  ->  Seq Scan on c  (cost=0.00..20.00 rows=1000 width=4)  ->  Hash  (cost=1.01..1.01 rows=1
width=4)       ->  Seq Scan on a  (cost=0.00..1.01 rows=1 width=4)              Filter: (a = 3)
 
(6 rows)

The filter is applied only to a.  So, if you really wanted the
c.a=3 condition to be applied for whatever reason you're out of
luck.

Re: To use a VIEW or not to use a View.....

From

Jan Wieck

Date:

22 January 2003, 12:53:26

Tomasz Myrta wrote:
> 
> Jan Wieck wrote:
> 
> >Use tables, views and views over views, it's all fine and your indexes
> >will be used.
> 
> I can't agree with using views over views. It has some limitations.
> I asked about it on this list several months ago
> and Tom Lane's conclusion was:

It has less to do with the nesting level of rewriting, than with what
you do with the view in general. If you cram up all the functionality
with aggregating subselects into one monster view it might do better,
except for maintaining the code.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: To use a VIEW or not to use a View.....

From

Tomasz Myrta

Date:

22 January 2003, 13:57:49

Stephan Szabo wrote:

>On Wed, 22 Jan 2003, Tomasz Myrta wrote:
>
>
>>>>Tomasz Myrta  writes:
>>>>I'd like to split queries into views, but I can't join them - planner
>>>>search all of records instead of using index. It works very slow.
>>
>>
>>I think this is the same issue that Stephan identified in his response
>>to your other posting ("sub-select with aggregate").  When you write
>>    FROM x join y using (col) WHERE x.col = const
>>the WHERE-restriction is only applied to x.  I'm afraid you'll need
>>to write
>>    FROM x join y using (col) WHERE x.col = const AND y.col = const
>>Ideally you should be able to write just
>>    FROM x join y using (col) WHERE col = const
>>but I think that will be taken the same as "x.col = const" :-(
>
>
>
>>I don't know if anything changed on 7.3.
>
>
>I don't think so, but this is a general transitivity constraint AFAIK, not
>one actually to do with views (ie, if you wrote out the query without a
>view, you can run into the same issue).  It's somewhat easier to run into
>the case with views and the effect may be exasperated by views, but it's
>a general condition.
>
>For example:
>create table a(a int);
>create table c(a int);
>
>sszabo=# explain select * from a join c using (a) where a=3;
>                         QUERY PLAN
>-------------------------------------------------------------
> Hash Join  (cost=1.01..26.08 rows=6 width=8)
>   Hash Cond: ("outer".a = "inner".a)
>   ->  Seq Scan on c  (cost=0.00..20.00 rows=1000 width=4)
>   ->  Hash  (cost=1.01..1.01 rows=1 width=4)
>         ->  Seq Scan on a  (cost=0.00..1.01 rows=1 width=4)
>               Filter: (a = 3)
>(6 rows)

I don't understand your idea.

explain select * from przystanki p join miasta m using (id_miasta) where field_id=100
Both tables are indexed on field id_miasta. They have enough rows to use indexes.

Nested Loop  (cost=0.00..9.48 rows=1 width=64) ->  Index Scan using ind_miasto_przyst on przystanki p  (cost=0.00..5.54
rows=1width=41) ->  Index Scan using miasta_pkey on miasta m  (cost=0.00..3.10 rows=1 width=23)
 

Tomasz Myrta

Re: To use a VIEW or not to use a View.....

From

Tomasz Myrta

Date:

22 January 2003, 14:10:37

Stephan Szabo wrote:

>On Wed, 22 Jan 2003, Tomasz Myrta wrote:
>
>
>>>>Tomasz Myrta  writes:
>>>>I'd like to split queries into views, but I can't join them - planner
>>>>search all of records instead of using index. It works very slow.
>>
>>
>>I think this is the same issue that Stephan identified in his response
>>to your other posting ("sub-select with aggregate").  When you write
>>    FROM x join y using (col) WHERE x.col = const
>>the WHERE-restriction is only applied to x.  I'm afraid you'll need
>>to write
>>    FROM x join y using (col) WHERE x.col = const AND y.col = const
>>Ideally you should be able to write just
>>    FROM x join y using (col) WHERE col = const
>>but I think that will be taken the same as "x.col = const" :-(
>
>
>
>>I don't know if anything changed on 7.3.
>
>
>I don't think so, but this is a general transitivity constraint AFAIK, not
>one actually to do with views (ie, if you wrote out the query without a
>view, you can run into the same issue).  It's somewhat easier to run into
>the case with views and the effect may be exasperated by views, but it's
>a general condition.
>
>For example:
>create table a(a int);
>create table c(a int);
>
>sszabo=# explain select * from a join c using (a) where a=3;
>                         QUERY PLAN
>-------------------------------------------------------------
> Hash Join  (cost=1.01..26.08 rows=6 width=8)
>   Hash Cond: ("outer".a = "inner".a)
>   ->  Seq Scan on c  (cost=0.00..20.00 rows=1000 width=4)
>   ->  Hash  (cost=1.01..1.01 rows=1 width=4)
>         ->  Seq Scan on a  (cost=0.00..1.01 rows=1 width=4)
>               Filter: (a = 3)
>(6 rows)
>
>The filter is applied only to a.  So, if you really wanted the
>c.a=3 condition to be applied for whatever reason you're out of
>luck.

Let's make some test:

First, let's create some simple view with 2 tables join:
drop view pm;
create view pm as select   id_przystanku,  m.nazwa from  przystanki p  join miasta m using (id_miasta);
explain select * from pm where id_przystanku=1230;
Nested Loop  (cost=0.00..6.26 rows=1 width=23) ->  Index Scan using przystanki_pkey on przystanki p  (cost=0.00..3.14
rows=1width=8) ->  Index Scan using miasta_pkey on miasta m  (cost=0.00..3.10 rows=1 width=15)
 


Next, let's try query using this view 2 times with explicit join:
explain select * from pm a join pm b using(id_przystanku) where id_przystanku=1230;
Hash Join  (cost=13.00..30.10 rows=1 width=46) ->  Hash Join  (cost=6.74..21.02 rows=374 width=23)       ->  Seq Scan
onprzystanki p  (cost=0.00..7.74 rows=374 width=8)       ->  Hash  (cost=5.99..5.99 rows=299 width=15)             ->
SeqScan on miasta m  (cost=0.00..5.99 rows=299 width=15) ->  Hash  (cost=6.26..6.26 rows=1 width=23)       ->  Nested
Loop (cost=0.00..6.26 rows=1 width=23)             ->  Index Scan using przystanki_pkey on przystanki p
(cost=0.00..3.14rows=1 width=8)             ->  Index Scan using miasta_pkey on miasta m  (cost=0.00..3.10 rows=1
width=15)

And now similiar view, but without nesting views:
drop view pm2;
create view pm2 asselect  id_przystanku,  m1.nazwa as nazwa1,  m2.nazwa as nazwa2from  przystanki p1  join miasta m1
using(id_miasta)  join przystanki p2 using (id_przystanku)  join miasta m2 on (m2.id_miasta=p2.id_miasta);
 

explain select * from pm2 where id_przystanku=1230;
Nested Loop  (cost=0.00..12.52 rows=1 width=46) ->  Nested Loop  (cost=0.00..9.41 rows=1 width=31)       ->  Nested
Loop (cost=0.00..6.26 rows=1 width=23)             ->  Index Scan using przystanki_pkey on przystanki p1
(cost=0.00..3.14rows=1 width=8)             ->  Index Scan using miasta_pkey on miasta m1  (cost=0.00..3.10 rows=1
width=15)      ->  Index Scan using przystanki_pkey on przystanki p2  (cost=0.00..3.14 rows=1 width=8) ->  Index Scan
usingmiasta_pkey on miasta m2  (cost=0.00..3.10 rows=1 width=15)
 


Regards,
Tomasz Myrta

Re: To use a VIEW or not to use a View.....

From

Stephan Szabo

Date:

22 January 2003, 16:50:51

On Wed, 22 Jan 2003, Tomasz Myrta wrote:

> Let's make some test:
>
> First, let's create some simple view with 2 tables join:
> drop view pm;
> create view pm as
>  select
>    id_przystanku,
>    m.nazwa
>  from
>    przystanki p
>    join miasta m using (id_miasta);
>
> explain select * from pm where id_przystanku=1230;
> Nested Loop  (cost=0.00..6.26 rows=1 width=23)
>   ->  Index Scan using przystanki_pkey on przystanki p  (cost=0.00..3.14 rows=1 width=8)
>   ->  Index Scan using miasta_pkey on miasta m  (cost=0.00..3.10 rows=1 width=15)
>
>
> Next, let's try query using this view 2 times with explicit join:
> explain select * from pm a join pm b using(id_przystanku) where id_przystanku=1230;
> Hash Join  (cost=13.00..30.10 rows=1 width=46)
>   ->  Hash Join  (cost=6.74..21.02 rows=374 width=23)
>         ->  Seq Scan on przystanki p  (cost=0.00..7.74 rows=374 width=8)
>         ->  Hash  (cost=5.99..5.99 rows=299 width=15)
>               ->  Seq Scan on miasta m  (cost=0.00..5.99 rows=299 width=15)
>   ->  Hash  (cost=6.26..6.26 rows=1 width=23)
>         ->  Nested Loop  (cost=0.00..6.26 rows=1 width=23)
>               ->  Index Scan using przystanki_pkey on przystanki p  (cost=0.00..3.14 rows=1 width=8)
>               ->  Index Scan using miasta_pkey on miasta m  (cost=0.00..3.10 rows=1 width=15)
>
> And now similiar view, but without nesting views:


> drop view pm2;
> create view pm2 as
>  select
>    id_przystanku,
>    m1.nazwa as nazwa1,
>    m2.nazwa as nazwa2
>  from
>    przystanki p1
>    join miasta m1 using (id_miasta)
>    join przystanki p2 using (id_przystanku)
>    join miasta m2 on (m2.id_miasta=p2.id_miasta);

That's not the same join for optimization purposes
since postgresql treats explicit join syntax as a
constraint on the ordering of joins.

The same join would be something like:

przystanki p1 join miasta m1 using (id_miasta)
join (przystanki p2 join miasta m2 using (id_miasta))using (id_przystanku)

minus the fact I think you'd need some explicit naming in
there.

Re: To use a VIEW or not to use a View.....

From

Tom Lane

Date:

22 January 2003, 17:23:27

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> ... but this is a general transitivity constraint AFAIK, not
> one actually to do with views (ie, if you wrote out the query without a
> view, you can run into the same issue).  It's somewhat easier to run into
> the case with views and the effect may be exasperated by views, but it's
> a general condition.

Right.  Views are just macros --- they don't in themselves affect the
planner's ability to generate a good plan.  But they make it easier to
generate baroque queries without thinking much about what you're doing,
and in complex queries the planner doesn't always make the deductions
and simplifications that are obvious to a human.

> For example:
> create table a(a int);
> create table c(a int);

> sszabo=# explain select * from a join c using (a) where a=3;
>                          QUERY PLAN
> -------------------------------------------------------------
>  Hash Join  (cost=1.01..26.08 rows=6 width=8)
>    Hash Cond: ("outer".a = "inner".a)
>    ->  Seq Scan on c  (cost=0.00..20.00 rows=1000 width=4)
>    ->  Hash  (cost=1.01..1.01 rows=1 width=4)
>          ->  Seq Scan on a  (cost=0.00..1.01 rows=1 width=4)
>                Filter: (a = 3)
> (6 rows)

> The filter is applied only to a.  So, if you really wanted the
> c.a=3 condition to be applied for whatever reason you're out of
> luck.

FWIW, CVS tip is brighter: the condition does propagate to both relations.
Hash Join  (cost=22.51..45.04 rows=1 width=8)  Hash Cond: ("outer".a = "inner".a)  ->  Seq Scan on a  (cost=0.00..22.50
rows=5width=4)        Filter: (a = 3)  ->  Hash  (cost=22.50..22.50 rows=5 width=4)        ->  Seq Scan on c
(cost=0.00..22.50rows=5 width=4)              Filter: (3 = a)

The reason this is useful is that (a) fewer rows need to be joined,
and (b) we may be able to make effective use of indexes on both tables.
        regards, tom lane

Re: To use a VIEW or not to use a View.....

From

Tomasz Myrta

Date:

22 January 2003, 17:32:02

Stephan Szabo wrote:

>That's not the same join for optimization purposes
>since postgresql treats explicit join syntax as a
>constraint on the ordering of joins.
>
>The same join would be something like:
>
>przystanki p1 join miasta m1 using (id_miasta)
>join (przystanki p2 join miasta m2 using (id_miasta))
> using (id_przystanku)
>
>minus the fact I think you'd need some explicit naming in
>there.

You are right.

The result of your query is:
explain select * from
przystanki p1 join miasta m1 using (id_miasta)
join (przystanki p2 join miasta m2 using (id_miasta))using (id_przystanku)
where id_przystanku=1230

Hash Join  (cost=13.00..30.10 rows=1 width=128) ->  Hash Join  (cost=6.74..21.02 rows=374 width=64)       ->  Seq Scan
onprzystanki p2  (cost=0.00..7.74 rows=374 width=41)       ->  Hash  (cost=5.99..5.99 rows=299 width=23)             ->
Seq Scan on miasta m2  (cost=0.00..5.99 rows=299 width=23) ->  Hash  (cost=6.26..6.26 rows=1 width=64)       ->  Nested
Loop (cost=0.00..6.26 rows=1 width=64)             ->  Index Scan using przystanki_pkey on przystanki p1
(cost=0.00..3.14rows=1 width=41)             ->  Index Scan using miasta_pkey on miasta m1  (cost=0.00..3.10 rows=1
width=23)

Anyway - is it possible to expose table "przystanki alias p2" to get valid result?

The problem is similiar to my problem "sub-select with aggregate" dated on 2002-10-23 
and the answer (which doesn't satisfy me) is the same:
if we pass static values to "przystanki p2 join miasta m2", the query will work ok:
explain select * from
przystanki p1 join miasta m1 using (id_miasta)
cross join (przystanki p2 join miasta m2 using (id_miasta)) X
where p1.id_przystanku=1230 and X.id_przystanku=1230

Nested Loop  (cost=0.00..12.52 rows=1 width=128) ->  Nested Loop  (cost=0.00..6.26 rows=1 width=64)       ->  Index
Scanusing przystanki_pkey on przystanki p1  (cost=0.00..3.14 rows=1 width=41)       ->  Index Scan using miasta_pkey on
miastam1  (cost=0.00..3.10 rows=1 width=23) ->  Materialize  (cost=6.26..6.26 rows=1 width=64)       ->  Nested Loop
(cost=0.00..6.26rows=1 width=64)             ->  Index Scan using przystanki_pkey on przystanki p2  (cost=0.00..3.14
rows=1width=41)             ->  Index Scan using miasta_pkey on miasta m2  (cost=0.00..3.10 rows=1 width=23)
 

Stephan - I have some problems with mail relay to you. 
Does my mail server have any open-relay problem, or something like this (213.25.37.66) ?

Regards,
Tomasz Myrta

Re: To use a VIEW or not to use a View.....

From

Stephan Szabo

Date:

22 January 2003, 20:21:21

On Wed, 22 Jan 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

> > The filter is applied only to a.  So, if you really wanted the
> > c.a=3 condition to be applied for whatever reason you're out of
> > luck.
>
> FWIW, CVS tip is brighter: the condition does propagate to both relations.
>
>  Hash Join  (cost=22.51..45.04 rows=1 width=8)
>    Hash Cond: ("outer".a = "inner".a)
>    ->  Seq Scan on a  (cost=0.00..22.50 rows=5 width=4)
>          Filter: (a = 3)
>    ->  Hash  (cost=22.50..22.50 rows=5 width=4)
>          ->  Seq Scan on c  (cost=0.00..22.50 rows=5 width=4)
>                Filter: (3 = a)
>
> The reason this is useful is that (a) fewer rows need to be joined,
> and (b) we may be able to make effective use of indexes on both tables.

Yeah.  I was going to ask how hard you thought it would be to do for
this particular sort of case.  I thought about the simple case of using
and realized it'd probably be reasonable in amount of work, but it seems
I don't have to think about it. :)

Re: To use a VIEW or not to use a View.....

From

Stephan Szabo

Date:

22 January 2003, 20:26:27

On Wed, 22 Jan 2003, Tomasz Myrta wrote:

> Stephan Szabo wrote:
>
> >That's not the same join for optimization purposes
> >since postgresql treats explicit join syntax as a
> >constraint on the ordering of joins.
> >
> >The same join would be something like:
> >
> >przystanki p1 join miasta m1 using (id_miasta)
> >join (przystanki p2 join miasta m2 using (id_miasta))
> > using (id_przystanku)
> >
> >minus the fact I think you'd need some explicit naming in
> >there.
>
> You are right.
>
> The result of your query is:
> explain select * from
> przystanki p1 join miasta m1 using (id_miasta)
> join (przystanki p2 join miasta m2 using (id_miasta))
>  using (id_przystanku)
> where id_przystanku=1230
>
> Hash Join  (cost=13.00..30.10 rows=1 width=128)
>   ->  Hash Join  (cost=6.74..21.02 rows=374 width=64)
>         ->  Seq Scan on przystanki p2  (cost=0.00..7.74 rows=374 width=41)
>         ->  Hash  (cost=5.99..5.99 rows=299 width=23)
>               ->  Seq Scan on miasta m2  (cost=0.00..5.99 rows=299 width=23)
>   ->  Hash  (cost=6.26..6.26 rows=1 width=64)
>         ->  Nested Loop  (cost=0.00..6.26 rows=1 width=64)
>               ->  Index Scan using przystanki_pkey on przystanki p1  (cost=0.00..3.14 rows=1 width=41)
>               ->  Index Scan using miasta_pkey on miasta m1  (cost=0.00..3.10 rows=1 width=23)
>
> Anyway - is it possible to expose table "przystanki alias p2" to get valid result?

I think it's possible that the work Tom mentioned in current CVS may
make these cases work the way you want.  I don't have access to my
test machine to try it right now however.

> Stephan - I have some problems with mail relay to you.
> Does my mail server have any open-relay problem, or something like this (213.25.37.66) ?

Doesn't seem to be a simple open relay (but I didn't try anything
complicated).  It acted a little wierd about email addresses in
reacting to my telnet, but I think it was acting correctly as far as
standards are concerned.

Re: To use a VIEW or not to use a View.....

From

Tom Lane

Date:

22 January 2003, 20:30:16

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> On Wed, 22 Jan 2003, Tom Lane wrote:
>> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
>>> The filter is applied only to a.  So, if you really wanted the
>>> c.a=3 condition to be applied for whatever reason you're out of
>>> luck.
>> 
>> FWIW, CVS tip is brighter: the condition does propagate to both relations.

> Yeah.  I was going to ask how hard you thought it would be to do for
> this particular sort of case.  I thought about the simple case of using
> and realized it'd probably be reasonable in amount of work, but it seems
> I don't have to think about it. :)

It could still use more eyeballs looking at it.  One thing I'm concerned
about is whether the extra (derived) conditions lead to double-counting
restrictivity and thus underestimating the number of result rows.  I
haven't had time to really test that, but I suspect there may be a problem.
        regards, tom lane

Re: To use a VIEW or not to use a View.....

From

"Ries van Twisk"

Date:

23 January 2003, 03:06:16

First of all I want to thank you for all responses! I was overwhelmed with
it :D

Below you find the schema I'm currently using and the output of explain. I
removed all comments so the mail will be small, the schema is still work in
progress. I especially I need to take a look at the indexes. Any hints will
be appreciated.

best reghards,
Ries van Twisk


<-----------
Here you find the output of the explain again:
I cannot yet read the output of explain si I'm not sure if the output looks
good or bad.


echo "VACUUM ANALYZE; EXPLAIN SELECT * FROM full_cablelist WHERE
projectcode=5" | psql testdb > /tmp/explain.txt

NOTICE:  QUERY PLAN:

Hash Join  (cost=26.28..39.00 rows=23 width=200) ->  Hash Join  (cost=24.85..37.17 rows=23 width=182)       ->  Hash
Join (cost=23.43..35.34 rows=23 width=164)             ->  Seq Scan on libitems lit  (cost=0.00..7.39 rows=339
 
width=27)             ->  Hash  (cost=23.37..23.37 rows=23 width=137)                   ->  Hash Join
(cost=11.05..23.37rows=23 width=137)                         ->  Hash Join  (cost=9.75..21.67 rows=23
 
width=120)                               ->  Seq Scan on libitems lif
(cost=0.00..7.39 rows=339 width=27)                               ->  Hash  (cost=9.69..9.69 rows=23 width=93)
                          ->  Hash Join  (cost=4.76..9.69
 
rows=23 width=93)                                           ->  Hash Join  (cost=3.46..7.99
rows=23 width=76)                                                 ->  Hash Join
(cost=2.42..6.32 rows=69 width=63)                                                       ->  Seq Scan on
cablelist cl  (cost=0.00..2.69 rows=69 width=41)                                                       ->  Hash
(cost=2.06..2.06 rows=106 width=22)                                                             ->  Seq Scan
on cabletypes ct  (cost=0.00..2.06 rows=106 width=22)                                                 ->  Hash
(cost=1.04..1.04
rows=1 width=13)                                                       ->  Seq Scan on
projectcodes pc  (cost=0.00..1.04 rows=1 width=13)                                           ->  Hash
(cost=1.24..1.24
rows=24 width=17)                                                 ->  Seq Scan on
libconnections lcf  (cost=0.00..1.24 rows=24 width=17)                         ->  Hash  (cost=1.24..1.24 rows=24
width=17)                              ->  Seq Scan on libconnections lct
 
(cost=0.00..1.24 rows=24 width=17)       ->  Hash  (cost=1.34..1.34 rows=34 width=18)             ->  Seq Scan on
shiplocationsslt  (cost=0.00..1.34 rows=34
 
width=18) ->  Hash  (cost=1.34..1.34 rows=34 width=18)       ->  Seq Scan on shiplocations slf  (cost=0.00..1.34
rows=34
width=18)

<------------------
CREATE FUNCTION ord_fn (text,text) RETURNS text AS '
SELECT (CASE   WHEN $1 < $2   THEN $1 || $2   ELSE $2 || $1   END) as t;
' LANGUAGE SQL WITH (iscachable);

CREATE FUNCTION plpgsql_call_handler ()RETURNS OPAQUEAS '/usr/lib/postgresql/plpgsql.so'
LANGUAGE 'C';

CREATE LANGUAGE 'plpgsql' HANDLER plpgsql_call_handler
LANCOMPILER 'PL/pgSQL';

CREATE TABLE cabletypes (   id         SERIAL,   cabletype    VARCHAR(24) NOT NULL CHECK ( length(cabletype) > 1 ),
--Naam
 
van de kabel   coretype    VARCHAR(16) NOT NULL CHECK ( length(coretype) > 1 )    -- Type
kabel/aantal aders
);

CREATE UNIQUE INDEX cabletypes_idx ON cabletypes (id);

CREATE FUNCTION f_check_cabletypes() RETURNS OPAQUE AS '
DECLAREcheck     RECORD;
BEGINSELECT INTO check * FROM cabletypes WHERE cabletype=NEW.cabletype AND
coretype=NEW.coretype LIMIT 1;IF FOUND THEN             RAISE EXCEPTION ''[0001] cabletype and coretype combination
already
exsists in cabletypes!'';       END IF;RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER tr_cabletypesBEFORE INSERT OR UPDATE ON cabletypes FOR EACH ROWEXECUTE PROCEDURE f_check_cabletypes();

CREATE TABLE marktypes (   id         SERIAL,   name    VARCHAR(24) NOT NULL UNIQUE,        -- Naam van de markering
color   INTEGER NOT NULL            -- Eventuele kleur
 
);
CREATE UNIQUE INDEX marktypes_idx ON marktypes (id);

CREATE TABLE projectcodes (   id        SERIAL,   projectcode VARCHAR(16) NOT NULL UNIQUE,        -- Project code naam
projectname VARCHAR(64) NOT NULL,            -- Project uitleg   deleted    BOOLEAN DEFAULT 'false' NOT NULL
 
);

CREATE UNIQUE INDEX projectcodes_idx ON projectcodes (id);

CREATE TABLE libitems (   id            SERIAL,   projectcodeid    INTEGER DEFAULT 0 NOT NULL REFERENCES
projectcodes(id)ON
 
DELETE CASCADE,   name        VARCHAR(32) NOT NULL UNIQUE            -- Naam van de item bv boiler
);

CREATE UNIQUE INDEX libitems_idx ON libitems(id);

CREATE FUNCTION f_check_libitems() RETURNS OPAQUE AS '
DECLAREcheck RECORD;
BEGIN-- Update van de name mag welIF TG_OP = ''UPDATE'' THEN    IF NEW.projectcodeid = OLD.projectcodeid AND NEW.name =
OLD.nameTHEN           RETURN NEW;    END IF;END IF;
 
-- Controleer of the combinatie projectcode en ribnummer unique isSELECT INTO check * FROM libitems WHERE
projectcodeid=new.projectcodeidAND
 
name=new.name LIMIT 1;IF FOUND THEN             RAISE EXCEPTION ''[0005] projectcodide and name combination already
exsists in shiplocations!'';       END IF;
RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER tr_linitemsBEFORE INSERT OR UPDATE ON libitems FOR EACH ROWEXECUTE PROCEDURE f_check_libitems();

CREATE TABLE libconnections (   id        SERIAL,   libitemid    INTEGER CONSTRAINT libitemid_con NOT NULL REFERENCES
libitems(id) ON DELETE CASCADE,   name    VARCHAR(32),                -- Naam van de aansluiting aan een item   cableno
  VARCHAR(8)
 
);

CREATE UNIQUE INDEX libconnections_idx ON libconnections(id);

CREATE FUNCTION f_check_libconnections() RETURNS OPAQUE AS '
DECLAREcheck     RECORD;
BEGIN-- Update van de name mag welIF TG_OP = ''UPDATE'' THEN    IF NEW.libitemid = OLD.libitemid AND NEW.name =
OLD.nameTHEN           RETURN NEW;    END IF;END IF;
 
SELECT INTO check * FROM libconnections WHERE libitemid=NEW.libitemid AND
name=NEW.name LIMIT 1;IF FOUND THEN           RAISE EXCEPTION ''[0002] name and item combination already exsists in
libconnections!'';       END IF;RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER tr_libconnectionsBEFORE INSERT OR UPDATE ON libconnections FOR EACH ROWEXECUTE PROCEDURE
f_check_libconnections();
                          ---------------

CREATE TABLE shiplocations (   id    SERIAL,   projectcodeid    INTEGER NOT NULL REFERENCES projectcodes(id) ON DELETE
CASCADE,   rib            SMALLINT DEFAULT 0 NOT NULL CHECK (rib>0),    -- rib nummer   name        VARCHAR(32) NOT
NULL,               -- Naam van de locatie (bv voorschip,
 
middenschip, achterschip)   loc            SMALLINT DEFAULT 0 NOT NULL CHECK (loc>=0 AND loc<5)
);

CREATE VIEW shiplocationst AS SELECT id, projectcodeid, rib, name,   CASE loc WHEN 0 THEN 'ries'when 1 THEN 'ries1'when
2THEN 'ries2'ELSE 'other'   END   FROM shiplocations;
 

CREATE UNIQUE INDEX shiplocations_idx ON shiplocations(id);

CREATE TABLE cablelist (   id             SERIAL,   cableno        VARCHAR(8),   projectcodeid    INTEGER CONSTRAINT
cablelist_projectcodes_conNOT NULL
 
REFERENCES projectcodes(id) ON DELETE CASCADE,   fromconnid        INTEGER CONSTRAINT cablelist_fromconnid_con NOT
NULL
REFERENCES libconnections(id) ON DELETE CASCADE,   toconnid        INTEGER CONSTRAINT cablelist_toconnid_con NOT NULL
REFERENCES
libconnections(id) ON DELETE CASCADE,   fromshiplocationid    INTEGER CONSTRAINT cablelist_fromshiplocationid_con
NOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE,   toshiplocationid    INTEGER CONSTRAINT
cablelist_toshiplocationid_conNOT
 
NULL REFERENCES shiplocations(id) ON DELETE CASCADE,   marktypesid        INTEGER CONSTRAINT cablelist_tomarktypeid_con
NOTNULL
 
REFERENCES marktypes(id) ON DELETE CASCADE,   cabletypeid        INTEGER CONSTRAINT cablelist_cabletypeid_con NOT NULL
REFERENCES cabletypes(id) ON DELETE CASCADE,   cut            BOOLEAN DEFAULT 'false' NOT NULL,   placed        BOOLEAN
DEFAULT'false' NOT NULL,   ok            BOOLEAN DEFAULT 'false'
 
);

CREATE VIEW full_cablelist AS

SELECT cl.id,
cl.cableno AS cableno,
pc.projectcode AS projectcode,
pc.id AS projectcodeid,
lcf.name AS fconnection, lct.name AS tconnection,
lif.name AS fitem, lit.name AS titem,
slf.rib AS frib,slt.rib AS trib,
slf.name AS fname, slt.name AS tname,
ct.cabletype AS cabletype, ct.coretype AS coretype,
cl.cut,
cl.placed,
cl.ok

FROM cablelist AS cl,
libconnections AS lcf, libconnections AS lct,
libitems AS lif, libitems AS lit,
shiplocations AS slf, shiplocations AS slt,
projectcodes AS pc,
cabletypes AS ct

WHERE
pc.id=cl.projectcodeid AND
lcf.id=cl.fromconnid AND lct.id=cl.toconnid AND
lif.id=lcf.libitemid AND lit.id=lct.libitemid AND
slf.id=cl.fromshiplocationid AND slt.id=cl.toshiplocationid AND
ct.id=cl.cabletypeid
;

CREATE FUNCTION f_find_free_cableno(INTEGER,VARCHAR(4)) RETURNS VARCHAR(8)
AS '
DECLAREpcid        ALIAS FOR $1;prefix        ALIAS FOR $2;
cnmax        INTEGER;newcableno    INTEGER;CHECK        RECORD;cablename    VARCHAR(10);
BEGINnewcableno=0;SELECT INTO cnmax count(cableno) FROM cablelist WHERE projectcodeid = pcid;WHILE newcableno <= cnmax
LOOP   newcableno=newcableno+1;
 
    -- Grote getallen dan 9998 worden niet toegstaan omdat de lpad functie    -- ook een truncate doet    IF newcableno
>998 THEN    RETURN NULL;    END IF;
 
    -- Controleer op prefix, zo ja gebruik deze    IF prefix != '''' THEN    cablename = prefix || ''.'' ||
lpad(newcableno,3, ''0'');    ELSE    cablename =  lpad(newcableno, 3, ''0'');    END IF;
 
    -- Controleer of dit nummer al bestaad    SELECT INTO CHECK * FROM cablelist WHERE projectcodeid = pcid AND
cableno=cablename;    IF NOT FOUND THEN    RETURN cablename;    END IF;END LOOP;
-- Niets gevonden, hier zouden we normaal gesproken niet-- komen. Dit omdat we <= testen en dis altijd cnmax+1
testenRETURNNULL;
 
-- Alle nummer zijn in gebruik, kies nu een volgt nummer-- newcableno zal ALTIJD <= 998 zijn en dus altijd maar 3
positieinnemennewcableno=newcableno+1;       IF prefix != '''' THEN           cablename = prefix || ''.'' ||
lpad(newcableno,3, ''0'');ELSE    cablename = lpad(newcableno, 3, ''0'');END IF;
 
RETURN cableno;
END;
' LANGUAGE 'plpgsql';


> -----Oorspronkelijk bericht-----
> Van: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org]Namens Tomasz Myrta
> Verzonden: woensdag 22 januari 2003 16:46
> Aan: Ries van Twisk
> CC: pgsql-sql@postgresql.org
> Onderwerp: Re: [SQL] To use a VIEW or not to use a View.....
>
>
> Ries van Twisk wrote:
>
> >Dear PostgreSQL users,
> >
> >I have a view and a table,
> >
> >I understand that when a frontend accesses a VIEW that
> PostgreSQL cannot use
> >a index on that view.
> >For example when I do this: SELECT * FROM full_cablelist WHERE
> >projectocode=5; Correct?
> >
> >Now I just want to make sure for myself if the VIEW I
> created is the right
> >way to go, or is it better
> >to contruct a SQL in my application that looks like the view
> and send it to
> >postgreSQL so it will
> >use all indexes correctly. I use postgreSQL 7.2.1
> >
> >I beliefe there is a change in postgreSQL 7.3.x on which I
> can cache a
> >view??? Not sure what the issue was.
> >
> >I ask this because I expect pore performance in feature when
> the cablelist
> >table holds up to around 20.000 rows.
> >Each query to full_cablelist will return around 1200 rows.
> >
> >best regards,
> >Ries van Twisk
> >
> >
> >
> >-- CABLE LIST
> >CREATE TABLE cablelist (
> >    id             SERIAL,
> >    cableno        VARCHAR(8),
> >    projectcodeid    INTEGER CONSTRAINT
> cablelist_projectcodes_con NOT NULL
> >REFERENCES projectcodes(id) ON DELETE CASCADE,
> >    fromconnid        INTEGER CONSTRAINT
> cablelist_fromconnid_con NOT NULL
> >REFERENCES libconnections(id) ON DELETE CASCADE,
> >    toconnid        INTEGER CONSTRAINT
> cablelist_toconnid_con NOT NULL REFERENCES
> >libconnections(id) ON DELETE CASCADE,
> >    fromshiplocationid    INTEGER CONSTRAINT
> cablelist_fromshiplocationid_con
> >NOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE,
> >    toshiplocationid    INTEGER CONSTRAINT
> cablelist_toshiplocationid_con NOT
> >NULL REFERENCES shiplocations(id) ON DELETE CASCADE,
> >    marktypesid        INTEGER CONSTRAINT
> cablelist_tomarktypeid_con NOT NULL
> >REFERENCES marktypes(id) ON DELETE CASCADE,
> >    cabletypeid        INTEGER CONSTRAINT
> cablelist_cabletypeid_con NOT NULL
> >REFERENCES cabletypes(id) ON DELETE CASCADE,
> >    cut            BOOLEAN DEFAULT 'false' NOT NULL,
> >    placed        BOOLEAN DEFAULT 'false' NOT NULL,
> >    ok            BOOLEAN DEFAULT 'false'
> >);
> >
> >
> >-- CABLE LIST VIEW
> >CREATE VIEW full_cablelist AS
> >SELECT cl.id,
> >cl.cableno AS cableno,
> >pc.projectcode AS projectcode,
> >pc.id AS projectcodeid,
> >lcf.name AS fconnection, lct.name AS tconnection,
> >lif.name AS fitem, lit.name AS titem,
> >slf.rib AS frib,slt.rib AS trib,
> >slf.name AS fname, slt.name AS tname,
> >ct.cabletype AS cabletype, ct.coretype AS coretype,
> >cl.cut,
> >cl.placed,
> >cl.ok
> >
> >FROM cablelist AS cl,
> >libconnections AS lcf, libconnections AS lct,
> >libitems AS lif, libitems AS lit,
> >shiplocations AS slf, shiplocations AS slt,
> >projectcodes AS pc,
> >cabletypes AS ct
> >
> >WHERE
> >pc.id=cl.projectcodeid AND
> >lcf.id=cl.fromconnid AND lct.id=cl.toconnid AND
> >lif.id=lcf.libitemid AND lit.id=lct.libitemid AND
> >slf.id=cl.fromshiplocationid AND slt.id=cl.toshiplocationid AND
> >ct.id=cl.cabletypeid
>
> How can we help you with table indexing, if you didn't write anything
> about indexes you have already created on your tables? Anyway
> you don't need
> indexes on a view, but indexes on your tables. You need also
> a well constructed
> view.
>
> For your query:
> - make sure, you have index on projectcodes(projectcode) - if
> you have many projectcodes
>   and index on cablelist(projectcodeid)
> - make sure, you did "vacuum analyze" before you test your query.
> - send result of "explain analyze SELECT * FROM full_cablelist WHERE
> projectocode=5" to us.
>
> Anyway result can't be too fast, because query returns >1000
> rows which is rather
> a big amount of data.
>
> Regards,
> Tomasz Myrta
>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to
> majordomo@postgresql.org)
>

Re: To use a VIEW or not to use a View.....

From

Tomasz Myrta

Date:

23 January 2003, 05:53:22

Ries van Twisk wrote:
> First of all I want to thank you for all responses! I was overwhelmed with
> it :D
> 
> Below you find the schema I'm currently using and the output of explain. I
> removed all comments so the mail will be small, the schema is still work in
> progress. I especially I need to take a look at the indexes. Any hints will
> be appreciated.
> 
> best reghards,
> Ries van Twisk
> 
> 
> <-----------
> Here you find the output of the explain again:
> I cannot yet read the output of explain si I'm not sure if the output looks
> good or bad.
> 
> 
It looks like your cablelist table doesn't contain too many records, 
so result is inacurate. Postgresql doesn't use indexes if you have too
little rows.
First look on your explain is ok, your query should work fine if tables
are well indexed.
Make additional tests with tables containing more rows, "explain analyze"
helps a bit, because it shows real times.

Tomasz Myrta



> echo "VACUUM ANALYZE; EXPLAIN SELECT * FROM full_cablelist WHERE
> projectcode=5" | psql testdb > /tmp/explain.txt
> 
> NOTICE:  QUERY PLAN:
> 
> Hash Join  (cost=26.28..39.00 rows=23 width=200)
>   ->  Hash Join  (cost=24.85..37.17 rows=23 width=182)
>         ->  Hash Join  (cost=23.43..35.34 rows=23 width=164)
>               ->  Seq Scan on libitems lit  (cost=0.00..7.39 rows=339
> width=27)
>               ->  Hash  (cost=23.37..23.37 rows=23 width=137)
>                     ->  Hash Join  (cost=11.05..23.37 rows=23 width=137)
>                           ->  Hash Join  (cost=9.75..21.67 rows=23
> width=120)
>                                 ->  Seq Scan on libitems lif
> (cost=0.00..7.39 rows=339 width=27)
>                                 ->  Hash  (cost=9.69..9.69 rows=23 width=93)
>                                       ->  Hash Join  (cost=4.76..9.69
> rows=23 width=93)
>                                             ->  Hash Join  (cost=3.46..7.99
> rows=23 width=76)
>                                                   ->  Hash Join
> (cost=2.42..6.32 rows=69 width=63)
>                                                         ->  Seq Scan on
> cablelist cl  (cost=0.00..2.69 rows=69 width=41)
>                                                         ->  Hash
> (cost=2.06..2.06 rows=106 width=22)
>                                                               ->  Seq Scan
> on cabletypes ct  (cost=0.00..2.06 rows=106 width=22)
>                                                   ->  Hash  (cost=1.04..1.04
> rows=1 width=13)
>                                                         ->  Seq Scan on
> projectcodes pc  (cost=0.00..1.04 rows=1 width=13)
>                                             ->  Hash  (cost=1.24..1.24
> rows=24 width=17)
>                                                   ->  Seq Scan on
> libconnections lcf  (cost=0.00..1.24 rows=24 width=17)
>                           ->  Hash  (cost=1.24..1.24 rows=24 width=17)
>                                 ->  Seq Scan on libconnections lct
> (cost=0.00..1.24 rows=24 width=17)
>         ->  Hash  (cost=1.34..1.34 rows=34 width=18)
>               ->  Seq Scan on shiplocations slt  (cost=0.00..1.34 rows=34
> width=18)
>   ->  Hash  (cost=1.34..1.34 rows=34 width=18)
>         ->  Seq Scan on shiplocations slf  (cost=0.00..1.34 rows=34
> width=18)

Re: To use a VIEW or not to use a View.....

From

Stephan Szabo

Date:

23 January 2003, 12:14:20

On Wed, 22 Jan 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > On Wed, 22 Jan 2003, Tom Lane wrote:
> >> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> >>> The filter is applied only to a.  So, if you really wanted the
> >>> c.a=3 condition to be applied for whatever reason you're out of
> >>> luck.
> >>
> >> FWIW, CVS tip is brighter: the condition does propagate to both relations.
>
> > Yeah.  I was going to ask how hard you thought it would be to do for
> > this particular sort of case.  I thought about the simple case of using
> > and realized it'd probably be reasonable in amount of work, but it seems
> > I don't have to think about it. :)
>
> It could still use more eyeballs looking at it.  One thing I'm concerned
> about is whether the extra (derived) conditions lead to double-counting
> restrictivity and thus underestimating the number of result rows.  I
> haven't had time to really test that, but I suspect there may be a problem.

I haven't looked at code yet but tried examples like Tomasz's and some
simple ones and have gotten reasonable seeming output for the estimates
given accurate statistics (joining two estimate 3 outputs, getting 8 for
the estimated rows, joining that with another copy getting 50 some odd
where in this case the real would be 81).  Not that I did  a
particularly thorough test.  I hope to get a chance over the next couple
of days to look and run more tests.

Tomasz, if you have the chance, you might want to try CVS and see what it
does for the queries you've been working with.

Re: To use a VIEW or not to use a View.....

From

Tom Lane

Date:

23 January 2003, 12:39:57

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> On Wed, 22 Jan 2003, Tom Lane wrote:
>> It could still use more eyeballs looking at it.  One thing I'm concerned
>> about is whether the extra (derived) conditions lead to double-counting
>> restrictivity and thus underestimating the number of result rows.  I
>> haven't had time to really test that, but I suspect there may be a problem.

> I haven't looked at code yet but tried examples like Tomasz's and some
> simple ones and have gotten reasonable seeming output for the estimates
> given accurate statistics

I realized this morning that there definitely is a problem.  Consider
this example using the regression database:

regression=# explain analyze select * from tenk1 a join tenk1 b using(ten)
regression-# where ten = 3;                                                     QUERY PLAN

----------------------------------------------------------------------------------------------------------------------Merge
Join (cost=1055.45..2102.12 rows=83006 width=488) (actual time=582.97..65486.57 rows=1000000 loops=1)  Merge Cond:
("outer".ten= "inner".ten)  ->  Sort  (cost=527.73..530.00 rows=910 width=244) (actual time=373.57..382.48 rows=1000
loops=1)       Sort Key: a.ten        ->  Seq Scan on tenk1 a  (cost=0.00..483.00 rows=910 width=244) (actual
time=8.98..330.39rows=1000 loops=1)              Filter: (ten = 3)  ->  Sort  (cost=527.73..530.00 rows=910 width=244)
(actualtime=209.19..8057.64 rows=999001 loops=1)        Sort Key: b.ten        ->  Seq Scan on tenk1 b
(cost=0.00..483.00rows=910 width=244) (actual time=0.40..193.93 rows=1000 loops=1)              Filter: (3 = ten)Total
runtime:73291.01 msec

(11 rows)

The condition "ten=3" will select 1000 rows out of the 10000 in the
table.  But, once we have applied that condition to both sides of the
join, the join condition "a.ten = b.ten" is a no-op --- it will not
reject any pair of rows coming out of the seqscans.  Presently we count
its restrictivity anyway, so the estimated row count at the merge is a
bad underestimate.

Not only should we ignore the join condition for selectivity purposes,
but it's a waste of time for execution as well.  We could have
implemented the above query as a nestloop with no join condition, and
saved the effort of the sort and merge logic.

What I was thinking was that any time the code sees a "var = const"
clause as part of a mergejoin equivalence set, we could mark all the
"var = var" clauses in the same set as no-ops.  For example, given

WHERE a.f1 = b.f2 AND b.f2 = c.f3 AND c.f3 = 42

then after we finish deducing a.f1 = 42 and b.f2 = 42, there is no
longer any value in either of the original clauses a.f1 = b.f2 and
b.f2 = c.f3, nor in the other deduced clause a.f1 = c.f3.  This would
take a little bit of restructuring of generate_implied_equalities() and
process_implied_equality(), but it doesn't seem too difficult to do.

Thoughts?  Are there any holes in that logic?
        regards, tom lane

Re: To use a VIEW or not to use a View.....

From

jasiek@klaster.net

Date:

23 January 2003, 13:06:44

On Thu, Jan 23, 2003 at 08:53:53AM -0800, Stephan Szabo wrote:
> On Wed, 22 Jan 2003, Tom Lane wrote:
> 
> I haven't looked at code yet but tried examples like Tomasz's and some
> simple ones and have gotten reasonable seeming output for the estimates
> given accurate statistics (joining two estimate 3 outputs, getting 8 for
> the estimated rows, joining that with another copy getting 50 some odd
> where in this case the real would be 81).  Not that I did  a
> particularly thorough test.  I hope to get a chance over the next couple
> of days to look and run more tests.
> 
> Tomasz, if you have the chance, you might want to try CVS and see what it
> does for the queries you've been working with.
Not too easy. Currently I have only windows machine with
Postgresql/cygwin. I use dial-up for accessing internet, which isn't
nice to use. I will try this if I find some free computer to install
postgresql/linux ;-)
Anyway I already gave up this kind of query, especially I can't use CVS
as production server (should I?)

Regards,
Tomasz Myrta

Re: To use a VIEW or not to use a View.....

From

Stephan Szabo

Date:

23 January 2003, 14:09:28

On Thu, 23 Jan 2003, Tom Lane wrote:

> regression=# explain analyze select * from tenk1 a join tenk1 b using(ten)
> regression-# where ten = 3;
>                                                       QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=1055.45..2102.12 rows=83006 width=488) (actual time=582.97..65486.57 rows=1000000 loops=1)
>    Merge Cond: ("outer".ten = "inner".ten)
>    ->  Sort  (cost=527.73..530.00 rows=910 width=244) (actual time=373.57..382.48 rows=1000 loops=1)
>          Sort Key: a.ten
>          ->  Seq Scan on tenk1 a  (cost=0.00..483.00 rows=910 width=244) (actual time=8.98..330.39 rows=1000
loops=1)
>                Filter: (ten = 3)
>    ->  Sort  (cost=527.73..530.00 rows=910 width=244) (actual time=209.19..8057.64 rows=999001 loops=1)
>          Sort Key: b.ten
>          ->  Seq Scan on tenk1 b  (cost=0.00..483.00 rows=910 width=244) (actual time=0.40..193.93 rows=1000
loops=1)
>                Filter: (3 = ten)
>  Total runtime: 73291.01 msec
> (11 rows)

Yeah, I see it once I got the estimated selectivity being smaller in the
joins in my test database as well.

> Not only should we ignore the join condition for selectivity purposes,
> but it's a waste of time for execution as well.  We could have
> implemented the above query as a nestloop with no join condition, and
> saved the effort of the sort and merge logic.
>
> What I was thinking was that any time the code sees a "var = const"
> clause as part of a mergejoin equivalence set, we could mark all the
> "var = var" clauses in the same set as no-ops.  For example, given
>
> WHERE a.f1 = b.f2 AND b.f2 = c.f3 AND c.f3 = 42
>
> then after we finish deducing a.f1 = 42 and b.f2 = 42, there is no
> longer any value in either of the original clauses a.f1 = b.f2 and
> b.f2 = c.f3, nor in the other deduced clause a.f1 = c.f3.  This would
> take a little bit of restructuring of generate_implied_equalities() and
> process_implied_equality(), but it doesn't seem too difficult to do.
>
> Thoughts?  Are there any holes in that logic?

The main thing I can think of is being careful when the types are
different (like padding vs no padding in strings).  Playing with text and
char() the explain output appears to be resulting in the right thing
for the clauses but I'm not 100% sure.

Given that it only appears to bring across equality conditions and
not the parts of conditions with or, I think you're right in general.

Re: To use a VIEW or not to use a View.....

From

Tom Lane

Date:

23 January 2003, 18:17:15

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> On Thu, 23 Jan 2003, Tom Lane wrote:
>> What I was thinking was that any time the code sees a "var = const"
>> clause as part of a mergejoin equivalence set, we could mark all the
>> "var = var" clauses in the same set as no-ops.  For example, given
>> WHERE a.f1 = b.f2 AND b.f2 = c.f3 AND c.f3 = 42
>> then after we finish deducing a.f1 = 42 and b.f2 = 42, there is no
>> longer any value in either of the original clauses a.f1 = b.f2 and
>> b.f2 = c.f3, nor in the other deduced clause a.f1 = c.f3.  This would
>> take a little bit of restructuring of generate_implied_equalities() and
>> process_implied_equality(), but it doesn't seem too difficult to do.
>> 
>> Thoughts?  Are there any holes in that logic?

> The main thing I can think of is being careful when the types are
> different (like padding vs no padding in strings).

This is a matter of being careful about marking cross-datatype operators
as mergejoinable.  We do not mark 'bpchar = text' as mergejoinable ---
in fact we don't even have such an operator.  AFAICS any pitfalls in
those semantics come up already from the existing logic to treat
mergejoinable equality as transitive for variables.  Extending that
transitivity to constants can't create problems that wouldn't exist
anyway.

For reference, these are the only cross-datatype mergejoinable operators
as of CVS tip:

regression=# select oid::regoperator,oprcode from pg_operator where oprlsortop!=0 and oprleft!=oprright;          oid
        |  oprcode
 
--------------------------+-----------=(integer,bigint)        | int48eq=(bigint,integer)        |
int84eq=(smallint,integer)     | int24eq=(integer,smallint)      | int42eq=(real,double precision) | float48eq=(double
precision,real)| float84eq=(smallint,bigint)       | int28eq=(bigint,smallint)       | int82eq
 
(8 rows)

        regards, tom lane