Thread: To use a VIEW or not to use a View.....
Dear PostgreSQL users, I have a view and a table, I understand that when a frontend accesses a VIEW that PostgreSQL cannot use a index on that view. For example when I do this: SELECT * FROM full_cablelist WHERE projectocode=5; Correct? Now I just want to make sure for myself if the VIEW I created is the right way to go, or is it better to contruct a SQL in my application that looks like the view and send it to postgreSQL so it will use all indexes correctly. I use postgreSQL 7.2.1 I beliefe there is a change in postgreSQL 7.3.x on which I can cache a view??? Not sure what the issue was. I ask this because I expect pore performance in feature when the cablelist table holds up to around 20.000 rows. Each query to full_cablelist will return around 1200 rows. best regards, Ries van Twisk -- CABLE LIST CREATE TABLE cablelist ( id SERIAL, cableno VARCHAR(8), projectcodeid INTEGER CONSTRAINT cablelist_projectcodes_conNOT NULL REFERENCES projectcodes(id) ON DELETE CASCADE, fromconnid INTEGER CONSTRAINT cablelist_fromconnid_con NOT NULL REFERENCES libconnections(id) ON DELETE CASCADE, toconnid INTEGER CONSTRAINT cablelist_toconnid_con NOT NULL REFERENCES libconnections(id) ON DELETE CASCADE, fromshiplocationid INTEGER CONSTRAINT cablelist_fromshiplocationid_con NOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE, toshiplocationid INTEGER CONSTRAINT cablelist_toshiplocationid_conNOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE, marktypesid INTEGER CONSTRAINT cablelist_tomarktypeid_con NOTNULL REFERENCES marktypes(id) ON DELETE CASCADE, cabletypeid INTEGER CONSTRAINT cablelist_cabletypeid_con NOT NULL REFERENCES cabletypes(id) ON DELETE CASCADE, cut BOOLEAN DEFAULT 'false' NOT NULL, placed BOOLEAN DEFAULT'false' NOT NULL, ok BOOLEAN DEFAULT 'false' ); -- CABLE LIST VIEW CREATE VIEW full_cablelist AS SELECT cl.id, cl.cableno AS cableno, pc.projectcode AS projectcode, pc.id AS projectcodeid, lcf.name AS fconnection, lct.name AS tconnection, lif.name AS fitem, lit.name AS titem, slf.rib AS frib,slt.rib AS trib, slf.name AS fname, slt.name AS tname, ct.cabletype AS cabletype, ct.coretype AS coretype, cl.cut, cl.placed, cl.ok FROM cablelist AS cl, libconnections AS lcf, libconnections AS lct, libitems AS lif, libitems AS lit, shiplocations AS slf, shiplocations AS slt, projectcodes AS pc, cabletypes AS ct WHERE pc.id=cl.projectcodeid AND lcf.id=cl.fromconnid AND lct.id=cl.toconnid AND lif.id=lcf.libitemid AND lit.id=lct.libitemid AND slf.id=cl.fromshiplocationid AND slt.id=cl.toshiplocationid AND ct.id=cl.cabletypeid
On Wed, Jan 22, 2003 at 16:12:52 +0100, Ries van Twisk <ries@jongert.nl> wrote: > Dear PostgreSQL users, > > I understand that when a frontend accesses a VIEW that PostgreSQL cannot use > a index on that view. > For example when I do this: SELECT * FROM full_cablelist WHERE > projectocode=5; Correct? For the most part views work like macros for selects and indexes should be usable. You can test this yourself using EXPLAIN to compare plans both using and not using a view on a table.
Ries van Twisk wrote: > > Dear PostgreSQL users, > > I have a view and a table, > > I understand that when a frontend accesses a VIEW that PostgreSQL cannot use > a index on that view. > For example when I do this: SELECT * FROM full_cablelist WHERE > projectocode=5; Correct? No. > > Now I just want to make sure for myself if the VIEW I created is the right > way to go, or is it better > to contruct a SQL in my application that looks like the view and send it to > postgreSQL so it will > use all indexes correctly. I use postgreSQL 7.2.1 Views in PostgreSQL aren't materialized. They are implemented as query rewrite rules that combine your query with the view definition. This is done before planning and optimizing, so what the query planner is chewing on (the internal parsetree representation of a query) is the same as if your application would have sent down the complicated query over the base tables. There are a few exceptions where an application could construct a better WHERE clause, resulting in a different join order or better scan qualifications. As long as we're not talking about gigabytes here, you shouldn't worry. Use tables, views and views over views, it's all fine and your indexes will be used. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Ries van Twisk wrote: >Dear PostgreSQL users, > >I have a view and a table, > >I understand that when a frontend accesses a VIEW that PostgreSQL cannot use >a index on that view. >For example when I do this: SELECT * FROM full_cablelist WHERE >projectocode=5; Correct? > >Now I just want to make sure for myself if the VIEW I created is the right >way to go, or is it better >to contruct a SQL in my application that looks like the view and send it to >postgreSQL so it will >use all indexes correctly. I use postgreSQL 7.2.1 > >I beliefe there is a change in postgreSQL 7.3.x on which I can cache a >view??? Not sure what the issue was. > >I ask this because I expect pore performance in feature when the cablelist >table holds up to around 20.000 rows. >Each query to full_cablelist will return around 1200 rows. > >best regards, >Ries van Twisk > > > >-- CABLE LIST >CREATE TABLE cablelist ( > id SERIAL, > cableno VARCHAR(8), > projectcodeid INTEGER CONSTRAINT cablelist_projectcodes_con NOT NULL >REFERENCES projectcodes(id) ON DELETE CASCADE, > fromconnid INTEGER CONSTRAINT cablelist_fromconnid_con NOT NULL >REFERENCES libconnections(id) ON DELETE CASCADE, > toconnid INTEGER CONSTRAINT cablelist_toconnid_con NOT NULL REFERENCES >libconnections(id) ON DELETE CASCADE, > fromshiplocationid INTEGER CONSTRAINT cablelist_fromshiplocationid_con >NOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE, > toshiplocationid INTEGER CONSTRAINT cablelist_toshiplocationid_con NOT >NULL REFERENCES shiplocations(id) ON DELETE CASCADE, > marktypesid INTEGER CONSTRAINT cablelist_tomarktypeid_con NOT NULL >REFERENCES marktypes(id) ON DELETE CASCADE, > cabletypeid INTEGER CONSTRAINT cablelist_cabletypeid_con NOT NULL >REFERENCES cabletypes(id) ON DELETE CASCADE, > cut BOOLEAN DEFAULT 'false' NOT NULL, > placed BOOLEAN DEFAULT 'false' NOT NULL, > ok BOOLEAN DEFAULT 'false' >); > > >-- CABLE LIST VIEW >CREATE VIEW full_cablelist AS >SELECT cl.id, >cl.cableno AS cableno, >pc.projectcode AS projectcode, >pc.id AS projectcodeid, >lcf.name AS fconnection, lct.name AS tconnection, >lif.name AS fitem, lit.name AS titem, >slf.rib AS frib,slt.rib AS trib, >slf.name AS fname, slt.name AS tname, >ct.cabletype AS cabletype, ct.coretype AS coretype, >cl.cut, >cl.placed, >cl.ok > >FROM cablelist AS cl, >libconnections AS lcf, libconnections AS lct, >libitems AS lif, libitems AS lit, >shiplocations AS slf, shiplocations AS slt, >projectcodes AS pc, >cabletypes AS ct > >WHERE >pc.id=cl.projectcodeid AND >lcf.id=cl.fromconnid AND lct.id=cl.toconnid AND >lif.id=lcf.libitemid AND lit.id=lct.libitemid AND >slf.id=cl.fromshiplocationid AND slt.id=cl.toshiplocationid AND >ct.id=cl.cabletypeid How can we help you with table indexing, if you didn't write anything about indexes you have already created on your tables? Anyway you don't need indexes on a view, but indexes on your tables. You need also a well constructed view. For your query: - make sure, you have index on projectcodes(projectcode) - if you have many projectcodes and index on cablelist(projectcodeid) - make sure, you did "vacuum analyze" before you test your query. - send result of "explain analyze SELECT * FROM full_cablelist WHERE projectocode=5" to us. Anyway result can't be too fast, because query returns >1000 rows which is rather a big amount of data. Regards, Tomasz Myrta
On Wed, 22 Jan 2003, Ries van Twisk wrote: > Dear PostgreSQL users, > > I have a view and a table, > > I understand that when a frontend accesses a VIEW that PostgreSQL cannot use > a index on that view. > For example when I do this: SELECT * FROM full_cablelist WHERE > projectocode=5; Correct? In general, no. There are some exceptions, for example views using EXCEPT I believe will not push conditions down. In 7.2.x, views using any of the set ops (INTERSECT, UNION, EXCEPT) wouldn't push conditions down. There are a few other such conditions, but for your view, I think this isn't going to be an issue.
Jan Wieck wrote: >Use tables, views and views over views, it's all fine and your indexes >will be used. I can't agree with using views over views. It has some limitations. I asked about it on this list several months ago and Tom Lane's conclusion was: >> Tomasz Myrta <jasiek@klaster.net> writes: >> I'd like to split queries into views, but I can't join them - planner >> search all of records instead of using index. It works very slow. I think this is the same issue that Stephan identified in his response to your other posting ("sub-select with aggregate"). When you writeFROM x join y using (col) WHERE x.col = const the WHERE-restriction is only applied to x. I'm afraid you'll need to writeFROM x join y using (col) WHERE x.col = const AND y.col = const Ideally you should be able to write justFROM x join y using (col) WHERE col = const but I think that will be taken the same as "x.col = const" :-( regards, tom lane I don't know if anything changed on 7.3. Regards, Tomasz Myrta
On Wed, 22 Jan 2003, Tomasz Myrta wrote: > >> Tomasz Myrta <jasiek@klaster.net> writes: > >> I'd like to split queries into views, but I can't join them - planner > >> search all of records instead of using index. It works very slow. > > > I think this is the same issue that Stephan identified in his response > to your other posting ("sub-select with aggregate"). When you write > FROM x join y using (col) WHERE x.col = const > the WHERE-restriction is only applied to x. I'm afraid you'll need > to write > FROM x join y using (col) WHERE x.col = const AND y.col = const > Ideally you should be able to write just > FROM x join y using (col) WHERE col = const > but I think that will be taken the same as "x.col = const" :-( > I don't know if anything changed on 7.3. I don't think so, but this is a general transitivity constraint AFAIK, not one actually to do with views (ie, if you wrote out the query without a view, you can run into the same issue). It's somewhat easier to run into the case with views and the effect may be exasperated by views, but it's a general condition. For example: create table a(a int); create table c(a int); sszabo=# explain select * from a join c using (a) where a=3; QUERY PLAN -------------------------------------------------------------Hash Join (cost=1.01..26.08 rows=6 width=8) Hash Cond: ("outer".a= "inner".a) -> Seq Scan on c (cost=0.00..20.00 rows=1000 width=4) -> Hash (cost=1.01..1.01 rows=1 width=4) -> Seq Scan on a (cost=0.00..1.01 rows=1 width=4) Filter: (a = 3) (6 rows) The filter is applied only to a. So, if you really wanted the c.a=3 condition to be applied for whatever reason you're out of luck.
Tomasz Myrta wrote: > > Jan Wieck wrote: > > >Use tables, views and views over views, it's all fine and your indexes > >will be used. > > I can't agree with using views over views. It has some limitations. > I asked about it on this list several months ago > and Tom Lane's conclusion was: It has less to do with the nesting level of rewriting, than with what you do with the view in general. If you cram up all the functionality with aggregating subselects into one monster view it might do better, except for maintaining the code. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Stephan Szabo wrote: >On Wed, 22 Jan 2003, Tomasz Myrta wrote: > > >>>>Tomasz Myrta writes: >>>>I'd like to split queries into views, but I can't join them - planner >>>>search all of records instead of using index. It works very slow. >> >> >>I think this is the same issue that Stephan identified in his response >>to your other posting ("sub-select with aggregate"). When you write >> FROM x join y using (col) WHERE x.col = const >>the WHERE-restriction is only applied to x. I'm afraid you'll need >>to write >> FROM x join y using (col) WHERE x.col = const AND y.col = const >>Ideally you should be able to write just >> FROM x join y using (col) WHERE col = const >>but I think that will be taken the same as "x.col = const" :-( > > > >>I don't know if anything changed on 7.3. > > >I don't think so, but this is a general transitivity constraint AFAIK, not >one actually to do with views (ie, if you wrote out the query without a >view, you can run into the same issue). It's somewhat easier to run into >the case with views and the effect may be exasperated by views, but it's >a general condition. > >For example: >create table a(a int); >create table c(a int); > >sszabo=# explain select * from a join c using (a) where a=3; > QUERY PLAN >------------------------------------------------------------- > Hash Join (cost=1.01..26.08 rows=6 width=8) > Hash Cond: ("outer".a = "inner".a) > -> Seq Scan on c (cost=0.00..20.00 rows=1000 width=4) > -> Hash (cost=1.01..1.01 rows=1 width=4) > -> Seq Scan on a (cost=0.00..1.01 rows=1 width=4) > Filter: (a = 3) >(6 rows) I don't understand your idea. explain select * from przystanki p join miasta m using (id_miasta) where field_id=100 Both tables are indexed on field id_miasta. They have enough rows to use indexes. Nested Loop (cost=0.00..9.48 rows=1 width=64) -> Index Scan using ind_miasto_przyst on przystanki p (cost=0.00..5.54 rows=1width=41) -> Index Scan using miasta_pkey on miasta m (cost=0.00..3.10 rows=1 width=23) Tomasz Myrta
Stephan Szabo wrote: >On Wed, 22 Jan 2003, Tomasz Myrta wrote: > > >>>>Tomasz Myrta writes: >>>>I'd like to split queries into views, but I can't join them - planner >>>>search all of records instead of using index. It works very slow. >> >> >>I think this is the same issue that Stephan identified in his response >>to your other posting ("sub-select with aggregate"). When you write >> FROM x join y using (col) WHERE x.col = const >>the WHERE-restriction is only applied to x. I'm afraid you'll need >>to write >> FROM x join y using (col) WHERE x.col = const AND y.col = const >>Ideally you should be able to write just >> FROM x join y using (col) WHERE col = const >>but I think that will be taken the same as "x.col = const" :-( > > > >>I don't know if anything changed on 7.3. > > >I don't think so, but this is a general transitivity constraint AFAIK, not >one actually to do with views (ie, if you wrote out the query without a >view, you can run into the same issue). It's somewhat easier to run into >the case with views and the effect may be exasperated by views, but it's >a general condition. > >For example: >create table a(a int); >create table c(a int); > >sszabo=# explain select * from a join c using (a) where a=3; > QUERY PLAN >------------------------------------------------------------- > Hash Join (cost=1.01..26.08 rows=6 width=8) > Hash Cond: ("outer".a = "inner".a) > -> Seq Scan on c (cost=0.00..20.00 rows=1000 width=4) > -> Hash (cost=1.01..1.01 rows=1 width=4) > -> Seq Scan on a (cost=0.00..1.01 rows=1 width=4) > Filter: (a = 3) >(6 rows) > >The filter is applied only to a. So, if you really wanted the >c.a=3 condition to be applied for whatever reason you're out of >luck. Let's make some test: First, let's create some simple view with 2 tables join: drop view pm; create view pm as select id_przystanku, m.nazwa from przystanki p join miasta m using (id_miasta); explain select * from pm where id_przystanku=1230; Nested Loop (cost=0.00..6.26 rows=1 width=23) -> Index Scan using przystanki_pkey on przystanki p (cost=0.00..3.14 rows=1width=8) -> Index Scan using miasta_pkey on miasta m (cost=0.00..3.10 rows=1 width=15) Next, let's try query using this view 2 times with explicit join: explain select * from pm a join pm b using(id_przystanku) where id_przystanku=1230; Hash Join (cost=13.00..30.10 rows=1 width=46) -> Hash Join (cost=6.74..21.02 rows=374 width=23) -> Seq Scan onprzystanki p (cost=0.00..7.74 rows=374 width=8) -> Hash (cost=5.99..5.99 rows=299 width=15) -> SeqScan on miasta m (cost=0.00..5.99 rows=299 width=15) -> Hash (cost=6.26..6.26 rows=1 width=23) -> Nested Loop (cost=0.00..6.26 rows=1 width=23) -> Index Scan using przystanki_pkey on przystanki p (cost=0.00..3.14rows=1 width=8) -> Index Scan using miasta_pkey on miasta m (cost=0.00..3.10 rows=1 width=15) And now similiar view, but without nesting views: drop view pm2; create view pm2 asselect id_przystanku, m1.nazwa as nazwa1, m2.nazwa as nazwa2from przystanki p1 join miasta m1 using(id_miasta) join przystanki p2 using (id_przystanku) join miasta m2 on (m2.id_miasta=p2.id_miasta); explain select * from pm2 where id_przystanku=1230; Nested Loop (cost=0.00..12.52 rows=1 width=46) -> Nested Loop (cost=0.00..9.41 rows=1 width=31) -> Nested Loop (cost=0.00..6.26 rows=1 width=23) -> Index Scan using przystanki_pkey on przystanki p1 (cost=0.00..3.14rows=1 width=8) -> Index Scan using miasta_pkey on miasta m1 (cost=0.00..3.10 rows=1 width=15) -> Index Scan using przystanki_pkey on przystanki p2 (cost=0.00..3.14 rows=1 width=8) -> Index Scan usingmiasta_pkey on miasta m2 (cost=0.00..3.10 rows=1 width=15) Regards, Tomasz Myrta
On Wed, 22 Jan 2003, Tomasz Myrta wrote: > Let's make some test: > > First, let's create some simple view with 2 tables join: > drop view pm; > create view pm as > select > id_przystanku, > m.nazwa > from > przystanki p > join miasta m using (id_miasta); > > explain select * from pm where id_przystanku=1230; > Nested Loop (cost=0.00..6.26 rows=1 width=23) > -> Index Scan using przystanki_pkey on przystanki p (cost=0.00..3.14 rows=1 width=8) > -> Index Scan using miasta_pkey on miasta m (cost=0.00..3.10 rows=1 width=15) > > > Next, let's try query using this view 2 times with explicit join: > explain select * from pm a join pm b using(id_przystanku) where id_przystanku=1230; > Hash Join (cost=13.00..30.10 rows=1 width=46) > -> Hash Join (cost=6.74..21.02 rows=374 width=23) > -> Seq Scan on przystanki p (cost=0.00..7.74 rows=374 width=8) > -> Hash (cost=5.99..5.99 rows=299 width=15) > -> Seq Scan on miasta m (cost=0.00..5.99 rows=299 width=15) > -> Hash (cost=6.26..6.26 rows=1 width=23) > -> Nested Loop (cost=0.00..6.26 rows=1 width=23) > -> Index Scan using przystanki_pkey on przystanki p (cost=0.00..3.14 rows=1 width=8) > -> Index Scan using miasta_pkey on miasta m (cost=0.00..3.10 rows=1 width=15) > > And now similiar view, but without nesting views: > drop view pm2; > create view pm2 as > select > id_przystanku, > m1.nazwa as nazwa1, > m2.nazwa as nazwa2 > from > przystanki p1 > join miasta m1 using (id_miasta) > join przystanki p2 using (id_przystanku) > join miasta m2 on (m2.id_miasta=p2.id_miasta); That's not the same join for optimization purposes since postgresql treats explicit join syntax as a constraint on the ordering of joins. The same join would be something like: przystanki p1 join miasta m1 using (id_miasta) join (przystanki p2 join miasta m2 using (id_miasta))using (id_przystanku) minus the fact I think you'd need some explicit naming in there.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > ... but this is a general transitivity constraint AFAIK, not > one actually to do with views (ie, if you wrote out the query without a > view, you can run into the same issue). It's somewhat easier to run into > the case with views and the effect may be exasperated by views, but it's > a general condition. Right. Views are just macros --- they don't in themselves affect the planner's ability to generate a good plan. But they make it easier to generate baroque queries without thinking much about what you're doing, and in complex queries the planner doesn't always make the deductions and simplifications that are obvious to a human. > For example: > create table a(a int); > create table c(a int); > sszabo=# explain select * from a join c using (a) where a=3; > QUERY PLAN > ------------------------------------------------------------- > Hash Join (cost=1.01..26.08 rows=6 width=8) > Hash Cond: ("outer".a = "inner".a) > -> Seq Scan on c (cost=0.00..20.00 rows=1000 width=4) > -> Hash (cost=1.01..1.01 rows=1 width=4) > -> Seq Scan on a (cost=0.00..1.01 rows=1 width=4) > Filter: (a = 3) > (6 rows) > The filter is applied only to a. So, if you really wanted the > c.a=3 condition to be applied for whatever reason you're out of > luck. FWIW, CVS tip is brighter: the condition does propagate to both relations. Hash Join (cost=22.51..45.04 rows=1 width=8) Hash Cond: ("outer".a = "inner".a) -> Seq Scan on a (cost=0.00..22.50 rows=5width=4) Filter: (a = 3) -> Hash (cost=22.50..22.50 rows=5 width=4) -> Seq Scan on c (cost=0.00..22.50rows=5 width=4) Filter: (3 = a) The reason this is useful is that (a) fewer rows need to be joined, and (b) we may be able to make effective use of indexes on both tables. regards, tom lane
Stephan Szabo wrote: >That's not the same join for optimization purposes >since postgresql treats explicit join syntax as a >constraint on the ordering of joins. > >The same join would be something like: > >przystanki p1 join miasta m1 using (id_miasta) >join (przystanki p2 join miasta m2 using (id_miasta)) > using (id_przystanku) > >minus the fact I think you'd need some explicit naming in >there. You are right. The result of your query is: explain select * from przystanki p1 join miasta m1 using (id_miasta) join (przystanki p2 join miasta m2 using (id_miasta))using (id_przystanku) where id_przystanku=1230 Hash Join (cost=13.00..30.10 rows=1 width=128) -> Hash Join (cost=6.74..21.02 rows=374 width=64) -> Seq Scan onprzystanki p2 (cost=0.00..7.74 rows=374 width=41) -> Hash (cost=5.99..5.99 rows=299 width=23) -> Seq Scan on miasta m2 (cost=0.00..5.99 rows=299 width=23) -> Hash (cost=6.26..6.26 rows=1 width=64) -> Nested Loop (cost=0.00..6.26 rows=1 width=64) -> Index Scan using przystanki_pkey on przystanki p1 (cost=0.00..3.14rows=1 width=41) -> Index Scan using miasta_pkey on miasta m1 (cost=0.00..3.10 rows=1 width=23) Anyway - is it possible to expose table "przystanki alias p2" to get valid result? The problem is similiar to my problem "sub-select with aggregate" dated on 2002-10-23 and the answer (which doesn't satisfy me) is the same: if we pass static values to "przystanki p2 join miasta m2", the query will work ok: explain select * from przystanki p1 join miasta m1 using (id_miasta) cross join (przystanki p2 join miasta m2 using (id_miasta)) X where p1.id_przystanku=1230 and X.id_przystanku=1230 Nested Loop (cost=0.00..12.52 rows=1 width=128) -> Nested Loop (cost=0.00..6.26 rows=1 width=64) -> Index Scanusing przystanki_pkey on przystanki p1 (cost=0.00..3.14 rows=1 width=41) -> Index Scan using miasta_pkey on miastam1 (cost=0.00..3.10 rows=1 width=23) -> Materialize (cost=6.26..6.26 rows=1 width=64) -> Nested Loop (cost=0.00..6.26rows=1 width=64) -> Index Scan using przystanki_pkey on przystanki p2 (cost=0.00..3.14 rows=1width=41) -> Index Scan using miasta_pkey on miasta m2 (cost=0.00..3.10 rows=1 width=23) Stephan - I have some problems with mail relay to you. Does my mail server have any open-relay problem, or something like this (213.25.37.66) ? Regards, Tomasz Myrta
On Wed, 22 Jan 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > The filter is applied only to a. So, if you really wanted the > > c.a=3 condition to be applied for whatever reason you're out of > > luck. > > FWIW, CVS tip is brighter: the condition does propagate to both relations. > > Hash Join (cost=22.51..45.04 rows=1 width=8) > Hash Cond: ("outer".a = "inner".a) > -> Seq Scan on a (cost=0.00..22.50 rows=5 width=4) > Filter: (a = 3) > -> Hash (cost=22.50..22.50 rows=5 width=4) > -> Seq Scan on c (cost=0.00..22.50 rows=5 width=4) > Filter: (3 = a) > > The reason this is useful is that (a) fewer rows need to be joined, > and (b) we may be able to make effective use of indexes on both tables. Yeah. I was going to ask how hard you thought it would be to do for this particular sort of case. I thought about the simple case of using and realized it'd probably be reasonable in amount of work, but it seems I don't have to think about it. :)
On Wed, 22 Jan 2003, Tomasz Myrta wrote: > Stephan Szabo wrote: > > >That's not the same join for optimization purposes > >since postgresql treats explicit join syntax as a > >constraint on the ordering of joins. > > > >The same join would be something like: > > > >przystanki p1 join miasta m1 using (id_miasta) > >join (przystanki p2 join miasta m2 using (id_miasta)) > > using (id_przystanku) > > > >minus the fact I think you'd need some explicit naming in > >there. > > You are right. > > The result of your query is: > explain select * from > przystanki p1 join miasta m1 using (id_miasta) > join (przystanki p2 join miasta m2 using (id_miasta)) > using (id_przystanku) > where id_przystanku=1230 > > Hash Join (cost=13.00..30.10 rows=1 width=128) > -> Hash Join (cost=6.74..21.02 rows=374 width=64) > -> Seq Scan on przystanki p2 (cost=0.00..7.74 rows=374 width=41) > -> Hash (cost=5.99..5.99 rows=299 width=23) > -> Seq Scan on miasta m2 (cost=0.00..5.99 rows=299 width=23) > -> Hash (cost=6.26..6.26 rows=1 width=64) > -> Nested Loop (cost=0.00..6.26 rows=1 width=64) > -> Index Scan using przystanki_pkey on przystanki p1 (cost=0.00..3.14 rows=1 width=41) > -> Index Scan using miasta_pkey on miasta m1 (cost=0.00..3.10 rows=1 width=23) > > Anyway - is it possible to expose table "przystanki alias p2" to get valid result? I think it's possible that the work Tom mentioned in current CVS may make these cases work the way you want. I don't have access to my test machine to try it right now however. > Stephan - I have some problems with mail relay to you. > Does my mail server have any open-relay problem, or something like this (213.25.37.66) ? Doesn't seem to be a simple open relay (but I didn't try anything complicated). It acted a little wierd about email addresses in reacting to my telnet, but I think it was acting correctly as far as standards are concerned.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Wed, 22 Jan 2003, Tom Lane wrote: >> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: >>> The filter is applied only to a. So, if you really wanted the >>> c.a=3 condition to be applied for whatever reason you're out of >>> luck. >> >> FWIW, CVS tip is brighter: the condition does propagate to both relations. > Yeah. I was going to ask how hard you thought it would be to do for > this particular sort of case. I thought about the simple case of using > and realized it'd probably be reasonable in amount of work, but it seems > I don't have to think about it. :) It could still use more eyeballs looking at it. One thing I'm concerned about is whether the extra (derived) conditions lead to double-counting restrictivity and thus underestimating the number of result rows. I haven't had time to really test that, but I suspect there may be a problem. regards, tom lane
First of all I want to thank you for all responses! I was overwhelmed with it :D Below you find the schema I'm currently using and the output of explain. I removed all comments so the mail will be small, the schema is still work in progress. I especially I need to take a look at the indexes. Any hints will be appreciated. best reghards, Ries van Twisk <----------- Here you find the output of the explain again: I cannot yet read the output of explain si I'm not sure if the output looks good or bad. echo "VACUUM ANALYZE; EXPLAIN SELECT * FROM full_cablelist WHERE projectcode=5" | psql testdb > /tmp/explain.txt NOTICE: QUERY PLAN: Hash Join (cost=26.28..39.00 rows=23 width=200) -> Hash Join (cost=24.85..37.17 rows=23 width=182) -> Hash Join (cost=23.43..35.34 rows=23 width=164) -> Seq Scan on libitems lit (cost=0.00..7.39 rows=339 width=27) -> Hash (cost=23.37..23.37 rows=23 width=137) -> Hash Join (cost=11.05..23.37rows=23 width=137) -> Hash Join (cost=9.75..21.67 rows=23 width=120) -> Seq Scan on libitems lif (cost=0.00..7.39 rows=339 width=27) -> Hash (cost=9.69..9.69 rows=23 width=93) -> Hash Join (cost=4.76..9.69 rows=23 width=93) -> Hash Join (cost=3.46..7.99 rows=23 width=76) -> Hash Join (cost=2.42..6.32 rows=69 width=63) -> Seq Scan on cablelist cl (cost=0.00..2.69 rows=69 width=41) -> Hash (cost=2.06..2.06 rows=106 width=22) -> Seq Scan on cabletypes ct (cost=0.00..2.06 rows=106 width=22) -> Hash (cost=1.04..1.04 rows=1 width=13) -> Seq Scan on projectcodes pc (cost=0.00..1.04 rows=1 width=13) -> Hash (cost=1.24..1.24 rows=24 width=17) -> Seq Scan on libconnections lcf (cost=0.00..1.24 rows=24 width=17) -> Hash (cost=1.24..1.24 rows=24 width=17) -> Seq Scan on libconnections lct (cost=0.00..1.24 rows=24 width=17) -> Hash (cost=1.34..1.34 rows=34 width=18) -> Seq Scan on shiplocationsslt (cost=0.00..1.34 rows=34 width=18) -> Hash (cost=1.34..1.34 rows=34 width=18) -> Seq Scan on shiplocations slf (cost=0.00..1.34 rows=34 width=18) <------------------ CREATE FUNCTION ord_fn (text,text) RETURNS text AS ' SELECT (CASE WHEN $1 < $2 THEN $1 || $2 ELSE $2 || $1 END) as t; ' LANGUAGE SQL WITH (iscachable); CREATE FUNCTION plpgsql_call_handler ()RETURNS OPAQUEAS '/usr/lib/postgresql/plpgsql.so' LANGUAGE 'C'; CREATE LANGUAGE 'plpgsql' HANDLER plpgsql_call_handler LANCOMPILER 'PL/pgSQL'; CREATE TABLE cabletypes ( id SERIAL, cabletype VARCHAR(24) NOT NULL CHECK ( length(cabletype) > 1 ), --Naam van de kabel coretype VARCHAR(16) NOT NULL CHECK ( length(coretype) > 1 ) -- Type kabel/aantal aders ); CREATE UNIQUE INDEX cabletypes_idx ON cabletypes (id); CREATE FUNCTION f_check_cabletypes() RETURNS OPAQUE AS ' DECLAREcheck RECORD; BEGINSELECT INTO check * FROM cabletypes WHERE cabletype=NEW.cabletype AND coretype=NEW.coretype LIMIT 1;IF FOUND THEN RAISE EXCEPTION ''[0001] cabletype and coretype combination already exsists in cabletypes!''; END IF;RETURN NEW; END; ' LANGUAGE 'plpgsql'; CREATE TRIGGER tr_cabletypesBEFORE INSERT OR UPDATE ON cabletypes FOR EACH ROWEXECUTE PROCEDURE f_check_cabletypes(); CREATE TABLE marktypes ( id SERIAL, name VARCHAR(24) NOT NULL UNIQUE, -- Naam van de markering color INTEGER NOT NULL -- Eventuele kleur ); CREATE UNIQUE INDEX marktypes_idx ON marktypes (id); CREATE TABLE projectcodes ( id SERIAL, projectcode VARCHAR(16) NOT NULL UNIQUE, -- Project code naam projectname VARCHAR(64) NOT NULL, -- Project uitleg deleted BOOLEAN DEFAULT 'false' NOT NULL ); CREATE UNIQUE INDEX projectcodes_idx ON projectcodes (id); CREATE TABLE libitems ( id SERIAL, projectcodeid INTEGER DEFAULT 0 NOT NULL REFERENCES projectcodes(id)ON DELETE CASCADE, name VARCHAR(32) NOT NULL UNIQUE -- Naam van de item bv boiler ); CREATE UNIQUE INDEX libitems_idx ON libitems(id); CREATE FUNCTION f_check_libitems() RETURNS OPAQUE AS ' DECLAREcheck RECORD; BEGIN-- Update van de name mag welIF TG_OP = ''UPDATE'' THEN IF NEW.projectcodeid = OLD.projectcodeid AND NEW.name = OLD.nameTHEN RETURN NEW; END IF;END IF; -- Controleer of the combinatie projectcode en ribnummer unique isSELECT INTO check * FROM libitems WHERE projectcodeid=new.projectcodeidAND name=new.name LIMIT 1;IF FOUND THEN RAISE EXCEPTION ''[0005] projectcodide and name combination already exsists in shiplocations!''; END IF; RETURN NEW; END; ' LANGUAGE 'plpgsql'; CREATE TRIGGER tr_linitemsBEFORE INSERT OR UPDATE ON libitems FOR EACH ROWEXECUTE PROCEDURE f_check_libitems(); CREATE TABLE libconnections ( id SERIAL, libitemid INTEGER CONSTRAINT libitemid_con NOT NULL REFERENCES libitems(id) ON DELETE CASCADE, name VARCHAR(32), -- Naam van de aansluiting aan een item cableno VARCHAR(8) ); CREATE UNIQUE INDEX libconnections_idx ON libconnections(id); CREATE FUNCTION f_check_libconnections() RETURNS OPAQUE AS ' DECLAREcheck RECORD; BEGIN-- Update van de name mag welIF TG_OP = ''UPDATE'' THEN IF NEW.libitemid = OLD.libitemid AND NEW.name = OLD.nameTHEN RETURN NEW; END IF;END IF; SELECT INTO check * FROM libconnections WHERE libitemid=NEW.libitemid AND name=NEW.name LIMIT 1;IF FOUND THEN RAISE EXCEPTION ''[0002] name and item combination already exsists in libconnections!''; END IF;RETURN NEW; END; ' LANGUAGE 'plpgsql'; CREATE TRIGGER tr_libconnectionsBEFORE INSERT OR UPDATE ON libconnections FOR EACH ROWEXECUTE PROCEDURE f_check_libconnections(); --------------- CREATE TABLE shiplocations ( id SERIAL, projectcodeid INTEGER NOT NULL REFERENCES projectcodes(id) ON DELETE CASCADE, rib SMALLINT DEFAULT 0 NOT NULL CHECK (rib>0), -- rib nummer name VARCHAR(32) NOT NULL, -- Naam van de locatie (bv voorschip, middenschip, achterschip) loc SMALLINT DEFAULT 0 NOT NULL CHECK (loc>=0 AND loc<5) ); CREATE VIEW shiplocationst AS SELECT id, projectcodeid, rib, name, CASE loc WHEN 0 THEN 'ries'when 1 THEN 'ries1'when 2THEN 'ries2'ELSE 'other' END FROM shiplocations; CREATE UNIQUE INDEX shiplocations_idx ON shiplocations(id); CREATE TABLE cablelist ( id SERIAL, cableno VARCHAR(8), projectcodeid INTEGER CONSTRAINT cablelist_projectcodes_conNOT NULL REFERENCES projectcodes(id) ON DELETE CASCADE, fromconnid INTEGER CONSTRAINT cablelist_fromconnid_con NOT NULL REFERENCES libconnections(id) ON DELETE CASCADE, toconnid INTEGER CONSTRAINT cablelist_toconnid_con NOT NULL REFERENCES libconnections(id) ON DELETE CASCADE, fromshiplocationid INTEGER CONSTRAINT cablelist_fromshiplocationid_con NOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE, toshiplocationid INTEGER CONSTRAINT cablelist_toshiplocationid_conNOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE, marktypesid INTEGER CONSTRAINT cablelist_tomarktypeid_con NOTNULL REFERENCES marktypes(id) ON DELETE CASCADE, cabletypeid INTEGER CONSTRAINT cablelist_cabletypeid_con NOT NULL REFERENCES cabletypes(id) ON DELETE CASCADE, cut BOOLEAN DEFAULT 'false' NOT NULL, placed BOOLEAN DEFAULT'false' NOT NULL, ok BOOLEAN DEFAULT 'false' ); CREATE VIEW full_cablelist AS SELECT cl.id, cl.cableno AS cableno, pc.projectcode AS projectcode, pc.id AS projectcodeid, lcf.name AS fconnection, lct.name AS tconnection, lif.name AS fitem, lit.name AS titem, slf.rib AS frib,slt.rib AS trib, slf.name AS fname, slt.name AS tname, ct.cabletype AS cabletype, ct.coretype AS coretype, cl.cut, cl.placed, cl.ok FROM cablelist AS cl, libconnections AS lcf, libconnections AS lct, libitems AS lif, libitems AS lit, shiplocations AS slf, shiplocations AS slt, projectcodes AS pc, cabletypes AS ct WHERE pc.id=cl.projectcodeid AND lcf.id=cl.fromconnid AND lct.id=cl.toconnid AND lif.id=lcf.libitemid AND lit.id=lct.libitemid AND slf.id=cl.fromshiplocationid AND slt.id=cl.toshiplocationid AND ct.id=cl.cabletypeid ; CREATE FUNCTION f_find_free_cableno(INTEGER,VARCHAR(4)) RETURNS VARCHAR(8) AS ' DECLAREpcid ALIAS FOR $1;prefix ALIAS FOR $2; cnmax INTEGER;newcableno INTEGER;CHECK RECORD;cablename VARCHAR(10); BEGINnewcableno=0;SELECT INTO cnmax count(cableno) FROM cablelist WHERE projectcodeid = pcid;WHILE newcableno <= cnmax LOOP newcableno=newcableno+1; -- Grote getallen dan 9998 worden niet toegstaan omdat de lpad functie -- ook een truncate doet IF newcableno >998 THEN RETURN NULL; END IF; -- Controleer op prefix, zo ja gebruik deze IF prefix != '''' THEN cablename = prefix || ''.'' || lpad(newcableno,3, ''0''); ELSE cablename = lpad(newcableno, 3, ''0''); END IF; -- Controleer of dit nummer al bestaad SELECT INTO CHECK * FROM cablelist WHERE projectcodeid = pcid AND cableno=cablename; IF NOT FOUND THEN RETURN cablename; END IF;END LOOP; -- Niets gevonden, hier zouden we normaal gesproken niet-- komen. Dit omdat we <= testen en dis altijd cnmax+1 testenRETURNNULL; -- Alle nummer zijn in gebruik, kies nu een volgt nummer-- newcableno zal ALTIJD <= 998 zijn en dus altijd maar 3 positieinnemennewcableno=newcableno+1; IF prefix != '''' THEN cablename = prefix || ''.'' || lpad(newcableno,3, ''0'');ELSE cablename = lpad(newcableno, 3, ''0'');END IF; RETURN cableno; END; ' LANGUAGE 'plpgsql'; > -----Oorspronkelijk bericht----- > Van: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]Namens Tomasz Myrta > Verzonden: woensdag 22 januari 2003 16:46 > Aan: Ries van Twisk > CC: pgsql-sql@postgresql.org > Onderwerp: Re: [SQL] To use a VIEW or not to use a View..... > > > Ries van Twisk wrote: > > >Dear PostgreSQL users, > > > >I have a view and a table, > > > >I understand that when a frontend accesses a VIEW that > PostgreSQL cannot use > >a index on that view. > >For example when I do this: SELECT * FROM full_cablelist WHERE > >projectocode=5; Correct? > > > >Now I just want to make sure for myself if the VIEW I > created is the right > >way to go, or is it better > >to contruct a SQL in my application that looks like the view > and send it to > >postgreSQL so it will > >use all indexes correctly. I use postgreSQL 7.2.1 > > > >I beliefe there is a change in postgreSQL 7.3.x on which I > can cache a > >view??? Not sure what the issue was. > > > >I ask this because I expect pore performance in feature when > the cablelist > >table holds up to around 20.000 rows. > >Each query to full_cablelist will return around 1200 rows. > > > >best regards, > >Ries van Twisk > > > > > > > >-- CABLE LIST > >CREATE TABLE cablelist ( > > id SERIAL, > > cableno VARCHAR(8), > > projectcodeid INTEGER CONSTRAINT > cablelist_projectcodes_con NOT NULL > >REFERENCES projectcodes(id) ON DELETE CASCADE, > > fromconnid INTEGER CONSTRAINT > cablelist_fromconnid_con NOT NULL > >REFERENCES libconnections(id) ON DELETE CASCADE, > > toconnid INTEGER CONSTRAINT > cablelist_toconnid_con NOT NULL REFERENCES > >libconnections(id) ON DELETE CASCADE, > > fromshiplocationid INTEGER CONSTRAINT > cablelist_fromshiplocationid_con > >NOT NULL REFERENCES shiplocations(id) ON DELETE CASCADE, > > toshiplocationid INTEGER CONSTRAINT > cablelist_toshiplocationid_con NOT > >NULL REFERENCES shiplocations(id) ON DELETE CASCADE, > > marktypesid INTEGER CONSTRAINT > cablelist_tomarktypeid_con NOT NULL > >REFERENCES marktypes(id) ON DELETE CASCADE, > > cabletypeid INTEGER CONSTRAINT > cablelist_cabletypeid_con NOT NULL > >REFERENCES cabletypes(id) ON DELETE CASCADE, > > cut BOOLEAN DEFAULT 'false' NOT NULL, > > placed BOOLEAN DEFAULT 'false' NOT NULL, > > ok BOOLEAN DEFAULT 'false' > >); > > > > > >-- CABLE LIST VIEW > >CREATE VIEW full_cablelist AS > >SELECT cl.id, > >cl.cableno AS cableno, > >pc.projectcode AS projectcode, > >pc.id AS projectcodeid, > >lcf.name AS fconnection, lct.name AS tconnection, > >lif.name AS fitem, lit.name AS titem, > >slf.rib AS frib,slt.rib AS trib, > >slf.name AS fname, slt.name AS tname, > >ct.cabletype AS cabletype, ct.coretype AS coretype, > >cl.cut, > >cl.placed, > >cl.ok > > > >FROM cablelist AS cl, > >libconnections AS lcf, libconnections AS lct, > >libitems AS lif, libitems AS lit, > >shiplocations AS slf, shiplocations AS slt, > >projectcodes AS pc, > >cabletypes AS ct > > > >WHERE > >pc.id=cl.projectcodeid AND > >lcf.id=cl.fromconnid AND lct.id=cl.toconnid AND > >lif.id=lcf.libitemid AND lit.id=lct.libitemid AND > >slf.id=cl.fromshiplocationid AND slt.id=cl.toshiplocationid AND > >ct.id=cl.cabletypeid > > How can we help you with table indexing, if you didn't write anything > about indexes you have already created on your tables? Anyway > you don't need > indexes on a view, but indexes on your tables. You need also > a well constructed > view. > > For your query: > - make sure, you have index on projectcodes(projectcode) - if > you have many projectcodes > and index on cablelist(projectcodeid) > - make sure, you did "vacuum analyze" before you test your query. > - send result of "explain analyze SELECT * FROM full_cablelist WHERE > projectocode=5" to us. > > Anyway result can't be too fast, because query returns >1000 > rows which is rather > a big amount of data. > > Regards, > Tomasz Myrta > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to > majordomo@postgresql.org) >
Ries van Twisk wrote: > First of all I want to thank you for all responses! I was overwhelmed with > it :D > > Below you find the schema I'm currently using and the output of explain. I > removed all comments so the mail will be small, the schema is still work in > progress. I especially I need to take a look at the indexes. Any hints will > be appreciated. > > best reghards, > Ries van Twisk > > > <----------- > Here you find the output of the explain again: > I cannot yet read the output of explain si I'm not sure if the output looks > good or bad. > > It looks like your cablelist table doesn't contain too many records, so result is inacurate. Postgresql doesn't use indexes if you have too little rows. First look on your explain is ok, your query should work fine if tables are well indexed. Make additional tests with tables containing more rows, "explain analyze" helps a bit, because it shows real times. Tomasz Myrta > echo "VACUUM ANALYZE; EXPLAIN SELECT * FROM full_cablelist WHERE > projectcode=5" | psql testdb > /tmp/explain.txt > > NOTICE: QUERY PLAN: > > Hash Join (cost=26.28..39.00 rows=23 width=200) > -> Hash Join (cost=24.85..37.17 rows=23 width=182) > -> Hash Join (cost=23.43..35.34 rows=23 width=164) > -> Seq Scan on libitems lit (cost=0.00..7.39 rows=339 > width=27) > -> Hash (cost=23.37..23.37 rows=23 width=137) > -> Hash Join (cost=11.05..23.37 rows=23 width=137) > -> Hash Join (cost=9.75..21.67 rows=23 > width=120) > -> Seq Scan on libitems lif > (cost=0.00..7.39 rows=339 width=27) > -> Hash (cost=9.69..9.69 rows=23 width=93) > -> Hash Join (cost=4.76..9.69 > rows=23 width=93) > -> Hash Join (cost=3.46..7.99 > rows=23 width=76) > -> Hash Join > (cost=2.42..6.32 rows=69 width=63) > -> Seq Scan on > cablelist cl (cost=0.00..2.69 rows=69 width=41) > -> Hash > (cost=2.06..2.06 rows=106 width=22) > -> Seq Scan > on cabletypes ct (cost=0.00..2.06 rows=106 width=22) > -> Hash (cost=1.04..1.04 > rows=1 width=13) > -> Seq Scan on > projectcodes pc (cost=0.00..1.04 rows=1 width=13) > -> Hash (cost=1.24..1.24 > rows=24 width=17) > -> Seq Scan on > libconnections lcf (cost=0.00..1.24 rows=24 width=17) > -> Hash (cost=1.24..1.24 rows=24 width=17) > -> Seq Scan on libconnections lct > (cost=0.00..1.24 rows=24 width=17) > -> Hash (cost=1.34..1.34 rows=34 width=18) > -> Seq Scan on shiplocations slt (cost=0.00..1.34 rows=34 > width=18) > -> Hash (cost=1.34..1.34 rows=34 width=18) > -> Seq Scan on shiplocations slf (cost=0.00..1.34 rows=34 > width=18)
On Wed, 22 Jan 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > On Wed, 22 Jan 2003, Tom Lane wrote: > >> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > >>> The filter is applied only to a. So, if you really wanted the > >>> c.a=3 condition to be applied for whatever reason you're out of > >>> luck. > >> > >> FWIW, CVS tip is brighter: the condition does propagate to both relations. > > > Yeah. I was going to ask how hard you thought it would be to do for > > this particular sort of case. I thought about the simple case of using > > and realized it'd probably be reasonable in amount of work, but it seems > > I don't have to think about it. :) > > It could still use more eyeballs looking at it. One thing I'm concerned > about is whether the extra (derived) conditions lead to double-counting > restrictivity and thus underestimating the number of result rows. I > haven't had time to really test that, but I suspect there may be a problem. I haven't looked at code yet but tried examples like Tomasz's and some simple ones and have gotten reasonable seeming output for the estimates given accurate statistics (joining two estimate 3 outputs, getting 8 for the estimated rows, joining that with another copy getting 50 some odd where in this case the real would be 81). Not that I did a particularly thorough test. I hope to get a chance over the next couple of days to look and run more tests. Tomasz, if you have the chance, you might want to try CVS and see what it does for the queries you've been working with.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Wed, 22 Jan 2003, Tom Lane wrote: >> It could still use more eyeballs looking at it. One thing I'm concerned >> about is whether the extra (derived) conditions lead to double-counting >> restrictivity and thus underestimating the number of result rows. I >> haven't had time to really test that, but I suspect there may be a problem. > I haven't looked at code yet but tried examples like Tomasz's and some > simple ones and have gotten reasonable seeming output for the estimates > given accurate statistics I realized this morning that there definitely is a problem. Consider this example using the regression database: regression=# explain analyze select * from tenk1 a join tenk1 b using(ten) regression-# where ten = 3; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------Merge Join (cost=1055.45..2102.12 rows=83006 width=488) (actual time=582.97..65486.57 rows=1000000 loops=1) Merge Cond: ("outer".ten= "inner".ten) -> Sort (cost=527.73..530.00 rows=910 width=244) (actual time=373.57..382.48 rows=1000 loops=1) Sort Key: a.ten -> Seq Scan on tenk1 a (cost=0.00..483.00 rows=910 width=244) (actual time=8.98..330.39rows=1000 loops=1) Filter: (ten = 3) -> Sort (cost=527.73..530.00 rows=910 width=244) (actualtime=209.19..8057.64 rows=999001 loops=1) Sort Key: b.ten -> Seq Scan on tenk1 b (cost=0.00..483.00rows=910 width=244) (actual time=0.40..193.93 rows=1000 loops=1) Filter: (3 = ten)Total runtime:73291.01 msec (11 rows) The condition "ten=3" will select 1000 rows out of the 10000 in the table. But, once we have applied that condition to both sides of the join, the join condition "a.ten = b.ten" is a no-op --- it will not reject any pair of rows coming out of the seqscans. Presently we count its restrictivity anyway, so the estimated row count at the merge is a bad underestimate. Not only should we ignore the join condition for selectivity purposes, but it's a waste of time for execution as well. We could have implemented the above query as a nestloop with no join condition, and saved the effort of the sort and merge logic. What I was thinking was that any time the code sees a "var = const" clause as part of a mergejoin equivalence set, we could mark all the "var = var" clauses in the same set as no-ops. For example, given WHERE a.f1 = b.f2 AND b.f2 = c.f3 AND c.f3 = 42 then after we finish deducing a.f1 = 42 and b.f2 = 42, there is no longer any value in either of the original clauses a.f1 = b.f2 and b.f2 = c.f3, nor in the other deduced clause a.f1 = c.f3. This would take a little bit of restructuring of generate_implied_equalities() and process_implied_equality(), but it doesn't seem too difficult to do. Thoughts? Are there any holes in that logic? regards, tom lane
On Thu, Jan 23, 2003 at 08:53:53AM -0800, Stephan Szabo wrote: > On Wed, 22 Jan 2003, Tom Lane wrote: > > I haven't looked at code yet but tried examples like Tomasz's and some > simple ones and have gotten reasonable seeming output for the estimates > given accurate statistics (joining two estimate 3 outputs, getting 8 for > the estimated rows, joining that with another copy getting 50 some odd > where in this case the real would be 81). Not that I did a > particularly thorough test. I hope to get a chance over the next couple > of days to look and run more tests. > > Tomasz, if you have the chance, you might want to try CVS and see what it > does for the queries you've been working with. Not too easy. Currently I have only windows machine with Postgresql/cygwin. I use dial-up for accessing internet, which isn't nice to use. I will try this if I find some free computer to install postgresql/linux ;-) Anyway I already gave up this kind of query, especially I can't use CVS as production server (should I?) Regards, Tomasz Myrta
On Thu, 23 Jan 2003, Tom Lane wrote: > regression=# explain analyze select * from tenk1 a join tenk1 b using(ten) > regression-# where ten = 3; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------- > Merge Join (cost=1055.45..2102.12 rows=83006 width=488) (actual time=582.97..65486.57 rows=1000000 loops=1) > Merge Cond: ("outer".ten = "inner".ten) > -> Sort (cost=527.73..530.00 rows=910 width=244) (actual time=373.57..382.48 rows=1000 loops=1) > Sort Key: a.ten > -> Seq Scan on tenk1 a (cost=0.00..483.00 rows=910 width=244) (actual time=8.98..330.39 rows=1000 loops=1) > Filter: (ten = 3) > -> Sort (cost=527.73..530.00 rows=910 width=244) (actual time=209.19..8057.64 rows=999001 loops=1) > Sort Key: b.ten > -> Seq Scan on tenk1 b (cost=0.00..483.00 rows=910 width=244) (actual time=0.40..193.93 rows=1000 loops=1) > Filter: (3 = ten) > Total runtime: 73291.01 msec > (11 rows) Yeah, I see it once I got the estimated selectivity being smaller in the joins in my test database as well. > Not only should we ignore the join condition for selectivity purposes, > but it's a waste of time for execution as well. We could have > implemented the above query as a nestloop with no join condition, and > saved the effort of the sort and merge logic. > > What I was thinking was that any time the code sees a "var = const" > clause as part of a mergejoin equivalence set, we could mark all the > "var = var" clauses in the same set as no-ops. For example, given > > WHERE a.f1 = b.f2 AND b.f2 = c.f3 AND c.f3 = 42 > > then after we finish deducing a.f1 = 42 and b.f2 = 42, there is no > longer any value in either of the original clauses a.f1 = b.f2 and > b.f2 = c.f3, nor in the other deduced clause a.f1 = c.f3. This would > take a little bit of restructuring of generate_implied_equalities() and > process_implied_equality(), but it doesn't seem too difficult to do. > > Thoughts? Are there any holes in that logic? The main thing I can think of is being careful when the types are different (like padding vs no padding in strings). Playing with text and char() the explain output appears to be resulting in the right thing for the clauses but I'm not 100% sure. Given that it only appears to bring across equality conditions and not the parts of conditions with or, I think you're right in general.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Thu, 23 Jan 2003, Tom Lane wrote: >> What I was thinking was that any time the code sees a "var = const" >> clause as part of a mergejoin equivalence set, we could mark all the >> "var = var" clauses in the same set as no-ops. For example, given >> WHERE a.f1 = b.f2 AND b.f2 = c.f3 AND c.f3 = 42 >> then after we finish deducing a.f1 = 42 and b.f2 = 42, there is no >> longer any value in either of the original clauses a.f1 = b.f2 and >> b.f2 = c.f3, nor in the other deduced clause a.f1 = c.f3. This would >> take a little bit of restructuring of generate_implied_equalities() and >> process_implied_equality(), but it doesn't seem too difficult to do. >> >> Thoughts? Are there any holes in that logic? > The main thing I can think of is being careful when the types are > different (like padding vs no padding in strings). This is a matter of being careful about marking cross-datatype operators as mergejoinable. We do not mark 'bpchar = text' as mergejoinable --- in fact we don't even have such an operator. AFAICS any pitfalls in those semantics come up already from the existing logic to treat mergejoinable equality as transitive for variables. Extending that transitivity to constants can't create problems that wouldn't exist anyway. For reference, these are the only cross-datatype mergejoinable operators as of CVS tip: regression=# select oid::regoperator,oprcode from pg_operator where oprlsortop!=0 and oprleft!=oprright; oid | oprcode --------------------------+-----------=(integer,bigint) | int48eq=(bigint,integer) | int84eq=(smallint,integer) | int24eq=(integer,smallint) | int42eq=(real,double precision) | float48eq=(double precision,real)| float84eq=(smallint,bigint) | int28eq=(bigint,smallint) | int82eq (8 rows) regards, tom lane