Thread: Creating Index
Hi! CREATE TABLE table1 ( d DATE PRIMARY KEY, amount INTEGER ); CREATE TABLE table2 ( PRIMARY KEY (y,m), y INTEGER, m INTEGER amount INTEGER ); CREATE VIEW view1 AS SELECT EXTRACT(YEAR FROM d) AS year, EXTRACT(MONTH FROM d) AS month, amount UNION ALL SELECT * from table2; Table1 contains 9000 rows and table2 contains 0 row. This query, which takes 13489 msec, is extremely slow as pgsql sequentially scans all rows in table1: EXPLAIN ANALYZE SELECT COUNT(*) FROM view1; I am in the impression that building an index on column d surely will help improve the performance but I am not smart enough to apply its usage explained in the manual. I would much appreciate if anyone could show me how to build that index something similar to (I guess) the following query (which is illegal of course): CREATE INDEX index1 ON table1 EXTRACT(YEAR FROM d) || EXTRACT(MONTH FROM d); TIA CN -- http://www.fastmail.fm - Faster than the air-speed velocity of an unladen european swallow
CN writes: > Table1 contains 9000 rows and table2 contains 0 row. This query, which > takes 13489 msec, is extremely slow as pgsql sequentially scans all rows > in table1: > > EXPLAIN ANALYZE SELECT COUNT(*) FROM view1; Unqualified count() cannot use an index because it has to visit all the rows in the table. Then again, I don't quite believe that visiting 9000 rows takes 13 seconds. Can you show us the result of EXPLAIN ANALYZE and your real table and view definitions, because the ones you showed contained a few syntax errors. -- Peter Eisentraut peter_e@gmx.net
Peter, Thanks a lot! > Unqualified count() cannot use an index because it has to visit all the > rows in the table. It is only for my test. In my real practice, queries like "SELECT * FROM view1 WHERE year = 2003 AND month BETWEEN 10 AND 12" will be performed. > Then again, I don't quite believe that visiting 9000 > rows takes 13 seconds. Can you show us the result of EXPLAIN ANALYZE and > your real table and view definitions, because the ones you showed > contained a few syntax errors. Sure. I did not post the real script as I did not want to make readers feel headache :-). The following view is, again, a simplified version. The real version, which takes 13 seconds, joins 2 more tables. -------------------------------------------- CREATE TABLE table1 ( PRIMARY KEY (f0,f1), f0 VARCHAR(20), f1 VARCHAR(20), f2 DATE, f3 DATE, f4 "char", f5 VARCHAR(30) )WITHOUT OIDS; CREATE INDEX itable1f2 ON table1 (f2); -------------------------------------------- CREATE TABLE table2 ( PRIMARY KEY (f0,f1,f2), f0 VARCHAR(20), f1 VARCHAR(20), f2 INTEGER, f3 VARCHAR(20), f4 "char", f5 CHAR(3), f6 NUMERIC, f7 NUMERIC, f8 VARCHAR(20), f9 "char", f10 VARCHAR(80), f11 VARCHAR(20) )WITHOUT OIDS; -------------------------------------------- CREATE TABLE table3 ( PRIMARY KEY (f0,f1,f2,f3,f4), f0 VARCHAR(20), f1 INTEGER, f2 VARCHAR(20), f3 VARCHAR(20), f4 INTEGER, f5 INTEGER )WITHOUT OIDS; -------------------------------------------- CREATE OR REPLACE VIEW view1 AS SELECT table1.f0 AS company ,FALSE AS IsBudget ,EXTRACT(YEAR FROM table1.f2) AS year ,EXTRACT(MONTH FROM table1.f2) AS month,table2.f8 AS department ,table2.f3 AS account ,table2.f7 AS amount FROM table1,table2 WHERE table2.f0=table1.f0 AND table2.f1=table1.f1 UNION ALL SELECT f0,TRUE,f1,f4,f3,f2,f5 FROM table3; -------------------------------------------- -------------------------------------------- db1=# \d table1 Table "public.table1"Column | Type | Modifiers --------+-----------------------+-----------f0 | character varying(20) | not nullf1 | character varying(20) | notnullf2 | date | f3 | date | f4 | "char" | f5 | charactervarying(30) | Indexes: table1_pkey primary key btree (f0, f1), itable1f2 btree (f2) db1=# \d table2 Table "public.table2"Column | Type | Modifiers --------+-----------------------+-----------f0 | character varying(20) | not nullf1 | character varying(20) | notnullf2 | integer | not nullf3 | character varying(20) | f4 | "char" | f5 | character(3) | f6 | numeric | f7 | numeric | f8 | character varying(20)| f9 | "char" | f10 | character varying(80) | f11 | character varying(20) | Indexes: table2_pkey primary key btree (f0, f1, f2) db1=# \d table3 Table "public.table3"Column | Type | Modifiers --------+-----------------------+-----------f0 | character varying(20) | not nullf1 | integer | notnullf2 | character varying(20) | not nullf3 | character varying(20) | not nullf4 | integer |not nullf5 | integer | Indexes: table3_pkey primary key btree (f0, f1, f2, f3, f4) db1=# \d view1 View "public.view1" Column | Type | Modifiers ------------+-------------------+-----------company | character varying | isbudget | boolean | year | double precision | month | double precision | department | character varying | account | character varying |amount | numeric | View definition: ((SELECT table1.f0 AS company, false AS isbudget, date_part('year'::text, table1.f2) AS "year", date_part('month'::text, table1.f2) AS "month", table2.f8 AS department, table2.f3 AS account, table2.f7 AS amount FROM table1, table2 WHERE ((table2.f0 = table1.f0) AND (table2.f1 = table1.f1))) UNION ALL (SELECT table3.f0 AS company, true AS isbudget, table3.f1 AS "year", table3.f4 AS "month", table3.f3 AS department, table3.f2 AS account, table3.f5 AS amount FROM table3)); db1=# EXPLAIN ANALYZE SELECT COUNT(*) FROM view1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=131.94..131.94 rows=1 width=324) (actualtime=5025.00..5025.01 rows=1 loops=1) -> Subquery Scan view1 (cost=0.00..129.38rows=1025 width=324) (actual time=6.14..4862.74 rows=28482 loops=1) -> Append (cost=0.00..129.38 rows=1025 width=324) (actual time=6.13..4677.45 rows=28482 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..109.38 rows=25 width=324) (actual time=6.12..4571.25 rows=28482 loops=1) -> Merge Join (cost=0.00..109.38 rows=25 width=324)(actual time=6.09..4192.39 rows=28482 loops=1) Merge Cond: (("outer".f0 = "inner".f0) AND ("outer".f1= "inner".f1)) -> Index Scan using table1_pkey on table1 (cost=0.00..52.00rows=1000 width=100) (actual time=0.69..220.87 rows=9428 loops=1) -> Index Scan using table2_pkey on table2 (cost=0.00..52.00 rows=1000 width=224) (actual time=0.63..959.95 rows=28482 loops=1) -> Subquery Scan "*SELECT* 2" (cost=0.00..20.00 rows=1000 width=156) (actual time=0.02..0.02 rows=0 loops=1) -> Seq Scan on table3 (cost=0.00..20.00 rows=1000 width=156) (actualtime=0.01..0.01 rows=0 loops=1)Total runtime: 5025.73 msec (11 rows) db1=# EXPLAIN ANALYZE SELECT COUNT(*) FROM table1; QUERY PLAN ------------------------------------------------------------------------------------------------------------Aggregate (cost=22.50..22.50rows=1 width=0) (actualtime=116.90..116.91 rows=1 loops=1) -> Seq Scan on table1 (cost=0.00..20.00 rows=1000width=0) (actual time=0.22..76.37 rows=9429 loops=1)Total runtime: 117.10 msec (3 rows) -- http://www.fastmail.fm - A fast, anti-spam email service.
"CN" <cnliou9@fastmail.fm> writes: > The following view is, again, a simplified version. The real version, > which takes 13 seconds, joins 2 more tables. You're really doing your best to make sure we don't figure out what's going on :-( One thing I can see from your EXPLAIN ANALYZE results, though, is that you've never VACUUMed or ANALYZEd these tables. If you had, there'd be something other than the default 1000-row table size estimates: > -> Index Scan using table1_pkey on table1 > (cost=0.00..52.00 rows=1000 width=100) > (actual time=0.69..220.87 rows=9428 loops=1) > -> Index Scan using table2_pkey on table2 > (cost=0.00..52.00 rows=1000 width=224) > (actual time=0.63..959.95 rows=28482 loops=1) and possibly the planner would have picked a more appropriate plan. regards, tom lane
> -> Seq Scan on table1 (cost=0.00..20.00 rows=1000 width=0) (actual Run VACUUM ANALYZE, then repost your EXPLAIN ANALYZE results please.
> > -> Seq Scan on table1 (cost=0.00..20.00 rows=1000 width=0) (actual > > Run VACUUM ANALYZE, then repost your EXPLAIN ANALYZE results please. QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------Aggregate (cost=1858.09..1858.09 rows=1 width=156) (actualtime=5089.34..5089.34 rows=1 loops=1) -> Subquery Scan view1 (cost=187.86..1788.14rows=27980 width=156) (actual time=187.74..4952.09 rows=28482 loops=1) -> Append (cost=187.86..1788.14 rows=27980 width=156) (actual time=187.72..4787.18 rows=28482 loops=1) -> Subquery Scan "*SELECT* 1" (cost=187.86..1788.14 rows=27979 width=69) (actual time=187.72..4687.71 rows=28482 loops=1) -> Hash Join (cost=187.86..1788.14 rows=27979 width=69) (actual time=187.68..4332.30 rows=28482 loops=1) Hash Cond: ("outer".f1 = "inner".f1) Join Filter: ("outer".f0= "inner".f0) -> Seq Scan on table2 (cost=0.00..745.82 rows=28482width=47) (actual time=0.27..547.90 rows=28482 loops=1) -> Hash (cost=164.29..164.29 rows=9429 width=22) (actual time=165.17..165.17 rows=0 loops=1) -> Seq Scan on table1 (cost=0.00..164.29rows=9429 width=22) (actual time=0.23..89.18 rows=9429 loops=1) -> Subquery Scan "*SELECT* 2" (cost=0.00..0.00 rows=1 width=156)(actual time=0.03..0.03 rows=0 loops=1) -> Seq Scan on table3 (cost=0.00..0.00 rows=1 width=156) (actualtime=0.01..0.01 rows=0 loops=1)Total runtime: 5114.47 msec (13 rows) Thanks again! Gurus. Regards, CN -- http://www.fastmail.fm - The professional email service
On Wed, 1 Oct 2003, CN wrote: > Peter, Thanks a lot! > > > Unqualified count() cannot use an index because it has to visit all the > > rows in the table. > > It is only for my test. In my real practice, queries like > "SELECT * FROM view1 WHERE year = 2003 AND month BETWEEN 10 AND 12" > will be performed. You do realize that extract returns a double precision value not an integer, and it's probably not going to be willing to push clauses down through the union where the types are different .
> You do realize that extract returns a double precision value not an > integer, and it's probably not going to be willing to push clauses down > through the union where the types are different . > Argh! I didn't noticed that. Thanks for the reminder. Let's do not consider table2 and view1 for this moment and focus only on table1. Table1 in my original post was incorrect. Please forgive me! (I posted it midnight when my head was not clear and tried to make my case simple for understanding.) The correct one is: CREATE TABLE table1 ( id VARCHAR(20) PRIMARY KEY, d DATE, amount INTEGER ); CREATE INDEX itable1 ON table1 (d); EXPLAIN ANALYZE SELECT COUNT(*) FROM table1 WHERE EXTRACT(YEAR FROM d) >= 2001.0 AND EXTRACT(MONTH FROM d) >= 1.; takes 630 msec on my AMD 450MHz machine. While EXPLAIN ANALYZE SELECT COUNT(*) FROM table1 WHERE d >= '2001-1-1'; takes only 114 msec. ------------------Aggregate (cost=535.20..535.20 rows=1 width=0) (actualtime=625.10..625.11 rows=1 loops=1) -> Seq Scanon table1 (cost=0.00..532.58 rows=1048 width=0) (actual time=14.84..605.85 rows=3603 loops=1) Filter: ((date_part('year'::text,f2) > 2001::double precision) AND (date_part('month'::text, f2) >= 1::double precision))Total runtime: 626.61 msec -----------------------Aggregate (cost=464.12..464.12 rows=1 width=0) (actualtime=114.28..114.28 rows=1 loops=1) -> SeqScan on table1 (cost=0.00..461.86 rows=902 width=0) (actual time=10.71..102.99 rows=3603 loops=1) Filter: (f2>= '2002-01-01'::date)Total runtime: 114.50 msec Does the first query perform sequential scan? If a composit index (year,month) derived from column "d" helps and is available, then someone please show me how to build that index like: CREATE INDEX i1 ON table1 <EXTRACT(YEAR FROM d)::TEXT || EXTRACT(MONTH FROM d)::TEXT> Is creating a function that eats DATE as argument to build that index my only solution? Best Regards, CN -- http://www.fastmail.fm - The professional email service
"CN" <cnliou9@fastmail.fm> writes: > Is creating a function that eats DATE as argument to build that index my > only solution? You haven't really explained what your problem is. In general I'd not guess that "year >= X and month >= Y" is a useful operation if Y is different from 1. You'd be selecting a very discontinuous set of dates ... regards, tom lane
On Wed, 1 Oct 2003, CN wrote: > > You do realize that extract returns a double precision value not an > > integer, and it's probably not going to be willing to push clauses down > > through the union where the types are different . > > > > Argh! I didn't noticed that. Thanks for the reminder. > > Let's do not consider table2 and view1 for this moment and focus only on > table1. > Table1 in my original post was incorrect. Please forgive me! (I posted it > midnight when my head was not clear and tried to make my case simple for > understanding.) The correct one is: > > CREATE TABLE table1 > ( id VARCHAR(20) PRIMARY KEY, > d DATE, > amount INTEGER > ); > CREATE INDEX itable1 ON table1 (d); > > EXPLAIN ANALYZE SELECT COUNT(*) FROM table1 WHERE EXTRACT(YEAR FROM d) >= > 2001.0 AND EXTRACT(MONTH FROM d) >= 1.; > > takes 630 msec on my AMD 450MHz machine. While > > EXPLAIN ANALYZE SELECT COUNT(*) FROM table1 WHERE d >= '2001-1-1'; These two queries seem fairly equivalent, but WHERE EXTRACT(YEAR FROM d) >=2001 AND EXTRACT(MONTH FROM d)>=2is not equivalent to WHERE d>='2001-2-1' Are you trying to get certain months in a group of years or all months after a given fixed time point? If the former, only the former form in general will work, if the latter the former form really doesn't work at all with the exception of the case where you're doing month>=1 (which might as well mean you don't do a month test at all). > Aggregate (cost=535.20..535.20 rows=1 width=0) (actual > time=625.10..625.11 rows=1 loops=1) > -> Seq Scan on table1 (cost=0.00..532.58 rows=1048 width=0) (actual > time=14.84..605.85 rows=3603 loops=1) > Filter: ((date_part('year'::text, f2) > 2001::double precision) > AND (date_part('month'::text, f2) >= > 1::double precision)) > Total runtime: 626.61 msec > > ----------------------- > Aggregate (cost=464.12..464.12 rows=1 width=0) (actual > time=114.28..114.28 rows=1 loops=1) > -> Seq Scan on table1 (cost=0.00..461.86 rows=902 width=0) (actual > time=10.71..102.99 rows=3603 loops=1) > Filter: (f2 >= '2002-01-01'::date) > Total runtime: 114.50 msec > > Does the first query perform sequential scan? They both are. I'd have to guess that most of the real cost is coming from evaluating the conditions, which seems wierd. > If a composit index (year,month) derived from column "d" helps and is > available, then someone please show me how to build that index like: > > CREATE INDEX i1 ON table1 <EXTRACT(YEAR FROM d)::TEXT || EXTRACT(MONTH > FROM d)::TEXT> > > Is creating a function that eats DATE as argument to build that index my > only solution? For 7.3 and earlier, yes I think so and you'd have to use that form in the query. However that wouldn't help for the union query. In 7.4, you can make an index on table((extract(year from d)), (extract(month from d))) and I believe once the type issues were resolved that would get used. However, I think the evaluations of the extracts would mean that it'd still probably lose to a comparison on date (unless you want the fraction of a set of years solution).
Stephan and Tom, Many thanks! I am trying to explain the meaning of tables and views: The tables in the first SELECT in the UNION of view1 are jorunal, whose rows are entered daily. The table in the second SELECT in the UNION of view1 is budget, whose rows use year+month as key. View1 unions journal and budget to present users a pseudo table that can be selected to produce comparison reports - budget vs. actual. User will be prompted to enter the interested year+month before selecting the view. I think I have found out a way to make the index from year and month. Although this approach produces reduntant data but it should be feasible: - Create 2 more columns, year and month for table2. - Feed NEW.c3 and NEW.f4 with values EXTRACT(YEAR FROM c2) and EXTRACT(MONTH FROM c2), respectively in plpgsql trigger function. - Create composit index for year+month. There is still one thing I don't know why - query on view1 being extermely slow. I also removed the subquery from view1 to form view2. The query on view2 is swift. (postgresql v7.3.2) Both queries return 0 rows. The results are correct because table2 contains data of years on or before 2003 and table4 contains no rows. Again, I would much appreciate any idea helping me speed up view1. One off-topic issue is that I wish postgresql could be smarter to make use of the index even there are INT2 columns in composit index. As my case shows, INT2 is big enough for columns year and month, isn't it? Best Regards, CN --------------------- --This table contains 1036 rows. CREATE TABLE table1 ( c1 VARCHAR(20) PRIMARY KEY, c2 "char" )WITHOUT OIDS; --------------------- --This table contains 9255 rows. CREATE TABLE table2 ( c1 VARCHAR(20) PRIMARY KEY, c2 DATE, c3 INTEGER, --year part of c2 c4 INTEGER --month part of c2 )WITHOUT OIDS; CREATE INDEX i2c3c4 ON table2 (c3,c4); --------------------- --This table contains 28482 rows. CREATE TABLE table3 ( CONSTRAINT fk2c1 FOREIGN KEY (c1) REFERENCES table2 (c1) ON UPDATE CASCADE ON DELETE CASCADE, CONSTRAINT fk2c3 FOREIGN KEY (c3) REFERENCES table1 (c1), PRIMARY KEY (c1,c2), c1 VARCHAR(20), c2 INTEGER, c3 VARCHAR(20), c4 "char", c5 NUMERIC --amount )WITHOUT OIDS; --------------------- --This table contains 0 row. CREATE TABLE table4 ( PRIMARY KEY (c1,c2), c1 INTEGER, --year c2 INTEGER, --month c3 INTEGER )WITHOUT OIDS; --------------------- CREATE OR REPLACE VIEW view1 AS SELECT table2.c3 AS year ,table2.c4 AS month ,(SELECT CASE WHEN (table1.c2 = 'A' OR table1.c2 = 'E') AND table3.c4 ='D' OR table1.c2 IN ('L','O','I') AND table3.c4 = 'C' THEN table3.c5 ELSE 0-table3.c5 END AS amount FROMtable1 WHERE table1.c1=table3.c3 ) FROM table2,table3 WHERE table3.c1=table2.c1 UNION ALL SELECT c1,c2,c3 FROM table4; --------------------- CREATE OR REPLACE VIEW view2 AS SELECT table2.c3 AS year,table2.c4 AS month FROM table2,table3 WHERE table3.c1=table2.c1 UNION ALL SELECT c1,c2 FROM table4; --------------------- --------------------- EXPLAIN ANALYZE SELECT * FROM view1 WHERE year > 2003; ---------------------Subquery Scan view1 (cost=0.00..141.50 rows=2000 width=185) (actualtime=4348.92..4348.92 rows=0 loops=1) Filter: ("year" > 2003) -> Append (cost=0.00..141.50 rows=2000 width=185) (actual time=2.65..4230.44 rows=28482loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..121.50 rows=1000 width=185) (actual time=2.64..4127.71 rows=28482 loops=1) -> Merge Join (cost=0.00..121.50 rows=1000 width=185) (actual time=2.62..3875.23rows=28482 loops=1) Merge Cond: ("outer".c1 = "inner".c1) -> Index Scan using table2_pkey ontable2 (cost=0.00..52.00 rows=1000 width=56) (actual time=0.81..183.37 rows=9255 loops=1) -> Index Scan using table3_pkey on table3 (cost=0.00..52.00 rows=1000 width=129) (actual time=0.74..649.32 rows=28482 loops=1) SubPlan -> Index Scan using table1_pkeyon table1 (cost=0.00..4.82 rows=1 width=1) (actual time=0.07..0.07 rows=1 loops=28482) Index Cond: (c1 = $2) -> Subquery Scan "*SELECT* 2" (cost=0.00..20.00 rows=1000 width=12) (actual time=0.02..0.02 rows=0 loops=1) -> Seq Scan on table4 (cost=0.00..20.00 rows=1000 width=12) (actual time=0.01..0.01rows=0 loops=1)Total runtime: 4350.24 msec --------------------- --------------------- EXPLAIN ANALYZE SELECT * FROM view2 WHERE year > 2003; ---------------------Subquery Scan view2 (cost=36.47..119.30 rows=667 width=104) (actualtime=40.90..40.90 rows=0 loops=1) -> Append (cost=36.47..119.30 rows=667 width=104) (actual time=40.88..40.88 rows=0 loops=1) -> SubqueryScan "*SELECT* 1" (cost=36.47..96.80 rows=333 width=104) (actual time=40.85..40.85 rows=0 loops=1) -> Merge Join (cost=36.47..96.80 rows=333 width=104) (actual time=40.84..40.84rows=0 loops=1) Merge Cond: ("outer".c1 = "inner".c1) -> Index Scan using table3_pkey ontable3 (cost=0.00..52.00 rows=1000 width=48) (actual time=0.52..0.52 rows=1 loops=1) -> Sort (cost=36.47..37.30 rows=333 width=56) (actual time=40.30..40.30 rows=0 loops=1) Sort Key: table2.c1 -> Seq Scan on table2 (cost=0.00..22.50 rows=333 width=56) (actual time=38.65..38.65 rows=0 loops=1) Filter: (c3 > 2003) -> Subquery Scan "*SELECT* 2" (cost=0.00..22.50 rows=333 width=8) (actual time=0.02..0.02 rows=0 loops=1) -> Seq Scan on table4 (cost=0.00..22.50 rows=333 width=8) (actual time=0.01..0.01 rows=0 loops=1) Filter: (c1 > 2003)Total runtime: 41.86 msec -- http://www.fastmail.fm - Send your email first class
On Thu, 2 Oct 2003, CN wrote: > I am trying to explain the meaning of tables and views: > The tables in the first SELECT in the UNION of view1 are jorunal, whose > rows are entered daily. > The table in the second SELECT in the UNION of view1 is budget, whose > rows use year+month as key. > View1 unions journal and budget to present users a pseudo table that can > be selected to produce comparison reports - budget vs. actual. > User will be prompted to enter the interested year+month before selecting > the view. So, in practice you'll actually be doing queries with equality rather than ranges? > I think I have found out a way to make the index from year and month. > Although this approach produces reduntant data but it should be feasible: > - Create 2 more columns, year and month for table2. > - Feed NEW.c3 and NEW.f4 with values EXTRACT(YEAR FROM c2) and > EXTRACT(MONTH FROM c2), respectively in > plpgsql trigger function. > - Create composit index for year+month. > > There is still one thing I don't know why - query on view1 being > extermely slow. I also removed the subquery > from view1 to form view2. The query on view2 is swift. (postgresql > v7.3.2) > Both queries return 0 rows. The results are correct because table2 > contains data of years on or before 2003 > and table4 contains no rows. > > Again, I would much appreciate any idea helping me speed up view1. Well, you should probably again analyze the tables. I think it's running on default statistics again. It might do better with stats. > --------------------- > --This table contains 1036 rows. > CREATE TABLE table1 ( > c1 VARCHAR(20) PRIMARY KEY, > c2 "char" > )WITHOUT OIDS; > --------------------- > --This table contains 9255 rows. > CREATE TABLE table2 ( > c1 VARCHAR(20) PRIMARY KEY, > c2 DATE, > c3 INTEGER, --year part of c2 > c4 INTEGER --month part of c2 > )WITHOUT OIDS; > CREATE INDEX i2c3c4 ON table2 (c3,c4); > --------------------- > --This table contains 28482 rows. > CREATE TABLE table3 ( > CONSTRAINT fk2c1 FOREIGN KEY (c1) REFERENCES table2 (c1) ON UPDATE > CASCADE ON DELETE CASCADE, > CONSTRAINT fk2c3 FOREIGN KEY (c3) REFERENCES table1 (c1), > PRIMARY KEY (c1,c2), > c1 VARCHAR(20), > c2 INTEGER, > c3 VARCHAR(20), > c4 "char", > c5 NUMERIC --amount > )WITHOUT OIDS; > --------------------- > --This table contains 0 row. > CREATE TABLE table4 ( > PRIMARY KEY (c1,c2), > c1 INTEGER, --year > c2 INTEGER, --month > c3 INTEGER > )WITHOUT OIDS; > --------------------- > CREATE OR REPLACE VIEW view1 AS > SELECT > table2.c3 AS year > ,table2.c4 AS month > ,(SELECT CASE > WHEN (table1.c2 = 'A' OR table1.c2 = 'E') AND table3.c4 = 'D' > OR table1.c2 IN ('L','O','I') AND table3.c4 = 'C' > THEN table3.c5 ELSE 0-table3.c5 END > AS amount > FROM table1 > WHERE table1.c1=table3.c3 > ) > FROM table2,table3 > WHERE table3.c1=table2.c1 > > UNION ALL > > SELECT c1,c2,c3 FROM table4; > --------------------- > CREATE OR REPLACE VIEW view2 AS > SELECT table2.c3 AS year,table2.c4 AS month > FROM table2,table3 > WHERE table3.c1=table2.c1 > > UNION ALL > > SELECT c1,c2 FROM table4; > --------------------- > --------------------- > EXPLAIN ANALYZE SELECT * FROM view1 WHERE year > 2003; > --------------------- > Subquery Scan view1 (cost=0.00..141.50 rows=2000 width=185) (actual > time=4348.92..4348.92 rows=0 loops=1) > Filter: ("year" > 2003) > -> Append (cost=0.00..141.50 rows=2000 width=185) (actual > time=2.65..4230.44 rows=28482 loops=1) > -> Subquery Scan "*SELECT* 1" (cost=0.00..121.50 rows=1000 > width=185) (actual time=2.64..4127.71 > rows=28482 loops=1) > -> Merge Join (cost=0.00..121.50 rows=1000 width=185) > (actual time=2.62..3875.23 rows=28482 > loops=1) > Merge Cond: ("outer".c1 = "inner".c1) > -> Index Scan using table2_pkey on table2 > (cost=0.00..52.00 rows=1000 width=56) > (actual time=0.81..183.37 rows=9255 loops=1) > -> Index Scan using table3_pkey on table3 > (cost=0.00..52.00 rows=1000 width=129) > (actual time=0.74..649.32 rows=28482 loops=1) > SubPlan > -> Index Scan using table1_pkey on table1 > (cost=0.00..4.82 rows=1 width=1) (actual > time=0.07..0.07 rows=1 loops=28482) > Index Cond: (c1 = $2) > -> Subquery Scan "*SELECT* 2" (cost=0.00..20.00 rows=1000 > width=12) (actual time=0.02..0.02 rows=0 > loops=1) > -> Seq Scan on table4 (cost=0.00..20.00 rows=1000 > width=12) (actual time=0.01..0.01 rows=0 > loops=1) > Total runtime: 4350.24 msec > --------------------- > --------------------- > EXPLAIN ANALYZE SELECT * FROM view2 WHERE year > 2003; > --------------------- > Subquery Scan view2 (cost=36.47..119.30 rows=667 width=104) (actual > time=40.90..40.90 rows=0 loops=1) > -> Append (cost=36.47..119.30 rows=667 width=104) (actual > time=40.88..40.88 rows=0 loops=1) > -> Subquery Scan "*SELECT* 1" (cost=36.47..96.80 rows=333 > width=104) (actual time=40.85..40.85 > rows=0 loops=1) > -> Merge Join (cost=36.47..96.80 rows=333 width=104) > (actual time=40.84..40.84 rows=0 > loops=1) > Merge Cond: ("outer".c1 = "inner".c1) > -> Index Scan using table3_pkey on table3 > (cost=0.00..52.00 rows=1000 width=48) > (actual time=0.52..0.52 rows=1 loops=1) > -> Sort (cost=36.47..37.30 rows=333 width=56) > (actual time=40.30..40.30 rows=0 loops=1) > Sort Key: table2.c1 > -> Seq Scan on table2 (cost=0.00..22.50 > rows=333 width=56) (actual > time=38.65..38.65 rows=0 loops=1) > Filter: (c3 > 2003) > -> Subquery Scan "*SELECT* 2" (cost=0.00..22.50 rows=333 > width=8) (actual time=0.02..0.02 rows=0 > loops=1) > -> Seq Scan on table4 (cost=0.00..22.50 rows=333 > width=8) (actual time=0.01..0.01 rows=0 > loops=1) > Filter: (c1 > 2003) > Total runtime: 41.86 msec >
Greetings! Stephan, > So, in practice you'll actually be doing queries with equality rather than ranges? I'm so glad being understood :-) > > There is still one thing I don't know why - query on view1 being > > extermely slow. I also removed the subquery > > from view1 to form view2. The query on view2 is swift. > Well, you should probably again analyze the tables. I think it's running > on default statistics again. It might do better with stats. Not much is improved on view1 after "vacuumdb -z db1". I thought since both SELECT * FROM view1 WHERE year > 2003 and SELECT * FROM view2 WHERE year > 2003 returns 0 rows, subquery in view1 should consume no CPU and thus both queries should consume roughly the same amount of time. Why the result is contrary to my expection? ----Subquery Scan view1 (cost=180.69..1579.97 rows=28483 width=56) (actualtime=4676.80..4676.80 rows=0 loops=1) Filter:("year" > 2003) -> Append (cost=180.69..1579.97 rows=28483 width=56) (actual time=168.35..4526.08 rows=28482 loops=1) -> Subquery Scan "*SELECT* 1" (cost=180.69..1579.97 rows=28482 width=56) (actual time=168.34..4413.81 rows=28482 loops=1) -> Hash Join (cost=180.69..1579.97 rows=28482 width=56) (actual time=168.33..4197.19 rows=28482 loops=1) Hash Cond: ("outer".c1 = "inner".c1) -> Seq Scan on table3 (cost=0.00..544.82 rows=28482 width=35) (actual time=0.24..376.60 rows=28482 loops=1) -> Hash (cost=157.55..157.55 rows=9255 width=21) (actual time=166.66..166.66 rows=0 loops=1) -> Seq Scan on table2 (cost=0.00..157.55 rows=9255 width=21)(actual time=0.24..97.23 rows=9255 loops=1) SubPlan -> Index Scan using table1_pkey on table1 (cost=0.00..3.01 rows=1 width=1) (actual time=0.07..0.07 rows=1 loops=28482) Index Cond: (c1 = $2) -> Subquery Scan "*SELECT* 2" (cost=0.00..0.00 rows=1 width=12) (actual time=0.02..0.02 rows=0 loops=1) -> Seq Scan on table4 (cost=0.00..0.00 rows=1 width=12) (actual time=0.01..0.01 rows=0 loops=1)Total runtime: 4677.39 msec =========== Subquery Scan view2 (cost=0.00..35.57 rows=4 width=34) (actualtime=0.62..0.62 rows=0 loops=1) -> Append (cost=0.00..35.57rows=4 width=34) (actual time=0.61..0.61 rows=0 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..35.57rows=3 width=34) (actual time=0.59..0.59 rows=0 loops=1) -> Nested Loop (cost=0.00..35.57 rows=3 width=34) (actual time=0.58..0.58 rows=0 loops=1) -> Index Scan using i2c3c4 on table2 (cost=0.00..5.04 rows=1 width=21) (actual time=0.57..0.57 rows=0 loops=1) Index Cond: (c3 > 2003) -> Index Scan usingtable3_pkey on table3 (cost=0.00..30.42 rows=9 width=13) (never executed) Index Cond: (table3.c1 = "outer".c1) -> Subquery Scan "*SELECT* 2" (cost=0.00..0.00rows=1 width=8) (actual time=0.02..0.02 rows=0 loops=1) -> Seq Scan on table4 (cost=0.00..0.00 rows=1 width=8) (actual time=0.01..0.01 rows=0 loops=1) Filter: (c1 > 2003)Total runtime: 1.09 msec (12 rows) > > --------------------- > > --------------------- > > EXPLAIN ANALYZE SELECT * FROM view1 WHERE year > 2003; > > --------------------- > > Subquery Scan view1 (cost=0.00..141.50 rows=2000 width=185) (actual > > time=4348.92..4348.92 rows=0 loops=1) > > Filter: ("year" > 2003) > > -> Append (cost=0.00..141.50 rows=2000 width=185) (actual > > time=2.65..4230.44 rows=28482 loops=1) > > -> Subquery Scan "*SELECT* 1" (cost=0.00..121.50 rows=1000 > > width=185) (actual time=2.64..4127.71 > > rows=28482 loops=1) > > -> Merge Join (cost=0.00..121.50 rows=1000 width=185) > > (actual time=2.62..3875.23 rows=28482 > > loops=1) > > Merge Cond: ("outer".c1 = "inner".c1) > > -> Index Scan using table2_pkey on table2 > > (cost=0.00..52.00 rows=1000 width=56) > > (actual time=0.81..183.37 rows=9255 loops=1) > > -> Index Scan using table3_pkey on table3 > > (cost=0.00..52.00 rows=1000 width=129) > > (actual time=0.74..649.32 rows=28482 loops=1) > > SubPlan > > -> Index Scan using table1_pkey on table1 > > (cost=0.00..4.82 rows=1 width=1) (actual > > time=0.07..0.07 rows=1 loops=28482) > > Index Cond: (c1 = $2) > > -> Subquery Scan "*SELECT* 2" (cost=0.00..20.00 rows=1000 > > width=12) (actual time=0.02..0.02 rows=0 > > loops=1) > > -> Seq Scan on table4 (cost=0.00..20.00 rows=1000 > > width=12) (actual time=0.01..0.01 rows=0 > > loops=1) > > Total runtime: 4350.24 msec > > --------------------- > > --------------------- > > EXPLAIN ANALYZE SELECT * FROM view2 WHERE year > 2003; > > --------------------- > > Subquery Scan view2 (cost=36.47..119.30 rows=667 width=104) (actual > > time=40.90..40.90 rows=0 loops=1) > > -> Append (cost=36.47..119.30 rows=667 width=104) (actual > > time=40.88..40.88 rows=0 loops=1) > > -> Subquery Scan "*SELECT* 1" (cost=36.47..96.80 rows=333 > > width=104) (actual time=40.85..40.85 > > rows=0 loops=1) > > -> Merge Join (cost=36.47..96.80 rows=333 width=104) > > (actual time=40.84..40.84 rows=0 > > loops=1) > > Merge Cond: ("outer".c1 = "inner".c1) > > -> Index Scan using table3_pkey on table3 > > (cost=0.00..52.00 rows=1000 width=48) > > (actual time=0.52..0.52 rows=1 loops=1) > > -> Sort (cost=36.47..37.30 rows=333 width=56) > > (actual time=40.30..40.30 rows=0 loops=1) > > Sort Key: table2.c1 > > -> Seq Scan on table2 (cost=0.00..22.50 > > rows=333 width=56) (actual > > time=38.65..38.65 rows=0 loops=1) > > Filter: (c3 > 2003) > > -> Subquery Scan "*SELECT* 2" (cost=0.00..22.50 rows=333 > > width=8) (actual time=0.02..0.02 rows=0 > > loops=1) > > -> Seq Scan on table4 (cost=0.00..22.50 rows=333 > > width=8) (actual time=0.01..0.01 rows=0 > > loops=1) > > Filter: (c1 > 2003) > > Total runtime: 41.86 msec Best Regards, CN -- http://www.fastmail.fm - mmm... Fastmail...
"CN" <cnliou9@fastmail.fm> writes: > I thought since both > SELECT * FROM view1 WHERE year > 2003 > and > SELECT * FROM view2 WHERE year > 2003 > returns 0 rows, subquery in view1 should consume no CPU and thus both > queries should consume roughly the same amount of time. The reason view1 is a lot slower is that the condition on "year" isn't getting pushed down into the plan; instead the entire view result is computed and then filtered at the top step: > Subquery Scan view1 (cost=180.69..1579.97 rows=28483 width=56) (actual > time=4676.80..4676.80 rows=0 loops=1) > Filter: ("year" > 2003) > -> Append (cost=180.69..1579.97 rows=28483 width=56) (actual > time=168.35..4526.08 rows=28482 loops=1) In view2 the year condition is successfully pushed all the way to the bottom scans: > -> Index Scan using i2c3c4 on table2 > (cost=0.00..5.04 rows=1 width=21) (actual > time=0.57..0.57 rows=0 loops=1) > Index Cond: (c3 > 2003) thus eliminating a lot of join work. The reason why view1 isn't well optimized is that you've been sloppy about datatypes. It looks to me like the "SELECT CASE" business yields NUMERIC while the other arm of the UNION yields INTEGER for table4.c3. For various subtle semantic reasons we do not try to push down conditions into UNIONs when the UNION arms yield different datatypes. 7.4 would optimize this query successfully anyway because the condition isn't actually on the column with inconsistent datatype --- but 7.3 just punts if *any* of the UNION columns have inconsistent datatypes. In short: add a cast to make the UNION arms have the same datatypes. regards, tom lane
> The reason why view1 isn't well optimized is that you've been sloppy > about datatypes. It looks to me like the "SELECT CASE" business yields > NUMERIC while the other arm of the UNION yields INTEGER for table4.c3. > For various subtle semantic reasons we do not try to push down > conditions into UNIONs when the UNION arms yield different datatypes. Absolutely right! After switching table3.c5 to INTEGER, query to view1 is lightening fast: --------------Subquery Scan view1 (cost=0.00..23.18 rows=4 width=48) (actualtime=0.13..0.13 rows=0 loops=1) -> Append (cost=0.00..23.18 rows=4 width=48) (actual time=0.12..0.12 rows=0 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..23.18 rows=3 width=48) (actual time=0.11..0.11 rows=0 loops=1) -> Nested Loop (cost=0.00..23.18 rows=3 width=48) (actual time=0.10..0.10 rows=0 loops=1) -> Index Scan using i2c3c4 on table2 (cost=0.00..5.04 rows=1 width=21) (actual time=0.09..0.09 rows=0 loops=1) Index Cond: (c3 > 2003) -> Index Scan usingtable3_pkey on table3 (cost=0.00..18.09 rows=4 width=27) (never executed) Index Cond: (table3.c1 = "outer".c1) SubPlan -> Index Scan using table1_pkey on table1 (cost=0.00..3.01 rows=1 width=1) (never executed) Index Cond: (c1 = $2) -> Subquery Scan "*SELECT* 2" (cost=0.00..0.00 rows=1 width=12) (actual time=0.01..0.01 rows=0 loops=1) -> Seq Scan on table4 (cost=0.00..0.00 rows=1 width=12) (actual time=0.01..0.01 rows=0 loops=1) Filter: (c1 > 2003)Total runtime: 0.69 msec > 7.4 would optimize this query successfully anyway because the condition > isn't actually on the column with inconsistent datatype --- but 7.3 just > punts if *any* of the UNION columns have inconsistent datatypes. Apparently this postgreSQL beast has always been well under control by the fingers of you genious developers! Long live the king! Ooops! I'm sorry! Please pardon my English! Long live postgreSQL, the no. 1 DBMS, and its masters - the developers! Best Regards, CN -- http://www.fastmail.fm - Email service worth paying for. Try it for free
Hi! I thought I have got no more question in this thread, and I was wrong :-( Is the extra 300ms in UNION essential? Best Regards, CN --This table contains 1036 rows. CREATE TABLE table1 ( c1 VARCHAR(20) PRIMARY KEY, c2 "char" )WITHOUT OIDS; --------------------- --This table contains 9429 rows. CREATE TABLE table2 ( c1 VARCHAR(20) PRIMARY KEY, c2 DATE )WITHOUT OIDS; --------------------- --This table contains 28482 rows. CREATE TABLE table3 ( CONSTRAINT fk2c1 FOREIGN KEY (c1) REFERENCES table2 (c1) ON UPDATE CASCADE ON DELETE CASCADE, CONSTRAINT fk2c3 FOREIGN KEY (c3) REFERENCES table1 (c1), PRIMARY KEY (c1,c2), c1 VARCHAR(20), c2 INTEGER, c3 VARCHAR(20), c4 "char", c5 INTEGER )WITHOUT OIDS; --------------------- --This table contains 0 row. CREATE TABLE table4 ( PRIMARY KEY (c1,c2), c1 INTEGER, --year c2 INTEGER, --month c3 INTEGER )WITHOUT OIDS; --------------------- EXPLAIN ANALYZE SELECT table2.c2 AS date ,CASE WHEN (table1.c2 = 'A' OR table1.c2 = 'E') AND table3.c4 = 'D' OR table1.c2 IN ('L','O','I')AND table3.c4 = 'C' THEN table3.c5 ELSE 0-table3.c5 END AS amount FROM table2,table3,table1 WHERE table3.c1=table2.c1 AND table1.c1=table3.c3 UNION ALL SELECT TO_DATE(c1 || ' ' || c2 || ' 1','YYYY MM DD'),c3 FROM table4; ---------------Append (cost=199.64..2446.56 rows=28483 width=54) (actualtime=154.06..2200.53 rows=28482 loops=1) -> SubqueryScan "*SELECT* 1" (cost=199.64..2446.56 rows=28482 width=54) (actual time=154.05..2116.96 rows=28482 loops=1) -> Hash Join (cost=199.64..2446.56 rows=28482 width=54) (actual time=154.03..1941.40rows=28482 loops=1) Hash Cond: ("outer".c1 = "inner".c1) -> Hash Join (cost=18.95..1411.41 rows=28482 width=37) (actual time=22.85..934.94 rows=28482 loops=1) Hash Cond: ("outer".c3 = "inner".c1) -> Seq Scan on table3 (cost=0.00..822.82 rows=28482 width=27) (actual time=8.26..403.61 rows=28482 loops=1) -> Hash (cost=16.36..16.36 rows=1036 width=10) (actual time=14.18..14.18 rows=0 loops=1) -> Seq Scan on table1 (cost=0.00..16.36 rows=1036 width=10)(actual time=0.23..8.26 rows=1036 loops=1) -> Hash (cost=157.55..157.55 rows=9255 width=17) (actual time=130.63..130.63rows=0 loops=1) -> Seq Scan on table2 (cost=0.00..157.55 rows=9255 width=17) (actual time=0.24..78.22 rows=9255 loops=1) -> Subquery Scan "*SELECT* 2" (cost=0.00..0.00 rows=1 width=12) (actual time=0.02..0.02 rows=0 loops=1) -> Seq Scan on table4 (cost=0.00..0.00 rows=1 width=12) (actual time=0.01..0.01 rows=0 loops=1)Totalruntime: 2259.10 msec --------------------- EXPLAIN ANALYZE SELECT table2.c2 AS date ,CASE WHEN (table1.c2 = 'A' OR table1.c2 = 'E') AND table3.c4 = 'D' OR table1.c2 IN ('L','O','I')AND table3.c4 = 'C' THEN table3.c5 ELSE 0-table3.c5 END AS amount FROM table2,table3,table1 WHERE table3.c1=table2.c1 AND table1.c1=table3.c3; ---------------Hash Join (cost=199.64..2446.56 rows=28482 width=54) (actualtime=155.37..1857.25 rows=28482 loops=1) HashCond: ("outer".c1 = "inner".c1) -> Hash Join (cost=18.95..1411.41 rows=28482 width=37) (actual time=23.08..891.03rows=28482 loops=1) Hash Cond: ("outer".c3 = "inner".c1) -> Seq Scan on table3 (cost=0.00..822.82rows=28482 width=27) (actual time=8.12..386.64 rows=28482 loops=1) -> Hash (cost=16.36..16.36 rows=1036 width=10) (actual time=14.58..14.58 rows=0 loops=1) -> Seq Scan on table1 (cost=0.00..16.36 rows=1036 width=10) (actual time=0.26..9.19 rows=1036 loops=1) -> Hash (cost=157.55..157.55 rows=9255 width=17) (actual time=131.84..131.84 rows=0 loops=1) -> Seq Scan on table2 (cost=0.00..157.55 rows=9255 width=17) (actual time=0.24..77.11 rows=9255 loops=1)Total runtime: 1915.33 msec -- http://www.fastmail.fm - A no graphics, no pop-ups email service
"CN" <cnliou9@fastmail.fm> writes: > I thought I have got no more question in this thread, and I was wrong :-( > Is the extra 300ms in UNION essential? Dividing, it looks like the Append node is taking about 3 microseconds per tuple, which seems kind of a lot considering it isn't really doing anything. I suspect though that most of this overhead is due to the EXPLAIN ANALYZE instrumentation, and would in fact go away when you are just running the query. regards, tom lane