Thread: Huge Performance Difference on Similar Query in Pg7.2
Hi,

I have two views:

- LowHiSectorGridEmissionsView: > 1,000,000 rows
- DataSetsView: > 30,000 rows

When I run these two similar queries (their results are identical) against them:

1) select count(*) from EmissionsView, DataSetsView
   where DataSetsView.setid = EmissionsView.setid
   and EmissionsView.setid = '4614' ;

2) select count(*) from EmissionsView, DataSetsView
   where DataSetsView.setid = EmissionsView.setid
   and DataSetsView.setid = '4614' ;

I see a huge performance difference:

1) does an index scan on EmissionsView and a seq scan on DataSetsView: time 1-2 s
2) does an index scan on DataSetsView and a seq scan on EmissionsView: time ~50 s

The count(*) isn't the point here; the same is true for every other kind of select.

I'm porting the database from an Oracle 7 on a 2x200 MHz SGI machine to a Pg 7.2 on an 850 MHz Athlon/Linux PC. On Oracle 7, both queries take about the same time: 1-2 s.

I cannot easily change all the queries, since I only have time to work on the database, not on the frontend that generates the queries.

Why is Pg 7.2 using a seq scan at all (even after a VACUUM ANALYZE)? And if a seq scan is needed: why doesn't the query planner rewrite the query into the much faster form itself? And most important: how should I proceed? Is there a Pg-internal solution, or do I have to go the hard way and modify the frontend?
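One possible workaround (a sketch, not from the thread; it assumes the 7.2 planner simply fails to propagate the constant across the join equality) is to state the restriction on both sides explicitly, using the view and column names from the queries above:

```sql
-- Sketch: give the planner the constant restriction on BOTH relations,
-- so it can consider an index scan on each side instead of a seq scan.
SELECT count(*)
FROM EmissionsView, DataSetsView
WHERE DataSetsView.setid = EmissionsView.setid
  AND EmissionsView.setid = '4614'
  AND DataSetsView.setid  = '4614';
```

If this plans well, it would still require touching the frontend-generated queries, which is exactly what the poster wants to avoid.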
Heiko

Below are the explain results:

1) emep=> explain select count(*) from EmissionsView , DataSetsView where DataSetsView.setid = EmissionsView.setid and EmissionsView.setid = '4614' ;
NOTICE: QUERY PLAN:

Aggregate (cost=5975.66..5975.66 rows=1 width=131)
  -> Hash Join (cost=5958.18..5975.58 rows=35 width=131)
  -> Hash Join (cost=5956.30..5972.99 rows=35 width=127)
  -> Hash Join (cost=5954.41..5970.40 rows=35 width=123)
  -> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8)
  -> Hash (cost=5954.30..5954.30 rows=42 width=115)
  -> Nested Loop (cost=4375.32..5954.30 rows=42 width=115)
  -> Nested Loop (cost=4375.32..5812.95 rows=42 width=103)
  -> Seq Scan on footnotes (cost=0.00..1.01 rows=1 width=4)
  -> Materialize (cost=5811.31..5811.31 rows=42 width=99)
  -> Hash Join (cost=4375.32..5811.31 rows=42 width=99)
  -> Hash Join (cost=4371.95..5796.47 rows=307 width=77)
  -> Merge Join (cost=4370.73..5789.88 rows=307 width=50)
  -> Sort (cost=4370.73..4370.73 rows=28192 width=38)
  -> Hash Join (cost=24.06..1834.10 rows=28192 width=38)
  -> Hash Join (cost=22.17..1268.37 rows=28192 width=34)
  -> Seq Scan on datasets (cost=0.00..724.86 rows=33786 width=22)
  -> Hash (cost=21.23..21.23 rows=378 width=12)
  -> Hash Join (cost=1.89..21.23 rows=378 width=12)
  -> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8)
  -> Hash (cost=1.71..1.71 rows=71 width=4)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
  -> Hash (cost=1.71..1.71 rows=71 width=4)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
  -> Index Scan using lhsecgriemis_setid_idx on emissions (cost=0.00..1343.91 rows=368 width=12)
  -> Hash (cost=1.18..1.18 rows=18 width=27)
  -> Seq Scan on datatypes (cost=0.00..1.18 rows=18 width=27)
  -> Hash (cost=3.09..3.09 rows=109 width=22)
  -> Seq Scan on sectordefinitions (cost=0.00..3.09 rows=109 width=22)
  -> Index Scan using datasets_setid_idx on datasets (cost=0.00..3.37 rows=1 width=12)
  -> Hash (cost=1.71..1.71 rows=71 width=4)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
  -> Hash (cost=1.71..1.71 rows=71 width=4)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
EXPLAIN

2) emep=> explain select count(*) from EmissionsView , DataSetsView where DataSetsView.setid = EmissionsView.setid and DataSetsView.setid = '4614' ;
NOTICE: QUERY PLAN:

Aggregate (cost=91660.36..91660.36 rows=1 width=131)
  -> Hash Join (cost=2732.23..91660.35 rows=3 width=131)
  -> Nested Loop (cost=2719.53..91058.61 rows=117798 width=103)
  -> Seq Scan on footnotes (cost=0.00..1.01 rows=1 width=4)
  -> Materialize (cost=89290.63..89290.63 rows=117798 width=99)
  -> Hash Join (cost=2719.53..89290.63 rows=117798 width=99)
  -> Hash Join (cost=2716.17..56957.45 rows=863754 width=77)
  -> Seq Scan on emissions (cost=0.00..18502.34 rows=1035134 width=12)
  -> Hash (cost=2328.69..2328.69 rows=28192 width=65)
  -> Hash Join (cost=25.29..2328.69 rows=28192 width=65)
  -> Hash Join (cost=24.06..1834.10 rows=28192 width=38)
  -> Hash Join (cost=22.17..1268.37 rows=28192 width=34)
  -> Seq Scan on datasets (cost=0.00..724.86 rows=33786 width=22)
  -> Hash (cost=21.23..21.23 rows=378 width=12)
  -> Hash Join (cost=1.89..21.23 rows=378 width=12)
  -> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8)
  -> Hash (cost=1.71..1.71 rows=71 width=4)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
  -> Hash (cost=1.71..1.71 rows=71 width=4)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
  -> Hash (cost=1.18..1.18 rows=18 width=27)
  -> Seq Scan on datatypes (cost=0.00..1.18 rows=18 width=27)
  -> Hash (cost=3.09..3.09 rows=109 width=22)
  -> Seq Scan on sectordefinitions (cost=0.00..3.09 rows=109 width=22)
  -> Hash (cost=12.70..12.70 rows=1 width=28)
  -> Hash Join (cost=10.62..12.70 rows=1 width=28)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
  -> Hash (cost=10.62..10.62 rows=1 width=24)
  -> Nested Loop (cost=0.00..10.62 rows=1 width=24)
  -> Nested Loop (cost=0.00..5.96 rows=1 width=16)
  -> Index Scan using datasets_setid_idx on datasets (cost=0.00..3.37 rows=1 width=12)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4)
  -> Index Scan using reports_pkey on reports (cost=0.00..4.64 rows=1 width=8)
EXPLAIN
Can you send "explain analyze" results instead of "explain", so we can see the real times?

Regards
Luis Alberto Amigo Navarro writes:
> can you send "explain analyze" instead of "explain" results in order to view
> real time?
> Regards

Here it is:

1) emep=> explain analyze select count(*) from EmissionsView , DataSetsView where DataSetsView.setid = EmissionsView.setid and EmissionsView.setid = '4614' ;
NOTICE: QUERY PLAN:

Aggregate (cost=5975.66..5975.66 rows=1 width=131) (actual time=1264.12..1264.13 rows=1 loops=1)
  -> Hash Join (cost=5958.18..5975.58 rows=35 width=131) (actual time=1205.90..1262.46 rows=1606 loops=1)
  -> Hash Join (cost=5956.30..5972.99 rows=35 width=127) (actual time=1197.85..1248.93 rows=1606 loops=1)
  -> Hash Join (cost=5954.41..5970.40 rows=35 width=123) (actual time=1197.50..1224.92 rows=1606 loops=1)
  -> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8) (actual time=0.02..1.03 rows=378 loops=1)
  -> Hash (cost=5954.30..5954.30 rows=42 width=115) (actual time=1193.55..1193.55 rows=0 loops=1)
  -> Nested Loop (cost=4375.32..5954.30 rows=42 width=115) (actual time=1121.44..1188.23 rows=1606 loops=1)
  -> Nested Loop (cost=4375.32..5812.95 rows=42 width=103) (actual time=1115.41..1134.58 rows=1606 loops=1)
  -> Seq Scan on footnotes (cost=0.00..1.01 rows=1 width=4) (actual time=0.01..0.02 rows=1 loops=1)
  -> Materialize (cost=5811.31..5811.31 rows=42 width=99) (actual time=1115.36..1116.33 rows=1606 loops=1)
  -> Hash Join (cost=4375.32..5811.31 rows=42 width=99) (actual time=1047.90..1112.31 rows=1606 loops=1)
  -> Hash Join (cost=4371.95..5796.47 rows=307 width=77) (actual time=1042.42..1076.85 rows=1606 loops=1)
  -> Merge Join (cost=4370.73..5789.88 rows=307 width=50) (actual time=1039.84..1059.42 rows=1606 loops=1)
  -> Sort (cost=4370.73..4370.73 rows=28192 width=38) (actual time=996.25..1001.69 rows=3830 loops=1)
  -> Hash Join (cost=24.06..1834.10 rows=28192 width=38) (actual time=7.50..507.66 rows=33593 loops=1)
  -> Hash Join (cost=22.17..1268.37 rows=28192 width=34) (actual time=7.15..303.78 rows=33593 loops=1)
  -> Seq Scan on datasets (cost=0.00..724.86 rows=33786 width=22) (actual time=0.36..122.78 rows=33786 loops=1)
  -> Hash (cost=21.23..21.23 rows=378 width=12) (actual time=6.73..6.73 rows=0 loops=1)
  -> Hash Join (cost=1.89..21.23 rows=378 width=12) (actual time=3.17..6.03 rows=377 loops=1)
  -> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8) (actual time=2.83..4.05 rows=378 loops=1)
  -> Hash (cost=1.71..1.71 rows=71 width=4) (actual time=0.26..0.26 rows=0 loops=1)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.14 rows=71 loops=1)
  -> Hash (cost=1.71..1.71 rows=71 width=4) (actual time=0.27..0.27 rows=0 loops=1)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.15 rows=71 loops=1)
  -> Index Scan using lhsecgriemis_setid_idx on emissions (cost=0.00..1343.91 rows=368 width=12) (actual time=33.77..42.61 rows=1606 loops=1)
  -> Hash (cost=1.18..1.18 rows=18 width=27) (actual time=2.49..2.49 rows=0 loops=1)
  -> Seq Scan on datatypes (cost=0.00..1.18 rows=18 width=27) (actual time=2.40..2.44 rows=18 loops=1)
  -> Hash (cost=3.09..3.09 rows=109 width=22) (actual time=5.41..5.41 rows=0 loops=1)
  -> Seq Scan on sectordefinitions (cost=0.00..3.09 rows=109 width=22) (actual time=4.89..5.19 rows=109 loops=1)
  -> Index Scan using datasets_setid_idx on datasets (cost=0.00..3.37 rows=1 width=12) (actual time=0.02..0.02 rows=1 loops=1606)
  -> Hash (cost=1.71..1.71 rows=71 width=4) (actual time=0.25..0.25 rows=0 loops=1)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.13 rows=71 loops=1)
  -> Hash (cost=1.71..1.71 rows=71 width=4) (actual time=7.95..7.95 rows=0 loops=1)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) (actual time=7.69..7.83 rows=71 loops=1)
Total runtime: 1361.51 msec
EXPLAIN

2) emep=> explain analyze select count(*) from EmissionsView , DataSetsView where DataSetsView.setid = EmissionsView.setid and DataSetsView.setid = '4614' ;
NOTICE: QUERY PLAN:

Aggregate (cost=91660.36..91660.36 rows=1 width=131) (actual time=64414.80..64414.80 rows=1 loops=1)
  -> Hash Join (cost=2732.23..91660.35 rows=3 width=131) (actual time=58428.47..64413.14 rows=1606 loops=1)
  -> Nested Loop (cost=2719.53..91058.61 rows=117798 width=103) (actual time=49523.50..63005.67 rows=1025405 loops=1)
  -> Seq Scan on footnotes (cost=0.00..1.01 rows=1 width=4) (actual time=0.01..0.03 rows=1 loops=1)
  -> Materialize (cost=89290.63..89290.63 rows=117798 width=99) (actual time=49523.43..51974.76 rows=1025405 loops=1)
  -> Hash Join (cost=2719.53..89290.63 rows=117798 width=99) (actual time=1619.56..47188.00 rows=1025405 loops=1)
  -> Hash Join (cost=2716.17..56957.45 rows=863754 width=77) (actual time=1617.06..27358.00 rows=1035128 loops=1)
  -> Seq Scan on emissions (cost=0.00..18502.34 rows=1035134 width=12) (actual time=6.42..5027.94 rows=1035134 loops=1)
  -> Hash (cost=2328.69..2328.69 rows=28192 width=65) (actual time=848.29..848.29 rows=0 loops=1)
  -> Hash Join (cost=25.29..2328.69 rows=28192 width=65) (actual time=12.33..762.21 rows=33593 loops=1)
  -> Hash Join (cost=24.06..1834.10 rows=28192 width=38) (actual time=10.57..519.08 rows=33593 loops=1)
  -> Hash Join (cost=22.17..1268.37 rows=28192 width=34) (actual time=10.22..315.37 rows=33593 loops=1)
  -> Seq Scan on datasets (cost=0.00..724.86 rows=33786 width=22) (actual time=0.44..129.13 rows=33786 loops=1)
  -> Hash (cost=21.23..21.23 rows=378 width=12) (actual time=9.71..9.71 rows=0 loops=1)
  -> Hash Join (cost=1.89..21.23 rows=378 width=12) (actual time=5.23..8.98 rows=377 loops=1)
  -> Seq Scan on reports (cost=0.00..11.78 rows=378 width=8) (actual time=4.90..7.04 rows=378 loops=1)
  -> Hash (cost=1.71..1.71 rows=71 width=4) (actual time=0.26..0.26 rows=0 loops=1)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.14 rows=71 loops=1)
  -> Hash (cost=1.71..1.71 rows=71 width=4) (actual time=0.27..0.27 rows=0 loops=1)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.15 rows=71 loops=1)
  -> Hash (cost=1.18..1.18 rows=18 width=27) (actual time=1.69..1.69 rows=0 loops=1)
  -> Seq Scan on datatypes (cost=0.00..1.18 rows=18 width=27) (actual time=1.60..1.64 rows=18 loops=1)
  -> Hash (cost=3.09..3.09 rows=109 width=22) (actual time=2.44..2.44 rows=0 loops=1)
  -> Seq Scan on sectordefinitions (cost=0.00..3.09 rows=109 width=22) (actual time=1.91..2.20 rows=109 loops=1)
  -> Hash (cost=12.70..12.70 rows=1 width=28) (actual time=25.72..25.72 rows=0 loops=1)
  -> Hash Join (cost=10.62..12.70 rows=1 width=28) (actual time=25.53..25.71 rows=1 loops=1)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.14 rows=71 loops=1)
  -> Hash (cost=10.62..10.62 rows=1 width=24) (actual time=25.44..25.44 rows=0 loops=1)
  -> Nested Loop (cost=0.00..10.62 rows=1 width=24) (actual time=25.24..25.43 rows=1 loops=1)
  -> Nested Loop (cost=0.00..5.96 rows=1 width=16) (actual time=21.79..21.98 rows=1 loops=1)
  -> Index Scan using datasets_setid_idx on datasets (cost=0.00..3.37 rows=1 width=12) (actual time=15.48..15.48 rows=1 loops=1)
  -> Seq Scan on areas (cost=0.00..1.71 rows=71 width=4) (actual time=6.23..6.38 rows=71 loops=1)
  -> Index Scan using reports_pkey on reports (cost=0.00..4.64 rows=1 width=8) (actual time=3.44..3.44 rows=1 loops=1)
Total runtime: 64568.82 msec
EXPLAIN
First of all, the difference between estimated and actual times can be reduced by tuning your postgresql.conf.

Try setting enable_seqscan to off and re-running EXPLAIN ANALYZE, to see whether there is another possible plan.

Regards
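The suggested experiment can be run per-session in psql; a sketch using the slow query from the thread (enable_seqscan is only changed for the current session):

```sql
SET enable_seqscan TO off;          -- discourage seq scans for this session
EXPLAIN ANALYZE
SELECT count(*)
FROM EmissionsView, DataSetsView
WHERE DataSetsView.setid = EmissionsView.setid
  AND DataSetsView.setid = '4614';
SET enable_seqscan TO on;           -- restore the default afterwards
```

This does not forbid seq scans outright; it just makes them look very expensive to the planner, so it will pick an index plan whenever one exists.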
Luis Alberto Amigo Navarro writes:
> first of all the difference between expected times and real times may be
> reduced tuning your postgresql.conf
> try to set enable_seqscan to off and then re explain analyze to see if there
> is another possible solution
> Regards

About the times: how do I adjust those? Does it really make such a huge difference, or is it just for better display?

I set enable_seqscan to off, and it changed the Seq Scan on emissions to an Index Scan on emissions, but the time didn't change: still about a factor of 50 between both queries.

New explain analyze report of 2):

emep=> explain analyze select count(*) from LowHiSectorGridEmissionsView , DataSetsView where DataSetsView.setid = '4614' and DataSetsView.setid = LowHiSectorGridEmissionsView.setid ;
NOTICE: QUERY PLAN:

Aggregate (cost=103440665.94..103440665.94 rows=1 width=131) (actual time=61151.81..61151.81 rows=1 loops=1)
  -> Hash Join (cost=100005420.21..103440665.93 rows=4 width=131) (actual time=49892.37..61150.12 rows=1606 loops=1)
  -> Nested Loop (cost=100005403.12..103440051.40 rows=119480 width=103) (actual time=46157.46..59840.49 rows=1025405 loops=1)
  -> Seq Scan on footnotes (cost=100000000.00..100000001.01 rows=1 width=4) (actual time=0.01..0.03 rows=1 loops=1)
  -> Materialize (cost=3438258.19..3438258.19 rows=119480 width=99) (actual time=46157.41..48845.75 rows=1025405 loops=1)
  -> Hash Join (cost=5403.12..3438258.19 rows=119480 width=99) (actual time=1344.17..43964.04 rows=1025405 loops=1)
  -> Merge Join (cost=5392.13..3405513.72 rows=871449 width=77) (actual time=1336.38..23857.31 rows=1035128 loops=1)
  -> Sort (cost=5392.13..5392.13 rows=28443 width=65) (actual time=1323.65..1383.10 rows=31471 loops=1)
  -> Hash Join (cost=54.91..2615.14 rows=28443 width=65) (actual time=21.62..901.24 rows=33593 loops=1)
  -> Hash Join (cost=50.15..2112.63 rows=28443 width=38) (actual time=11.27..646.34 rows=33593 loops=1)
  -> Hash Join (cost=43.48..1537.09 rows=28443 width=34) (actual time=10.80..443.36 rows=33593 loops=1)
  -> Index Scan using datasets_setid_idx on datasets (cost=0.00..969.14 rows=33786 width=22) (actual time=0.64..249.20 rows=33786 loops=1)
  -> Hash (cost=42.53..42.53 rows=378 width=12) (actual time=10.08..10.08 rows=0 loops=1)
  -> Hash Join (cost=6.68..42.53 rows=378 width=12) (actual time=0.53..9.39 rows=377 loops=1)
  -> Index Scan using reports_pkey on reports (cost=0.00..28.29 rows=378 width=8) (actual time=0.07..7.25 rows=378 loops=1)
  -> Hash (cost=6.50..6.50 rows=71 width=4) (actual time=0.38..0.38 rows=0 loops=1)
  -> Index Scan using areas_pkey on areas (cost=0.00..6.50 rows=71 width=4) (actual time=0.01..0.24 rows=71 loops=1)
  -> Hash (cost=6.50..6.50 rows=71 width=4) (actual time=0.40..0.40 rows=0 loops=1)
  -> Index Scan using areas_pkey on areas (cost=0.00..6.50 rows=71 width=4) (actual time=0.02..0.27 rows=71 loops=1)
  -> Hash (cost=4.71..4.71 rows=18 width=27) (actual time=10.26..10.26 rows=0 loops=1)
  -> Index Scan using datatypes_pkey on datatypes (cost=0.00..4.71 rows=18 width=27) (actual time=10.15..10.23 rows=18 loops=1)
  -> Index Scan using lhsecgriemis_setid_idx on lowhisectorgridemissions (cost=0.00..3386569.53 rows=1035134 width=12) (actual time=12.67..13841.30 rows=1035134 loops=1)
  -> Hash (cost=10.72..10.72 rows=109 width=22) (actual time=7.68..7.68 rows=0 loops=1)
  -> Index Scan using sectordefinitions_pkey on sectordefinitions (cost=0.00..10.72 rows=109 width=22) (actual time=6.74..7.46 rows=109 loops=1)
  -> Hash (cost=17.09..17.09 rows=1 width=28) (actual time=27.32..27.32 rows=0 loops=1)
  -> Nested Loop (cost=0.00..17.09 rows=1 width=28) (actual time=27.29..27.31 rows=1 loops=1)
  -> Nested Loop (cost=0.00..12.55 rows=1 width=24) (actual time=27.26..27.28 rows=1 loops=1)
  -> Nested Loop (cost=0.00..8.01 rows=1 width=20) (actual time=12.04..12.05 rows=1 loops=1)
  -> Index Scan using datasets_setid_idx on datasets (cost=0.00..3.36 rows=1 width=12) (actual time=4.61..4.62 rows=1 loops=1)
  -> Index Scan using reports_pkey on reports (cost=0.00..4.64 rows=1 width=8) (actual time=7.41..7.41 rows=1 loops=1)
  -> Index Scan using areas_pkey on areas (cost=0.00..4.52 rows=1 width=4) (actual time=15.21..15.21 rows=1 loops=1)
  -> Index Scan using areas_pkey on areas (cost=0.00..4.52 rows=1 width=4) (actual time=0.02..0.02 rows=1 loops=1)
Total runtime: 61309.75 msec
EXPLAIN
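As a side note on "how do I adjust those": the gap between estimated and actual costs is usually tuned through the cost settings in postgresql.conf. A sketch with illustrative values only (not taken from the thread; these parameters exist in the 7.2 era but the right values depend on the machine):

```
# postgresql.conf fragment (illustrative values)
effective_cache_size = 10000   # in 8 kB pages; raise toward the OS disk cache size
random_page_cost = 3           # default is 4; lower it if the data is mostly cached
sort_mem = 4096                # kB available per sort operation
```

Lowering random_page_cost makes index scans look cheaper relative to seq scans, which tends to shift plans in the direction enable_seqscan=off forces.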
In fact it seems it's not a planner mistake; you probably need some indexing. What is the table layout, and what indexes do you have?
Luis Alberto Amigo Navarro writes:
> In fact it seems it's no planner's mistake, you would need for some indexing
> how is the table layout, and what indexes u have?

Those are the main tables:

CREATE TABLE DataSets (
    setid INTEGER NOT NULL,
    areaid INTEGER NULL,
    reportid INTEGER NOT NULL,
    datatype VARCHAR(12) NULL,
    component VARCHAR(12) NULL,
    year INTEGER NULL,
    updated DATE NULL,
    scaling INTEGER NULL,
    xcomment VARCHAR(2000) NULL,
    quality CHARACTER(1) DEFAULT '9' NULL,
    PRIMARY KEY (setid),
    FOREIGN KEY (areaid) REFERENCES Areas(areaid),
    FOREIGN KEY (reportid) REFERENCES Reports(reportid),
    FOREIGN KEY (datatype) REFERENCES DataTypes(datatype),
    FOREIGN KEY (component) REFERENCES Components(component)
);

Additional indexes on areaid, reportid (as far as I understood, a primary key always has an index?).

CREATE TABLE LowHiSectorGridEmissions (
    setid integer not null,
    sector integer not null,
    x integer not null,
    y integer not null,
    lowemission integer null,
    highemission integer null,
    footnoteid INTEGER NULL,
    PRIMARY KEY (setid, sector, x, y),
    FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid),
    FOREIGN KEY (setid) REFERENCES DataSets
);

Additional indexes on setid, sector, x, y.

The main views:

CREATE VIEW DataSetsView AS
SELECT setid, DataSets.areaid AS areaid, Areas.name AS name, Areas.acronym AS acronym,
       DataSets.reportid AS reportid, reportcode, reportyear, datereceived, datatype,
       origin, component, year,
       (SUBSTRING(origin FROM 1 FOR 1) || SUBSTRING(datatype FROM 1 FOR 2) || '-' ||
        TRIM(BOTH ' ' FROM component) || '-' || TRIM(BOTH ' ' FROM Areas.acronym) ||
        SUBSTRING(year FROM 3 FOR 2) || SUBSTRING(reportyear FROM 3 FOR 2)) AS datasetcode,
       updated, scaling, DataSets.xcomment AS xcomment, datesent, datemscw, quality
FROM DataSets, Areas, ReportsView
WHERE DataSets.areaid = Areas.areaid
  AND DataSets.reportid = ReportsView.reportid ;

CREATE VIEW LowHiSectorGridEmissionsView AS
SELECT LowHiSectorGridEmissions.setid AS setid, LowHiSectorGridEmissions.sector AS sector,
       sectorcode, x, y, scaling,
       ('EMIS_' || scaling || '_0x' || int2hex(LowHiSectorGridEmissions.lowemission)) AS exlowemission,
       ('EMIS_' || scaling || '_0x' || int2hex(LowHiSectorGridEmissions.highemission)) AS exhighemission,
       ('0x' || int2hex(LowHiSectorGridEmissions.lowemission)) AS rawlowemission,
       ('0x' || int2hex(LowHiSectorGridEmissions.highemission)) AS rawhighemission,
       FootNotes.footnote AS footnote
FROM DataSetsView, LowHiSectorGridEmissions, DataTypes, SectorDefinitions, FootNotes
WHERE DataSetsView.setid = LowHiSectorGridEmissions.setid
  AND DataSetsView.datatype = DataTypes.datatype
  AND DataTypes.sectordefinition = SectorDefinitions.sectordefinition
  AND LowHiSectorGridEmissions.sector = SectorDefinitions.sector
  AND ( LowHiSectorGridEmissions.footnoteid = FootNotes.footnoteid OR FootNotes.footnoteid = '0' ) ;

Areas, FootNotes, Components, DataTypes, and SectorDefinitions are all small tables. I also tried adding indexes on everything that appears in the WHERE clauses of the views, without success.
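Before adding yet more indexes, it may help to confirm what already exists and that the statistics are current. A sketch (table names from the schema above; note that unquoted identifiers are folded to lower case):

```sql
-- Refresh planner statistics on the two big relations.
VACUUM ANALYZE lowhisectorgridemissions;
VACUUM ANALYZE datasets;

-- List the indexes that actually exist on the big table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'lowhisectorgridemissions';
```

If lhsecgriemis_setid_idx shows up here, the problem is not a missing index but the planner's choice of which side of the join to restrict.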
Heiko

Below I add the complete table description:

-- $Author$ -- $Date$ -- $Header$ -- $Id$ -- $Name$ -- $Locker$ -- $Log$ -- $Revision$ -- $Source$ -- $State$

CREATE TABLE AreaCollections (collection varchar(12) not null, acronyms varchar(2000) null, description varchar(100) null, PRIMARY KEY (collection) );

CREATE TABLE Areas (areaid INTEGER NOT NULL, includedin INTEGER NULL, name VARCHAR(100) NULL, acronym VARCHAR(12) NULL, areastart DATE NULL, areaend DATE NULL, PRIMARY KEY (areaid), FOREIGN KEY (includedin) REFERENCES Areas(areaid) );

CREATE TABLE ComponentCategories (componentcategory VARCHAR(12) NOT NULL, description VARCHAR(100) NULL, PRIMARY KEY (componentcategory) );

CREATE TABLE Components (component VARCHAR(12) NOT NULL, componentcategory VARCHAR(12) NOT NULL, name VARCHAR(100) NULL, PRIMARY KEY (component), FOREIGN KEY (componentcategory) REFERENCES ComponentCategories(componentcategory) );

CREATE TABLE SectorDefinitions (sdid integer not null, sectordefinition varchar(15) not null, sector integer not null, sectorcode varchar(20) not null, anthropogenic integer not null, description varchar(2000) null, PRIMARY KEY (sectordefinition, sector), UNIQUE (sdid) );

CREATE TABLE GridDefinitions (griddefinition varchar(12) not null, xlow integer null, xhigh integer null, ylow integer null, yhigh integer null, description varchar(2000) null, PRIMARY KEY (griddefinition) );

CREATE TABLE GridAllocations (griddefinition VARCHAR(12) NOT NULL, areaid INTEGER NOT NULL, x INTEGER NOT NULL, y INTEGER NOT NULL, fraction VARCHAR(12) NULL, PRIMARY KEY (griddefinition, areaid, x, y), FOREIGN KEY (areaid) REFERENCES Areas(areaid), FOREIGN KEY (griddefinition) REFERENCES GridDefinitions(griddefinition) );

CREATE TABLE DataTypes (datatype VARCHAR(12) NOT NULL, sectordefinition VARCHAR(15) NULL, griddefinition VARCHAR(12) NULL, description VARCHAR(2000) NULL, datatable VARCHAR(100) NULL, PRIMARY KEY (datatype), FOREIGN KEY (sectordefinition) REFERENCES SectorDefinitions(sectordefinition), FOREIGN KEY (griddefinition) REFERENCES GridDefinitions(griddefinition) );

create table Origins (origin varchar(12) not null, description varchar(2000) null, PRIMARY KEY (origin) );

CREATE TABLE Reports (reportid INTEGER NOT NULL, reportcode VARCHAR(100) NULL, areaid INTEGER NULL, datereceived DATE NULL, datesent DATE NULL, datemscw DATE NULL, origin VARCHAR(12) NULL, xcomment VARCHAR(2000) NULL, PRIMARY KEY (reportid), FOREIGN KEY (areaid) REFERENCES Areas(areaid) );

CREATE TABLE DataSets (setid INTEGER NOT NULL, areaid INTEGER NULL, reportid INTEGER NOT NULL, datatype VARCHAR(12) NULL, component VARCHAR(12) NULL, year INTEGER NULL, updated DATE NULL, scaling INTEGER NULL, xcomment VARCHAR(2000) NULL, quality CHARACTER(1) DEFAULT '9' NULL, PRIMARY KEY (setid), FOREIGN KEY (areaid) REFERENCES Areas(areaid), FOREIGN KEY (reportid) REFERENCES Reports(reportid), FOREIGN KEY (datatype) REFERENCES DataTypes(datatype), FOREIGN KEY (component) REFERENCES Components(component) );

create table DataCollections (collection varchar(12) not null, description varchar(2000) null, PRIMARY KEY (collection) );

CREATE TABLE CollectionSpecs (collection VARCHAR(12) NOT NULL, setid INTEGER NOT NULL, PRIMARY KEY (collection, setid), FOREIGN KEY (collection) REFERENCES DataCollections(collection), FOREIGN KEY (setid) REFERENCES DataSets(setid) );

CREATE TABLE FootNotes (footnoteid INTEGER NOT NULL, footnote VARCHAR(2000) NULL, reportid INTEGER NULL, PRIMARY KEY (footnoteid), FOREIGN KEY (reportid) REFERENCES Reports(reportid) );

CREATE TABLE Emissions (setid INTEGER NOT NULL, emission INTEGER NULL, xcomment VARCHAR(2000) NULL, footnoteid INTEGER NULL, PRIMARY KEY (setid), FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid), FOREIGN KEY (setid) REFERENCES DataSets );

create table GridEmissions (setid integer not null, x integer not null, y integer not null, emission integer null, footnoteid INTEGER NULL, PRIMARY KEY (setid, x, y), FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid), FOREIGN KEY (setid) REFERENCES DataSets );

create table PointEmissions (setid integer not null, lps varchar(12) not null, emission integer null, footnoteid INTEGER NULL, PRIMARY KEY (setid, lps), FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid), FOREIGN KEY (setid) REFERENCES DataSets );

create table LowHiEmissions (setid integer not null, lowemission integer null, highemission integer null, xcomment varchar(2000) null, footnoteid INTEGER NULL, PRIMARY KEY (setid), FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid), FOREIGN KEY (setid) REFERENCES DataSets );

create table LowHiGridEmissions (setid integer not null, x integer not null, y integer not null, lowemission integer null, highemission integer null, footnoteid INTEGER NULL, PRIMARY KEY (setid, x, y), FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid), FOREIGN KEY (setid) REFERENCES DataSets );

create table SectorEmissions (setid integer not null, sector integer not null, emission integer null, xcomment varchar(2000) null, footnoteid INTEGER NULL, PRIMARY KEY (setid, sector), FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid), FOREIGN KEY (setid) REFERENCES DataSets );

create table SectorGridEmissions (setid integer not null, sector integer not null, x integer not null, y integer not null, emission integer null, footnoteid INTEGER NULL, PRIMARY KEY (setid, sector, x, y), FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid), FOREIGN KEY (setid) REFERENCES DataSets );

create table LowHiSectorGridEmissions (setid integer not null, sector integer not null, x integer not null, y integer not null, lowemission integer null, highemission integer null, footnoteid INTEGER NULL, PRIMARY KEY (setid, sector, x, y), FOREIGN KEY (footnoteid) REFERENCES FootNotes(footnoteid), FOREIGN KEY (setid) REFERENCES DataSets );

create table LargePointSources (lpsid integer not null, areaid integer not null, lps varchar(5) not null, name varchar(100) null, latitude integer null, longitude integer null, height integer null, exitsurface integer null, speed integer null, temperature integer null, PRIMARY KEY (areaid, lps), FOREIGN KEY (areaid) REFERENCES Areas(areaid), UNIQUE (lpsid) );

create table UpdateSessions (sesid integer not null, updatetime date not null, logname varchar(30) not null, tablename varchar(30) not null, PRIMARY KEY (sesid) );

create table UpdateRecords (sesid integer not null, seqnum integer not null, updatetype varchar(12) not null,
    recordid varchar(12) null, -- recordid can but must not be the Records/recordid
                               -- this is an awful design-flaw!!!
    setid integer null, sector integer null, x integer null, y integer null, PRIMARY KEY (sesid, seqnum), FOREIGN KEY (sesid) REFERENCES UpdateSessions(sesid) );

create table UpdateValues (sesid integer not null, seqnum integer not null, varname varchar(30) not null, oldval varchar(2000) null, newval varchar(2000) null, PRIMARY KEY (sesid, seqnum, varname), FOREIGN KEY (sesid) REFERENCES UpdateSessions(sesid) );

create table Globals (globid varchar(12) not null, globvalue integer null, PRIMARY KEY (globid) );

create table Batches (batchid varchar(12) not null, first integer not null, last integer not null, PRIMARY KEY (batchid, first) );

-- $Author$ -- $Date$ -- $Header$ -- $Id$ -- $Name$ -- $Locker$ -- $Log$ -- $Revision$ -- $Source$ -- $State$

-- This works only as superuser.
-- On polar, plperl is compiled into template1, so it will exist in each db created
-- DROP LANGUAGE 'plperl';
-- CREATE LANGUAGE 'plperl';

DROP FUNCTION int2hex(integer);
CREATE FUNCTION int2hex(integer) RETURNS char(10) AS
    '$tmp = sprintf "%x", $_[0]; return ("0"x(8-length($tmp)).uc($tmp));'
LANGUAGE 'plperl';

DROP VIEW SectorDefView;
CREATE VIEW SectorDefView AS
SELECT DISTINCT sectordefinition FROM SectorDefinitions ;

DROP VIEW GridAllocView;
CREATE VIEW GridAllocView AS
SELECT griddefinition, Areas.areaid AS areaid, name, acronym, x, y, fraction
FROM GridAllocations, Areas
WHERE GridAllocations.areaid = Areas.areaid ;

DROP VIEW ReportsView;
CREATE VIEW ReportsView AS
SELECT reportid, to_char(datereceived,'YYYY') AS reportyear, reportcode, Areas.areaid AS areaid, acronym, name,
       to_char(datereceived,'YYYYMMDD') AS datereceived, to_char(datesent,'YYYYMMDD') AS datesent,
       to_char(datemscw,'YYYYMMDD') AS datemscw, origin, xcomment
FROM Reports, Areas
WHERE Reports.areaid = Areas.areaid ;

DROP VIEW YearOfRepView;
CREATE VIEW YearOfRepView AS
SELECT DISTINCT reportyear FROM ReportsView ;

DROP VIEW DataSetsView;
CREATE VIEW DataSetsView AS
SELECT setid, DataSets.areaid AS areaid, Areas.name AS name, Areas.acronym AS acronym,
       DataSets.reportid AS reportid, reportcode, reportyear, datereceived, datatype, origin, component, year,
       (SUBSTRING(origin FROM 1 FOR 1) || SUBSTRING(datatype FROM 1 FOR 2) || '-' ||
        TRIM(BOTH ' ' FROM component) || '-' || TRIM(BOTH ' ' FROM Areas.acronym) ||
        SUBSTRING(year FROM 3 FOR 2) || SUBSTRING(reportyear FROM 3 FOR 2)) AS datasetcode,
       updated, scaling, DataSets.xcomment AS xcomment, datesent, datemscw, quality
FROM DataSets, Areas, ReportsView
WHERE DataSets.areaid = Areas.areaid
  AND DataSets.reportid = ReportsView.reportid ;

DROP VIEW EmissionsView;
CREATE VIEW EmissionsView AS
SELECT Emissions.setid AS setid, areaid, name, acronym, DataSetsView.reportid, reportcode, reportyear,
       datereceived, datatype, origin, component, year, datasetcode, updated, scaling,
       Emissions.emission AS emission,
       ('EMIS_' || scaling || '_0x' || int2hex(Emissions.emission)) AS exemission,
       Emissions.xcomment AS xcomment, FootNotes.footnote AS footnote,
       ('0x' || int2hex(Emissions.emission)) AS rawemission
FROM DataSetsView, Emissions, FootNotes
WHERE DataSetsView.setid = Emissions.setid
  AND ( Emissions.footnoteid = FootNotes.footnoteid OR FootNotes.footnoteid = '0' ) ;

DROP VIEW GridEmissionsView;
CREATE VIEW GridEmissionsView AS
SELECT GridEmissions.setid AS setid, x, y, scaling, GridEmissions.emission AS emission,
       ('EMIS_' || scaling || '_0x' || int2hex(GridEmissions.emission)) AS exemission,
       ('0x' || int2hex(GridEmissions.emission)) AS rawemission,
       FootNotes.footnote AS footnote
FROM DataSetsView, GridEmissions, FootNotes
WHERE DataSetsView.setid = GridEmissions.setid
  AND ( GridEmissions.footnoteid = FootNotes.footnoteid OR FootNotes.footnoteid = '0' ) ;

DROP VIEW LowHiEmissionsView;
CREATE VIEW LowHiEmissionsView AS
SELECT LowHiEmissions.setid AS setid, areaid, name, acronym, DataSetsView.reportid, reportcode, reportyear,
       datereceived, datatype, origin, component, year, datasetcode, updated, scaling,
       ('EMIS_' || scaling || '_0x' || int2hex(LowHiEmissions.lowemission)) AS exlowemission,
       ('EMIS_' || scaling || '_0x' || int2hex(LowHiEmissions.highemission)) AS exhighemission,
       LowHiEmissions.xcomment AS xcomment,
       ('0x' || int2hex(LowHiEmissions.lowemission)) AS rawlowemission,
       ('0x' || int2hex(LowHiEmissions.highemission)) AS rawhighemission,
       datesent, datemscw, FootNotes.footnote AS footnote
FROM DataSetsView, LowHiEmissions, FootNotes
WHERE DataSetsView.setid = LowHiEmissions.setid
  AND ( LowHiEmissions.footnoteid = FootNotes.footnoteid OR FootNotes.footnoteid = '0' ) ;

DROP VIEW LowHiGridEmissionsView;
CREATE VIEW LowHiGridEmissionsView AS
SELECT LowHiGridEmissions.setid AS setid, x, y, scaling,
       ('EMIS_' || scaling || '_0x' || int2hex(LowHiGridEmissions.lowemission)) AS exlowemission,
       ('EMIS_' || scaling || '_0x' || int2hex(LowHiGridEmissions.highemission)) AS exhighemission,
       ('0x' || int2hex(LowHiGridEmissions.lowemission)) AS rawlowemission,
       ('0x' || int2hex(LowHiGridEmissions.highemission)) AS rawhighemission,
       FootNotes.footnote AS footnote
FROM DataSetsView, LowHiGridEmissions, FootNotes
WHERE DataSetsView.setid = LowHiGridEmissions.setid
  AND ( LowHiGridEmissions.footnoteid = FootNotes.footnoteid OR FootNotes.footnoteid = '0' ) ;

DROP VIEW SectorEmissionsView;
CREATE VIEW SectorEmissionsView AS
SELECT SectorEmissions.setid AS setid, areaid, name, acronym, DataSetsView.reportid, reportcode, reportyear,
       datereceived, DataSetsView.datatype AS datatype, origin, component, year, datasetcode, updated, scaling,
       SectorEmissions.sector AS sector, sectorcode,
       ('EMIS_' || scaling || '_0x' || int2hex(SectorEmissions.emission)) AS exemission,
       SectorEmissions.xcomment AS xcomment,
       ('0x' || int2hex(SectorEmissions.emission)) AS rawemission,
       datesent, datemscw, FootNotes.footnote AS footnote
FROM DataSetsView, SectorEmissions, DataTypes, SectorDefinitions, FootNotes
WHERE DataSetsView.setid = SectorEmissions.setid
  AND DataSetsView.datatype = DataTypes.datatype
  AND DataTypes.sectordefinition = SectorDefinitions.sectordefinition
  AND SectorEmissions.sector = SectorDefinitions.sector
  AND ( SectorEmissions.footnoteid = FootNotes.footnoteid OR FootNotes.footnoteid = '0' ) ;

DROP VIEW SectorGridEmissionsView;
CREATE VIEW SectorGridEmissionsView AS
SELECT SectorGridEmissions.setid AS setid, SectorGridEmissions.sector AS sector, sectorcode, x, y, scaling,
       ('EMIS_' || scaling || '_0x' || int2hex(SectorGridEmissions.emission)) AS exemission,
       ('0x' || int2hex(SectorGridEmissions.emission)) AS rawemission,
       FootNotes.footnote AS footnote
FROM DataSetsView, SectorGridEmissions, DataTypes, SectorDefinitions, FootNotes
WHERE DataSetsView.setid = SectorGridEmissions.setid
  AND DataSetsView.datatype = DataTypes.datatype
  AND DataTypes.sectordefinition = SectorDefinitions.sectordefinition
  AND SectorGridEmissions.sector = SectorDefinitions.sector
  AND ( SectorGridEmissions.footnoteid = FootNotes.footnoteid OR FootNotes.footnoteid = '0' ) ;

DROP VIEW LowHiSectorGridEmissionsView;
CREATE VIEW LowHiSectorGridEmissionsView AS
SELECT LowHiSectorGridEmissions.setid AS setid, LowHiSectorGridEmissions.sector AS sector, sectorcode, x, y, scaling,
       ('EMIS_' || scaling || '_0x' || int2hex(LowHiSectorGridEmissions.lowemission)) AS exlowemission,
       ('EMIS_' || scaling || '_0x' || int2hex(LowHiSectorGridEmissions.highemission)) AS exhighemission,
       ('0x' || int2hex(LowHiSectorGridEmissions.lowemission)) AS rawlowemission,
       ('0x' || int2hex(LowHiSectorGridEmissions.highemission)) AS rawhighemission,
       FootNotes.footnote AS footnote
FROM DataSetsView, LowHiSectorGridEmissions, DataTypes, SectorDefinitions, FootNotes
WHERE DataSetsView.setid = LowHiSectorGridEmissions.setid
  AND DataSetsView.datatype = DataTypes.datatype
  AND DataTypes.sectordefinition = SectorDefinitions.sectordefinition
  AND LowHiSectorGridEmissions.sector = SectorDefinitions.sector
  AND ( LowHiSectorGridEmissions.footnoteid = FootNotes.footnoteid OR FootNotes.footnoteid = '0' ) ;

DROP VIEW EmissionYearView;
CREATE VIEW EmissionYearView AS
SELECT DISTINCT year FROM DataSets ;

DROP VIEW CollSpecsView;
CREATE VIEW CollSpecsView AS
SELECT collection, CollectionSpecs.setid AS setid, datasetcode
FROM CollectionSpecs, DataSetsView
WHERE CollectionSpecs.setid = DataSetsView.setid ;

DROP VIEW LaPoSoView;
CREATE VIEW LaPoSoView AS
SELECT LargePointSources.lpsid AS lpsid, LargePointSources.areaid AS areaid, Areas.name AS name, acronym, lps,
       (acronym || lps) AS lpscode, LargePointSources.name AS lpsname,
       latitude, longitude, height, exitsurface, speed, temperature
FROM LargePointSources, Areas
WHERE LargePointSources.areaid = Areas.areaid ;

DROP VIEW EmissionsUpdView;
CREATE VIEW EmissionsUpdView AS
SELECT setid, xcomment, ('0x' || int2hex(Emissions.emission)) AS rawemission
FROM Emissions ;
> Aggregate (cost=5975.66..5975.66 rows=1 width=131) (actual time=1264.12..1264.13 rows=1 loops=1)
>   -> Hash Join (cost=5958.18..5975.58 rows=35 width=131) (actual time=1205.90..1262.46 rows=1606 loops=1)
>     -> Hash Join (cost=5956.30..5972.99 rows=35 width=127) (actual time=1197.85..1248.93 rows=1606 loops=1)

2)

> emep=> explain analyze select count(*) from EmissionsView, DataSetsView
>        where DataSetsView.setid = EmissionsView.setid and DataSetsView.setid = '4614' ;
> NOTICE:  QUERY PLAN:
>
> Aggregate (cost=91660.36..91660.36 rows=1 width=131) (actual time=64414.80..64414.80 rows=1 loops=1)
>   -> Hash Join (cost=2732.23..91660.35 rows=3 width=131) (actual time=58428.47..64413.14 rows=1606 loops=1)
>     -> Nested Loop (cost=2719.53..91058.61 rows=117798 width=103) (actual time=49523.50..63005.67 rows=1025405 loops=1)

I think here is the difference: in the first case you are performing the hash join from 33,000 rows, while in the second case it is from 1M rows, so it is wasting time on hashing. It seems the first condition is more restrictive than the second one; I mean there are fewer rows with setid 4614 on EmissionsView than on DataSetsView.
Regards
On Fri, 22 Mar 2002, Luis Alberto Amigo Navarro wrote:

> > Aggregate (cost=5975.66..5975.66 rows=1 width=131) (actual time=1264.12..1264.13 rows=1 loops=1)
> >   -> Hash Join (cost=5958.18..5975.58 rows=35 width=131) (actual time=1205.90..1262.46 rows=1606 loops=1)
> >     -> Hash Join (cost=5956.30..5972.99 rows=35 width=127) (actual time=1197.85..1248.93 rows=1606 loops=1)
> 2)
> > emep=> explain analyze select count(*) from EmissionsView, DataSetsView
> >        where DataSetsView.setid = EmissionsView.setid and DataSetsView.setid = '4614' ;
> > NOTICE:  QUERY PLAN:
> >
> > Aggregate (cost=91660.36..91660.36 rows=1 width=131) (actual time=64414.80..64414.80 rows=1 loops=1)
> >   -> Hash Join (cost=2732.23..91660.35 rows=3 width=131) (actual time=58428.47..64413.14 rows=1606 loops=1)
> >     -> Nested Loop (cost=2719.53..91058.61 rows=117798 width=103) (actual time=49523.50..63005.67 rows=1025405 loops=1)
>
> I think here is the difference, in the first case you are performing
> hash-join from 33000 rows, in second case it is from 1M rows, so it is
> wasting time on hashing, it seems that 1st conditions are more restrictive
> than the second one. I mean there are less 4614 on EmissionsView than on
> DataSetsView
> Regards

If I may be so bold as to join this thread as well: this is a close enough match to the subject of my long-pending post, mentioned in another thread, to warrant it, I believe. I composed my message within a newsgroup thread but I had trouble posting it there. Therefore I have some message fragments quoted at the start which will not have been seen on this list.

To summarise: it's long (sorry), it's mostly to do with the planner not using an index, and it's long. Here goes...
Heiko Klein <Heiko.Klein@met.no> writes:
> When I now perform the two similar (results equal) queries on those rows:
> 1)
> select count(*) from EmissionsView, DataSetsView
>   where DataSetsView.setid = EmissionsView.setid
>   and EmissionsView.setid = '4614' ;
> 2)
> select count(*) from EmissionsView, DataSetsView
>   where DataSetsView.setid = EmissionsView.setid
>   and DataSetsView.setid = '4614' ;
> I have a huge performance difference.

Apparently the condition EmissionsView.setid = '4614' is a lot more selective on that table than DataSetsView.setid = '4614' is on its table. So pushing down the former condition translates into lots fewer rows to be joined than pushing down the latter.

Unfortunately the planner is not very bright about transitivity of equals, and so does not realize that it could derive EmissionsView.setid = '4614' from the other two conditions. I imagine the reason Oracle gives similar behavior for both queries is that it does expend the cycles to make that deduction. (This is a guess though; it'd be interesting to see their query plans.)

Given the complexity of the plans, it may just be that the planner is switching into GEQO search mode and is failing to find the best plan. You might consider setting geqo_threshold larger than the number of base tables in the query (looks like 12, here) to see if better plans emerge.

It'd also be worth asking whether you couldn't simplify your views ;-). Do you really require a 12-way join to do whatever it is you're doing? I'm pretty amazed that the system is able to find a good plan for either query ...

			regards, tom lane
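[Editor's note: since the planner will not make the transitive deduction itself, the rewrite can be done by hand wherever the SQL is editable. A sketch using the view names from this thread; it does not help a frontend whose queries cannot be changed, but it is the manual equivalent of the deduction Oracle appears to make:]

```sql
-- Stating the restriction against both relations explicitly lets the
-- planner push a selective condition into each side of the join,
-- regardless of which table '4614' happens to be rare in:
SELECT count(*)
FROM EmissionsView, DataSetsView
WHERE DataSetsView.setid = EmissionsView.setid
  AND EmissionsView.setid = '4614'
  AND DataSetsView.setid  = '4614';
```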
Tom Lane writes:
> Unfortunately the planner is not very bright about transitivity of
> equals and so does not realize that it could derive EmissionsView.setid
> = '4614' from the other two conditions. I imagine the reason Oracle
> gives similar behavior for both queries is that it does expend the
> cycles to make that deduction. (This is a guess though; it'd be
> interesting to see their query plans.)

I don't know how to retrieve the query plan from Oracle7. Any hints and I will send them.

> Given the complexity of the plans, it may just be that the planner is
> switching into GEQO search mode and is failing to find the best plan.
> You might consider setting geqo_threshold larger than the number of
> base tables in the query (looks like 12, here) to see if better plans
> emerge.

Wow, that's impressive. By changing geqo_threshold from 11 to 12, I reduced the time from 70000ms to 700ms. Increasing it step by step, the time stays constant until 15, jumps back to 70000ms at 16 and 17, then drops to 200ms at 18 and stays between 100 and 150ms as I increase it further. So why not use a very high geqo_threshold (maybe 100) as the default? (See the new query plan below, with geqo_threshold = 20.)

> It'd also be worth asking whether you couldn't simplify your views ;-).
> Do you really require a 12-way join to do whatever it is you're doing?
> I'm pretty amazed that the system is able to find a good plan for either
> query ...

Exactly that was the problem: all queries were quite slow, this one was just the one I used most often. But I cannot do everything at once. I just convinced my boss to switch from Oracle to Postgres, and that this wouldn't take a long time. So the frontend dictates the views (it asks for them), and I may not change it before I have managed to convert to Postgres smoothly. In short: it's politics.

But thanks a lot for your help, that was really useful and a huge step forward for me.
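[Editor's note: for anyone repeating this experiment, the threshold can be raised per session before committing anything to postgresql.conf. A sketch of the commands; the value 20 is simply the one found by trial above:]

```sql
-- Use exhaustive join search for queries with fewer than 20 FROM items,
-- instead of handing a 12-way join to the genetic optimizer:
SET geqo_threshold = 20;

EXPLAIN ANALYZE
SELECT count(*)
FROM LowHiSectorGridEmissionsView, DataSetsView
WHERE DataSetsView.setid = '4614'
  AND DataSetsView.setid = LowHiSectorGridEmissionsView.setid;

-- To make it permanent, put "geqo_threshold = 20" in postgresql.conf.
```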
Heiko

emep=> explain analyze select count(*) from LowHiSectorGridEmissionsView, DataSetsView
       where DataSetsView.setid = '4614' and DataSetsView.setid = LowHiSectorGridEmissionsView.setid ;
NOTICE:  QUERY PLAN:

Aggregate  (cost=1992.56..1992.56 rows=1 width=131) (actual time=112.80..112.80 rows=1 loops=1)
  ->  Nested Loop  (cost=1989.38..1992.55 rows=4 width=131) (actual time=105.12..111.02 rows=1606 loops=1)
        ->  Seq Scan on footnotes  (cost=0.00..1.01 rows=1 width=4) (actual time=0.03..0.04 rows=1 loops=1)
        ->  Materialize  (cost=1991.49..1991.49 rows=4 width=127) (actual time=105.06..106.01 rows=1606 loops=1)
              ->  Hash Join  (cost=1989.38..1991.49 rows=4 width=127) (actual time=75.34..102.23 rows=1606 loops=1)
                    ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.02..0.19 rows=71 loops=1)
                    ->  Hash  (cost=1989.37..1989.37 rows=4 width=123) (actual time=75.15..75.15 rows=0 loops=1)
                          ->  Hash Join  (cost=1984.27..1989.37 rows=4 width=123) (actual time=42.51..71.05 rows=1606 loops=1)
                                ->  Seq Scan on sectordefinitions  (cost=0.00..3.09 rows=109 width=22) (actual time=0.01..0.35 rows=109 loops=1)
                                ->  Hash  (cost=1984.20..1984.20 rows=26 width=101) (actual time=30.29..30.29 rows=0 loops=1)
                                      ->  Nested Loop  (cost=22.01..1984.20 rows=26 width=101) (actual time=1.02..25.55 rows=1606 loops=1)
                                            ->  Hash Join  (cost=22.01..24.08 rows=1 width=89) (actual time=0.97..1.19 rows=1 loops=1)
                                                  ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.16 rows=71 loops=1)
                                                  ->  Hash  (cost=22.00..22.00 rows=1 width=85) (actual time=0.87..0.87 rows=0 loops=1)
                                                        ->  Nested Loop  (cost=16.07..22.00 rows=1 width=85) (actual time=0.81..0.87 rows=1 loops=1)
                                                              ->  Hash Join  (cost=16.07..17.35 rows=1 width=77) (actual time=0.78..0.84 rows=1 loops=1)
                                                                    ->  Seq Scan on datatypes  (cost=0.00..1.18 rows=18 width=27) (actual time=0.01..0.06 rows=18 loops=1)
                                                                    ->  Hash  (cost=16.07..16.07 rows=1 width=50) (actual time=0.69..0.69 rows=0 loops=1)
                                                                          ->  Nested Loop  (cost=8.02..16.07 rows=1 width=50) (actual time=0.31..0.68 rows=1 loops=1)
                                                                                ->  Nested Loop  (cost=8.02..12.69 rows=1 width=28) (actual time=0.28..0.65 rows=1 loops=1)
                                                                                      ->  Hash Join  (cost=8.02..10.09 rows=1 width=24) (actual time=0.21..0.39 rows=1 loops=1)
                                                                                            ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.01..0.14 rows=71 loops=1)
                                                                                            ->  Hash  (cost=8.01..8.01 rows=1 width=20) (actual time=0.11..0.11 rows=0 loops=1)
                                                                                                  ->  Nested Loop  (cost=0.00..8.01 rows=1 width=20) (actual time=0.10..0.11 rows=1 loops=1)
                                                                                                        ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.36 rows=1 width=12) (actual time=0.06..0.06 rows=1 loops=1)
                                                                                                        ->  Index Scan using reports_pkey on reports  (cost=0.00..4.64 rows=1 width=8) (actual time=0.03..0.03 rows=1 loops=1)
                                                                                      ->  Seq Scan on areas  (cost=0.00..1.71 rows=71 width=4) (actual time=0.00..0.16 rows=71 loops=1)
                                                                                ->  Index Scan using datasets_setid_idx on datasets  (cost=0.00..3.36 rows=1 width=22) (actual time=0.02..0.02 rows=1 loops=1)
                                                              ->  Index Scan using reports_pkey on reports  (cost=0.00..4.64 rows=1 width=8) (actual time=0.01..0.01 rows=1 loops=1)
                                            ->  Index Scan using lhsecgriemis_setid_idx on lowhisectorgridemissions  (cost=0.00..1953.36 rows=541 width=12) (actual time=0.04..7.30 rows=1606 loops=1)
Total runtime: 114.50 msec

EXPLAIN
> I've read lots of messages on this subject, and the FAQ, recently but I'm
> still confused. None of what I have read seems to account for the execution
> plans I'm seeing, except for this mention of scanning a million values in an
> index and discarding all but a small number. However, even this I can not
> see applies to a primary key on a million row table. I upgraded from 7.0.x
> to 7.2 because I was thinking it was an oddity that was probably fixed.
>
> First, my apologies for the length of this posting.
>
> Next some background:
>
> I have a database where one of its tables records 'sessions', called
> chat_sessions. It has an integer field, session_id, declared as primary key.
> This table is the small table in the example with only about 2000 rows.
>
> There is another table called chat_post, the large table holding about 1
> million rows. It has two integer fields, session_id and post_number, which

Are they actually integers (int4), or is either of them a different type like int2 or int8? There are special-case workarounds for those two due to a problem with the types of integer literals. This looks likely, since even with seq_scan set off it wanted to do a sequential scan, which generally means it doesn't believe it can use the index.

> explain analyze select count(*) from chat_post cp where cp.session_id = 123;

Does cp.session_id = '123' give something different?

> Show looking up in large table, selecting on primary key, uses
> sequential scan on large
>
> explain analyze select count(*) from chat_post cp
>   where cp.session_id = 123 and cp.post_number = 10;

Same here for '123' and '10'.
On Fri, 22 Mar 2002, Stephan Szabo wrote:
> I wrote:
> > I've read lots of messages on this subject, and the FAQ, recently but I'm still
> > confused. None of what I have read seems to account for the execution plans I'm
> > seeing, except for this mention of scanning a million values in an index and
> > discarding all but a small number. However, even this I can not see applies to
> > a primary key on a million row table. I upgraded from 7.0.x to 7.2 because I
> > was thinking it was an oddity that was probably fixed.
> >
> > First, my apologies for the length of this posting.
> >
> > Next some background:
> >
> > I have a database where one of it's tables records 'sessions', called
> > chat_sessions. It has an integer field, session_id, declared as primary key.
> > This table is the small table in the example with only about 2000 rows.
> >
> > There is another table called chat_post, the large table holding about 1
> > million rows. It has two integer fields, session_id and post_number, which
>
> Are they actually integers (int4), or are either of them a different type
> like int2 or int8? There are special case workarounds for those two due
> to a problem with the types of integer literals. This looks likely since
> even with seq_scan set off it wanted to do a sequence scan which generally
> means it doesn't believe it can use the index.
>
> > explain analyze select count(*) from chat_post cp where cp.session_id
> > = 123;
>
> Does cp.session_id='123' give something different?
>
> > Show looking up in large table, selecting on primary key, uses
> > sequential scan on large
> >
> > explain analyze select count(*) from chat_post cp where cp.session_id
> > = 123 and cp.post_number = 10;
>
> Same here for '123' and '10'.

Spectacular! Bingo! etc.

Using the numbers quoted yields use of the primary key. I am indeed using something other than int4, int2 in fact.
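[Editor's note: the workaround Stephan is pointing at comes down to how the literal is written. A sketch against the chat_post table described above, with the plan outcomes as reported in this thread for 7.2:]

```sql
-- session_id and post_number are int2; a bare 123 is typed int4, and
-- the int2-vs-int4 comparison cannot use the int2 index:
SELECT count(*) FROM chat_post cp WHERE cp.session_id = 123;    -- seq scan

-- A quoted literal stays type-unknown until it is matched against the
-- int2 column, so the index qualifies:
SELECT count(*) FROM chat_post cp WHERE cp.session_id = '123';  -- index scan
SELECT count(*) FROM chat_post cp
WHERE cp.session_id = '123' AND cp.post_number = '10';
```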
So this is something to do with using integer literals, which are presumably first interpreted as int4 and then converted, in some long-winded fashion, to int2 for each and every test or row, whereas specifying them as text causes the backend to convert to the correct int2 only at the start?

I chose the smaller int because these columns are unlikely to be restrictive for this DB and I thought I may as well try not to waste space. Whether it used the same storage as int4 didn't really matter, as I'd given it the opportunity to use less if it could. Is it worth me moving these to the int4 type?

Thanks for that; I can now tell my friend, who had a good laugh with me last weekend about this.

BTW, is this sort of feature documented anywhere, or does it come into the 'what makes a person an expert' category?

Nigel Andrews
Logictree Systems Limited
On Fri, 22 Mar 2002, Nigel J. Andrews wrote:
> Spectacular! Bingo! etc.
>
> Using the numbers quoted yields use of the primary key. I am indeed using
> something other than int4, int2 in fact. So this is something to do with the
> using integer literals which are presumably first interpreted as int4 and then
> are converted in some long winded fashion, or something, to int2 for each and
> every test or row, whereas specifying them as text causes the backend to
> convert to the correct int2 only at the start?

Basically. The quoted one gets kept as unknown rather than prematurely being made into an int4.

> I choose the smaller int because these are unlikely to be restrictive for this
> DB and I thought I may as well try and not waste space. Whether it used the
> same storage as int4 didn't really matter as I'd given it the opportunity to
> use less if it could. Is it worth me moving these to int4 type?

Probably, unless the space is really going to be an issue. Eventually someone will come up with a way of dealing with it that doesn't involve hardcoding type information, and it'll get fixed.

> BTW, is this sort of feature documented anywhere or does it come into the 'what
> makes a person an expert' catagory?

Hmm, I'm not sure. If it's not in FAQ item 4.8 it probably should be.
On Fri, 22 Mar 2002 13:35:47 +0000 (GMT)
"Nigel J. Andrews" <nandrews@investsystems.co.uk> wrote:

> Show looking up in large table, selecting on partial primary key, uses
> sequential scan on large
>
> explain analyze select count(*) from chat_post cp where cp.session_id = 123;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=20411.68..20411.68 rows=1 width=0) (actual time=31691.92..31691.93 rows=1 loops=1)
>   ->  Seq Scan on chat_post cp  (cost=0.00..20411.49 rows=77 width=0) (actual time=1736.29..31688.80 rows=321 loops=1)
> Total runtime: 31692.35 msec

Judging from the EXPLAIN output, queries that restrict chat_post by session_id = 123 always seem to use a sequential scan on it, while queries that do not seem to use an index scan. So instead of session_id = 123, you might execute a series of your queries again with other conditions and see which ones use an index scan. But this opinion is not based on any real evidence.

BTW, even though the number of selected rows in chat_post is small, the aggregating and sorting time seems long. If sort_mem is at its default value, before trying the above you should increase it, possibly 10 or 20 times.

Regards,
Masaru Sugawara
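[Editor's note: the sort_mem suggestion is a one-line change; sort_mem is measured in kilobytes, and the value below is only an illustration:]

```sql
SHOW sort_mem;        -- check the current per-sort memory budget
SET sort_mem = 8192;  -- roughly 8 MB, for this session only
-- or put "sort_mem = 8192" in postgresql.conf to apply to all sessions
```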
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> Using the numbers quoted yields use of the primary key. I am indeed using
> something other than int4, int2 in fact. So this is something to do with the
> using integer literals which are presumably first interpreted as int4 and then
> are converted in some long winded fashion, or something, to int2 for each and
> every test or row, whereas specifying them as text causes the backend to
> convert to the correct int2 only at the start?

No, they're not converted at all: if you write, say,

	int2var = 42

then the constant is immediately taken as int4, and "=" is resolved as the int2-equals-int4 operator, which works fine but is not one of the set of operators that the system knows how to use with an int2 index. If you write

	int2var = '42'

then the '42' is initially treated as an unknown-type literal, and there are resolution rules that will preferentially choose int2 to match what's on the other side of the operator, whereupon "=" gets resolved as the int2-equals-int2 operator, which is indexable.

To fix this and related problems we need to twiddle the ambiguity-resolution rules so that numeric constants can be given something other than their "natural" datatype ... without breaking a lot of other cases that work conveniently today. See (many) past discussions on the pghackers list.

			regards, tom lane
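[Editor's note: besides the quoted-literal form, an explicit cast gives the constant type int2 up front and is indexable for the same reason. A sketch, with t and int2var standing in for any table with an indexed int2 column:]

```sql
SELECT * FROM t WHERE int2var = 42;                -- int2 = int4: index not usable
SELECT * FROM t WHERE int2var = '42';              -- unknown literal resolved to int2
SELECT * FROM t WHERE int2var = 42::int2;          -- PostgreSQL cast syntax
SELECT * FROM t WHERE int2var = CAST(42 AS int2);  -- SQL-standard spelling
```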