Thread: Re: Help optimize view
Oops. Realized I posted the wrong SQL and EXPLAIN ANALYZE results. Also forgot to mention that my "server" has 1.5 GB memory. SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; QUERY PLAN HashAggregate (cost=519801.96..519898.00 rows=7683 width=70) (actual time=106219.710..106249.456 rows=14853 loops=1) -> Hash Join (cost=286101.76..519667.51 rows=7683 width=70) (actual time=50466.513..106111.635 rows=15123 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Hash Join (cost=286099.98..519260.45 rows=87588 width=61) (actual time=50466.417..106055.182 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=143.017..55178.583 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.012..0.027 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=142.986..54432.650 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Bitmap Index Scan on "PVParameterID_idx" (cost=0.00..7902.79 rows=608089 width=0) (actual time=109.178..109.178 rows=289724 loops=1) Index Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Hash (cost=259861.12..259861.12 rows=1454724 width=48) (actual time=50306.950..50306.950 rows=961097 loops=1) -> Hash Join (cost=8139.75..259861.12 rows=1454724 width=48) (actual time=971.910..48649.190 rows=961097 loops=1) Hash Cond: ("Measurements"."psaID" = "PrintSampleAnalyses"."psaID") -> Seq Scan on "Measurements" (cost=0.00..199469.09 rows=7541009 width=12) (actual time=0.047..35628.599 rows=7539838 loops=1) -> Hash (cost=7949.67..7949.67 rows=15206 width=44) (actual time=971.734..971.734 rows=18901 loops=1) -> Hash Join (cost=5069.24..7949.67 rows=15206 width=44) (actual time=590.003..938.744 rows=18901 loops=1) Hash Cond: ("PrintSampleAnalyses"."ImageID" = "DigitalImages"."ImageID") -> Seq Scan on "PrintSampleAnalyses" (cost=0.00..2334.25 rows=78825 width=8) (actual time=0.021..130.335 rows=78859 loops=1) -> Hash (cost=4879.10..4879.10 rows=15211 width=44) (actual time=589.940..589.940 rows=18901 loops=1) -> Hash Join (cost=2220.11..4879.10 rows=15211 width=44) (actual time=168.307..557.675 rows=18901 loops=1) Hash Cond: ("DigitalImages"."PrintSampleID" = "PrintSamples"."PrintSampleID") -> Seq Scan on "DigitalImages" (cost=0.00..1915.50 rows=78850 width=8) (actual time=16.126..194.911 rows=78859 loops=1) -> Hash (cost=2029.98..2029.98 rows=15211 width=44) (actual time=152.128..152.128 rows=18645 loops=1) -> Hash Join (cost=564.39..2029.98 rows=15211 width=44) (actual time=13.951..121.903 rows=18645 loops=1) Hash Cond: (("PrintSamples"."TestPatternName")::text = ("tblTPNamesAndColors"."TestPatternName")::text) -> Bitmap Heap Scan on "PrintSamples" (cost=561.39..1781.53 rows=24891 width=40) (actual time=13.680..59.919 rows=24914 loops=1) Filter: (("TestPatternName")::text ~~ 'IQAF-TP8%'::text) -> Bitmap Index Scan on "PSTestPatternName_idx" (cost=0.00..555.17 rows=24891 width=0) (actual time=13.487..13.487 rows=24914 loops=1) Index Cond: ((("TestPatternName")::text >= 'IQAF-TP8'::character varying) AND (("TestPatternName")::text < 'IQAF-TP9'::character varying)) -> Hash (cost=2.72..2.72 rows=22 width=30) (actual time=0.242..0.242 rows=21 loops=1) -> Hash Join (cost=1.09..2.72 rows=22 width=30) (actual time=0.101..0.200 rows=21 loops=1) Hash Cond: ("tblTPNamesAndColors"."ColorID" = "tblColors"."ColorID") -> Seq Scan on "tblTPNamesAndColors" (cost=0.00..1.30 rows=30 width=30) (actual time=0.050..0.085 rows=30 loops=1) -> Hash (cost=1.05..1.05 rows=3 width=4) (actual time=0.028..0.028 rows=3 loops=1) -> Seq Scan on "tblColors" (cost=0.00..1.05 rows=3 width=4) (actual time=0.009..0.016 rows=3 loops=1) Filter: ("ColorID" <> 3) -> Hash (cost=1.71..1.71 rows=5 width=17) (actual time=0.072..0.072 rows=5 loops=1) -> Seq Scan on "AnalysisModules" (cost=0.00..1.71 rows=5 width=17) (actual time=0.038..0.055 rows=5 loops=1) Filter: (("AnalysisModuleName")::text = 'NMF'::text) Total runtime: 106358.738 ms
"Relyea, Mike" <Mike.Relyea@xerox.com> writes: > SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", > "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS > "Mottle_NMF" > FROM "AnalysisModules" > JOIN ("tblColors" > JOIN ("tblTPNamesAndColors" > JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text > =3D "PrintSamples"."TestPatternName"::text > JOIN ("DigitalImages" > JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" =3D > "PrintSampleAnalyses"."ImageID" > JOIN ("ParameterNames" > JOIN ("Measurements" > JOIN "ParameterValues" ON "Measurements"."MeasurementID" =3D > "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" =3D > "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" =3D > "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" =3D > "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" =3D > "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" =3D > "Measurements"."MetricID" Try increasing join_collapse_limit --- you have just enough tables here that the planner isn't going to consider all possible join orders. And it sorta looks like it's picking a bad one. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Friday, August 10, 2007 5:44 PM > To: Relyea, Mike > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Help optimize view > > Try increasing join_collapse_limit --- you have just enough > tables here that the planner isn't going to consider all > possible join orders. > And it sorta looks like it's picking a bad one. > > regards, tom lane > I tried increasing join_collapse_limit with no significant change in run time although a different plan was chosen. I've included a re-send of my original post, it looks like it didn't go through - it's not in the archives. I've also included an explain analyze before and after the join_collapse_limit change. I'm have the following view as part of a larger, aggregate query that is running slower than I'd like. There are 4 views total, each very similar to this one. Each of the views is then left joined with data from some other tables to give me the final result that I'm looking for. I'm hoping that if I can get some insight in to how to make this view execute faster, I can apply that learning to the other 3 views and thereby decrease the run time for my aggregate query. I'm running 8.2.4 on Windows XP with a single 10K rpm disk dedicated to the data directory and 1.5 GB memory. shared_buffers = 12288 work_mem = 262144 maintenance_work_mem = 131072 max_fsm_pages = 204800 random_page_cost = 2.0 effective_cache_size = 10000 autovacuum = on ==================================== EXPLAIN ANALYZE SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; HashAggregate (cost=519801.96..519898.00 rows=7683 width=70) (actual time=121101.027..121146.385 rows=14853 loops=1) -> Hash Join (cost=286101.76..519667.51 rows=7683 width=70) (actual time=52752.600..120989.713 rows=15123 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Hash Join (cost=286099.98..519260.45 rows=87588 width=61) (actual time=52752.502..120933.784 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=165.510..67811.086 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.012..0.026 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=165.481..67094.656 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Bitmap Index Scan on "PVParameterID_idx" (cost=0.00..7902.79 rows=608089 width=0) (actual time=141.013..141.013 rows=289724 loops=1) Index Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Hash (cost=259861.12..259861.12 rows=1454724 width=48) (actual time=52573.270..52573.270 rows=961097 loops=1) -> Hash Join (cost=8139.75..259861.12 rows=1454724 width=48) (actual time=1399.575..50896.641 rows=961097 loops=1) Hash Cond: ("Measurements"."psaID" = "PrintSampleAnalyses"."psaID") -> Seq Scan on "Measurements" (cost=0.00..199469.09 rows=7541009 width=12) (actual time=6.697..37199.702 rows=7539838 loops=1) -> Hash (cost=7949.67..7949.67 rows=15206 width=44) (actual time=1392.743..1392.743 rows=18901 loops=1) -> Hash Join (cost=5069.24..7949.67 rows=15206 width=44) (actual time=986.589..1358.908 rows=18901 loops=1) Hash Cond: ("PrintSampleAnalyses"."ImageID" = "DigitalImages"."ImageID") -> Seq Scan on "PrintSampleAnalyses" (cost=0.00..2334.25 rows=78825 width=8) (actual time=13.747..158.867 rows=78859 loops=1) -> Hash (cost=4879.10..4879.10 rows=15211 width=44) (actual time=972.787..972.787 rows=18901 loops=1) -> Hash Join (cost=2220.11..4879.10 rows=15211 width=44) (actual time=341.158..938.970 rows=18901 loops=1) Hash Cond: ("DigitalImages"."PrintSampleID" = "PrintSamples"."PrintSampleID") -> Seq Scan on "DigitalImages" (cost=0.00..1915.50 rows=78850 width=8) (actual time=34.028..418.113 rows=78859 loops=1) -> Hash (cost=2029.98..2029.98 rows=15211 width=44) (actual time=307.073..307.073 rows=18645 loops=1) -> Hash Join (cost=564.39..2029.98 rows=15211 width=44) (actual time=92.565..275.879 rows=18645 loops=1) Hash Cond: (("PrintSamples"."TestPatternName")::text = ("tblTPNamesAndColors"."TestPatternName")::text) -> Bitmap Heap Scan on "PrintSamples" (cost=561.39..1781.53 rows=24891 width=40) (actual time=92.296..208.635 rows=24914 loops=1) Filter: (("TestPatternName")::text ~~ 'IQAF-TP8%'::text) -> Bitmap Index Scan on "PSTestPatternName_idx" (cost=0.00..555.17 rows=24891 width=0) (actual time=76.711..76.711 rows=24914 loops=1) Index Cond: ((("TestPatternName")::text >= 'IQAF-TP8'::character varying) AND (("TestPatternName")::text < 'IQAF-TP9'::character varying)) -> Hash (cost=2.72..2.72 rows=22 width=30) (actual time=0.238..0.238 rows=21 loops=1) -> Hash Join (cost=1.09..2.72 rows=22 width=30) (actual time=0.097..0.196 rows=21 loops=1) Hash Cond: ("tblTPNamesAndColors"."ColorID" = "tblColors"."ColorID") -> Seq Scan on "tblTPNamesAndColors" (cost=0.00..1.30 rows=30 width=30) (actual time=0.046..0.080 rows=30 loops=1) -> Hash (cost=1.05..1.05 rows=3 width=4) (actual time=0.028..0.028 rows=3 loops=1) -> Seq Scan on "tblColors" (cost=0.00..1.05 rows=3 width=4) (actual time=0.009..0.016 rows=3 loops=1) Filter: ("ColorID" <> 3) -> Hash (cost=1.71..1.71 rows=5 width=17) (actual time=0.072..0.072 rows=5 loops=1) -> Seq Scan on "AnalysisModules" (cost=0.00..1.71 rows=5 width=17) (actual time=0.036..0.054 rows=5 loops=1) Filter: (("AnalysisModuleName")::text = 'NMF'::text) Total runtime: 121178.595 ms ============================================ SELECT set_config('join_collapse_limit', '20', false); EXPLAIN ANALYZE SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; HashAggregate (cost=489274.71..489372.94 rows=7858 width=70) (actual time=120391.220..120420.367 rows=14853 loops=1) -> Hash Join (cost=256774.03..489137.20 rows=7858 width=70) (actual time=51021.953..120276.494 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=159.781..68959.258 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.021..0.039 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=159.740..68235.713 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Bitmap Index Scan on "PVParameterID_idx" (cost=0.00..7902.79 rows=608089 width=0) (actual time=135.166..135.166 rows=289724 loops=1) Index Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Hash (cost=247087.84..247087.84 rows=130510 width=57) (actual time=50844.324..50844.324 rows=15123 loops=1) -> Hash Join (cost=8141.52..247087.84 rows=130510 width=57) (actual time=11034.877..50791.185 rows=15123 loops=1) Hash Cond: ("Measurements"."psaID" = "PrintSampleAnalyses"."psaID") -> Hash Join (cost=1.77..234364.57 rows=661492 width=21) (actual time=31.302..48949.943 rows=289724 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Seq Scan on "Measurements" (cost=0.00..199469.09 rows=7541009 width=12) (actual time=10.700..37931.726 rows=7539838 loops=1) -> Hash (cost=1.71..1.71 rows=5 width=17) (actual time=0.066..0.066 rows=5 loops=1) -> Seq Scan on "AnalysisModules" (cost=0.00..1.71 rows=5 width=17) (actual time=0.033..0.049 rows=5 loops=1) Filter: (("AnalysisModuleName")::text = 'NMF'::text) -> Hash (cost=7949.67..7949.67 rows=15206 width=44) (actual time=1325.797..1325.797 rows=18901 loops=1) -> Hash Join (cost=5069.24..7949.67 rows=15206 width=44) (actual time=906.105..1290.289 rows=18901 loops=1) Hash Cond: ("PrintSampleAnalyses"."ImageID" = "DigitalImages"."ImageID") -> Seq Scan on "PrintSampleAnalyses" (cost=0.00..2334.25 rows=78825 width=8) (actual time=4.456..153.999 rows=78859 loops=1) -> Hash (cost=4879.10..4879.10 rows=15211 width=44) (actual time=901.596..901.596 rows=18901 loops=1) -> Hash Join (cost=2220.11..4879.10 rows=15211 width=44) (actual time=293.264..866.364 rows=18901 loops=1) Hash Cond: ("DigitalImages"."PrintSampleID" = "PrintSamples"."PrintSampleID") -> Seq Scan on "DigitalImages" (cost=0.00..1915.50 rows=78850 width=8) (actual time=21.967..380.287 rows=78859 loops=1) -> Hash (cost=2029.98..2029.98 rows=15211 width=44) (actual time=271.232..271.232 rows=18645 loops=1) -> Hash Join (cost=564.39..2029.98 rows=15211 width=44) (actual time=60.780..237.748 rows=18645 loops=1) Hash Cond: (("PrintSamples"."TestPatternName")::text = ("tblTPNamesAndColors"."TestPatternName")::text) -> Bitmap Heap Scan on "PrintSamples" (cost=561.39..1781.53 rows=24891 width=40) (actual time=60.482..168.602 rows=24914 loops=1) Filter: (("TestPatternName")::text ~~ 'IQAF-TP8%'::text) -> Bitmap Index Scan on "PSTestPatternName_idx" (cost=0.00..555.17 rows=24891 width=0) (actual time=52.269..52.269 rows=24914 loops=1) Index Cond: ((("TestPatternName")::text >= 'IQAF-TP8'::character varying) AND (("TestPatternName")::text < 'IQAF-TP9'::character varying)) -> Hash (cost=2.72..2.72 rows=22 width=30) (actual time=0.266..0.266 rows=21 loops=1) -> Hash Join (cost=1.09..2.72 rows=22 width=30) (actual time=0.120..0.223 rows=21 loops=1) Hash Cond: ("tblTPNamesAndColors"."ColorID" = "tblColors"."ColorID") -> Seq Scan on "tblTPNamesAndColors" (cost=0.00..1.30 rows=30 width=30) (actual time=0.025..0.059 rows=30 loops=1) -> Hash (cost=1.05..1.05 rows=3 width=4) (actual time=0.068..0.068 rows=3 loops=1) -> Seq Scan on "tblColors" (cost=0.00..1.05 rows=3 width=4) (actual time=0.048..0.054 rows=3 loops=1) Filter: ("ColorID" <> 3) Total runtime: 120443.640 ms
>>> On Mon, Aug 13, 2007 at 10:35 AM, in message <1806D1F73FCB7F439F2C842EE0627B18065BF2C0@USA0300MS01.na.xerox.net>, "Relyea, Mike" <Mike.Relyea@xerox.com> wrote: > I'm running 8.2.4 on Windows XP with 1.5 GB memory. > shared_buffers = 12288 > effective_cache_size = 10000 For starters, you might want to adjust one or both of these. It looks to me like you're telling it that it only has 78.125 MB cache space. That will make it tend to want to scan entire tables, on the assumption that the cache hit ratio will be poor for random reads. Since you're on 8.2.4, you can use units of measure to help make this easier to read. You could, for example, say: shared_buffers = 96MB effective_cache_size = 1200MB -Kevin
> >>> On Mon, Aug 13, 2007 at 10:35 AM, in message > <1806D1F73FCB7F439F2C842EE0627B18065BF2C0@USA0300MS01.na.xerox > .net>, "Relyea, Mike" <Mike.Relyea@xerox.com> wrote: > > I'm running 8.2.4 on Windows XP with 1.5 GB memory. > > shared_buffers = 12288 > > effective_cache_size = 10000 > > For starters, you might want to adjust one or both of these. > It looks to me like you're telling it that it only has 78.125 > MB cache space. That will make it tend to want to scan > entire tables, on the assumption that the cache hit ratio > will be poor for random reads. > > Since you're on 8.2.4, you can use units of measure to help > make this easier to read. You could, for example, say: > > shared_buffers = 96MB > effective_cache_size = 1200MB > > -Kevin I've increased shared_buffers to 128MB, and restarted the server. My total run time didn't really change. SELECT set_config('effective_cache_size', '1000MB', false); I have another app that uses about 500MB. SELECT set_config('join_collapse_limit', '20', false); explain analyze SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ("tblColors" JOIN ("tblTPNamesAndColors" JOIN "PrintSamples" ON "tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text JOIN ("DigitalImages" JOIN "PrintSampleAnalyses" ON "DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID" JOIN ("ParameterNames" JOIN ("Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID") ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID") ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID") ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID") ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", "AnalysisModules"."AnalysisModuleName", "ParameterNames"."ParameterName", "PrintSamples"."TestPatternName" HAVING "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "tblColors"."ColorID" <> 3 AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text; HashAggregate (cost=489274.71..489372.94 rows=7858 width=70) (actual time=117632.844..117663.228 rows=14853 loops=1) -> Hash Join (cost=256774.03..489137.20 rows=7858 width=70) (actual time=50297.022..117530.665 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=8054.81..238636.75 rows=454040 width=21) (actual time=172.341..66959.288 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=17) (actual time=0.020..0.034 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=8054.81..231033.70 rows=608089 width=12) (actual time=172.297..66241.380 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Bitmap Index Scan on "PVParameterID_idx" (cost=0.00..7902.79 rows=608089 width=0) (actual time=147.690..147.690 rows=289724 loops=1) Index Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Hash (cost=247087.84..247087.84 rows=130510 width=57) (actual time=50109.022..50109.022 rows=15123 loops=1) -> Hash Join (cost=8141.52..247087.84 rows=130510 width=57) (actual time=11095.022..50057.777 rows=15123 loops=1) Hash Cond: ("Measurements"."psaID" = "PrintSampleAnalyses"."psaID") -> Hash Join (cost=1.77..234364.57 rows=661492 width=21) (actual time=31.457..48123.380 rows=289724 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Seq Scan on "Measurements" (cost=0.00..199469.09 rows=7541009 width=12) (actual time=10.920..37814.792 rows=7539838 loops=1) -> Hash (cost=1.71..1.71 rows=5 width=17) (actual time=0.066..0.066 rows=5 loops=1) -> Seq Scan on "AnalysisModules" (cost=0.00..1.71 rows=5 width=17) (actual time=0.032..0.049 rows=5 loops=1) Filter: (("AnalysisModuleName")::text = 'NMF'::text) -> Hash (cost=7949.67..7949.67 rows=15206 width=44) (actual time=1424.025..1424.025 rows=18901 loops=1) -> Hash Join (cost=5069.24..7949.67 rows=15206 width=44) (actual time=1007.901..1387.787 rows=18901 loops=1) Hash Cond: ("PrintSampleAnalyses"."ImageID" = "DigitalImages"."ImageID") -> Seq Scan on "PrintSampleAnalyses" (cost=0.00..2334.25 rows=78825 width=8) (actual time=4.432..153.090 rows=78859 loops=1) -> Hash (cost=4879.10..4879.10 rows=15211 width=44) (actual time=1003.424..1003.424 rows=18901 loops=1) -> Hash Join (cost=2220.11..4879.10 rows=15211 width=44) (actual time=348.841..968.194 rows=18901 loops=1) Hash Cond: ("DigitalImages"."PrintSampleID" = "PrintSamples"."PrintSampleID") -> Seq Scan on "DigitalImages" (cost=0.00..1915.50 rows=78850 width=8) (actual time=22.080..427.303 rows=78859 loops=1) -> Hash (cost=2029.98..2029.98 rows=15211 width=44) (actual time=326.703..326.703 rows=18645 loops=1) -> Hash Join (cost=564.39..2029.98 rows=15211 width=44) (actual time=90.425..293.223 rows=18645 loops=1) Hash Cond: (("PrintSamples"."TestPatternName")::text = ("tblTPNamesAndColors"."TestPatternName")::text) -> Bitmap Heap Scan on "PrintSamples" (cost=561.39..1781.53 rows=24891 width=40) (actual time=90.188..221.310 rows=24914 loops=1) Filter: (("TestPatternName")::text ~~ 'IQAF-TP8%'::text) -> Bitmap Index Scan on "PSTestPatternName_idx" (cost=0.00..555.17 rows=24891 width=0) (actual time=72.897..72.897 rows=24914 loops=1) Index Cond: ((("TestPatternName")::text >= 'IQAF-TP8'::character varying) AND (("TestPatternName")::text < 'IQAF-TP9'::character varying)) -> Hash (cost=2.72..2.72 rows=22 width=30) (actual time=0.210..0.210 rows=21 loops=1) -> Hash Join (cost=1.09..2.72 rows=22 width=30) (actual time=0.070..0.168 rows=21 loops=1) Hash Cond: ("tblTPNamesAndColors"."ColorID" = "tblColors"."ColorID") -> Seq Scan on "tblTPNamesAndColors" (cost=0.00..1.30 rows=30 width=30) (actual time=0.022..0.056 rows=30 loops=1) -> Hash (cost=1.05..1.05 rows=3 width=4) (actual time=0.026..0.026 rows=3 loops=1) -> Seq Scan on "tblColors" (cost=0.00..1.05 rows=3 width=4) (actual time=0.008..0.014 rows=3 loops=1) Filter: ("ColorID" <> 3) Total runtime: 117692.834 ms
>>> On Mon, Aug 13, 2007 at 1:48 PM, in message <1806D1F73FCB7F439F2C842EE0627B18065F78DF@USA0300MS01.na.xerox.net>, "Relyea, Mike" <Mike.Relyea@xerox.com> wrote: > I've increased shared_buffers to 128MB, and restarted the server. My > total run time didn't really change. Please forgive me if this guess doesn't help either, but could you try eliminating the GROUP BY options which don't echovalues in the select value list, and move the HAVING conditions to a WHERE clause? Something like: explain analyze SELECT "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID", avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" FROM "AnalysisModules" JOIN ( "tblColors" JOIN ( "tblTPNamesAndColors" JOIN "PrintSamples" ON ("tblTPNamesAndColors"."TestPatternName"::text = "PrintSamples"."TestPatternName"::text) JOIN ( "DigitalImages" JOIN "PrintSampleAnalyses" ON ("DigitalImages"."ImageID" = "PrintSampleAnalyses"."ImageID") JOIN ( "ParameterNames" JOIN ( "Measurements" JOIN "ParameterValues" ON "Measurements"."MeasurementID" = "ParameterValues"."MeasurementID" ) ON "ParameterNames"."ParameterID" = "ParameterValues"."ParameterID" ) ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID" ) ON "PrintSamples"."PrintSampleID" = "DigitalImages"."PrintSampleID" ) ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID" ) ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" WHERE "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text AND "ParameterNames"."ParameterName"::text = 'NMF'::text AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text AND "tblColors"."ColorID" <> 3 GROUP BY "PrintSamples"."MachineID", "PrintSamples"."PrintCopyID", "tblColors"."ColorID" ; I'd also be inclined to simplify the FROM clause by eliminating the parentheses and putting the ON conditions closer to wherethey are used, but that would be more for readability than any expectation that it would affect the plan. -Kevin
"Relyea, Mike" <Mike.Relyea@xerox.com> writes: > I've increased shared_buffers to 128MB, and restarted the server. My > total run time didn't really change. It doesn't look like you can hope for much in terms of improving the plan. The bulk of the time is going into scanning ParameterValues and Measurements, but AFAICS there is no way for the query to pull fewer rows from those tables than it is doing, and the size of the join means that a nestloop indexscan is likely to suck. (You could try forcing one by setting enable_hashjoin and enable_mergejoin to OFF, but I don't have much hope for that.) If you haven't played with work_mem yet, increasing that might make the hash joins go a bit faster --- but it looks like most of the time is going into the raw relation scans, so there's not going to be a lot of win to be had there either. Basically, joining lots of rows like this takes awhile. If you have to have a faster answer, I can only suggest rethinking your table design. Sometimes denormalization of the schema is necessary for performance. regards, tom lane
> >>> On Mon, Aug 13, 2007 at 1:48 PM, in message > <1806D1F73FCB7F439F2C842EE0627B18065F78DF@USA0300MS01.na.xerox > .net>, "Relyea, Mike" <Mike.Relyea@xerox.com> wrote: > > I've increased shared_buffers to 128MB, and restarted the > server. My > > total run time didn't really change. > > Please forgive me if this guess doesn't help either, but > could you try eliminating the GROUP BY options which don't > echo values in the select value list, and move the HAVING > conditions to a WHERE clause? Something like: > > explain analyze > SELECT > "PrintSamples"."MachineID", > "PrintSamples"."PrintCopyID", > "tblColors"."ColorID", > avg("ParameterValues"."ParameterValue") AS "Mottle_NMF" > FROM "AnalysisModules" > JOIN > ( > "tblColors" > JOIN > ( > "tblTPNamesAndColors" > JOIN "PrintSamples" > ON ("tblTPNamesAndColors"."TestPatternName"::text = > "PrintSamples"."TestPatternName"::text) > JOIN > ( > "DigitalImages" > JOIN "PrintSampleAnalyses" > ON ("DigitalImages"."ImageID" = > "PrintSampleAnalyses"."ImageID") > JOIN > ( > "ParameterNames" > JOIN > ( > "Measurements" > JOIN "ParameterValues" > ON "Measurements"."MeasurementID" = > "ParameterValues"."MeasurementID" > ) ON "ParameterNames"."ParameterID" = > "ParameterValues"."ParameterID" > ) ON "PrintSampleAnalyses"."psaID" = "Measurements"."psaID" > ) ON "PrintSamples"."PrintSampleID" = > "DigitalImages"."PrintSampleID" > ) ON "tblColors"."ColorID" = "tblTPNamesAndColors"."ColorID" > ) ON "AnalysisModules"."MetricID" = "Measurements"."MetricID" > WHERE "AnalysisModules"."AnalysisModuleName"::text = 'NMF'::text > AND "ParameterNames"."ParameterName"::text = 'NMF'::text > AND "PrintSamples"."TestPatternName"::text ~~ 'IQAF-TP8%'::text > AND "tblColors"."ColorID" <> 3 > GROUP BY > "PrintSamples"."MachineID", > "PrintSamples"."PrintCopyID", > "tblColors"."ColorID" > ; > > I'd also be inclined to simplify the FROM clause by > eliminating the parentheses and putting the ON conditions > closer to where they are used, but that would be more for > readability than any expectation that it would affect the plan. > > -Kevin Thanks for your help. Re-writing the view like this maybe bought me something. I've pasted the explain analyze results below. Tough to tell because I also increased some of the statistics. From what Tom says, it sounds like if I want the data returned faster I'm likely to have to get beefier hardware. ALTER TABLE "ParameterValues" ALTER "MeasurementID" SET STATISTICS 500; ALTER TABLE "ParameterValues" ALTER "ParameterID" SET STATISTICS 500; ANALYZE "ParameterValues"; ALTER TABLE "Measurements" ALTER COLUMN "MetricID" SET STATISTICS 500; ALTER TABLE "Measurements" ALTER COLUMN "psaID" SET STATISTICS 500; ANALYZE "Measurements"; Running the above SQL: HashAggregate (cost=461541.53..461634.88 rows=7468 width=16) (actual time=110002.041..110024.777 rows=14853 loops=1) -> Hash Join (cost=230789.57..461464.70 rows=7683 width=16) (actual time=56847.814..109936.722 rows=15123 loops=1) Hash Cond: ("Measurements"."MetricID" = "AnalysisModules"."MetricID") -> Hash Join (cost=230787.80..461057.64 rows=87588 width=20) (actual time=56847.697..109884.122 rows=15123 loops=1) Hash Cond: ("ParameterValues"."MeasurementID" = "Measurements"."MeasurementID") -> Nested Loop (cost=6353.15..234044.47 rows=454038 width=8) (actual time=179.154..52780.680 rows=289724 loops=1) -> Seq Scan on "ParameterNames" (cost=0.00..1.94 rows=1 width=4) (actual time=0.012..0.027 rows=1 loops=1) Filter: (("ParameterName")::text = 'NMF'::text) -> Bitmap Heap Scan on "ParameterValues" (cost=6353.15..228047.32 rows=479617 width=12) (actual time=179.123..52102.572 rows=289724 loops=1) Recheck Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Bitmap Index Scan on "PVParameterID_idx" (cost=0.00..6233.25 rows=479617 width=0) (actual time=152.752..152.752 rows=289724 loops=1) Index Cond: ("ParameterNames"."ParameterID" = "ParameterValues"."ParameterID") -> Hash (cost=206253.42..206253.42 rows=1454498 width=20) (actual time=56657.022..56657.022 rows=961097 loops=1) -> Nested Loop (cost=5069.24..206253.42 rows=1454498 width=20) (actual time=932.249..55176.315 rows=961097 loops=1) -> Hash Join (cost=5069.24..7949.67 rows=15206 width=16) (actual time=908.275..1257.120 rows=18901 loops=1) Hash Cond: ("PrintSampleAnalyses"."ImageID" = "DigitalImages"."ImageID") -> Seq Scan on "PrintSampleAnalyses" (cost=0.00..2334.25 rows=78825 width=8) (actual time=10.440..139.945 rows=78859 loops=1) -> Hash (cost=4879.10..4879.10 rows=15211 width=16) (actual time=897.776..897.776 rows=18901 loops=1) -> Hash Join (cost=2220.11..4879.10 rows=15211 width=16) (actual time=297.330..868.632 rows=18901 loops=1) Hash Cond: ("DigitalImages"."PrintSampleID" = "PrintSamples"."PrintSampleID") -> Seq Scan on "DigitalImages" (cost=0.00..1915.50 rows=78850 width=8) (actual time=15.859..408.784 rows=78859 loops=1) -> Hash (cost=2029.98..2029.98 rows=15211 width=16) (actual time=281.413..281.413 rows=18645 loops=1) -> Hash Join (cost=564.39..2029.98 rows=15211 width=16) (actual time=84.182..251.833 rows=18645 loops=1) Hash Cond: (("PrintSamples"."TestPatternName")::text = ("tblTPNamesAndColors"."TestPatternName")::text) -> Bitmap Heap Scan on "PrintSamples" (cost=561.39..1781.53 rows=24891 width=40) (actual time=83.925..184.775 rows=24914 loops=1) Filter: (("TestPatternName")::text ~~ 'IQAF-TP8%'::text) -> Bitmap Index Scan on "PSTestPatternName_idx" (cost=0.00..555.17 rows=24891 width=0) (actual time=74.198..74.198 rows=24914 loops=1) Index Cond: ((("TestPatternName")::text >= 'IQAF-TP8'::character varying) AND (("TestPatternName")::text < 'IQAF-TP9'::character varying)) -> Hash (cost=2.72..2.72 rows=22 width=30) (actual time=0.225..0.225 rows=21 loops=1) -> Hash Join (cost=1.09..2.72 rows=22 width=30) (actual time=0.086..0.184 rows=21 loops=1) Hash Cond: ("tblTPNamesAndColors"."ColorID" = "tblColors"."ColorID") -> Seq Scan on "tblTPNamesAndColors" (cost=0.00..1.30 rows=30 width=30) (actual time=0.025..0.060 rows=30 loops=1) -> Hash (cost=1.05..1.05 rows=3 width=4) (actual time=0.040..0.040 rows=3 loops=1) -> Seq Scan on "tblColors" (cost=0.00..1.05 rows=3 width=4) (actual time=0.021..0.027 rows=3 loops=1) Filter: ("ColorID" <> 3) -> Index Scan using "MpsaID_idx" on "Measurements" (cost=0.00..11.13 rows=153 width=12) (actual time=1.615..2.728 rows=51 loops=18901) Index Cond: ("PrintSampleAnalyses"."psaID" = "Measurements"."psaID") -> Hash (cost=1.71..1.71 rows=5 width=4) (actual time=0.092..0.092 rows=5 loops=1) -> Seq Scan on "AnalysisModules" (cost=0.00..1.71 rows=5 width=4) (actual time=0.060..0.077 rows=5 loops=1) Filter: (("AnalysisModuleName")::text = 'NMF'::text) Total runtime: 110047.601 ms
>>> On Mon, Aug 13, 2007 at 4:00 PM, in message <1806D1F73FCB7F439F2C842EE0627B18065F7A86@USA0300MS01.na.xerox.net>, "Relyea, Mike" <Mike.Relyea@xerox.com> wrote: > > Re-writing the view like this maybe bought me something. > Tough to tell because I also increased some of the statistics. I don't know whether it was the finer-grained statistics or the simplification, but it bought you a new plan. I don't know if the seven second improvement is real or within the run-to-run variation, though; it could be because you happened to be better-cached at the time. > From what Tom > says, it sounds like if I want the data returned faster I'm likely to > have to get beefier hardware. That's not what he suggested. If you introduce redundancy in a controlled fashion, you could have a single table with an index to more quickly get you to the desired set of data. That can be maintained on an ongoing basis (possibly using triggers) or could be materialized periodically or prior to running a series of reports or queries. Such redundancies violate the normalization rules which are generally used in database design, but some denormalization is often needed for acceptable performance. -Kevin
>>> On Mon, Aug 13, 2007 at 4:25 PM, in message <46C0860D.EE98.0025.0@wicourts.gov>, "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: >>>> On Mon, Aug 13, 2007 at 4:00 PM, in message > <1806D1F73FCB7F439F2C842EE0627B18065F7A86@USA0300MS01.na.xerox.net>, "Relyea, > Mike" <Mike.Relyea@xerox.com> wrote: > >> From what Tom >> says, it sounds like if I want the data returned faster I'm likely to >> have to get beefier hardware. > > That's not what he suggested. If you introduce redundancy in a controlled > fashion, you could have a single table with an index to more quickly get you > to the desired set of data. That can be maintained on an ongoing basis > (possibly using triggers) or could be materialized periodically or prior to > running a series of reports or queries. > > Such redundancies violate the normalization rules which are generally used > in database design, but some denormalization is often needed for acceptable > performance. One last thought regarding your table structure -- I noticed you were often joining on column names ending in "ID" and selecting using column names ending in "Name", where the values for the name columns were only a few characters long. It is not always a good idea to create a meaningless ID number for a primary key if you have a meaningful value (or combination of values) which would uniquely identify a row. If you were able to use the columns in your search criteria as keys, you would have them in the Measurements table without creating any troublesome redundancy. You could then add Measurements indexes on these columns, and your query might run in under a second. The down side of meaningful keys (oft cited by proponents of the technique) is that if you decide that everything with an AnalysisModuleName" name of 'NMF' should now be named 'NMX', you would have to update all rows which contain the old value. To be able to do this safely and reliably, you would want to use DOMAIN definitions rigorously. If you link through meaningless ID numbers (and what would be the point of changing those?) you can change 'NMF' to 'NMX' in one place, and everything would reflect the new value, since it would always join to one place for those characters. -Kevin