Thread: Slow query with strange planner row estimation
Hello,

I'm trying to make a query run faster, but I don't really know how to give hints to the planner.

We are using PostgreSQL 8.4.3 (64-bit) on an Ubuntu 9.10 server. The hardware is ten 15k SAS drives in a single RAID 10 array, with 8 GB of RAM. Queries come from a J2EE application (an OLAP cube), but running them in pgAdmin performs the same way.

I made a short example that shows what I think is the problem. The real query is much longer, but even with only one join it already causes problems.

Here is the short example:

select rfoadv_8.rfoadvsup as c8,
       sum(dwhinv.dwhinvqte) as m0
from
    dwhinv as dwhinv,
    rfoadv as rfoadv_8
where (dwhinv.dwhinv___rforefide = 'HPLUS'
       and (dwhinv.dwhinv___rfodomide = 'PMSI'
            and dwhinv.dwhinv___rfoindrvs = '1'
            and dwhinv.dwhinv___rfoindide = 'recN3_BB_reel'))
  and dwhinv.dwhinv_p2rfodstide = rfoadv_8.rfoadvinf
  and rfoadv_8.rfoadvsup = 'ACTI'
group by rfoadv_8.rfoadvsup

dwhinv is a table with almost 6,000,000 records.
rfoadv is a view with 800,000 records.
rfoadv is based on rfoade, which has 50,000 records.

Here is the EXPLAIN ANALYZE:

GroupAggregate  (cost=0.00..16.56 rows=1 width=13) (actual time=2028.452..2028.453 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..16.54 rows=1 width=13) (actual time=0.391..1947.432 rows=42664 loops=1)
        Join Filter: (((ade2.rfoadegch)::text >= (ade1.rfoadegch)::text) AND ((ade2.rfoadedrt)::text <= (ade1.rfoadedrt)::text))
        ->  Nested Loop  (cost=0.00..12.54 rows=1 width=214) (actual time=0.304..533.281 rows=114350 loops=1)
              ->  Index Scan using dwhinv_rdi_idx on dwhinv  (cost=0.00..4.87 rows=1 width=12) (actual time=0.227..16.827 rows=6360 loops=1)
                    Index Cond: (((dwhinv___rforefide)::text = 'HPLUS'::text) AND ((dwhinv___rfodomide)::text = 'PMSI'::text) AND ((dwhinv___rfoindide)::text = 'recN3_BB_reel'::text) AND (dwhinv___rfoindrvs = 1))
              ->  Index Scan using rfoade_dsi_idx on rfoade ade2  (cost=0.00..7.63 rows=3 width=213) (actual time=0.007..0.037 rows=18 loops=6360)
                    Index Cond: ((ade2.rfoade_i_rfodstide)::text = (dwhinv.dwhinv_p2rfodstide)::text)
        ->  Index Scan using rfoade_pk on rfoade ade1  (cost=0.00..3.98 rows=1 width=213) (actual time=0.008..0.009 rows=0 loops=114350)
              Index Cond: (((ade1.rfoade___rforefide)::text = (ade2.rfoade___rforefide)::text) AND ((ade1.rfoade_i_rfodstide)::text = 'ACTI'::text) AND ((ade1.rfoade___rfovdeide)::text = (ade2.rfoade___rfovdeide)::text) AND (ade1.rfoadervs = ade2.rfoadervs))

We can see that the planner thinks that accessing dwhinv through the dwhinv_rdi_idx index will return 1 row, when in fact it returns 6360. So the inner side of the nested loop is executed not once but 6360 times. With only one join the query runs in about 1.5 s, which is not really long, but with 8 joins the same mistake is repeated 8 times and the query runs for 30-60 s. I tried disabling nested loops: hash joins and merge joins are used instead, the example query runs in 0.2-0.5 s, and the real query takes no more than 1 s, which is great!
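For reference, I disabled nested loops for my session only, roughly like this (a sketch; the query is the short example above):

SET enable_nestloop = off;   -- discourage nested-loop joins for this session only

EXPLAIN ANALYZE
select rfoadv_8.rfoadvsup as c8,
       sum(dwhinv.dwhinvqte) as m0
  from dwhinv as dwhinv,
       rfoadv as rfoadv_8
 where dwhinv.dwhinv___rforefide = 'HPLUS'
   and dwhinv.dwhinv___rfodomide = 'PMSI'
   and dwhinv.dwhinv___rfoindrvs = '1'
   and dwhinv.dwhinv___rfoindide = 'recN3_BB_reel'
   and dwhinv.dwhinv_p2rfodstide = rfoadv_8.rfoadvinf
   and rfoadv_8.rfoadvsup = 'ACTI'
 group by rfoadv_8.rfoadvsup;

RESET enable_nestloop;       -- restore the default afterwards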
Here is the execution plan with nested loops off:

GroupAggregate  (cost=12.56..2453.94 rows=1 width=13) (actual time=817.306..817.307 rows=1 loops=1)
  ->  Hash Join  (cost=12.56..2453.93 rows=1 width=13) (actual time=42.583..720.746 rows=42664 loops=1)
        Hash Cond: (((ade2.rfoade___rforefide)::text = (ade1.rfoade___rforefide)::text) AND ((ade2.rfoade___rfovdeide)::text = (ade1.rfoade___rfovdeide)::text) AND (ade2.rfoadervs = ade1.rfoadervs))
        Join Filter: (((ade2.rfoadegch)::text >= (ade1.rfoadegch)::text) AND ((ade2.rfoadedrt)::text <= (ade1.rfoadedrt)::text))
        ->  Hash Join  (cost=4.88..2446.21 rows=1 width=214) (actual time=42.168..411.962 rows=114350 loops=1)
              Hash Cond: ((ade2.rfoade_i_rfodstide)::text = (dwhinv.dwhinv_p2rfodstide)::text)
              ->  Seq Scan on rfoade ade2  (cost=0.00..2262.05 rows=47805 width=213) (actual time=0.057..78.988 rows=47805 loops=1)
              ->  Hash  (cost=4.87..4.87 rows=1 width=12) (actual time=41.632..41.632 rows=6360 loops=1)
                    ->  Index Scan using dwhinv_rdi_idx on dwhinv  (cost=0.00..4.87 rows=1 width=12) (actual time=0.232..28.199 rows=6360 loops=1)
                          Index Cond: (((dwhinv___rforefide)::text = 'HPLUS'::text) AND ((dwhinv___rfodomide)::text = 'PMSI'::text) AND ((dwhinv___rfoindide)::text = 'recN3_BB_reel'::text) AND (dwhinv___rfoindrvs = 1))
        ->  Hash  (cost=7.63..7.63 rows=3 width=213) (actual time=0.347..0.347 rows=11 loops=1)
              ->  Index Scan using rfoade_dsi_idx on rfoade ade1  (cost=0.00..7.63 rows=3 width=213) (actual time=0.095..0.307 rows=11 loops=1)
                    Index Cond: ((rfoade_i_rfodstide)::text = 'ACTI'::text)

Even though the dwhinv row estimate is still wrong, the query is much quicker.

So, after looking at the statistics behind dwhinv_rdi_idx, I found that the stats related to dwhinv___rfoindide weren't good, so I tried "ALTER TABLE dwhinv ALTER dwhinv_p2rfodstide SET STATISTICS 2000" and launched a VACUUM ANALYZE to gather more detailed stats. The stats are better, but the query plan is the same and the query is not optimized. So I tried a REINDEX on dwhinv as a last chance, but it changed nothing!

Maybe I'm wrong in my interpretation of the plan, but I don't really think so, because without nested loops this query is really fast! I do not want to disable nested loops on the whole database, because sometimes nested loops are great!

Now I'm stuck! I don't know how to make the planner understand that there are 6000 rows. Or maybe the 3-column index is a bad idea...?!

Thanks
--
HOSTIN Damien - Equipe R&D
Société Axège
www.axege.com
Hello,

Before the weekend I tried changing the indexes, but even with single-column indexes on the various columns, the estimated number of rows from dwhinv is still 1.

Does anyone have a suggestion? What can I check?

thx
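In case it helps, here is roughly how I have been checking the column stats in pg_stats (a sketch, showing only the filtered columns):

SELECT attname, null_frac, n_distinct, most_common_vals
  FROM pg_stats
 WHERE tablename = 'dwhinv'
   AND attname IN ('dwhinv___rfoindide', 'dwhinv_p2rfodstide');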
damien hostin wrote:
> Hello,
>
> I'm trying to make a query run faster, but I don't really know how to
> give hints to the planner.
> [...]

--
HOSTIN Damien - Equipe R&D
Tel:+33(0)4 63 05 95 40
Société Axège
23 rue Saint Simon
63000 Clermont Ferrand
www.axege.com
Hello,

The PostgreSQL configuration was the default one, so I took a look at pgtune, which helped me start a bit of tuning. I thought the planner mistake could come from the default low-memory configuration, but after applying the new parameters nothing changed: the query is still slow, and the execution plan still uses nested loops where hash join / merge join seems a lot better.

Here are the postgresql.conf parameters I changed following pgtune's advice; all others are defaults. (The hardware is ten 15k SAS drives in a single RAID 10 array, with 8 GB of RAM and two 64-bit dual-core Opterons; I can't remember the exact model.)

# generated for 100 connections and 6 GB RAM with data warehouse type
#
default_statistics_target = 100
maintenance_work_mem = 768MB
#constraint_exclusion = on
#checkpoint_completion_target = 0.9
effective_cache_size = 4608MB
work_mem = 30MB
wal_buffers = 32MB
checkpoint_segments = 64
shared_buffers = 1536MB

Some information that I may have forgotten:

SELECT version();
"PostgreSQL 8.4.3 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.1-4ubuntu8) 4.4.1, 64-bit"

And here is a link to the EXPLAIN ANALYZE of the full query: http://explain.depesz.com/s/Yx0

I will try the same query with the same data on another server, running "PostgreSQL 8.3.11 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)".
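By the way, to double-check which values were actually taken into account after the reload, a query like this on pg_settings helps (a sketch, listing just some of the parameters above; the source column shows whether a value comes from postgresql.conf or is a default):

SELECT name, setting, source
  FROM pg_settings
 WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size',
                'maintenance_work_mem', 'default_statistics_target');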
damien hostin wrote:
> Hello,
>
> Before the weekend I tried changing the indexes, but even with
> single-column indexes on the various columns, the estimated number of
> rows from dwhinv is still 1.
>
> Does anyone have a suggestion? What can I check?
> [...]
--
HOSTIN Damien - Equipe R&D
Tel:+33(0)4 63 05 95 40
Société Axège
23 rue Saint Simon
63000 Clermont Ferrand
www.axege.com
Hello again,

Finally, I checked the same query with the same data on my desktop computer. Just after loading the data the queries were slow; I launched a VACUUM ANALYZE, which collected good stats on the main table, and the query became quick (~200 ms). So now a desktop computer with one classic SATA disk is faster than our little monster server!!

I compared the data volumes between the two databases. On my desktop computer the table dwhinv has 12,000 rows, with 6,000 implicated in my query. The dev server has 6,000,000 rows, with only 6,000 implicated in the query. I checked the distribution of the columns I am using in this query: in fact, only the 6,000 rows implicated in the query have non-NULL values in those columns.

I set the statistics target on these columns to 10000, which makes ANALYZE take half the table as its sample for the stats. This way I get some values for these columns, but the execution plan is still mistaken (plan: http://explain.depesz.com/s/LKW).

I tried to compare with the desktop plan, but there seems to be nothing comparable. I thought I would find something like "access on dwhinv with 6000 estimated rows", but instead it does the following (http://explain.depesz.com/s/kbn). I don't understand "rows=0" in:

Index Scan using dwhinv_dig_idx on dwhinv  (cost=0.00..25.91 rows=1 width=80) (actual time=0.009..0.010 rows=0 loops=120)
  Index Cond: ((dwhinv.dwhinv___rsadigide)::text = (adi2.rsaadi_i_rsadigide)::text)
  Filter: (((dwhinv.dwhinv___rforefide)::text = 'HPLUS'::text) AND (dwhinv.dwhinv___rfoindrvs = 1) AND ((dwhinv.dwhinv___rfodomide)::text = 'PMSI'::text) AND ((dwhinv.dwhinv___rfoindide)::text = 'recN3_BB_reel'::text))

I also managed to make the query run 10x faster by using SQL-92 join syntax instead of the old "from table1, table2 where table1.col1 = table2.col1" form. This way the query takes 3 s instead of 30 s. But again, without nested loops: 200 ms!

I will try later with the new Mondrian release and a better-balanced fact table.

Thanks anyway
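For reference, the SQL-92 version of the short example looks roughly like this (a sketch; same semantics, only the join syntax changes):

select rfoadv_8.rfoadvsup as c8,
       sum(dwhinv.dwhinvqte) as m0
  from dwhinv
  join rfoadv as rfoadv_8
    on dwhinv.dwhinv_p2rfodstide = rfoadv_8.rfoadvinf
 where dwhinv.dwhinv___rforefide = 'HPLUS'
   and dwhinv.dwhinv___rfodomide = 'PMSI'
   and dwhinv.dwhinv___rfoindrvs = '1'
   and dwhinv.dwhinv___rfoindide = 'recN3_BB_reel'
   and rfoadv_8.rfoadvsup = 'ACTI'
 group by rfoadv_8.rfoadvsup;

With 8 joins, explicit JOIN syntax can also end up pinning the join order once the number of joined items exceeds join_collapse_limit (8 by default), which may be part of why the full query behaves so differently.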
damien hostin wrote:
> Hello,
>
> The PostgreSQL configuration was the default one, so I took a look at
> pgtune, which helped me start a bit of tuning.
> [...]
--
HOSTIN Damien - Equipe R&D
Tel:+33(0)4 63 05 95 40
Société Axège
23 rue Saint Simon
63000 Clermont Ferrand
www.axege.com
On Wed, Jul 7, 2010 at 10:39 AM, damien hostin <damien.hostin@axege.com> wrote:
> Hello again,
>
> Finally, I checked the same query with the same data on my desktop
> computer. Just after loading the data the queries were slow; I launched
> a VACUUM ANALYZE, which collected good stats on the main table, and the
> query became quick (~200 ms). So now a desktop computer with one classic
> SATA disk is faster than our little monster server!!
> [...]

Have you tried running ANALYZE on the production server?

You might also want to try ALTER TABLE ... SET STATISTICS to a large
value on some of the join columns involved in the query.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas wrote:
> Have you tried running ANALYZE on the production server?
>
> You might also want to try ALTER TABLE ... SET STATISTICS to a large
> value on some of the join columns involved in the query.

Hello,

Before comparing the test case on the two machines, I ran ANALYZE on the whole database and looked at the pg_stats table to see whether the stats for those columns changed, but on the production server the stats never became as good as on the desktop computer. I set the statistics target to 10000 on the columns used by the join and ran ANALYZE, which takes a 3,000,000-row sample, then looked at the stats again. They are not as good as on the desktop: the estimated row count is nearly the same, but only 1 or 2 distinct values are found.

The data are not distributed the same way on the two machines:
- the desktop has 12,000 rows, with 6,000 implicated in the query (50%);
- "production" (actually a dev/test server) has 6 million rows, with only 6,000 implicated in the query (0.1%).
The columns used in the query are nullable, and in the 5,994,000 other rows that are not implicated in the query these columns are NULL.

I don't know whether the statistics target is a percentage or a number of values to obtain, but even set at the maximum (10000) it didn't manage to collect good stats (for this particular query).

As I don't know what more to do, my conclusion is that the data needs to be better balanced for ANALYZE to gather better stats. But if there is a way to improve the stats/query with this ugly data distribution, I'm open to it!

I hope that in real production the data will never be loaded this way. If it happens, we will maybe set enable_nestloop to off, but I don't think that's a good solution; other queries have a chance of getting slower.

Thanks for helping

--
HOSTIN Damien - Equipe R&D
Tel:+33(0)4 63 05 95 40
Société Axège
23 rue Saint Simon
63000 Clermont Ferrand
www.axege.com
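P.S. For completeness, the statements were along these lines (a sketch, shown for a single join column):

ALTER TABLE dwhinv ALTER COLUMN dwhinv_p2rfodstide SET STATISTICS 10000;
ANALYZE VERBOSE dwhinv;  -- the VERBOSE output reports the sample size, ~3,000,000 rows here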
On Fri, Jul 9, 2010 at 6:13 AM, damien hostin <damien.hostin@axege.com> wrote:
> I don't know whether the statistics target is a percentage or a number
> of values to obtain,

It's a number of values to obtain.

> but even set at the maximum (10000) it didn't manage to collect good
> stats (for this particular query).

I think there's a cutoff where it won't collect values unless they
occur significantly more often than the average frequency. I wonder
if that might be biting you here: without the actual values in the MCV
table, the join selectivity estimates probably aren't too good.

> I hope that in real production the data will never be loaded this way.
> If it happens, we will maybe set enable_nestloop to off, but I don't
> think that's a good solution; other queries have a chance of getting
> slower.

Yeah, that usually works out poorly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
It's probably one of those cases where having HINTS in PostgreSQL would be very helpful.

SELECT /*+ enable_nestloop=off */ ... FROM ...

would just fix this query, without impacting other queries and without adding any additional instructions to the application code.

So why is there such resistance to implementing hints within SQL queries in PG?

Rgds,
-Dimitri

On 7/9/10, Robert Haas <robertmhaas@gmail.com> wrote:
> I think there's a cutoff where it won't collect values unless they
> occur significantly more often than the average frequency. I wonder
> if that might be biting you here: without the actual values in the MCV
> table, the join selectivity estimates probably aren't too good.
> [...]
Dimitri wrote:
> It's probably one of those cases where having HINTS in PostgreSQL would
> be very helpful.
>
> SELECT /*+ enable_nestloop=off */ ... FROM ...
>
> would just fix this query, without impacting other queries and without
> adding any additional instructions to the application code.
>
> So why is there such resistance to implementing hints within SQL
> queries in PG?

+1.
Another typical case where it would be helpful is setting the cursor_tuple_fraction GUC variable for a specific statement, without being obliged to issue two SET statements, one before the SELECT and the other after.

Regards.
Philippe Beaudoin.
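P.S. Concretely, the two-statement pattern I mean looks like this (a sketch; 0.1 is just an example value, and cursor_tuple_fraction only influences cursor plans):

BEGIN;
SET cursor_tuple_fraction = 0.1;   -- favour returning the first rows quickly
DECLARE c CURSOR FOR SELECT ... ;  -- the statement the setting should influence
FETCH 10 FROM c;
CLOSE c;
RESET cursor_tuple_fraction;       -- the second statement, to restore the default
COMMIT;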
phb07 wrote:
> Dimitri wrote:
>> It's probably one of those cases where having HINTS in PostgreSQL
>> would be very helpful.
>>
>> SELECT /*+ enable_nestloop=off */ ... FROM ...
>> [...]
>
> +1.
> Another typical case where it would be helpful is setting the
> cursor_tuple_fraction GUC variable for a specific statement, without
> being obliged to issue two SET statements, one before the SELECT and
> the other after.

I remember that the "dimension" columns of the fact table have partial indexes with a "WHERE ... IS NOT NULL" clause on the indexed column. Example:

CREATE INDEX dwhinv_pd2_idx
  ON dwhinv
  USING btree (dwhinv_p2rfodstide)
  TABLESPACE tb_index
  WHERE dwhinv_p2rfodstide IS NOT NULL;

Is the WHERE clause used to select the sample rows on which the stats are calculated, or just to exclude values after the stats are collected? As I write this, I realize there must be no link between a table column's stats and an index on the same column. (By the way, if I add IS NOT NULL on each column that has such an index, it changes nothing.)

About the Oracle-like hints: they would not really help here, because the query is generated in an external jar that I would have to fork to include the modification. I would prefer forcing a plan based on the query's hashcode, but that still would not fix whatever makes the planner go wrong.

--
HOSTIN Damien - Equipe R&D
Tel:+33(0)4 63 05 95 40
Société Axège
23 rue Saint Simon
63000 Clermont Ferrand
www.axege.com
On Mon, Jul 12, 2010 at 4:33 PM, phb07 <phb07@apra.asso.fr> wrote:
> Dimitri wrote:
>> So why is there such resistance to implementing hints within SQL
>> queries in PG?
>
> +1.
> Another typical case where it would be helpful is setting the
> cursor_tuple_fraction GUC variable for a specific statement, without
> being obliged to issue two SET statements, one before the SELECT and
> the other after.

We've previously discussed adding a command something like:

LET (variable = value, variable = value, ...) command

...which would set those variables just for that one command. But honestly, I'm not sure how much it'll help with query planner problems. Disabling nestloops altogether, even for one particular query, is often going to be a sledgehammer where you need a scalpel. But then again, a sledgehammer is better than no hammer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
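P.S. In the meantime, the closest existing equivalent is a transaction-scoped SET LOCAL, which reverts automatically at commit or rollback. A sketch:

BEGIN;
SET LOCAL enable_nestloop = off;  -- applies to this transaction only
SELECT ... ;                      -- the problem query
COMMIT;                           -- the setting reverts here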