Thread: BUG #1878: Different execution plans for the same query.
The following bug has been logged online: Bug reference: 1878 Logged by: Alexei Email address: alexei_novakov@yahoo.com PostgreSQL version: 8.0 Operating system: Windows 2000 Description: Different execution plans for the same query. Details: I was doing performance evaluation of various DB servers including PostgreSQL. For one of test queries I got strange result - timing was different (from 0.5 to 240 sec) even if I was running it one time after another without going anything in between (like there were no inserts or whatever similar). I did vacuum and vacuum analyse, but it didn't change anything. I checked query plan and found that this plan is different every time. Sometimes it generates plan which uses indices all the way, but sometimes it goes for table (or even two tables) scan. Although the result of the query always was correct whichever plan it's using. I repete - I did not do anything between queries, but just ran the same 'explain select ...' sequentially. The query itself is heavy, but pretty strightforward - 2 tables joined 12 times. Here it is, just in case: select T1.MAIN_ID, T2.ORDER_F, T3.ORDER_F, T4.ORDER_F, T5.ORDER_F, T6.LEVEL_IND, T6.PRE_IND, T7.ORDER_F, T8.ORDER_F, T9.ORDER_F, T10.ORDER_F, T11.ORDER_F, T12.MAIN_ID from TMP_MAIN T1, TMP_RS T2, TMP_RS T3, TMP_RS T4, TMP_RS T5, RSCACHE_R T6, TMP_RS T7, TMP_RS T8, TMP_RS T9, TMP_RS T10, TMP_RS T11, TMP_MAIN T12 where T2.ID_2 = T1.MAIN_ID and T2.ID_TYPE = 269 and 990765472621084681 = T2.ID_1 and T4.ID_1 = T12.MAIN_ID and T4.ID_TYPE = 143 and T3.ID_TYPE = 143 and 990765472621084681 = T3.ID_1 and T4.ID_2 = T3.ID_2 and T7.ID_1 = T12.MAIN_ID and T7.ID_TYPE = 224 and T6.ID_TYPE = 209 and T6.LEVEL_IND != 0 and T5.ID_TYPE = 224 and 990765472621084681 = T5.ID_1 and T6.ID_1 = T5.ID_2 and T7.ID_2 = T6.ID_2 and T9.ID_1 = T12.MAIN_ID and T9.ID_TYPE = 125 and T8.ID_TYPE = 125 and 990765472621084681 = T8.ID_1 and T9.ID_2 = T8.ID_2 and T11.ID_1 = T12.MAIN_ID and T11.ID_TYPE = 71 and T10.ID_TYPE = 71 and 990765472621084681 = T10.ID_1 and T11.ID_2 = T10.ID_2 and T12.ID_TYPE = 286 Both tables have 500000-1000000 records. If you need more information contact me via e-mail. Regards. Alexei.
"Alexei" <alexei_novakov@yahoo.com> writes: > .... Although the result > of the query always was correct whichever plan it's using. I repete - I did > not do anything between queries, but just ran the same 'explain select ...' > sequentially. The query itself is heavy, but pretty strightforward - 2 > tables joined 12 times. With 12 tables you'll be using the geqo optimizer by default, and it has a random component to the plans by design. It's a bit disappointing that it doesn't consistently find plans of about the same quality though. Can you provide a complete test case? (No, the bare text of a query isn't a test case...) regards, tom lane
Alexei Novakov <alexei-novakov@rogers.com> writes: > Attaching the dumps of the tables. And here is the > query: > select T1.MAIN_ID, T6.LEVEL_IND, T12.MAIN_ID > from TMP_1 T1, TMP_2 T2, TMP_2 T3, TMP_2 T4, TMP_2 T5, > TMP_3 T6, > TMP_2 T7, TMP_2 T8, TMP_2 T9, TMP_2 T10, TMP_2 > T11, TMP_1 T12 > where T2.ID_2 = T1.MAIN_ID > and T2.ID_TO_ID_TYPE = 201269 > and 990765472621084681 = T2.ID_1 > and T4.ID_1 = T12.MAIN_ID > and T4.ID_TO_ID_TYPE = 201143 > and T3.ID_TO_ID_TYPE = 201143 > and 990765472621084681 = T3.ID_1 > and T4.ID_2 = T3.ID_2 > and T7.ID_1 = T12.MAIN_ID > and T7.ID_TO_ID_TYPE = 201224 > and T6.ID_TO_ID_TYPE = 201209 > and T6.LEVEL_IND != 0 > and T5.ID_TO_ID_TYPE = 201224 > and 990765472621084681 = T5.ID_1 > and T6.ID_1 = T5.ID_2 > and T7.ID_2 = T6.ID_2 > and T9.ID_1 = T12.MAIN_ID > and T9.ID_TO_ID_TYPE = 201125 > and T8.ID_TO_ID_TYPE = 201125 > and 990765472621084681 = T8.ID_1 > and T9.ID_2 = T8.ID_2 > and T11.ID_1 = T12.MAIN_ID > and T11.ID_TO_ID_TYPE = 201071 > and T10.ID_TO_ID_TYPE = 201071 > and 990765472621084681 = T10.ID_1 > and T11.ID_2 = T10.ID_2 > and T12.ID_TYPE = 301286 I looked into this and found that the problem is bad rowcount estimates, which lead the planner to make plans that are good only by chance. Since you have enough tables in the query to exceed geqo_threshold, the random aspect of the behavior isn't surprising --- but increasing geqo_threshold isn't likely to solve your problem, it'll probably just produce repeatable not-very-good behavior :-( The reason why the rowcount estimates are so far off is that there's a lot of correlation that the planner doesn't recognize. For instance you've got and T4.ID_TO_ID_TYPE = 201143 and T3.ID_TO_ID_TYPE = 201143 and T4.ID_2 = T3.ID_2 where T3 and T4 are actually the same table and so the number of rows matching the above conditions is much higher than a naive guess would suggest. It's possible that the planner could do better if it had cross-column correlation statistics, but it doesn't. It's hard for me to tell whether any of these conditions are redundant given your schema, but if they are I'd recommend removing the redundant conditions --- they aren't doing much for you except fooling the planner. regards, tom lane