Thread: BUG #1878: Different execution plans for the same query.

BUG #1878: Different execution plans for the same query.

From
"Alexei"
Date:
The following bug has been logged online:

Bug reference:      1878
Logged by:          Alexei
Email address:      alexei_novakov@yahoo.com
PostgreSQL version: 8.0
Operating system:   Windows 2000
Description:        Different execution plans for the same query.
Details:

I was doing performance evaluation of various DB servers including
PostgreSQL. For one of test queries I got strange result - timing was
different (from 0.5 to 240 sec) even if I was running it one time after
another without going anything in between (like there were no inserts or
whatever similar). I did vacuum and vacuum analyse, but it didn't change
anything. I checked query plan and found that this plan is different every
time. Sometimes it generates plan which uses indices all the way, but
sometimes it goes for table (or even two tables) scan. Although the result
of the query always was correct whichever plan it's using. I repete - I did
not do anything between queries, but just ran the same 'explain select ...'
sequentially. The query itself is heavy, but pretty strightforward - 2
tables joined 12 times. Here it is, just in case:

select T1.MAIN_ID, T2.ORDER_F, T3.ORDER_F,
       T4.ORDER_F, T5.ORDER_F, T6.LEVEL_IND,
       T6.PRE_IND, T7.ORDER_F, T8.ORDER_F, T9.ORDER_F,
       T10.ORDER_F, T11.ORDER_F, T12.MAIN_ID
from TMP_MAIN T1, TMP_RS T2, TMP_RS T3, TMP_RS T4,
     TMP_RS T5, RSCACHE_R T6, TMP_RS T7, TMP_RS T8,
     TMP_RS T9, TMP_RS T10, TMP_RS T11, TMP_MAIN T12
where T2.ID_2 = T1.MAIN_ID
and T2.ID_TYPE = 269
and 990765472621084681 = T2.ID_1
and T4.ID_1 = T12.MAIN_ID
and T4.ID_TYPE = 143
and T3.ID_TYPE = 143
and 990765472621084681 = T3.ID_1
and T4.ID_2 = T3.ID_2
and T7.ID_1 = T12.MAIN_ID
and T7.ID_TYPE = 224
and T6.ID_TYPE = 209
and T6.LEVEL_IND != 0
and T5.ID_TYPE = 224
and 990765472621084681 = T5.ID_1
and T6.ID_1 = T5.ID_2
and T7.ID_2 = T6.ID_2
and T9.ID_1 = T12.MAIN_ID
and T9.ID_TYPE = 125
and T8.ID_TYPE = 125
and 990765472621084681 = T8.ID_1
and T9.ID_2 = T8.ID_2
and T11.ID_1 = T12.MAIN_ID
and T11.ID_TYPE = 71
and T10.ID_TYPE = 71
and 990765472621084681 = T10.ID_1
and T11.ID_2 = T10.ID_2
and T12.ID_TYPE = 286

Both tables have 500000-1000000 records. If you need more information
contact me via e-mail.

Regards.
Alexei.

Re: BUG #1878: Different execution plans for the same query.

From
Tom Lane
Date:
"Alexei" <alexei_novakov@yahoo.com> writes:
> .... Although the result
> of the query always was correct whichever plan it's using. I repete - I did
> not do anything between queries, but just ran the same 'explain select ...'
> sequentially. The query itself is heavy, but pretty strightforward - 2
> tables joined 12 times.

With 12 tables you'll be using the geqo optimizer by default, and it has
a random component to the plans by design.  It's a bit disappointing
that it doesn't consistently find plans of about the same quality
though.  Can you provide a complete test case?  (No, the bare text of
a query isn't a test case...)

            regards, tom lane

Re: BUG #1878: Different execution plans for the same query.

From
Tom Lane
Date:
Alexei Novakov <alexei-novakov@rogers.com> writes:
> Attaching the dumps of the tables. And here is the
> query:

> select T1.MAIN_ID, T6.LEVEL_IND, T12.MAIN_ID
> from TMP_1 T1, TMP_2 T2, TMP_2 T3, TMP_2 T4, TMP_2 T5,
> TMP_3 T6,
>      TMP_2 T7, TMP_2 T8, TMP_2 T9, TMP_2 T10, TMP_2
> T11, TMP_1 T12
> where T2.ID_2 = T1.MAIN_ID
> and T2.ID_TO_ID_TYPE = 201269
> and 990765472621084681 = T2.ID_1
> and T4.ID_1 = T12.MAIN_ID
> and T4.ID_TO_ID_TYPE = 201143
> and T3.ID_TO_ID_TYPE = 201143
> and 990765472621084681 = T3.ID_1
> and T4.ID_2 = T3.ID_2
> and T7.ID_1 = T12.MAIN_ID
> and T7.ID_TO_ID_TYPE = 201224
> and T6.ID_TO_ID_TYPE = 201209
> and T6.LEVEL_IND != 0
> and T5.ID_TO_ID_TYPE = 201224
> and 990765472621084681 = T5.ID_1
> and T6.ID_1 = T5.ID_2
> and T7.ID_2 = T6.ID_2
> and T9.ID_1 = T12.MAIN_ID
> and T9.ID_TO_ID_TYPE = 201125
> and T8.ID_TO_ID_TYPE = 201125
> and 990765472621084681 = T8.ID_1
> and T9.ID_2 = T8.ID_2
> and T11.ID_1 = T12.MAIN_ID
> and T11.ID_TO_ID_TYPE = 201071
> and T10.ID_TO_ID_TYPE = 201071
> and 990765472621084681 = T10.ID_1
> and T11.ID_2 = T10.ID_2
> and T12.ID_TYPE = 301286

I looked into this and found that the problem is bad rowcount estimates,
which lead the planner to make plans that are good only by chance.
Since you have enough tables in the query to exceed geqo_threshold, the
random aspect of the behavior isn't surprising --- but increasing
geqo_threshold isn't likely to solve your problem, it'll probably just
produce repeatable not-very-good behavior :-(

The reason why the rowcount estimates are so far off is that there's a
lot of correlation that the planner doesn't recognize.  For instance
you've got

    and T4.ID_TO_ID_TYPE = 201143
    and T3.ID_TO_ID_TYPE = 201143
    and T4.ID_2 = T3.ID_2

where T3 and T4 are actually the same table and so the number of rows
matching the above conditions is much higher than a naive guess would
suggest.  It's possible that the planner could do better if it had
cross-column correlation statistics, but it doesn't.

It's hard for me to tell whether any of these conditions are redundant
given your schema, but if they are I'd recommend removing the redundant
conditions --- they aren't doing much for you except fooling the
planner.

            regards, tom lane