RE: Re: Generic Plans for Prepared Statement are 158155 times slower thanCustom Plans - Mailing list pgsql-performance

From Naik, Sameer
Subject RE: Re: Generic Plans for Prepared Statement are 158155 times slower thanCustom Plans
Date
Msg-id 16b3416fb88b48a382df0145bdd76f19@hou-exmbprd-03.adprod.bmc.com
Whole thread Raw
In response to Re: Generic Plans for Prepared Statement are 158155 times slower than Custom Plans  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Re: Generic Plans for Prepared Statement are 158155 timesslower than Custom Plans
List pgsql-performance
>The problem seems to be that the actual values being used for
>c400129200 and c400127400 are quite common in the dataset, so that when considering

>Filter: ... (c400129200 = '0'::citext) AND (c400127400 = 'DATASET1M'::citext)

>the planner makes a roughly correct assessment that there are a lot of such rows, so it prefers to index on the basis
ofthe giant OR clause instead, even though that's fairly expensive.  But, when considering the generic case 

>       ->  Index Scan using i776_0_400129200_t776 on t776  (cost=0.42..12.66 rows=1 width=52) (actual
time=1190.399..5544.385rows=48 loops=1) 
>             Index Cond: ((c400129200 = $1) AND (c400127400 = $2))

> it's evidently guessing that just a few rows will match the index condition (no more than about 3 given the cost
number),making this plan look much cheaper, so it goes with this plan.  I wonder what the actual distribution of those
keysis. 

Distribution of the keys c400129200 and c400127400 .

The distribution of c400129200 is as follows-
In entire table having 110743 records, there are 55370 records for which the value of c400129200 is 0. For each of the
remaining55,373 records the value of c400129200 is distinct. 


The distribution of c400127400 is as follows-
In entire table having 110743 records, there are 55370 records for which the value of c400127400 is   'DATASET1M' . For
remaining55,373 records the value of c400127400 the value is same and is ' 'DATASET2M'  . 


-Thanks and Regards,
Sameer Naik

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, April 29, 2019 8:06 PM
To: Bruce Momjian <bruce@momjian.us>
Cc: Naik, Sameer <Sameer_Naik@bmc.com>; pgsql-performance@lists.postgresql.org
Subject: [EXTERNAL] Re: Generic> Plans for Prepared Statement are 158155 times slower than Custom Plans

Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Apr 29, 2019 at 10:36:20AM +0000, Naik, Sameer wrote:
>> Since Postgres 9.2, for prepared statements, the CBO automatically
>> switches from Custom Plan to Generic plan on the sixth iteration
>> (reference backend/ utils/cache/plancache.c).

> This is not totally true.

Yeah, that's a pretty inaccurate statement of the behavior.

The problem seems to be that the actual values being used for
c400129200 and c400127400 are quite common in the dataset, so that when considering

Filter: ... (c400129200 = '0'::citext) AND (c400127400 = 'DATASET1M'::citext)

the planner makes a roughly correct assessment that there are a lot of such rows, so it prefers to index on the basis
ofthe giant OR clause instead, even though that's fairly expensive.  But, when considering the generic case 

       ->  Index Scan using i776_0_400129200_t776 on t776  (cost=0.42..12.66 rows=1 width=52) (actual
time=1190.399..5544.385rows=48 loops=1) 
             Index Cond: ((c400129200 = $1) AND (c400127400 = $2))

it's evidently guessing that just a few rows will match the index condition (no more than about 3 given the cost
number),making this plan look much cheaper, so it goes with this plan.  I wonder what the actual distribution of those
keysis. 


In v10 and later, it's quite possible that creating extended stats on the combination of those two columns would
producea better estimate.  Won't help OP on 9.6, though. 

This isn't the first time we've seen a plan-choice failure of this sort.
I've wondered if we should make the plancache simply disbelieve generic cost estimates that are actually cheaper than
thecustom plans, on the grounds that they must be estimation errors.  In principle a generic plan could never really be
betterthan a custom plan; so if it looks that way on a cost basis, what that probably means is that the actual
parametervalues are outliers of some sort (e.g. extremely common), and the custom plan "knows" that it's going to be
takinga hit from that, but the generic plan doesn't.  In this sort of situation, going with the generic plan could be
reallydisastrous, which is exactly what the OP is seeing (and what we've seen reported before). 

However, I'm not sure how to tune this idea so that it doesn't end up rejecting perfectly good generic plans.  It's
likelythat there will be some variation in the cost estimates between the generic and specific cases, even if the plan
structureis exactly the same; and that variation could go in either direction. 

            regards, tom lane



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Generic Plans for Prepared Statement are 158155 times slower than Custom Plans
Next
From: Peter Billen
Date:
Subject: Failure to reordering in case of a lateral join in combination with aleft join (not inner join) resulting in suboptimal nested loop plan