"Naik, Sameer" <Sameer_Naik@bmc.com> writes:
> On Mon, May 20, 2019 at 09:37:34PM +0000, Deepak Somaiya wrote:
>> wow this is interesting!
>> @Tom, Bruce, David - Experts
>> Any idea why would changing the datatype would cause so much degradation - this is even when plan remains the same
,datais same.
I see nothing very exciting here. text equality comparison reduces to
a memcmp, while citext equality comparison is quite expensive, since
it has to case-fold both inputs before it can memcmp them.
For the given test case:
> ' -> Index Scan using i776_0_400129200_t776 on t776 (cost=0.42..12.66 rows=1 width=52) (actual
time=1187.686..5531.421rows=48 loops=1)'
> ' Index Cond: ((c400129200 = $1) AND (c400127400 = $2))'
> ' Filter: (((c400129100 <> $3) OR (c400129100 IS NULL)) AND ((c179 = $4) OR (c179 = $5) OR (c179 = $6)
OR(c179 = $7) OR (c179 = $8) OR (c179 = $9) OR (c179 = $10) OR (c179 = $11) OR (c179 = $12) OR (c179 = $13) OR (c179 =
$14)OR (c179 = $15) OR (c179 = $16) OR (c179 = $17) OR (c179 = $18) OR (c179 = $19) OR (c179 = $20) OR (c179 = $21) OR
(c179= $22) OR (c179 = $23) OR (c179 = $24) OR (c179 = $25) OR (c179 = $26) OR (c179 = $27) OR (c179 = $28) OR (c179 =
$29)OR (c179 = $30) OR (c179 = $31) OR (c179 = $32) OR (c179 = $33) OR (c179 = $34) OR (c179 = $35) OR (c179 = $36) OR
(c179= $37) OR (c179 = $38) OR (c179 = $39) OR (c179 = $40) OR (c179 = $41) OR (c179 = $42) OR (c179 = $43) OR (c179 =
$44)OR (c179 = $45) OR (c179 = $46) OR (c179 = $47) OR (c179 = $48) OR (c179 = $49) OR (c179 = $50) OR (c179 = $51)))'
> ' Rows Removed by Filter: 55322'
it's reasonable to suppose that not many of the rows are failing the
c400129100 conditions, so that in order to decide that a row doesn't
pass the filter, we are forced to perform each of the OR'd c179
comparisons. So this query did something like 48 * 55322 equality
comparisons for c179. If the cost of a citexteq evaluation is
around 2 microseconds, that'd fully explain the runtime differential.
The OP didn't say what locale or encoding he's using. Maybe switching
to some other settings would improve matters ... though if non-ASCII
case folding is a business requirement, that likely won't go far.
Or you could get rid of the need for the repetitive case-folding,
say by storing lower(c179) in a separate column and doing plain
text comparisons to pre-lowercased input values.
regards, tom lane