Re: BUG #4899: Open parenthesis breaks query plan - Mailing list pgsql-bugs

From Peter Headland
Subject Re: BUG #4899: Open parenthesis breaks query plan
Date
Msg-id 71F491F5DA99604A80DE49424BF3D02B0C088D4D@exchange8.actuate.com
Whole thread Raw
In response to Re: BUG #4899: Open parenthesis breaks query plan  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #4899: Open parenthesis breaks query plan  (Greg Stark <gsstark@mit.edu>)
List pgsql-bugs
> your theory is pretty much nonsense
...
> What I think is happening is that '(' is a sufficiently common value
that
> the planner thinks a seqscan is superior to an indexscan for it.

Your theory is also "pretty much nonsense" if you read the detailed
description I gave in my initial post, in which I explain that presence
of a '(' character anywhere at all in the string literal triggers the
problem. For example 'abc(def'. It also totally fails to explain the way
that matched parentheses prevent the problem (which makes it obvious
that _something_ somewhere is doing enough parsing to count
parentheses).

Now that I know about EXPLAIN ANALYZE, I got these (I apologise for the
redactions and obfuscation, which represent the downside of
self-documenting column names):


"Aggregate  (cost=3D534.40..534.41 rows=3D1 width=3D0) (actual
time=3D0.442..0.444 rows=3D1 loops=3D1)"
"  Output: count(*)"
"  ->  Bitmap Heap Scan on a_table  (cost=3D9.49..534.39 rows=3D1 width=3D0)
(actual time=3D0.412..0.412 rows=3D0 loops=3D1)"
"        Output: ... 21 columns ..."
"        Recheck Cond: (an_integer_column =3D 65)"
"        Filter: ((a_varchar_column)::text =3D 'abc(def'::text)"
"        ->  Bitmap Index Scan on an_index  (cost=3D0.00..9.49 rows=3D146
width=3D0) (actual time=3D0.118..0.118 rows=3D197 loops=3D1)"
"              Index Cond: (an_integer_column =3D 65)"
"Total runtime: 0.611 ms"



"Aggregate  (cost=3D534.40..534.41 rows=3D1 width=3D0) (actual
time=3D0.418..0.421 rows=3D1 loops=3D1)"
"  Output: count(*)"
"  ->  Bitmap Heap Scan on a_table  (cost=3D9.49..534.39 rows=3D1 width=3D0)
(actual time=3D0.395..0.395 rows=3D0 loops=3D1)"
"        Output: ... 21 columns ..."
"        Recheck Cond: (an_integer_column =3D 65)"
"        Filter: ((a_varchar_column)::text =3D 'abc()def'::text)"
"        ->  Bitmap Index Scan on an_index  (cost=3D0.00..9.49 rows=3D146
width=3D0) (actual time=3D0.108..0.108 rows=3D197 loops=3D1)"
"              Index Cond: (an_integer_column =3D 65)"
"Total runtime: 0.563 ms"


This puzzles me, because it seems to say that the plan is the same in
both cases, but the graphical display of the plan in pgAdmin III looks
different for the two queries (is there some way/somewhere I can post
screen grabs?). I think the issue is a bug in the way pgAdmin III parses
the output from EXPLAIN. My inability to reproduce the issue with dummy
data would be down to the fact I'd have to get the optimizer to choose
the same plan, which is beyond my ability at this stage.

I also just realized that the graphical display of the plan in pgAdmin
III does not show a full table scan for the mismatched parentheses case;
it shows something that looks like a variant of the index scan symbol,
but with the name of the table underneath. The difference between
display of the two plans is that the initial symbol with the name of the
index underneath vanishes when there is an unmatched open parenthesis. I
have been unable to find an explanation of the symbols used in pgAdmin
III - is there such a thing anywhere?

If we are agreed that the issue is a bug in pgAdmin III, please advise
where I should report such things.

--=20
Peter Headland
Architect - e.Reports
Actuate Corporation


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20
Sent: Sunday, July 05, 2009 08:39
To: Peter Headland
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #4899: Open parenthesis breaks query plan=20

"Peter Headland" <pheadland@actuate.com> writes:
> While noodling around some more, I found that a comparison to '()'
> allows use of the index, as does '(abc)' and even '(a(b(c)d)e)'. It
> appears that mismatched open/close paren pairs trigger the
> bug. Obviously something is parsing the string literal and mishandling
> parentheses.

This isn't "obvious" at all, and in fact your theory is pretty much
nonsense.  What I think is happening is that '(' is a sufficiently
common value that the planner thinks a seqscan is superior to an
indexscan for it.  However, since you have not shown us EXPLAIN output
(much less EXPLAIN ANALYZE output), that's just a guess.

            regards, tom lane

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: TH to_char modifier doesn't work with HH12
Next
From: Greg Stark
Date:
Subject: Re: BUG #4899: Open parenthesis breaks query plan