Thread: BUG #4899: Open parenthesis breaks query plan
The following bug has been logged online: Bug reference: 4899 Logged by: Peter Headland Email address: pheadland@actuate.com PostgreSQL version: 8.4.0 Operating system: Windows Description: Open parenthesis breaks query plan Details: In a moderate-size table (~400,000 rows), an equality match on an unindexed varchar column to a string that contains an open parenthesis '(' prevents the optimizer from using an obvious index. Changing the open parenthesis to another character, such as ')' allows the obvious index to be used. I have been unable to reproduce this on simple test data so far, so it is obviously fairly subtle. Abstract example of the issue: o table t has a composite index i comprising columns c1, c2, c3 o column t.c4 is not indexed Illustration of the queries: -- Full table scan SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = '('; -- Uses index i SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = ')'; I am really hoping that this defect can be found by inspection of the source, because trying to reproduce it is fast getting me nowhere. Unfortunately, the data involved are customer confidential, so I cannot provide the original table.
V2hpbGUgbm9vZGxpbmcgYXJvdW5kIHNvbWUgbW9yZSwgSSBmb3VuZCB0aGF0 IGEgY29tcGFyaXNvbiB0byAnKCknIGFsbG93cyB1c2Ugb2YgdGhlIGluZGV4 LCBhcyBkb2VzICcoYWJjKScgYW5kIGV2ZW4gJyhhKGIoYylkKWUpJy4gSXQg YXBwZWFycyB0aGF0IG1pc21hdGNoZWQgb3Blbi9jbG9zZSBwYXJlbiBwYWly cyB0cmlnZ2VyIHRoZSBidWcuIE9idmlvdXNseSBzb21ldGhpbmcgaXMgcGFy c2luZyB0aGUgc3RyaW5nIGxpdGVyYWwgYW5kIG1pc2hhbmRsaW5nIHBhcmVu dGhlc2VzLiBJIGRvbid0IHVuZGVyc3RhbmQgd2h5IHBhcmVudGhlc2VzIHNo b3VsZCBiZSBzaWduaWZpY2FudCBpbnNpZGUgYSBzdHJpbmcgbGl0ZXJhbCBp biB0aGUgZmlyc3QgcGxhY2UuDQoNCkFsc28sIGp1c3QgdG8gYmUgMTAwJSBj bGVhciwgdGhlIG9wZW4gcGFyZW4gY2FuIGJlIGFueXdoZXJlIGluIHRoZSBz dHJpbmcsIHNvIGEgY29tcGFyaXNvbiB0byAnYWJjZGVmZ2goaWprbG1ub3An IHN0aWxsIHRyaWdnZXJzIHRoZSBidWcuDQoNCi0tIA0KUGV0ZXIgSGVhZGxh bmQNCkFyY2hpdGVjdCAtIGUuUmVwb3J0cw0KQWN0dWF0ZSBDb3Jwb3JhdGlv bg0KDQotLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KRnJvbTogUGV0ZXIg SGVhZGxhbmQgDQpTZW50OiBTYXR1cmRheSwgSnVseSAwNCwgMjAwOSAxODow Mw0KVG86IHBnc3FsLWJ1Z3NAcG9zdGdyZXNxbC5vcmcNClN1YmplY3Q6IEJV RyAjNDg5OTogT3BlbiBwYXJlbnRoZXNpcyBicmVha3MgcXVlcnkgcGxhbg0K DQoNClRoZSBmb2xsb3dpbmcgYnVnIGhhcyBiZWVuIGxvZ2dlZCBvbmxpbmU6 DQoNCkJ1ZyByZWZlcmVuY2U6ICAgICAgNDg5OQ0KTG9nZ2VkIGJ5OiAgICAg ICAgICBQZXRlciBIZWFkbGFuZA0KRW1haWwgYWRkcmVzczogICAgICBwaGVh ZGxhbmRAYWN0dWF0ZS5jb20NClBvc3RncmVTUUwgdmVyc2lvbjogOC40LjAN Ck9wZXJhdGluZyBzeXN0ZW06ICAgV2luZG93cw0KRGVzY3JpcHRpb246ICAg ICAgICBPcGVuIHBhcmVudGhlc2lzIGJyZWFrcyBxdWVyeSBwbGFuDQpEZXRh aWxzOiANCg0KSW4gYSBtb2RlcmF0ZS1zaXplIHRhYmxlICh+NDAwLDAwMCBy b3dzKSwgYW4gZXF1YWxpdHkgbWF0Y2ggb24gYW4gdW5pbmRleGVkDQp2YXJj aGFyIGNvbHVtbiB0byBhIHN0cmluZyB0aGF0IGNvbnRhaW5zIGFuIG9wZW4g cGFyZW50aGVzaXMgJygnIHByZXZlbnRzDQp0aGUgb3B0aW1pemVyIGZyb20g dXNpbmcgYW4gb2J2aW91cyBpbmRleC4gQ2hhbmdpbmcgdGhlIG9wZW4gcGFy ZW50aGVzaXMgdG8NCmFub3RoZXIgY2hhcmFjdGVyLCBzdWNoIGFzICcpJyBh bGxvd3MgdGhlIG9idmlvdXMgaW5kZXggdG8gYmUgdXNlZC4gSSBoYXZlDQpi ZWVuIHVuYWJsZSB0byByZXByb2R1Y2UgdGhpcyBvbiBzaW1wbGUgdGVzdCBk YXRhIHNvIGZhciwgc28gaXQgaXMgb2J2aW91c2x5DQpmYWlybHkgc3VidGxl Lg0KDQpBYnN0cmFjdCBleGFtcGxlIG9mIHRoZSBpc3N1ZToNCg0KbyB0YWJs ZSB0IGhhcyBhIGNvbXBvc2l0ZSBpbmRleCBpIGNvbXByaXNpbmcgY29sdW1u cyBjMSwgYzIsIGMzDQoNCm8gY29sdW1uIHQuYzQgaXMgbm90IGluZGV4ZWQN Cg0KSWxsdXN0cmF0aW9uIG9mIHRoZSBxdWVyaWVzOg0KDQotLSBGdWxsIHRh YmxlIHNjYW4NClNFTEVDVCBDT1VOVCgqKSBGUk9NIHQgV0hFUkUgdC5jMSA9 IDEyMyBBTkQgdC5jNCA9ICcoJzsNCg0KLS0gVXNlcyBpbmRleCBpDQpTRUxF Q1QgQ09VTlQoKikgRlJPTSB0IFdIRVJFIHQuYzEgPSAxMjMgQU5EIHQuYzQg PSAnKSc7DQoNCkkgYW0gcmVhbGx5IGhvcGluZyB0aGF0IHRoaXMgZGVmZWN0 IGNhbiBiZSBmb3VuZCBieSBpbnNwZWN0aW9uIG9mIHRoZQ0Kc291cmNlLCBi ZWNhdXNlIHRyeWluZyB0byByZXByb2R1Y2UgaXQgaXMgZmFzdCBnZXR0aW5n IG1lIG5vd2hlcmUuDQpVbmZvcnR1bmF0ZWx5LCB0aGUgZGF0YSBpbnZvbHZl ZCBhcmUgY3VzdG9tZXIgY29uZmlkZW50aWFsLCBzbyBJIGNhbm5vdA0KcHJv dmlkZSB0aGUgb3JpZ2luYWwgdGFibGUuDQo=
On Sunday 05 July 2009 03:03:00 Peter Headland wrote: > The following bug has been logged online: > > Bug reference: 4899 > Logged by: Peter Headland > Email address: pheadland@actuate.com > PostgreSQL version: 8.4.0 > Operating system: Windows > Description: Open parenthesis breaks query plan > Details: > > In a moderate-size table (~400,000 rows), an equality match on an unindexed > varchar column to a string that contains an open parenthesis '(' prevents > the optimizer from using an obvious index. Changing the open parenthesis to > another character, such as ')' allows the obvious index to be used. I have > been unable to reproduce this on simple test data so far, so it is > obviously fairly subtle. > > Abstract example of the issue: > > o table t has a composite index i comprising columns c1, c2, c3 > > o column t.c4 is not indexed > > Illustration of the queries: > > -- Full table scan > SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = '('; > > -- Uses index i > SELECT COUNT(*) FROM t WHERE t.c1 = 123 AND t.c4 = ')'; > > I am really hoping that this defect can be found by inspection of the > source, because trying to reproduce it is fast getting me nowhere. > Unfortunately, the data involved are customer confidential, so I cannot > provide the original table. I think this is not caused by a bug but, maybe wrong, selectivity estimates. I.e. in one case the planner thinks your query will match a small enough portion of the query, so that an index will be usefull , in the other case not. Could you provide 'EXPLAIN ANALYZE' output for both queries? To make sure its not a bug directly caused by the parentheses you can do SET enable_seqscan=off; EXPLAIN ANALYZE yourquery_with_paren; in the same connection and check whether this uses an index. Andres Andres
"Peter Headland" <pheadland@actuate.com> writes: > While noodling around some more, I found that a comparison to '()' > allows use of the index, as does '(abc)' and even '(a(b(c)d)e)'. It > appears that mismatched open/close paren pairs trigger the > bug. Obviously something is parsing the string literal and mishandling > parentheses. This isn't "obvious" at all, and in fact your theory is pretty much nonsense. What I think is happening is that '(' is a sufficiently common value that the planner thinks a seqscan is superior to an indexscan for it. However, since you have not shown us EXPLAIN output (much less EXPLAIN ANALYZE output), that's just a guess. regards, tom lane
> your theory is pretty much nonsense ... > What I think is happening is that '(' is a sufficiently common value that > the planner thinks a seqscan is superior to an indexscan for it. Your theory is also "pretty much nonsense" if you read the detailed description I gave in my initial post, in which I explain that presence of a '(' character anywhere at all in the string literal triggers the problem. For example 'abc(def'. It also totally fails to explain the way that matched parentheses prevent the problem (which makes it obvious that _something_ somewhere is doing enough parsing to count parentheses). Now that I know about EXPLAIN ANALYZE, I got these (I apologise for the redactions and obfuscation, which represent the downside of self-documenting column names): "Aggregate (cost=3D534.40..534.41 rows=3D1 width=3D0) (actual time=3D0.442..0.444 rows=3D1 loops=3D1)" " Output: count(*)" " -> Bitmap Heap Scan on a_table (cost=3D9.49..534.39 rows=3D1 width=3D0) (actual time=3D0.412..0.412 rows=3D0 loops=3D1)" " Output: ... 21 columns ..." " Recheck Cond: (an_integer_column =3D 65)" " Filter: ((a_varchar_column)::text =3D 'abc(def'::text)" " -> Bitmap Index Scan on an_index (cost=3D0.00..9.49 rows=3D146 width=3D0) (actual time=3D0.118..0.118 rows=3D197 loops=3D1)" " Index Cond: (an_integer_column =3D 65)" "Total runtime: 0.611 ms" "Aggregate (cost=3D534.40..534.41 rows=3D1 width=3D0) (actual time=3D0.418..0.421 rows=3D1 loops=3D1)" " Output: count(*)" " -> Bitmap Heap Scan on a_table (cost=3D9.49..534.39 rows=3D1 width=3D0) (actual time=3D0.395..0.395 rows=3D0 loops=3D1)" " Output: ... 21 columns ..." " Recheck Cond: (an_integer_column =3D 65)" " Filter: ((a_varchar_column)::text =3D 'abc()def'::text)" " -> Bitmap Index Scan on an_index (cost=3D0.00..9.49 rows=3D146 width=3D0) (actual time=3D0.108..0.108 rows=3D197 loops=3D1)" " Index Cond: (an_integer_column =3D 65)" "Total runtime: 0.563 ms" This puzzles me, because it seems to say that the plan is the same in both cases, but the graphical display of the plan in pgAdmin III looks different for the two queries (is there some way/somewhere I can post screen grabs?). I think the issue is a bug in the way pgAdmin III parses the output from EXPLAIN. My inability to reproduce the issue with dummy data would be down to the fact I'd have to get the optimizer to choose the same plan, which is beyond my ability at this stage. I also just realized that the graphical display of the plan in pgAdmin III does not show a full table scan for the mismatched parentheses case; it shows something that looks like a variant of the index scan symbol, but with the name of the table underneath. The difference between display of the two plans is that the initial symbol with the name of the index underneath vanishes when there is an unmatched open parenthesis. I have been unable to find an explanation of the symbols used in pgAdmin III - is there such a thing anywhere? If we are agreed that the issue is a bug in pgAdmin III, please advise where I should report such things. --=20 Peter Headland Architect - e.Reports Actuate Corporation -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20 Sent: Sunday, July 05, 2009 08:39 To: Peter Headland Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #4899: Open parenthesis breaks query plan=20 "Peter Headland" <pheadland@actuate.com> writes: > While noodling around some more, I found that a comparison to '()' > allows use of the index, as does '(abc)' and even '(a(b(c)d)e)'. It > appears that mismatched open/close paren pairs trigger the > bug. Obviously something is parsing the string literal and mishandling > parentheses. This isn't "obvious" at all, and in fact your theory is pretty much nonsense. What I think is happening is that '(' is a sufficiently common value that the planner thinks a seqscan is superior to an indexscan for it. However, since you have not shown us EXPLAIN output (much less EXPLAIN ANALYZE output), that's just a guess. regards, tom lane
On Mon, Jul 6, 2009 at 5:40 PM, Peter Headland<pheadland@actuate.com> wrote: > presence > of a '(' character anywhere at all in the string literal triggers the > problem. For example 'abc(def'. Except according to that explain analyze 'abc(def' ran exactly the speed as 'abc()def'. This all seems much more likely to depend on the c1/an_integer_column value you're querying for than on the string. Keep trying different values for both columns until you find one that triggers the problem and send the explain analyze result for that. It could be that '(' was such a value yesterday but not today if autovacuum has run analyze since. -- greg http://mit.edu/~gsstark/resume.pdf
As I said further down my previous e-mail, it looks as if the optimizer is just fine, and the problem is simply a bug in the way pgAdmin III parses and displays EXPLAIN ANALYZE output in its graphical view. --=20 Peter Headland Architect - e.Reports Actuate Corporation -----Original Message----- From: gsstark@gmail.com [mailto:gsstark@gmail.com] On Behalf Of Greg Stark Sent: Monday, July 06, 2009 10:35 To: Peter Headland Cc: Tom Lane; pgsql-bugs@postgresql.org Subject: Re: BUG #4899: Open parenthesis breaks query plan On Mon, Jul 6, 2009 at 5:40 PM, Peter Headland<pheadland@actuate.com> wrote: > presence > of a '(' character anywhere at all in the string literal triggers the > problem. For example 'abc(def'. Except according to that explain analyze 'abc(def' ran exactly the speed as 'abc()def'. This all seems much more likely to depend on the c1/an_integer_column value you're querying for than on the string. Keep trying different values for both columns until you find one that triggers the problem and send the explain analyze result for that. It could be that '(' was such a value yesterday but not today if autovacuum has run analyze since. --=20 greg http://mit.edu/~gsstark/resume.pdf