Re: PATCH: CITEXT 2.0 v3 - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: PATCH: CITEXT 2.0 v3 |
Date | |
Msg-id | 26072.1215899847@sss.pgh.pa.us Whole thread Raw |
In response to | Re: PATCH: CITEXT 2.0 v3 ("David E. Wheeler" <david@kineticode.com>) |
Responses |
Re: PATCH: CITEXT 2.0 v3
Re: PATCH: CITEXT 2.0 v3 |
List | pgsql-hackers |
There was some discussion earlier about whether the proposed regression tests for citext are suitable for use in contrib or not. After playing with them for awhile, I have to come down very firmly on the side of "not". I have these gripes: 1. The style is gratuitously different from every other regression test in the system. This is not a good thing. If it were an amazingly better way to do things, then maybe, but as far as I can tell the style pgTAP forces on you is really pretty darn poorly suited for SQL tests. You have to contort what could naturally be expressed in SQL as a table result into a scalar. Plus it's redundant with the expected-output file. 2. It's ridiculously slow; at least a factor of ten slower than doing equivalent tests directly in SQL. This is a very bad thing. Speed of regression tests matters a lot to those of us who run them a dozen times per day --- and I do not wish to discourage any developers who don't work that way from learning better habits ;-) Because of #1 and #2 I find the use of pgTAP to be a nonstarter. I have a couple of gripes about the substance of the tests as well: 3. What you are mostly testing is not the behavior of citext itself, but the behavior of the underlying strcoll function. This is pretty much doomed to failure, first because the regression tests are intended to work in multiple locales (and we are *not* giving that up for citext), and second because the behavior of strcoll isn't all that portable across platforms even given allegedly similar locale settings (as we already found out yesterday). Sadly, I think you have to give up attempts to check the interesting multibyte cases and confine yourself to tests using ASCII strings. 4. A lot of the later test cases are equally uselessly testing whether piggybacking over text functions works. The odds of ever finding anything with those tests are not distinguishable from zero (unless the underlying text function is busted, which is not your responsibility to test). So I don't see any point in putting them into the standard regression package. (What maybe *would* be useful to test, but you didn't, is whether the result of a function is considered citext rather than text.) I suggest something more like the attached as a suitable regression test. I got bored about halfway through and started to skim, so there might be a few tests toward the end of the original set that would be worth transposing into this one, but nothing jumped out at me. regards, tom lane -- -- Test citext datatype -- -- -- first, define the datatype. Turn off echoing so that expected file -- does not depend on contents of citext.sql. -- SET client_min_messages = warning; \set ECHO none \i citext.sql \set ECHO all RESET client_min_messages; -- Test the operators and indexing functions -- Test = and <>. SELECT 'a'::citext = 'a'::citext as t; SELECT 'a'::citext = 'A'::citext as t; SELECT 'a'::citext = 'A'::text as f; -- text wins the discussion SELECT 'a'::citext = 'b'::citext as f; SELECT 'a'::citext = 'ab'::citext as f; SELECT 'a'::citext <> 'ab'::citext as t; -- Test > and >= SELECT 'B'::citext > 'a'::citext as t; SELECT 'b'::citext > 'A'::citext as t; SELECT 'B'::citext > 'b'::citext as f; SELECT 'B'::citext >= 'b'::citext as t; -- Test < and <= SELECT 'a'::citext < 'B'::citext as t; SELECT 'a'::citext <= 'B'::citext as t; -- Test implicit casting. citext casts to text, but not vice-versa. SELECT 'a'::citext = 'a'::text as t; SELECT 'A'::text <> 'a'::citext as t; SELECT 'aardvark'::citext = 'aardvark'::citext as t; SELECT 'aardvark'::citext = 'aardVark'::citext as t; -- Check the citext_cmp() function explicitly. SELECT citext_cmp('aardvark'::citext, 'aardvark'::citext) as zero; SELECT citext_cmp('aardvark'::citext, 'aardVark'::citext) as zero; SELECT citext_cmp('B'::citext, 'a'::citext) as one; -- Do some tests using a table and index. CREATE TEMP TABLE try ( name citext PRIMARY KEY ); INSERT INTO try (name) VALUES ('a'), ('ab'), ('aba'), ('b'), ('ba'), ('bab'), ('AZ'); SELECT name, 'a' = name from try; SELECT name, 'a' = name from try where name = 'a'; SELECT name, 'A' = name from try; SELECT name, 'A' = name from try where name = 'A'; SELECT name, 'A' = name from try where name = 'A'; -- expected failures on duplicate key INSERT INTO try (name) VALUES ('a'); INSERT INTO try (name) VALUES ('A'); INSERT INTO try (name) VALUES ('aB'); -- Test aggregate functions and sort ordering CREATE TEMP TABLE srt ( name CITEXT ); INSERT INTO srt (name) VALUES ('aardvark'), ('AAA'), ('aba'), ('ABC'), ('abd'); -- Check the min() and max() aggregates, with and without index. set enable_seqscan = off; SELECT MIN(name) from srt; SELECT MAX(name) from srt; reset enable_seqscan; set enable_indexscan = off; SELECT MIN(name) from srt; SELECT MAX(name) from srt; reset enable_indexscan; -- Check sorting likewise set enable_seqscan = off; SELECT name FROM srt ORDER BY name; reset enable_seqscan; set enable_indexscan = off; SELECT name FROM srt ORDER BY name; reset enable_indexscan; -- Check LIKE SELECT name FROM srt WHERE name LIKE '%a%' ORDER BY name; SELECT name FROM srt WHERE name NOT LIKE '%a%' ORDER BY name; SELECT name FROM srt WHERE name LIKE '%A%' ORDER BY name; SELECT name FROM srt WHERE name NOT LIKE '%A%' ORDER BY name; -- ~ should be case-insensitive SELECT name FROM srt WHERE name ~ '%a%' ORDER BY name; SELECT name FROM srt WHERE name !~ '%a%' ORDER BY name; SELECT name FROM srt WHERE name ~ '%A%' ORDER BY name; SELECT name FROM srt WHERE name !~ '%A%' ORDER BY name;
pgsql-hackers by date: