Re: PATCH: CITEXT 2.0 v3 - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: PATCH: CITEXT 2.0 v3
Date
Msg-id A2AA4B32-A916-497C-9C2B-DC69B8BF952A@kineticode.com
Whole thread Raw
In response to Re: PATCH: CITEXT 2.0 v3  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PATCH: CITEXT 2.0 v3  ("David E. Wheeler" <david@kineticode.com>)
Re: PATCH: CITEXT 2.0 v3  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: PATCH: CITEXT 2.0 v3  ("David E. Wheeler" <david@kineticode.com>)
List pgsql-hackers
On Jul 12, 2008, at 14:57, Tom Lane wrote:

> There was some discussion earlier about whether the proposed  
> regression
> tests for citext are suitable for use in contrib or not.  After  
> playing
> with them for awhile, I have to come down very firmly on the side of
> "not".  I have these gripes:

You're nothing if not thorough, Tom.

> 1. The style is gratuitously different from every other regression  
> test
> in the system.  This is not a good thing.  If it were an amazingly
> better way to do things, then maybe, but as far as I can tell the  
> style
> pgTAP forces on you is really pretty darn poorly suited for SQL tests.
> You have to contort what could naturally be expressed in SQL as a  
> table
> result into a scalar. Plus it's redundant with the expected-output  
> file.

Sure. I wasn't aware of the existing regression test methodology when  
I wrote pgTAP and those tests. I'm fine to change them and use pgTAP  
for testing my app code, rather than PostgreSQL itself.

> 2. It's ridiculously slow; at least a factor of ten slower than doing
> equivalent tests directly in SQL.  This is a very bad thing.  Speed of
> regression tests matters a lot to those of us who run them a dozen  
> times
> per day --- and I do not wish to discourage any developers who don't
> work that way from learning better habits ;-)

Hrm. I'm wonder why it's so slow? The test functions don't really do a  
lot. Anyway, I agree that they should perform well.

> Because of #1 and #2 I find the use of pgTAP to be a nonstarter.
> I have a couple of gripes about the substance of the tests as well:
>
> 3. What you are mostly testing is not the behavior of citext itself,
> but the behavior of the underlying strcoll function.  This is pretty
> much doomed to failure, first because the regression tests are  
> intended
> to work in multiple locales (and we are *not* giving that up for
> citext), and second because the behavior of strcoll isn't all that
> portable across platforms even given allegedly similar locale settings
> (as we already found out yesterday).

Yes, I *just* ran the tests on Ubuntu and opened my mail to ask about  
the bizarre differences when I saw this message. Thanks for answering  
my question before I asked it. :-)

> Sadly, I think you have to give up
> attempts to check the interesting multibyte cases and confine yourself
> to tests using ASCII strings.

Grr. Kind of defeats the purpose. Is there no infrastructure for  
testing multibyte functionality? Are test database clusters all built  
using SQL_ASCII and the C locale?

> 4. A lot of the later test cases are equally uselessly testing whether
> piggybacking over text functions works.  The odds of ever finding
> anything with those tests are not distinguishable from zero (unless  
> the
> underlying text function is busted, which is not your responsibility  
> to
> test).  So I don't see any point in putting them into the standard
> regression package.  (What maybe *would* be useful to test, but you
> didn't, is whether the result of a function is considered citext  
> rather
> than text.)

Well, I was doing test-driven development: I wrote the tests to ensure  
that I had added the functions for CITEXT properly, and when they  
passed, I could move on. As a unit tester, it'd feel weird for me to  
drop these. I'm not testing the underlying functions; I'm making sure  
that they work properly with CITEXT.

> I suggest something more like the attached as a suitable regression
> test.  I got bored about halfway through and started to skim, so there
> might be a few tests toward the end of the original set that would be
> worth transposing into this one, but nothing jumped out at me.

Thanks! I'll work this in.

Best,

David

PS: I confirmed late yesterday that the memory leak I saw was with my  
version for 8.3, where I had my own str_tolower(). It clearly has a  
leak that I'll have to fix, but it has no bearing on the contribution  
to 8.4, which has no such leak. Thanks for running the test yourself  
to confirm.


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: PATCH: CITEXT 2.0 v3
Next
From: "David E. Wheeler"
Date:
Subject: Re: PATCH: CITEXT 2.0 v3