Re: PATCH: CITEXT 2.0 v3 - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: PATCH: CITEXT 2.0 v3
Date
Msg-id 7D3F0762-F324-4F9D-9ECF-02AACC894AD7@kineticode.com
Whole thread Raw
In response to Re: PATCH: CITEXT 2.0 v3  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Jul 14, 2008, at 11:49, Tom Lane wrote:

>> You don't seem to understand what I'm suggesting: I'm not talking
>> about testing strcoll. I'm talking about making sure that citext
>> *uses* strcoll.
>
> This seems pointless, as well as essentially impossible to enforce by
> black-box testing.  Nobody is likely to consider a patch that removes
> such obviously basic functionality of the module.

Never underestimate the human capacity for incompetence. One of my  
mottos.

> I think you're
> worrying about testing the wrong things --- and as evidence I'd note  
> the
> serious errors that slipped by your testing (including the fact that  
> the
> last submitted version would still claim that 'a' = 'ab').

Um, say what? I had that problem at one time, but it was when I was  
writing my own lower() function, which was shitty (I wasn't creating a  
nul-terminated string). I don't recall that being in any patch I sent  
to the list, and indeed you wrote that very test in the revised test  
script you sent in.

At any rate, I make no claims that my tests properly covered  
everything. There is a lot I didn't know to test, but I'm learning  
more all the time. That's why this code review has been so immensely  
valuable, both for me and for CITEXT.

> What we
> generally ask a regression test to do is catch portability problems
> (which is certainly a real-enough issue here, but we don't know how
> to address it well) or catch foreseeable breakage (such as someone
> introducing a cast that breaks resolution of citext function calls).
> The methodology can't catch everything, and trying to push it beyond
> what it can do is just a time sink for effort that can more usefully
> be spent other ways, such as code-reading.

I guess what I'm thinking of is not regression tests but unit tests.  
And with unit testing, one of the goals is coverage. Good coverage has  
saved my ass many times in the past, even catching changes that, in  
hindsight, should obviously never have been attempted. Perhaps it'd be  
worth thinking about creating a project for unit testing PostgreSQL in  
controlled environments? Maybe having tests that can contain proper  
conditionals?

Anyway, it seems that, given the limitations of the current testing  
system with regard to testing multibyte characters, the issue is moot.  
I'll throw in a few tests that test multibyte characters, but I'll  
comment them out and just uncomment them for individual test runs on  
my box for the purposes of my own sanity going forward.

Thanks,

David



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: PATCH: CITEXT 2.0 v3
Next
From: Gregory Stark
Date:
Subject: Re: Fwd: Proposal - UUID data type