Thread: Collation mega-cleanups

Collation mega-cleanups

From

Bruce Momjian

Date:

09 May 2011, 16:10:35

Tom this collation stuff has seen more post-feature-commit cleanups than
I think any patch I remember.  Is there anything we can learn from this?

Yes, this is coming from me, who some consider to be the king of
post-commit cleanups, namely, cleaning up my own commits.

---------------------------------------------------------------------------

Tom Lane wrote:
> I just noticed that the collation patch has modified char2wchar and
> wchar2char to accept a collation OID as argument ... but it hasn't done
> anything to make those arguments actually work.  Since those functions
> depend on wcstombs and mbstowcs, which respond to LC_CTYPE and nothing
> else, this flat out does not work in non-default collations.  What's
> more, there doesn't seem to be any such thing as wcstombs_l or
> mbstowcs_l (at least my Fedora box hasn't got them), so this can't be
> fixed within the available glibc API.
> 
> Right at the moment this only affects str_tolower, str_toupper, and
> str_initcap; there are other uses of these functions in the text search
> code, but those always pass DEFAULT_COLLATION_OID.
> 
> It's possible that things are not too broken in practice, because it's
> likely that the transformations done by these functions only depend on
> the encoding indicated by LC_CTYPE, and we (try to) enforce that all
> locales used in a given database match the database encoding.  Still,
> that's a rather shaky chain of reasoning.
> 
> The complete lack of code comments on this doesn't make me any happier
> --- in fact, the comments for char2wchar and wchar2char still claim that
> they have the same API as wcstombs and mbstowcs, which can hardly be
> considered true when they don't even have the same argument lists.
> 
> Any thoughts what to do about this?
> 
>             regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: Collation mega-cleanups

From

Tom Lane

Date:

09 May 2011, 17:16:24

Bruce Momjian <bruce@momjian.us> writes:
> Tom this collation stuff has seen more post-feature-commit cleanups than
> I think any patch I remember.  Is there anything we can learn from this?

The pre-commit review was obviously woefully inadequate.
        regards, tom lane

Re: Collation mega-cleanups

From

Robert Haas

Date:

09 May 2011, 17:19:07

On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Tom this collation stuff has seen more post-feature-commit cleanups than
> I think any patch I remember.  Is there anything we can learn from this?

How about "don't commit all the large patches at the end of the cycle"?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Collation mega-cleanups

From

"Ross J. Reedstrom"

Date:

10 May 2011, 14:08:47

On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote:
> On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Tom this collation stuff has seen more post-feature-commit cleanups than
> > I think any patch I remember.  Is there anything we can learn from this?
> 
> How about "don't commit all the large patches at the end of the cycle"?

My take home from following this is: 'Even Tom can get caught in the
"just one more little change" trap' 

:-)

Ross
-- 
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

Re: Collation mega-cleanups

From

Andres Freund

Date:

10 May 2011, 14:21:35

On Tuesday, May 10, 2011 07:08:23 PM Ross J. Reedstrom wrote:
> On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote:
> > On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > Tom this collation stuff has seen more post-feature-commit cleanups
> > > than I think any patch I remember.  Is there anything we can learn
> > > from this?
> >
> > 
> >
> > How about "don't commit all the large patches at the end of the cycle"?
> 
> My take home from following this is: 'Even Tom can get caught in the
> "just one more little change" trap' 
I don't think any of the changes from Tom deserves that categorization. 

Andres

Re: Collation mega-cleanups

From

"Ross J. Reedstrom"

Date:

10 May 2011, 14:42:56

On Tue, May 10, 2011 at 07:21:16PM +0200, Andres Freund wrote:
> On Tuesday, May 10, 2011 07:08:23 PM Ross J. Reedstrom wrote:
> > On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote:
> > > On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > > > Tom this collation stuff has seen more post-feature-commit cleanups
> > > > than I think any patch I remember.  Is there anything we can learn
> > > > from this?
> > >
> > > 
> > >
> > > How about "don't commit all the large patches at the end of the cycle"?
> > 
> > My take home from following this is: 'Even Tom can get caught in the
> > "just one more little change" trap' 
> I don't think any of the changes from Tom deserves that categorization. 

No disrespect intended, far from it. The trap is that something at seems
at a distance as relatively small can grow on closer inspection. Which I
think is exactly what Tom said (paraphrased) "The pre-commit review was
insufficent" i.e.  the remaining problems seemed little, but were not.

In addition, "little" is relative to who's doing the changes, over what
domain. Things that are little for Tom on PostgreSQL would not be so
for me. Presumably the inverse is true over other domains.

So perhaps it was more of the "This code is less ready than I thought
it was, but now that I've spent the time understanding it and the
problem, the shortest way out is forward". I think we've all been in
that swamp, at one time or another.

Ross
-- 
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

Re: Collation mega-cleanups

From

Tom Lane

Date:

10 May 2011, 14:59:25

"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> So perhaps it was more of the "This code is less ready than I thought
> it was, but now that I've spent the time understanding it and the
> problem, the shortest way out is forward".

Yeah, exactly.  By the time I really understood how incomplete the
collation patch was, I'd done most of the work to fix it; and insisting
on backing it out of 9.1 didn't seem productive (even assuming that I
could have won that argument, which was by no means a given).

I'm still fairly troubled by the potential overhead in the form of extra
lookups during parse time, but have not had the time to try to measure
that.  Too bad we haven't got a performance-test farm.
        regards, tom lane

Re: Collation mega-cleanups

From

Peter Eisentraut

Date:

11 May 2011, 17:09:53

On mån, 2011-05-09 at 14:58 -0400, Bruce Momjian wrote:
> Tom this collation stuff has seen more post-feature-commit cleanups
> than I think any patch I remember.  Is there anything we can learn
> from this?

Don't do big patches?

Seriously, it looks pretty bad, but this is one of the biggest feature
patches in the last 5 years, it touches many places all over the system,
and there is a reason why this topic has been on the TODO list for 10
years: it's overwhelming.  I had aimed for a 75% solution: have
something that supports useful cases, that doesn't break anything if you
don't use it, and that can be expanded later.  Now maybe I only reached
70%, and maybe the baseline should have been 80%, but what we now have
is more like 107% and includes a handful of features I had explicitly
excluded from the first round.

The patch has been around for 10 months, it has been in every commit
fest, it has tests and documentation, it has been reviewed a bunch of
times, people evidently read (some of) the code, they gave feedback,
adjustments have been made (some reverted during later cleanup, go
figure), performance was questioned, performance tests were done,
adjustments were made, people told me to commit it, so I did, if people
had told me to revert it, I would have reverted it.  What can we learn
from that?  The bigger your patch, the lonelier you are.

Re: Collation mega-cleanups

From

Bruce Momjian

Date:

11 May 2011, 17:20:22

Peter Eisentraut wrote:
> from that?  The bigger your patch, the lonelier you are.

I can attest to that.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: Collation mega-cleanups

From

Tom Lane

Date:

11 May 2011, 17:43:50

Peter Eisentraut <peter_e@gmx.net> writes:
> Seriously, it looks pretty bad, but this is one of the biggest feature
> patches in the last 5 years, it touches many places all over the system,
> and there is a reason why this topic has been on the TODO list for 10
> years: it's overwhelming.

Yeah.  I did not want to press for reverting, because it seemed clear
to me that there was no way that this feature would ever get in if we
insisted that it be 100% right when committed.  My idea of "good enough"
kept moving the more I looked at the patch, though, and it's still
moving --- now I think that we really need to fix the lack of preloaded
pg_collation entries for Windows, and then get in a regression test that
runs everywhere.  So if you want to call that feature creep, go ahead.
        regards, tom lane