Making the C collation less inclined to abort abbreviation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Making the C collation less inclined to abort abbreviation
Date
Msg-id CAM3SWZTXyTCChND8B3AsXCKPsn4vYhapKf3fqHFyy+5eRPK6zA@mail.gmail.com
Responses Re: Making the C collation less inclined to abort abbreviation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
The C collation is treated exactly the same as other collations when
deciding whether the generation of abbreviated keys for text should
continue. This doesn't make much sense. With text, the big cost that
we worry about wasting, should abbreviated keys fail to capture
sufficient entropy, is the cost of n strxfrm() calls. However,
the C collation doesn't use strxfrm() -- it uses memcmp(), which is
far cheaper.
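
To illustrate why key generation is cheap when ordering is plain
byte-wise, here is a minimal sketch, not code from the patch. The
function name c_abbrev_key and the 8-byte key width are assumptions
made for the example. It packs the leading bytes of a string into an
integer so that integer comparison agrees with memcmp() on those
bytes, with no strxfrm() call involved:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: build an 8-byte abbreviated key for a string
 * that sorts byte-wise (as under the C collation). Short strings are
 * zero-padded. Bytes are packed most-significant first, so comparing
 * two keys as unsigned integers matches memcmp() on the first 8 bytes.
 */
uint64_t
c_abbrev_key(const char *s, size_t len)
{
    unsigned char buf[8] = {0};
    uint64_t    key = 0;
    size_t      n = len < sizeof(buf) ? len : sizeof(buf);
    size_t      i;

    memcpy(buf, s, n);          /* a plain copy, no locale transformation */
    for (i = 0; i < sizeof(buf); i++)
        key = (key << 8) | buf[i];
    return key;
}
```

Two strings that share their first 8 bytes get equal keys, so the sort
has to fall back to a full comparison for them. That is the "not
enough entropy" case the cost model is trying to detect.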

With other types, like numeric and now UUID, the cost of generating an
abbreviated key is significantly lower than it is for text under
collations other than C. Their cost models reflect this, and abort
abbreviation far less aggressively than text's does, even though the
trade-off is very similar when text uses the C collation.

Attached patch fixes this inconsistency by making it significantly
less likely that abbreviation will be aborted when the C collation is
in use. The behavior with other collations is unchanged. This should
be backpatched to 9.5 as a bugfix, IMV.
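
A rough sketch of the idea follows, not the attached patch itself. The
function name abbrev_should_abort, the tuple-count floor, and both
ratio thresholds are hypothetical values picked for illustration. It
assumes the sort keeps estimates of how many distinct abbreviated keys
and distinct full keys it has seen, and gives up only when the
abbreviated keys resolve too small a share of the full keys:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbers, for illustration only. */
#define MIN_TUPLES_BEFORE_CHECK 100
#define REQUIRED_RATIO_STRXFRM  0.20    /* keys cost a strxfrm() call each */
#define REQUIRED_RATIO_C        0.05    /* keys are cheap under the C collation */

/*
 * Return true if abbreviation should be abandoned.
 *
 * abbrev_distinct and key_distinct are estimated counts of distinct
 * abbreviated keys and distinct full keys seen so far. The C collation
 * gets a lower required ratio, so it aborts less readily.
 */
bool
abbrev_should_abort(int ntuples, double abbrev_distinct,
                    double key_distinct, bool collate_c)
{
    double      required;

    assert(abbrev_distinct >= 0.0 && key_distinct >= 0.0);

    /* Too few tuples to judge; keep abbreviating. */
    if (ntuples < MIN_TUPLES_BEFORE_CHECK)
        return false;

    /* Clamp so the comparison below stays meaningful. */
    if (key_distinct < 1.0)
        key_distinct = 1.0;

    required = collate_c ? REQUIRED_RATIO_C : REQUIRED_RATIO_STRXFRM;
    return abbrev_distinct <= key_distinct * required;
}
```

With these made-up thresholds, an input where abbreviated keys
distinguish 10% of the full keys would abort under a strxfrm()-based
collation but carry on under the C collation.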

--
Peter Geoghegan

