Re: ICU locale validation / canonicalization - Mailing list pgsql-hackers

From Noah Misch
Subject Re: ICU locale validation / canonicalization
Date
Msg-id 20230701173132.GC2764235@rfd.leadboat.com
In response to Re: ICU locale validation / canonicalization  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: ICU locale validation / canonicalization
List pgsql-hackers
On Sat, May 20, 2023 at 10:19:30AM -0700, Jeff Davis wrote:
> On Tue, 2023-05-02 at 07:29 -0700, Noah Misch wrote:
> > On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote:
> > > On 30.03.23 04:33, Jeff Davis wrote:
> > > > Attached is a new version of the final patch, which performs
> > > > canonicalization. I'm not 100% sure that it's wanted, but it still
> > > > seems like a good idea to get the locales into a standard format in the
> > > > catalogs, and if a lot more people start using ICU in v16 (because it's
> > > > the default), then it would be a good time to do it. But perhaps there
> > > > are risks?
> > > 
> > > I say, let's do it.
> > 
> > The following is not cause for postgresql.git changes at this time, but I'm
> > sharing it in case it saves someone else the study effort.  Commit ea1db8a
> > ("Canonicalize ICU locale names to language tags.") slowed buildfarm member
> > hoverfly, but that disappears if I drop debug_parallel_query from its config.
> > Typical end-to-end duration rose from 2h5m to 2h55m.  Most-affected were
> > installcheck runs, which rose from 11m to 19m.  (The "check" stage uses
> > NO_LOCALE=1, so it changed less.)  From profiles, my theory is that each of
> > the many parallel workers burns notable CPU and I/O opening its ICU collator
> > for the first time.
> 
> I didn't repro the overall test timings (mine is ~1m40s compared to
> ~11-19m on hoverfly) but I think a microbenchmark on the ICU calls
> showed a possible cause.
> 
> I ran open in a loop 10M times on the requested locale. The root locale
> ("und"[1], "root" and "") take about 1.3s to open 10M times; simple
> locales like 'en' and 'fr-CA' and 'de-DE' are all a little slower at
> 3.3s.
> 
> Unrecognized locales like "xyz" take about 10 times as long: 13s to
> open 10M times, presumably to perform the fallback logic that
> ultimately opens the root locale. Not sure if 10X slower in the open
> path is enough to explain the overall test slowdown.
> 
> My guess is that the ICU locale for these tests is not recognized, or
> is some other locale that opens slowly. Can you tell me the actual
> daticulocale?

As of commit b8c3f6d, InstallCheck-C got daticulocale=en-US-u-va-posix.  Check
got daticulocale=NULL.

(The machine in question was unusable for PostgreSQL from 2023-05-12 to
2023-06-30, due to https://stackoverflow.com/q/76369660/16371536.  That
delayed my response.)
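
For anyone who wants to repeat the open-in-a-loop microbenchmark quoted above, here is a minimal sketch. It is not the code used in this thread. The helper names (time_opens, compare_locales, icu_open) are hypothetical, and it assumes the third-party PyICU binding is installed and that "open" means creating a collator for the locale. Interpreter overhead means absolute timings will not match the quoted figures; only the relative cost between locales is meaningful.

```python
import time
from typing import Callable, Dict, Iterable


def time_opens(open_fn: Callable[[str], object], locale: str, iterations: int) -> float:
    """Return the seconds spent calling open_fn(locale) `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        open_fn(locale)
    return time.perf_counter() - start


def compare_locales(
    open_fn: Callable[[str], object],
    locales: Iterable[str],
    iterations: int = 100_000,
) -> Dict[str, float]:
    """Time open_fn for each locale name and return {locale: seconds}."""
    return {loc: time_opens(open_fn, loc, iterations) for loc in locales}


def icu_open(locale: str) -> object:
    """Open an ICU collator for `locale`.

    Assumption: PyICU is installed; it is imported lazily so the timing
    helpers above stay usable without it.
    """
    import icu

    return icu.Collator.createInstance(icu.Locale(locale))


if __name__ == "__main__":
    # Locale names taken from the quoted message; "xyz" is the unrecognized one.
    names = ["und", "root", "", "en", "fr-CA", "de-DE", "xyz"]
    try:
        results = compare_locales(icu_open, names)
    except ImportError:
        print("PyICU is not installed; nothing to measure")
    else:
        for name, seconds in results.items():
            print(f"{name!r:10} {seconds:.3f}s")
```

Passing the opener as a parameter keeps the timing loop independent of the ICU binding, so the same harness can time any other open call.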


