Thread: fixes for the Danish locale
In Danish, the sequence 'aa' is sometimes treated as a single letter which collates after 'z'. Some regression tests that got into 9.5, and are still in 9.6beta3, fail because they assume they know how things will sort or compare. I thought the easiest way to deal with it was just to change the test data to use 'ab...' rather than 'aa...' to represent an early-collating string. With these applied, this now passes: LANG=danish make check Cheers, Jeff
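A deliberately simplified Python model of why the test-data change works. It assumes only the one rule described above ('aa' acts as a single letter that sorts after 'z'), so a string starting with 'aa' stops collating early, while 'ab...' strings are untouched. The key function and sample words are illustrative and are not taken from the regression tests.

```python
# Toy model only. We assume a single rule: the digraph "aa" acts as one
# letter that sorts after "z". Real Danish collation (e.g. glibc da_DK)
# has many more rules than this.
AFTER_Z = chr(ord("z") + 1)  # "{": a stand-in code point that sorts after "z"


def danish_like_key(s: str) -> str:
    """Sort key that replaces each "aa" with a symbol ranked after "z"."""
    return s.lower().replace("aa", AFTER_Z)


words = ["zebra", "aardvark", "abacus"]
print(sorted(words))                       # ['aardvark', 'abacus', 'zebra']
print(sorted(words, key=danish_like_key))  # ['abacus', 'zebra', 'aardvark']
```

Under both orderings 'abacus' stays ahead of 'zebra', which is why an 'ab...' value is the safer choice for an early-collating test string.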
Attachment
Jeff Janes <jeff.janes@gmail.com> writes: > In Danish, the sequence 'aa' is sometimes treated as a single letter > which collates after 'z'. > Some regression tests got into 9.5, and are still in 9.6beta3, which > fail due to assuming they know how things will sort or compare. Confirmed here. Will deal with it, but I wonder why we have no buildfarm members covering this ... regards, tom lane
On Thu, Jul 21, 2016 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Confirmed here. Will deal with it, but I wonder why we have no buildfarm > members covering this ... We're not going to have a build farm member for every locale the local systems support. Perhaps the build farm script should pick a random locale for each run. Either a random locale from the set on the OS or a random language from a list of locales that the regression tests are intended to be safe for. -- greg
On Thu, Jul 21, 2016 at 11:26 AM, Greg Stark <stark@mit.edu> wrote: > Perhaps the build farm script should pick a random locale for each > run. Either a random locale from the set on the OS or a random > language from a list of locale that the regression tests are intended > to be safe for. That's more or less what I did with the amcheck regression tests. -- Peter Geoghegan
Greg Stark <stark@mit.edu> writes: > On Thu, Jul 21, 2016 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Confirmed here. Will deal with it, but I wonder why we have no buildfarm >> members covering this ... > We're not going to have a build farm member for every locale the local > systems support. Probably not, but Danish seems odd enough to be worth testing. Aside from this issue, I found one in the pltcl tests. > Perhaps the build farm script should pick a random locale for each > run. Either a random locale from the set on the OS or a random > language from a list of locale that the regression tests are intended > to be safe for. Nah, we have a hard enough time with reproducibility of buildfarm results without deliberately injecting transient failures. regards, tom lane
On Thu, Jul 21, 2016 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Perhaps the build farm script should pick a random locale for each >> run. Either a random locale from the set on the OS or a random >> language from a list of locale that the regression tests are intended >> to be safe for. > > Nah, we have a hard enough time with reproducibility of buildfarm results > without deliberately injecting transient failures. It could be pseudo-random, and so deterministic per buildfarm animal. That's what I did. -- Peter Geoghegan
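One way to read the pseudo-random suggestion, sketched in Python: hash the animal's name to pick a locale, so each animal always tests the same one and its failures stay reproducible. The function name, the use of SHA-256, and the example locale list are assumptions for illustration, not how the amcheck tests or the buildfarm client actually work.

```python
import hashlib


def pick_locale(animal: str, locales: list) -> str:
    """Deterministically choose one locale for a given buildfarm animal.

    Hashing the animal name (instead of calling random.choice) means the
    same animal always gets the same locale, so its failures reproduce.
    """
    if not locales:
        raise ValueError("no locales to choose from")
    candidates = sorted(locales)  # independent of the order the OS lists them
    digest = hashlib.sha256(animal.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


# Hypothetical locale list; a real animal would read what its OS provides.
print(pick_locale("nightjar", ["C", "da_DK.UTF-8", "cy_GB.UTF-8", "tr_TR.UTF-8"]))
```

This only addresses reproducibility; as the next reply points out, it still gives no control over which set of locales ends up covered.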
Peter Geoghegan <pg@heroku.com> writes: > On Thu, Jul 21, 2016 at 11:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Nah, we have a hard enough time with reproducibility of buildfarm results >> without deliberately injecting transient failures. > It could be pseudo-random, and so deterministic per buildfarm animal. > That's what I did. I'm not impressed with that proposal either --- then we don't even have any control over what set of locales is getting tested. Note that there are certain locales we've deliberately chosen not to support in some regression tests (see e.g. plpython_unicode.sql), so I'm not really willing to buy into the idea that "any random locale found on a buildfarm animal should work" anyway. I'm much more interested in supporting locales that someone cares enough about to configure a buildfarm animal for. regards, tom lane
On Thu, Jul 21, 2016 at 9:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Janes <jeff.janes@gmail.com> writes: >> In Danish, the sequence 'aa' is sometimes treated as a single letter >> which collates after 'z'. >> Some regression tests got into 9.5, and are still in 9.6beta3, which >> fail due to assuming they know how things will sort or compare. > > Confirmed here. Will deal with it, but I wonder why we have no buildfarm > members covering this ... > My CentOS box came with 735 locales installed, so testing all of them on a regular basis would be quite a task. And it doesn't help that many of them seem to be very slow compared to C locale. I guess the good news is that nothing I tested which was working in 9.5 is broken in 9.6, but several things which were working in 9.4 did get broken in 9.5 and still are in 9.6. The Danish fix will probably also fix the (very large) Norwegian family. The Welsh (cy_GB) apparently put 'dd' after 'f', which breaks row level security in much the same way as 'aa' does. I think that that will cover all of the ones that were working in 9.4. Does testing in other locales ever uncover bugs other than those in the tests themselves? Is it worth trying to maintain broad coverage? Cheers, Jeff
On Thu, Jul 21, 2016 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Note that there are certain locales we've deliberately chosen not to > support in some regression tests (see e.g. plpython_unicode.sql), so > I'm not really willing to buy into the idea that "any random locale found > on a buildfarm animal should work" anyway. I'm much more interested in > supporting locales that someone cares enough about to configure a > buildfarm animal for. That seems like a high standard to me. Locale rules are known to change, and are explicitly versioned by glibc, for example. -- Peter Geoghegan
On Thu, Jul 21, 2016 at 11:49 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > Does testing in other locales ever uncover bugs other than those in > the tests themselves? Is it worth trying to maintain broad coverage? Potentially, yes. The strxfrm() inconsistency issue disproportionately affected de_DE.utf8, for example. There were other locales that were affected less severely, and I think the majority were not shown to be affected at all. That being said, it probably wouldn't have caught that particular issue even if we had had broad coverage. It probably would catch a broken test, though. -- Peter Geoghegan
On 07/21/2016 02:26 PM, Greg Stark wrote: > On Thu, Jul 21, 2016 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Confirmed here. Will deal with it, but I wonder why we have no buildfarm >> members covering this ... > We're not going to have a build farm member for every locale the local > systems support. > > Perhaps the build farm script should pick a random locale for each > run. Either a random locale from the set on the OS or a random > language from a list of locale that the regression tests are intended > to be safe for. > I don't see why we shouldn't have a buildfarm machine that tests a very large number of locales. It takes a very lightly resourced machine like nightjar just over two minutes per locale. The list of locales to test is a setting in the config file. cheers andrew
Jeff Janes <jeff.janes@gmail.com> writes: > In Danish, the sequence 'aa' is sometimes treated as a single letter > which collates after 'z'. > Some regression tests got into 9.5, and are still in 9.6beta3, which > fail due to assuming they know how things will sort or compare. As of HEAD, "LANG=danish make check-world" passes for me, which it did not before the round of fixes I just pushed. I see that the core tests fall over in Turkish still :-( regards, tom lane
On Thu, Jul 21, 2016 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Janes <jeff.janes@gmail.com> writes: >> In Danish, the sequence 'aa' is sometimes treated as a single letter >> which collates after 'z'. >> Some regression tests got into 9.5, and are still in 9.6beta3, which >> fail due to assuming they know how things will sort or compare. > > As of HEAD, "LANG=danish make check-world" passes for me, which it > did not before the round of fixes I just pushed. > > I see that the core tests fall over in Turkish still :-( Turkish has never passed (at least back to 9.0). It looks like it is in the stemming functions. I don't understand why; I would think everything other than English would be failing those if the regression tests hard-code English stemming expectations but fail to arrange for English stemming rules. Cheers, Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > On Thu, Jul 21, 2016 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I see that the core tests fall over in Turkish still :-( > Turkish has never passed (at least back to 9.0). It looks like it is > in the stemming functions. I don't understand why, I would think > everything other than English would be failing those if the regression > tests hard-code English stemming expectations but fail to arrange for > English stemming rules. It looks to me like the 'simple' dictionary assumes it can apply the lowercasing rules implied by LC_CTYPE regardless of which language it's supposedly working on. This is probably something we should improve sometime, but I doubt it's an easy change. regards, tom lane
On Thu, Jul 21, 2016 at 11:49 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Thu, Jul 21, 2016 at 9:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Jeff Janes <jeff.janes@gmail.com> writes: >>> In Danish, the sequence 'aa' is sometimes treated as a single letter >>> which collates after 'z'. >>> Some regression tests got into 9.5, and are still in 9.6beta3, which >>> fail due to assuming they know how things will sort or compare. >> >> Confirmed here. Will deal with it, but I wonder why we have no buildfarm >> members covering this ... >> > > My CentOS box came with 735 locales installed, so testing all of them > on a regular basis would be quite a task. And it doesn't help that > many of them seem to be very slow compared to C locale. > > I guess the good news is that nothing I tested which was working in > 9.5 is broken in 9.6, but several things which were working in 9.4 did > get broken in 9.5 and still are in 9.6. > > The Danish fix will probably also fix the (very large) Norwegian family. > > The Welsh (cy_GB) apparently put 'dd' after 'f', which breaks row > level security in much the same way as 'aa' does. > > I think that that will cover all of the ones that were working in 9.4. The attached patch fixes regression tests for Welsh (cy_GB), needed in 9.5 and 9.6. Cheers, Jeff
Attachment
Jeff Janes <jeff.janes@gmail.com> writes: > The attached patch fixes regression tests for Welsh (cy_GB), needed in > 9.5 and 9.6. Pushed, thanks. regards, tom lane
On 07/22/2016 03:59 AM, Jeff Janes wrote: > On Thu, Jul 21, 2016 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I see that the core tests fall over in Turkish still :-( > > Turkish has never passed (at least back to 9.0). It looks like it is > in the stemming functions. I don't understand why, I would think > everything other than English would be failing those if the regression > tests hard-code English stemming expectations but fail to arrange for > English stemming rules. If something fails for Turkish but not for other languages, it is usually due to the upper/lower casing rules for the dotted and dotless I (I -> ı and İ -> i, rather than I -> i as in most languages). Andreas
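A small Python illustration of the dotted/dotless I rules, and of how they would affect a 'simple'-dictionary-style lowercasing step that follows LC_CTYPE: under Turkish rules the token "INDEX" lowercases to "ındex" rather than "index", so output checked against English-style expectations would differ. Python's str.lower() ignores the process locale, so the Turkish mapping is written out by hand; the function names are hypothetical and this is not PostgreSQL's code.

```python
# Default Unicode lowercasing, which Python's str.lower() applies no matter
# which locale the process is running in.
def default_lower(s: str) -> str:
    return s.lower()


# Turkish/Azerbaijani rules: I -> ı (dotless), İ -> i (dotted).
_TURKISH_I = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_lower(s: str) -> str:
    """Lowercase with the Turkish dotted/dotless I rules applied first."""
    return s.translate(_TURKISH_I).lower()


token = "INDEX"
print(default_lower(token))  # index
print(turkish_lower(token))  # ındex
```

Note that default lowercasing of "İ" yields "i" plus a combining dot above (two code points), so neither mapping round-trips cleanly through the other.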
On Thu, Jul 21, 2016 at 03:53:45PM -0400, Andrew Dunstan wrote: > > > On 07/21/2016 02:26 PM, Greg Stark wrote: > >On Thu, Jul 21, 2016 at 5:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>Confirmed here. Will deal with it, but I wonder why we have no buildfarm > >>members covering this ... > >We're not going to have a build farm member for every locale the local > >systems support. > > > >Perhaps the build farm script should pick a random locale for each > >run. Either a random locale from the set on the OS or a random > >language from a list of locale that the regression tests are intended > >to be safe for. > > > > > I don't see why we shouldn't have a buildfarm machine that tests a very > large number of locales. It takes a very lightly resourced machine like > nightjar just over two minutes per locale. The list of locales to test is a > setting in the config file. +1. Ten animals of ~75 locales apiece would give fair per-animal runtime.
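Back-of-the-envelope arithmetic for the ten-animal split, as a Python sketch. It assumes roughly two minutes per locale (the nightjar figure, so a lower bound), the 735 locales from Jeff's CentOS box, and a simple round-robin assignment; the placeholder locale names are invented.

```python
MINUTES_PER_LOCALE = 2  # lower bound: nightjar takes "just over two minutes"
N_LOCALES = 735         # locales installed on Jeff's CentOS box
N_ANIMALS = 10          # the proposed number of buildfarm animals


def partition_locales(locales, n_animals):
    """Deal locales round-robin so shard sizes differ by at most one."""
    shards = [[] for _ in range(n_animals)]
    for i, loc in enumerate(sorted(locales)):
        shards[i % n_animals].append(loc)
    return shards


# Placeholder names; a real setup would take them from the OS.
locales = [f"locale_{i:03d}" for i in range(N_LOCALES)]
shards = partition_locales(locales, N_ANIMALS)

print(N_LOCALES * MINUTES_PER_LOCALE)                    # 1470 (one animal)
print(sorted({len(s) for s in shards}))                  # [73, 74]
print(max(len(s) for s in shards) * MINUTES_PER_LOCALE)  # 148 (per animal)
```

A single animal running everything would need at least 1470 minutes (over 24 hours) per run, versus roughly 2.5 hours each when the list is split ten ways.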
On 21/07 08.42, Jeff Janes wrote: > In Danish, the sequence 'aa' is sometimes treated as a single letter > which collates after 'z'. For the record: this is also true for Norwegian; in both locales it collates equal to 'å', which is the 29th letter of the alphabet. But 'aa' is no longer used in ordinary words, only in names (in Norwegian only personal names, in Danish also place names). - Bjorn Munch