Thread: move collation import to backend

move collation import to backend

From
Peter Eisentraut
Date:
Currently, initdb parses locale -a output to populate pg_collation.  If
additional collations are installed in the operating system, it is not
possible to repeat this process except by doing each step manually.  So I
propose to move this to a backend function that can be called
separately, and have initdb call that.  Running this logic in the
backend instead of initdb also makes the code simpler.  If we add other
collation providers such as ICU, initdb doesn't need to know about that
at all, because all the logic would be contained in the backend.

Here is an example:

    select pg_import_system_collations(if_not_exists => false, schema => 'test');

(Specifying the schema also allows testing this without overwriting
pg_catalog.)
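
As a rough illustration of the parsing step described above, the following standalone C sketch reads locale names line by line from a stream such as the output of "locale -a". It is not the patch's code: read_locale_names, is_all_ascii, and MAX_NAME_LEN are hypothetical names, and a plain FILE * stands in for whatever pipe handling the backend uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME_LEN 64     /* stand-in for NAMEDATALEN; an assumption */

/* Return 1 if every byte of s is 7-bit ASCII. */
static int
is_all_ascii(const char *s)
{
    for (; *s; s++)
        if ((unsigned char) *s > 127)
            return 0;
    return 1;
}

/*
 * Read one locale name per line from "in" and store up to max_names of them.
 * Empty, non-ASCII, and overlong names are skipped.
 * Returns the number of names stored.
 */
static int
read_locale_names(FILE *in, char names[][MAX_NAME_LEN], int max_names)
{
    char    line[256];
    int     count = 0;

    assert(in != NULL && names != NULL);

    while (count < max_names && fgets(line, sizeof(line), in) != NULL)
    {
        size_t  len = strcspn(line, "\n");

        /* Buffer filled without reaching a newline: discard the rest. */
        if (line[len] != '\n' && len == sizeof(line) - 1)
        {
            int     c;

            while ((c = fgetc(in)) != EOF && c != '\n')
                ;
            continue;
        }
        line[len] = '\0';

        if (len == 0 || len >= MAX_NAME_LEN || !is_all_ascii(line))
            continue;

        memcpy(names[count], line, len + 1);
        count++;
    }
    return count;
}
```

On a POSIX system a caller could obtain the stream with popen("locale -a", "r") and release it with pclose(). The proposed backend function would presumably create one pg_collation entry per usable name rather than just storing the names.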

I thought about making this a top-level command (IMPORT COLLATIONS ...
?) but decided against it for now, to keep it simple.  Right now, this
is more of a refactoring.  Documentation could be added if we decide so.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: move collation import to backend

From
Andres Freund
Date:
Hi,

On 2016-10-27 21:56:53 -0400, Peter Eisentraut wrote:
> Currently, initdb parses locale -a output to populate pg_collation.  If
> additional collations are installed in the operating system, it is not
> possible to repeat this process, only by doing each step manually.  So I
> propose to move this to a backend function that can be called
> separately, and have initdb call that.  Running this logic in the
> backend instead of initdb also makes the code simpler.  If we add other
> collation providers such as ICU, initdb doesn't need to know about that
> at all, because all the logic would be contained in the backend.

That generally sounds like a good idea.  There are some questions, imo:
E.g. what if previously present collations are now unavailable?

> I thought about making this a top-level command (IMPORT COLLATIONS ...
> ?) but decided against it for now, to keep it simple.

Seems ok to me.

>  
>      /*
>     * Also forbid matching an any-encoding entry.  This test of course is not
>     * backed up by the unique index, but it's not a problem since we don't
>     * support adding any-encoding entries after initdb.
>     */

Note that this isn't true anymore...

> +
> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
> +
> +Datum
> +pg_import_system_collations(PG_FUNCTION_ARGS)
> +{

Uh?

> +    bool        if_not_exists = PG_GETARG_BOOL(0);
> +    Oid         nspid = PG_GETARG_OID(1);
> +
> +    FILE       *locale_a_handle;
> +    char        localebuf[NAMEDATALEN]; /* we assume ASCII so this is fine */
> +    int            count = 0;
> +
> +    locale_a_handle = OpenPipeStream("locale -a", "r");
> +    if (locale_a_handle == NULL)
> +        ereport(ERROR,
> +                (errcode_for_file_access(),
> +                 errmsg("could not execute command \"%s\": %m",
> +                        "locale -a")));

This function needs to have !superuser permissions revoked, which it
afaics currently hasn't.


Greetings,

Andres Freund



Re: move collation import to backend

From
Peter Eisentraut
Date:
On 11/12/16 10:38 AM, Andres Freund wrote:
> E.g. what if previously present collations are now unavailable?

You get an error message when you try to use the collation.  I think
that is a different class of problems.

>>
>>      /*
>>      * Also forbid matching an any-encoding entry.  This test of course is not
>>      * backed up by the unique index, but it's not a problem since we don't
>>      * support adding any-encoding entries after initdb.
>>      */
>
> Note that this isn't true anymore...

I think this is still correct, because the collation import does not
produce any any-encoding entries (collencoding = -1).

>> +
>> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
>> +
>> +Datum
>> +pg_import_system_collations(PG_FUNCTION_ARGS)
>> +{
>
> Uh?

Required to avoid compiler warning about missing prototype.

> This function needs to have !superuser permissions revoked, which it
> afaics currently hasn't.

Done.

New patch attached (includes OID change because of conflict).

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: move collation import to backend

From
Andres Freund
Date:
On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:
> On 11/12/16 10:38 AM, Andres Freund wrote:
> >>      /*
> >>      * Also forbid matching an any-encoding entry.  This test of course is not
> >>      * backed up by the unique index, but it's not a problem since we don't
> >>      * support adding any-encoding entries after initdb.
> >>      */
> > 
> > Note that this isn't true anymore...
> 
> I think this is still correct, because the collation import does not
> produce any any-encoding entries (collencoding = -1).

Well, the comment "don't support adding any-encoding entries after
initdb." is now wrong.

> >> +
> >> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
> >> +
> >> +Datum
> >> +pg_import_system_collations(PG_FUNCTION_ARGS)
> >> +{
> > 
> > Uh?
> 
> Required to avoid compiler warning about missing prototype.

It seems not to be project style to have prototypes in the middle of the
file...

- Andres



Re: move collation import to backend

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:
>> Required to avoid compiler warning about missing prototype.

> It seems not to be project style to have prototypes in the middle of the
> file...

I agree.  Please put that in builtins.h, if you can't find any better
header for it.
        regards, tom lane



Re: move collation import to backend

From
Peter Eisentraut
Date:
On 11/29/16 2:53 PM, Andres Freund wrote:
> On 2016-11-29 12:16:37 -0500, Peter Eisentraut wrote:
>> On 11/12/16 10:38 AM, Andres Freund wrote:
>>>>      /*
>>>>      * Also forbid matching an any-encoding entry.  This test of course is not
>>>>      * backed up by the unique index, but it's not a problem since we don't
>>>>      * support adding any-encoding entries after initdb.
>>>>      */
>>>
>>> Note that this isn't true anymore...
>>
>> I think this is still correct, because the collation import does not
>> produce any any-encoding entries (collencoding = -1).
> 
> Well, the comment "don't support adding any-encoding entries after
> initdb." is now wrong.

I think there is a misunderstanding.  The comment says that we don't
support adding collations that have collencoding = -1 after initdb.  That
is still true.  Note that the original comment has two "any"'s.  With
this patch, we would now support adding collations with collencoding <>
-1 after initdb.
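
The rule being discussed can be sketched as a standalone check. This is only an illustration under assumptions: CollationEntry and collation_conflicts are hypothetical names, the numeric encoding IDs in the usage note are arbitrary, and the real check works against the pg_collation catalog rather than an in-memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ANY_ENCODING (-1)   /* mirrors collencoding = -1 */

typedef struct CollationEntry
{
    const char *name;
    int         encoding;   /* ANY_ENCODING or a specific encoding ID */
} CollationEntry;

/*
 * Return 1 if adding (name, encoding) would clash with an existing entry:
 * either one with the same name and the same encoding, or one with the
 * same name that applies to any encoding.
 *
 * The reverse case (adding an any-encoding entry when encoding-specific
 * entries already exist) is not checked here, on the thread's premise
 * that such entries are never added after initdb.
 */
static int
collation_conflicts(const CollationEntry *existing, size_t n,
                    const char *name, int encoding)
{
    assert(name != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(existing[i].name, name) != 0)
            continue;
        if (existing[i].encoding == encoding ||
            existing[i].encoding == ANY_ENCODING)
            return 1;
    }
    return 0;
}
```

With existing entries ("C", ANY_ENCODING) and ("en_US", 6), adding ("C", 6) or ("en_US", 6) is reported as a conflict, while ("en_US", 8) and ("de_DE", 6) are not.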

> 
>>>> +
>>>> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
>>>> +
>>>> +Datum
>>>> +pg_import_system_collations(PG_FUNCTION_ARGS)
>>>> +{
>>>
>>> Uh?
>>
>> Required to avoid compiler warning about missing prototype.
> 
> It seems not to be project style to have prototypes in the middle of the
> file...

OK, will fix.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: move collation import to backend

From
Haribabu Kommi
Date:


On Thu, Dec 1, 2016 at 12:18 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

>>
>>>>> +
>>>>> +Datum pg_import_system_collations(PG_FUNCTION_ARGS);
>>>>> +
>>>>> +Datum
>>>>> +pg_import_system_collations(PG_FUNCTION_ARGS)
>>>>> +{
>>>>
>>>> Uh?
>>>
>>> Required to avoid compiler warning about missing prototype.
>>
>> It seems not to be project style to have prototypes in the middle of the
>> file...
>
> OK, will fix.

Moved to next CF with "waiting on author" status.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] move collation import to backend

From
Peter Eisentraut
Date:
On 11/30/16 8:18 AM, Peter Eisentraut wrote:
>> It seems not to be project style to have prototypes in the middle of the
>> file...
> 
> OK, will fix.

Updated patch with that fix.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] move collation import to backend

From
Euler Taveira
Date:
On 18-12-2016 18:30, Peter Eisentraut wrote:
> Updated patch with that fix.
> 
Peter, I reviewed and improved your patch.

* I documented the new function. Since a collation is a database object, I
chose the "Database Object Management Functions" section.
* I've added a check to any-encoding database because I got 'FATAL:
collation "C" already exists' on Debian 8, although I didn't get it on
CentOS 7. The problem is that Debian has two locales for C (C and
C.UTF-8) and CentOS has just one (C).
* I've added OidIsValid to test the new collation row.
* I've changed the parameter order. Schema seems more important than
if_not_exists. Also, we generally leave boolean parameters like this for
the end of the list. I didn't make if_not_exists optional, but IMO it
would be a good idea (default = true).
* You removed some #if and #ifdef while moving things around. I put them back.
* You didn't pgindent some lines of code, but I assume that was to keep
the patch footprint small.
* I didn't test on Windows.
* As a last comment, you set cost = 100 and it seems reasonable because
it takes 411 ms to scan/load 923 collations in my slow VM. However,
successive executions take ~1200 ms.
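
The Debian case in the list above can be illustrated with a small standalone sketch. It assumes that an abbreviated collation name is formed by dropping the ".encoding" part of the locale name, so that "C.UTF-8" shortens to "C" and clashes with the plain "C" entry. The function names and LOCALE_NAME_MAX are hypothetical and this is not the patch's code.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define LOCALE_NAME_MAX 64  /* illustrative limit, not a PostgreSQL constant */

/*
 * Copy src to dst with any ".encoding" tag removed, keeping a trailing
 * "@modifier" if present.  dst needs room for strlen(src) + 1 bytes.
 * Returns 1 if something was removed, 0 if the name was copied unchanged.
 */
static int
strip_encoding_tag(char *dst, const char *src)
{
    int     changed = 0;

    assert(dst != NULL && src != NULL);

    while (*src)
    {
        if (*src == '.')
        {
            /* skip the dot and the tag, e.g. ".utf8" or ".UTF-8" */
            src++;
            while (isalnum((unsigned char) *src) || *src == '-')
                src++;
            changed = 1;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';
    return changed;
}

/* Return 1 if two locale names shorten to the same abbreviated name. */
static int
same_abbreviated_name(const char *a, const char *b)
{
    char    sa[LOCALE_NAME_MAX];
    char    sb[LOCALE_NAME_MAX];

    if (strlen(a) >= sizeof(sa) || strlen(b) >= sizeof(sb))
        return 0;           /* too long to consider in this sketch */

    strip_encoding_tag(sa, a);
    strip_encoding_tag(sb, b);
    return strcmp(sa, sb) == 0;
}
```

Under this assumption, same_abbreviated_name("C", "C.UTF-8") reports a clash, which matches the Debian symptom, while a system that only has "C" never produces the second candidate.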

I'm attaching the complete patch and also a patch on top of your last patch.


-- 
   Euler Taveira                   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] move collation import to backend

From
Peter Eisentraut
Date:
On 1/9/17 10:04 PM, Euler Taveira wrote:
> On 18-12-2016 18:30, Peter Eisentraut wrote:
>> Updated patch with that fix.
>>
> Peter, I reviewed and improved your patch.
> 
> * I document the new function. Since collation is a database object, I
> chose "Database Object Management Functions" section.

OK

> * I've added a check to any-encoding database because I got 'FATAL:
> collation "C" already exists' on Debian 8, although, I didn't get on
> CentOS 7. The problem is that Debian has two locales for C (C and
> C.UTF-8) and CentOS has just one (C).

OK

> * I've added OidIsValid to test the new collation row.

OK

> * I've changed the parameter order. Schema seems more important than
> if_not_exists. Also, we generally leave those boolean parameters for the
> end of list. I don't turn if_not_exists optional but IMO it would be a
> good idea (default = true).

I put them that way because in an SQL command the "IF NOT EXISTS" comes
before the schema, but I can see how it is weird that way in a function.

> * You removed some #if and #ifdef while moving things around. I put it back.
> * You didn't pgindent some lines of code but I'm sure you didn't for a
> small patch footprint.

I had left the #if in initdb, but I think your changes are better.

> I'm attaching the complete and also a patch at the top of your last patch.

Thanks.  If there are no more comments, I will proceed with that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] move collation import to backend

From
Jeff Janes
Date:
On Tue, Jan 17, 2017 at 9:05 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 1/9/17 10:04 PM, Euler Taveira wrote:
>> On 18-12-2016 18:30, Peter Eisentraut wrote:
>>> Updated patch with that fix.
>>>
>> Peter, I reviewed and improved your patch.
>>
>> * I document the new function. Since collation is a database object, I
>> chose "Database Object Management Functions" section.
>
> OK
>
>> * I've added a check to any-encoding database because I got 'FATAL:
>> collation "C" already exists' on Debian 8, although, I didn't get on
>> CentOS 7. The problem is that Debian has two locales for C (C and
>> C.UTF-8) and CentOS has just one (C).
>
> OK
>
>> * I've added OidIsValid to test the new collation row.
>
> OK
>
>> * I've changed the parameter order. Schema seems more important than
>> if_not_exists. Also, we generally leave those boolean parameters for the
>> end of list. I don't turn if_not_exists optional but IMO it would be a
>> good idea (default = true).
>
> I put them that way because in an SQL command the "IF NOT EXISTS" comes
> before the schema, but I can see how it is weird that way in a function.
>
>> * You removed some #if and #ifdef while moving things around. I put it back.
>> * You didn't pgindent some lines of code but I'm sure you didn't for a
>> small patch footprint.
>
> I had left the #if in initdb, but I think your changes are better.
>
>> I'm attaching the complete and also a patch at the top of your last patch.
>
> Thanks.  If there are no more comments, I will proceed with that.


With this commit, I'm getting 'make check' fail at initdb with the error:

2017-01-18 07:47:50.565 PST [43691] FATAL:  collation "aa_ER@saaho" for encoding "UTF8" already exists
2017-01-18 07:47:50.565 PST [43691] STATEMENT:  SELECT pg_import_system_collations(if_not_exists => false, schema => 'pg_catalog');

My system:

CentOS release 6.8 (Final)
gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC)

./configure > /dev/null # no options

$ locale -a|fgrep aa_ER
aa_ER
aa_ER.utf8
aa_ER.utf8@saaho
aa_ER@saaho

I have no idea what @ means in a locale, I'm just relaying the information.

Cheers,

Jeff

Re: [HACKERS] move collation import to backend

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> With this commit, I'm getting 'make check' fail at initdb with the error:

> 2017-01-18 07:47:50.565 PST [43691] FATAL:  collation "aa_ER@saaho" for
> encoding "UTF8" already exists

Yeah, so are large chunks of the buildfarm.  Having now read the patch,
I see that the problem is that it simply ignored the de-duplication
logic that existed in initdb's implementation.  That was put there
on the basis of bitter experience, as I recall.

The new code seems to think it's sufficient to do an "if not exists"
insertion when generating abbreviated names, but that's wrong, and
even if it avoided outright failures, it would be nondeterministic
(I doubt "locale -a" is guaranteed to emit locale names in any
particular order).

I think this needs to be reverted pending redesign of the de-duplication
coding.
        regards, tom lane
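
One way to make alias generation both collision-free and independent of the order in which "locale -a" lists names is to sort the names first, register every full name, and only then add abbreviated names that are still unused. The standalone C sketch below illustrates that idea under simplifying assumptions: it ignores encodings entirely, all identifiers are hypothetical, and it is not the de-duplication logic from initdb or any eventual fix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define LOCALE_NAME_MAX 64  /* illustrative limit, not a PostgreSQL constant */

typedef struct PlannedCollation
{
    char    collname[LOCALE_NAME_MAX];  /* name the collation would get */
    char    locale[LOCALE_NAME_MAX];    /* OS locale it would point at */
} PlannedCollation;

/* qsort comparator for an array of C strings */
static int
compare_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Copy src to dst without ".encoding" tags; return 1 if anything was dropped. */
static int
strip_encoding_tag(char *dst, const char *src)
{
    int     changed = 0;

    while (*src)
    {
        if (*src == '.')
        {
            src++;
            while (isalnum((unsigned char) *src) || *src == '-')
                src++;
            changed = 1;
        }
        else
            *dst++ = *src++;
    }
    *dst = '\0';
    return changed;
}

/* Return 1 if collname is already in the plan. */
static int
is_planned(const PlannedCollation *plan, int n, const char *collname)
{
    for (int i = 0; i < n; i++)
        if (strcmp(plan[i].collname, collname) == 0)
            return 1;
    return 0;
}

/*
 * Fill "plan" with the collations to create; sorts "locales" in place.
 * Returns the number of planned entries (at most max_plan).
 */
static int
plan_collations(const char **locales, int n_locales,
                PlannedCollation *plan, int max_plan)
{
    int     n = 0;
    char    alias[LOCALE_NAME_MAX];

    assert(locales != NULL && plan != NULL);
    qsort(locales, (size_t) n_locales, sizeof(locales[0]), compare_names);

    /* Pass 1: every locale under its full name, skipping exact repeats. */
    for (int i = 0; i < n_locales && n < max_plan; i++)
    {
        if (strlen(locales[i]) >= LOCALE_NAME_MAX ||
            is_planned(plan, n, locales[i]))
            continue;
        strcpy(plan[n].collname, locales[i]);
        strcpy(plan[n].locale, locales[i]);
        n++;
    }

    /* Pass 2: abbreviated names, only where the short name is still free. */
    for (int i = 0; i < n_locales && n < max_plan; i++)
    {
        if (strlen(locales[i]) >= LOCALE_NAME_MAX)
            continue;
        if (!strip_encoding_tag(alias, locales[i]) || alias[0] == '\0' ||
            is_planned(plan, n, alias))
            continue;
        strcpy(plan[n].collname, alias);
        strcpy(plan[n].locale, locales[i]);
        n++;
    }
    return n;
}
```

Fed the four aa_ER names from the report above in any order, this plans exactly four entries and no alias, because both abbreviated names already exist as full names. A real implementation would also have to key on the encoding, as the error message "for encoding "UTF8"" suggests.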