Thread: collations in shared catalogs?

collations in shared catalogs?

From

Andrew Gierth

Date:

25 February 2015, 16:41:21

So while helping someone with an unrelated issue, I did a quick query to
look for collation-dependent indexes, and was rather shocked to find
that not only are there two such in the system catalogs, both set to
"default" collation, but that one of them is in a _shared_ catalog
(pg_shseclabel).

How did that happen? And how could it possibly work?

-- 
Andrew (irc:RhodiumToad)

Re: collations in shared catalogs?

From

Tom Lane

Date:

25 February 2015, 17:08:38

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> So while helping someone with an unrelated issue, I did a quick query to
> look for collation-dependent indexes, and was rather shocked to find
> that not only are there two such in the system catalogs, both set to
> "default" collation, but that one of them is in a _shared_ catalog
> (pg_shseclabel).

> How did that happen? And how could it possibly work?

It probably doesn't, and the reason nobody has noticed is that the
security label stuff has fewer users than I have fingers (and those
people aren't using provider names that would cause anything interesting
to happen).

The most obvious fix is to change "provider" to a NAME column.

What was the other case?  We might want to add a regression test to
check for collation-dependent system indexes ...
        regards, tom lane

Re: collations in shared catalogs?

From

Andrew Gierth

Date:

25 February 2015, 17:59:54

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> So while helping someone with an unrelated issue, I did a quick>> query to look for collation-dependent indexes, and
wasrather>> shocked to find that not only are there two such in the system>> catalogs, both set to "default" collation,
butthat one of them is>> in a _shared_ catalog (pg_shseclabel).
 
>> How did that happen? And how could it possibly work?
Tom> It probably doesn't, and the reason nobody has noticed is that theTom> security label stuff has fewer users than I
havefingers (andTom> those people aren't using provider names that would cause anythingTom> interesting to happen).
 

Or possibly not mixing locales between databases.
Tom> The most obvious fix is to change "provider" to a NAME column.
Tom> What was the other case?  We might want to add a regression testTom> to check for collation-dependent system
indexes...
 

pg_seclabel (also "provider").

-- 
Andrew (irc:RhodiumToad)

Re: collations in shared catalogs?

From

Alvaro Herrera

Date:

25 February 2015, 18:14:30

Tom Lane wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> > So while helping someone with an unrelated issue, I did a quick query to
> > look for collation-dependent indexes, and was rather shocked to find
> > that not only are there two such in the system catalogs, both set to
> > "default" collation, but that one of them is in a _shared_ catalog
> > (pg_shseclabel).
> 
> > How did that happen? And how could it possibly work?
> 
> It probably doesn't, and the reason nobody has noticed is that the
> security label stuff has fewer users than I have fingers (and those
> people aren't using provider names that would cause anything interesting
> to happen).

The BDR code has recently started using security labels as a place to
store table-specific data.  That widens its use a fair bit ... and most
likely, other extensions will also start using them as soon as they
realize that it can be used for stuff other than actual security labels.

(FWIW we shouldn't have called these "security labels" but just
generically "labels" or something like that.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: collations in shared catalogs?

From

Tom Lane

Date:

25 February 2015, 18:21:26

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>> How did that happen? And how could it possibly work?

>> It probably doesn't, and the reason nobody has noticed is that the
>> security label stuff has fewer users than I have fingers (and those
>> people aren't using provider names that would cause anything interesting
>> to happen).

> The BDR code has recently started using security labels as a place to
> store table-specific data.  That widens its use a fair bit ... and most
> likely, other extensions will also start using them as soon as they
> realize that it can be used for stuff other than actual security labels.

Yeah?  Would they be OK with redefining the provider field as "name",
or would the length limit be an issue?
        regards, tom lane

Re: collations in shared catalogs?

From

Alvaro Herrera

Date:

25 February 2015, 18:40:05

Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> >>> How did that happen? And how could it possibly work?
> 
> >> It probably doesn't, and the reason nobody has noticed is that the
> >> security label stuff has fewer users than I have fingers (and those
> >> people aren't using provider names that would cause anything interesting
> >> to happen).
> 
> > The BDR code has recently started using security labels as a place to
> > store table-specific data.  That widens its use a fair bit ... and most
> > likely, other extensions will also start using them as soon as they
> > realize that it can be used for stuff other than actual security labels.
> 
> Yeah?  Would they be OK with redefining the provider field as "name",
> or would the length limit be an issue?

Nah, it's fine.  The provider name used there is "bdr".

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: collations in shared catalogs?

From

Andres Freund

Date:

25 February 2015, 20:54:21

On 2015-02-25 12:08:32 -0500, Tom Lane wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> > So while helping someone with an unrelated issue, I did a quick query to
> > look for collation-dependent indexes, and was rather shocked to find
> > that not only are there two such in the system catalogs, both set to
> > "default" collation, but that one of them is in a _shared_ catalog
> > (pg_shseclabel).
> 
> > How did that happen? And how could it possibly work?
> 
> It probably doesn't, and the reason nobody has noticed is that the
> security label stuff has fewer users than I have fingers (and those
> people aren't using provider names that would cause anything interesting
> to happen).
> 
> The most obvious fix is to change "provider" to a NAME column.

Yea. I'm not sure why that wasn't done initially. I can't really see the
length be an issue. How about we add an error check enforcing ascii,
that'll work in the back branches?

Generally it's not the greatest idea to have non-ascii stuff in shared
catalogs...

> What was the other case?  We might want to add a regression test to
> check for collation-dependent system indexes ...

+1

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: collations in shared catalogs?

From

Tom Lane

Date:

25 February 2015, 21:00:04

Andres Freund <andres@2ndquadrant.com> writes:
> On 2015-02-25 12:08:32 -0500, Tom Lane wrote:
>> The most obvious fix is to change "provider" to a NAME column.

> Yea. I'm not sure why that wasn't done initially. I can't really see the
> length be an issue. How about we add an error check enforcing ascii,
> that'll work in the back branches?

Nope, that won't help much at all.  C vs en_US for instance is different
sort orders even with all-ASCII data.

Basically you're screwed if you've got different collations in different
databases and you put anything into pg_shseclabel ...
        regards, tom lane

Re: collations in shared catalogs?

From

Andres Freund

Date:

25 February 2015, 21:20:20

On 2015-02-25 15:59:55 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2015-02-25 12:08:32 -0500, Tom Lane wrote:
> >> The most obvious fix is to change "provider" to a NAME column.
> 
> > Yea. I'm not sure why that wasn't done initially. I can't really see the
> > length be an issue. How about we add an error check enforcing ascii,
> > that'll work in the back branches?
> 
> Nope, that won't help much at all.  C vs en_US for instance is different
> sort orders even with all-ASCII data.

Ick, yes. The restriction to a charset that's encodable in all server
encodings should be there additionally, but it's not sufficient :(

> Basically you're screwed if you've got different collations in different
> databases and you put anything into pg_shseclabel ...

Hrmpf.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: collations in shared catalogs?

From

Stephen Frost

Date:

25 February 2015, 22:47:11

* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > > Tom Lane wrote:
> > >> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> > >>> How did that happen? And how could it possibly work?
> >
> > >> It probably doesn't, and the reason nobody has noticed is that the
> > >> security label stuff has fewer users than I have fingers (and those
> > >> people aren't using provider names that would cause anything interesting
> > >> to happen).
> >
> > > The BDR code has recently started using security labels as a place to
> > > store table-specific data.  That widens its use a fair bit ... and most
> > > likely, other extensions will also start using them as soon as they
> > > realize that it can be used for stuff other than actual security labels.
> >
> > Yeah?  Would they be OK with redefining the provider field as "name",
> > or would the length limit be an issue?
>
> Nah, it's fine.  The provider name used there is "bdr".

Agreed, the provider field should be fine as a name field.  Not that I
expect it to be an issue, but I'd definitely like to keep the label
field as text as those can definitely be longer (the very simply example
included in the security label docs is over half the length of a name
field already..).  Now if we increased name to 128 characters...

/me runs and hides.
Thanks!
    Stephen

Re: collations in shared catalogs?

From

David Steele

Date:

26 February 2015, 00:54:36

On 2/25/15 5:47 PM, Stephen Frost wrote:
> * Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
>> Tom Lane wrote:
>>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>>> Tom Lane wrote:
>>>>> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>>>>> How did that happen? And how could it possibly work?
>>>
>>>>> It probably doesn't, and the reason nobody has noticed is that the
>>>>> security label stuff has fewer users than I have fingers (and those
>>>>> people aren't using provider names that would cause anything interesting
>>>>> to happen).
>>>
>>>> The BDR code has recently started using security labels as a place to
>>>> store table-specific data.  That widens its use a fair bit ... and most
>>>> likely, other extensions will also start using them as soon as they
>>>> realize that it can be used for stuff other than actual security labels.
>>>
>>> Yeah?  Would they be OK with redefining the provider field as "name",
>>> or would the length limit be an issue?
>>
>> Nah, it's fine.  The provider name used there is "bdr".
>
> Agreed, the provider field should be fine as a name field.  Not that I
> expect it to be an issue, but I'd definitely like to keep the label
> field as text as those can definitely be longer (the very simply example
> included in the security label docs is over half the length of a name
> field already..).  Now if we increased name to 128 characters...

+1 on 128/256 character names.

> /me runs and hides.

/stands brazenly in the open and volunteers to try it if I don't get
clobbered within seconds.

--
- David Steele
david@pgmasters.net

Re: collations in shared catalogs?

From

Robert Haas

Date:

04 March 2015, 15:14:49

On Wed, Feb 25, 2015 at 7:54 PM, David Steele <david@pgmasters.net> wrote:
> +1 on 128/256 character names.
>
>> /me runs and hides.
>
> /stands brazenly in the open and volunteers to try it if I don't get
> clobbered within seconds.

I think the question is whether making lots of rows in system catalogs
better is going to have undesirable effects on (a) the size of our
initial on-disk format (i.e. how big an empty database is), (b) the
amount of memory consumed by the syscache and relcaches on workloads
that touch lots of tables/functions/whatever, or (c) CPU consumption
mostly as a result of more cache line accesses for the same operation.
If you can prove those effects are minimal, that'd be a good place to
start.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: collations in shared catalogs?

From

David Steele

Date:

04 March 2015, 15:44:49

Hi Robert,

On 3/4/15 10:14 AM, Robert Haas wrote:
> On Wed, Feb 25, 2015 at 7:54 PM, David Steele <david@pgmasters.net> wrote:
>> +1 on 128/256 character names.
>>
>>> /me runs and hides.
>>
>> /stands brazenly in the open and volunteers to try it if I don't get
>> clobbered within seconds.
>
> I think the question is whether making lots of rows in system catalogs
> better is going to have undesirable effects on (a) the size of our
> initial on-disk format (i.e. how big an empty database is), (b) the
> amount of memory consumed by the syscache and relcaches on workloads
> that touch lots of tables/functions/whatever, or (c) CPU consumption
> mostly as a result of more cache line accesses for the same operation.
> If you can prove those effects are minimal, that'd be a good place to
> start.

Thanks, that's encouraging.  I've already compiled with NAMEDATALEN=256
and verified that the only failing tests are the ones making sure that
identifier lengths are truncated or fail appropriately when they are >
63.  I'm sure lots of people have done that before and gotten the same
result.

I'm currently investigating the issues that you've identified above
since they constitute the real problem with increasing NAMEDATALEN.
Once I have some answers I'll send a proposal to hackers.

--
- David Steele
david@pgmasters.net

Re: collations in shared catalogs?

From

Bruce Momjian

Date:

30 April 2015, 15:01:12

On Wed, Feb 25, 2015 at 10:19:45PM +0100, Andres Freund wrote:
> On 2015-02-25 15:59:55 -0500, Tom Lane wrote:
> > Andres Freund <andres@2ndquadrant.com> writes:
> > > On 2015-02-25 12:08:32 -0500, Tom Lane wrote:
> > >> The most obvious fix is to change "provider" to a NAME column.
> > 
> > > Yea. I'm not sure why that wasn't done initially. I can't really see the
> > > length be an issue. How about we add an error check enforcing ascii,
> > > that'll work in the back branches?
> > 
> > Nope, that won't help much at all.  C vs en_US for instance is different
> > sort orders even with all-ASCII data.
> 
> Ick, yes. The restriction to a charset that's encodable in all server
> encodings should be there additionally, but it's not sufficient :(
> 
> > Basically you're screwed if you've got different collations in different
> > databases and you put anything into pg_shseclabel ...
> 
> Hrmpf.

Where are we on this?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: collations in shared catalogs?

From

Tom Lane

Date:

30 April 2015, 15:16:27

Bruce Momjian <bruce@momjian.us> writes:
>>>> On 2015-02-25 12:08:32 -0500, Tom Lane wrote:
>>>>> The most obvious fix is to change "provider" to a NAME column.

> Where are we on this?

Not done yet, but we should make a point of making that fix before 9.5.
Please add it to the open items page for 9.5.

I am not sure there's anything useful to be done about this in the back
branches.
        regards, tom lane

Re: collations in shared catalogs?

From

Bruce Momjian

Date:

30 April 2015, 23:16:21

On Thu, Apr 30, 2015 at 08:16:09AM -0700, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> >>>> On 2015-02-25 12:08:32 -0500, Tom Lane wrote:
> >>>>> The most obvious fix is to change "provider" to a NAME column.
> 
> > Where are we on this?
> 
> Not done yet, but we should make a point of making that fix before 9.5.
> Please add it to the open items page for 9.5.
> 
> I am not sure there's anything useful to be done about this in the back
> branches.

Done.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: collations in shared catalogs?

From

Tom Lane

Date:

18 May 2015, 23:24:08

Andres Freund <andres@2ndquadrant.com> writes:
> On 2015-02-25 12:08:32 -0500, Tom Lane wrote:
>> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>> So while helping someone with an unrelated issue, I did a quick query to
>>> look for collation-dependent indexes, and was rather shocked to find
>>> that not only are there two such in the system catalogs, both set to
>>> "default" collation, but that one of them is in a _shared_ catalog
>>> (pg_shseclabel).
>>> How did that happen? And how could it possibly work?

>> It probably doesn't, and the reason nobody has noticed is that the
>> security label stuff has fewer users than I have fingers (and those
>> people aren't using provider names that would cause anything interesting
>> to happen).
>> 
>> The most obvious fix is to change "provider" to a NAME column.

> Yea. I'm not sure why that wasn't done initially.

OK, now I'm on the warpath, because I went to fix this and discovered
that since that discussion, somebody named Freund committed yet another
shared catalog with a collation-dependent index.  This time, at least,
we can fix it *before* it gets into the wild.

Is it okay to change pg_replication_origin.roname to type "name",
and if not what do you want to do instead?

While I'm looking at it, why in the world have roident and not just a
standard system OID column?  This catalog seems willfully ignorant of
Postgres conventions.
        regards, tom lane

Re: collations in shared catalogs?

From

Andres Freund

Date:

18 May 2015, 23:37:08

On 2015-05-18 19:23:59 -0400, Tom Lane wrote:
> OK, now I'm on the warpath, because I went to fix this and discovered
> that since that discussion, somebody named Freund committed yet another
> shared catalog with a collation-dependent index.  This time, at least,
> we can fix it *before* it gets into the wild.

Hrmpf, good point.

> Is it okay to change pg_replication_origin.roname to type "name",
> and if not what do you want to do instead?

It was turned into text after it initially was name, because of length
concerns.

Hm, just forcing a collation and restricting the input to ascii should
work, right? What I'm wondering is how we easily can do the collation
forcing part. The best seems to be to force the collation on the column
itself. We could add BKI_COLLATION(). Or we could invent a alias
'systext' or something that's intended to be used in catalogs?

> While I'm looking at it, why in the world have roident and not just a
> standard system OID column?  This catalog seems willfully ignorant of
> Postgres conventions.

There's a comment: * Needs to fit into an uint16, so we don't waste too much space in WAL * records. For this reason we
don'tuse a normal Oid column here, since * we need to handle allocation of new values manually.

I mean it could use the standard oid, but given it's allocated
differently...

Re: collations in shared catalogs?

From

Tom Lane

Date:

18 May 2015, 23:59:39

Andres Freund <andres@anarazel.de> writes:
> On 2015-05-18 19:23:59 -0400, Tom Lane wrote:
>> Is it okay to change pg_replication_origin.roname to type "name",
>> and if not what do you want to do instead?

> It was turned into text after it initially was name, because of length
> concerns.

> Hm, just forcing a collation and restricting the input to ascii should
> work, right?

I think that's fragile as can be.  Is there a *really really* good
argument why these things shouldn't be subject to identifier length
restrictions?

>> While I'm looking at it, why in the world have roident and not just a
>> standard system OID column?  This catalog seems willfully ignorant of
>> Postgres conventions.

> There's a comment:
>      * Needs to fit into an uint16, so we don't waste too much space in WAL
>      * records. For this reason we don't use a normal Oid column here, since
>      * we need to handle allocation of new values manually.

If it needs to fit into uint16, why not make it smallint?  The declaration
seems 100% misleading if it's not an OID.  Moreover, the catalog
infrastructure is failing to help you make sure the values are unique.
        regards, tom lane

Re: collations in shared catalogs?

From

Andres Freund

Date:

19 May 2015, 00:10:04

On 2015-05-18 19:59:29 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2015-05-18 19:23:59 -0400, Tom Lane wrote:
> > Hm, just forcing a collation and restricting the input to ascii should
> > work, right?
> 
> I think that's fragile as can be.  Is there a *really really* good
> argument why these things shouldn't be subject to identifier length
> restrictions?

It's maybe not absolutely strictly necessary. In fact in earlier
versions of the patch it was name. But replication solutions like bdr,
slony, whatever will have to store a bunch of values identifying a node
in there. And that's much easier if you're not constrained by 63 chars.

> >> While I'm looking at it, why in the world have roident and not just a
> >> standard system OID column?  This catalog seems willfully ignorant of
> >> Postgres conventions.
> 
> > There's a comment:
> >      * Needs to fit into an uint16, so we don't waste too much space in WAL
> >      * records. For this reason we don't use a normal Oid column here, since
> >      * we need to handle allocation of new values manually.
> 
> If it needs to fit into uint16, why not make it smallint?  The declaration
> seems 100% misleading if it's not an OID.

smallint has a smaller range. I mean, we could use a smallint and just
store unsigned values nonetheless. But that'd be somewhat ugly, although
not without precedent (pg_class.relpages).  There'll only ever be very
few rows in pg_replication_origin, so the wideness itself doesn't
matter.

One reason for leaving it a oid instead of a more fitting type is that
it'll make it much smother to increase the limit to full 32bit -
there'll be no user level change.

> Moreover, the catalog infrastructure is failing to help you make sure
> the values are unique.

Not sure what you mean? There's both a unique key and locking in place
to make sure that's not violated.

Greetings,

Andres Freund

Re: collations in shared catalogs?

From

Tom Lane

Date:

19 May 2015, 00:19:38

Andres Freund <andres@anarazel.de> writes:
> On 2015-05-18 19:59:29 -0400, Tom Lane wrote:
>> I think that's fragile as can be.  Is there a *really really* good
>> argument why these things shouldn't be subject to identifier length
>> restrictions?

> It's maybe not absolutely strictly necessary. In fact in earlier
> versions of the patch it was name. But replication solutions like bdr,
> slony, whatever will have to store a bunch of values identifying a node
> in there. And that's much easier if you're not constrained by 63 chars.

Many people rely on UUIDs being impervious to chance collisions, so
it's not clear to me why uniqueness within 63 characters is unachievable.
Even more, if you can't do it in 63, what makes you think that 100 is
better?

Also, is a length limit really more onerous than the ASCII-only
restriction you proposed?  (As an ASCII-only kind of guy, it wouldn't
bother me any; but I suspect much of the world would beg to differ.)

>> Moreover, the catalog infrastructure is failing to help you make sure
>> the values are unique.

> Not sure what you mean? There's both a unique key and locking in place
> to make sure that's not violated.

If you had both 1 and 1 + 2^20 in there, the existing unique index would
not complain, but in practice those are duplicate entries, no?  If you
make the column smallint such a case would be physically impossible.
        regards, tom lane

Re: collations in shared catalogs?

From

Andres Freund

Date:

19 May 2015, 00:30:54

On 2015-05-18 20:19:29 -0400, Tom Lane wrote:
> Many people rely on UUIDs being impervious to chance collisions, so
> it's not clear to me why uniqueness within 63 characters is unachievable.
> Even more, if you can't do it in 63, what makes you think that 100 is
> better?

Well UUIDs are also hard to manage because they're pretty much bare of
any meaning. It's much easier to understand
'slony:{node=nodename,role=master,id=someid}' or similar than
'slony:cc70ac60-fdbd-11e4-b939-0800200c9a66'.

> Also, is a length limit really more onerous than the ASCII-only
> restriction you proposed?  (As an ASCII-only kind of guy, it wouldn't
> bother me any; but I suspect much of the world would beg to differ.)

I don't think anybody is going to be too concerned about node names or
something similar being ascii only. There's already a more restrictive
naming policy in place for replication slots...

> If you had both 1 and 1 + 2^20 in there, the existing unique index
> would not complain, but in practice those are duplicate entries, no?
> If you make the column smallint such a case would be physically
> impossible.

There's an error check in place (ERROR, PROGRAM_LIMIT_EXCEEDED)
preventing it when creating a replication origin.

Greetings,

Andres Freund

Re: collations in shared catalogs?

From

Andres Freund

Date:

19 May 2015, 01:14:23

On 2015-05-18 19:59:29 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Hm, just forcing a collation and restricting the input to ascii should
> > work, right?
> 
> I think that's fragile as can be.

Hm. I think actually just forcing a collation would bring this on-par
with name, right? We don't have any guarantees about the contents of
e.g. pg_database.datname being meaningful in another database with a
different encoding. In fact even the current database may have a name
that's in a wrong encoding.

I'm right now toying with the idea of defining 'varname' as a text
equivalent that always has a C type collation, and no length
limitation. That'd generally imo be a good thing to have. A bunch of
places really don't need the fixed width type and using a variable
length type will save space. It'll also be a miniscule start twoards
allowing longer identifiers...

Greetings,

Andres Freund

Re: collations in shared catalogs?

From

Stephen Frost

Date:

19 May 2015, 01:20:33

* Andres Freund (andres@anarazel.de) wrote:
> On 2015-05-18 19:59:29 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > Hm, just forcing a collation and restricting the input to ascii should
> > > work, right?
> >
> > I think that's fragile as can be.
>
> Hm. I think actually just forcing a collation would bring this on-par
> with name, right? We don't have any guarantees about the contents of
> e.g. pg_database.datname being meaningful in another database with a
> different encoding. In fact even the current database may have a name
> that's in a wrong encoding.
>
> I'm right now toying with the idea of defining 'varname' as a text
> equivalent that always has a C type collation, and no length
> limitation. That'd generally imo be a good thing to have. A bunch of
> places really don't need the fixed width type and using a variable
> length type will save space. It'll also be a miniscule start twoards
> allowing longer identifiers...

Yes, please?

Would be great to finally make that happen...
Thanks!
    Stephen

Re: collations in shared catalogs?

From

andres@anarazel.de (Andres Freund)

Date:

19 May 2015, 01:22:41

On 2015-05-18 21:20:29 -0400, Stephen Frost wrote:
> * Andres Freund (andres@anarazel.de) wrote:
> > I'm right now toying with the idea of defining 'varname' as a text
> > equivalent that always has a C type collation, and no length
> > limitation. That'd generally imo be a good thing to have. A bunch of
> > places really don't need the fixed width type and using a variable
> > length type will save space. It'll also be a miniscule start twoards
> > allowing longer identifiers...
> 
> Yes, please?
> 
> Would be great to finally make that happen...

Uh, this would bring us there maybe 15% of the way?

Re: collations in shared catalogs?

From

Stephen Frost

Date:

19 May 2015, 01:23:11

* Andres Freund (andres@anarazel.de) wrote:
> On 2015-05-18 21:20:29 -0400, Stephen Frost wrote:
> > * Andres Freund (andres@anarazel.de) wrote:
> > > I'm right now toying with the idea of defining 'varname' as a text
> > > equivalent that always has a C type collation, and no length
> > > limitation. That'd generally imo be a good thing to have. A bunch of
> > > places really don't need the fixed width type and using a variable
> > > length type will save space. It'll also be a miniscule start twoards
> > > allowing longer identifiers...
> >
> > Yes, please?
> >
> > Would be great to finally make that happen...
>
> Uh, this would bring us there maybe 15% of the way?

I can hope?
Thanks!
    Stephen

Re: collations in shared catalogs?

From

"Greg Sabino Mullane"

Date:

19 May 2015, 01:39:38

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> It's maybe not absolutely strictly necessary. In fact in earlier
> versions of the patch it was name. But replication solutions like bdr,
> slony, whatever will have to store a bunch of values identifying a node
> in there. And that's much easier if you're not constrained by 63 chars.

That's silly. We (third-party tools) already have to work around lots 
of things constrained by namedatalen.

- -- 
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201505182138
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlValBYACgkQvJuQZxSWSsiODwCfRDTNsEHKsp7rbK24lT4lApwa
X1sAn0QL33wJyn/AWT2aLL9u+Ybt+aNb
=VjvO
-----END PGP SIGNATURE-----

Re: collations in shared catalogs?

From

Tom Lane

Date:

19 May 2015, 03:22:44

Andres Freund <andres@anarazel.de> writes:
> On 2015-05-18 19:59:29 -0400, Tom Lane wrote:
>> I think that's fragile as can be.

> Hm. I think actually just forcing a collation would bring this on-par
> with name, right? We don't have any guarantees about the contents of
> e.g. pg_database.datname being meaningful in another database with a
> different encoding. In fact even the current database may have a name
> that's in a wrong encoding.

Oh, wait a minute.  I just noticed that you have
pg_replication_origin_roname_index defined to use varchar_pattern_ops.
Now, this is mildly broken: it should be text_pattern_ops.  But as far as
I can see offhand, that eliminates the collation dependency for the index.
The comparison rule is memcmp() which is not collation sensitive.

I'm inclined to think I should revert b82a7be603f1811a and instead make
the seclabel provider columns use text_pattern_ops.  That would fix
their collation problem with less of a backwards compatibility hazard.

> I'm right now toying with the idea of defining 'varname' as a text
> equivalent that always has a C type collation, and no length
> limitation.

That doesn't really address the encoding problem, so I'm not sure it
advances the state of the art particularly.
        regards, tom lane

Re: collations in shared catalogs?

From

Andres Freund

Date:

19 May 2015, 03:32:52

On 2015-05-18 23:22:33 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2015-05-18 19:59:29 -0400, Tom Lane wrote:
> >> I think that's fragile as can be.
>
> > Hm. I think actually just forcing a collation would bring this on-par
> > with name, right? We don't have any guarantees about the contents of
> > e.g. pg_database.datname being meaningful in another database with a
> > different encoding. In fact even the current database may have a name
> > that's in a wrong encoding.
>
> Oh, wait a minute.  I just noticed that you have
> pg_replication_origin_roname_index defined to use varchar_pattern_ops.
> Now, this is mildly broken: it should be text_pattern_ops.  But as far as
> I can see offhand, that eliminates the collation dependency for the index.
> The comparison rule is memcmp() which is not collation sensitive.

Hah. Right. Forgot about that. Oh Brain, where art thou.

> I'm inclined to think I should revert b82a7be603f1811a and instead make
> the seclabel provider columns use text_pattern_ops.  That would fix
> their collation problem with less of a backwards compatibility hazard.

Sounds good to me. Are you doing that or should I?

Re: collations in shared catalogs?

From

Tom Lane

Date:

19 May 2015, 03:36:23

Andres Freund <andres@anarazel.de> writes:
> On 2015-05-18 23:22:33 -0400, Tom Lane wrote:
>> I'm inclined to think I should revert b82a7be603f1811a and instead make
>> the seclabel provider columns use text_pattern_ops.  That would fix
>> their collation problem with less of a backwards compatibility hazard.

> Sounds good to me. Are you doing that or should I?

I can deal with it, but not till mañana.
        regards, tom lane