Thread: collations in shared catalogs?
So while helping someone with an unrelated issue, I did a quick query to look for collation-dependent indexes, and was rather shocked to find that not only are there two such in the system catalogs, both set to "default" collation, but that one of them is in a _shared_ catalog (pg_shseclabel). How did that happen? And how could it possibly work? -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > So while helping someone with an unrelated issue, I did a quick query to > look for collation-dependent indexes, and was rather shocked to find > that not only are there two such in the system catalogs, both set to > "default" collation, but that one of them is in a _shared_ catalog > (pg_shseclabel). > How did that happen? And how could it possibly work? It probably doesn't, and the reason nobody has noticed is that the security label stuff has fewer users than I have fingers (and those people aren't using provider names that would cause anything interesting to happen). The most obvious fix is to change "provider" to a NAME column. What was the other case? We might want to add a regression test to check for collation-dependent system indexes ... regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> So while helping someone with an unrelated issue, I did a quick>> query to look for collation-dependent indexes, and wasrather>> shocked to find that not only are there two such in the system>> catalogs, both set to "default" collation, butthat one of them is>> in a _shared_ catalog (pg_shseclabel). >> How did that happen? And how could it possibly work? Tom> It probably doesn't, and the reason nobody has noticed is that theTom> security label stuff has fewer users than I havefingers (andTom> those people aren't using provider names that would cause anythingTom> interesting to happen). Or possibly not mixing locales between databases. Tom> The most obvious fix is to change "provider" to a NAME column. Tom> What was the other case? We might want to add a regression testTom> to check for collation-dependent system indexes... pg_seclabel (also "provider"). -- Andrew (irc:RhodiumToad)
Tom Lane wrote: > Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > > So while helping someone with an unrelated issue, I did a quick query to > > look for collation-dependent indexes, and was rather shocked to find > > that not only are there two such in the system catalogs, both set to > > "default" collation, but that one of them is in a _shared_ catalog > > (pg_shseclabel). > > > How did that happen? And how could it possibly work? > > It probably doesn't, and the reason nobody has noticed is that the > security label stuff has fewer users than I have fingers (and those > people aren't using provider names that would cause anything interesting > to happen). The BDR code has recently started using security labels as a place to store table-specific data. That widens its use a fair bit ... and most likely, other extensions will also start using them as soon as they realize that it can be used for stuff other than actual security labels. (FWIW we shouldn't have called these "security labels" but just generically "labels" or something like that.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane wrote: >> Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >>> How did that happen? And how could it possibly work? >> It probably doesn't, and the reason nobody has noticed is that the >> security label stuff has fewer users than I have fingers (and those >> people aren't using provider names that would cause anything interesting >> to happen). > The BDR code has recently started using security labels as a place to > store table-specific data. That widens its use a fair bit ... and most > likely, other extensions will also start using them as soon as they > realize that it can be used for stuff other than actual security labels. Yeah? Would they be OK with redefining the provider field as "name", or would the length limit be an issue? regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > Tom Lane wrote: > >> Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > >>> How did that happen? And how could it possibly work? > > >> It probably doesn't, and the reason nobody has noticed is that the > >> security label stuff has fewer users than I have fingers (and those > >> people aren't using provider names that would cause anything interesting > >> to happen). > > > The BDR code has recently started using security labels as a place to > > store table-specific data. That widens its use a fair bit ... and most > > likely, other extensions will also start using them as soon as they > > realize that it can be used for stuff other than actual security labels. > > Yeah? Would they be OK with redefining the provider field as "name", > or would the length limit be an issue? Nah, it's fine. The provider name used there is "bdr". -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2015-02-25 12:08:32 -0500, Tom Lane wrote: > Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > > So while helping someone with an unrelated issue, I did a quick query to > > look for collation-dependent indexes, and was rather shocked to find > > that not only are there two such in the system catalogs, both set to > > "default" collation, but that one of them is in a _shared_ catalog > > (pg_shseclabel). > > > How did that happen? And how could it possibly work? > > It probably doesn't, and the reason nobody has noticed is that the > security label stuff has fewer users than I have fingers (and those > people aren't using provider names that would cause anything interesting > to happen). > > The most obvious fix is to change "provider" to a NAME column. Yea. I'm not sure why that wasn't done initially. I can't really see the length be an issue. How about we add an error check enforcing ascii, that'll work in the back branches? Generally it's not the greatest idea to have non-ascii stuff in shared catalogs... > What was the other case? We might want to add a regression test to > check for collation-dependent system indexes ... +1 Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2015-02-25 12:08:32 -0500, Tom Lane wrote: >> The most obvious fix is to change "provider" to a NAME column. > Yea. I'm not sure why that wasn't done initially. I can't really see the > length be an issue. How about we add an error check enforcing ascii, > that'll work in the back branches? Nope, that won't help much at all. C vs en_US for instance is different sort orders even with all-ASCII data. Basically you're screwed if you've got different collations in different databases and you put anything into pg_shseclabel ... regards, tom lane
On 2015-02-25 15:59:55 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2015-02-25 12:08:32 -0500, Tom Lane wrote: > >> The most obvious fix is to change "provider" to a NAME column. > > > Yea. I'm not sure why that wasn't done initially. I can't really see the > > length be an issue. How about we add an error check enforcing ascii, > > that'll work in the back branches? > > Nope, that won't help much at all. C vs en_US for instance is different > sort orders even with all-ASCII data. Ick, yes. The restriction to a charset that's encodable in all server encodings should be there additionally, but it's not sufficient :( > Basically you're screwed if you've got different collations in different > databases and you put anything into pg_shseclabel ... Hrmpf. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote: > Tom Lane wrote: > > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > > > Tom Lane wrote: > > >> Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > > >>> How did that happen? And how could it possibly work? > > > > >> It probably doesn't, and the reason nobody has noticed is that the > > >> security label stuff has fewer users than I have fingers (and those > > >> people aren't using provider names that would cause anything interesting > > >> to happen). > > > > > The BDR code has recently started using security labels as a place to > > > store table-specific data. That widens its use a fair bit ... and most > > > likely, other extensions will also start using them as soon as they > > > realize that it can be used for stuff other than actual security labels. > > > > Yeah? Would they be OK with redefining the provider field as "name", > > or would the length limit be an issue? > > Nah, it's fine. The provider name used there is "bdr". Agreed, the provider field should be fine as a name field. Not that I expect it to be an issue, but I'd definitely like to keep the label field as text as those can definitely be longer (the very simply example included in the security label docs is over half the length of a name field already..). Now if we increased name to 128 characters... /me runs and hides. Thanks! Stephen
On 2/25/15 5:47 PM, Stephen Frost wrote: > * Alvaro Herrera (alvherre@2ndquadrant.com) wrote: >> Tom Lane wrote: >>> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >>>> Tom Lane wrote: >>>>> Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >>>>>> How did that happen? And how could it possibly work? >>> >>>>> It probably doesn't, and the reason nobody has noticed is that the >>>>> security label stuff has fewer users than I have fingers (and those >>>>> people aren't using provider names that would cause anything interesting >>>>> to happen). >>> >>>> The BDR code has recently started using security labels as a place to >>>> store table-specific data. That widens its use a fair bit ... and most >>>> likely, other extensions will also start using them as soon as they >>>> realize that it can be used for stuff other than actual security labels. >>> >>> Yeah? Would they be OK with redefining the provider field as "name", >>> or would the length limit be an issue? >> >> Nah, it's fine. The provider name used there is "bdr". > > Agreed, the provider field should be fine as a name field. Not that I > expect it to be an issue, but I'd definitely like to keep the label > field as text as those can definitely be longer (the very simply example > included in the security label docs is over half the length of a name > field already..). Now if we increased name to 128 characters... +1 on 128/256 character names. > /me runs and hides. /stands brazenly in the open and volunteers to try it if I don't get clobbered within seconds. -- - David Steele david@pgmasters.net
On Wed, Feb 25, 2015 at 7:54 PM, David Steele <david@pgmasters.net> wrote: > +1 on 128/256 character names. > >> /me runs and hides. > > /stands brazenly in the open and volunteers to try it if I don't get > clobbered within seconds. I think the question is whether making lots of rows in system catalogs better is going to have undesirable effects on (a) the size of our initial on-disk format (i.e. how big an empty database is), (b) the amount of memory consumed by the syscache and relcaches on workloads that touch lots of tables/functions/whatever, or (c) CPU consumption mostly as a result of more cache line accesses for the same operation. If you can prove those effects are minimal, that'd be a good place to start. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Robert, On 3/4/15 10:14 AM, Robert Haas wrote: > On Wed, Feb 25, 2015 at 7:54 PM, David Steele <david@pgmasters.net> wrote: >> +1 on 128/256 character names. >> >>> /me runs and hides. >> >> /stands brazenly in the open and volunteers to try it if I don't get >> clobbered within seconds. > > I think the question is whether making lots of rows in system catalogs > better is going to have undesirable effects on (a) the size of our > initial on-disk format (i.e. how big an empty database is), (b) the > amount of memory consumed by the syscache and relcaches on workloads > that touch lots of tables/functions/whatever, or (c) CPU consumption > mostly as a result of more cache line accesses for the same operation. > If you can prove those effects are minimal, that'd be a good place to > start. Thanks, that's encouraging. I've already compiled with NAMEDATALEN=256 and verified that the only failing tests are the ones making sure that identifier lengths are truncated or fail appropriately when they are > 63. I'm sure lots of people have done that before and gotten the same result. I'm currently investigating the issues that you've identified above since they constitute the real problem with increasing NAMEDATALEN. Once I have some answers I'll send a proposal to hackers. -- - David Steele david@pgmasters.net
On Wed, Feb 25, 2015 at 10:19:45PM +0100, Andres Freund wrote: > On 2015-02-25 15:59:55 -0500, Tom Lane wrote: > > Andres Freund <andres@2ndquadrant.com> writes: > > > On 2015-02-25 12:08:32 -0500, Tom Lane wrote: > > >> The most obvious fix is to change "provider" to a NAME column. > > > > > Yea. I'm not sure why that wasn't done initially. I can't really see the > > > length be an issue. How about we add an error check enforcing ascii, > > > that'll work in the back branches? > > > > Nope, that won't help much at all. C vs en_US for instance is different > > sort orders even with all-ASCII data. > > Ick, yes. The restriction to a charset that's encodable in all server > encodings should be there additionally, but it's not sufficient :( > > > Basically you're screwed if you've got different collations in different > > databases and you put anything into pg_shseclabel ... > > Hrmpf. Where are we on this? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes: >>>> On 2015-02-25 12:08:32 -0500, Tom Lane wrote: >>>>> The most obvious fix is to change "provider" to a NAME column. > Where are we on this? Not done yet, but we should make a point of making that fix before 9.5. Please add it to the open items page for 9.5. I am not sure there's anything useful to be done about this in the back branches. regards, tom lane
On Thu, Apr 30, 2015 at 08:16:09AM -0700, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > >>>> On 2015-02-25 12:08:32 -0500, Tom Lane wrote: > >>>>> The most obvious fix is to change "provider" to a NAME column. > > > Where are we on this? > > Not done yet, but we should make a point of making that fix before 9.5. > Please add it to the open items page for 9.5. > > I am not sure there's anything useful to be done about this in the back > branches. Done. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Andres Freund <andres@2ndquadrant.com> writes: > On 2015-02-25 12:08:32 -0500, Tom Lane wrote: >> Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >>> So while helping someone with an unrelated issue, I did a quick query to >>> look for collation-dependent indexes, and was rather shocked to find >>> that not only are there two such in the system catalogs, both set to >>> "default" collation, but that one of them is in a _shared_ catalog >>> (pg_shseclabel). >>> How did that happen? And how could it possibly work? >> It probably doesn't, and the reason nobody has noticed is that the >> security label stuff has fewer users than I have fingers (and those >> people aren't using provider names that would cause anything interesting >> to happen). >> >> The most obvious fix is to change "provider" to a NAME column. > Yea. I'm not sure why that wasn't done initially. OK, now I'm on the warpath, because I went to fix this and discovered that since that discussion, somebody named Freund committed yet another shared catalog with a collation-dependent index. This time, at least, we can fix it *before* it gets into the wild. Is it okay to change pg_replication_origin.roname to type "name", and if not what do you want to do instead? While I'm looking at it, why in the world have roident and not just a standard system OID column? This catalog seems willfully ignorant of Postgres conventions. regards, tom lane
On 2015-05-18 19:23:59 -0400, Tom Lane wrote: > OK, now I'm on the warpath, because I went to fix this and discovered > that since that discussion, somebody named Freund committed yet another > shared catalog with a collation-dependent index. This time, at least, > we can fix it *before* it gets into the wild. Hrmpf, good point. > Is it okay to change pg_replication_origin.roname to type "name", > and if not what do you want to do instead? It was turned into text after it initially was name, because of length concerns. Hm, just forcing a collation and restricting the input to ascii should work, right? What I'm wondering is how we easily can do the collation forcing part. The best seems to be to force the collation on the column itself. We could add BKI_COLLATION(). Or we could invent a alias 'systext' or something that's intended to be used in catalogs? > While I'm looking at it, why in the world have roident and not just a > standard system OID column? This catalog seems willfully ignorant of > Postgres conventions. There's a comment: * Needs to fit into an uint16, so we don't waste too much space in WAL * records. For this reason we don'tuse a normal Oid column here, since * we need to handle allocation of new values manually. I mean it could use the standard oid, but given it's allocated differently...
Andres Freund <andres@anarazel.de> writes: > On 2015-05-18 19:23:59 -0400, Tom Lane wrote: >> Is it okay to change pg_replication_origin.roname to type "name", >> and if not what do you want to do instead? > It was turned into text after it initially was name, because of length > concerns. > Hm, just forcing a collation and restricting the input to ascii should > work, right? I think that's fragile as can be. Is there a *really really* good argument why these things shouldn't be subject to identifier length restrictions? >> While I'm looking at it, why in the world have roident and not just a >> standard system OID column? This catalog seems willfully ignorant of >> Postgres conventions. > There's a comment: > * Needs to fit into an uint16, so we don't waste too much space in WAL > * records. For this reason we don't use a normal Oid column here, since > * we need to handle allocation of new values manually. If it needs to fit into uint16, why not make it smallint? The declaration seems 100% misleading if it's not an OID. Moreover, the catalog infrastructure is failing to help you make sure the values are unique. regards, tom lane
On 2015-05-18 19:59:29 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2015-05-18 19:23:59 -0400, Tom Lane wrote: > > Hm, just forcing a collation and restricting the input to ascii should > > work, right? > > I think that's fragile as can be. Is there a *really really* good > argument why these things shouldn't be subject to identifier length > restrictions? It's maybe not absolutely strictly necessary. In fact in earlier versions of the patch it was name. But replication solutions like bdr, slony, whatever will have to store a bunch of values identifying a node in there. And that's much easier if you're not constrained by 63 chars. > >> While I'm looking at it, why in the world have roident and not just a > >> standard system OID column? This catalog seems willfully ignorant of > >> Postgres conventions. > > > There's a comment: > > * Needs to fit into an uint16, so we don't waste too much space in WAL > > * records. For this reason we don't use a normal Oid column here, since > > * we need to handle allocation of new values manually. > > If it needs to fit into uint16, why not make it smallint? The declaration > seems 100% misleading if it's not an OID. smallint has a smaller range. I mean, we could use a smallint and just store unsigned values nonetheless. But that'd be somewhat ugly, although not without precedent (pg_class.relpages). There'll only ever be very few rows in pg_replication_origin, so the wideness itself doesn't matter. One reason for leaving it a oid instead of a more fitting type is that it'll make it much smother to increase the limit to full 32bit - there'll be no user level change. > Moreover, the catalog infrastructure is failing to help you make sure > the values are unique. Not sure what you mean? There's both a unique key and locking in place to make sure that's not violated. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2015-05-18 19:59:29 -0400, Tom Lane wrote: >> I think that's fragile as can be. Is there a *really really* good >> argument why these things shouldn't be subject to identifier length >> restrictions? > It's maybe not absolutely strictly necessary. In fact in earlier > versions of the patch it was name. But replication solutions like bdr, > slony, whatever will have to store a bunch of values identifying a node > in there. And that's much easier if you're not constrained by 63 chars. Many people rely on UUIDs being impervious to chance collisions, so it's not clear to me why uniqueness within 63 characters is unachievable. Even more, if you can't do it in 63, what makes you think that 100 is better? Also, is a length limit really more onerous than the ASCII-only restriction you proposed? (As an ASCII-only kind of guy, it wouldn't bother me any; but I suspect much of the world would beg to differ.) >> Moreover, the catalog infrastructure is failing to help you make sure >> the values are unique. > Not sure what you mean? There's both a unique key and locking in place > to make sure that's not violated. If you had both 1 and 1 + 2^20 in there, the existing unique index would not complain, but in practice those are duplicate entries, no? If you make the column smallint such a case would be physically impossible. regards, tom lane
On 2015-05-18 20:19:29 -0400, Tom Lane wrote: > Many people rely on UUIDs being impervious to chance collisions, so > it's not clear to me why uniqueness within 63 characters is unachievable. > Even more, if you can't do it in 63, what makes you think that 100 is > better? Well UUIDs are also hard to manage because they're pretty much bare of any meaning. It's much easier to understand 'slony:{node=nodename,role=master,id=someid}' or similar than 'slony:cc70ac60-fdbd-11e4-b939-0800200c9a66'. > Also, is a length limit really more onerous than the ASCII-only > restriction you proposed? (As an ASCII-only kind of guy, it wouldn't > bother me any; but I suspect much of the world would beg to differ.) I don't think anybody is going to be too concerned about node names or something similar being ascii only. There's already a more restrictive naming policy in place for replication slots... > If you had both 1 and 1 + 2^20 in there, the existing unique index > would not complain, but in practice those are duplicate entries, no? > If you make the column smallint such a case would be physically > impossible. There's an error check in place (ERROR, PROGRAM_LIMIT_EXCEEDED) preventing it when creating a replication origin. Greetings, Andres Freund
On 2015-05-18 19:59:29 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > Hm, just forcing a collation and restricting the input to ascii should > > work, right? > > I think that's fragile as can be. Hm. I think actually just forcing a collation would bring this on-par with name, right? We don't have any guarantees about the contents of e.g. pg_database.datname being meaningful in another database with a different encoding. In fact even the current database may have a name that's in a wrong encoding. I'm right now toying with the idea of defining 'varname' as a text equivalent that always has a C type collation, and no length limitation. That'd generally imo be a good thing to have. A bunch of places really don't need the fixed width type and using a variable length type will save space. It'll also be a miniscule start twoards allowing longer identifiers... Greetings, Andres Freund
* Andres Freund (andres@anarazel.de) wrote: > On 2015-05-18 19:59:29 -0400, Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > > > Hm, just forcing a collation and restricting the input to ascii should > > > work, right? > > > > I think that's fragile as can be. > > Hm. I think actually just forcing a collation would bring this on-par > with name, right? We don't have any guarantees about the contents of > e.g. pg_database.datname being meaningful in another database with a > different encoding. In fact even the current database may have a name > that's in a wrong encoding. > > I'm right now toying with the idea of defining 'varname' as a text > equivalent that always has a C type collation, and no length > limitation. That'd generally imo be a good thing to have. A bunch of > places really don't need the fixed width type and using a variable > length type will save space. It'll also be a miniscule start twoards > allowing longer identifiers... Yes, please? Would be great to finally make that happen... Thanks! Stephen
On 2015-05-18 21:20:29 -0400, Stephen Frost wrote: > * Andres Freund (andres@anarazel.de) wrote: > > I'm right now toying with the idea of defining 'varname' as a text > > equivalent that always has a C type collation, and no length > > limitation. That'd generally imo be a good thing to have. A bunch of > > places really don't need the fixed width type and using a variable > > length type will save space. It'll also be a miniscule start twoards > > allowing longer identifiers... > > Yes, please? > > Would be great to finally make that happen... Uh, this would bring us there maybe 15% of the way?
* Andres Freund (andres@anarazel.de) wrote: > On 2015-05-18 21:20:29 -0400, Stephen Frost wrote: > > * Andres Freund (andres@anarazel.de) wrote: > > > I'm right now toying with the idea of defining 'varname' as a text > > > equivalent that always has a C type collation, and no length > > > limitation. That'd generally imo be a good thing to have. A bunch of > > > places really don't need the fixed width type and using a variable > > > length type will save space. It'll also be a miniscule start twoards > > > allowing longer identifiers... > > > > Yes, please? > > > > Would be great to finally make that happen... > > Uh, this would bring us there maybe 15% of the way? I can hope? Thanks! Stephen
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > It's maybe not absolutely strictly necessary. In fact in earlier > versions of the patch it was name. But replication solutions like bdr, > slony, whatever will have to store a bunch of values identifying a node > in there. And that's much easier if you're not constrained by 63 chars. That's silly. We (third-party tools) already have to work around lots of things constrained by namedatalen. - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201505182138 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAlValBYACgkQvJuQZxSWSsiODwCfRDTNsEHKsp7rbK24lT4lApwa X1sAn0QL33wJyn/AWT2aLL9u+Ybt+aNb =VjvO -----END PGP SIGNATURE-----
Andres Freund <andres@anarazel.de> writes: > On 2015-05-18 19:59:29 -0400, Tom Lane wrote: >> I think that's fragile as can be. > Hm. I think actually just forcing a collation would bring this on-par > with name, right? We don't have any guarantees about the contents of > e.g. pg_database.datname being meaningful in another database with a > different encoding. In fact even the current database may have a name > that's in a wrong encoding. Oh, wait a minute. I just noticed that you have pg_replication_origin_roname_index defined to use varchar_pattern_ops. Now, this is mildly broken: it should be text_pattern_ops. But as far as I can see offhand, that eliminates the collation dependency for the index. The comparison rule is memcmp() which is not collation sensitive. I'm inclined to think I should revert b82a7be603f1811a and instead make the seclabel provider columns use text_pattern_ops. That would fix their collation problem with less of a backwards compatibility hazard. > I'm right now toying with the idea of defining 'varname' as a text > equivalent that always has a C type collation, and no length > limitation. That doesn't really address the encoding problem, so I'm not sure it advances the state of the art particularly. regards, tom lane
On 2015-05-18 23:22:33 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2015-05-18 19:59:29 -0400, Tom Lane wrote: > >> I think that's fragile as can be. > > > Hm. I think actually just forcing a collation would bring this on-par > > with name, right? We don't have any guarantees about the contents of > > e.g. pg_database.datname being meaningful in another database with a > > different encoding. In fact even the current database may have a name > > that's in a wrong encoding. > > Oh, wait a minute. I just noticed that you have > pg_replication_origin_roname_index defined to use varchar_pattern_ops. > Now, this is mildly broken: it should be text_pattern_ops. But as far as > I can see offhand, that eliminates the collation dependency for the index. > The comparison rule is memcmp() which is not collation sensitive. Hah. Right. Forgot about that. Oh Brain, where art thou. > I'm inclined to think I should revert b82a7be603f1811a and instead make > the seclabel provider columns use text_pattern_ops. That would fix > their collation problem with less of a backwards compatibility hazard. Sounds good to me. Are you doing that or should I?
Andres Freund <andres@anarazel.de> writes: > On 2015-05-18 23:22:33 -0400, Tom Lane wrote: >> I'm inclined to think I should revert b82a7be603f1811a and instead make >> the seclabel provider columns use text_pattern_ops. That would fix >> their collation problem with less of a backwards compatibility hazard. > Sounds good to me. Are you doing that or should I? I can deal with it, but not till mañana. regards, tom lane