Thread: Re: [SQL] Column name's length
Zalman Stern <zalman@netcom.com> writes: > Here are the two diffs that up the "name size" from 32 characters to 256 > characters. (Once I get bit, I try to fix things real good so I don't get > bit again :-)) > ----- > diff postgresql-6.4.2/src/include/postgres_ext.h postgres-build/src/include/postgres_ext.h > 34c34 > < #define NAMEDATALEN 32 > --- >> #define NAMEDATALEN 256 > 37c37 > < #define OIDNAMELEN 36 > --- >> #define OIDNAMELEN 260 > ----- > diff postgresql-6.4.2/src/include/storage/buf_internals.h postgres-build/src/include/storage/buf_internals.h > 87c87 > < #define PADDED_SBUFDESC_SIZE 128 > --- >> #define PADDED_SBUFDESC_SIZE 1024 > ----- It'd probably be worthwhile to move NAMEDATALEN to config.h and make the other two symbols be computed off NAMEDATALEN. Any objections if I sneak that change into 6.5, or is it too close to being a "new feature"? regards, tom lane
> Zalman Stern <zalman@netcom.com> writes: > > Here are the two diffs that up the "name size" from 32 characters to 256 > > characters. (Once I get bit, I try to fix things real good so I don't get > > bit again :-)) > > ----- > > diff postgresql-6.4.2/src/include/postgres_ext.h postgres-build/src/include/postgres_ext.h > > 34c34 > > < #define NAMEDATALEN 32 > > --- > >> #define NAMEDATALEN 256 > > 37c37 > > < #define OIDNAMELEN 36 > > --- > >> #define OIDNAMELEN 260 > > ----- > > diff postgresql-6.4.2/src/include/storage/buf_internals.h postgres-build/src/include/storage/buf_internals.h > > 87c87 > > < #define PADDED_SBUFDESC_SIZE 128 > > --- > >> #define PADDED_SBUFDESC_SIZE 1024 > > ----- > > It'd probably be worthwhile to move NAMEDATALEN to config.h and make the > other two symbols be computed off NAMEDATALEN. Any objections if I > sneak that change into 6.5, or is it too close to being a "new feature"? Don't they have to be visible to outside apps, so it is in postgres_ext.h? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: >> It'd probably be worthwhile to move NAMEDATALEN to config.h and make the >> other two symbols be computed off NAMEDATALEN. Any objections if I >> sneak that change into 6.5, or is it too close to being a "new feature"? > Don't they have to be visible to outside apps, so it is in postgres_ext.h? Good point --- I was thinking that postgres_ext.h includes config.h, but I see it ain't so. You're right, those definitions must stay where they are. Still, I wonder why OIDNAMELEN isn't just defined as (NAMEDATALEN+sizeof(Oid)) rather than putting a comment to that effect. I will check the uses and see if that is a safe change or not. regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > >> It'd probably be worthwhile to move NAMEDATALEN to config.h and make the > >> other two symbols be computed off NAMEDATALEN. Any objections if I > >> sneak that change into 6.5, or is it too close to being a "new feature"? > > > Don't they have to be visible to outside apps, so it is in postgres_ext.h? > > Good point --- I was thinking that postgres_ext.h includes config.h, > but I see it ain't so. You're right, those definitions must stay where > they are. > > Still, I wonder why OIDNAMELEN isn't just defined as > (NAMEDATALEN+sizeof(Oid)) rather than putting a comment to that effect. > I will check the uses and see if that is a safe change or not. Yes, probably should be changed. The old code did some fancy sed with it, so maybe it had to be a real number back then, or perhaps initdb pulls it from the file. Not sure. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Tue, 1 Jun 1999, Bruce Momjian wrote: > > Zalman Stern <zalman@netcom.com> writes: > > > Here are the two diffs that up the "name size" from 32 characters to 256 > > > characters. (Once I get bit, I try to fix things real good so I don't get > > > bit again :-)) > > > ----- > > > diff postgresql-6.4.2/src/include/postgres_ext.h postgres-build/src/include/postgres_ext.h > > > 34c34 > > > < #define NAMEDATALEN 32 > > > --- > > >> #define NAMEDATALEN 256 > > > 37c37 > > > < #define OIDNAMELEN 36 > > > --- > > >> #define OIDNAMELEN 260 > > > ----- > > > diff postgresql-6.4.2/src/include/storage/buf_internals.h postgres-build/src/include/storage/buf_internals.h > > > 87c87 > > > < #define PADDED_SBUFDESC_SIZE 128 > > > --- > > >> #define PADDED_SBUFDESC_SIZE 1024 > > > ----- > > > > It'd probably be worthwhile to move NAMEDATALEN to config.h and make the > > other two symbols be computed off NAMEDATALEN. Any objections if I > > sneak that change into 6.5, or is it too close to being a "new feature"? > > Don't they have to be visible to outside apps, so it is in postgres_ext.h? We would need to have some way of getting at it from the client - the DatabaseMetaData method getColumnNameLength() would need to know about this, and we can't refer to the C header files. Peter -- Peter T Mount peter@retep.org.uk Main Homepage: http://www.retep.org.uk PostgreSQL JDBC Faq: http://www.retep.org.uk/postgresJava PDF Generator: http://www.retep.org.uk/pdf
We've seen this table-name-plus-column-name-too-long problem before, and I'm sure we're going to keep hearing about it until we fix it somehow. Messing with NAMEDATALEN is probably not a very useful answer for the average user, given the compatibility problems it creates. How about something like this: if the code finds that the names are too long when forming an implicit index name, it truncates the names to fit, and you are OK as long as the truncated name is unique. For example create table averylongtablename (averylongfieldname serial); would truncate the input names to produce something like averylongtable_averylongfie_keyaverylongtable_averylongfie_seq and you'd only get a failure if those indexes/sequences already existed. (Truncating both names as shown above, not just the field name, should reduce the probability of collisions.) You could even imagine trying a few different possibilities in order to find an unused name, but that worries me. I'd rather that it were completely predictable what name would be used for a given key, and if it depends on what already exists then it wouldn't be so predictable. But there's nothing unpredictable about truncation to fit a known length. This is obviously not a 100% solution, since there's a risk of name collisions (averylongfieldname1 and averylongfieldname2) but it's probably a 95% solution, and it wouldn't take much work or risk. Comments? Objections? I think I could argue that this is a bug fix and deserves to be slipped into 6.5 ;-) regards, tom lane
> How about something like this: if the code finds that the names are > too long when forming an implicit index name, it truncates the names > to fit, and you are OK as long as the truncated name is unique. > For example > > create table averylongtablename (averylongfieldname serial); > > would truncate the input names to produce something like > > averylongtable_averylongfie_key > averylongtable_averylongfie_seq > > and you'd only get a failure if those indexes/sequences already existed. > (Truncating both names as shown above, not just the field name, > should reduce the probability of collisions.) This only partially solves the problem and can introduce bugs into code which is only reading from a database. When someone is setting up the database to work on the system, they'll in theory get a failure so they know it won't work. This really isn't true for our software though because we have functions which dynamically query a table to see what columns it has. In theory two queries for different longnames can resolve to the same column name. It is also a backwards compatibility hassle if you ever want to increase the number of significant characters in the name. This is because the existing database only knows the first 32 characters and *must* ignore anything after that in lookups. You would have to keep track of which names are "old style" and which are new. Why set yourself up like that? -Z-
> How about something like this: if the code finds that the names are > too long when forming an implicit index name, it truncates the names > to fit, and you are OK as long as the truncated name is unique. > Comments? Objections? I think I could argue that this is a bug fix > and deserves to be slipped into 6.5 ;-) I understand some folks think this is a problem, but have been reluctant to include a "randomizer" in the created index name since it would make the index name less clearly predictable. May as well use something like "idx_<procid>_<timestamp>" or somesuch... No real objection though, other than aesthetics. And those only count for so much... - Tom -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
> This is obviously not a 100% solution, since there's a risk of name > collisions (averylongfieldname1 and averylongfieldname2) but it's > probably a 95% solution, and it wouldn't take much work or risk. > > Comments? Objections? I think I could argue that this is a bug fix > and deserves to be slipped into 6.5 ;-) Trying to slip it in as a bug fix. Sounds like me, Tom. :-) -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Zalman Stern <zalman@netcom.com> writes: >> How about something like this: if the code finds that the names are >> too long when forming an implicit index name, it truncates the names >> to fit, and you are OK as long as the truncated name is unique. > This only partially solves the problem and can introduce bugs into code > which is only reading from a database. When someone is setting up the > database to work on the system, they'll in theory get a failure so they > know it won't work. This really isn't true for our software though because > we have functions which dynamically query a table to see what columns it > has. In theory two queries for different longnames can resolve to the same > column name. Um, no, I don't think this has anything to do with whether you can distinguish the names of different columns in a table. What we are talking about is the names generated for indexes and sequences that are needed to implement PRIMARY KEY and SERIAL column attributes. Ideally these names are completely invisible to an SQL application --- there's certainly no direct need for the app to know about them. We could eliminate the whole issue if we generated names along the lines of "pg_pkey_idx_48812091". But when you are looking at the system catalogs it is useful to be able to tell what's what by eye. So we compromise by generating names that include the table and column name for which we're creating an implicit index or sequence. The problem is that this implementation-detail-that-should-be-invisible *is* visible to an SQL application, because it restricts the SQL app's choice of table and column names. We need to avoid that restriction, or at least reduce it as much as we can. I'm willing to sacrifice a little bit of SQL naming freedom to preserve readability of the names generated behind the scenes, but putting a hard limit on name length is too much sacrifice. (This is more a question of designer's taste than anything else --- you're certainly free to argue for a different tradeoff point. But it is a tradeoff; there's no perfect solution.) > It is also a backwards compatibility hassle if you ever want to increase > the number of significant characters in the name. This is because the > existing database only knows the first 32 characters and *must* ignore > anything after that in lookups. You would have to keep track of which names > are "old style" and which are new. Why set yourself up like that? No, because the SQL app should never need to know these names at all. If they change, it won't affect app code. Increasing NAMEDATALEN could not cause two table+field names to conflict where they did not conflict before, so I see no risk there. regards, tom lane
I misunderstood the context quite a bit. I would consider gluing the entire full length name together and using 8 bytes or so for a strong hash of the fullname. If there is another lookup path to get to the correct index name, then one can just increment the hash until the name is unique. Or whatever. -Z-
> > > How about something like this: if the code finds that the names are > > too long when forming an implicit index name, it truncates the names > > to fit, and you are OK as long as the truncated name is unique. > > Comments? Objections? I think I could argue that this is a bug fix > > and deserves to be slipped into 6.5 ;-) > > I understand some folks think this is a problem, but have been > reluctant to include a "randomizer" in the created index name since it > would make the index name less clearly predictable. May as well use > something like "idx_<procid>_<timestamp>" or somesuch... > > No real objection though, other than aesthetics. And those only count > for so much... I've been wondering for some time why at all to build the index and sequence names from those table/fieldnames. Only to make them guessable? What about building them from the tables OID plus the column numbers. That way, auto created sequences could also be automatically removed on a DROP TABLE because the system can "guess" them. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
Jan Wieck wrote: > > > > > I understand some folks think this is a problem, but have been > > reluctant to include a "randomizer" in the created index name since it > > would make the index name less clearly predictable. May as well use > > something like "idx_<procid>_<timestamp>" or somesuch... > > > > No real objection though, other than aesthetics. And those only count > > for so much... > > I've been wondering for some time why at all to build the And me -:) > index and sequence names from those table/fieldnames. Only to > make them guessable? > > What about building them from the tables OID plus the column > numbers. That way, auto created sequences could also be > automatically removed on a DROP TABLE because the system can > "guess" them. Actually, we should use names not allowed in CREATE statements! So I would use "pg_" prefix... Vadim
> > Jan Wieck wrote: > > > > > > > > I understand some folks think this is a problem, but have been > > > reluctant to include a "randomizer" in the created index name since it > > > would make the index name less clearly predictable. May as well use > > > something like "idx_<procid>_<timestamp>" or somesuch... > > > > > > No real objection though, other than aesthetics. And those only count > > > for so much... > > > > I've been wondering for some time why at all to build the > > And me -:) > > > index and sequence names from those table/fieldnames. Only to > > make them guessable? > > > > What about building them from the tables OID plus the column > > numbers. That way, auto created sequences could also be > > automatically removed on a DROP TABLE because the system can > > "guess" them. > > Actually, we should use names not allowed in CREATE statements! > So I would use "pg_" prefix... This would implicitly deny the user from dropping the created index for a unique constraint :-) Same for the sequences - what's good because they are used in the default clauses for the serial field and dropping the sequence would corrupt the table though. I like it. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes: >> Actually, we should use names not allowed in CREATE statements! >> So I would use "pg_" prefix... > This would implicitly deny the user from dropping the created > index for a unique constraint :-) Same for the sequences - > what's good because they are used in the default clauses for > the serial field and dropping the sequence would corrupt the > table though. Well, it's only good if the system will get rid of the objects when the user drops the owning table. This is true for indexes but AFAIK it is not yet true for sequences. So if we go with pg_ prefix now, there will be *no* way short of superuser privilege to get rid of the sequence object for a deleted table that had a serial field. Also, this will break pg_dump, which will have no good way to restore the state of a serial sequence object. (CREATE SEQUENCE pg_xxx will fail, no?) > I like it. Perhaps eventually we should wind up using names like "pg_pkey_8381292" but I think this ought to wait until the system retains an explicit representation of the relationship between these indexes/sequences and the owning table, and until we think through the consequences for pg_dump. For now we had better stick to unprivileged names. regards, tom lane
wieck@debis.com (Jan Wieck) writes: > What about building them from the tables OID plus the column > numbers. The parser doesn't know what OID will be assigned to the table at the time it builds the names for the derived objects. I suppose we could postpone the creation of these names until after the table OID is known, but that looks like a rather large and risky change to be making at this stage of the release cycle... At this point I like Zalman's idea, which if I understood it properly went like this: 1. If table and column name are short enough, use "table_column_key" etc (so, no change in the cases that the system acceptsnow). 2. Otherwise, truncate table and/or column name to fit, leaving room for a few extra characters that are made from a hashof the removed characters. The result would look something like "tab_col_5927_key". This still isn't a 100% solution, but it's probably a 99.5% solution where the simple truncation idea would be maybe 95%. Not sure that the additional coverage is worth making the names harder to predict for a person, though. regards, tom lane
Tom Lane wrote: > > I like it. > > Perhaps eventually we should wind up using names like "pg_pkey_8381292" > but I think this ought to wait until the system retains an explicit > representation of the relationship between these indexes/sequences and > the owning table, and until we think through the consequences for > pg_dump. For now we had better stick to unprivileged names. Of course! I didn't meant to do anything on it for v6.5. Implementing automatic sequence deletion if they got created due to serial fields is definitely feature. And I agree that all the odds and ends have to get discussed down first. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
> > > > > How about something like this: if the code finds that the names are > > > too long when forming an implicit index name, it truncates the names > > > to fit, and you are OK as long as the truncated name is unique. > > > Comments? Objections? I think I could argue that this is a bug fix > > > and deserves to be slipped into 6.5 ;-) > > > > I understand some folks think this is a problem, but have been > > reluctant to include a "randomizer" in the created index name since it > > would make the index name less clearly predictable. May as well use > > something like "idx_<procid>_<timestamp>" or somesuch... > > > > No real objection though, other than aesthetics. And those only count > > for so much... > > I've been wondering for some time why at all to build the > index and sequence names from those table/fieldnames. Only to > make them guessable? > > What about building them from the tables OID plus the column > numbers. That way, auto created sequences could also be > automatically removed on a DROP TABLE because the system can > "guess" them. Another idea would be to truncate table and column names equally to fit in NAMEDATALEN, then if that is not unique, start replacing the last letters of the string with number until it is unique: tabnamecolnametabnamecolnam1tabnamecolnam2tabnamecolna32 -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Just a suggestion: use an printably-encoded version of md5 or sha, which are cryptographic hash algorithms. It will make the name completely predictable: if(too_long(name)) {name = md5(name); } It will be *very* unlikely that there are any collisions. Of course, a person won't say "gee, party_address_relation_code_types_seq is too long, I guess that will turn out to be d4420a3105e98e3e2e12c5c73019db59". > -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Thomas Lockhart > Sent: Wednesday, June 02, 1999 12:26 AM > To: Tom Lane > Cc: pgsql-hackers@postgreSQL.org; Zalman Stern > Subject: Re: [HACKERS] Re: [SQL] Column name's length > > > > How about something like this: if the code finds that the names are > > too long when forming an implicit index name, it truncates the names > > to fit, and you are OK as long as the truncated name is unique. > > Comments? Objections? I think I could argue that this is a bug fix > > and deserves to be slipped into 6.5 ;-) > > I understand some folks think this is a problem, but have been > reluctant to include a "randomizer" in the created index name since it > would make the index name less clearly predictable. May as well use > something like "idx_<procid>_<timestamp>" or somesuch... > > No real objection though, other than aesthetics. And those only count > for so much... > > - Tom > > -- > Thomas Lockhart lockhart@alumni.caltech.edu > South Pasadena, California > >
At 09:16 2/06/99 -0400, you wrote: > >Well, it's only good if the system will get rid of the objects when >the user drops the owning table. This is true for indexes but AFAIK >it is not yet true for sequences. So if we go with pg_ prefix now, >there will be *no* way short of superuser privilege to get rid of the >sequence object for a deleted table that had a serial field. > >Also, this will break pg_dump, which will have no good way to restore >the state of a serial sequence object. (CREATE SEQUENCE pg_xxx will >fail, no?) I know I'm probably out of my depth here, but couldn't pg_dump ignore everything with a pg_* prefix? It can (safely?) assumeany 'system' structures will be created as a result of some other user-based definition it is dumping? [If you beat me about the head, I'll shut up] Philip Warner. P.S. I also like the idea of creating the 'system' structures with readily and reliably identifiable names, since it potentiallygives the option of the user choosing to 'hide' them. As a user with about 20000 blobs to load, the output ofa \d is pretty cumbersome. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: >> Also, this will break pg_dump, which will have no good way to restore >> the state of a serial sequence object. (CREATE SEQUENCE pg_xxx will >> fail, no?) > I know I'm probably out of my depth here, but couldn't pg_dump ignore > everything with a pg_* prefix? It does, for the most part. The trouble is that if we rename SERIAL sequences to pg_xxx, and pg_dump then ignores them, then dump and reload will fail to restore the next-serial-number state of a SERIAL column. (Actually, given no other code changes, the serial column would fail entirely because its underlying sequence wouldn't be recreated at all. I was pointing out that it's not even *possible* for pg_dump to restore the sequence's state if the sequence is given a protected name.) > As a user with about 20000 blobs to load, the output of a \d is pretty > cumbersome. Hmm, I suppose \d ought to ignore xinv relations ... regards, tom lane
>> Still, I wonder why OIDNAMELEN isn't just defined as >> (NAMEDATALEN+sizeof(Oid)) rather than putting a comment to that effect. >> I will check the uses and see if that is a safe change or not. > Yes, probably should be changed. The old code did some fancy sed with > it, so maybe it had to be a real number back then, or perhaps initdb > pulls it from the file. Not sure. There was indeed a script pulling it from the file ... but it turns out the value wasn't actually being *used* anywhere! So I just removed OIDNAMELEN entirely. Peter Mount pointed out that the Java interface code has 32 hardwired as a constant for name length, and there may be similar problems in other non-C interfaces that can't conveniently use the NAMEDATALEN constant from postgres_ext.h. Another problem is that some of psql's formats for system table display have hardwired column widths. So there is still work to do if you want to alter NAMEDATALEN. regards, tom lane