Thread: char() or varchar() for frequently used column
Hello everyone, A db I'm designing will have a lot of tables with codes in them, like create table country ( id serial primary key, code char(2) not null unique, name varchar(100) not null unique); insert into country (code, name) values ('NL', 'Nederland'); insert into country (code, name) values ('BE', 'Belgie'); -- etc create table gender ( id serial primary key, code char(1) not null unique, name varchar(100) not null unique); insert into gender (code, name) values ('M', 'male'); insert into gender (code, name) values ('F', 'female'); The 'code' columns will be used as foreign keys in other tables. My question is about the datatype and length of the 'code' columns. Allthough the codes length will probably not change over the years, it might happen anyway. I've seen this before and had some headaches over it. So in the new db I want to be prepared and make the referenced columns longer than would appear necessary at first sight. So instead of "code char(2)" I plan to do "code varchar(25)". The idea is that I: - give myself space for future and unforeseeable change of the length - don't waste space by using varchar() instead of char() Are there any flaws in this approach? Will I get in trouble when using indexes. Will performance be hampered severely? (we're not talking about huge amounts of transactions) Thanks for any insight!
From: "Jules Alberts" <jules.alberts@arbodienst-limburg.nl> Organization: ARBOdienst Limburg BV To: pgsql-novice@postgresql.org Date sent: Thu, 17 Oct 2002 09:23:14 +0200 Subject: [NOVICE] char() or varchar() for frequently used column Send reply to: jules.alberts@arbodienst-limburg.nl From the online docs: There are no performance differences between these three types, apart from the increased storage size when using the blank-padded type. (character) So I would suggest create table country ( id serial primary key, code varchar not null unique, name varchar not null unique); insert into country (code, name) values ('NL', 'Nederland'); insert into country (code, name) values ('BE', 'Belgie'); create table gender ( id serial primary key, code varchar not null unique, name varchar not null unique); insert into gender (code, name) values ('M', 'male'); insert into gender (code, name) values ('F', 'female'); That way your future proofed, varchar without brackets is unlimited and while I know you didn't ask create table country ( code varchar primary key, name varchar not null unique; insert into country (code, name) values ('NL', 'Nederland'); insert into country (code, name) values ('BE', 'Belgie'); create table gender ( code varchar primary key, name varchar not null unique); insert into gender (code, name) values ('M', 'male'); insert into gender (code, name) values ('F', 'female'); might serve just as well Hope this helps Paul Butler > Hello everyone, > > A db I'm designing will have a lot of tables with codes in them, like > > create table country ( > id serial primary key, > code char(2) not null unique, > name varchar(100) not null unique); > insert into country (code, name) values ('NL', 'Nederland'); > insert into country (code, name) values ('BE', 'Belgie'); > -- etc > > create table gender ( > id serial primary key, > code char(1) not null unique, > name varchar(100) not null unique); > insert into gender (code, name) values ('M', 'male'); > insert into gender (code, name) values ('F', 'female'); > > The 'code' columns will be used as foreign keys in other tables. My > question is about the datatype and length of the 'code' columns. > Allthough the codes length will probably not change over the years, it > might happen anyway. I've seen this before and had some headaches over > it. So in the new db I want to be prepared and make the referenced > columns longer than would appear necessary at first sight. So instead > of "code char(2)" I plan to do "code varchar(25)". The idea is that I: > > - give myself space for future and unforeseeable change of the length > - don't waste space by using varchar() instead of char() > > Are there any flaws in this approach? Will I get in trouble when using > indexes. Will performance be hampered severely? (we're not talking > about huge amounts of transactions) > > Thanks for any insight! > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
On 17 Oct 2002 at 9:24, paul butler wrote: <snip> > That way your future proofed, varchar without brackets is unlimited Thanks a lot, I didn't know that. > and while I know you didn't ask > > create table country ( > code varchar primary key, > name varchar not null unique; > insert into country (code, name) values ('NL', 'Nederland'); > insert into country (code, name) values ('BE', 'Belgie'); > > create table gender ( > code varchar primary key, > name varchar not null unique); > insert into gender (code, name) values ('M', 'male'); > insert into gender (code, name) values ('F', 'female'); > > might serve just as well I have considered this. As a matter of fact, that is the way it is in our current db but I'm not really happy with it. Theoretically CODE should never change and is therefore safe to use as primary key. But having an "extra" serial primary key will make the db more flexible regarding to unforeseen complications. > Hope this helps It did, thanks again.
On Thu, 2002-10-17 at 22:23, Jules Alberts wrote: > > I have considered this. As a matter of fact, that is the way it is in > our current db but I'm not really happy with it. Theoretically CODE > should never change and is therefore safe to use as primary key. But > having an "extra" serial primary key will make the db more flexible > regarding to unforeseen complications. Yeah, this happens. Later people want to expire particular codes, or change their meaning, but not for the existing records that refer to them... From my own experience, I would also say that there is value in being able to sequence the codes in a non-alphabetic order. I add another "seq" column to such tables, to allow their ordering to be arbitrarily adjusted as well. Cheers, Andrew. -- --------------------------------------------------------------------- Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St DDI: +64(4)916-7201 MOB: +64(21)635-694 OFFICE: +64(4)499-2267 Survey for free with http://survey.net.nz/ ---------------------------------------------------------------------
Purely for discussion: On Thu, 2002-10-17 at 22:23, Jules Alberts wrote: > > I have considered this. As a matter of fact, that is the way it is in > our current db but I'm not really happy with it. Theoretically CODE > should never change and is therefore safe to use as primary key. But > having an "extra" serial primary key will make the db more flexible > regarding to unforeseen complications. Could you not make NAME not unique? Then you could have a new code for the same name, not affecting previous records. If a code changes, then its a new code, or the old code with a new name >Yeah, this happens. Later people want to expire particular codes, >or >change their meaning, but not for the existing records that refer to >them... If all attributes are 'unique' I don't see how you could change a codes 'meaning' without (effectively not mechanically) cascading these changes to existing records From my own experience, I would also say that there is value in >being >able to sequence the codes in a non-alphabetic order. I add >another "seq" column to such tables, to allow their ordering to be arbitrarily adjusted as well. Just wondering aloud Cheers Paul Butler
On 17 Oct 2002 at 13:28, paul butler wrote: > Purely for discussion: > > > On Thu, 2002-10-17 at 22:23, Jules Alberts wrote: > > > > I have considered this. As a matter of fact, that is the way it is in > > our current db but I'm not really happy with it. Theoretically > CODE > > should never change and is therefore safe to use as primary key. > But > > having an "extra" serial primary key will make the db more > flexible > > regarding to unforeseen complications. > > Could you not make NAME not unique? Then you could have a > new code for the same name, not affecting previous records. If a > code changes, then its a new code, or the old code with a new > name Both CODE and NAME are unique. And they will _never_ change. Until they do :-). These are mostly tables which are not our own, stuff like country codes, medical diagnosises etc. They comply to ISO standards. Someone may decide to change them. Of course we could tackle this by using an UPDATE CASCADE, but beside that I want a truly unique and unchangeable column in the table. Hence the ID field, which is redundant as long as nothing unchangeable changes, merely a safeguard. > >Yeah, this happens. Later people want to expire particular codes, > >or > >change their meaning, but not for the existing records that refer to > >them... > > If all attributes are 'unique' I don't see how you could change a > codes 'meaning' without (effectively not mechanically) cascading > these changes to existing records > > > >From my own experience, I would also say that there is value in > >being > >able to sequence the codes in a non-alphabetic order. I add > >another > "seq" column to such tables, to allow their ordering to be arbitrarily > adjusted as well. > > Just wondering aloud > > Cheers > > Paul Butler > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
Jules, > Are there any flaws in this approach? Will I get in trouble when > using > indexes. Will performance be hampered severely? (we're not talking > about huge amounts of transactions) On postgresql, there is no performance difference between CHAR and VARCHAR. And a VARCHAR(25) which only has 1 character in each row is no slower than a VARCHAR(2) that has only one character. -Josh Berkus
If your really that worried about this you will just have to have an effective date and an expiration date on the codes. Leave the expiration date either null or a VERY VERY large value. On Thursday 17 October 2002 07:21 am, you wrote: > On 17 Oct 2002 at 13:28, paul butler wrote: > > Purely for discussion: > > > > On Thu, 2002-10-17 at 22:23, Jules Alberts wrote: > > > I have considered this. As a matter of fact, that is the way it is in > > > our current db but I'm not really happy with it. Theoretically > > > > CODE > > > > > should never change and is therefore safe to use as primary key. > > > > But > > > > > having an "extra" serial primary key will make the db more > > > > flexible > > > > > regarding to unforeseen complications. > > > > Could you not make NAME not unique? Then you could have a > > new code for the same name, not affecting previous records. If a > > code changes, then its a new code, or the old code with a new > > name > > Both CODE and NAME are unique. And they will _never_ change. Until they > do :-). These are mostly tables which are not our own, stuff like > country codes, medical diagnosises etc. They comply to ISO standards. > Someone may decide to change them. Of course we could tackle this by > using an UPDATE CASCADE, but beside that I want a truly unique and > unchangeable column in the table. Hence the ID field, which is > redundant as long as nothing unchangeable changes, merely a safeguard. > > > >Yeah, this happens. Later people want to expire particular codes, > > >or > > >change their meaning, but not for the existing records that refer to > > >them... > > > > If all attributes are 'unique' I don't see how you could change a > > codes 'meaning' without (effectively not mechanically) cascading > > these changes to existing records > > > > >From my own experience, I would also say that there is value in > > >being > > >able to sequence the codes in a non-alphabetic order. I add > > >another > > > > "seq" column to such tables, to allow their ordering to be arbitrarily > > adjusted as well. > > > > Just wondering aloud > > > > Cheers > > > > Paul Butler > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On 17 Oct 2002 at 14:15, Randy Neumann wrote: > If your really that worried about this you will just have to have an > effective date and an expiration date on the codes. Leave the expiration > date either null or a VERY VERY large value. Yes, I have considered this. A simpler (less complete) way would be to add a boolean is_active to each table. Defaults to true, and if false, you can't insert the CODE. While especially the date approach would be correct, i haven't decided if I will use it, or the boolean approach. It has it's advantages, but a big disadvantage is that it will make thing a lot more complicated, errorprone etc. Will it be worth it? I don't know yet.
On Thu, Oct 17, 2002 at 14:15:11 -0600, Randy Neumann <Randy_Neumann@centralref.com> wrote: > If your really that worried about this you will just have to have an > effective date and an expiration date on the codes. Leave the expiration > date either null or a VERY VERY large value. There is an 'infinity' date that is larger than any real date.