Thread: char() or varchar() for frequently used column

char() or varchar() for frequently used column

From
"Jules Alberts"
Date:
Hello everyone,

A db I'm designing will have a lot of tables with codes in them, like

create table country (
    id serial primary key,
    code char(2) not null unique,
    name varchar(100) not null unique);
insert into country (code, name) values ('NL', 'Nederland');
insert into country (code, name) values ('BE', 'Belgie');
-- etc

create table gender (
    id serial primary key,
    code char(1) not null unique,
    name varchar(100) not null unique);
insert into gender (code, name) values ('M', 'male');
insert into gender (code, name) values ('F', 'female');

The 'code' columns will be used as foreign keys in other tables. My
question is about the datatype and length of the 'code' columns.
Allthough the codes length will probably not change over the years, it
might happen anyway. I've seen this before and had some headaches over
it. So in the new db I want to be prepared and make the referenced
columns longer than would appear necessary at first sight. So instead
of "code char(2)" I plan to do "code varchar(25)". The idea is that I:

- give myself space for future and unforeseeable change of the length
- don't waste space by using varchar() instead of char()

Are there any flaws in this approach? Will I get in trouble when using
indexes. Will performance be hampered severely? (we're not talking
about huge amounts of transactions)

Thanks for any insight!

Re: char() or varchar() for frequently used column

From
"paul butler"
Date:
From:               "Jules Alberts" <jules.alberts@arbodienst-limburg.nl>
Organization:       ARBOdienst Limburg BV
To:                 pgsql-novice@postgresql.org
Date sent:          Thu, 17 Oct 2002 09:23:14 +0200
Subject:            [NOVICE] char() or varchar() for frequently used column
Send reply to:      jules.alberts@arbodienst-limburg.nl

From the online docs:

There are no performance differences between these three types, apart from
the increased storage size when using the blank-padded type. (character)

So I would suggest
 create table country (
     id serial primary key,
    code varchar not null unique,
    name varchar not null unique);
insert into country (code, name) values ('NL', 'Nederland');
 insert into country (code, name) values ('BE', 'Belgie');

 create table gender (
     id serial primary key,
     code varchar not null unique,
     name varchar not null unique);
 insert into gender (code, name) values ('M', 'male');
 insert into gender (code, name) values ('F', 'female');

That way your future proofed, varchar without brackets is unlimited
and while I know you didn't ask

 create table country (
    code varchar primary key,
    name varchar not null unique;
insert into country (code, name) values ('NL', 'Nederland');
 insert into country (code, name) values ('BE', 'Belgie');

 create table gender (
     code varchar primary key,
     name varchar not null unique);
 insert into gender (code, name) values ('M', 'male');
 insert into gender (code, name) values ('F', 'female');

might serve just as well
Hope this helps

Paul Butler


> Hello everyone,
>
> A db I'm designing will have a lot of tables with codes in them, like
>
> create table country (
>     id serial primary key,
>     code char(2) not null unique,
>     name varchar(100) not null unique);
> insert into country (code, name) values ('NL', 'Nederland');
> insert into country (code, name) values ('BE', 'Belgie');
> -- etc
>
> create table gender (
>     id serial primary key,
>     code char(1) not null unique,
>     name varchar(100) not null unique);
> insert into gender (code, name) values ('M', 'male');
> insert into gender (code, name) values ('F', 'female');
>
> The 'code' columns will be used as foreign keys in other tables. My
> question is about the datatype and length of the 'code' columns.
> Allthough the codes length will probably not change over the years, it
> might happen anyway. I've seen this before and had some headaches over
> it. So in the new db I want to be prepared and make the referenced
> columns longer than would appear necessary at first sight. So instead
> of "code char(2)" I plan to do "code varchar(25)". The idea is that I:
>
> - give myself space for future and unforeseeable change of the length
> - don't waste space by using varchar() instead of char()
>
> Are there any flaws in this approach? Will I get in trouble when using
> indexes. Will performance be hampered severely? (we're not talking
> about huge amounts of transactions)
>
> Thanks for any insight!
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)



Re: char() or varchar() for frequently used column

From
"Jules Alberts"
Date:
On 17 Oct 2002 at 9:24, paul butler wrote:
<snip>
> That way your future proofed, varchar without brackets is unlimited

Thanks a lot, I didn't know that.

> and while I know you didn't ask
>
>  create table country (
>     code varchar primary key,
>     name varchar not null unique;
> insert into country (code, name) values ('NL', 'Nederland');
>  insert into country (code, name) values ('BE', 'Belgie');
>
>  create table gender (
>      code varchar primary key,
>      name varchar not null unique);
>  insert into gender (code, name) values ('M', 'male');
>  insert into gender (code, name) values ('F', 'female');
>
> might serve just as well

I have considered this. As a matter of fact, that is the way it is in
our current db but I'm not really happy with it. Theoretically CODE
should never change and is therefore safe to use as primary key. But
having an "extra" serial primary key will make the db more flexible
regarding to unforeseen complications.

> Hope this helps

It did, thanks again.

Re: char() or varchar() for frequently used column

From
Andrew McMillan
Date:
On Thu, 2002-10-17 at 22:23, Jules Alberts wrote:
>
> I have considered this. As a matter of fact, that is the way it is in
> our current db but I'm not really happy with it. Theoretically CODE
> should never change and is therefore safe to use as primary key. But
> having an "extra" serial primary key will make the db more flexible
> regarding to unforeseen complications.

Yeah, this happens.  Later people want to expire particular codes, or
change their meaning, but not for the existing records that refer to
them...

From my own experience, I would also say that there is value in being
able to sequence the codes in a non-alphabetic order.  I add another
"seq" column to such tables, to allow their ordering to be arbitrarily
adjusted as well.

Cheers,
                    Andrew.
--
---------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/         PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201     MOB: +64(21)635-694    OFFICE: +64(4)499-2267
           Survey for free with http://survey.net.nz/
---------------------------------------------------------------------


Re: char() or varchar() for frequently used column

From
"paul butler"
Date:
Purely for discussion:


On Thu, 2002-10-17 at 22:23, Jules Alberts wrote:
>
> I have considered this. As a matter of fact, that is the way it is in
> our current db but I'm not really happy with it. Theoretically
CODE
> should never change and is therefore safe to use as primary key.
But
> having an "extra" serial primary key will make the db more
flexible
> regarding to unforeseen complications.

Could you not make NAME not unique? Then you could have a
new code for the same name, not affecting previous records. If a
code changes, then its a new code, or the old code with a new
name

>Yeah, this happens.  Later people want to expire particular codes,
>or
>change their meaning, but not for the existing records that refer to
>them...

If all attributes are 'unique' I don't see how you could change a
codes 'meaning' without (effectively not mechanically) cascading
these changes to existing records


From my own experience, I would also say that there is value in
>being
>able to sequence the codes in a non-alphabetic order.  I add
>another
"seq" column to such tables, to allow their ordering to be arbitrarily
adjusted as well.

Just wondering aloud

Cheers

Paul Butler


Re: char() or varchar() for frequently used column

From
"Jules Alberts"
Date:
On 17 Oct 2002 at 13:28, paul butler wrote:
> Purely for discussion:
>
>
> On Thu, 2002-10-17 at 22:23, Jules Alberts wrote:
> >
> > I have considered this. As a matter of fact, that is the way it is in
> > our current db but I'm not really happy with it. Theoretically
> CODE
> > should never change and is therefore safe to use as primary key.
> But
> > having an "extra" serial primary key will make the db more
> flexible
> > regarding to unforeseen complications.
>
> Could you not make NAME not unique? Then you could have a
> new code for the same name, not affecting previous records. If a
> code changes, then its a new code, or the old code with a new
> name

Both CODE and NAME are unique. And they will _never_ change. Until they
do :-). These are mostly tables which are not our own, stuff like
country codes, medical diagnosises etc. They comply to ISO standards.
Someone may decide to change them. Of course we could tackle this by
using an UPDATE CASCADE, but beside that I want a truly unique and
unchangeable column in the table. Hence the ID field, which is
redundant as long as nothing unchangeable changes, merely a safeguard.

> >Yeah, this happens.  Later people want to expire particular codes,
> >or
> >change their meaning, but not for the existing records that refer to
> >them...
>
> If all attributes are 'unique' I don't see how you could change a
> codes 'meaning' without (effectively not mechanically) cascading
> these changes to existing records
>
>
> >From my own experience, I would also say that there is value in
> >being
> >able to sequence the codes in a non-alphabetic order.  I add
> >another
> "seq" column to such tables, to allow their ordering to be arbitrarily
> adjusted as well.
>
> Just wondering aloud
>
> Cheers
>
> Paul Butler
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>



Re: char() or varchar() for frequently used column

From
"Josh Berkus"
Date:
Jules,

> Are there any flaws in this approach? Will I get in trouble when
> using
> indexes. Will performance be hampered severely? (we're not talking
> about huge amounts of transactions)

On postgresql, there is no performance difference between CHAR and
VARCHAR.   And a VARCHAR(25) which only has 1 character in each row is
no slower than a VARCHAR(2) that has only one character.

-Josh Berkus

Re: char() or varchar() for frequently used column

From
Randy Neumann
Date:
If your really that worried about this you will just have to have an
effective date and an expiration date on the codes.  Leave the expiration
date either null or a VERY VERY large value.



On Thursday 17 October 2002 07:21 am, you wrote:
> On 17 Oct 2002 at 13:28, paul butler wrote:
> > Purely for discussion:
> >
> > On Thu, 2002-10-17 at 22:23, Jules Alberts wrote:
> > > I have considered this. As a matter of fact, that is the way it is in
> > > our current db but I'm not really happy with it. Theoretically
> >
> > CODE
> >
> > > should never change and is therefore safe to use as primary key.
> >
> > But
> >
> > > having an "extra" serial primary key will make the db more
> >
> > flexible
> >
> > > regarding to unforeseen complications.
> >
> > Could you not make NAME not unique? Then you could have a
> > new code for the same name, not affecting previous records. If a
> > code changes, then its a new code, or the old code with a new
> > name
>
> Both CODE and NAME are unique. And they will _never_ change. Until they
> do :-). These are mostly tables which are not our own, stuff like
> country codes, medical diagnosises etc. They comply to ISO standards.
> Someone may decide to change them. Of course we could tackle this by
> using an UPDATE CASCADE, but beside that I want a truly unique and
> unchangeable column in the table. Hence the ID field, which is
> redundant as long as nothing unchangeable changes, merely a safeguard.
>
> > >Yeah, this happens.  Later people want to expire particular codes,
> > >or
> > >change their meaning, but not for the existing records that refer to
> > >them...
> >
> > If all attributes are 'unique' I don't see how you could change a
> > codes 'meaning' without (effectively not mechanically) cascading
> > these changes to existing records
> >
> > >From my own experience, I would also say that there is value in
> > >being
> > >able to sequence the codes in a non-alphabetic order.  I add
> > >another
> >
> > "seq" column to such tables, to allow their ordering to be arbitrarily
> > adjusted as well.
> >
> > Just wondering aloud
> >
> > Cheers
> >
> > Paul Butler
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 3: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo@postgresql.org so that your
> > message can get through to the mailing list cleanly
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: char() or varchar() for frequently used column

From
"Jules Alberts"
Date:
On 17 Oct 2002 at 14:15, Randy Neumann wrote:
> If your really that worried about this you will just have to have an
> effective date and an expiration date on the codes.  Leave the expiration
> date either null or a VERY VERY large value.

Yes, I have considered this. A simpler (less complete) way would be to
add a boolean is_active to each table. Defaults to true, and if false,
you can't insert the CODE.

While especially the date approach would be correct, i haven't decided
if I will use it, or the boolean approach. It has it's advantages, but
a big disadvantage is that it will make thing a lot more complicated,
errorprone etc. Will it be worth it? I don't know yet.

Re: char() or varchar() for frequently used column

From
Bruno Wolff III
Date:
On Thu, Oct 17, 2002 at 14:15:11 -0600,
  Randy Neumann <Randy_Neumann@centralref.com> wrote:
> If your really that worried about this you will just have to have an
> effective date and an expiration date on the codes.  Leave the expiration
> date either null or a VERY VERY large value.

There is an 'infinity' date that is larger than any real date.