Thread: char or int vs varchar

char or int vs varchar

From
"Keith Worthington"
Date:
Hi All,

I am building a new table and I would appreciate your thoughts on a data
storage and performance question

I need to store a (at the moment) tristate variable.  Now this could be done
as a char "L", "S", "T" as an int2 1, 2, 3 or as a varchar(8) "loose",
"standard", "tight" among possibly other ways.

For absolute storage minimum char would be the best at 1 byte but the
documentation indicates that it is an internal type so I do not know whether
or not I can really use it.  If not then character(1) would have to be used
and it looks like that would use up 5 bytes.

For readability varchar(8) obviously wins.  The documentation assures me that
other than using 12 bytes there is "no performance differences between these
three types".

Finally there is the smallint at 2 bytes.  Almost the smallest in terms of
disk usage I wonder how it will affect performance since in most cases I would
probably have to use a CASE statement to convert to something a human would
intuitively understand.

Kind Regards,
Keith

Re: char or int vs varchar

From
"Mike Oxford"
Date:
> -----Original Message-----
> From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-
> owner@postgresql.org] On Behalf Of Keith Worthington
> Sent: Friday, April 22, 2005 1:21 PM
> To: PostgreSQL Novice
> Subject: [NOVICE] char or int vs varchar


> I need to store a (at the moment) tristate variable.  Now this could be
> done
> as a char "L", "S", "T" as an int2 1, 2, 3 or as a varchar(8) "loose",
> "standard", "tight" among possibly other ways.

Indexing is fastest on integer types.  Char types are probably stored
underneath as integers (guessing.)  VARCHARs are variable allocation
and when you change the string length it may have to reallocate.

Generally integers are fastest, then single characters then CHAR[x]
types and then VARCHARS.

String compares in your queries are an issue, as well.
"... where x = 'loose'" could be an issue if you query it a lot.

CHAR[1] would work, or you can use a bit field (true,false,not-set) or
if you're creative, 2 bits (01,11,10,11) and you don't use one set.  Check
out the "bytea" types, but it's got a 4 byte header so you'll end up using
33 bits. :P

Personally, I'd use an "char" (note quotes - not a char(1) ) or an unsigned
smallint (in that order) so I don't have to worry about a sign-flag, I have
extended room for more flags and it indexes quickly.

> Finally there is the smallint at 2 bytes.  Almost the smallest in terms of
> disk usage I wonder how it will affect performance since in most cases I
> would
> probably have to use a CASE statement to convert to something a human
> would
> intuitively understand.

Using "full strings of varchar(8)" is probably your slowest and least
efficient route...but the most readable.

I don't think Postgres does enum() fields.  That would probably be ideal
in this case, since you could use human-readable strings and it'd
store/index based on the numeric offset.  Oh well.

G'luck.

-Mike / Szii