Thread: Single Byte values

Single Byte values

From
Jason Hihn
Date:
As someone pointed out, CHAR(1) costs 5 bytes. Is there a better way to
store it? I can see one or three bytes as being bad (byte-alignment issues
and speed) Two could be ok, but at the very least 4 should be the max for
char(1).

Two is ok, once you consider multi-bytes strings (unicode)
Three is right out -
Four bytes could store two multibyte, two single byte (cast as 2 byte for
unicode) or four single byte characters.

But in returning to my question, what should I be using for char(1) to
char(4) storage?

Thanks again!


Re: Single Byte values

From
"Nigel J. Andrews"
Date:
On Thu, 3 Apr 2003, Jason Hihn wrote:

> As someone pointed out, CHAR(1) costs 5 bytes. Is there a better way to
> store it? I can see one or three bytes as being bad (byte-alignment issues
> and speed) Two could be ok, but at the very least 4 should be the max for
> char(1).
>
> Two is ok, once you consider multi-bytes strings (unicode)
> Three is right out -
> Four bytes could store two multibyte, two single byte (cast as 2 byte for
> unicode) or four single byte characters.
>
> But in returning to my question, what should I be using for char(1) to
> char(4) storage?
>
> Thanks again!

I say you should use char(1) or whatever if that's what your requirement
is.  I'm not sure on the storage details but I doubt using text type will save
anything.

I also suspect the 5 byte cost isn't just the data but column overhead as
well. I think the person saying not to use it is really saying why limit
yourself to 1 character when for similar cost you can get 1 character _and_ the
ability to get more than 1 character. However, this comes back to my point, if
1 character is what you need then use a char(1). Unfortunately, the example
that jumped to my mind of vehicle registration year tags doesn't work,
somewhere along the line that part of registration numbers became 2 letters
instead of 1, which would really come back and haunt you if you'd done your
database to be char(1) and had been amassing data for years before the change.


--
Nigel J. Andrews


Re: Single Byte values

From
Jason Hihn
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Nigel J. Andrews
> Sent: Thursday, April 03, 2003 4:02 PM
> To: Jason Hihn
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Single Byte values
...
> I say you should use char(1) or whatever if that's what your requirement
> is.  I'm not sure on the storage details but I doubt using text
> type will save
> anything.
>
> I also suspect the 5 byte cost isn't just the data but column overhead as
> well. I think the person saying not to use it is really saying why limit
> yourself to 1 character when for similar cost you can get 1
...

4 bytes(stored string length) + 1 data in the case of char(1).

Well this is a key for an enumeration, there are only a handful of values,
but thousands of records. I could do it via CHECK(a='a' || a='b' || a='c')
BUT I much rather dump that enumeration off to a table so to add a letter
later only requires INSERT INTO _table VALUES('d');

Furthermore, storing to off to a table allows a description of user-friendly
views of the data. Join on the column and you get an English (or whatever
your language) description. (Incidentally, it's not bad to change the schema
to:
create table _table
a char(1),
lang char(2),
desc text
primary key(a));

Where you can support multiple languages.

Incententally the typical representation is a list or drop-down box. Now you
can populate it with what's in _table

add a REFERENCES _table(a), and you have an easily extensible system that
the can add values too as well.

> instead of 1, which would really come back and haunt you if you'd
> done your
> database to be char(1) and had been amassing data for years
> before the change.

It could, but as it stands, I only have 10 or so for an application that's
been around for 15 years, and additions are rare. Even so this makes them
trivially easy... The decision to use a packed type over a vector type
should lie with the DB designer.


Re: Single Byte values

From
Manfred Koizar
Date:
On Thu, 03 Apr 2003 13:35:24 -0500, Jason Hihn
<jhihn@paytimepayroll.com> wrote:
>As someone pointed out, CHAR(1) costs 5 bytes.

Jason, you already found out, that char(1) uses

>4 bytes(stored string length) + 1 data in the case of char(1).

All char(n) types are stored on 4 byte boundaries, so char(1) costs up
to 8 bytes.

>But in returning to my question, what should I be using for char(1) to
>char(4) storage?

Instead of char(1) you can use the Postgres specific type "char"
(*with* the quotes) which only needs one byte.

For char(n) a little toy has been posted to -hackers half a year ago
(cf. discussion around
http://archives.postgresql.org/pgsql-hackers/2002-10/msg00501.php).
Feel free to play around with it.  Though I wouldn't take the trouble
if only a few thousand rows are involved...

Servus
 Manfred


Re: Single Byte values

From
Jason Hihn
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Manfred Koizar
> Sent: Thursday, April 03, 2003 5:53 PM
> To: Jason Hihn
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Single Byte values
>
>
> On Thu, 03 Apr 2003 13:35:24 -0500, Jason Hihn
> <jhihn@paytimepayroll.com> wrote:
> >As someone pointed out, CHAR(1) costs 5 bytes.
>
> Jason, you already found out, that char(1) uses
>
> >4 bytes(stored string length) + 1 data in the case of char(1).
>
> All char(n) types are stored on 4 byte boundaries, so char(1) costs up
> to 8 bytes.

Is this true for memory AND disk, or just memory?

> >But in returning to my question, what should I be using for char(1) to
> >char(4) storage?
>
> Instead of char(1) you can use the Postgres specific type "char"
> (*with* the quotes) which only needs one byte.

So I assume use of this is 'safe'? Any degridation on performance?

It appears that this datatype is also array compatible, CREATE TABLE test3(a
"char"[4]); works!

I guess if space is so important, I could also just store
(char[0]<<8|char[1]) into a smallint...

> For char(n) a little toy has been posted to -hackers half a year ago
> (cf. discussion around
> http://archives.postgresql.org/pgsql-hackers/2002-10/msg00501.php).
> Feel free to play around with it.  Though I wouldn't take the trouble
> if only a few thousand rows are involved...
>
> Servus
>  Manfred
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>


Re: Single Byte values

From
Manfred Koizar
Date:
On Fri, 04 Apr 2003 09:13:54 -0500, Jason Hihn
<jhihn@paytimepayroll.com> wrote:
>> All char(n) types are stored on 4 byte boundaries, so char(1) costs up
>> to 8 bytes.
>
>Is this true for memory AND disk, or just memory?

Both.

>> Instead of char(1) you can use the Postgres specific type "char"
>> (*with* the quotes) which only needs one byte.
>
>So I assume use of this is 'safe'? Any degridation on performance?

I don't expect performance degradation;  this should rather improve
performance because with smaller tuples you get more tuples per page
and so pages have to be read/written less frequently.  AFAICT from my
tests, if your system is IO bound, then a reduction of tuple size
translates directly to performance improvement, e.g. 5% tuple size
reduction results more or less in 5% better throughput;  might be
less, if your usage pattern is CPU bound;  might be more, if you're
lucky and your working set jumps from just a bit more than available
cache to just a bit less.

Servus
 Manfred


Re: Single Byte values

From
Antti Haapala
Date:
> So I assume use of this is 'safe'? Any degridation on performance?

Hmm.. I believe it isn't standard...

> It appears that this datatype is also array compatible, CREATE TABLE
test3(a
> "char"[4]); works!
>
> I guess if space is so important, I could also just store
> (char[0]<<8|char[1]) into a smallint...

It's often better use int4(8) than char(1) as (primary) keys.

pros: int4 uses _less_ space, not affected by locales, operations are
surely -faster-. Larger range of values. I believe even int8 is faster
than char(1) and not many applications are ever going to exhaust 2^63
keyvalues.

cons: integer values often aren't quite mnemonic.

--
Antti Haapala


Re: Single Byte values

From
Joel Rees
Date:
> It's often better use int4(8) than char(1) as (primary) keys.

A thought from a rank newb, how about using the a misnamed ascii(), and
storing the integer code for the characters you want to use as short
keys in integer fields?

--
Joel Rees <joel@alpsgiken.gr.jp>