Thread: Single Byte values
As someone pointed out, CHAR(1) costs 5 bytes. Is there a better way to store it? I can see one or three bytes as being bad (byte-alignment issues and speed) Two could be ok, but at the very least 4 should be the max for char(1). Two is ok, once you consider multi-bytes strings (unicode) Three is right out - Four bytes could store two multibyte, two single byte (cast as 2 byte for unicode) or four single byte characters. But in returning to my question, what should I be using for char(1) to char(4) storage? Thanks again!
On Thu, 3 Apr 2003, Jason Hihn wrote: > As someone pointed out, CHAR(1) costs 5 bytes. Is there a better way to > store it? I can see one or three bytes as being bad (byte-alignment issues > and speed) Two could be ok, but at the very least 4 should be the max for > char(1). > > Two is ok, once you consider multi-bytes strings (unicode) > Three is right out - > Four bytes could store two multibyte, two single byte (cast as 2 byte for > unicode) or four single byte characters. > > But in returning to my question, what should I be using for char(1) to > char(4) storage? > > Thanks again! I say you should use char(1) or whatever if that's what your requirement is. I'm not sure on the storage details but I doubt using text type will save anything. I also suspect the 5 byte cost isn't just the data but column overhead as well. I think the person saying not to use it is really saying why limit yourself to 1 character when for similar cost you can get 1 character _and_ the ability to get more than 1 character. However, this comes back to my point, if 1 character is what you need then use a char(1). Unfortunately, the example that jumped to my mind of vehicle registration year tags doesn't work, somewhere along the line that part of registration numbers became 2 letters instead of 1, which would really come back and haunt you if you'd done your database to be char(1) and had been amassing data for years before the change. -- Nigel J. Andrews
> -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Nigel J. Andrews > Sent: Thursday, April 03, 2003 4:02 PM > To: Jason Hihn > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Single Byte values ... > I say you should use char(1) or whatever if that's what your requirement > is. I'm not sure on the storage details but I doubt using text > type will save > anything. > > I also suspect the 5 byte cost isn't just the data but column overhead as > well. I think the person saying not to use it is really saying why limit > yourself to 1 character when for similar cost you can get 1 ... 4 bytes(stored string length) + 1 data in the case of char(1). Well this is a key for an enumeration, there are only a handful of values, but thousands of records. I could do it via CHECK(a='a' || a='b' || a='c') BUT I much rather dump that enumeration off to a table so to add a letter later only requires INSERT INTO _table VALUES('d'); Furthermore, storing to off to a table allows a description of user-friendly views of the data. Join on the column and you get an English (or whatever your language) description. (Incidentally, it's not bad to change the schema to: create table _table a char(1), lang char(2), desc text primary key(a)); Where you can support multiple languages. Incententally the typical representation is a list or drop-down box. Now you can populate it with what's in _table add a REFERENCES _table(a), and you have an easily extensible system that the can add values too as well. > instead of 1, which would really come back and haunt you if you'd > done your > database to be char(1) and had been amassing data for years > before the change. It could, but as it stands, I only have 10 or so for an application that's been around for 15 years, and additions are rare. Even so this makes them trivially easy... The decision to use a packed type over a vector type should lie with the DB designer.
On Thu, 03 Apr 2003 13:35:24 -0500, Jason Hihn <jhihn@paytimepayroll.com> wrote: >As someone pointed out, CHAR(1) costs 5 bytes. Jason, you already found out, that char(1) uses >4 bytes(stored string length) + 1 data in the case of char(1). All char(n) types are stored on 4 byte boundaries, so char(1) costs up to 8 bytes. >But in returning to my question, what should I be using for char(1) to >char(4) storage? Instead of char(1) you can use the Postgres specific type "char" (*with* the quotes) which only needs one byte. For char(n) a little toy has been posted to -hackers half a year ago (cf. discussion around http://archives.postgresql.org/pgsql-hackers/2002-10/msg00501.php). Feel free to play around with it. Though I wouldn't take the trouble if only a few thousand rows are involved... Servus Manfred
> -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Manfred Koizar > Sent: Thursday, April 03, 2003 5:53 PM > To: Jason Hihn > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Single Byte values > > > On Thu, 03 Apr 2003 13:35:24 -0500, Jason Hihn > <jhihn@paytimepayroll.com> wrote: > >As someone pointed out, CHAR(1) costs 5 bytes. > > Jason, you already found out, that char(1) uses > > >4 bytes(stored string length) + 1 data in the case of char(1). > > All char(n) types are stored on 4 byte boundaries, so char(1) costs up > to 8 bytes. Is this true for memory AND disk, or just memory? > >But in returning to my question, what should I be using for char(1) to > >char(4) storage? > > Instead of char(1) you can use the Postgres specific type "char" > (*with* the quotes) which only needs one byte. So I assume use of this is 'safe'? Any degridation on performance? It appears that this datatype is also array compatible, CREATE TABLE test3(a "char"[4]); works! I guess if space is so important, I could also just store (char[0]<<8|char[1]) into a smallint... > For char(n) a little toy has been posted to -hackers half a year ago > (cf. discussion around > http://archives.postgresql.org/pgsql-hackers/2002-10/msg00501.php). > Feel free to play around with it. Though I wouldn't take the trouble > if only a few thousand rows are involved... > > Servus > Manfred > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
On Fri, 04 Apr 2003 09:13:54 -0500, Jason Hihn <jhihn@paytimepayroll.com> wrote: >> All char(n) types are stored on 4 byte boundaries, so char(1) costs up >> to 8 bytes. > >Is this true for memory AND disk, or just memory? Both. >> Instead of char(1) you can use the Postgres specific type "char" >> (*with* the quotes) which only needs one byte. > >So I assume use of this is 'safe'? Any degridation on performance? I don't expect performance degradation; this should rather improve performance because with smaller tuples you get more tuples per page and so pages have to be read/written less frequently. AFAICT from my tests, if your system is IO bound, then a reduction of tuple size translates directly to performance improvement, e.g. 5% tuple size reduction results more or less in 5% better throughput; might be less, if your usage pattern is CPU bound; might be more, if you're lucky and your working set jumps from just a bit more than available cache to just a bit less. Servus Manfred
> So I assume use of this is 'safe'? Any degridation on performance? Hmm.. I believe it isn't standard... > It appears that this datatype is also array compatible, CREATE TABLE test3(a > "char"[4]); works! > > I guess if space is so important, I could also just store > (char[0]<<8|char[1]) into a smallint... It's often better use int4(8) than char(1) as (primary) keys. pros: int4 uses _less_ space, not affected by locales, operations are surely -faster-. Larger range of values. I believe even int8 is faster than char(1) and not many applications are ever going to exhaust 2^63 keyvalues. cons: integer values often aren't quite mnemonic. -- Antti Haapala
> It's often better use int4(8) than char(1) as (primary) keys. A thought from a rank newb, how about using the a misnamed ascii(), and storing the integer code for the characters you want to use as short keys in integer fields? -- Joel Rees <joel@alpsgiken.gr.jp>