Thread: Unclear CHARACTER specification

Unclear CHARACTER specification

From
Lyle
Date:
Hi,
   Here:
http://www.postgresql.org/docs/9.2/interactive/datatype-character.html

There isn't really clear guidance on ranges and limits. All we have to
go on is this paragraph:

=start
The storage requirement for a short string (up to 126 bytes) is 1 byte
plus the actual string, which includes the space padding in the case of
character. Longer strings have 4 bytes of overhead instead of 1. Long
strings are compressed by the system automatically, so the physical
requirement on disk might be less. Very long values are also stored in
background tables so that they do not interfere with rapid access to
shorter column values. In any case, the longest possible character
string that can be stored is about 1 GB. (The maximum value that will be
allowed for n in the data type declaration is less than that. It
wouldn't be useful to change this because with multibyte character
encodings the number of characters and bytes can be quite different. If
you desire to store long strings with no specific upper limit, use text
or character varying without a length specifier, rather than making up
an arbitrary length limit.)
=end

 From this I can deduce that if I want to keep the character string as a
short string CHAR(126) should achieve that, for single byte character
encodings at least.

Then it says about long and very long strings, but doesn't give any
figures. Nor does it say whether it decides a string is long or very
long before or after the compression. There must be byte length figures
associated with this, please could we have them?

Why is the longest possible string about 1GB? Is that compressed or
uncompressed? Why can't we have an exact figure? What compression is
used? If the amount allowed for n in the data type definition is less,
what is it?

I can't imagine the PostgreSQL code has something like
if ( length( string ) + RAND() > 1GB ) {
error();
}
Seeing how short, long, and very long strings have been mentioned, along
with compression, why not complete this description with specific figures?

Is there other documentation available that covers this?

Sorry for all the questions, but I really thing all of this should be
covered already. Even if it's platform specific, a table with platforms
and figures would do the job nicely.


Lyle



Re: Unclear CHARACTER specification

From
"Kevin Grittner"
Date:
Lyle wrote:

> Is there other documentation available that covers this?

You might want to read this page:

http://www.postgresql.org/docs/9.2/static/storage-toast.html

Note that there is seldom a good reason to use char(n) in
PostgreSQL for any value of n. As the page you cited mentions,
"While character(n) has performance advantages in some other
database systems, there is no such advantage in PostgreSQL; in fact
character(n) is usually the slowest of the three because of its
additional storage costs. In most situations text or character
varying should be used instead."

Don't try to micro-optimize by breaking what is logically a larger
string into 126 character pieces; that will defeat copression, take
more space to store, and add processing overhead. In other words,
such an attempt will almost certainly backfire.

-Kevin