Thread: About varlena2

About varlena2

From
Qingqing Zhou
Date:
The idea is to reduce the size of varlena2.vl_len to int16. This has been
mentioned before, but is there any show-stopper reason preventing us from
doing that, or has somebody been working on it?

Sorry, just to repeat myself. Char types will benefit from that. Many
applications are ported from DB2, Oracle or SQL Server:
       Max Char Length
DB2         32672
SQL         8000
Oracle      4000

All of the above need only varlena2. To support bigger char types, we could
follow the traditional "long varchar" naming, etc. Or we can introduce several
new data types like "short varchar" to stay compatible with previous
PostgreSQL applications.
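
As a rough arithmetic check of that claim, here is a minimal sketch. It
assumes the limits above are in bytes and that a 2-byte vl_len would count
its own header, as the current 4-byte vl_len does. The names VARLENA2_HDRSZ
and fits_in_varlena2 are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical: size of a 2-byte length header, counted inside the length. */
#define VARLENA2_HDRSZ 2

/* Returns 1 if a value of max_bytes data bytes plus the header
 * still fits in a signed 16-bit length field, 0 otherwise. */
static int fits_in_varlena2(long max_bytes)
{
    return max_bytes + VARLENA2_HDRSZ <= INT16_MAX;
}
```

Under these assumptions the largest limit in the list, 32672, leaves a little
headroom below INT16_MAX (32767).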

Regards,
Qingqing



Re: About varlena2

From
Tom Lane
Date:
Qingqing Zhou <zhouqq@cs.toronto.edu> writes:
> To reduce size of varlen2.vl_len to int16. This has been mentioned before,
> but is there any show-stopper reasoning preventing us from doing that or
> somebody has been working on it?

> Sorry, just to repeat myself. Char types will benefit from that.

I have considerably less than zero interest in creating variant char
types with an int16 header.  The proposal that was on the table was
to use this for numeric and inet types, where it could be done without
introducing any user-visible semantics changes.
        regards, tom lane


Re: About varlena2

From
ITAGAKI Takahiro
Date:
Qingqing Zhou <zhouqq@cs.toronto.edu> wrote:

> To reduce size of varlen2.vl_len to int16. This has been mentioned before,
> but is there any show-stopper reasoning preventing us from doing that or
> somebody has been working on it?

Hi, I'm rewriting the patch that I proposed before.
(http://archives.postgresql.org/pgsql-hackers/2005-09/msg00421.php)
This is another way to reduce the size of variable length types,
using variable length headers.

I'm sure that there are pros and cons of this approach.
Pros.
 - Optimized for short variables (length <= 127),
   where the header takes only one byte.
 - It can represent long data.
Cons.
 - More complexity and operations to extract lengths and buffers.
 - Needs more work to support TOAST.

To support TOAST, I am thinking of the following representations.
It might be good to use only A and B, if TOAST is not needed.
  | Representation             | Size  | Mode                |
--+----------------------------+-------+---------------------+
A | 0*******            + data | 1 + n | length <= 127       |
B | 10****** +  1 byte  + data | 2 + n | length <= 16K -1    |
C | 110----- +  4 bytes + data | 5 + n | length <= 4G  -1    |
D | 1110---- +  6 bytes + data | 7 + n | Compressed          |
E | 11110--- + 12 bytes        | 13    | External            |
F | 11111--- + 16 bytes        | 17    | External+Compressed |
('*' bits are used for length, '-' are unused.)
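
For illustration only (this is not the patch itself), representations A to C
could be encoded and decoded roughly as follows. The helper names vlh_encode
and vlh_decode are hypothetical, and the byte order of the extra length bytes
(most significant first) is an assumption, since the table does not specify it.
Forms D to F are not handled in this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes a header for a data length of len bytes into p.
 * Returns the header size: 1 (form A), 2 (form B) or 5 (form C). */
static size_t vlh_encode(uint8_t *p, uint32_t len)
{
    if (len <= 0x7F) {                      /* A: 0******* */
        p[0] = (uint8_t) len;
        return 1;
    }
    if (len <= 0x3FFF) {                    /* B: 10****** + 1 byte */
        p[0] = (uint8_t) (0x80 | (len >> 8));
        p[1] = (uint8_t) (len & 0xFF);
        return 2;
    }
    p[0] = 0xC0;                            /* C: 110----- + 4 bytes */
    p[1] = (uint8_t) (len >> 24);
    p[2] = (uint8_t) (len >> 16);
    p[3] = (uint8_t) (len >> 8);
    p[4] = (uint8_t) len;
    return 5;
}

/* Reads a header from p and stores the data length in *len.
 * Returns the header size, or 0 for forms D-F (compressed or external),
 * which this sketch does not interpret. */
static size_t vlh_decode(const uint8_t *p, uint32_t *len)
{
    uint8_t b = p[0];

    if ((b & 0x80) == 0x00) {               /* A: 7 length bits */
        *len = b;
        return 1;
    }
    if ((b & 0xC0) == 0x80) {               /* B: 6 + 8 length bits */
        *len = ((uint32_t) (b & 0x3F) << 8) | p[1];
        return 2;
    }
    if ((b & 0xE0) == 0xC0) {               /* C: 32 length bits */
        *len = ((uint32_t) p[1] << 24) | ((uint32_t) p[2] << 16) |
               ((uint32_t) p[3] << 8) | (uint32_t) p[4];
        return 5;
    }
    return 0;
}
```

The decoder has to branch on the first byte before it knows where the data
starts, which is the extra cost listed under Cons.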

Comments welcome,
---
ITAGAKI Takahiro
NTT Cyber Space Laboratories