Variable length array element encoding… - Mailing list pgsql-hackers
From | Sean Chittenden |
---|---|
Subject | Variable length array element encoding… |
Date | |
Msg-id | 0938071E-E4B8-4466-BDD5-AEB2FAB40397@chittenden.org Whole thread Raw |
List | pgsql-hackers |
[ Not subscribed, please keep me in the CC list ] Is there a standard idiom for encoding small variable length data in an array? I wrote the varint extension[1] that encodesdata using a variable width encoding scheme[2] for signed and unsigned integers[3]. Right now the extension is mostlyof use in skinny tables that have at least 4-5 columns, all of which are of INT or INT8. If you have only 5 columnsof INT8, you can save ~50% of your table space. But, to get larger savings, it's required to bypass the tuple overhead and aggregating data in to an array (i.e. aggregateall time series data for a 5min window of time in to a single varuint[]). The problem with that being, each varint takes 8 bytes in an array because of padding and alignment. Is there a way to preventthat, or, more realistically, are there standard ways of encoding this data in to a BYTEA and then manually scanningand unpacking the data? Random access in to the array isn't a concern. I was thinking about adding a BYTEA to varint[]cast, but am fishing for a better idea. Any hints or thoughts? Thanks in advance. -sc [1] https://github.com/sean-/postgresql-varint [2] SELECT varint64, pg_column_size(varint64) FROM varint64_table ORDER BY varint64 ASC; varint64 | pg_column_size ----------------------+----------------- -4611686018427387905 | 11 -4611686018427387904 | 10 -36028797018963969 | 10 -36028797018963968 | 9 -281474976710657 | 9 -281474976710656 | 8 -2199023255553 | 8 -2199023255552 | 7 -17179869185 | 7 -17179869184 | 6 -134217729 | 6 -134217728 | 5 -1048577 | 5 -1048576 | 4 -8193 | 4 -8192 | 3 -65 | 3 -64 | 2 -1 | 2 0 | 2 1 | 2 63 | 2 64 | 3 8191 | 3 8192 | 4 1048575 | 4 1048576 | 5 134217727 | 5 134217728 | 6 17179869183 | 6 17179869184 | 7 2199023255551 | 7 2199023255552 | 8 281474976710655 | 8 281474976710656 | 9 36028797018963967 | 9 36028797018963968 | 10 4611686018427387903 | 10 4611686018427387904 | 11 (39 rows) SELECT varuint64, pg_column_size(varint64) FROM varuint64_table ORDER BY varint64 ASC; varuint64 | pg_column_size ---------------------+----------------- 0 | 2 127 | 2 128 | 3 16383 | 3 16384 | 4 2097151 | 4 2097152 | 5 268435455 | 5 268435456 | 6 34359738367 | 6 34359738368 | 7 4398046511103 | 7 4398046511104 | 8 562949953421311 | 8 562949953421312 | 9 72057594037927935 | 9 72057594037927936 | 10 9223372036854775807 | 10 [3] I know the unsigned int only goes up to 2^^63 atm, it will go to 2^^64 once I get around to setting up a test methodology.Using INT8 internally was too convenient at the time. -- Sean Chittenden sean@chittenden.org
pgsql-hackers by date: