Variable length array element encoding… - Mailing list pgsql-hackers

From Sean Chittenden
Subject Variable length array element encoding…
Date
Msg-id 0938071E-E4B8-4466-BDD5-AEB2FAB40397@chittenden.org
Whole thread Raw
List pgsql-hackers
[ Not subscribed, please keep me in the CC list ]

Is there a standard idiom for encoding small variable length data in an array? I wrote the varint extension[1] that
encodesdata using a variable width encoding scheme[2] for signed and unsigned integers[3]. Right now the extension is
mostlyof use in skinny tables that have at least 4-5 columns, all of which are of INT or INT8. If you have only 5
columnsof INT8, you can save ~50% of your table space. 

But, to get larger savings, it's required to bypass the tuple overhead and aggregating data in to an array (i.e.
aggregateall time series data for a 5min window of time in to a single varuint[]). 

The problem with that being, each varint takes 8 bytes in an array because of padding and alignment. Is there a way to
preventthat, or, more realistically, are there standard ways of encoding this data in to a BYTEA and then manually
scanningand unpacking the data? Random access in to the array isn't a concern. I was thinking about adding a BYTEA to
varint[]cast, but am fishing for a better idea. 

Any hints or thoughts? Thanks in advance. -sc


[1] https://github.com/sean-/postgresql-varint

[2] SELECT varint64,  pg_column_size(varint64) FROM varint64_table ORDER BY varint64 ASC;
       varint64       |  pg_column_size
----------------------+-----------------
 -4611686018427387905 |              11
 -4611686018427387904 |              10
 -36028797018963969   |              10
 -36028797018963968   |               9
 -281474976710657     |               9
 -281474976710656     |               8
 -2199023255553       |               8
 -2199023255552       |               7
 -17179869185         |               7
 -17179869184         |               6
 -134217729           |               6
 -134217728           |               5
 -1048577             |               5
 -1048576             |               4
 -8193                |               4
 -8192                |               3
 -65                  |               3
 -64                  |               2
 -1                   |               2
 0                    |               2
 1                    |               2
 63                   |               2
 64                   |               3
 8191                 |               3
 8192                 |               4
 1048575              |               4
 1048576              |               5
 134217727            |               5
 134217728            |               6
 17179869183          |               6
 17179869184          |               7
 2199023255551        |               7
 2199023255552        |               8
 281474976710655      |               8
 281474976710656      |               9
 36028797018963967    |               9
 36028797018963968    |              10
 4611686018427387903  |              10
 4611686018427387904  |              11
(39 rows)

SELECT varuint64,  pg_column_size(varint64) FROM varuint64_table ORDER BY varint64 ASC;
      varuint64      |  pg_column_size
---------------------+-----------------
 0                   |               2
 127                 |               2
 128                 |               3
 16383               |               3
 16384               |               4
 2097151             |               4
 2097152             |               5
 268435455           |               5
 268435456           |               6
 34359738367         |               6
 34359738368         |               7
 4398046511103       |               7
 4398046511104       |               8
 562949953421311     |               8
 562949953421312     |               9
 72057594037927935   |               9
 72057594037927936   |              10
 9223372036854775807 |              10

[3] I know the unsigned int only goes up to 2^^63 atm, it will go to 2^^64 once I get around to setting up a test
methodology.Using INT8 internally was too convenient at the time. 

--
Sean Chittenden
sean@chittenden.org




pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Proof of concept: standalone backend with full FE/BE protocol
Next
From: Robert Haas
Date:
Subject: Re: Memory leaks in record_out and record_send