Good day, everyone.
This is just a proposal.
Main goal: allow compressed TOAST values to be seekable.
Since every chunk is compressed separately, toast_fetch_datum_slice can
fetch each slice separately, as with EXTERNAL storage.
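
To illustrate the idea, here is a minimal sketch in Python. It is not the
patch itself: zlib stands in for pglz, and CHUNK_SIZE, compress_chunks and
fetch_slice are hypothetical names chosen for the example. It only shows
that a slice can be read by decompressing the chunks it overlaps.

```python
import zlib

CHUNK_SIZE = 2000  # assumption: uncompressed bytes per chunk, not the patch's value


def compress_chunks(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size pieces and compress each piece on its own."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def fetch_slice(chunks, offset, length, chunk_size=CHUNK_SIZE):
    """Return data[offset:offset+length], decompressing only the chunks it touches."""
    if length <= 0 or not chunks:
        return b""
    first = offset // chunk_size
    last = min((offset + length - 1) // chunk_size, len(chunks) - 1)
    # Chunks outside first..last are never decompressed.
    piece = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return piece[start:start + length]
```

Because the chunk boundaries are fixed in uncompressed bytes, the chunk
numbers follow from the slice offset by plain division, the same way as
for EXTERNAL storage.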
The attached patch adds a couple of new column storage types:
- EXTSEEKABLE - like EXTERNAL, but every chunk is compressed separately,
- SEEKABLE - a mix of MAIN and EXTSEEKABLE, i.e. values smaller than 2k
  are stored as with MAIN, and larger ones as with EXTSEEKABLE.
I tested it on the PostgreSQL source code (tables with filename and
content):
EXTENDED storage: 1296k + 15032k = 16328k
EXTERNAL storage: 728k + 44552k = 45280k
EXTSEEKABLE: 728k + 23096k = 23824k
SEEKABLE: 768k + 23072k = 23640k
The patch is not complete: the toast pointer looks like that of an
uncompressed value, so toast_datum_size (and therefore pg_column_size)
reports the uncompressed size of the datum.
And of course it is just a POC, since a better scheme could exist.
For example, an improved approach could be:
- modify the compression function so that it can stop once it has
  produced the desired amount of compressed data,
- instead of (oid, counter, chunk) use (oid, offset_in_uncompressed,
  chunk) for the toast tuple, so that it can be located quickly,
- using the modified compression function, make chunks close to the
  current 2k limit after compression, but compressed separately, and
  insert them with their offset in the uncompressed varlena.
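
The scheme above can be sketched roughly as follows, again in Python with
zlib as a stand-in for pglz. Everything here is an assumption made for the
example: TARGET, fit_prefix, build_chunks and fetch_slice are hypothetical
names, and since zlib cannot stop at a given output size, fit_prefix
imitates that by binary search over the input length.

```python
import bisect
import zlib

TARGET = 2000  # assumption: limit on the compressed size of one chunk, in bytes


def fit_prefix(data, start, target=TARGET):
    """Pick a length n so that data[start:start+n] compresses to <= target bytes.

    Binary search assumes compressed size grows with input length, which is
    only roughly true, so n is close to the largest fitting length, not
    guaranteed to be the largest.
    """
    lo, hi = 1, len(data) - start
    if len(zlib.compress(data[start:start + hi])) <= target:
        return hi
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(zlib.compress(data[start:start + mid])) <= target:
            lo = mid
        else:
            hi = mid - 1
    return lo


def build_chunks(data, target=TARGET):
    """Return parallel lists: uncompressed start offsets and compressed chunks."""
    offsets, chunks = [], []
    pos = 0
    while pos < len(data):
        n = fit_prefix(data, pos, target)
        offsets.append(pos)
        chunks.append(zlib.compress(data[pos:pos + n]))
        pos += n
    return offsets, chunks


def fetch_slice(offsets, chunks, offset, length):
    """Locate chunks by uncompressed offset and decompress only those."""
    if length <= 0 or not chunks:
        return b""
    # Last chunk starting at or before offset.
    first = max(bisect.bisect_right(offsets, offset) - 1, 0)
    # Last chunk starting before the end of the slice.
    last = bisect.bisect_left(offsets, offset + length) - 1
    piece = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - offsets[first]
    return piece[start:start + length]
```

The sorted offsets list plays the role of an index on
(oid, offset_in_uncompressed): chunks no longer have a fixed uncompressed
size, so the lookup is a range search instead of a division.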
Another improvement could be building a dictionary common to all chunks
and storing it in the chunk numbered -1.
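
A rough sketch of the dictionary idea, using zlib's preset-dictionary
support (the zdict argument) in place of anything pglz-specific. The names
DICT_CHUNK, compress_with_dict and read_chunk are hypothetical, and how the
dictionary would actually be built is left open here: it is simply passed
in.

```python
import zlib

CHUNK_SIZE = 2000  # assumption: uncompressed bytes per chunk
DICT_CHUNK = -1    # chunk number reserved for the shared dictionary


def compress_with_dict(data, dictionary, chunk_size=CHUNK_SIZE):
    """Map chunk number -> stored bytes; the dictionary itself goes into chunk -1."""
    stored = {DICT_CHUNK: dictionary}
    for seq, i in enumerate(range(0, len(data), chunk_size)):
        # Each chunk gets a fresh compressor primed with the same dictionary,
        # so chunks stay independently decompressible.
        comp = zlib.compressobj(zdict=dictionary)
        stored[seq] = comp.compress(data[i:i + chunk_size]) + comp.flush()
    return stored


def read_chunk(stored, seq):
    """Decompress one chunk; only chunk -1 and chunk seq are needed."""
    decomp = zlib.decompressobj(zdict=stored[DICT_CHUNK])
    return decomp.decompress(stored[seq]) + decomp.flush()
```

The cost is one extra chunk fetch per slice read, in exchange for letting
small chunks share the strings they have in common.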
PS. An interesting result with a tsvector of the source code:
EXTENDED: 896k + 16144k = 17040k
EXTERNAL: 896k + 16248k = 17144k
EXTSEEKABLE: 896k + 15792k = 16688k
SEEKABLE: 952k + 15752k = 16704k
So, a) it looks like tsvector is almost incompressible (so the default
storage should probably be EXTERNAL), and b) it compresses better by
chunks.
--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company