[HACKERS] Seekable compressed TOAST [POC] - Mailing list pgsql-hackers

From Sokolov Yura
Subject [HACKERS] Seekable compressed TOAST [POC]
Msg-id dea9d6cc4cdabc609bede5ef6677adef@postgrespro.ru
List pgsql-hackers
Good day, everyone.

This is just a proposal.
The main goal is to make compressed TOAST values seekable.
Since every chunk is compressed separately, toast_fetch_datum_slice can
fetch only the chunks a slice needs, just as it does with EXTERNAL storage.
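Below is a minimal Python sketch of the idea. It assumes zlib as a stand-in for pglz and a hypothetical CHUNK_SIZE in place of TOAST_MAX_CHUNK_SIZE. store_seekable and fetch_slice are illustrative names, not functions from the patch. Each chunk covers a fixed range of the uncompressed value, so the chunks that overlap a slice can be computed directly:

```python
import zlib

# Hypothetical: plays the role of TOAST_MAX_CHUNK_SIZE (about 2k).
CHUNK_SIZE = 2000


def store_seekable(value):
    """Split value into fixed-size pieces and compress each one on its own.

    Keys play the role of chunk_seq in the TOAST table.
    """
    return {
        seq: zlib.compress(value[off:off + CHUNK_SIZE])
        for seq, off in enumerate(range(0, len(value), CHUNK_SIZE))
    }


def fetch_slice(chunks, offset, length):
    """Return value[offset:offset + length], decompressing only the
    chunks that overlap the requested range."""
    if length <= 0:
        return b""
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    data = b"".join(
        zlib.decompress(chunks[seq])
        for seq in range(first, last + 1)
        if seq in chunks  # a range past the end simply yields fewer bytes
    )
    start = offset - first * CHUNK_SIZE
    return data[start:start + length]
```

For example, fetch_slice(chunks, 4500, 3000) decompresses only chunks 2 and 3. A value compressed as a whole (EXTENDED) has to be decompressed from its beginning instead.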

The attached patch adds a couple of new column storage types:
- EXTSEEKABLE: like EXTERNAL, but every chunk is compressed separately;
- SEEKABLE: a mix of MAIN and EXTSEEKABLE, i.e. values smaller than 2k act
   as MAIN storage, and larger ones as EXTSEEKABLE.
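Here is a Python sketch of how a SEEKABLE column might choose a strategy for each value. The 2k threshold and chunk size are hypothetical constants, and zlib again stands in for pglz. The sketch ignores the fact that MAIN values can still be moved out of line when a row does not fit. store_seekable_column is an illustrative name.

```python
import zlib

SEEKABLE_THRESHOLD = 2000  # hypothetical stand-in for the ~2k limit
CHUNK_SIZE = 2000          # hypothetical chunk size for out-of-line values


def store_seekable_column(value):
    """Return (kind, payload) for one value of a SEEKABLE column."""
    if len(value) < SEEKABLE_THRESHOLD:
        # MAIN-like: keep the value inline, compressed only if that helps.
        compressed = zlib.compress(value)
        if len(compressed) < len(value):
            return "inline_compressed", compressed
        return "inline_plain", value
    # EXTSEEKABLE-like: out of line, every chunk compressed separately.
    chunks = [
        zlib.compress(value[off:off + CHUNK_SIZE])
        for off in range(0, len(value), CHUNK_SIZE)
    ]
    return "toast_chunks", chunks
```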

I tested it with the PostgreSQL source code (a table of file names and
contents):
EXTENDED storage: 1296k + 15032k = 16328k
EXTERNAL storage:  728k + 44552k = 45280k
EXTSEEKABLE:       728k + 23096k = 23824k
SEEKABLE:          768k + 23072k = 23840k


The patch is not complete: the toast pointer looks like an uncompressed
one, so toast_datum_size (and therefore pg_column_size) reports the
uncompressed size of the datum.
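The following is a simplified Python model of why this happens. It assumes the PostgreSQL 10 behaviour of varatt_external: a pointer counts as compressed only if va_extsize < va_rawsize - VARHDRSZ, and toast_datum_size() returns va_extsize for an on-disk external datum. The class and helper names are hypothetical, and zlib stands in for pglz.

```python
from dataclasses import dataclass
import zlib

VARHDRSZ = 4  # 4-byte varlena length header


@dataclass
class ToastPointer:
    """Simplified model of struct varatt_external."""
    va_rawsize: int  # original size, including the varlena header
    va_extsize: int  # size of the data as stored in the TOAST table


def is_compressed(p):
    # Mirrors the VARATT_EXTERNAL_IS_COMPRESSED test.
    return p.va_extsize < p.va_rawsize - VARHDRSZ


def reported_size(p):
    # What toast_datum_size() reports for an on-disk external datum.
    return p.va_extsize


def stored_bytes(chunks):
    # Bytes the separately compressed chunks really occupy.
    return sum(len(c) for c in chunks)


# Example: a pointer that "looks uncompressed", as described above.
payload = b"x" * 10000
chunks = [zlib.compress(payload[o:o + 2000]) for o in range(0, len(payload), 2000)]
pointer = ToastPointer(va_rawsize=len(payload) + VARHDRSZ, va_extsize=len(payload))
```

In this model, storing the compressed total in va_extsize would make is_compressed() true. The real detoast path would then presumably try to pglz-decompress the value as one blob, so seekable values would need some other marker.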

And of course it is just a POC, because a better scheme could exist.

For example, an improved approach could be:
- modify the compression function so that it can stop once it has produced
   the desired amount of compressed data;
- instead of (oid, counter, chunk), use (oid, offset_in_uncompressed, chunk)
   for the toast tuple, so that a chunk can be located quickly;
- using the modified compression function, make chunks close to the current
   2k limit after compression (each chunk still compressed separately), and
   insert them with their offset in the uncompressed varlena.
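Below is a rough Python sketch of this scheme. The proposal is to stop the compressor once it has produced enough output. zlib has no such mode, so this sketch substitutes a slow binary search over the input length. TARGET, compress_up_to, store_by_offset and fetch_slice are hypothetical names, and zlib stands in for pglz. Keying rows by their uncompressed start offset lets a reader find the first needed chunk with a binary search:

```python
import bisect
import zlib

# Hypothetical target size of one compressed chunk (close to the ~2k limit).
TARGET = 2000


def compress_up_to(data, start, target):
    """Find a long prefix data[start:end] whose compressed form fits in target.

    Stand-in for a compressor that stops after `target` output bytes: binary
    search over the input length.  Compressed size is not strictly monotonic,
    so the prefix is long but not guaranteed to be the longest.  Assumes
    start < len(data) and that a single byte always fits (about 9 bytes).
    """
    lo, hi = 1, len(data) - start
    best = zlib.compress(data[start:start + lo])
    while lo < hi:
        mid = (lo + hi + 1) // 2
        candidate = zlib.compress(data[start:start + mid])
        if len(candidate) <= target:
            lo, best = mid, candidate
        else:
            hi = mid - 1
    return start + lo, best


def store_by_offset(value):
    """Rows of (offset_in_uncompressed, compressed_chunk), sorted by offset."""
    rows, off = [], 0
    while off < len(value):
        end, chunk = compress_up_to(value, off, TARGET)
        rows.append((off, chunk))
        off = end
    return rows


def fetch_slice(rows, offset, length):
    """Return value[offset:offset + length] from offset-keyed rows."""
    if not rows or length <= 0:
        return b""
    offsets = [o for o, _ in rows]
    # Last chunk starting at or before `offset`.
    i = max(bisect.bisect_right(offsets, offset) - 1, 0)
    base = offsets[i]
    out = bytearray()
    while i < len(rows) and len(out) < offset - base + length:
        out += zlib.decompress(rows[i][1])
        i += 1
    start = offset - base
    return bytes(out[start:start + length])
```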

Another improvement could be building a dictionary common to all chunks
and storing it in chunk number -1.
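Here is a minimal Python sketch of a shared dictionary. It uses zlib's preset-dictionary support (the zdict argument of zlib.compressobj and zlib.decompressobj) as a stand-in for whatever pglz would need. Using a prefix of the value as the dictionary is a naive, hypothetical choice, and so are CHUNK_SIZE and DICT_SIZE. The dictionary is stored under chunk number -1, as proposed. The sketch assumes a non-empty value.

```python
import zlib

CHUNK_SIZE = 2000  # hypothetical uncompressed chunk size
DICT_SIZE = 1024   # hypothetical dictionary size
DICT_SEQ = -1      # chunk number reserved for the dictionary


def build_dictionary(value):
    # Naive choice: a prefix of the value.  A real implementation would
    # pick substrings that occur often across chunks.
    return value[:DICT_SIZE]


def compress_chunk(piece, zdict):
    c = zlib.compressobj(zdict=zdict)
    return c.compress(piece) + c.flush()


def decompress_chunk(blob, zdict):
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


def store_with_dictionary(value):
    zdict = build_dictionary(value)
    chunks = {DICT_SEQ: zdict}
    for seq, off in enumerate(range(0, len(value), CHUNK_SIZE)):
        chunks[seq] = compress_chunk(value[off:off + CHUNK_SIZE], zdict)
    return chunks


def fetch_chunk(chunks, seq):
    # Each chunk is still decompressed on its own; only the dictionary
    # (chunk -1) is shared.
    return decompress_chunk(chunks[seq], chunks[DICT_SEQ])
```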

PS. An interesting result with a tsvector of the source code:
EXTENDED:    896k + 16144k = 17040k
EXTERNAL:    896k + 16248k = 17144k
EXTSEEKABLE: 896k + 15792k = 16688k
SEEKABLE:    952k + 15752k = 16704k

So, a) it looks like tsvector is almost incompressible (so the default
storage should probably be EXTERNAL), and b) it compresses better in chunks.
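The arithmetic behind a) and b) can be checked with a small Python snippet on the tsvector totals above. The dictionary and function names are only for illustration.

```python
# Total sizes in kB from the tsvector test above.
TOTAL_KB = {
    "EXTENDED": 17040,
    "EXTERNAL": 17144,
    "EXTSEEKABLE": 16688,
    "SEEKABLE": 16704,
}


def ratio_to_external(storage):
    """Size relative to storing the values uncompressed (EXTERNAL)."""
    return TOTAL_KB[storage] / TOTAL_KB["EXTERNAL"]


for storage in TOTAL_KB:
    print(f"{storage:<12} {ratio_to_external(storage):.3f}")
```

Whole-value compression (EXTENDED) saves only about 0.6%. Per-chunk compression saves roughly 2.6 to 2.7%.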

-- 
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

