Re: Add ZSON extension to /contrib/ - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Add ZSON extension to /contrib/
Date	May 28, 2021 10:57:07
Msg-id	b53392fa-47e3-a2e6-a8e1-6329d1d74da6@enterprisedb.com
In response to	Re: Add ZSON extension to /contrib/ (Andrew Dunstan <andrew@dunslane.net>)
List	pgsql-hackers


On 5/27/21 4:15 AM, Andrew Dunstan wrote:
> 
> On 5/26/21 5:29 PM, Bruce Momjian wrote:
>> On Tue, May 25, 2021 at 01:55:13PM +0300, Aleksander Alekseev wrote:
>>> Hi hackers,
>>>
>>> Back in 2016 while being at PostgresPro I developed the ZSON extension [1]. The
>>> extension introduces the new ZSON type, which is 100% compatible with JSONB but
>>> uses a shared dictionary of strings most frequently used in given JSONB
>>> documents for compression. These strings are replaced with integer IDs.
>>> Afterward, PGLZ (and now LZ4) applies if the document is large enough by common
>>> PostgreSQL logic. Under certain conditions (many large documents), this saves
>>> disk space, memory and increases the overall performance. More details can be
>>> found in README on GitHub.
>> I think this is interesting because it is one of the few cases that
>> allow compression outside of a single column.  Here is a list of
>> compression options:
>>
>>     https://momjian.us/main/blogs/pgblog/2020.html#April_27_2020
>>     
>>     1. single field
>>     2. across rows in a single page
>>     3. across rows in a single column
>>     4. across all columns and rows in a table
>>     5. across tables in a database
>>     6. across databases
>>
>> While standard Postgres does #1, ZSON allows 2-5, assuming the data is
>> in the ZSON data type.  I think this cross-field compression has great
>> potential for cases where the data is not relational, or hasn't had time
>> to be structured relationally.  It also opens questions of how to do
>> this cleanly in a relational system.
>>
> 
> I think we're going to get the best bang for the buck on doing 2, 3, and
> 4. If it's confined to a single table then we can put a dictionary in
> something like a fork.

Agreed.

> Maybe given partitioning we want to be able to do multi-table
> dictionaries, but that's less certain.
> 

Yeah. I think it'll have many of the same issues/complexity as global
indexes, and the gains are likely limited, at least assuming the
partitions are sufficiently large. Tiny partitions are inefficient
in general anyway, I think.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
