Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: jsonb format is pessimal for toast compression
Date
Msg-id CAM3SWZSDMkntNCG8dm-grcke_BjZ6U3sSDdMVWhpC_VXJwQ_Jw@mail.gmail.com
In response to Re: jsonb format is pessimal for toast compression  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers
On Mon, Aug 11, 2014 at 12:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think that's a good point.

I think that there may be something to be said for the current layout.
Having adjacent keys and values could take better advantage of CPU
cache characteristics. I've heard of approaches to improving B-Tree
locality that forced keys and values to be adjacent on individual
B-Tree pages [1], for example, and I've heard of this more than once.
FWIW, based on earlier research into user requirements in this area, I
believe that very large jsonb datums are not considered all that
compelling. Document database systems have considerable limitations
on document size.

> On the general topic, I don't think it's reasonable to imagine that
> we're going to come up with a single heuristic that works well for
> every kind of input data.  What pglz is doing - assuming that if the
> beginning of the data is incompressible then the rest probably is too
> - is fundamentally reasonable, notwithstanding the fact that it
> doesn't happen to work out well for JSONB.  We might be able to tinker
> with that general strategy in some way that seems to fix this case and
> doesn't appear to break others, but there's some risk in that, and
> there's no obvious reason in my mind why PGLZ should be required to fly
> blind.  So I think it would be a better idea to arrange some method by
> which JSONB (and perhaps other data types) can provide compression
> hints to pglz.
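
For concreteness, the kind of early-bailout heuristic described above
can be sketched roughly as follows. This is only an illustration under
stated assumptions: zlib stands in for pglz, and the function name and
the probe_len and min_saving knobs are made up for the example, not
taken from the PostgreSQL source.

```python
import random
import zlib


def compress_with_prefix_check(data, probe_len=1024, min_saving=0.05):
    """Return compressed bytes, or None if the prefix looks incompressible.

    zlib is a stand-in for pglz; probe_len and min_saving are
    hypothetical knobs chosen only for illustration.
    """
    probe = data[:probe_len]
    # If the first probe_len bytes barely shrink, assume the rest will
    # not shrink either and give up without looking at it.
    if len(zlib.compress(probe)) > len(probe) * (1 - min_saving):
        return None
    packed = zlib.compress(data)
    return packed if len(packed) < len(data) else None


# A datum with a hard-to-compress head and a highly redundant tail,
# loosely analogous to the case where the heuristic guesses wrong.
rng = random.Random(0)
head = bytes(rng.getrandbits(8) for _ in range(1024))
tail = b'{"status": "active", "kind": "user"}' * 200
datum = head + tail

skipped = compress_with_prefix_check(datum)    # heuristic bails out
tail_only = compress_with_prefix_check(tail)   # heuristic proceeds
whole = zlib.compress(datum)                   # what was left on the table
```

The datum as a whole compresses well, but the prefix check never gets
far enough to find that out, which is the failure mode under
discussion.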

If there is to be any effort to make jsonb a more effective target for
compression, I imagine that it would have to target redundancy
between JSON documents. With idiomatic usage, we can expect plenty of
it.
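
To illustrate what targeting redundancy between documents might look
like, here is a minimal sketch that compresses each document against a
shared preset dictionary. It uses Python's zlib rather than anything in
PostgreSQL, and the sample documents, the choice of dictionary, and the
pack/unpack helpers are all assumptions made up for the example.

```python
import json
import zlib

# Hypothetical documents that share the same keys and most values,
# as idiomatic JSON usage often does.
docs = [
    json.dumps({"user_id": i, "status": "active", "email_verified": True,
                "preferences": {"theme": "dark", "language": "en"}}).encode()
    for i in range(100)
]

# Hypothetical shared dictionary: simply one representative document.
shared_dict = docs[0]


def pack(doc, zdict=b""):
    """Compress one document, optionally against a preset dictionary."""
    c = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return c.compress(doc) + c.flush()


def unpack(blob, zdict=b""):
    """Decompress one document; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


# Total compressed size with each document compressed in isolation,
# versus compressed against the shared dictionary.
alone = sum(len(pack(d)) for d in docs[1:])
shared = sum(len(pack(d, shared_dict)) for d in docs[1:])
```

Each document is small enough that compressing it in isolation finds
little to work with. Most of the repetition is between documents, and
only the dictionary variant can exploit that.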

[1] http://www.vldb.org/conf/1999/P7.pdf , "We also forced each key
and child pointer to be adjacent to each other physically"
-- 
Peter Geoghegan


