Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: jsonb format is pessimal for toast compression
Date
Msg-id 20140811195340.GR16422@tamriel.snowman.net
In response to Re: jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > ... I think it would be a better idea to arrange some method by
> > which JSONB (and perhaps other data types) can provide compression
> > hints to pglz.
>
> I agree with that as a long-term goal, but not sure if it's sane to
> push into 9.4.

Agreed.

> What we could conceivably do now is (a) add a datatype OID argument to
> toast_compress_datum, and (b) hard-wire the selection of a different
> compression-parameters struct if it's JSONBOID.  The actual fix would
> then be to increase the first_success_by field of this alternate struct.

Isn't the offset to the compressible data variable, though, depending
on the number of keys, etc.?  Would we be increasing first_success_by
based on some function which inspects the object?
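
To make the question concrete, here is a minimal Python model (not PostgreSQL's C code) of the two options: a fixed alternate parameter set chosen by type OID, versus a first_success_by derived from the object itself. The field names and default values follow pglz's default strategy as I understand it, and 3802 is jsonb's type OID. The 4096 figure, the helper names, and the assumed header layout (a 4-byte container header plus two 4-byte entries per key/value pair) are illustrative assumptions only.

```python
from dataclasses import dataclass, replace

JSONBOID = 3802  # pg_type OID of jsonb


@dataclass(frozen=True)
class PglzStrategy:
    """Model of the pglz compression-parameters struct."""
    min_input_size: int
    max_input_size: int
    min_comp_rate: int
    first_success_by: int   # give up if nothing compresses within this many bytes
    match_size_good: int
    match_size_drop: int


# Values believed to match pglz's default strategy.
DEFAULT = PglzStrategy(32, 2**31 - 1, 25, 1024, 128, 10)

# Option 1: hard-wired alternate struct for jsonb (4096 is a made-up value).
JSONB_FIXED = replace(DEFAULT, first_success_by=4096)


def jsonb_header_bytes(n_pairs):
    """Assumed size of the incompressible offset table for an object."""
    return 4 + 4 * 2 * n_pairs


def choose_strategy(type_oid, n_pairs=None):
    """Pick compression parameters for a datum of the given type.

    Without n_pairs this is the hard-wired approach; with n_pairs it is
    the "inspect the object" variant, which skips past the header and
    then allows the usual window for finding compressible data.
    """
    if type_oid != JSONBOID:
        return DEFAULT
    if n_pairs is None:
        return JSONB_FIXED
    window = jsonb_header_bytes(n_pairs) + DEFAULT.first_success_by
    return replace(DEFAULT, first_success_by=window)
```

With these assumptions, an object with 500 keys would need a window of 5028 bytes, so a fixed 4096 would still give up too early; that is the case the question is about.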

> I had been worrying about API-instability risks associated with (a),
> but on reflection it seems unlikely that any third-party code calls
> toast_compress_datum directly, and anyway it's not something we'd
> be back-patching to before 9.4.

Agreed.

> The main objection to (b) is that it wouldn't help for domains over jsonb
> (and no, I don't want to insert a getBaseType call there to fix that).

While not ideal, that seems like an acceptable compromise for 9.4 to me.

> A longer-term solution would be to make this some sort of type property
> that domains could inherit, like typstorage is already.  (Somebody
> suggested dealing with this by adding more typstorage values, but
> I don't find that attractive; typstorage is known in too many places.)

I think that was me, and having it be something which domains can
inherit makes sense.  Would we be able to use this approach to introduce
type-specific compression algorithms (which domains over that type would
inherit), perhaps?  Or even get to a point where we could have a
chunk-based compression scheme for certain types of objects (such as
JSONB) where we keep track of which keys exist at which points in the
compressed object, allowing us to skip to the specific chunk which
contains the requested key, similar to what we do with uncompressed data?
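
As a rough illustration of the chunk-based idea, the sketch below compresses a JSON object in independent chunks of sorted keys and keeps an uncompressed index of each chunk's first key, so a lookup decompresses only one chunk. It is a toy model, not a proposal for the on-disk format: zlib stands in for pglz, keys are ordered by plain string comparison rather than jsonb's own key ordering, and `build_chunked`, `lookup`, and `keys_per_chunk` are hypothetical names.

```python
import bisect
import json
import zlib


def build_chunked(obj, keys_per_chunk=64):
    """Split obj into separately compressed chunks of sorted keys.

    Returns (index, chunks): index[i] is the smallest key stored in
    chunks[i], and each chunk is a zlib-compressed JSON object.
    """
    keys = sorted(obj)
    index, chunks = [], []
    for start in range(0, len(keys), keys_per_chunk):
        part = keys[start:start + keys_per_chunk]
        index.append(part[0])
        payload = json.dumps({k: obj[k] for k in part}).encode("utf-8")
        chunks.append(zlib.compress(payload))
    return index, chunks


def lookup(index, chunks, key):
    """Return the value for key, decompressing only the chunk that can hold it."""
    pos = bisect.bisect_right(index, key) - 1
    if pos < 0:
        return None  # key sorts before every stored key
    part = json.loads(zlib.decompress(chunks[pos]))
    return part.get(key)
```

For example, with 1000 keys and `keys_per_chunk=100`, fetching one key touches one of ten chunks instead of decompressing the whole object. The trade-off is that smaller chunks compress less well, since each one starts with an empty history.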

> We'd need some thought about exactly what we want to expose, since
> the specific knobs that pglz_compress has today aren't necessarily
> good for the long term.

Agreed.

> This is all kinda ugly really, but since I'm not hearing brilliant
> ideas for redesigning jsonb's storage format, maybe this is the
> best we can do for now.

This would certainly be an improvement over what's going on now, and I
love the idea of possibly being able to expand this in the future to do
more.  What I'd hate to see is having all of this and only ever using it
to say "skip ahead another 1k for JSONB".

Thanks,
    Stephen
