
From: Josh Berkus
Subject: Re: [Bad Attachment] Re: jsonb format is pessimal for toast compression
Date:
Msg-id: 53EE8736.5090505@agliodbs.com
In response to: Re: jsonb format is pessimal for toast compression (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 08/15/2014 01:38 PM, Tom Lane wrote:
> I've been poking at this, and I think the main explanation for your result
> is that with more JSONB documents being subject to compression, we're
> spending more time in pglz_decompress.  There's no free lunch in that
> department: if you want compressed storage it's gonna cost ya to
> decompress.  The only way I can get decompression and TOAST access to not
> dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
> PLAIN.  However, when I do that, I do see my test patch running about 25%
> slower overall than HEAD on an "explain analyze select jfield -> 'key'
> from table" type of query with 200-key documents with narrow fields (see
> attached perl script that generates the test data).

Ok, that probably falls under the heading of "acceptable tradeoffs" then.
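(For anyone who wants to poke at this themselves, here's a rough sketch
of that kind of setup -- the table and column names are placeholders,
not the actual script from Tom's attachment:)

    -- placeholder schema, not the attached test generator
    CREATE TABLE testdata (id serial PRIMARY KEY, jfield jsonb);

    -- force inline, uncompressed storage so detoast/decompress costs
    -- drop out of the profile (only works while the whole row still
    -- fits in one 8K page)
    ALTER TABLE testdata ALTER COLUMN jfield SET STORAGE PLAIN;

    -- load documents with ~200 short key/value pairs
    INSERT INTO testdata (jfield)
    SELECT (SELECT json_object_agg('key' || k, 'v' || k)
            FROM generate_series(1, 200) AS k)::jsonb
    FROM generate_series(1, 10000);

    -- the query shape being timed
    EXPLAIN ANALYZE SELECT jfield -> 'key100' FROM testdata;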

> Having said all that, I think this test is something of a contrived worst
> case.  More realistic cases are likely to have many fewer keys (so that
> speed of the binary search loop is less of an issue) or else to have total
> document sizes large enough that inline PLAIN storage isn't an option,
> meaning that detoast+decompression costs will dominate.

This was intended to be a worst case.  However, I don't think that it's
the last time we'll see the case of having 100 to 200 keys each with
short values.  That case was actually from some XML data which I'd
already converted into a regular table (hence every row having 183
keys), but if JSONB had been available when I started the project, I
might have chosen to store it as JSONB instead.  It occurs to me that
the matching data from a personals website would very much fit the
pattern of having between 50 and 200 keys, each of which has a short value.

So we don't need to *optimize* for that case, but it also shouldn't be
disastrously slow or 300% of the size of comparable TEXT.  Mind you, I
don't find +80% to be disastrously slow (especially not with a space
savings of 60%), so maybe that's good enough.
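(And just to be clear what I mean by "size of comparable TEXT" --
something along these lines, with placeholder table names, is how I'd
measure it:)

    CREATE TABLE docs_text  (doc text);
    CREATE TABLE docs_jsonb (doc jsonb);
    -- ... load the same documents into both tables ...
    SELECT pg_size_pretty(pg_total_relation_size('docs_text'))  AS text_size,
           pg_size_pretty(pg_total_relation_size('docs_jsonb')) AS jsonb_size;
    -- pg_total_relation_size includes the TOAST table (and any indexes),
    -- which is where most of the compressed data ends up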

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


