On 09/25/2014 08:10 PM, Tom Lane wrote:
> I wrote:
>> The "offsets-and-lengths" patch seems like the approach we ought to
>> compare to my patch, but it looks pretty unfinished to me: AFAICS it
>> includes logic to understand offsets sprinkled into a mostly-lengths
>> array, but no logic that would actually *store* any such offsets,
>> which means it's going to act just like my patch for performance
>> purposes.
>
>> In the interests of pushing this forward, I will work today on
>> trying to finish and review Heikki's offsets-and-lengths patch
>> so that we have something we can do performance testing on.
>> I doubt that the performance testing will tell us anything we
>> don't expect, but we should do it anyway.
>
> I've now done that, and attached is what I think would be a committable
> version. Having done this work, I no longer think that this approach
> is significantly messier code-wise than the all-lengths version, and
> it does have the merit of not degrading on very large objects/arrays.
> So at the moment I'm leaning to this solution not the all-lengths one.
>
> To get a sense of the compression effects of varying the stride distance,
> I repeated the compression measurements I'd done on 14 August with Pavel's
> geometry data (<24077.1408052877@sss.pgh.pa.us>). The upshot of that was
>
>                                         min     max    avg
>
> external text representation            220  172685  880.3
> JSON representation (compressed text)   224   78565  541.3
> pg_column_size, JSONB HEAD repr.        225   82540  639.0
> pg_column_size, all-lengths repr.       225   66794  531.1
>
> Here's what I get with this patch and different stride distances:
>
> JB_OFFSET_STRIDE = 8                    225   68551  559.7
> JB_OFFSET_STRIDE = 16                   225   67601  552.3
> JB_OFFSET_STRIDE = 32                   225   67120  547.4
> JB_OFFSET_STRIDE = 64                   225   66886  546.9
> JB_OFFSET_STRIDE = 128                  225   66879  546.9
> JB_OFFSET_STRIDE = 256                  225   66846  546.8
>
> So at least for that test data, 32 seems like the sweet spot.
> We are giving up a couple percent of space in comparison to the
> all-lengths version, but this is probably an acceptable tradeoff
> for not degrading on very large arrays.
>
> I've not done any speed testing.
I'll do some tomorrow. I should have some different DBs to test on, too.
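
For anyone following along, the stride tradeoff discussed above can be sketched roughly as follows. This is a simplified model, not the patch's code: only the name JB_OFFSET_STRIDE comes from the quoted message, while encode_entries, start_offset, and the (is_offset, value) tuple layout are hypothetical names chosen for illustration. The assumption is that every stride-th entry stores a cumulative end offset and all other entries store element lengths, so a lookup scans back at most one stride's worth of entries instead of the whole array.

```python
JB_OFFSET_STRIDE = 32  # the value the quoted measurements favor


def encode_entries(lengths, stride=JB_OFFSET_STRIDE):
    """Model each entry as (is_offset, value).

    Assumption for this sketch: every stride-th entry holds the cumulative
    end offset of its element; all other entries hold the element's length.
    """
    entries = []
    end = 0
    for i, length in enumerate(lengths):
        end += length
        if i % stride == 0:
            entries.append((True, end))      # stored as an end offset
        else:
            entries.append((False, length))  # stored as a length
    return entries


def start_offset(entries, index):
    """Start offset of element `index`.

    Walk backwards summing lengths and stop at the first offset entry,
    so the scan touches at most `stride` entries.
    """
    offset = 0
    for i in range(index - 1, -1, -1):
        is_offset, value = entries[i]
        offset += value
        if is_offset:
            break
    return offset


if __name__ == "__main__":
    lengths = [3, 5, 2, 7, 4]
    entries = encode_entries(lengths, stride=2)
    print([start_offset(entries, i) for i in range(len(lengths))])
```

Lengths tend to repeat from element to element while offsets always increase, so a larger stride leaves more repetitive data in the entry array at the cost of a longer backward scan per lookup.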
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com