Thread: [HACKERS] MAIN, Uncompressed?

[HACKERS] MAIN, Uncompressed?

From
Simon Riggs
Date:
It looks like we need a new Column Storage option for MAIN, Uncompressed.

We have these Column Storage options
Plain - inline only, uncompressed
Main - inline until external as last resort, compressible
External - external, uncompressed
Extended - external, compressible

So there is no option for Main, but not compressible...

With reference to code... there seems to be no way to skip step 3

/* ----------* Compress and/or save external until data fits into target length**  1: Inline compress attributes with
attstorage'x', and store very*     large attributes with attstorage 'x' or 'e' external immediately*  2: Store
attributeswith attstorage 'x' or 'e' external*  3: Inline compress attributes with attstorage 'm'*  4: Store attributes
withattstorage 'm' external* ----------*/
 

Not sure what to call this new option? MAINU?

Objections?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MAIN, Uncompressed?

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> It looks like we need a new Column Storage option for MAIN, Uncompressed.
> We have these Column Storage options
> Plain - inline only, uncompressed
> Main - inline until external as last resort, compressible
> External - external, uncompressed
> Extended - external, compressible

> So there is no option for Main, but not compressible...

Doesn't Plain serve the purpose?

In point of fact, though, "never inline and never compress" is not really
a useful option, as it can be more easily read as "fail on wide values".
        regards, tom lane



Re: [HACKERS] MAIN, Uncompressed?

From
Simon Riggs
Date:
On 25 August 2017 at 12:24, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> It looks like we need a new Column Storage option for MAIN, Uncompressed.
>> We have these Column Storage options
>> Plain - inline only, uncompressed
>> Main - inline until external as last resort, compressible
>> External - external, uncompressed
>> Extended - external, compressible
>
>> So there is no option for Main, but not compressible...
>
> Doesn't Plain serve the purpose?

No, because that just gives an error if you try to insert a large column value.

> In point of fact, though, "never inline and never compress" is not really
> a useful option, as it can be more easily read as "fail on wide values".

Agreed, but that is not what I am proposing.

Main is roughly what is wanted, yet it always tries to compress. If
you already know that won't be useful it should be possible to turn
compression off.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MAIN, Uncompressed?

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Main is roughly what is wanted, yet it always tries to compress. If
> you already know that won't be useful it should be possible to turn
> compression off.

If you know compression isn't useful, but you don't want to fail on
wide values, then "external" should serve the purpose.
        regards, tom lane



Re: [HACKERS] MAIN, Uncompressed?

From
Simon Riggs
Date:
On 25 August 2017 at 13:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> Main is roughly what is wanted, yet it always tries to compress. If
>> you already know that won't be useful it should be possible to turn
>> compression off.
>
> If you know compression isn't useful, but you don't want to fail on
> wide values, then "external" should serve the purpose.

Well, almost. External toasts at 2048-ish bytes whereas Main toasts at
8160 bytes.

The rows are typically near 4kB long, so if marked External they would
always be toasted.

It's desirable to have the full row in the heap block, rather than
have to access heap-toastindex-toastblocks in all cases.

The data is also incompressible, so Main just wastes time on insert.

Hence, we have a missing option.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MAIN, Uncompressed?

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On 25 August 2017 at 13:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If you know compression isn't useful, but you don't want to fail on
>> wide values, then "external" should serve the purpose.

> Well, almost. External toasts at 2048-ish bytes whereas Main toasts at
> 8160 bytes.
> The rows are typically near 4kB long, so if marked External they would
> always be toasted.
> It's desirable to have the full row in the heap block, rather than
> have to access heap-toastindex-toastblocks in all cases.
> The data is also incompressible, so Main just wastes time on insert.
> Hence, we have a missing option.

Maybe, but the use case seems mighty narrow.
        regards, tom lane



Re: [HACKERS] MAIN, Uncompressed?

From
Simon Riggs
Date:
On 25 August 2017 at 14:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 25 August 2017 at 13:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If you know compression isn't useful, but you don't want to fail on
>>> wide values, then "external" should serve the purpose.
>
>> Well, almost. External toasts at 2048-ish bytes whereas Main toasts at
>> 8160 bytes.
>> The rows are typically near 4kB long, so if marked External they would
>> always be toasted.
>> It's desirable to have the full row in the heap block, rather than
>> have to access heap-toastindex-toastblocks in all cases.
>> The data is also incompressible, so Main just wastes time on insert.
>> Hence, we have a missing option.
>
> Maybe, but the use case seems mighty narrow.

JSON blobs between 2kB and 8160 bytes are very common.

String length is maybe a poisson distribution, definitely not uniform.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MAIN, Uncompressed?

From
Greg Stark
Date:
On 25 August 2017 at 19:59, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On 25 August 2017 at 14:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Maybe, but the use case seems mighty narrow.
>
> JSON blobs between 2kB and 8160 bytes are very common.
>
> String length is maybe a poisson distribution, definitely not uniform.


But JSON blobs should be highly compressible. Even jsonb will be quite
compressible.

That said I always found remembering the mapping from these names to
various behaviours to be quite hard to use. I would have found it far
more useful to have two separate properties I could set "compress" and
"external" or perhaps even more useful would be to set some kind of
guideline size threshold for each (and perhaps a second size compress
threshold and external threshold for the whole tuple).

I think this is a particularly old piece of code and we're lucky the
default heuristics have served well for all this time because I doubt
many people fiddle with these storage attributes. The time may have
come to come up with a better UI for the storage attributes because
people are doing new things (like json) and wanting more control over
this heuristic.

For what it's worth I think a good start would be to give people more
visibility into what the tuptoaster heuristic is actually doing to
their data and that will encourage people to give feedback about when
they're surprised and are frustrated by the existing UI.


-- 
greg



Re: [HACKERS] MAIN, Uncompressed?

From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes:
> I think this is a particularly old piece of code and we're lucky the
> default heuristics have served well for all this time because I doubt
> many people fiddle with these storage attributes. The time may have
> come to come up with a better UI for the storage attributes because
> people are doing new things (like json) and wanting more control over
> this heuristic.

Yeah, I could get behind a basic rethinking of the tuptoaster control
knobs.  I'm just not in love with Simon's specific proposal, especially
not if we can't think of a better name for it than "MAINU".

> For what it's worth I think a good start would be to give people more
> visibility into what the tuptoaster heuristic is actually doing to
> their data and that will encourage people to give feedback about when
> they're surprised and are frustrated by the existing UI.

Hm, what might that look like exactly?  More pgstattuple functionality
perhaps?
        regards, tom lane



Re: [HACKERS] MAIN, Uncompressed?

From
Simon Riggs
Date:
On 25 August 2017 at 20:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> I think this is a particularly old piece of code and we're lucky the
>> default heuristics have served well for all this time because I doubt
>> many people fiddle with these storage attributes. The time may have
>> come to come up with a better UI for the storage attributes because
>> people are doing new things (like json) and wanting more control over
>> this heuristic.
>
> Yeah, I could get behind a basic rethinking of the tuptoaster control
> knobs.  I'm just not in love with Simon's specific proposal, especially
> not if we can't think of a better name for it than "MAINU".

Extended/External would be just fine if you could set the toast
target, so I think a better suggestion would be to make "toast_target"
a per-attribute option .

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] MAIN, Uncompressed?

From
Mark Kirkwood
Date:
On 26/08/17 12:18, Simon Riggs wrote:

> On 25 August 2017 at 20:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Greg Stark <stark@mit.edu> writes:
>>> I think this is a particularly old piece of code and we're lucky the
>>> default heuristics have served well for all this time because I doubt
>>> many people fiddle with these storage attributes. The time may have
>>> come to come up with a better UI for the storage attributes because
>>> people are doing new things (like json) and wanting more control over
>>> this heuristic.
>> Yeah, I could get behind a basic rethinking of the tuptoaster control
>> knobs.  I'm just not in love with Simon's specific proposal, especially
>> not if we can't think of a better name for it than "MAINU".
> Extended/External would be just fine if you could set the toast
> target, so I think a better suggestion would be to make "toast_target"
> a per-attribute option .
>

+1, have thought about this myself previously....thank you for bringing 
it up!

regards

Mark



Re: [HACKERS] MAIN, Uncompressed?

From
Simon Riggs
Date:
On 26 August 2017 at 05:40, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> On 26/08/17 12:18, Simon Riggs wrote:
>
>> On 25 August 2017 at 20:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> Greg Stark <stark@mit.edu> writes:
>>>>
>>>> I think this is a particularly old piece of code and we're lucky the
>>>> default heuristics have served well for all this time because I doubt
>>>> many people fiddle with these storage attributes. The time may have
>>>> come to come up with a better UI for the storage attributes because
>>>> people are doing new things (like json) and wanting more control over
>>>> this heuristic.
>>>
>>> Yeah, I could get behind a basic rethinking of the tuptoaster control
>>> knobs.  I'm just not in love with Simon's specific proposal, especially
>>> not if we can't think of a better name for it than "MAINU".
>>
>> Extended/External would be just fine if you could set the toast
>> target, so I think a better suggestion would be to make "toast_target"
>> a per-attribute option .
>>
>
> +1, have thought about this myself previously....thank you for bringing it
> up!

OK, so table-level option for "toast_tuple_target", not attribute-level option

The attached patch and test shows this concept is useful and doesn't
affect existing data.

For 4x 4000 byte rows:
* by default we use 1 heap block and 3 toast blocks
* toast_tuple_target=4080 uses 2 heap blocks and 0 toast blocks

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] MAIN, Uncompressed?

From
Simon Riggs
Date:
On 29 August 2017 at 07:58, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 26 August 2017 at 05:40, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
>> On 26/08/17 12:18, Simon Riggs wrote:
>>
>>> On 25 August 2017 at 20:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>
>>>> Greg Stark <stark@mit.edu> writes:
>>>>>
>>>>> I think this is a particularly old piece of code and we're lucky the
>>>>> default heuristics have served well for all this time because I doubt
>>>>> many people fiddle with these storage attributes. The time may have
>>>>> come to come up with a better UI for the storage attributes because
>>>>> people are doing new things (like json) and wanting more control over
>>>>> this heuristic.
>>>>
>>>> Yeah, I could get behind a basic rethinking of the tuptoaster control
>>>> knobs.  I'm just not in love with Simon's specific proposal, especially
>>>> not if we can't think of a better name for it than "MAINU".
>>>
>>> Extended/External would be just fine if you could set the toast
>>> target, so I think a better suggestion would be to make "toast_target"
>>> a per-attribute option .
>>>
>>
>> +1, have thought about this myself previously....thank you for bringing it
>> up!
>
> OK, so table-level option for "toast_tuple_target", not attribute-level option
>
> The attached patch and test shows this concept is useful and doesn't
> affect existing data.
>
> For 4x 4000 byte rows:
> * by default we use 1 heap block and 3 toast blocks
> * toast_tuple_target=4080 uses 2 heap blocks and 0 toast blocks


New patch, v2, since one line in the docs failed to apply because of
recent changes.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] MAIN, Uncompressed?

From
Andrew Dunstan
Date:

On 09/12/2017 12:11 PM, Simon Riggs wrote:
>
>> OK, so table-level option for "toast_tuple_target", not attribute-level option
>>
>> The attached patch and test shows this concept is useful and doesn't
>> affect existing data.
>>
>> For 4x 4000 byte rows:
>> * by default we use 1 heap block and 3 toast blocks
>> * toast_tuple_target=4080 uses 2 heap blocks and 0 toast blocks
>
> New patch, v2, since one line in the docs failed to apply because of
> recent changes.
>



This has bitrotted ever so slightly. I fixed that and took the
opportunity to edit the docs text slightly to improve the clarity a bit.
Revised patch attached - adjust to your taste.

The code is extremely simple and small and straightforward, and the
tests are good.

Marking as Ready for Committer,

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachment