Re: RFC/WIP: adding new configuration options to TOAST - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: RFC/WIP: adding new configuration options to TOAST
Date
Msg-id CAMsr+YH1r3Up2x_-R_c-hKowCatEknoV0JRJDQBwKFXFiaT=dQ@mail.gmail.com
In response to Re: RFC/WIP: adding new configuration options to TOAST  (Bill Moran <wmoran@potentialtech.com>)
Responses Re: RFC/WIP: adding new configuration options to TOAST  (Bill Moran <wmoran@potentialtech.com>)
List pgsql-hackers
On 4 November 2015 at 10:58, Bill Moran <wmoran@potentialtech.com> wrote:
> On Tue, 3 Nov 2015 18:34:39 -0800
> Jeff Janes <jeff.janes@gmail.com> wrote:
>
>> On Tue, Nov 3, 2015 at 5:21 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> > On 3 November 2015 at 23:04, Bill Moran <wmoran@potentialtech.com> wrote:
>> >>
>> >> Looking for feedback to see if anyone sees any issues or has any
>> >> suggestions on what I'm doing. The attached patch alters 3 things
>> >> with regard to TOAST behavior:
>> >
>> > COMPRESSION_TEST_SIZE (2) seems useful.
>> >
>> > The other two mostly seem like options nobody's going to know are
>> > there, or know how to sensibly set if they do notice them. What's the
>> > driving reason behind those, the problem you're trying to solve? Why
>> > make them configurable per-table (or at all)?
>>
>> I currently have a table with one column which has a median width of
>> 500 bytes, a 90th percentile of 650 bytes, and makes up 75% of the
>> table's size, and the column is rarely used, while the table itself is
>> frequently seq scanned.  I'd very much like to drive that column out
>> of main and into toast. I think target_tuple_size would let me do
>> that.
>
> That's exactly the use case. As it currently stands, any tuple smaller
> than about 2K will never be toasted. So if you have 1900 bytes of
> highly compressible text that is infrequently queried from the table
> whilst other columns are frequently accessed, there's no way to force
> it to be out of line from the main table, or be compressed.

OK, so that's the underlying issue to solve: making smaller tuples
TOASTable, especially when they're quite compressible.

Shouldn't that be column-level, really?

We have SET STORAGE at the moment. Would some sort of "FORCE" option
to SET STORAGE EXTERNAL meet your needs? It'd potentially force small
data out of line too, but that'd make sense for your rarely accessed
use case.
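For reference, the column-level control that exists today is set per
column with DDL like the following (table and column names here are
illustrative, not from the thread):

```sql
-- Existing per-column storage control in PostgreSQL.
-- EXTENDED (the default for toastable types) allows compression and
-- out-of-line storage; EXTERNAL allows out-of-line storage without
-- compression. Neither forces out-of-line storage while the tuple is
-- under the ~2K threshold, which is the gap a "FORCE" variant would close.
ALTER TABLE archive ALTER COLUMN payload SET STORAGE EXTERNAL;
```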

> The two new configurables allow the DBA to make tradeoff decisions on
> CPU usage vs. storage efficiency. Since the TOAST code attempts to
> process the column that will provide the largest gain first, in your
> described use case you could calculate the size of the other columns,
> and set the target_tuple_size to just a bit larger than that, and
> that large column should get moved into the toast table in most or
> all cases (depending on how predictable the other sizes are).
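Concretely, the tuning described above would presumably be a per-table
reloption. The sketch below is hypothetical, based on the option name
discussed in this thread; it is proposed, not committed, syntax, and
the table name and value are illustrative:

```sql
-- Hypothetical per-table TOAST tuning from the proposed patch:
-- lower the threshold at which the toaster starts compressing and
-- moving columns out of line, so a ~500-byte wide column in an
-- otherwise-narrow tuple gets toasted.
ALTER TABLE archive SET (target_tuple_size = 512);
```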

I'm just concerned that this is another knob that 0.001% of the user
base will know about and use correctly, 1% will use incorrectly based
on some cargo-culted nonsense they pick up somewhere, and the rest
will have no clue exists. We have more than a few of those already.

The way you describe using a GUC here makes it sound like what you
really want is just a column storage option, and that twiddling a GUC
like this is a workaround to try to make one column more aggressively
compressed and moved out of line without affecting the others.

> Compression is a similarly hard-coded value in current versions.
> I feel that allowing the DBA to control how much savings is required
> before incurring the overhead of compression is worthwhile, especially
> when considered on a per-table basis. For example, the compression
> on an archive table could be very aggressive, whereas compression on
> a frequently accessed table might only be justified if it saves a lot
> of space. How much space compression saves is highly dependent on the
> data being stored.

Yes, I can see value in making attempts at compression more (or less)
aggressive. In that regard it's a pity the pluggable compression
support didn't go anywhere, really, because the current algorithm is
CPU-cheap at the cost of pretty poor compression ratios.

>> (Per-column control would be even nicer, but I'd take what I can get)
>
> Oddly, I hadn't considered getting as granular as per-column, but
> now that you've got me thinking about it, it seems like a logical
> step to take.

I think per-column is really where it makes sense, if it's to be done
at all. At least based on the use cases given.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


