Re: RFC/WIP: adding new configuration options to TOAST - Mailing list pgsql-hackers

From Bill Moran
Subject Re: RFC/WIP: adding new configuration options to TOAST
Date
Msg-id 20151103215835.6d0161c3916bb70d5b21c5db@potentialtech.com
Whole thread Raw
In response to Re: RFC/WIP: adding new configuration options to TOAST  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: RFC/WIP: adding new configuration options to TOAST
List pgsql-hackers
On Tue, 3 Nov 2015 18:34:39 -0800
Jeff Janes <jeff.janes@gmail.com> wrote:

> On Tue, Nov 3, 2015 at 5:21 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> > On 3 November 2015 at 23:04, Bill Moran <wmoran@potentialtech.com> wrote:
> >>
> >> Looking for feedback to see if anyone sees any issues or has any
> >> suggestions on what I'm doing. The attached patch alters 3 things
> >> with regard to TOAST behavior:
> >
> > COMPRESSION_TEST_SIZE (2) seems useful.
> >
> > The other two mostly seem like options nobody's going to know are
> > there, or know how to sensibly set if they do notice them. What's the
> > driving reason behind those, the problem you're trying to solve? Why
> > make them configurable per-table (or at all)?
> 
> I currently have a table with one column which has a median width of
> 500 bytes, a 90th percentile of 650 bytes, and makes up 75% of the
> table's size, and the column is rarely used, while the table itself is
> frequently seq scanned.  I'd very much like to drive that column out
> of main and into toast. I think target_tuple_size would let me do
> that.

That's exactly the use case. As it currently stands, any tuple smaller
than about 2K will never be toasted. So if you have 1900 bytes of
highly compressible text that is infrequently queried from the table
whilst other columns are frequently accessed, there's no way to force
it to be out of line from the main table, or be compressed.

The two new configurables allow the DBA to make tradeoff decisions on
CPU usage vs. storage efficiency. Since the TOAST code attempts to
process the column that will provide the largest gain first, in your
described use case you could calculate the size of the other columns,
and set the target_tuple_size to just a bit larger than that, and
that large column should get moved into the toast table in most or
all cases (depending on how predictable the other sizes are)

Compression is a similarly hard-coded value in current versions.
I feel that allowing the DBA to control how much savings is required
before incurring the overhead of compression is worthwhile, especially
when considered on a per-table basis. For example, the compression
on an archive table could be very aggressive, whereas compression on
a frequently accessed table might only be justified if it saves a lot
of space. How much space compression saves is highly dependent on the
data being stored.

I don't have anything remotely like statistical advice on how much
improvement can actually be gained yet. Once I have per-table values
implemented, it will be much, much easier to test the impact.

It does sound like I need to spend a little more time improving the
documentation to ensure that it's clear what these values achieve.

> (Per-column control would be even nicer, but I'd take what I can get)

Oddly, I hadn't considered getting as granualar as per-column, but
now that you've got me thinking about it, it seems like a logical
step to take.

-- 
Bill Moran



pgsql-hackers by date:

Previous
From: Jim Nasby
Date:
Subject: Re: RFC/WIP: adding new configuration options to TOAST
Next
From: Peter Geoghegan
Date:
Subject: Valgrind and shared_buffers (Was: Restore-reliability mode)