TOAST usage setting - Mailing list pgsql-hackers

From Bruce Momjian
Subject TOAST usage setting
Date
Msg-id 200705290018.l4T0ItH18228@momjian.us
Whole thread Raw
Responses Re: TOAST usage setting
List pgsql-hackers
bruce wrote:
> > > * Re: [HACKERS] Modifying TOAST thresholds  /Tom Lane/
> > >
> > > At this point it seems nothing will be done about this issue for 8.3.
> > 
> > I'm not sure anyone has an idea how to test it. TPCC isn't really useful
> > because it has a fixed size (500 byte) string buffer. Perhaps if we modified
> > it to have a random string length uniformly distributed between 0-2k ? But
> > even then it never does any scans based on that buffer. But the problem with
> > going with something more natural is that it'll be harder to tell exactly what
> > it's testing.
> 
> My idea on this was to create two backends, one with the default TOAST
> value, and a second with a value of 50 bytes.  Create a table with one
> TEXT field, and several other columns, each column < 50 bytes.
> 
> Then, fill the table with random data (script attached that might help),
> and the try 2000, 1500, 1000, etc, bytes in the TEXT column for each row
> (use random data so the compression code doesn't shrink it).  Then run a
> test with both backends acessing the TEXT column and non-TEXT column and
> measure the difference between the two backends, i.e. the backend with a
> TOAST value of 50 should show faster access on the non-TEXT field, but
> slower access on the TEXT field.
> 
> Then, figure out where the gains on the non-TEXT field seem to diminish
> in usefulness.  Basically, with a lower TOAST value, we are going to
> spend more time accessing the TEXT field, but the speedup for the
> non-TEXT field should be large enough win that we don't care. As the
> TEXT column becomes shorter, it has less affect on the non-TEXT access.

I tested TOAST using a method similar to the above method against CVS
HEAD, with default shared_buffers = 32MB and no assert()s.  I created
backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default),
8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default),
1k, 512, 256, and 128, roughly.

The results are here:
http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values.  I am somewhat confused why TOAST has faster access
than inline heap data.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Do we need a TODO? (was Re: Concurrently updating an updatable view)
Next
From: Tatsuo Ishii
Date:
Subject: Re: What is the maximum encoding-conversion growth rate, anyway?