Thread: TOAST usage setting

TOAST usage setting

From
Bruce Momjian
Date:
bruce wrote:
> > > * Re: [HACKERS] Modifying TOAST thresholds  /Tom Lane/
> > >
> > > At this point it seems nothing will be done about this issue for 8.3.
> > 
> > I'm not sure anyone has an idea how to test it. TPCC isn't really useful
> > because it has a fixed size (500 byte) string buffer. Perhaps if we modified
> > it to have a random string length uniformly distributed between 0-2k ? But
> > even then it never does any scans based on that buffer. But the problem with
> > going with something more natural is that it'll be harder to tell exactly what
> > it's testing.
> 
> My idea on this was to create two backends, one with the default TOAST
> value, and a second with a value of 50 bytes.  Create a table with one
> TEXT field, and several other columns, each column < 50 bytes.
> 
> Then, fill the table with random data (script attached that might help),
> and then try 2000, 1500, 1000, etc. bytes in the TEXT column for each row
> (use random data so the compression code doesn't shrink it).  Then run a
> test with both backends accessing the TEXT column and non-TEXT column and
> measure the difference between the two backends, i.e. the backend with a
> TOAST value of 50 should show faster access on the non-TEXT field, but
> slower access on the TEXT field.
> 
> Then, figure out where the gains on the non-TEXT field seem to diminish
> in usefulness.  Basically, with a lower TOAST value, we are going to
> spend more time accessing the TEXT field, but the speedup for the
> non-TEXT field should be a large enough win that we don't care. As the
> TEXT column becomes shorter, it has less effect on the non-TEXT access.

I tested TOAST using a method similar to the one above against CVS
HEAD, with default shared_buffers = 32MB and no assert()s.  I created
backends with power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default),
8, 16, 32, 64), which gives TOAST/non-TOAST breakpoints of roughly
2k (default), 1k, 512, 256, and 128 bytes.
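
For readers without the sqltest.sh script referenced later in this thread, a
minimal sketch of the kind of table and incompressible random filler described
above might look like the following (table and column names are made up, not
the ones the actual script uses):

    CREATE TABLE toast_test (id int, a int, b int, t text);

    -- Each md5() adds 32 hex characters, so 63 of them give ~2k of
    -- pseudo-random text per row; vary the 63 to test other lengths.
    -- Random-looking hex should be poorly compressible, so values over
    -- the threshold get pushed out to the TOAST table rather than shrunk.
    INSERT INTO toast_test
    SELECT i, i % 100, i % 1000,
           array_to_string(ARRAY(SELECT md5(i::text || g::text)
                                   FROM generate_series(1, 63) AS g), '')
      FROM generate_series(1, 100000) AS i;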

The results are here:
http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all
long row values.  I am somewhat confused why TOAST has faster access
than inline heap data.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Chris Browne
Date:
bruce@momjian.us (Bruce Momjian) writes:
> The results are here:
>
>     http://momjian.us/expire/TOAST/

I'll take a look and see if there's anything further it makes sense
for me to try testing.  Thanks for following up so quickly; what with
the cold I have had, I haven't yet gotten back to the office.
-- 
"cbbrowne","@","acm.org"
http://cbbrowne.com/info/languages.html
I've had a perfectly wonderful evening. But this wasn't it.
-- Groucho Marx


Re: TOAST usage setting

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

> I tested TOAST using a method similar to the above method against CVS
> HEAD, with default shared_buffers = 32MB and no assert()s.  I created
> backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default),
> 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default),
> 1k, 512, 256, and 128, roughly.
>
> The results are here:
>
>     http://momjian.us/expire/TOAST/
>
> Strangely, 128 bytes seems to be the break-even point for TOAST and
> non-TOAST, even for sequential scans of the entire heap touching all
> long row values.  I am somewhat confused why TOAST has faster access
> than inline heap data.

Did your test also imply setting the MAX_TOAST_CHUNK_SIZE (or however that's
spelled)? And what size long values were you actually storing? How did you
generate them?

I wonder if what's happening is that you have large chunks which when stored
inline are leaving lots of dead space in the table. Ie, if you're generating
values with size near 2k and the default chunk size you would expect to find
an average of 1k dead space per page, or a 12.5% drain on performance. As you
lower the chunk size you decrease that margin.

However I agree that it's hard to believe that the costs of random access
wouldn't swamp that 12.5% overhead pretty quickly.

One query I used when measuring the impact of the variable varlena stuff was
this which gives the distribution of tuples/page over a table:

SELECT count(*),n
  FROM (SELECT count(*) AS n
          FROM foo
         GROUP BY (point_in(tidout(ctid)))[0]
       ) as x
 GROUP BY n;

That might help you peek at what's going on. You could also combine it with
pg_column_size(foo.*) to measure the size of the tuple; I think that will
measure the size of the tuple as-is, before the columns are detoasted.
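
As a standalone probe, that could be something like the following (foo and a
wide column t are placeholder names):

    -- average on-disk size of whole rows vs. just the wide column,
    -- measured before any detoasting happens
    SELECT avg(pg_column_size(foo.*)) AS row_bytes,
           avg(pg_column_size(foo.t)) AS wide_col_bytes
      FROM foo;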

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
Bruce Momjian
Date:
Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
> 
> > I tested TOAST using a method similar to the above method against CVS
> > HEAD, with default shared_buffers = 32MB and no assert()s.  I created
> > backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default),
> > 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default),
> > 1k, 512, 256, and 128, roughly.
> >
> > The results are here:
> >
> >     http://momjian.us/expire/TOAST/
> >
> > Strangely, 128 bytes seems to be the break-even point for TOAST and
> > non-TOAST, even for sequential scans of the entire heap touching all
> > long row values.  I am somewhat confused why TOAST has faster access
> > than inline heap data.
> 
> Did your test also imply setting the MAX_TOAST_CHUNK_SIZE (or however that's
> spelled)? And what size long values were you actually storing? How did you
> generate them?

Please look at the script sqltest.sh at that URL.  I did not modify
TOAST_MAX_CHUNK_SIZE, but it changes based on TOAST_TUPLES_PER_PAGE,
which I did change.

> I wonder if what's happening is that you have large chunks which when stored
> inline are leaving lots of dead space in the table. Ie, if you're generating
> values with size near 2k and the default chunk size you would expect to find
> an average of 1k dead space per page, or a 12.5% drain on performance. As you
> lower the chunk size you decrease that margin.

Well, that could be it, but effectively that is what would happen in the
real world too.

> However I agree that it's hard to believe that the costs of random access
> wouldn't swamp that 12.5% overhead pretty quickly.
> 
> One query I used when measuring the impact of the variable varlena stuff was
> this which gives the distribution of tuples/page over a table:
> 
> SELECT count(*),n 
>   FROM (SELECT count(*) AS n
>           FROM foo 
>          GROUP BY (point_in(tidout(ctid)))[0]
>        ) as x
>  GROUP BY n;
> 
> Which might help you peek at what's going on. You could also combine
> pg_column_size(foo.*) to measure the size of the tuple. I think that will
> measure the size of the tuple as is before the columns are detoasted.

Please use my test script and see what you find.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

> Gregory Stark wrote:
>> "Bruce Momjian" <bruce@momjian.us> writes:
>> 
>> > I tested TOAST using a method similar to the above method against CVS
>> > HEAD, with default shared_buffers = 32MB and no assert()s.  I created
>> > backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default),
>> > 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default),
>> > 1k, 512, 256, and 128, roughly.
>> >
>> > The results are here:
>> >
>> >     http://momjian.us/expire/TOAST/
>> >
>> > Strangely, 128 bytes seems to be the break-even point for TOAST and
>> > non-TOAST, even for sequential scans of the entire heap touching all
>> > long row values.  I am somewhat confused why TOAST has faster access
>> > than inline heap data.

Is your database initialized with C locale? If so then length(text) is
optimized to not have to detoast:
    if (pg_database_encoding_max_length() == 1)
        PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Also, I think you have to run this for small datasets like you have as well
as for large data sets where the random access seek time of TOAST will really
hurt.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
Gregory Stark
Date:
"Gregory Stark" <stark@enterprisedb.com> writes:

> Is your database initialized with C locale? If so then length(text) is
> optimized to not have to detoast:
>
>     if (pg_database_encoding_max_length() == 1)
>         PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Of course I got that wrong. The question is "is your database in a single-byte
encoding" which isn't the same as being in C locale.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
Bruce Momjian
Date:
Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
> 
> > Gregory Stark wrote:
> >> "Bruce Momjian" <bruce@momjian.us> writes:
> >> 
> >> > I tested TOAST using a method similar to the above method against CVS
> >> > HEAD, with default shared_buffers = 32MB and no assert()s.  I created
> >> > backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default),
> >> > 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default),
> >> > 1k, 512, 256, and 128, roughly.
> >> >
> >> > The results are here:
> >> >
> >> >     http://momjian.us/expire/TOAST/
> >> >
> >> > Strangely, 128 bytes seems to be the break-even point for TOAST and
> >> > non-TOAST, even for sequential scans of the entire heap touching all
> >> > long row values.  I am somewhat confused why TOAST has faster access
> >> > than inline heap data.
> 
> Is your database initialized with C locale? If so then length(text) is
> optimized to not have to detoast:
> 
>     if (pg_database_encoding_max_length() == 1)
>         PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);

Wow, we optimized length().  OK, will run the test with
substring(t,1,1).

> Also, I think you have to run this for small datasets like you have well as
> large data sets where the random access seek time of TOAST will really hurt.

Well, if everything doesn't fit in the cache, then the smaller heap will
be a bigger win for non-TOAST access, so some of that overhead balances
out.  Let me get in-cache numbers and then I can look at larger data sets.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Alvaro Herrera
Date:
Bruce Momjian wrote:
> Gregory Stark wrote:
> > "Bruce Momjian" <bruce@momjian.us> writes:
> > 
> > > Gregory Stark wrote:
> > >> "Bruce Momjian" <bruce@momjian.us> writes:
> > >> 
> > >> > I tested TOAST using a method similar to the above method against CVS
> > >> > HEAD, with default shared_buffers = 32MB and no assert()s.  I created
> > >> > backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default),
> > >> > 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default),
> > >> > 1k, 512, 256, and 128, roughly.
> > >> >
> > >> > The results are here:
> > >> >
> > >> >     http://momjian.us/expire/TOAST/
> > >> >
> > >> > Strangely, 128 bytes seems to be the break-even point for TOAST and
> > >> > non-TOAST, even for sequential scans of the entire heap touching all
> > >> > long row values.  I am somewhat confused why TOAST has faster access
> > >> > than inline heap data.
> > 
> > Is your database initialized with C locale? If so then length(text) is
> > optimized to not have to detoast:
> > 
> >     if (pg_database_encoding_max_length() == 1)
> >         PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);
> 
> Wow, we optimized length().  OK, will run the test with
> substring(t,1,1).

Be careful about the compression!  It might be a good idea to run the
test once with the column set to uncompressible (SET STORAGE EXTERNAL?),
and again with it as compressible.
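
In SQL that would be something along these lines (assuming the wide column is
called t, which is just a placeholder name):

    -- store out of line but never compress:
    ALTER TABLE toast_test ALTER COLUMN t SET STORAGE EXTERNAL;
    -- versus the default, which allows both compression and out-of-line storage:
    ALTER TABLE toast_test ALTER COLUMN t SET STORAGE EXTENDED;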

-- 
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.


Re: TOAST usage setting

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> > > >> > Strangely, 128 bytes seems to be the break-even point for TOAST and
> > > >> > non-TOAST, even for sequential scans of the entire heap touching all
> > > >> > long row values.  I am somewhat confused why TOAST has faster access
> > > >> > than inline heap data.
> > > 
> > > Is your database initialized with C locale? If so then length(text) is
> > > optimized to not have to detoast:
> > > 
> > >     if (pg_database_encoding_max_length() == 1)
> > >         PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);
> > 
> > Wow, we optimized length().  OK, will run the test with
> > substring(t,1,1).
> 
> Be careful about the compression!  It might be a good idea to run the
> test once with the column set to uncompressible (SET STORAGE EXTERNAL?),
> and again with it as compressible.

My test uses random data, which I figured was as close to real-world as I
could get, and I have a test that makes sure the data was pushed to the
TOAST table.  Should I still try EXTERNAL?

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Alvaro Herrera
Date:
Bruce Momjian wrote:

> My test uses random data, which I figured was a close to real-world as I
> could get, and I have a test that makes sure the data was pushed to the
> TOAST table.  Should I still try EXTERNAL?

My point is that you probably want to measure the effect of compression
separately from that of the external storage.

-- 
Alvaro Herrera                          Developer, http://www.PostgreSQL.org/
"La tristeza es un muro entre dos jardines" (Khalil Gibran)


Re: TOAST usage setting

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

> Gregory Stark wrote:
>
>> Is your database initialized with C locale? If so then length(text) is
>> optimized to not have to detoast:
>> 
>>     if (pg_database_encoding_max_length() == 1)
>>         PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);
>
> Wow, we optimized length().  OK, will run the test with
> substring(t,1,1).

No, we did substring() too :)

You could try hashtext()

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > My test uses random data, which I figured was a close to real-world as I
> > could get, and I have a test that makes sure the data was pushed to the
> > TOAST table.  Should I still try EXTERNAL?
> 
> My point is that you probably want to measure separately the effect of
> compression from that of the external storage.

OK, I will get to that if we need it later.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Bruce Momjian
Date:
Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
> 
> > Gregory Stark wrote:
> >
> >> Is your database initialized with C locale? If so then length(text) is
> >> optimized to not have to detoast:
> >> 
> >>     if (pg_database_encoding_max_length() == 1)
> >>         PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);
> >
> > Wow, we optimized length().  OK, will run the test with
> > substring(t,1,1).
> 
> No, we did substring() too :)

Uh, I looked at text_substring(), and while there is an optimization to
do character counting for encoding length == 1, it is still accessing
the data.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

>> No, we did substring() too :)
>
> Uh, I looked at text_substring(), and while there is an optimization to
> do character counting for encoding length == 1, it is still accessing
> the data.

Sure but it'll only access the first chunk. There are two chunks in your test.
It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
2 chunks (hashtext()).

Or if you're concerned with the cpu cost of hashtext you could calculate the
precise two bytes you need to access with substr to force it to load both
chunks. But I think the real cost of unnecessary toasting is the random disk
i/o so the cpu cost is of secondary interest.
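
In query form, the three access patterns would look something like this (table
and column names are placeholders); they touch zero, one, and all toast chunks
respectively, with the length() shortcut applying only in single-byte
encodings as noted earlier:

    SELECT sum(length(t))               FROM toast_test;  -- 0 chunks fetched
    SELECT sum(length(substr(t, 1, 1))) FROM toast_test;  -- only the first chunk
    SELECT sum(hashtext(t))             FROM toast_test;  -- full detoast of every value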

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
Bruce Momjian
Date:
Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
> 
> >> No, we did substring() too :)
> >
> > Uh, I looked at text_substring(), and while there is an optimization to
> > do character counting for encoding length == 1, it is still accessing
> > the data.
> 
> Sure but it'll only access the first chunk. There are two chunks in your test.
> It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
> 2 chunks (hashtext()).
> 
> Or if you're concerned with the cpu cost of hashtext you could calculate the
> precise two bytes you need to access with substr to force it to load both
> chunks. But I think the real cost of unnecessary toasting is the random disk
> i/o so the cpu cost is of secondary interest.

OK, will run a test with hashtext().  What I am seeing now is a 10-20x
slowdown to access the TOAST data, and a 0-1x speedup for accessing the
non-TOAST data when the rows are long:
http://momjian.us/expire/TOAST/

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Bruce Momjian
Date:
Bruce Momjian wrote:
> Gregory Stark wrote:
> > "Bruce Momjian" <bruce@momjian.us> writes:
> > 
> > >> No, we did substring() too :)
> > >
> > > Uh, I looked at text_substring(), and while there is an optimization to
> > > do character counting for encoding length == 1, it is still accessing
> > > the data.
> > 
> > Sure but it'll only access the first chunk. There are two chunks in your test.
> > It might be interesting to run tests accessing 0 (length()), 1 (substr()), and
> > 2 chunks (hashtext()).
> > 
> > Or if you're concerned with the cpu cost of hashtext you could calculate the
> > precise two bytes you need to access with substr to force it to load both
> > chunks. But I think the real cost of unnecessary toasting is the random disk
> > i/o so the cpu cost is of secondary interest.
> 
> OK, will run a test with hashtext().  What I am seeing now is a 10-20x
> slowdown to access the TOAST data, and a 0-1x speedup for accessing the
> non-TOAST data when the rows are long:

I reran the tests with hashtext(), and created a SUMMARY.HTML chart:
http://momjian.us/expire/TOAST/

What you will see is that pushing TEXT to a TOAST column allows quick
access to non-TOAST values and single-row TOAST values, but accessing
all TOAST columns is slower than accessing them in the heap, by a factor
of 3-18x.

Looking at the chart, it seems 512 is the proper breakpoint for TOAST
because 512 gives us a 2x change in accessing non-TOAST values and
single-row TOAST values, and it is only 2x slower to access all TOAST
values than we have now.

Of course, this has all the data in the cache, but if the cache is
limited, pushing more to TOAST is going to be a bigger win.  In general,
I would guess that the number of times all >512 byte rows are accessed
is much less than the number of times that pushing those >512 byte
values to TOAST will give a speedup.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> I reran the tests with hashtext(), and created a SUMMARY.HTML chart:
>     http://momjian.us/expire/TOAST/

I don't understand what the numbers in this chart are?
        regards, tom lane


Re: TOAST usage setting

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I reran the tests with hashtext(), and created a SUMMARY.HTML chart:
> >     http://momjian.us/expire/TOAST/
> 
> I don't understand what the numbers in this chart are?

They are taken from the test script and output files that are also in
that directory.  They are in milliseconds, so higher is slower.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
"Zeugswetter Andreas ADI SD"
Date:
> I reran the tests with hashtext(), and created a SUMMARY.HTML chart:
>
>     http://momjian.us/expire/TOAST/
>
> What you will see is that pushing TEXT to a TOAST column
> allows quick access to non-TOAST values and single-row TOAST
> values, but accessing all TOAST columns is slower than
> accessing them in the heap, by a factor of 3-18x.

Um, my understanding for this measurement would be to tweak
TOAST_TUPLE_THRESHOLD, and use a fixed TOAST_MAX_CHUNK_SIZE ?
Why would you want to increase the number of toast tuples required for
one column value ?

My expectation would be, that we want to allow a toast tuple to fill a
whole page (TOAST_TUPLES_PER_PAGE = 1),
but keep or reduce the threshold. Maybe we need other logic to find and
use free space for that, though (8.4 material).

Andreas


Re: TOAST usage setting

From
Tom Lane
Date:
"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:
> My expectation would be, that we want to allow a toast tuple to fill a
> whole page (TOAST_TUPLES_PER_PAGE = 1),

I've been wondering about that too.  It certainly needs to be
experimented with, independently of TOAST_TUPLE_THRESHOLD, now that
we can vary the two numbers independently.
        regards, tom lane


Re: TOAST usage setting

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes:
>> My expectation would be, that we want to allow a toast tuple to fill a
>> whole page (TOAST_TUPLES_PER_PAGE = 1),

I thought that previously but then I started thinking maybe that's not true.

Picture toasting a whole bunch of 5-7k data. If your toast chunk size is 2k
then each page will be more or less packed full. Either with four 2k chunks or
fewer full size chunks and one or more final fragments.

Whereas if you set toast_tuples_per_page to 8k then the only option for
Postgres will be to put each datum in its own page and waste 1-3k on every
page.

Perhaps that's too artificial a case. If you have a full range of datum sizes
from 2-8k it ought to be able to pack more densely, especially if you have
data over 8k so that you have final chunks under 2k to fill out the free space.

Moreover, the data density might matter less for the random-access pattern of
toast tables than the reduction in the number of seeks. We know from tests
that data density does matter for random-access OLTP performance due to cache
effectiveness. But still, it seems it would only take one extra seek for a 2k
chunk placed on a different page to swamp any effect the wasted space would
cause.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Whereas if you set toast_tuples_per_page to 8k then the only option for
> Postgres will be to put each datum in its own page and waste 1-3k on every
> page.

No, because actually the code is designed to make the toast chunk size
just enough less than 8K that the tuples fit.

The shorter-than-normal tuples carrying the last chunk of any particular
datum are going to result in wasted space to the extent that we can't
pack them together on a page, but that's true now.  Right now, if you
have a large toasted datum, it mostly will consist of just-under-2K
tuples that are sized so that there's no noticeable wasted space on a
page with 4 of them.  There isn't any advantage to that compared to one
just-under-8K tuple AFAICS, and it takes 4 times as much work to insert
or retrieve 'em.
        regards, tom lane


Re: TOAST usage setting

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> The shorter-than-normal tuples carrying the last chunk of any particular
> datum are going to result in wasted space to the extent that we can't
> pack them together on a page, but that's true now.  Right now, if you
> have a large toasted datum, it mostly will consist of just-under-2K
> tuples that are sized so that there's no noticeable wasted space on a
> page with 4 of them.  There isn't any advantage to that compared to one
> just-under-8K tuple AFAICS, and it takes 4 times as much work to insert
> or retrieve 'em.

That will be true if your data are usually larger than 8k. But if your data
are all between TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE you won't have
any fragments to put in the remaining space.

If there's no daylight between those two parameters then you'll get an average
amount of wasted space of about half a chunk per page (or perhaps even less).

But if there's a gap and your use case happens to have mostly or entirely data
sized in that gap then you won't have many or even any fragments available to
fill in that space.

As I described, picture a case where you have TOAST_MAX_CHUNK_SIZE set to 8k
and TOAST_TUPLE_THRESHOLD set under 4k and all your datums are 4k+1. Each one
will have to go on its own page, wasting 50% of the space.

If in that worst case it's still faster than having to pay the cost of two
seeks to use a smaller chunk size and get better data density then we know
there's no argument for a smaller chunk size.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
"Zeugswetter Andreas ADI SD"
Date:
> > Whereas if you set toast_tuples_per_page to 8k then the only option
> > for Postgres will be to put each datum in its own page and waste 1-3k
> > on every page.
>
> No, because actually the code is designed to make the toast
> chunk size just enough less than 8K that the tuples fit.

He mentioned toasted values of 6-7k each. If all values are of that size
(like payment slip tiffs) there is nothing we would do with the
remaining 1-2k on each page. But that disadvantage disappears as soon as
you have just a little more variability in the length. Still, it might
be enough to opt for some freespace reuse smarts if we can think of a
cheap heuristic. But the cost to fetch such a "distributed" tuple would
be so huge I doubt there is anything to win but disk space.

Andreas


Re: TOAST usage setting

From
Bruce Momjian
Date:
Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
> > Whereas if you set toast_tuples_per_page to 8k then the only option for
> > Postgres will be to put each datum in its own page and waste 1-3k on every
> > page.
> 
> No, because actually the code is designed to make the toast chunk size
> just enough less than 8K that the tuples fit.
> 
> The shorter-than-normal tuples carrying the last chunk of any particular
> datum are going to result in wasted space to the extent that we can't
> pack them together on a page, but that's true now.  Right now, if you
> have a large toasted datum, it mostly will consist of just-under-2K
> tuples that are sized so that there's no noticeable wasted space on a
> page with 4 of them.  There isn't any advantage to that compared to one
> just-under-8K tuple AFAICS, and it takes 4 times as much work to insert
> or retrieve 'em.

Uh, am I supposed to be running more TOAST tests?  Would someone explain
what they want tested?

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

> Uh, am I supposed to be running more TOAST tests?  Would someone explain
> what they want tested?

If you want my opinion I would say we need two tests:

1) For TOAST_TUPLE_TARGET:

We need to run the test scripts you have already for sizes that cause actual
disk i/o. The real cost of TOAST lies in the random access seeks and your
tests all fit in memory so they're missing that.

2) And for TOAST_MAX_CHUNK_SIZE:

Set TOAST_MAX_CHUNK_SIZE to 8k and TOAST_TUPLE_TARGET to 4097, and store
a large table (larger than RAM) of 4069-byte values (and verify that that's
creating two chunks for each tuple). Test how long it takes to do a sequential
scan with hashtext(). Compare that to the above with TOAST_MAX_CHUNK_SIZE set
to 4k (and verify that the toast table is much smaller in this configuration).

Actually I think we need to do the latter of these first. Because if it shows
that bloating the toast table is faster than chopping up data into finer
chunks then we'll want to set TOAST_MAX_CHUNK_SIZE to 8k and then your tests
above will have to be rerun.
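
One way to verify the chunking is to look at the toast relation directly; a
rough sketch (pg_toast_NNNNN is a placeholder for whatever name the first
query returns, and toast_test is the hypothetical test table):

    SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'toast_test';

    -- distribution of chunks per stored value; expect 2 for every value
    -- in the configuration described above
    SELECT chunks, count(*) AS n_values
      FROM (SELECT chunk_id, count(*) AS chunks
              FROM pg_toast.pg_toast_NNNNN
             GROUP BY chunk_id) AS s
     GROUP BY chunks;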

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
Heikki Linnakangas
Date:
Bruce Momjian wrote:
> What you will see is that pushing TEXT to a TOAST column allows quick
> access to non-TOAST values and single-row TOAST values, but accessing
> all TOAST columns is slower than accessing them in the heap, by a factor
> of 3-18x.
> 
> Looking at the chart, it seems 512 is the proper breakpoint for TOAST
> because 512 gives us a 2x change in accessing non-TOAST values and
> single-row TOAST values, and it is only 2x slower to access all TOAST
> values than we have now.

I don't understand why that point is any better than any other point.

> Of course, this has all the data in the cache, but if the cache is
> limited, pushing more to TOAST is going to be a bigger win.

... assuming that you don't access the TOASTed data.

>  In general,
> I would guess that the number of times all >512 byte rows are accessed
> is much less than the number of times that pushing those >512 byte
> values to TOAST will give a speedup.

Maybe. Maybe not. I wonder how the access times for a single row change 
when you have a table that doesn't fit in cache. Especially after some 
random updates, so that the toast is not clustered in the same order as 
the heap.

The bottom line is that the optimal threshold depends on your application.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: TOAST usage setting

From
Bruce Momjian
Date:
I tested EXTERN_TUPLES_PER_PAGE for values 4(default), 2, and 1:
    4    15.596
    2    15.197
    1    14.6

which is basically a 3% decrease from 4->2 and 2->1.  The test script
and result are here:
http://momjian.us/expire/TOAST2/

shared_buffers again was 32MB so all the data was in memory.

---------------------------------------------------------------------------

Gregory Stark wrote:
> 
> "Bruce Momjian" <bruce@momjian.us> writes:
> 
> > Uh, am I supposed to be running more TOAST tests?  Would someone explain
> > what they want tested?
> 
> If you want my opinion I would say we need two tests:
> 
> 1) For TOAST_TUPLE_TARGET:
> 
> We need to run the test scripts you have already for sizes that cause actual
> disk i/o. The real cost of TOAST lies in the random access seeks and your
> tests all fit in memory so they're missing that.
> 
> 2) And for TOAST_MAX_CHUNK_SIZE:
> 
> Set TOAST_MAX_CHUNK_SIZE to 8k and TOAST_TOAST_TUPLE_TARGET to 4097 and store
> a large table (larger than RAM) of 4069 bytes (and verify that that's creating
> two chunks for each tuple). Test how long it takes to do a sequential scan
> with hashtext(). Compare that to the above with TOAST_MAX_CHUNK_SIZE set to 4k
> (and verify that the toast table is much smaller in this configuration).
> 
> Actually I think we need to do the latter of these first. Because if it shows
> that bloating the toast table is faster than chopping up data into finer
> chunks then we'll want to set TOAST_MAX_CHUNK_SIZE to 8k and then your tests
> above will have to be rerun.
> 
> -- 
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

> shared_buffers again was 32MB so all the data was in memory.

The case where all the data is in memory is simply not interesting. The cost
of TOAST is the random access seeks it causes. You seem to be intentionally
avoiding testing the precise thing we're interested in.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
Gregory Stark
Date:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Bruce Momjian" <bruce@momjian.us> writes:
>
>> shared_buffers again was 32MB so all the data was in memory.
>
> The case where all the data is in memory is simply not interesting. The cost
> of TOAST is the random access seeks it causes. You seem to be intentionally
> avoiding testing the precise thing we're interested in.

Also, something's not right with these results. 100,000 tuples --even if all
they contain is a toast pointer-- won't fit on a single page. And the toast
tables should vary in size depending on how many toast chunks are created.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
"Zeugswetter Andreas ADI SD"
Date:
> I tested EXTERN_TUPLES_PER_PAGE for values 4(default), 2, and 1:
>
>     4    15.596
>     2    15.197
>     1    14.6
>
> which is basically a 3% decrease from 4->2 and 2->1.  The
> test script and result are here:
>
>     http://momjian.us/expire/TOAST2/
>
> shared_buffers again was 32MB so all the data was in memory.

Thanks for the test. (The test is for 1 row that is 100k wide.)

It is good. It shows that we even see a small advantage in the
everything-cached case.

What we don't have yet is numbers for whether EXTERN_TUPLES_PER_PAGE=1
substantially increases the toast table size for real life scenarios,
what happens in the worst case (~48% wastage compared to previous 12%),
and whether 1 row per page works well with autovacuum ?
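
One crude way to put a number on that, sketched against the hypothetical test
table used earlier in the thread (pg_column_size reports each value's stored,
possibly compressed, size, so the gap between the two numbers is chunk and
page overhead):

    SELECT pg_relation_size(reltoastrelid)                 AS toast_bytes,
           (SELECT sum(pg_column_size(t)) FROM toast_test) AS stored_value_bytes
      FROM pg_class
     WHERE relname = 'toast_test';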

The bad case (with EXTERN_TUPLES_PER_PAGE=1) is when most toast tuples
have a size over TOAST_MAX_CHUNK_SIZE_for_2+1 but enough smaller than a
page that we care about the wastage. Maybe we can special-case that
range: maybe determine (and lock) the free space of any cheap-to-get-at
non-empty page (e.g. the current insert target page) and split the toast
data there.

Andreas


Re: TOAST usage setting

From
Bruce Momjian
Date:
Gregory Stark wrote:
> "Gregory Stark" <stark@enterprisedb.com> writes:
> 
> > "Bruce Momjian" <bruce@momjian.us> writes:
> >
> >> shared_buffers again was 32MB so all the data was in memory.
> >
> > The case where all the data is in memory is simply not interesting. The cost
> > of TOAST is the random access seeks it causes. You seem to be intentionally
> > avoiding testing the precise thing we're interested in.
> 
> Also, something's not right with these results. 100,000 tuples --even if all
> they contain is a toast pointer-- won't fit on a single page. And the toast
> tables should vary in size depending on how many toast chunks are created.

The test creates _one_ row of length 100,000 and then finds out how long
it takes to access it twenty times.

I don't see how having the data outside cache helps us.  For a large row
with 2k chunks, I assume all the 2k chunks are going to be in the same
8k page.  What I want to measure is the cost of accessing four 2k chunks
vs. one 8k chunk, and I think we can conclude that is 6% of the access
time.
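
For reference, the rough shape of that micro-benchmark in SQL (the table name
is made up; the real script is at the TOAST2 URL given earlier):

    CREATE TABLE one_wide_row (t text);

    -- one row of roughly 100k (3200 x 32 hex characters)
    INSERT INTO one_wide_row
    SELECT array_to_string(ARRAY(SELECT md5(g::text)
                                   FROM generate_series(1, 3200) AS g), '');

    -- repeat and time this to measure pure detoast cost:
    SELECT hashtext(t) FROM one_wide_row;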

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Bruce Momjian
Date:
Zeugswetter Andreas ADI SD wrote:
> It is good. It shows, that we even see a small advantage in the
> everything cached case.
> 
> What we don't have yet is numbers for whether EXTERN_TUPLES_PER_PAGE=1
> substantially increases the toast table size for real life scenarios,
> what happens in the worst case (~48% wastage compared to previous 12%),
> and whether 1 row per page works well with autovacuum ?
> 
> The bad case (with EXTERN_TUPLES_PER_PAGE=1) is when most toast tuples
> have a size over TOAST_MAX_CHUNK_SIZE_for_2+1 but enough smaller than a
> page that we care about the wasteage. Maybe we can special case that
> range.
> Maybe determine (and lock) the freespace of any cheap-to-get-at non
> empty page (e.g. the current insert target page) and splitting the toast
> data there. 

I think we can look at the BSD file system as a parallel.  It has 8k
blocks, with 2k fragments.  Files > 6k get a full block, and smaller
files are made up of full fragments.  

The big difference is that the BSD file system only allocates in 2k or
8k chunks, while we use a single chunk size.  However, we allow the filling
of arbitrary free space in an 8k block, which is particularly important
for saving the trailing data that doesn't fill a full chunk.

I think the long-term solution is to go to a 2k/8k fragment/block model,
but that isn't going to happen for 8.3.

The big question is do we want to drop the target tuple size down to
512, and increase the chunk size to 8k for 8.3?  Dropping the tuple size
down to 512 is going to give us some smaller TOAST values to fill in
free space created by the 8k chunk size, assuming you have both types of
values in the table.  Do we want to increase the access time of long
TOAST by 6% if it means having more wasted space for lots of 4.1k
values?

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
"Jim C. Nasby"
Date:
On Fri, Jun 01, 2007 at 01:50:12PM -0400, Bruce Momjian wrote:
> I think the long-term solution is to go to a 2k/8k fragment/block model,
> but that isn't going to happen for 8.3.
There might well have been lessons learned since UFS (anyone know what
ZFS does in this regard?), but I agree that we want to be able to do a
mix of full chunks and fragments.

> The big question is do we want to drop the target tuple size down to
> 512, and increase the chunk size to 8k for 8.3?  Dropping the tuple size
> down to 512 is going to give us some smaller TOAST values to fill in
> free space created by the 8k chuck size, assuming you have both types of
> values in the table.  Do we want to increase the access time of long
> TOAST by 6% if it means having more wasted space for lots of 4.1k
> values?

If we do that people could see their disk space usage increase by up to
16x: currently 513 bytes fits in heap and takes (roughly) 513 bytes; if
we make that change it would then get toasted and take 8K. I don't think
we want to do that. Disk space aside, it's almost certain to seriously
hurt performance as soon as you don't fit entirely in memory.

How big is the hit for setting both to 512? Also, is this something that
could be set at initdb instead of compile time? That would make it
easier for folks to go back to the old behavior if they needed to...
--
Jim Nasby                                      decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: TOAST usage setting

From
Tom Lane
Date:
"Jim C. Nasby" <decibel@decibel.org> writes:
> On Fri, Jun 01, 2007 at 01:50:12PM -0400, Bruce Momjian wrote:
>> The big question is do we want to drop the target tuple size down to
>> 512, and increase the chunk size to 8k for 8.3?

> If we do that people could see their disk space usage increase by up to
> 16x: currently 513 bytes fits in heap and takes (roughly) 513 bytes; if
> we make that change it would then get toasted and take 8K.

That argument is completely bogus --- having a toast chunk size of 8K
does not mean that smaller rows occupy 8K.
        regards, tom lane


Re: TOAST usage setting

From
"Zeugswetter Andreas ADI SD"
Date:
> > The big question is do we want to drop the target tuple size down to
> > 512, and increase the chunk size to 8k for 8.3?  Dropping the tuple
> > size down to 512 is going to give us some smaller TOAST values to fill
> > in free space created by the 8k chunk size, assuming you have both
> > types of values in the table.  Do we want to increase the access time
> > of long TOAST by 6% if it means having more wasted space for lots of
> > 4.1k values?
>
> If we do that people could see their disk space usage increase by up to
> 16x: currently 513 bytes fits in heap and takes (roughly) 513
> bytes;

No, you misunderstood. Bruce was suggesting changing the target to 512.
That means if a row is wider than ~2k, toaster will try to toast until
the base row is ~512 bytes. I would not do that part for 8.3.

> if we make that change it would then get toasted and
> take 8K. I don't think we want to do that. Disk space aside,
> it's almost certain to seriously hurt performance as soon as
> you don't fit entirely in memory.

No, allowing one toast chunk to fill a page does not mean that every
chunk uses a whole page.

Andreas


Re: TOAST usage setting

From
Bruce Momjian
Date:
Zeugswetter Andreas ADI SD wrote:
>
> > > The big question is do we want to drop the target tuple size down to
> > > 512, and increase the chunk size to 8k for 8.3?  Dropping the tuple
> > > size down to 512 is going to give us some smaller TOAST values to fill
> > > in free space created by the 8k chunk size, assuming you have both
> > > types of values in the table.  Do we want to increase the access time
> > > of long TOAST by 6% if it means having more wasted space for lots of
> > > 4.1k values?
> >
> > If we do that people could see their disk space usage increase by up to
> > 16x: currently 513 bytes fits in heap and takes (roughly) 513
> > bytes;
>
> No, you misunderstood. Bruce was suggesting changing the target to 512.
> That means if a row is wider than ~2k, toaster will try to toast until
> the base row is ~512 bytes. I would not do that part for 8.3.

OK, what do you suggest for 8.3?  Attached is my suggestion to use 512
and a 4k chunk size, which I think means that 2.7k is the worst-case
value, with a loss of around 25%.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/include/access/tuptoaster.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/access/tuptoaster.h,v
retrieving revision 1.35
diff -c -c -r1.35 tuptoaster.h
*** src/include/access/tuptoaster.h    6 Apr 2007 04:21:43 -0000    1.35
--- src/include/access/tuptoaster.h    2 Jun 2007 02:52:22 -0000
***************
*** 42,48 ****
   * given to needs_toast_table() in toasting.c before unleashing random
   * changes.
   */
! #define TOAST_TUPLES_PER_PAGE    4

  /* Note: sizeof(PageHeaderData) includes the first ItemId on the page */
  #define TOAST_TUPLE_THRESHOLD    \
--- 42,48 ----
   * given to needs_toast_table() in toasting.c before unleashing random
   * changes.
   */
! #define TOAST_TUPLES_PER_PAGE    16

  /* Note: sizeof(PageHeaderData) includes the first ItemId on the page */
  #define TOAST_TUPLE_THRESHOLD    \
***************
*** 69,75 ****
   *
   * NB: Changing TOAST_MAX_CHUNK_SIZE requires an initdb.
   */
! #define EXTERN_TUPLES_PER_PAGE    4                /* tweak only this */

  /* Note: sizeof(PageHeaderData) includes the first ItemId on the page */
  #define EXTERN_TUPLE_MAX_SIZE    \
--- 69,75 ----
   *
   * NB: Changing TOAST_MAX_CHUNK_SIZE requires an initdb.
   */
! #define EXTERN_TUPLES_PER_PAGE    2                /* tweak only this */

  /* Note: sizeof(PageHeaderData) includes the first ItemId on the page */
  #define EXTERN_TUPLE_MAX_SIZE    \

Re: TOAST usage setting

From
"Zeugswetter Andreas ADI SD"
Date:
> > No, you misunderstood. Bruce was suggesting changing the target to 512.
> > That means if a row is wider than ~2k, toaster will try to toast until
> > the base row is ~512 bytes. I would not do that part for 8.3.
>
> OK, what do you suggest for 8.3?  Attached are my suggestion
> to use 512 and a 4k chunk size, which I think means that 2.7k
> is the worst values that has a loss of around 25%.

Oh, so I misunderstood you also. You are suggesting two changes:
TOAST_TUPLES_PER_PAGE    16
EXTERN_TUPLES_PER_PAGE    2

While I agree that 2 might be a good compromise with low risk for now,
I think that toasting all rows down to ~512 bytes is too narrowly
targeted at not reading wider columns.

When suggesting a new target, I interpreted you like so:
#define TOAST_TUPLES_PER_PAGE    4
#define TOAST_TUPLE_TARGET    \
    MAXALIGN_DOWN((BLCKSZ - \
                   MAXALIGN(sizeof(PageHeaderData) + \
                            (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
                  / 16)
So we would only toast rows wider than 2k, but once toasting, toast the
base row down to 512.

My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because all
else would need extensive performance testing.
#define TOAST_TUPLES_PER_PAGE    4

My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but:
Split data wider than a page into page sized chunks as long as they fill
whole pages.
Split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
This would not waste more space than currently, but improve performance
for very wide columns.

I can try to do a patch if you think that is a good idea, can't do a lot
of testing though.

Andreas


Re: TOAST usage setting

From
Bruce Momjian
Date:
Zeugswetter Andreas ADI SD wrote:
> 
> > > No, you misunderstood. Bruce was suggesting changing the target to
> 512.
> > > That means if a row is wider than ~2k, toaster will try to toast
> until 
> > > the base row is
> > > ~512 bytes. I would not do that part for 8.3. 
> > 
> > OK, what do you suggest for 8.3?  Attached are my suggestion 
> > to use 512 and a 4k chunk size, which I think means that 2.7k 
> > is the worst values that has a loss of around 25%.
> 
> Oh, so I misunderstood you also. You are suggesting two changes:
> TOAST_TUPLES_PER_PAGE    16
> EXTERN_TUPLES_PER_PAGE    2

Right.

> While I agree, that 2 might be a good compromise with low risc for now,
> I think
> that toasting all rows down to ~512 bytes is too narrowly targeted at
> not reading wider columns.

Well, it is summarized here:
http://momjian.us/expire/TOAST/SUMMARY.html

It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed
like a good compromise.

> When suggesting a new target, I interpreted you like so:
> #define TOAST_TUPLES_PER_PAGE    4
> #define TOAST_TUPLE_TARGET    \
>     MAXALIGN_DOWN((BLCKSZ - \
>                    MAXALIGN(sizeof(PageHeaderData) +
> (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
>                   / 16)
> So we would only toast rows wider than 2k, but once toasting, toast the
> base row down to 512.  

That is certainly not my intent, and I don't see how you would get the
2k number from that macro.  I think you are looking at 8.2 and not CVS
HEAD.  CVS HEAD has:
#define TOAST_TUPLE_TARGET      TOAST_TUPLE_THRESHOLD

> My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because all
> else would need extensive performance testing.
> #define TOAST_TUPLES_PER_PAGE    4
> 
> My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but:
> Split data wider than a page into page sized chunks as long as they fill
> whole pages.
> Split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
> This would not waste more space than currently, but improve performance
> for very wide columns.
> 
> I can try to do a patch if you think that is a good idea, can't do a lot
> of testing though.

None of this splitting is going to happen for 8.3.  The question is what
changes we can make for 8.3, if any.


--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

> Well, it is summarized here:
>
>     http://momjian.us/expire/TOAST/SUMMARY.html
>
> It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed
> like a good compromise.

Is this still testing with all data fitting in RAM?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
"Zeugswetter Andreas ADI SD"
Date:
> > While I agree, that 2 might be a good compromise with low risc for
> > now, I think that toasting all rows down to ~512 bytes is too narrowly
> > targeted at not reading wider columns.
>
> Well, it is summarized here:
>
>     http://momjian.us/expire/TOAST/SUMMARY.html
>
> It made non-TOAST access 2x faster, but TOAST 7x slower, and
> that seemed like a good compromise.

Yes, my argument was that I expect that in the up-to-1-2k range more use
cases will suffer the 7x slowdown than will see the speedup. But the ratio
certainly is hard to judge, and you may well be right.

e.g. for me TOAST_TUPLES_PER_PAGE 8 would be ok, I have base row widths
of ~700 in 2 tables that would suffer if further toasted, but none in
the 1k - 2k range.

I wonder whether this threshold isn't rather absolute, in which case the
32k-pagesize users should probably use 32.

> > When suggesting a new target, I interpreted you like so:
> > #define TOAST_TUPLES_PER_PAGE    4
> > #define TOAST_TUPLE_TARGET    \
> >     MAXALIGN_DOWN((BLCKSZ - \
> >                    MAXALIGN(sizeof(PageHeaderData) +
> > (TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
> >                   / 16)
> > So we would only toast rows wider than 2k, but once toasting, toast
> > the base row down to 512.
>
> That is certainly not my intent, and I don't see how you
> would get the 2k number from that macro.  I think you are

the ~2k comes from TOAST_TUPLE_THRESHOLD

> looking at 8.2 and not CVS HEAD.  CVS HEAD has:
>
>     #define TOAST_TUPLE_TARGET      TOAST_TUPLE_THRESHOLD

Nope, I meant what I said. To change only the target, you would use the
above macro for TOAST_TUPLE_TARGET.  But I also don't see how this would
be good.

Andreas


Re: TOAST usage setting

From
Bruce Momjian
Date:
Gregory Stark wrote:
> "Bruce Momjian" <bruce@momjian.us> writes:
> 
> > Well, it is summarized here:
> >
> >     http://momjian.us/expire/TOAST/SUMMARY.html
> >
> > It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed
> > like a good compromise.
> 
> Is this still testing with all data fitting in RAM?

Yes.  Having things out of RAM is going to make access even slower, but
it is going to allow the heap to be in RAM more often.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: TOAST usage setting

From
Gregory Stark
Date:
"Bruce Momjian" <bruce@momjian.us> writes:

> Gregory Stark wrote:
>> 
>> Is this still testing with all data fitting in RAM?
>
> Yes.  Having things out of RAM is going to make access even slower, but
> it is going to allow the heap to be in RAM more often.

It would let us measure the actual impact of TOAST, the largest negative
effect of which is to destroy the sequential access pattern, while the
positive effect is, as you say, to increase cache effectiveness on
non-toasted data.  The CPU costs are insignificant, so testing in-memory
cases misses the point.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: TOAST usage setting

From
"Zeugswetter Andreas ADI SD"
Date:
> My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE
> as is, but:
> Split data wider than a page into page sized chunks as long
> as they fill whole pages.
> Split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
> This would not waste more space than currently, but improve
> performance for very wide columns.
>
> I can try to do a patch if you think that is a good idea,
> can't do a lot of testing though.

I have a PoC patch running, but it is larger than expected because of
the size checks during read (toast_fetch_datum_slice is not done, but
would be straightforward).  Also, the pg_control variable
toast_max_chunk_size would need to be renamed to reflect the
EXTERN_TUPLES_PER_PAGE (4) number and the fact that full-page chunks
are used (else the chunk size checks and slice could not work like now).

Should I pursue it, keep it for 8.4, or dump it?

The downside of this concept is that chunks smaller than a full page
still get split into the smaller pieces, and the < ~8k chunks may well
outnumber the > ~8k ones on real data.  The upside is that I do not see
a better solution that would keep slicing cheap and still lower the
overhead even for pathological cases.

Andreas