Thread: TOAST usage setting
bruce wrote:
> > > * Re: [HACKERS] Modifying TOAST thresholds  /Tom Lane/
> > >
> > > At this point it seems nothing will be done about this issue for 8.3.
> >
> > I'm not sure anyone has an idea how to test it. TPCC isn't really useful
> > because it has a fixed size (500 byte) string buffer. Perhaps if we
> > modified it to have a random string length uniformly distributed between
> > 0-2k? But even then it never does any scans based on that buffer. But the
> > problem with going with something more natural is that it'll be harder to
> > tell exactly what it's testing.
>
> My idea on this was to create two backends, one with the default TOAST
> value, and a second with a value of 50 bytes. Create a table with one
> TEXT field, and several other columns, each column < 50 bytes.
>
> Then, fill the table with random data (script attached that might help),
> and then try 2000, 1500, 1000, etc. bytes in the TEXT column for each row
> (use random data so the compression code doesn't shrink it). Then run a
> test with both backends accessing the TEXT column and non-TEXT column and
> measure the difference between the two backends, i.e. the backend with a
> TOAST value of 50 should show faster access on the non-TEXT field, but
> slower access on the TEXT field.
>
> Then, figure out where the gains on the non-TEXT field seem to diminish
> in usefulness. Basically, with a lower TOAST value, we are going to
> spend more time accessing the TEXT field, but the speedup for the
> non-TEXT field should be a large enough win that we don't care. As the
> TEXT column becomes shorter, it has less effect on non-TEXT access.

I tested TOAST using a method similar to the above against CVS HEAD, with
default shared_buffers = 32MB and no assert()s. I created backends with
power-of-2 settings for TOAST_TUPLES_PER_PAGE (4 (default), 8, 16, 32, 64),
which gives TOAST/non-TOAST breakpoints of roughly 2k (default), 1k, 512,
256, and 128 bytes.

The results are here:

	http://momjian.us/expire/TOAST/

Strangely, 128 bytes seems to be the break-even point for TOAST and
non-TOAST, even for sequential scans of the entire heap touching all long
row values. I am somewhat confused why TOAST has faster access than inline
heap data.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
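[A minimal sketch of the kind of setup described above. The table, column,
and helper-function names here are illustrative only and are not taken from
the attached script:]

	-- illustrative helper: n bytes of incompressible (random) text
	CREATE FUNCTION random_text(n integer) RETURNS text AS $$
	DECLARE
	    result text := '';
	BEGIN
	    WHILE length(result) < n LOOP
	        result := result || md5(random()::text);
	    END LOOP;
	    RETURN substr(result, 1, n);
	END;
	$$ LANGUAGE plpgsql;

	-- one TEXT column plus several short non-TEXT columns
	CREATE TABLE toast_test (id integer, c1 varchar(40), c2 varchar(40), t text);

	-- vary the last argument (2000, 1500, 1000, ...) between runs
	INSERT INTO toast_test
	SELECT i, random_text(20), random_text(20), random_text(2000)
	FROM generate_series(1, 100000) AS i;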
bruce@momjian.us (Bruce Momjian) writes: > The results are here: > > http://momjian.us/expire/TOAST/ I'll take a look and see if there's anything further it makes sense for me to try testing. Thanks for following up so quickly; what with the cold I have had, I haven't yet gotten back to the office. -- "cbbrowne","@","acm.org" http://cbbrowne.com/info/languages.html I've had a perfectly wonderful evening. But this wasn't it. -- Groucho Marx
"Bruce Momjian" <bruce@momjian.us> writes: > I tested TOAST using a method similar to the above method against CVS > HEAD, with default shared_buffers = 32MB and no assert()s. I created > backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default), > 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default), > 1k, 512, 256, and 128, roughly. > > The results are here: > > http://momjian.us/expire/TOAST/ > > Strangely, 128 bytes seems to be the break-even point for TOAST and > non-TOAST, even for sequential scans of the entire heap touching all > long row values. I am somewhat confused why TOAST has faster access > than inline heap data. Did your test also imply setting the MAX_TOAST_CHUNK_SIZE (or however that's spelled)? And what size long values were you actually storing? How did you generate them? I wonder if what's happening is that you have large chunks which when stored inline are leaving lots of dead space in the table. Ie, if you're generating values with size near 2k and the default chunk size you would expect to find an average of 1k dead space per page, or a 12.5% drain on performance. As you lower the chunk size you decrease that margin. However I agree that it's hard to believe that the costs of random access wouldn't swamp that 12.5% overhead pretty quickly. One query I used when measuring the impact of the variable varlena stuff was this which gives the distribution of tuples/page over a table: SELECT count(*),n FROM (SELECT count(*) AS n FROM foo GROUP BY (point_in(tidout(ctid)))[0] ) as xGROUPBY n; Which might help you peek at what's going on. You could also combine pg_column_size(foo.*) to measure the size of the tuple. I think that will measure the size of the tuple as is before the columns are detoasted. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > "Bruce Momjian" <bruce@momjian.us> writes: > > > I tested TOAST using a method similar to the above method against CVS > > HEAD, with default shared_buffers = 32MB and no assert()s. I created > > backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default), > > 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default), > > 1k, 512, 256, and 128, roughly. > > > > The results are here: > > > > http://momjian.us/expire/TOAST/ > > > > Strangely, 128 bytes seems to be the break-even point for TOAST and > > non-TOAST, even for sequential scans of the entire heap touching all > > long row values. I am somewhat confused why TOAST has faster access > > than inline heap data. > > Did your test also imply setting the MAX_TOAST_CHUNK_SIZE (or however that's > spelled)? And what size long values were you actually storing? How did you > generate them? Please look at the script sqltest.sh at that URL. I did not modify TOAST_MAX_CHUNK_SIZE, but it changes based on TOAST_TUPLES_PER_PAGE, which I did change. > I wonder if what's happening is that you have large chunks which when stored > inline are leaving lots of dead space in the table. Ie, if you're generating > values with size near 2k and the default chunk size you would expect to find > an average of 1k dead space per page, or a 12.5% drain on performance. As you > lower the chunk size you decrease that margin. Well, that could be it, but effectively that is what would happen in the real world too. > However I agree that it's hard to believe that the costs of random access > wouldn't swamp that 12.5% overhead pretty quickly. > > One query I used when measuring the impact of the variable varlena stuff was > this which gives the distribution of tuples/page over a table: > > SELECT count(*),n > FROM (SELECT count(*) AS n > FROM foo > GROUP BY (point_in(tidout(ctid)))[0] > ) as x > GROUP BY n; > > Which might help you peek at what's going on. You could also combine > pg_column_size(foo.*) to measure the size of the tuple. I think that will > measure the size of the tuple as is before the columns are detoasted. Please use my test script and see what you find. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes: > Gregory Stark wrote: >> "Bruce Momjian" <bruce@momjian.us> writes: >> >> > I tested TOAST using a method similar to the above method against CVS >> > HEAD, with default shared_buffers = 32MB and no assert()s. I created >> > backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default), >> > 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default), >> > 1k, 512, 256, and 128, roughly. >> > >> > The results are here: >> > >> > http://momjian.us/expire/TOAST/ >> > >> > Strangely, 128 bytes seems to be the break-even point for TOAST and >> > non-TOAST, even for sequential scans of the entire heap touching all >> > long row values. I am somewhat confused why TOAST has faster access >> > than inline heap data. Is your database initialized with C locale? If so then length(text) is optimized to not have to detoast: if (pg_database_encoding_max_length() == 1) PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); Also, I think you have to run this for small datasets like you have well as large data sets where the random access seek time of TOAST will really hurt. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Gregory Stark" <stark@enterprisedb.com> writes: > Is your database initialized with C locale? If so then length(text) is > optimized to not have to detoast: > > if (pg_database_encoding_max_length() == 1) > PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); Of course I got that wrong. The question is "is your database in a single-byte encoding" which isn't the same as being in C locale. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > "Bruce Momjian" <bruce@momjian.us> writes: > > > Gregory Stark wrote: > >> "Bruce Momjian" <bruce@momjian.us> writes: > >> > >> > I tested TOAST using a method similar to the above method against CVS > >> > HEAD, with default shared_buffers = 32MB and no assert()s. I created > >> > backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default), > >> > 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default), > >> > 1k, 512, 256, and 128, roughly. > >> > > >> > The results are here: > >> > > >> > http://momjian.us/expire/TOAST/ > >> > > >> > Strangely, 128 bytes seems to be the break-even point for TOAST and > >> > non-TOAST, even for sequential scans of the entire heap touching all > >> > long row values. I am somewhat confused why TOAST has faster access > >> > than inline heap data. > > Is your database initialized with C locale? If so then length(text) is > optimized to not have to detoast: > > if (pg_database_encoding_max_length() == 1) > PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); Wow, we optimized length(). OK, will run the test with substring(t,1,1). > Also, I think you have to run this for small datasets like you have well as > large data sets where the random access seek time of TOAST will really hurt. Well, if everything doesn't fit in the cache, then the smaller heap will be a bigger win for non-TOAST access, so some of that overhead balances out. Let me get in-cache numbers and then I can look at larger data sets. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Gregory Stark wrote: > > "Bruce Momjian" <bruce@momjian.us> writes: > > > > > Gregory Stark wrote: > > >> "Bruce Momjian" <bruce@momjian.us> writes: > > >> > > >> > I tested TOAST using a method similar to the above method against CVS > > >> > HEAD, with default shared_buffers = 32MB and no assert()s. I created > > >> > backends with power-of-2 seetings for TOAST_TUPLES_PER_PAGE (4(default), > > >> > 8, 16, 32, 64) which gives TOAST/non-TOAST breakpoints of 2k(default), > > >> > 1k, 512, 256, and 128, roughly. > > >> > > > >> > The results are here: > > >> > > > >> > http://momjian.us/expire/TOAST/ > > >> > > > >> > Strangely, 128 bytes seems to be the break-even point for TOAST and > > >> > non-TOAST, even for sequential scans of the entire heap touching all > > >> > long row values. I am somewhat confused why TOAST has faster access > > >> > than inline heap data. > > > > Is your database initialized with C locale? If so then length(text) is > > optimized to not have to detoast: > > > > if (pg_database_encoding_max_length() == 1) > > PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); > > Wow, we optimized length(). OK, will run the test with > substring(t,1,1). Be careful about the compression! It might be a good idea to run the test once with the column set to uncompressible (SET STORAGE EXTERNAL?), and again with it as compressible. -- Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 18.1", W 73º 13' 56.4" Syntax error: function hell() needs an argument. Please choose what hell you want to involve.
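[For reference, a sketch of what that suggestion looks like, using the
illustrative table and column names from earlier; note that SET STORAGE
only affects values stored after the change:]

	-- out-of-line storage, compression disabled
	ALTER TABLE toast_test ALTER COLUMN t SET STORAGE EXTERNAL;

	-- default behavior: compression allowed, out-of-line if still too large
	ALTER TABLE toast_test ALTER COLUMN t SET STORAGE EXTENDED;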
Alvaro Herrera wrote:
> > > >> > Strangely, 128 bytes seems to be the break-even point for TOAST and
> > > >> > non-TOAST, even for sequential scans of the entire heap touching all
> > > >> > long row values. I am somewhat confused why TOAST has faster access
> > > >> > than inline heap data.
> > >
> > > Is your database initialized with C locale? If so then length(text) is
> > > optimized to not have to detoast:
> > >
> > > 	if (pg_database_encoding_max_length() == 1)
> > > 		PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ);
> >
> > Wow, we optimized length(). OK, will run the test with
> > substring(t,1,1).
>
> Be careful about the compression! It might be a good idea to run the
> test once with the column set to uncompressible (SET STORAGE EXTERNAL?),
> and again with it as compressible.

My test uses random data, which I figured was as close to real-world as I
could get, and I have a test that makes sure the data was pushed to the
TOAST table. Should I still try EXTERNAL?

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > My test uses random data, which I figured was a close to real-world as I > could get, and I have a test that makes sure the data was pushed to the > TOAST table. Should I still try EXTERNAL? My point is that you probably want to measure separately the effect of compression from that of the external storage. -- Alvaro Herrera Developer, http://www.PostgreSQL.org/ "La tristeza es un muro entre dos jardines" (Khalil Gibran)
"Bruce Momjian" <bruce@momjian.us> writes: > Gregory Stark wrote: > >> Is your database initialized with C locale? If so then length(text) is >> optimized to not have to detoast: >> >> if (pg_database_encoding_max_length() == 1) >> PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); > > Wow, we optimized length(). OK, will run the test with > substring(t,1,1). No, we did substring() too :) You could try hashtext() -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Alvaro Herrera wrote: > Bruce Momjian wrote: > > > My test uses random data, which I figured was a close to real-world as I > > could get, and I have a test that makes sure the data was pushed to the > > TOAST table. Should I still try EXTERNAL? > > My point is that you probably want to measure separately the effect of > compression from that of the external storage. OK, I will get to that if we need it later. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Gregory Stark wrote: > "Bruce Momjian" <bruce@momjian.us> writes: > > > Gregory Stark wrote: > > > >> Is your database initialized with C locale? If so then length(text) is > >> optimized to not have to detoast: > >> > >> if (pg_database_encoding_max_length() == 1) > >> PG_RETURN_INT32(toast_raw_datum_size(str) - VARHDRSZ); > > > > Wow, we optimized length(). OK, will run the test with > > substring(t,1,1). > > No, we did substring() too :) Uh, I looked at text_substring(), and while there is an optimization to do character counting for encoding length == 1, it is still accessing the data. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes: >> No, we did substring() too :) > > Uh, I looked at text_substring(), and while there is an optimization to > do character counting for encoding length == 1, it is still accessing > the data. Sure but it'll only access the first chunk. There are two chunks in your test. It might be interesting to run tests accessing 0 (length()), 1 (substr()), and 2 chunks (hashtext()). Or if you're concerned with the cpu cost of hashtext you could calculate the precise two bytes you need to access with substr to force it to load both chunks. But I think the real cost of unnecessary toasting is the random disk i/o so the cpu cost is of secondary interest. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > "Bruce Momjian" <bruce@momjian.us> writes: > > >> No, we did substring() too :) > > > > Uh, I looked at text_substring(), and while there is an optimization to > > do character counting for encoding length == 1, it is still accessing > > the data. > > Sure but it'll only access the first chunk. There are two chunks in your test. > It might be interesting to run tests accessing 0 (length()), 1 (substr()), and > 2 chunks (hashtext()). > > Or if you're concerned with the cpu cost of hashtext you could calculate the > precise two bytes you need to access with substr to force it to load both > chunks. But I think the real cost of unnecessary toasting is the random disk > i/o so the cpu cost is of secondary interest. OK, will run a test with hashtext(). What I am seeing now is a 10-20x slowdown to access the TOAST data, and a 0-1x speedup for accessing the non-TOAST data when the rows are long: http://momjian.us/expire/TOAST/ -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Gregory Stark wrote: > > "Bruce Momjian" <bruce@momjian.us> writes: > > > > >> No, we did substring() too :) > > > > > > Uh, I looked at text_substring(), and while there is an optimization to > > > do character counting for encoding length == 1, it is still accessing > > > the data. > > > > Sure but it'll only access the first chunk. There are two chunks in your test. > > It might be interesting to run tests accessing 0 (length()), 1 (substr()), and > > 2 chunks (hashtext()). > > > > Or if you're concerned with the cpu cost of hashtext you could calculate the > > precise two bytes you need to access with substr to force it to load both > > chunks. But I think the real cost of unnecessary toasting is the random disk > > i/o so the cpu cost is of secondary interest. > > OK, will run a test with hashtext(). What I am seeing now is a 10-20x > slowdown to access the TOAST data, and a 0-1x speedup for accessing the > non-TOAST data when the rows are long: I reran the tests with hashtext(), and created a SUMMARY.HTML chart: http://momjian.us/expire/TOAST/ What you will see is that pushing TEXT to a TOAST column allows quick access to non-TOAST values and single-row TOAST values, but accessing all TOAST columns is slower than accessing them in the heap, by a factor of 3-18x. Looking at the chart, it seems 512 is the proper breakpoint for TOAST because 512 gives us a 2x change in accessing non-TOAST values and single-row TOAST values, and it is only 2x slower to access all TOAST values than we have now. Of course, this has all the data in the cache, but if the cache is limited, pushing more to TOAST is going to be a bigger win. In general, I would guess that the number of times all >512 byte rows are accessed is much less than the number of times that pushing those >512 byte values to TOAST will give a speedup. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > I reran the tests with hashtext(), and created a SUMMARY.HTML chart: > http://momjian.us/expire/TOAST/ I don't understand what the numbers in this chart are? regards, tom lane
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I reran the tests with hashtext(), and created a SUMMARY.HTML chart:
> > 	http://momjian.us/expire/TOAST/
>
> I don't understand what the numbers in this chart are?

They are taken from the test script and output files that are also in that
directory. They are in milliseconds, so higher is slower.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
> I reran the tests with hashtext(), and created a SUMMARY.HTML chart: > > http://momjian.us/expire/TOAST/ > > What you will see is that pushing TEXT to a TOAST column > allows quick access to non-TOAST values and single-row TOAST > values, but accessing all TOAST columns is slower than > accessing them in the heap, by a factor of 3-18x. Um, my understanding for this measurement would be to tweak TOAST_TUPLE_THRESHOLD, and use a fixed TOAST_MAX_CHUNK_SIZE ? Why would you want to increase the number of toast tuples required for one column value ? My expectation would be, that we want to allow a toast tuple to fill a whole page (TOAST_TUPLES_PER_PAGE = 1), but keep or reduce the threshold. Maybe we need other logic to find and use free space for that, though (8.4 material). Andreas
"Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes: > My expectation would be, that we want to allow a toast tuple to fill a > whole page (TOAST_TUPLES_PER_PAGE = 1), I've been wondering about that too. It certainly needs to be experimented with, independently of TOAST_TUPLE_THRESHOLD, now that we can vary the two numbers independently. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > "Zeugswetter Andreas ADI SD" <ZeugswetterA@spardat.at> writes: >> My expectation would be, that we want to allow a toast tuple to fill a >> whole page (TOAST_TUPLES_PER_PAGE = 1), I thought that previously but then I started thinking maybe that's not true. Picture toasting a whole bunch of 5-7k data. If your toast chunk size is 2k then each page will be more or less packed full. Either with four 2k chunks or fewer full size chunks and one or more final fragments. Whereas if you set toast_tuples_per_page to 8k then the only option for Postgres will be to put each datum in its own page and waste 1-3k on every page. Perhaps that's too artificial a case. If you have a full range of datum sizes from 2-8k it ought to be able to pack more densely and especially if you have data over 8k and so you have final chunks under 2k to fill out the free space. Moreover, the data density might matter less for the random access type access pattern of toast tables than the reduction in the number of seeks. We know from tests that data density does matter for random access OLTP performance due to cache effectiveness. But still it seems it would only take one extra seek for a 2k chunk placed on a different page to swamp any effect the wasted space would cause. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Whereas if you set toast_tuples_per_page to 8k then the only option for > Postgres will be to put each datum in its own page and waste 1-3k on every > page. No, because actually the code is designed to make the toast chunk size just enough less than 8K that the tuples fit. The shorter-than-normal tuples carrying the last chunk of any particular datum are going to result in wasted space to the extent that we can't pack them together on a page, but that's true now. Right now, if you have a large toasted datum, it mostly will consist of just-under-2K tuples that are sized so that there's no noticeable wasted space on a page with 4 of them. There isn't any advantage to that compared to one just-under-8K tuple AFAICS, and it takes 4 times as much work to insert or retrieve 'em. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > The shorter-than-normal tuples carrying the last chunk of any particular > datum are going to result in wasted space to the extent that we can't > pack them together on a page, but that's true now. Right now, if you > have a large toasted datum, it mostly will consist of just-under-2K > tuples that are sized so that there's no noticeable wasted space on a > page with 4 of them. There isn't any advantage to that compared to one > just-under-8K tuple AFAICS, and it takes 4 times as much work to insert > or retrieve 'em. That will be true if your data are usually larger than 8k. But if your data are all between TOAST_TUPLE_THRESHOLD and TOAST_MAX_CHUNK_SIZE you won't have any fragments to put in the remaining space. If there's no daylight between those two parameters then you'll get an average amount of wasted space of about half a chunk per page (or perhaps even less). But if there's a gap and your use case happens to have mostly or entirely data sized in that gap then you won't have many or even any fragments available to fill in that space. As I described, picture a case where you have TOAST_MAX_CHUNK_SIZE set to 8k and TOAST_TUPLE_THRESHOLD set under 4k and all your datums are 4k+1. Each one will have to go on its own page, wasting 50% of the space. If in that worst case it's still faster than having to pay the cost of two seeks to use a smaller chunk size and get better data density then we know there's no argument for a smaller chunk size. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
> > Whereas if you set toast_tuples_per_page to 8k then the only option > > for Postgres will be to put each datum in its own page and > waste 1-3k > > on every page. > > No, because actually the code is designed to make the toast > chunk size just enough less than 8K that the tuples fit. He mentioned toasted values of 6-7k each. If all values are of that size (like payment slip tiffs) there is nothing we would do with the remaining 1-2k on each page. But that disadvantage disappears as soon as you have just a little more variability in the length. Still, it might be enough to opt for some freespace reuse smarts if we can think of a cheap heuristic. But the cost to fetch such a "distributed" tuple would be so huge I doubt there is anything to win but disk space. Andreas
Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: > > Whereas if you set toast_tuples_per_page to 8k then the only option for > > Postgres will be to put each datum in its own page and waste 1-3k on every > > page. > > No, because actually the code is designed to make the toast chunk size > just enough less than 8K that the tuples fit. > > The shorter-than-normal tuples carrying the last chunk of any particular > datum are going to result in wasted space to the extent that we can't > pack them together on a page, but that's true now. Right now, if you > have a large toasted datum, it mostly will consist of just-under-2K > tuples that are sized so that there's no noticeable wasted space on a > page with 4 of them. There isn't any advantage to that compared to one > just-under-8K tuple AFAICS, and it takes 4 times as much work to insert > or retrieve 'em. Uh, am I supposed to be running more TOAST tests? Would someone explain what they want tested? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes: > Uh, am I supposed to be running more TOAST tests? Would someone explain > what they want tested? If you want my opinion I would say we need two tests: 1) For TOAST_TUPLE_TARGET: We need to run the test scripts you have already for sizes that cause actual disk i/o. The real cost of TOAST lies in the random access seeks and your tests all fit in memory so they're missing that. 2) And for TOAST_MAX_CHUNK_SIZE: Set TOAST_MAX_CHUNK_SIZE to 8k and TOAST_TOAST_TUPLE_TARGET to 4097 and store a large table (larger than RAM) of 4069 bytes (and verify that that's creating two chunks for each tuple). Test how long it takes to do a sequential scan with hashtext(). Compare that to the above with TOAST_MAX_CHUNK_SIZE set to 4k (and verify that the toast table is much smaller in this configuration). Actually I think we need to do the latter of these first. Because if it shows that bloating the toast table is faster than chopping up data into finer chunks then we'll want to set TOAST_MAX_CHUNK_SIZE to 8k and then your tests above will have to be rerun. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Bruce Momjian wrote: > What you will see is that pushing TEXT to a TOAST column allows quick > access to non-TOAST values and single-row TOAST values, but accessing > all TOAST columns is slower than accessing them in the heap, by a factor > of 3-18x. > > Looking at the chart, it seems 512 is the proper breakpoint for TOAST > because 512 gives us a 2x change in accessing non-TOAST values and > single-row TOAST values, and it is only 2x slower to access all TOAST > values than we have now. I don't understand why that point is any better than any other point. > Of course, this has all the data in the cache, but if the cache is > limited, pushing more to TOAST is going to be a bigger win. ... assuming that you don't access the TOASTed data. > In general, > I would guess that the number of times all >512 byte rows are accessed > is much less than the number of times that pushing those >512 byte > values to TOAST will give a speedup. Maybe. Maybe not. I wonder how the access times for a single row change when you have a table that doesn't fit in cache. Especially after some random updates, so that the toast is not clustered in the same order as the heap. The bottom line is that the optimal threshold depends on your application. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
I tested EXTERN_TUPLES_PER_PAGE for values 4 (default), 2, and 1:

	4	15.596
	2	15.197
	1	14.6

which is basically a 3% decrease from 4->2 and 2->1. The test script and
results are here:

	http://momjian.us/expire/TOAST2/

shared_buffers again was 32MB so all the data was in memory.

---------------------------------------------------------------------------

Gregory Stark wrote:
>
> "Bruce Momjian" <bruce@momjian.us> writes:
>
> > Uh, am I supposed to be running more TOAST tests? Would someone explain
> > what they want tested?
>
> If you want my opinion I would say we need two tests:
>
> 1) For TOAST_TUPLE_TARGET:
>
> We need to run the test scripts you have already for sizes that cause
> actual disk I/O. The real cost of TOAST lies in the random access seeks,
> and your tests all fit in memory so they're missing that.
>
> 2) And for TOAST_MAX_CHUNK_SIZE:
>
> Set TOAST_MAX_CHUNK_SIZE to 8k and TOAST_TUPLE_TARGET to 4097 and store a
> large table (larger than RAM) of 4069-byte values (and verify that that's
> creating two chunks for each tuple). Test how long it takes to do a
> sequential scan with hashtext(). Compare that to the above with
> TOAST_MAX_CHUNK_SIZE set to 4k (and verify that the toast table is much
> smaller in this configuration).
>
> Actually I think we need to do the latter of these first. Because if it
> shows that bloating the toast table is faster than chopping up data into
> finer chunks, then we'll want to set TOAST_MAX_CHUNK_SIZE to 8k and your
> tests above will have to be rerun.
>
> --
>   Gregory Stark
>   EnterpriseDB          http://www.enterprisedb.com

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes: > shared_buffers again was 32MB so all the data was in memory. The case where all the data is in memory is simply not interesting. The cost of TOAST is the random access seeks it causes. You seem to be intentionally avoiding testing the precise thing we're interested in. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Gregory Stark" <stark@enterprisedb.com> writes: > "Bruce Momjian" <bruce@momjian.us> writes: > >> shared_buffers again was 32MB so all the data was in memory. > > The case where all the data is in memory is simply not interesting. The cost > of TOAST is the random access seeks it causes. You seem to be intentionally > avoiding testing the precise thing we're interested in. Also, something's not right with these results. 100,000 tuples --even if all they contain is a toast pointer-- won't fit on a single page. And the toast tables should vary in size depending on how many toast chunks are created. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
> I tested EXTERN_TUPLES_PER_PAGE for values 4 (default), 2, and 1:
>
> 	4	15.596
> 	2	15.197
> 	1	14.6
>
> which is basically a 3% decrease from 4->2 and 2->1. The
> test script and results are here:
>
> 	http://momjian.us/expire/TOAST2/
>
> shared_buffers again was 32MB so all the data was in memory.

Thanks for the test. (The test is for 1 row that is 100k wide.)

It is good. It shows that we even see a small advantage in the
everything-cached case.

What we don't have yet is numbers for whether EXTERN_TUPLES_PER_PAGE=1
substantially increases the toast table size for real-life scenarios, what
happens in the worst case (~48% wastage compared to the previous 12%), and
whether 1 row per page works well with autovacuum?

The bad case (with EXTERN_TUPLES_PER_PAGE=1) is when most toast tuples have
a size over TOAST_MAX_CHUNK_SIZE_for_2+1 but enough smaller than a page
that we care about the wastage. Maybe we can special-case that range.
Maybe determine (and lock) the free space of any cheap-to-get-at non-empty
page (e.g. the current insert target page) and split the toast data there.

Andreas
Gregory Stark wrote: > "Gregory Stark" <stark@enterprisedb.com> writes: > > > "Bruce Momjian" <bruce@momjian.us> writes: > > > >> shared_buffers again was 32MB so all the data was in memory. > > > > The case where all the data is in memory is simply not interesting. The cost > > of TOAST is the random access seeks it causes. You seem to be intentionally > > avoiding testing the precise thing we're interested in. > > Also, something's not right with these results. 100,000 tuples --even if all > they contain is a toast pointer-- won't fit on a single page. And the toast > tables should vary in size depending on how many toast chunks are created. The test creates _one_ row of length 100,000 and then finds out how long it takes to access it twenty times. I don't see how having the data outside cache helps us. For a large row with 2k chunks, I assume all the 2k chunks are going to be in the same 8k page. What I want to measure is the cost of accessing four 2k chunks vs. one 8k chunk, and I think we can conclude that is 6% of the access time. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Zeugswetter Andreas ADI SD wrote:
> It is good. It shows that we even see a small advantage in the
> everything-cached case.
>
> What we don't have yet is numbers for whether EXTERN_TUPLES_PER_PAGE=1
> substantially increases the toast table size for real-life scenarios,
> what happens in the worst case (~48% wastage compared to the previous
> 12%), and whether 1 row per page works well with autovacuum?
>
> The bad case (with EXTERN_TUPLES_PER_PAGE=1) is when most toast tuples
> have a size over TOAST_MAX_CHUNK_SIZE_for_2+1 but enough smaller than a
> page that we care about the wastage. Maybe we can special-case that
> range.
> Maybe determine (and lock) the free space of any cheap-to-get-at
> non-empty page (e.g. the current insert target page) and split the toast
> data there.

I think we can look at the BSD file system as a parallel. It has 8k blocks,
with 2k fragments. Files > 6k get a full block, and smaller files are made
up of full fragments. The big difference is that the BSD file system only
allocates in 2k or 8k chunks, while we use a single chunk size. However,
we allow the filling of arbitrary free space in an 8k block, which is
particularly important for saving the trailing data that doesn't fill a
full chunk.

I think the long-term solution is to go to a 2k/8k fragment/block model,
but that isn't going to happen for 8.3.

The big question is do we want to drop the target tuple size down to 512,
and increase the chunk size to 8k for 8.3? Dropping the tuple size down to
512 is going to give us some smaller TOAST values to fill in free space
created by the 8k chunk size, assuming you have both types of values in
the table. Do we want to increase the access time of long TOAST by 6% if
it means having more wasted space for lots of 4.1k values?

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
On Fri, Jun 01, 2007 at 01:50:12PM -0400, Bruce Momjian wrote:
> I think the long-term solution is to go to a 2k/8k fragment/block model,
> but that isn't going to happen for 8.3.

There might well have been lessons learned since UFS (anyone know what ZFS
does in this regard?), but I agree that we want to be able to do a mix of
full chunks and fragments.

> The big question is do we want to drop the target tuple size down to
> 512, and increase the chunk size to 8k for 8.3? Dropping the tuple size
> down to 512 is going to give us some smaller TOAST values to fill in
> free space created by the 8k chunk size, assuming you have both types of
> values in the table. Do we want to increase the access time of long
> TOAST by 6% if it means having more wasted space for lots of 4.1k
> values?

If we do that, people could see their disk space usage increase by up to
16x: currently 513 bytes fits in heap and takes (roughly) 513 bytes; if we
make that change it would then get toasted and take 8K. I don't think we
want to do that. Disk space aside, it's almost certain to seriously hurt
performance as soon as you don't fit entirely in memory.

How big is the hit for setting both to 512?

Also, is this something that could be set at initdb instead of compile
time? That would make it easier for folks to go back to the old behavior
if they needed to...

--
Jim Nasby                                            decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
"Jim C. Nasby" <decibel@decibel.org> writes: > On Fri, Jun 01, 2007 at 01:50:12PM -0400, Bruce Momjian wrote: >> The big question is do we want to drop the target tuple size down to >> 512, and increase the chunk size to 8k for 8.3? > If we do that people could see their disk space usage increase by up to > 16x: currently 513 bytes fits in heap and takes (roughly) 513 bytes; if > we make that change it would then get toasted and take 8K. That argument is completely bogus --- having a toast chunk size of 8K does not mean that smaller rows occupy 8K. regards, tom lane
> > The big question is do we want to drop the target tuple size down to > > 512, and increase the chunk size to 8k for 8.3? Dropping the tuple > > size down to 512 is going to give us some smaller TOAST values to fill > > in free space created by the 8k chuck size, assuming you have both > > types of values in the table. Do we want to increase the access time > > of long TOAST by 6% if it means having more wasted space for lots of > > 4.1k values? > > If we do that people could see their disk space usage increase by up to > 16x: currently 513 bytes fits in heap and takes (roughly) 513 > bytes; No, you misunderstood. Bruce was suggesting changing the target to 512. That means if a row is wider than ~2k, toaster will try to toast until the base row is ~512 bytes. I would not do that part for 8.3. > if we make that change it would then get toasted and > take 8K. I don't think we want to do that. Disk space aside, > it's almost certain to seriously hurt performance as soon as > you don't fit entirely in memory. No, allowing one toast chunk to fill a page does not mean that every chunk uses a whole page. Andreas
Zeugswetter Andreas ADI SD wrote:
> > > The big question is do we want to drop the target tuple size down to
> > > 512, and increase the chunk size to 8k for 8.3? Dropping the tuple
> > > size down to 512 is going to give us some smaller TOAST values to fill
> > > in free space created by the 8k chunk size, assuming you have both
> > > types of values in the table. Do we want to increase the access time
> > > of long TOAST by 6% if it means having more wasted space for lots of
> > > 4.1k values?
> >
> > If we do that, people could see their disk space usage increase by up to
> > 16x: currently 513 bytes fits in heap and takes (roughly) 513 bytes;
>
> No, you misunderstood. Bruce was suggesting changing the target to 512.
> That means if a row is wider than ~2k, toaster will try to toast until
> the base row is ~512 bytes. I would not do that part for 8.3.

OK, what do you suggest for 8.3? Attached is my suggestion to use 512 and
a 4k chunk size, which I think means that 2.7k is the worst-case value,
with a loss of around 25%.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Index: src/include/access/tuptoaster.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/access/tuptoaster.h,v
retrieving revision 1.35
diff -c -c -r1.35 tuptoaster.h
*** src/include/access/tuptoaster.h	6 Apr 2007 04:21:43 -0000	1.35
--- src/include/access/tuptoaster.h	2 Jun 2007 02:52:22 -0000
***************
*** 42,48 ****
   * given to needs_toast_table() in toasting.c before unleashing random
   * changes.
   */
! #define TOAST_TUPLES_PER_PAGE	4

  /* Note: sizeof(PageHeaderData) includes the first ItemId on the page */
  #define TOAST_TUPLE_THRESHOLD	\
--- 42,48 ----
   * given to needs_toast_table() in toasting.c before unleashing random
   * changes.
   */
! #define TOAST_TUPLES_PER_PAGE	16

  /* Note: sizeof(PageHeaderData) includes the first ItemId on the page */
  #define TOAST_TUPLE_THRESHOLD	\
***************
*** 69,75 ****
   *
   * NB: Changing TOAST_MAX_CHUNK_SIZE requires an initdb.
   */
! #define EXTERN_TUPLES_PER_PAGE	4		/* tweak only this */

  /* Note: sizeof(PageHeaderData) includes the first ItemId on the page */
  #define EXTERN_TUPLE_MAX_SIZE	\
--- 69,75 ----
   *
   * NB: Changing TOAST_MAX_CHUNK_SIZE requires an initdb.
   */
! #define EXTERN_TUPLES_PER_PAGE	2		/* tweak only this */

  /* Note: sizeof(PageHeaderData) includes the first ItemId on the page */
  #define EXTERN_TUPLE_MAX_SIZE	\
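[As a rough sanity check on those numbers: the actual macros subtract
page-header and item-pointer overhead, so the real thresholds come out a
bit lower than this back-of-the-envelope figure:]

	-- assuming 8K (8192-byte) pages
	SELECT 8192 / 16 AS approx_toast_tuple_threshold,  -- ~512 bytes
	       8192 / 2  AS approx_toast_max_chunk_size;   -- ~4096 bytes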
> > No, you misunderstood. Bruce was suggesting changing the target to 512.
> > That means if a row is wider than ~2k, toaster will try to toast until
> > the base row is ~512 bytes. I would not do that part for 8.3.
>
> OK, what do you suggest for 8.3? Attached is my suggestion to use 512 and
> a 4k chunk size, which I think means that 2.7k is the worst-case value,
> with a loss of around 25%.

Oh, so I misunderstood you also. You are suggesting two changes:

	TOAST_TUPLES_PER_PAGE	16
	EXTERN_TUPLES_PER_PAGE	2

While I agree that 2 might be a good compromise with low risk for now, I
think that toasting all rows down to ~512 bytes is too narrowly targeted
at not reading wider columns.

When suggesting a new target, I interpreted you like so:

#define TOAST_TUPLES_PER_PAGE	4

#define TOAST_TUPLE_TARGET	\
	MAXALIGN_DOWN((BLCKSZ - \
		MAXALIGN(sizeof(PageHeaderData) + \
			(TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
		/ 16)

So we would only toast rows wider than 2k, but once toasting, toast the
base row down to 512.

My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because all
else would need extensive performance testing:

#define TOAST_TUPLES_PER_PAGE	4

My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but:
split data wider than a page into page-sized chunks as long as they fill
whole pages, and split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
This would not waste more space than currently, but would improve
performance for very wide columns.

I can try to do a patch if you think that is a good idea; I can't do a lot
of testing though.

Andreas
Zeugswetter Andreas ADI SD wrote:
> > > No, you misunderstood. Bruce was suggesting changing the target to 512.
> > > That means if a row is wider than ~2k, toaster will try to toast until
> > > the base row is ~512 bytes. I would not do that part for 8.3.
> >
> > OK, what do you suggest for 8.3? Attached is my suggestion to use 512
> > and a 4k chunk size, which I think means that 2.7k is the worst-case
> > value, with a loss of around 25%.
>
> Oh, so I misunderstood you also. You are suggesting two changes:
> 	TOAST_TUPLES_PER_PAGE	16
> 	EXTERN_TUPLES_PER_PAGE	2

Right.

> While I agree that 2 might be a good compromise with low risk for now, I
> think that toasting all rows down to ~512 bytes is too narrowly targeted
> at not reading wider columns.

Well, it is summarized here:

	http://momjian.us/expire/TOAST/SUMMARY.html

It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed
like a good compromise.

> When suggesting a new target, I interpreted you like so:
>
> #define TOAST_TUPLES_PER_PAGE	4
>
> #define TOAST_TUPLE_TARGET	\
> 	MAXALIGN_DOWN((BLCKSZ - \
> 		MAXALIGN(sizeof(PageHeaderData) + \
> 			(TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
> 		/ 16)
>
> So we would only toast rows wider than 2k, but once toasting, toast the
> base row down to 512.

That is certainly not my intent, and I don't see how you would get the 2k
number from that macro. I think you are looking at 8.2 and not CVS HEAD.
CVS HEAD has:

	#define TOAST_TUPLE_TARGET	TOAST_TUPLE_THRESHOLD

> My suggestion would be to leave TOAST_TUPLES_PER_PAGE as is, because all
> else would need extensive performance testing:
>
> #define TOAST_TUPLES_PER_PAGE	4
>
> My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but:
> split data wider than a page into page-sized chunks as long as they fill
> whole pages, and split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
> This would not waste more space than currently, but would improve
> performance for very wide columns.
>
> I can try to do a patch if you think that is a good idea; I can't do a
> lot of testing though.

None of this splitting is going to happen for 8.3. The question is what
changes we can make for 8.3, if any.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes: > Well, it is summarized here: > > http://momjian.us/expire/TOAST/SUMMARY.html > > It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed > like a good compromise. Is this still testing with all data fitting in RAM? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
> > While I agree that 2 might be a good compromise with low risk for
> > now, I think that toasting all rows down to ~512 bytes is too narrowly
> > targeted at not reading wider columns.
>
> Well, it is summarized here:
>
> 	http://momjian.us/expire/TOAST/SUMMARY.html
>
> It made non-TOAST access 2x faster, but TOAST 7x slower, and
> that seemed like a good compromise.

Yes, my argument was that I expect that in the up-to-1-2k range more use
cases will suffer the 7x slowdown than see the speedup. But the ratio
certainly is hard to judge, and you may well be right.

E.g. for me TOAST_TUPLES_PER_PAGE 8 would be ok: I have base row widths of
~700 in 2 tables that would suffer if further toasted, but none in the
1k - 2k range. I wonder whether this threshold isn't rather absolute, thus
the 32k-pagesize users should probably use 32.

> > When suggesting a new target, I interpreted you like so:
> >
> > #define TOAST_TUPLES_PER_PAGE	4
> >
> > #define TOAST_TUPLE_TARGET	\
> > 	MAXALIGN_DOWN((BLCKSZ - \
> > 		MAXALIGN(sizeof(PageHeaderData) + \
> > 			(TOAST_TUPLES_PER_PAGE-1) * sizeof(ItemIdData))) \
> > 		/ 16)
> >
> > So we would only toast rows wider than 2k, but once toasting, toast
> > the base row down to 512.
>
> That is certainly not my intent, and I don't see how you
> would get the 2k number from that macro. I think you are

The ~2k comes from TOAST_TUPLE_THRESHOLD.

> looking at 8.2 and not CVS HEAD. CVS HEAD has:
>
> 	#define TOAST_TUPLE_TARGET	TOAST_TUPLE_THRESHOLD

Nope, I meant what I said. To only change the target you would replace the
above macro for TOAST_TUPLE_TARGET. But I also don't see how this would be
good.

Andreas
Gregory Stark wrote: > "Bruce Momjian" <bruce@momjian.us> writes: > > > Well, it is summarized here: > > > > http://momjian.us/expire/TOAST/SUMMARY.html > > > > It made non-TOAST access 2x faster, but TOAST 7x slower, and that seemed > > like a good compromise. > > Is this still testing with all data fitting in RAM? Yes. Having things out of RAM is going to make access even slower, but it is going to allow the heap to be in RAM more often. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes: > Gregory Stark wrote: >> >> Is this still testing with all data fitting in RAM? > > Yes. Having things out of RAM is going to make access even slower, but > it is going to allow the heap to be in RAM more often. It would let us measure the actual impact of TOAST. The largest negative effect of which is to destroy the sequential access pattern and the positive effect is as you say to increase the cache effectiveness on non-toasted data. The cpu costs are insignificant so testing in-memory cases misses the point. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
> My next suggestion would be to leave EXTERN_TUPLES_PER_PAGE as is, but:
> split data wider than a page into page-sized chunks as long as they fill
> whole pages, and split the rest with EXTERN_TUPLES_PER_PAGE (4) as now.
> This would not waste more space than currently, but would improve
> performance for very wide columns.
>
> I can try to do a patch if you think that is a good idea; I can't do a
> lot of testing though.

I have a PoC patch running, but it is larger than expected because of the
size checks during read (toast_fetch_datum_slice is not done, but would be
straightforward). Also the pg_control variable toast_max_chunk_size would
need to be renamed to reflect the EXTERN_TUPLES_PER_PAGE (4) number and
the fact that full-page chunks are used (else the chunk size checks and
slice could not work like now).

Should I pursue it, keep it for 8.4, or dump it?

The downside of this concept is that chunks smaller than a full page still
get split into the smaller pieces, and the < ~8k chunks may well outnumber
the > ~8k ones on real data. The upside is that I do not see a better
solution that would keep slice cheap and still lower the overhead even for
pathological cases.

Andreas