Thread: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Tue, 26 Jun 2012 17:04:42 -0400 Robert Haas wrote:
> I see you posted up a follow-up email asking Tom what he had in mind.
> Personally, I don't think this needs incredibly complicated testing.
> I think you should just test a workload involving inserting and/or
> updating rows with lots of trailing NULL columns, and then another
> workload with a table of similar width that... doesn't. If we can't
> find a regression - or, better, we find a win in one or both cases -
> then I think we're done here.
As per the last discussion for this patch, performance data needs to be provided before this patch's Review can proceed further.
So as per your suggestion and from the discussions about this patch, I have collected the performance data as below:
Results are taken with following configuration.
1. Schema - UNLOGGED TABLE with 2,000,000 records having all columns are INT type.
2. shared_buffers = 10GB
3. All the performance result are taken with single connection.
4. Performance is collected for INSERT operation (insert into temptable select * from inittable)
Platform details:
Operating System: Suse-Linux 10.2 x86_64
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)
RAM : 24GB
Documents Attached:
init.sh : Which will create the schema
sql_used.sql : sql's used for taking results
Trim_Nulls_Perf_Report.html : Performance data
Observations from Performance Results
------------------------------------------------
1. There is no performance change for cloumns that have all valid values(non- NULLs).
2. There is a visible performance increase when number of columns containing NULLS are more than > 60~70% in table have large number of columns.
3. There are visible space savings when number of columns containing NULLS are more than > 60~70% in table have large number of columns.
Let me know if there is more performance data needs to be collected for this patch?
With Regards,
Amit Kapila.
Attachment
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Saturday, October 13, 2012 1:24 PM Amit kapila wrote: Tue, 26 Jun 2012 17:04:42 -0400 Robert Haas wrote: >> I see you posted up a follow-up email asking Tom what he had in mind. >> Personally, I don't think this needs incredibly complicated testing. >> I think you should just test a workload involving inserting and/or >> updating rows with lots of trailing NULL columns, and then another >> workload with a table of similar width that... doesn't. If we can't >> find a regression - or, better, we find a win in one or both cases - >> then I think we're done here. >As per the last discussion for this patch, performance data needs to be provided before this patch's Review can proceed>further. >So as per your suggestion and from the discussions about this patch, I have collected the performance data as below: >Results are taken with following configuration. >1. Schema - UNLOGGED TABLE with 2,000,000 records having all columns are INT type. >2. shared_buffers = 10GB >3. All the performance result are taken with single connection. >4. Performance is collected for INSERT operation (insert into temptable select * from inittable) >Platform details: > Operating System: Suse-Linux 10.2 x86_64 > Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz) > RAM : 24GB Further to Performance data, I have completed the review of the Patch. Basic stuff: ------------ - Rebase of Patch is required. As heap_fill_tuple function prototype is moved to different file [htup.h to htup_details.h] - Compiles cleanly without any errors/warnings - Regression tests pass. Code Review comments: --------------------- 1. There is possibility of memory growth in case of toast table, if trailing toasted columns are updated to NULLs; i.e. In Function toast_insert_or_update, for tuples when 'need_change' variable is true, numAttrs are modified to last nonnull column values, and in old tuple de-toasted columns are not getting freed, if this repeats for more number of tuples there is chanceof out of memory. if ( need_change) { numAttrs = lastNonNullValOffset + 1; .... } if (need_delold) for (i = 0; i < numAttrs; i++) <== Tailing toasted value wouldn't be freed as updated to NULL and numAttrsis modified to smaller value. if (toast_delold[i]) toast_delete_datum(rel, toast_oldvalues[i]); 2. Comments need to updated in following functions; how ending null columns are skipped in header part. heap_fill_tuple - function header heap_form_tuple, heap_form_minimal_tuple, heap_form_minimal_tuple. 3. Why following change is required in function toast_flatten_tuple_attribute - numAttrs = tupleDesc->natts; + numAttrs = HeapTupleHeaderGetNatts(olddata); Detailed Performance Report for Insert and Update Operations is attached with this mail. Observations from Performance Results ------------------------------------------------ 1. There is no performance change for cloumns that have all valid values(non- NULLs). 2. There is a visible performance increase when number of columns containing NULLS are more than > 60~70% in table have largenumber of columns. 3. There are visible space savings when number of columns containing NULLS are more than > 60~70% in table have large numberof columns. With Regards, Amit Kapila.
Attachment
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
From: pgsql-hackers-owner@postgresql.org [pgsql-hackers-owner@postgresql.org] on behalf of Amit kapila [amit.kapila@huawei.com] Sent: Monday, October 15, 2012 7:28 PM To: robertmhaas@gmail.com; josh@agliodbs.com Cc: pgsql-hackers@postgresql.org Subject: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review] On Monday, October 15, 2012 7:28 PM Amit kapila wrote: On Saturday, October 13, 2012 1:24 PM Amit kapila wrote: Tue, 26 Jun 2012 17:04:42 -0400 Robert Haas wrote: >> I see you posted up a follow-up email asking Tom what he had in mind. >> Personally, I don't think this needs incredibly complicated testing. >> I think you should just test a workload involving inserting and/or >> updating rows with lots of trailing NULL columns, and then another >> workload with a table of similar width that... doesn't. If we can't >> find a regression - or, better, we find a win in one or both cases - >> then I think we're done here. >As per the last discussion for this patch, performance data needs to be provided before this patch's Review can proceed>further. >So as per your suggestion and from the discussions about this patch, I have collected the performance data as below: >Results are taken with following configuration. >1. Schema - UNLOGGED TABLE with 2,000,000 records having all columns are INT type. >2. shared_buffers = 10GB >3. All the performance result are taken with single connection. >4. Performance is collected for INSERT operation (insert into temptable select * from inittable) >Platform details: > Operating System: Suse-Linux 10.2 x86_64 > Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz) > RAM : 24GB > Further to Performance data, I have completed the review of the Patch. Please find the patch to address Review Comments attached with this mail. IMO, now its ready for a committer. With Regards, Amit Kapila.
Attachment
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 13 October 2012 08:54, Amit kapila <amit.kapila@huawei.com> wrote: > As per the last discussion for this patch, performance data needs to be > provided before this patch's Review can proceed further. > > So as per your suggestion and from the discussions about this patch, I have > collected the performance data as below: > > > > Results are taken with following configuration. > 1. Schema - UNLOGGED TABLE with 2,000,000 records having all columns are INT > type. > 2. shared_buffers = 10GB > 3. All the performance result are taken with single connection. > 4. Performance is collected for INSERT operation (insert into temptable > select * from inittable) > > Platform details: > Operating System: Suse-Linux 10.2 x86_64 > Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz) > RAM : 24GB > > Documents Attached: > init.sh : Which will create the schema > sql_used.sql : sql's used for taking results > > Trim_Nulls_Perf_Report.html : Performance data > > > Observations from Performance Results > > ------------------------------------------------ > > 1. There is no performance change for cloumns that have all valid > values(non- NULLs). > > 2. There is a visible performance increase when number of columns containing > NULLS are more than > 60~70% in table have large number of columns. > > 3. There are visible space savings when number of columns containing NULLS > are more than > 60~70% in table have large number of columns. > > > Let me know if there is more performance data needs to be collected for this > patch? I can't make sense of your performance report. Because of that I can't derive the same conclusions from it you do. Can you explain the performance results in more detail, so we can see what they mean? Like which are the patched, which are the unpatched results? Which results are comparable, what the percentages mean etc.. We might then move quickly towards commit, or at least more tests. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Thursday, December 20, 2012 5:46 PM Simon Riggs wrote: > On 13 October 2012 08:54, Amit kapila <amit.kapila@huawei.com> wrote: > > > As per the last discussion for this patch, performance data needs to > be > > provided before this patch's Review can proceed further. > > > > So as per your suggestion and from the discussions about this patch, > I have > > > > ------------------------------------------------ > > > > 1. There is no performance change for cloumns that have all valid > > values(non- NULLs). > > > > 2. There is a visible performance increase when number of columns > containing > > NULLS are more than > 60~70% in table have large number of columns. > > > > 3. There are visible space savings when number of columns containing > NULLS > > are more than > 60~70% in table have large number of columns. > > > > > > Let me know if there is more performance data needs to be collected > for this > > patch? > > > I can't make sense of your performance report. Because of that I can't > derive the same conclusions from it you do. > > Can you explain the performance results in more detail, so we can see > what they mean? Like which are the patched, which are the unpatched > results? On the extreme let it is mentioned Original Code/ Trim Triling Nulls Patch. In any case I have framed the results again as below: 1. Table with 800 columns A. INSERT tuples with 600 trailing nulls B. UPDATE last column value to "non-null" C. UPDATE last column value to "null" ---------------------+---------------------+--------------------- Original Code | Trim Tailing NULLs | Improvement(%) TPS space used| TPS space used | Results (pages) | (pages) | ---------------------+---------------------+---------------------- 1A: 0.2068 250000 | 0.2302 222223 | 10.1% tps, 11.1% space 1B: 0.0448 500000 | 0.0481 472223 | 6.8% tps, 5.6% space 1C: 0.0433 750000 | 0.0493 694445 | 12.2% tps, 7.4% space 2. Table with 800 columns A. INSERT tuples with 300 trailing nulls B. UPDATE last column value to "non-null" C. UPDATE last column value to "null" ---------------------+---------------------+--------------------- Original Code | Trim Tailing NULLs | Improvement(%) TPS space used| TPS space used | Results (pages) | (pages) | ---------------------+---------------------+---------------------- 2A: 0.0280 666667 | 0.0287 666667 | 2.3% tps, 0% space 2B: 0.0143 1333334 | 0.0152 1333334 | 5.3% tps, 0% space 2C: 0.0145 2000000 | 0.0149 2000000 | 2.9% tps, 0% space 3. Table with 300 columns A. INSERT tuples with 150 trailing nulls B. UPDATE last column value to "non-null" C. UPDATE last column value to "null" ---------------------+---------------------+-------------------- Original Code | Trim Tailing NULLs | Improvement(%) TPS space used| TPS space used | Results (pages) | (pages) | ---------------------+---------------------+-------------------- 3A: 0.2815 166667 | 0.2899 166667 | 2.9% tps, 0% space 3B: 0.0851 333334 | 0.0870 333334 | 2.2% tps, 0% space 3C: 0.0846 500000 | 0.0852 500000 | 0.7% tps, 0% space 4. Table with 300 columns A. INSERT tuples with 250 trailing nulls B. UPDATE last column value to "non-null" C. UPDATE last column value to "null" ---------------------+---------------------+------------------------- Original Code | Trim Tailing NULLs | Improvement(%) TPS space used| TPS space used | Results (pages) | (pages) | ---------------------+---------------------+------------------------- 4A: 0.5447 66667 | 0.5996 58824 | 09.2% tps, 11.8% space 4B: 0.1251 135633 | 0.1232 127790 | -01.5% tps, 5.8% space 4C: 0.1223 202299 | 0.1361 186613 | 10.1% tps, 7.5% space Please let me know, if still it is not clear. With Regards, Amit Kapila.
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 20 December 2012 14:56, Amit Kapila <amit.kapila@huawei.com> wrote: >> > 1. There is no performance change for cloumns that have all valid >> > values(non- NULLs). I don't see any tests (at all) that measure this. I'm particularly interested in lower numbers of columns, so we can show no regression for the common case. >> > 2. There is a visible performance increase when number of columns >> containing >> > NULLS are more than > 60~70% in table have large number of columns. >> > >> > 3. There are visible space savings when number of columns containing >> NULLS >> > are more than > 60~70% in table have large number of columns. Agreed. I would call that quite disappointing though and was expecting better. Are we sure the patch works and the tests are correct? The lack of any space saving for lower % values is strange and somewhat worrying. There should be a 36? byte saving for 300 null columns in an 800 column table - how does that not show up at all? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Simon Riggs <simon@2ndQuadrant.com> writes: > The lack of any space saving for lower % values is strange and > somewhat worrying. There should be a 36? byte saving for 300 null > columns in an 800 column table - how does that not show up at all? You could only fit about 4 such rows in an 8K page (assuming the columns are all int4s). Unless the savings is enough to allow 5 rows to fit in a page, the effective savings will be zilch. This may well mean that the whole thing is a waste of time in most scenarios --- the more likely it is to save anything, the more likely that the savings will be lost anyway due to page alignment considerations, because wider rows inherently pack less efficiently. regards, tom lane
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 23 December 2012 17:38, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> The lack of any space saving for lower % values is strange and >> somewhat worrying. There should be a 36? byte saving for 300 null >> columns in an 800 column table - how does that not show up at all? > > You could only fit about 4 such rows in an 8K page (assuming the columns > are all int4s). Unless the savings is enough to allow 5 rows to fit in > a page, the effective savings will be zilch. If that's the case, the use case is tiny, especially considering how sensitive the saving is to the exact location of the NULLs. > This may well mean that the whole thing is a waste of time in most > scenarios --- the more likely it is to save anything, the more likely > that the savings will be lost anyway due to page alignment > considerations, because wider rows inherently pack less efficiently. ISTM that we'd get a better gain and a wider use case by compressing the whole block, with some bits masked out to allow updates/deletes. The string of zeroes in the null bitmap would compress easily, but so would other aspects also. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Sunday, December 23, 2012 8:11 PM Simon Riggs wrote: > On 20 December 2012 14:56, Amit Kapila <amit.kapila@huawei.com> wrote: > > >> > 1. There is no performance change for cloumns that have all valid > >> > values(non- NULLs). > > I don't see any tests (at all) that measure this. > > I'm particularly interested in lower numbers of columns, so we can > show no regression for the common case. For now I have taken for 300 columns, I can take for 10~30 columns reading as well if required 1. Table with 300 columns (all integer columns) A. INSERT tuples without trailing nulls B. UPDATE last column value to "null" ----------------+---------------------+------------------ Original Code | Trim Tailing NULLs | Improvement (%) TPS | TPS | Results ----------------+---------------------+------------------ 1A: 0.1348 | 0.1352 | 0.3% 1B: 0.0495 | 0.0495 | 0.0% > >> > 2. There is a visible performance increase when number of columns > >> containing > >> > NULLS are more than > 60~70% in table have large number of > columns. > >> > > >> > 3. There are visible space savings when number of columns > containing > >> NULLS > >> > are more than > 60~70% in table have large number of columns. > > Agreed. > > I would call that quite disappointing though and was expecting better. > Are we sure the patch works and the tests are correct? > > The lack of any space saving for lower % values is strange and > somewhat worrying. There should be a 36? byte saving for 300 null > columns in an 800 column table - how does that not show up at all? 300 NULL's case will save approximately 108 bytes, as 3 tuples will be accommodated in such case. So now the total space left in page will be approximately 1900 bytes (including 108 bytes saved by optimization). Now the point is that in existing test case all rows are same (approx 2100 bytes), so no space saving is shown, but incase the last row is such that it can get accommodated in space saved (remaining space of page + space saved due to NULLS optimization), then it can show space savings as well. In anycase there is a performance gain for 300 NULLS case as well. Apart from above, the performance data for less number of columns (where the trailing nulls are such that they cross word boundary) also show similar gains: The below cases (2 & 3) can give benefit as it will save 4 bytes per tuple 2. Table with 12 columns (first 3 integer followed by 9 Boolean columns) A. INSERT tuples with 9 trailing nulls B. UPDATE last column value to "non-null" C. UPDATE last column value to "null" ---------------------+---------------------+--------------------- Original Code | Trim Tailing NULLs | Improvement(%) TPS space used| TPS space used | Results (pages) | (pages) | ---------------------+---------------------+---------------------- 2A: 0.8485 12739 | 0.8524 10811 | 0.4% 15.1% 2B: 0.5847 25478 | 0.5749 23550 | -1.5% 7.5% 2C: 0.5591 38217 | 0.5545 34361 | 0.8% 10.0% 3. Table with 12 columns (first 3 integer followed by 9 Boolean columns) A. INSERT tuples with 4 trailing nulls B. UPDATE last column value to "non-null" C. UPDATE last column value to "null" ---------------------+---------------------+--------------------- Original Code | Trim Tailing NULLs | Improvement(%) TPS space used| TPS space used | Results (pages) | (pages) | ---------------------+---------------------+---------------------- 3A: 0.8443 14706 | 0.8626 12739 | 2.3% 13.3% 3B: 0.5307 29412 | 0.5272 27445 | -0.6% 6.7% 3C: 0.5102 44118 | 0.5218 40184 | 2.2% 8.9% As a conclusion point, I would like to say that this patch doesn't have performance regression for most used scenario's and it gives benefit in some of the trailing null's cases. With Regards, Amit Kapila.
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 24 December 2012 13:13, Amit Kapila <amit.kapila@huawei.com> wrote: > Apart from above, the performance data for less number of columns (where the > trailing nulls are such that they cross word boundary) also show similar > gains: > > The below cases (2 & 3) can give benefit as it will save 4 bytes per tuple > > 2. Table with 12 columns (first 3 integer followed by 9 Boolean columns) > A. INSERT tuples with 9 trailing nulls > B. UPDATE last column value to "non-null" > C. UPDATE last column value to "null" > ---------------------+---------------------+--------------------- > Original Code | Trim Tailing NULLs | Improvement (%) > TPS space used| TPS space used | Results > (pages) | (pages) | > ---------------------+---------------------+---------------------- > 2A: 0.8485 12739 | 0.8524 10811 | 0.4% 15.1% > 2B: 0.5847 25478 | 0.5749 23550 | -1.5% 7.5% > 2C: 0.5591 38217 | 0.5545 34361 | 0.8% 10.0% > > > 3. Table with 12 columns (first 3 integer followed by 9 Boolean columns) > A. INSERT tuples with 4 trailing nulls > B. UPDATE last column value to "non-null" > C. UPDATE last column value to "null" > ---------------------+---------------------+--------------------- > Original Code | Trim Tailing NULLs | Improvement (%) > TPS space used| TPS space used | Results > (pages) | (pages) | > ---------------------+---------------------+---------------------- > 3A: 0.8443 14706 | 0.8626 12739 | 2.3% 13.3% > 3B: 0.5307 29412 | 0.5272 27445 | -0.6% 6.7% > 3C: 0.5102 44118 | 0.5218 40184 | 2.2% 8.9% > > As a conclusion point, I would like to say that this patch doesn't have > performance regression for most used scenario's > and it gives benefit in some of the trailing null's cases. Not really sure about the 100s of columns use case. But showing gain in useful places in these more common cases wins my vote. Thanks for testing. Barring objections, will commit. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services