Thread: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Jameison Martin
Date:
The following patch truncates trailing null attributes from heap rows to reduce the size of the row bitmap.
Applications often have wide rows in which many of the trailing column values are null. On an insert/update, all of the trailing null columns are tracked in the row bitmap. This can add a substantial overhead for very wide rows. This change truncates heap rows such that the trailing nulls are elided.
The intuition for this change is that ALTER TABLE t ADD COLUMN c type NULL is a metadata only change. Postgres works fine when a row's metadata (tuple descriptor) is inconsistent with the actual row data: extra columns are assumed to be null. This change just adjusts the number of attributes for a row and the row bitmap to only track up to the last non-null attribute.
Thanks.
-Jamie Martin
Attachment
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Tom Lane
Date:
Jameison Martin <jameisonb@yahoo.com> writes: > The following patch truncates trailing null attributes from heap rows to reduce the size of the row bitmap. This has been discussed before, but it always seemed that the cost-benefit ratio was exceedingly questionable. You don't get any savings whatsoever unless you reduce the size of the null bitmap across a MAXALIGN boundary, which more and more often is 64 bits, so that the frequency with which the optimization wins anything doesn't look likely to be that high. And on the other side of the coin, you're adding cycles to every single tuple-construction operation to make this work. The introduction of bugs doesn't seem improbable either. (Just because tuples in user tables might have unexpected natts values doesn't mean that the code is, or should be, prepared to tolerate that in system tables or plan-constructed tuples.) So what I'd like to see is some concrete test results proving that this is a net win, or at least not a net loss, for typical cases. Just asserting that it might be a win for certain usage patterns doesn't do it for me. regards, tom lane
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Greg Stark
Date:
On Tue, Apr 17, 2012 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > This has been discussed before, but it always seemed that the > cost-benefit ratio was exceedingly questionable. You don't get any > savings whatsoever unless you reduce the size of the null bitmap across > a MAXALIGN boundary, which more and more often is 64 bits, so that the > frequency with which the optimization wins anything doesn't look likely > to be that high. There is the usage pattern where (brace yourself) people have thousands of columns in which they have all but a handful be null. They might be pretty happy about this. I'm not sure if that's a use case that makes sense to optimize for though -- even for them the space overhead would be noticeable but not a showstopper. -- greg
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Tom Lane
Date:
Greg Stark <stark@mit.edu> writes: > On Tue, Apr 17, 2012 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> This has been discussed before, but it always seemed that the >> cost-benefit ratio was exceedingly questionable. �You don't get any >> savings whatsoever unless you reduce the size of the null bitmap across >> a MAXALIGN boundary, which more and more often is 64 bits, so that the >> frequency with which the optimization wins anything doesn't look likely >> to be that high. > There is the usage pattern where (brace yourself) people have > thousands of columns in which they have all but a handful be null. > They might be pretty happy about this. Oh, I don't doubt that there are *some* use cases for this. I'm just dubious about how much we'd be slowing things down for everybody else. As I said, what I'd like to see are some benchmarks, and not just benchmarks that are tuned to match the sort of case where this wins. regards, tom lane
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Jameison Martin
Date:
<div style="color:#000; background-color:#fff; font-family:tahoma, new york, times, serif;font-size:10pt"><div style="font-size:10pt; "><span style="font-size: small; ">Thanks for the response.</span><br /></div><div style="font-family:tahoma, 'new york', times, serif; "><div style="font-family: 'times new roman', 'new york', times, serif;"><div id="yiv1234046135"><div style="color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); font-family: tahoma,'new york', times, serif; "><div style="font-size: 10pt; "><font size="2"><br /></font></div><div style="font-size:10pt; "><font size="2">The use-case I'm targeting is a schema that has multiple tables with ~800 columns,most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which equatesto 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If onlythe first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8 bytes,which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows wouldfall into the category of needing only 8 bytes for the null bitmap.<br /></font></div><div style="font-size: 10pt; "><fontsize="2"><br /></font></div><div style="font-size: 10pt; "><font size="2">What kind of test results would prove thatthis is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also, howwould you define a typical case (e.g. what kind of data shape)?</font></div><div style="font-size: 10pt; "><font size="2"><br/>Thanks.</font></div><div style="font-size: 10pt; "><span style="font-size:13px;">-jamie</span></div><div style="font-size:10pt; font-family: tahoma, times, serif; "><br /></div><div style="font-size: 10pt; font-family: tahoma,times, serif; "><div style="font-size: 12pt; font-family: times, serif; "><div dir="ltr"><font face="Arial" size="2"><hrsize="1" /><b><span style="font-weight:bold;">From:</span></b> Tom Lane <tgl@sss.pgh.pa.us><br /><b><spanstyle="font-weight:bold;">To:</span></b> Jameison Martin <jameisonb@yahoo.com> <br /><b><span style="font-weight:bold;">Cc:</span></b>"pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org> <br /><b><spanstyle=" font-weight:bold;">Sent:</span></b> Tuesday, April 17, 2012 9:38 AM<br /><b><span style="font-weight:bold;">Subject:</span></b>Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reducethe size of the null bitmap <br /></font></div><br /> Jameison Martin <<a href="mailto:jameisonb@yahoo.com" rel="nofollow"target="_blank" ymailto="mailto:jameisonb@yahoo.com">jameisonb@yahoo.com</a>> writes:<br />> The followingpatch truncates trailing null attributes from heap rows to reduce the size of the row bitmap. <br /><br />This hasbeen discussed before, but it always seemed that the<br />cost-benefit ratio was exceedingly questionable. You don'tget any<br />savings whatsoever unless you reduce the size of the null bitmap across<br />a MAXALIGN boundary, whichmore and more often is 64 bits, so that the<br />frequency with which the optimization wins anything doesn't look likely<br/>to be that high. And on the other side of the coin, you're adding<br />cycles to every single tuple-constructionoperation to make this work.<br />The introduction of bugs doesn't seem improbable either. (Just because<br/>tuples in user tables might have unexpected natts values doesn't mean<br />that the code is, or should be, preparedto tolerate that in system<br />tables or plan-constructed tuples.)<br /><br />So what I'd like to see is some concretetest results proving that this<br />is a net win, or at least not a net loss, for typical cases. Just<br />assertingthat it might be a win for certain usage patterns doesn't do<br />it for me.<br /><br /> regards, tomlane<br /><br /><br /></div></div></div></div><br /><br /></div></div></div>
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Tom Lane
Date:
Jameison Martin <jameisonb@yahoo.com> writes: > The use-case I'm targeting is a schema that has multiple tables with ~800 columns, most of which have only the first 50or so values set. 800 columns would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte alignment therow bitmap would take up 104 bytes with the current implementation. If only the first 50 or so columns are actually non-null,then the minimum bitmap size wouldn't need to be more than 8 bytes, which means the proposed change would save 96bytes. For the data set I have in mind roughly 90% of the rows would fall into the category of needing only 8 bytes forthe null bitmap. I can't help thinking that (a) this is an incredibly narrow use-case, and (b) you'd be well advised to rethink your schema design anyway. There are a whole lot of inefficiencies associated with having that many columns; the size of the null bitmap is probably one of the smaller ones. I don't really want to suggest an EAV design, but perhaps some of the columns could be collapsed into arrays, or something like that? > What kind of test results would prove that this is a net win (or not a net loss) for typical cases? Are you interestedin some insert performance tests? Also, how would you define a typical case (e.g. what kind of data shape)? Hmm, well, most of the tables I've seen have fewer than 64 columns, so that the probability of win is exactly zero. Which would mean that you've got to demonstrate that the added overhead is unmeasurably small. Which maybe you can do, because there's certainly plenty of cycles involved in a tuple insertion, but we need to see the numbers. I'd suggest an INSERT/SELECT into a temp table as probably stressing tuple formation speed the most. Or maybe you could write a C function that just exercises heap_form_tuple followed by heap_freetuple in a tight loop --- if there's no slowdown measurable in that context, then a fortiori we don't have to worry about it in the real world. regards, tom lane
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Jameison Martin
Date:
Regarding the schema: I'm afraid the schema cannot be changed at this point, though I appreciate
the suggestions.
Regarding an INSERT performance test, what kind of table shape would you like me to exercise?
The patch as submitted may actually shave some cycles off of the insertion of rows with trailing nulls even
when there are less than 64 columns because it avoids iterating over the null columns a 2nd time in heap_fill_tuple(),
so I want to be sure that I pick something that you feel is properly representative.
Thanks.
-Jamie
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Jameison Martin <jameisonb@yahoo.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Tuesday, April 17, 2012 9:57 PM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
Jameison Martin <jameisonb@yahoo.com> writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 columns, most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If only the first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8 bytes, which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows would fall into the category of needing only 8 bytes for the null bitmap.
I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones. I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?
> What kind of test results would prove that this is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also, how would you define a typical case (e.g. what kind of data shape)?
Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero. Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most. Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.
regards, tom lane
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Jameison Martin
Date:
Tom, I whipped up some INSERT/SELECT tests where I selected into a temporary table as you suggested. The target temporary table and the source table were in cache and I basically disabled things that would cause noise. The source table had 5 integer columns, and was populated with 10 million rows.
I tried 3 variations:
1) target has all nullable columns, all set to non null values: the results were the same
2) target has all nullable columns, only the first column is set: the patch was slightly faster
3) target has all non-null columns: the patch maybe was slightly faster, probably not statistically relevant
By slightly faster I'm talking on order of 10 nanoseconds per row.
I think #2 is explained by the reduction in loop iterations in heap_fill_tuple().
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Jameison Martin <jameisonb@yahoo.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Tuesday, April 17, 2012 9:57 PM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
Jameison Martin <jameisonb@yahoo.com> writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 columns, most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If only the first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8 bytes, which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows would fall into the category of needing only 8 bytes for the null bitmap.
I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones. I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?
> What kind of test results would prove that this is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also, how would you define a typical case (e.g. what kind of data shape)?
Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero. Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most. Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.
regards, tom lane
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Simon Riggs
Date:
On Thu, Apr 26, 2012 at 1:35 AM, Jameison Martin <jameisonb@yahoo.com> wrote: > Tom, I whipped up some INSERT/SELECT tests where I selected into a > temporary table as you suggested. The target temporary table and the source > table were in cache and I basically disabled things that would cause noise. > The source table had 5 integer columns, and was populated with 10 million > rows. > > I tried 3 variations: > 1) target has all nullable columns, all set to non null values: the > results were the same > 2) target has all nullable columns, only the first column is set: the > patch was slightly faster > 3) target has all non-null columns: the patch maybe was slightly faster, > probably not statistically relevant > > By slightly faster I'm talking on order of 10 nanoseconds per row. > > I think #2 is explained by the reduction in loop iterations in > heap_fill_tuple(). I see this as a useful use case that I have come across in a few cases, most typically associated with very large databases. It will be a win in those cases, but I think your maths is unrealistic for the common case. In your case, you're saying that you have 750 trailing null columns that will be all-NULL in 90% of cases. Given a randomly distributed set of col values, I'd expect the last NULL to be on average around the 400th column, perhaps more. So the savings are still high, but not as high in the general case as it is for you. The performance tests Tom asks for are essential, otherwise we cannot proceed. Thanks for starting those. Please post your test code, any environment notes and your exact test results. The important point is that we need objectively confirmable tests, not just your word it was faster. Everybody is held to the same level of proof here, so its not a personal doubt. It would be useful to post sizes of databases also, to confirm that the patch really does reduce database size. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Greg Stark
Date:
On Thu, Apr 26, 2012 at 8:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> The source table had 5 integer columns, and was populated with 10 million >> rows. ... >> 2) target has all nullable columns, only the first column is set: the >> patch was slightly faster ... >> By slightly faster I'm talking on order of 10 nanoseconds per row. >> >> I think #2 is explained by the reduction in loop iterations in >> heap_fill_tuple(). > > I see this as a useful use case that I have come across in a few > cases, most typically associated with very large databases. Indeed, if this result holds up then I think that would be pretty convincing evidence. But I'm pretty skeptical. You're talking about five bitmap tests in the middle of a loop involving much more expensive steps. Can we see the raw numbers and the actual test case? What I think would be strong evidence it's a real effect is if you repeat the comparison with larger numbers and see the speedup scale up. For instance if you create a table with 100 nullable columns and one non-null column value stored in various columns. Is the difference between the runtimes for the 95th column and 100th column doubled when you compare the 90th and 100th column cases? And is it doubled again when you compare the 80th column and the 100th column cases? (Off the top of my head I think the null bitmap would take the same storage space for those four) -- greg
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Jameison Martin
Date:
Simon and Greg,
The math on space savings is assuming that columns will be used roughly from first to last as declared in the DDL, not a random distribution of column values. This is the case for the particular schema that I'm looking at. I'm not asserting that it is the common case in general, though it may be more common than not given the fact that several commercial databases optimize for trailing null column values and developers often pay attention to this.
If there is a exact standard as to how this group does performance analysis (e.g. removing outliers beyond a certain standard deviation, number of repetitions, machine isolation requirements and so forth), please let me know. I can submit my results as is but in the interest of avoiding a lot of duplicate postings perhaps someone can point me to an example of what kinds of numbers are desired so I can make sure my posting conforms to that. For what it is worth I ran the 3 tests 10 times each and removed the outliers, but I can run 100 times or do something different if need be (e.g. post a csv for easy consumption in a spreadsheet). Also, Simon, you mentioned posting "environment notes", can you let me know what kind of environment notes are desired? For example, are you thinking about changes to the vanilla postgresql.conf, hardware information, OS config, etc?
Greg, all I'm trying to establish is that this change doesn't hurt insert performance for the common case as per Tom's comments. I'll try to add some additional test cases with varying trailing null column values to see if we can establish the potential salutary effect with a bit more data, but I'm not actually asserting that this significant or is a justification for the patch. It would be interesting to see what the performance benefit is with real queries against rows that have much smaller bitmaps, but I'd prefer not to get into that.
As for proof of the size reduction, I'd actually like to codify something in a regression test to ensure there are no regressions in the behavior of the patch. I was a little leery of creating a regression test that is dependent on internals that might cause the test to break over time, so I punted on it. Does anyone have a good suggestion as to a safe way to codify that the proposed behavioral change is working as intended in the form of a test that is unlikely to break over time? The best thing I could come up with was to create a very wide table and insert some sparse rows (trailing nulls) and verify that the pages. In any event, I'll also post a comparative relation size number and test as well.
Cheers.
-Jamie
From: Simon Riggs <simon@2ndQuadrant.com>
To: Jameison Martin <jameisonb@yahoo.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Thursday, April 26, 2012 12:27 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
On Thu, Apr 26, 2012 at 1:35 AM, Jameison Martin <jameisonb@yahoo.com> wrote:
> Tom, I whipped up some INSERT/SELECT tests where I selected into a
> temporary table as you suggested. The target temporary table and the source
> table were in cache and I basically disabled things that would cause noise.
> The source table had 5 integer columns, and was populated with 10 million
> rows.
>
> I tried 3 variations:
> 1) target has all nullable columns, all set to non null values: the
> results were the same
> 2) target has all nullable columns, only the first column is set: the
> patch was slightly faster
> 3) target has all non-null columns: the patch maybe was slightly faster,
> probably not statistically relevant
>
> By slightly faster I'm talking on order of 10 nanoseconds per row.
>
> I think #2 is explained by the reduction in loop iterations in
> heap_fill_tuple().
I see this as a useful use case that I have come across in a few
cases, most typically associated with very large databases.
It will be a win in those cases, but I think your maths is unrealistic
for the common case. In your case, you're saying that you have 750
trailing null columns that will be all-NULL in 90% of cases. Given a
randomly distributed set of col values, I'd expect the last NULL to be
on average around the 400th column, perhaps more. So the savings are
still high, but not as high in the general case as it is for you.
The performance tests Tom asks for are essential, otherwise we cannot
proceed. Thanks for starting those.
Please post your test code, any environment notes and your exact test
results. The important point is that we need objectively confirmable
tests, not just your word it was faster. Everybody is held to the same
level of proof here, so its not a personal doubt.
It would be useful to post sizes of databases also, to confirm that
the patch really does reduce database size.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Josh Berkus
Date:
Tom, > I can't help thinking that (a) this is an incredibly narrow use-case, > and (b) you'd be well advised to rethink your schema design anyway. It's more common than you'd think. Both EAV and Hstore have their own (severe) drawbacks. For example, I'm working on an application which collects telemetry data from hardware. This can involve up to 700 columns of data, most of which is booleans, and an awful lot of which is NULL. Also, adding lots of columns *is* following "proper" relational design like we urge users to do, so it would be nice to make it perfomant. Now, the other issue I'd be worried about for this optimization is what happens when the nulls become non-trailing? For example, this pattern: 1. Out of 700 columns, columns 301+ are all Null, so we map them away. 2. User updates column 688 to non-null 3. Suddenly we have a MUCH larger row which will no longer fit on the page. If your application had a lot of that kind of update pattern, I'd be concerned that this would be a deoptimzation. > If there is a exact standard as to how this group does performance > analysis (e.g. removing outliers beyond a certain standard deviation, > number of repetitions, machine isolation requirements and so forth), > please let me know. Oh, don't I wish! We're a lot more "cowboy" that that. Greg Smith and Mark Wong have been trying to build a performance testing infrastructure, but right now our test software and methodology is *very* primitive. You're welcome to help and suggest. > I can submit my results as is but in the interest > of avoiding a lot of duplicate postings perhaps someone can point me > to an example of what kinds of numbers are desired so I can make sure > my posting conforms to that. For what it is worth I ran the 3 tests > 10 times each and removed the outliers, but I can run 100 times or do > something different if need be (e.g. post a csv for easy consumption > in a spreadsheet). Actually, I think just doing a battery of pgbench tests, for both the bigger and smaller than memory cases, with the patch installed, would give us some results for the non-NULL case. Something more sophisticated like DVDstore or DBT2 would be even better, since the tables there have more columns. > I tried 3 variations: > 1) target has all nullable columns, all set to non null values: the results were the same > 2) target has all nullable columns, only the first column is set: the patch was slightly faster > 3) target has all non-null columns: the patch maybe was slightly faster, probably not statistically relevant This seems pretty on-target; can you share the numbers, the nature of the test, and the setup with us so that we can evaulate it? > Also, Simon, you mentioned posting "environment > notes", can you let me know what kind of environment notes are > desired? For example, are you thinking about changes to the vanilla > postgresql.conf, hardware information, OS config, etc? Yes, exactly. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Simon Riggs
Date:
On Fri, Apr 27, 2012 at 1:51 AM, Josh Berkus <josh@agliodbs.com> wrote: > Now, the other issue I'd be worried about for this optimization is what > happens when the nulls become non-trailing? For example, this pattern: > > 1. Out of 700 columns, columns 301+ are all Null, so we map them away. > 2. User updates column 688 to non-null > 3. Suddenly we have a MUCH larger row which will no longer fit on the page. > > If your application had a lot of that kind of update pattern, I'd be > concerned that this would be a deoptimzation. Currently, we have a long row before and a long row after. Jamie's proposals would give us a short row before and a long row after. Since we don't ever update in place, we're much more likely to fit on the same page with this optimisation than without it. I guess we can check that with a performance test. (Perhaps a more obvious optimisation would be to use a compressed NULL bitmap. That would respond better in a wider range of use cases than just truncation of trailing zeroes.) -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Greg Stark
Date:
On Fri, Apr 27, 2012 at 1:51 AM, Josh Berkus <josh@agliodbs.com> wrote: > 1. Out of 700 columns, columns 301+ are all Null, so we map them away. > 2. User updates column 688 to non-null > 3. Suddenly we have a MUCH larger row which will no longer fit on the page. Note that this is only actually 48 bytes more in the null bitmap in this scenario. That's part of Tom's point that the null bitmap is so dense that you have to be talking about some pretty huge number of columns before the size savings are noticeable. Saving 48 bytes is nothing to sneeze at but it's hardly an impractical update to handle. -- greg
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Robert Haas
Date:
On Sat, Apr 28, 2012 at 6:23 PM, Greg Stark <stark@mit.edu> wrote: > On Fri, Apr 27, 2012 at 1:51 AM, Josh Berkus <josh@agliodbs.com> wrote: >> 1. Out of 700 columns, columns 301+ are all Null, so we map them away. >> 2. User updates column 688 to non-null >> 3. Suddenly we have a MUCH larger row which will no longer fit on the page. > > Note that this is only actually 48 bytes more in the null bitmap in > this scenario. That's part of Tom's point that the null bitmap is so > dense that you have to be talking about some pretty huge number of > columns before the size savings are noticeable. Saving 48 bytes is > nothing to sneeze at but it's hardly an impractical update to handle. More to the point, if the old row were 48 bytes larger, that would not increase the chance of the new row fitting on the page. You've got to store the old and new row no matter what: if the old one can be made smaller than otherwise, that's a win regardless of whether the new one is also smaller or not. The other point I feel we're overlooking here is... according to Jamie, the patch actually made things FASTER in every case he thought to test, and those cases don't appear to be anything particularly favorable to the patch, so that's probably a good sign. I'd like to see the exact numbers from each test run, and a complete reproducible test case, but at present all signs seem to point to this change being a savings in both time and space. Let's not go looking for reasons to reject the approach just because we didn't expect it to work as well as it does. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Simon Riggs
Date:
On Sun, Apr 29, 2012 at 12:24 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Let's not go looking for reasons to > reject the approach just because we didn't expect it to work as well > as it does. Who here, in your opinion, is looking for reasons to reject anything? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Gokulakannan Somasundaram
Date:
There might be a patch available for this already. In the worst case articulated above (less than 64 columns), if all the nulls are trailing nulls, the bitmap need not be saved. Actually it is not 64(actually 72), as postgres heaptupleheader is only 23 bytes and one byte is left for the start of the bitmap.
The same principle can be considered for Index Tuple as an extension
Thanks,
Gokul.
Re: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Robert Haas
Date:
On Sun, Apr 29, 2012 at 7:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sun, Apr 29, 2012 at 12:24 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Let's not go looking for reasons to >> reject the approach just because we didn't expect it to work as well >> as it does. > > Who here, in your opinion, is looking for reasons to reject anything? I'm just saying that there seems to be a bit more skepticism here than can be justified considering that the test results are all on one side. It wouldn't take a lot of evidence to convince me that this is a bad idea, but it will take more than none, which is the amount we have now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Jameison Martin
Date:
Attached are the following as per various requests:
- test_results.txt: the performance benchmarking results,
- TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests
- hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run, and
- postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.conf can be identified by looking for the string 'jamie' in the file I attached (there aren't that many)
I ran the tests against a recent pull from git that I made a week ago or so, both with and without my patch. The results are marked as BASELINE (without my patch) and PATCH (with my patch). As I mentioned previously, I took Tom's advice and ran INSERT SELECT into a temporary table to get some idea of the impact of the proposed patch on the INSERT codepath. The DDL that the test ran is stated in the results along with the time the test took and the size of the target table. The INSERT SELECT always inserted 10 million rows per iteration. I mostly focused on smaller schemas to address Tom's concerns. I also added some wider schemas as per Simon and Greg. Note that the smaller schema runs fit in memory whereas the wider ones did not necessarily fit in memory; the wider schemas are primarily intended to clearly demonstrate the space savings.
When inserting rows with trailing nulls the patch always improves insert performance. Row size is decreased when the row bitmap can be truncated to something smaller. I'm not seeing a performance degradation without trailing nulls. I'm not asserting that the performance improvement justifies the change, just that the patch can have a significant impact on row size in the scenarios that I have outlined in previous emails (800 nullable columns with only the first 50 set). The fact that it improves insert performance in some cases is gravy in my opinion because this is a micro benchmark and we aren't talking about significant performance differences (in general we are talking about nanoseconds per row).
Hopefully the test output and the code is pretty self-explanatory.
If anyone wants to run TestTrailingNull.java for themselves you'll need the postgres jdbc driver and junit in your classpath.
Thanks.
-Jamie
From: Jameison Martin <jameisonb@yahoo.com>
To: Simon Riggs <simon@2ndQuadrant.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Thursday, April 26, 2012 11:59 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
Simon and Greg,
The math on space savings is assuming that columns will be used roughly from first to last as declared in the DDL, not a random distribution of column values. This is the case for the particular schema that I'm looking at. I'm not asserting that it is the common case in general, though it may be more common than not given the fact that several commercial databases optimize for trailing null column values and developers often pay attention to this.
If there is a exact standard as to how this group does performance analysis (e.g. removing outliers beyond a certain standard deviation, number of repetitions, machine isolation requirements and so forth), please let me know. I can submit my results as is but in the interest of avoiding a lot of duplicate postings perhaps someone can point me to an example of what kinds of numbers are desired so I can make sure my posting conforms to that. For what it is worth I ran the 3 tests 10 times each and removed the outliers, but I can run 100 times or do something different if need be (e.g. post a csv for easy consumption in a spreadsheet). Also, Simon, you mentioned posting "environment notes", can you let me know what kind of environment notes are desired? For example, are you thinking about changes to the vanilla postgresql.conf, hardware information, OS config, etc?
Greg, all I'm trying to establish is that this change doesn't hurt insert performance for the common case as per Tom's comments. I'll try to add some additional test cases with varying trailing null column values to see if we can establish the potential salutary effect with a bit more data, but I'm not actually asserting that this significant or is a justification for the patch. It would be interesting to see what the performance benefit is with real queries against rows that have much smaller bitmaps, but I'd prefer not to get into that.
As for proof of the size reduction, I'd actually like to codify something in a regression test to ensure there are no regressions in the behavior of the patch. I was a little leery of creating a regression test that is dependent on internals that might cause the test to break over time, so I punted on it. Does anyone have a good suggestion as to a safe way to codify that the proposed behavioral change is working as intended in the form of a test that is unlikely to break over time? The best thing I could come up with was to create a very wide table and insert some sparse rows (trailing nulls) and verify that the pages. In any event, I'll also post a comparative relation size number and test as well.
Cheers.
-Jamie
From: Simon Riggs <simon@2ndQuadrant.com>
To: Jameison Martin <jameisonb@yahoo.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Thursday, April 26, 2012 12:27 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
On Thu, Apr 26, 2012 at 1:35 AM, Jameison Martin <jameisonb@yahoo.com> wrote:
> Tom, I whipped up some INSERT/SELECT tests where I selected into a
> temporary table as you suggested. The target temporary table and the source
> table were in cache and I basically disabled things that would cause noise.
> The source table had 5 integer columns, and was populated with 10 million
> rows.
>
> I tried 3 variations:
> 1) target has all nullable columns, all set to non null values: the
> results were the same
> 2) target has all nullable columns, only the first column is set: the
> patch was slightly faster
> 3) target has all non-null columns: the patch maybe was slightly faster,
> probably not statistically relevant
>
> By slightly faster I'm talking on order of 10 nanoseconds per row.
>
> I think #2 is explained by the reduction in loop iterations in
> heap_fill_tuple().
I see this as a useful use case that I have come across in a few
cases, most typically associated with very large databases.
It will be a win in those cases, but I think your maths is unrealistic
for the common case. In your case, you're saying that you have 750
trailing null columns that will be all-NULL in 90% of cases. Given a
randomly distributed set of col values, I'd expect the last NULL to be
on average around the 400th column, perhaps more. So the savings are
still high, but not as high in the general case as it is for you.
The performance tests Tom asks for are essential, otherwise we cannot
proceed. Thanks for starting those.
Please post your test code, any environment notes and your exact test
results. The important point is that we need objectively confirmable
tests, not just your word it was faster. Everybody is held to the same
level of proof here, so its not a personal doubt.
It would be useful to post sizes of databases also, to confirm that
the patch really does reduce database size.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Josh Berkus
Date:
On 5/2/12 10:20 AM, Jameison Martin wrote: > Attached are the following as per various requests: > * test_results.txt: the performance benchmarking results, > > * TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests > > * hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run, and > > * postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.confcan be identified by looking for the string 'jamie' in the file I attached (there aren't that many) Nice, thanks. I'll try some of my own tests when I get a chance; I have a really good use-case for this optimization. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Josh Berkus
Date:
Tom, So that I can test this properly, what is the specific use-case we'd expect to be slow with this patch? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Robert Haas
Date:
On Wed, May 2, 2012 at 9:01 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 5/2/12 10:20 AM, Jameison Martin wrote: >> Attached are the following as per various requests: >> * test_results.txt: the performance benchmarking results, >> >> * TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests >> >> * hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run,and >> >> * postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.confcan be identified by looking for the string 'jamie' in the file I attached (there aren't that many) > > Nice, thanks. I'll try some of my own tests when I get a chance; I have > a really good use-case for this optimization. Josh, The CommitFest application lists you as the reviewer for this patch. Are you (I hope) planning to review it? I see you posted up a follow-up email asking Tom what he had in mind. Personally, I don't think this needs incredibly complicated testing. I think you should just test a workload involving inserting and/or updating rows with lots of trailing NULL columns, and then another workload with a table of similar width that... doesn't. If we can't find a regression - or, better, we find a win in one or both cases - then I think we're done here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Simon Riggs
Date:
On 17 April 2012 17:22, Jameison Martin <jameisonb@yahoo.com> wrote: > The following patch truncates trailing null attributes from heap rows to > reduce the size of the row bitmap. > The intuition for this change is that ALTER TABLE t ADD COLUMN c type NULL > is a metadata only change. Postgres works fine when a row's metadata (tuple > descriptor) is inconsistent with the actual row data: extra columns are > assumed to be null. This change just adjusts the number of attributes for a > row and the row bitmap to only track up to the last non-null attribute. This is an interesting patch, but its has had various comments made about it. When I look at this I see that it would change the NULL bitmap for all existing rows, which means it forces a complete unload/reload of data. We've moved away from doing things like that, so in its current form we'd probably want to reject that. If I might suggest a way forward? Keep NULL bitmaps as they are now. Have another flag which indicates when a partial trailing col trimmed NULL bitmap is in use. Then we can decide whether a table will benefit from full or partial bitmap and set that in the tupledesc. That way the tupledesc will show heap_form_tuple which kind of null bitmap is preferred for new tuples. That preference might be settable by user on or off, but the default would be for postgres to decide that for us based upon null stats etc, which we would decide at ANALYZE time. That mechanism is both compatible with existing on-disk formats and means that the common path for smaller tables is unaffected, yet we gain the benefit of the patch for larger tables. It would be good to see you take this all the way. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes: > On 17 April 2012 17:22, Jameison Martin <jameisonb@yahoo.com> wrote: >> The following patch truncates trailing null attributes from heap rows to >> reduce the size of the row bitmap. > This is an interesting patch, but its has had various comments made about it. > When I look at this I see that it would change the NULL bitmap for all > existing rows, which means it forces a complete unload/reload of data. Huh? I thought it would only change how *new* tuples were stored. Old tuples ought to continue to work fine. I'm not really convinced that it's a good idea in the larger scheme of things --- your point in a nearby thread that micro-optimizing storage space at the expense of all else is not good engineering applies here. But I don't see that it forces data reload. Or if it does, that should be easily fixable. > ... Have another flag which indicates > when a partial trailing col trimmed NULL bitmap is in use. That might be useful for forensic purposes, but on the whole I suspect it's just added complexity (and eating up a valuable infomask bit) for relatively little gain. > ... decide whether a table will benefit from full or partial bitmap and > set that in the tupledesc. That way the tupledesc will show > heap_form_tuple which kind of null bitmap is preferred for new tuples. > That preference might be settable by user on or off, but the default > would be for postgres to decide that for us based upon null stats etc, > which we would decide at ANALYZE time. And that seems like huge overcomplication. I think we could probably do fine with some very simple fixed policy, like "don't bother with this for tables of less than N columns", where N is maybe 64 or so and chosen to match the MAXALIGN boundary where there actually could be some savings from trimming the null bitmap. (Note: I've not read the patch, so maybe Jameison already did something of the sort.) regards, tom lane
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Simon Riggs
Date:
On 9 August 2012 15:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 17 April 2012 17:22, Jameison Martin <jameisonb@yahoo.com> wrote: >>> The following patch truncates trailing null attributes from heap rows to >>> reduce the size of the row bitmap. > >> This is an interesting patch, but its has had various comments made about it. > >> When I look at this I see that it would change the NULL bitmap for all >> existing rows, which means it forces a complete unload/reload of data. > > Huh? I thought it would only change how *new* tuples were stored. > Old tuples ought to continue to work fine. That wasn't my understanding, but that could be wrong. > I'm not really convinced that it's a good idea in the larger scheme > of things --- your point in a nearby thread that micro-optimizing > storage space at the expense of all else is not good engineering > applies here. But I don't see that it forces data reload. Or if > it does, that should be easily fixable. Large numbers of columns are surprisingly common and tables with large numbers of columns usually have many rows as well. So this doesn't matter for most tables, but the few that need this can often represent >80% of database volume, so it is important. (Next challenge is how to cope with 1000s of columns.) > And that seems like huge overcomplication. I think we could probably > do fine with some very simple fixed policy, like "don't bother with > this for tables of less than N columns", where N is maybe 64 or so > and chosen to match the MAXALIGN boundary where there actually could > be some savings from trimming the null bitmap. "One simple tweak" works for me. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Jameison Martin
Date:
<div style="color:; background-color:; font-family:tahoma, new york, times, serif;font-size:10pt"><div style="font-family:tahoma, 'new york', times, serif; font-size: 10pt; "><span>Simon, Tom is correct, the patch doesn't changethe existing row format contract or the format of the null bitmap. The change only affects how new rows are writtenout. And it uses the same supported format that has always been there (which is why alter table add col null worksthe way it does). And it keeps to the same MAXALIGN boundaries that are there today. </span></div><div style="font-family:tahoma, 'new york', times, serif; font-size: 13px; color: rgb(0, 0, 0); background-color: transparent;font-style: normal; "><span><br /></span></div><div style="background-color: transparent; "><span><font size="2">Onecould argue that different row formats could make sense in different circumstances, and I'm certainly open tothat kind of discussion, but this change is far more modest and perhaps can be made on its own since it doesn't perturb thecode base much, improves performance (marginally) and improves the size of rows with lots of trailingnulls.</font></span></div><div style="background-color: transparent; color: rgb(0, 0, 0); font-size: 13px; font-family:tahoma, 'new york', times, serif; font-style: normal; "><span><font size="2"><br /></font></span></div><div style="background-color:transparent; color: rgb(0, 0, 0); font-size: 13px; font-family: tahoma, 'new york', times, serif;font-style: normal; "><span><font size="2">[separate topic: pluggable heap manager]</font></span></div><div style="background-color:transparent; color: rgb(0, 0, 0); font-size: 13px; font-family: tahoma, 'new york', times, serif;font-style: normal; "><span><font size="2">I'm quite interested in pursuing more aggressive compression strategies,and I'd like to do so in the context of the heap manager. I'm exploring having a pluggable heap manager implementationand would be interested in feedback on that as a general approach. My thinking is that I'd like to be ableto have PostgreSQL support multiple heap implementations along the lines of how multiple index types are supported, thoughprobably only the existing heap manager implementation would be part of the actual codeline. I've done a little exploratorywork of looking at the heap interface. I was planning on doing a little prototyping before suggesting anythingconcrete, but, assuming the concept of a layered heap manager is not inherently objectionable, I was thinking ofcleaning up the heap interface a little (e.g. some HOT stuff has bled across a little), then taking a whack at formalizingthe interface along the lines of the index layering. So ideally I'd make a few separate submissions and if allgoes according to plan I'd be able to have a pluggable heap manager implementation that I could work on independentlyand which could in theory use the same hooks as the existing heap implementation. And if it turns out that myimplementation is deemed to be general enough it could be released to the community.</font></span></div><div style="background-color:transparent; color: rgb(0, 0, 0); font-size: 13px; font-family: tahoma, 'new york', times, serif;font-style: normal; "><span><font size="2"><br /></font></span></div><div style="background-color: transparent; color:rgb(0, 0, 0); font-family: tahoma, 'new york', times, serif; font-style: normal; "><font size="2">If I do decide topursue this, can anyone suggest the best way solicit feedback? I see that some proposals get shared on the postgres wiki.I could put something up there to frame the issue and encourage some back and forth dialog. Or is email the way thatthis kind of exchange tends to happen? Ultimately I'd like to get into a bit of detail about what the actual heap managercontract is and so forth.</font></div><div style="background-color: transparent; color: rgb(0, 0, 0); font-family:tahoma, 'new york', times, serif; font-style: normal; font-size: 13px; "><font size="2"><br /></font></div><divstyle="background-color: transparent; color: rgb(0, 0, 0); font-family: tahoma, 'new york', times, serif;font-style: normal; font-size: 13px; "><font size="2">Note that I'm a ways from really knowing if this is feasibleon my end, so this is quite speculative at this point. But I'd like to introduce the topic and get some feedbackon the right way to communicate as early as possible.</font></div><div style="background-color: transparent; color:rgb(0, 0, 0); font-family: tahoma, 'new york', times, serif; font-style: normal; font-size: 13px; "><font size="2"><br/></font></div><div style="background-color: transparent; color: rgb(0, 0, 0); font-family: tahoma, 'new york',times, serif; font-style: normal; font-size: 13px; "><font size="2">Thanks.</font></div><div style="background-color:transparent; color: rgb(0, 0, 0); font-family: tahoma, 'new york', times, serif; font-style: normal;font-size: 13px; "><font size="2"><br /></font></div><div style="background-color: transparent; color: rgb(0, 0, 0);font-family: tahoma, 'new york', times, serif; font-style: normal; font-size: 13px; "><font size="2">-Jamie</font></div><divstyle="background-color: transparent; color: rgb(0, 0, 0); font-family: tahoma, 'new york',times, serif; font-style: normal; font-size: 13px; "><font size="2"><br /></font></div><div style="font-family: tahoma,'new york', times, serif; font-size: 10pt; "><div style="font-family: 'times new roman', 'new york', times, serif;font-size: 12pt; "><div dir="ltr"><font face="Arial" size="2"><hr size="1" /><b><span style="font-weight:bold;">From:</span></b>Tom Lane <tgl@sss.pgh.pa.us><br /><b><span style="font-weight: bold;">To:</span></b>Simon Riggs <simon@2ndQuadrant.com> <br /><b><span style="font-weight: bold;">Cc:</span></b> JameisonMartin <jameisonb@yahoo.com>; "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org> <br /><b><spanstyle="font-weight: bold;">Sent:</span></b> Thursday, August 9, 2012 7:27 AM<br /><b><span style="font-weight:bold;">Subject:</span></b> Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reducethe size of the null bitmap<br /></font></div><br /> Simon Riggs <<a href="mailto:simon@2ndQuadrant.com" ymailto="mailto:simon@2ndQuadrant.com">simon@2ndQuadrant.com</a>>writes:<br />> On 17 April 2012 17:22, Jameison Martin<<a href="mailto:jameisonb@yahoo.com" ymailto="mailto:jameisonb@yahoo.com">jameisonb@yahoo.com</a>> wrote:<br/>>> The following patch truncates trailing null attributes from heap rows to<br />>> reduce the sizeof the row bitmap.<br /><br />> This is an interesting patch, but its has had various comments made about it.<br /><br/>> When I look at this I see that it would change the NULL bitmap for all<br />> existing rows, which means itforces a complete unload/reload of data.<br /><br />Huh? I thought it would only change how *new* tuples were stored.<br/>Old tuples ought to continue to work fine.<br /><br />I'm not really convinced that it's a good idea in the largerscheme<br />of things --- your point in a nearby thread that micro-optimizing<br />storage space at the expense ofall else is not good engineering<br />applies here. But I don't see that it forces data reload. Or if<br />it does, thatshould be easily fixable.<br /><br />> ... Have another flag which indicates<br />> when a partial trailing coltrimmed NULL bitmap is in use.<br /><br />That might be useful for forensic purposes, but on the whole I suspect<br />it'sjust added complexity (and eating up a valuable infomask bit)<br />for relatively little gain.<br /><br />> ...decide whether a table will benefit from full or partial bitmap and<br />> set that in the tupledesc. That way thetupledesc will show<br />> heap_form_tuple which kind of null bitmap is preferred for new tuples.<br />> That preferencemight be settable by user on or off, but the default<br />> would be for postgres to decide that for us basedupon null stats etc,<br />> which we would decide at ANALYZE time.<br /><br />And that seems like huge overcomplication. I think we could probably<br />do fine with some very simple fixed policy, like "don't bother with<br />thisfor tables of less than N columns", where N is maybe 64 or so<br />and chosen to match the MAXALIGN boundary wherethere actually could<br />be some savings from trimming the null bitmap.<br /><br />(Note: I've not read the patch,so maybe Jameison already did something<br />of the sort.)<br /><br /> regards, tom lane<br /><br /><br/></div></div></div>
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Jim Nasby
Date:
On 8/9/12 10:56 AM, Jameison Martin wrote: > [separate topic: pluggable heap manager] > I'm quite interested in pursuing more aggressive compression strategies, and I'd like to do so in the context of the heapmanager. I'm exploring having a pluggable heap manager implementation and would be interested in feedback on that asa general approach. My thinking is that I'd like to be able to have PostgreSQL support multiple heap implementations alongthe lines of how multiple index types are supported, though probably only the existing heap manager implementation wouldbe part of the actual codeline. I've done a little exploratory work of looking at the heap interface. I was planningon doing a little prototyping before suggesting anything concrete, but, assuming the concept of a layered heap manageris not inherently objectionable, I was thinking of cleaning up the heap interface a little (e.g. some HOT stuff hasbled across a little), then taking a whack at formalizing the interface along the lines of the index layering. So ideallyI'd make a few separate > submissions and if all goes according to plan I'd be able to have a pluggable heap manager implementation that I couldwork on independently and which could in theory use the same hooks as the existing heap implementation. And if it turnsout that my implementation is deemed to be general enough it could be released to the community. I'm definitely interested in things that can shrink our working-set-size; things that others might not be keen on. (Likehaving the on-disk format be tighter than the in-memory one). Having the ability to put in different heap storage couldbe a good way to accommodate that. Especially if you could change it on a per-table basis. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
From
Tom Lane
Date:
Jameison Martin <jameisonb@yahoo.com> writes: > [separate topic: pluggable heap manager] > I'm quite interested in pursuing more aggressive compression > strategies, and I'd like to do so in the context of the heap > manager. I'm exploring having a pluggable heap manager implementation > and would be interested in feedback on that as a general approach. My > thinking is that I'd like to be able to have PostgreSQL support > multiple heap implementations along the lines of how multiple index > types are supported, though probably only the existing heap manager > implementation would be part of the actual codeline. There's been some previous talk about "pluggable heap managers"; you might try searching our archives. Unfortunately the story is not very good, especially if what you are really interested in is playing games with the format of heap tuples. We just haven't got an abstraction boundary that isolates that very well. I think the first thing that would have to be worked out before we could even discuss having multiple heap managers is where the abstraction boundary would get drawn and how we'd have to refactor existing code to make it fly. > If I do decide to pursue this, can anyone suggest the best way solicit > feedback? Usually people just post proposals for discussion on pgsql-hackers. Obviously, at the early stages such a proposal might be pretty vague, but that's fine as long as you can explain what kind of feedback you're hoping for. regards, tom lane