Thread: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

The following patch truncates trailing null attributes from heap rows to reduce the size of the row bitmap. 

Applications often have wide rows in which many of the trailing column values are null. On an insert/update, all of the trailing null columns are tracked in the row bitmap. This can add a substantial overhead for very wide rows. This change truncates heap rows such that the trailing nulls are elided. 

The intuition for this change is that ALTER TABLE t ADD COLUMN c type NULL is a metadata-only change. Postgres works fine when a row's metadata (tuple descriptor) is inconsistent with the actual row data: extra columns are assumed to be null. This change just adjusts the number of attributes for a row and the row bitmap to only track up to the last non-null attribute.
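
A minimal demonstration of the existing behavior this relies on (table and column names are illustrative):

CREATE TABLE t (a int, b int);
INSERT INTO t VALUES (1, 2);

-- Metadata-only: the existing row is not rewritten; it simply has
-- fewer attributes than the tuple descriptor now declares.
ALTER TABLE t ADD COLUMN c int NULL;

-- Attributes beyond a stored row's attribute count read back as null.
SELECT c FROM t;   -- returns NULL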

Thanks.

-Jamie Martin
Jameison Martin <jameisonb@yahoo.com> writes:
> The following patch truncates trailing null attributes from heap rows to reduce the size of the row bitmap. 

This has been discussed before, but it always seemed that the
cost-benefit ratio was exceedingly questionable.  You don't get any
savings whatsoever unless you reduce the size of the null bitmap across
a MAXALIGN boundary, which more and more often is 64 bits, so that the
frequency with which the optimization wins anything doesn't look likely
to be that high.  And on the other side of the coin, you're adding
cycles to every single tuple-construction operation to make this work.
The introduction of bugs doesn't seem improbable either.  (Just because
tuples in user tables might have unexpected natts values doesn't mean
that the code is, or should be, prepared to tolerate that in system
tables or plan-constructed tuples.)

So what I'd like to see is some concrete test results proving that this
is a net win, or at least not a net loss, for typical cases.  Just
asserting that it might be a win for certain usage patterns doesn't do
it for me.
        regards, tom lane


On Tue, Apr 17, 2012 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This has been discussed before, but it always seemed that the
> cost-benefit ratio was exceedingly questionable.  You don't get any
> savings whatsoever unless you reduce the size of the null bitmap across
> a MAXALIGN boundary, which more and more often is 64 bits, so that the
> frequency with which the optimization wins anything doesn't look likely
> to be that high.

There is the usage pattern where (brace yourself) people have
thousands of columns of which all but a handful are null.
They might be pretty happy about this. I'm not sure if that's a use
case that makes sense to optimize for though -- even for them the
space overhead would be noticeable but not a showstopper.

--
greg


Greg Stark <stark@mit.edu> writes:
> On Tue, Apr 17, 2012 at 5:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This has been discussed before, but it always seemed that the
>> cost-benefit ratio was exceedingly questionable.  You don't get any
>> savings whatsoever unless you reduce the size of the null bitmap across
>> a MAXALIGN boundary, which more and more often is 64 bits, so that the
>> frequency with which the optimization wins anything doesn't look likely
>> to be that high.

> There is the usage pattern where (brace yourself) people have
> thousands of columns of which all but a handful are null.
> They might be pretty happy about this.

Oh, I don't doubt that there are *some* use cases for this.  I'm just
dubious about how much we'd be slowing things down for everybody else.
As I said, what I'd like to see are some benchmarks, and not just
benchmarks that are tuned to match the sort of case where this wins.
        regards, tom lane


<div style="color:#000; background-color:#fff; font-family:tahoma, new york, times, serif;font-size:10pt"><div
style="font-size:10pt; "><span style="font-size: small; ">Thanks for the response.</span><br /></div><div
style="font-family:tahoma, 'new york', times, serif; "><div style="font-family: 'times new roman', 'new york', times,
serif;"><div id="yiv1234046135"><div style="color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); font-family:
tahoma,'new york', times, serif; "><div style="font-size: 10pt; "><font size="2"><br /></font></div><div
style="font-size:10pt; "><font size="2">The use-case I'm targeting is a schema that has multiple tables with ~800
columns,most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which
equatesto 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If
onlythe first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8
bytes,which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows
wouldfall into the category of needing only 8 bytes for the null bitmap.<br /></font></div><div style="font-size: 10pt;
"><fontsize="2"><br /></font></div><div style="font-size: 10pt; "><font size="2">What kind of test results would prove
thatthis is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also,
howwould you define a typical case (e.g. what kind of data shape)?</font></div><div style="font-size: 10pt; "><font
size="2"><br/>Thanks.</font></div><div style="font-size: 10pt; "><span style="font-size:13px;">-jamie</span></div><div
style="font-size:10pt; font-family: tahoma, times, serif; "><br /></div><div style="font-size: 10pt; font-family:
tahoma,times, serif; "><div style="font-size: 12pt; font-family: times, serif; "><div dir="ltr"><font face="Arial"
size="2"><hrsize="1" /><b><span style="font-weight:bold;">From:</span></b> Tom Lane <tgl@sss.pgh.pa.us><br
/><b><spanstyle="font-weight:bold;">To:</span></b> Jameison Martin <jameisonb@yahoo.com> <br /><b><span
style="font-weight:bold;">Cc:</span></b>"pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org> <br
/><b><spanstyle=" 
font-weight:bold;">Sent:</span></b> Tuesday, April 17, 2012 9:38 AM<br /><b><span
style="font-weight:bold;">Subject:</span></b>Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to
reducethe size of the null bitmap <br /></font></div><br /> Jameison Martin <<a href="mailto:jameisonb@yahoo.com"
rel="nofollow"target="_blank" ymailto="mailto:jameisonb@yahoo.com">jameisonb@yahoo.com</a>> writes:<br />> The
followingpatch truncates trailing null attributes from heap rows to reduce the size of the row bitmap. <br /><br />This
hasbeen discussed before, but it always seemed that the<br />cost-benefit ratio was exceedingly questionable.  You
don'tget any<br />savings whatsoever unless you reduce the size of the null bitmap across<br />a MAXALIGN boundary,
whichmore and more often is 64 bits, so that the<br />frequency with which the optimization wins anything doesn't look
likely<br/>to be that high.  And on the other side of the coin, you're adding<br />cycles to every single
tuple-constructionoperation to make this work.<br />The introduction of bugs doesn't seem improbable either.  (Just
because<br/>tuples in user tables might have unexpected natts values doesn't mean<br />that the code is, or should be,
preparedto tolerate that in system<br />tables or plan-constructed tuples.)<br /><br />So what I'd like to see is some
concretetest results proving that this<br />is a net win, or at least not a net loss, for typical cases.  Just<br
/>assertingthat it might be a win for certain usage patterns doesn't do<br />it for me.<br /><br />            regards,
tomlane<br /><br /><br /></div></div></div></div><br /><br /></div></div></div> 
Jameison Martin <jameisonb@yahoo.com> writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 columns, most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If only the first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8 bytes, which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows would fall into the category of needing only 8 bytes for the null bitmap.

I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones.  I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?

> What kind of test results would prove that this is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also, how would you define a typical case (e.g. what kind of data shape)?

Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero.  Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most.  Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.
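
For instance, something along these lines from psql (schema and row
count are illustrative):

CREATE TABLE source (c1 int, c2 int, c3 int, c4 int, c5 int);
INSERT INTO source
  SELECT g, g, g, g, g FROM generate_series(1, 10000000) g;

CREATE TEMP TABLE target (c1 int, c2 int, c3 int, c4 int, c5 int);

\timing on
-- Compare this timing, patched vs. unpatched; the work is dominated
-- by tuple formation rather than I/O.
INSERT INTO target SELECT * FROM source;
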
        regards, tom lane


Regarding the schema: I'm afraid the schema cannot be changed at this point, though I appreciate
the suggestions. 

Regarding an INSERT performance test, what kind of table shape would you like me to exercise? 
The patch as submitted may actually shave some cycles off of the insertion of rows with trailing nulls even 
when there are fewer than 64 columns because it avoids iterating over the null columns a 2nd time in heap_fill_tuple(), 
so I want to be sure that I pick something that you feel is properly representative. 

Thanks.

-Jamie


From: Tom Lane <tgl@sss.pgh.pa.us>
To: Jameison Martin <jameisonb@yahoo.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Tuesday, April 17, 2012 9:57 PM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

Jameison Martin <jameisonb@yahoo.com> writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 columns, most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If only the first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8 bytes, which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows would fall into the category of needing only 8 bytes for the null bitmap.

I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones.  I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?

> What kind of test results would prove that this is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also, how would you define a typical case (e.g. what kind of data shape)?

Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero.  Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most.  Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.

            regards, tom lane


Tom, I whipped up some  INSERT/SELECT tests where I selected into a temporary table as you suggested. The target temporary table and the source table were in cache and I basically disabled things that would cause noise. The source table had 5 integer columns, and was populated with 10 million rows.

I tried 3 variations:
  1) target has all nullable columns, all set to non null values: the results were the same
  2) target has all nullable columns, only the first column is set: the patch was slightly faster
  3) target has all non-null columns: the patch maybe was slightly faster, probably not statistically relevant

By slightly faster I'm talking on the order of 10 nanoseconds per row.

I think #2 is explained by the reduction in loop iterations in heap_fill_tuple(). 
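
For concreteness, the three target shapes were along these lines (a sketch; column names are illustrative, and source is the 5-integer-column table described above):

-- 1) all nullable, all columns set
CREATE TEMP TABLE target1 (c1 int, c2 int, c3 int, c4 int, c5 int);
INSERT INTO target1 SELECT * FROM source;

-- 2) all nullable, only the first column set (four trailing nulls)
CREATE TEMP TABLE target2 (c1 int, c2 int, c3 int, c4 int, c5 int);
INSERT INTO target2 (c1) SELECT c1 FROM source;

-- 3) all columns declared NOT NULL
CREATE TEMP TABLE target3 (c1 int NOT NULL, c2 int NOT NULL, c3 int NOT NULL,
                           c4 int NOT NULL, c5 int NOT NULL);
INSERT INTO target3 SELECT * FROM source;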


From: Tom Lane <tgl@sss.pgh.pa.us>
To: Jameison Martin <jameisonb@yahoo.com>
Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Tuesday, April 17, 2012 9:57 PM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

Jameison Martin <jameisonb@yahoo.com> writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 columns, most of which have only the first 50 or so values set. 800 columns would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte alignment the row bitmap would take up 104 bytes with the current implementation. If only the first 50 or so columns are actually non-null, then the minimum bitmap size wouldn't need to be more than 8 bytes, which means the proposed change would save 96 bytes. For the data set I have in mind roughly 90% of the rows would fall into the category of needing only 8 bytes for the null bitmap.

I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones.  I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?

> What kind of test results would prove that this is a net win (or not a net loss) for typical cases? Are you interested in some insert performance tests? Also, how would you define a typical case (e.g. what kind of data shape)?

Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero.  Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most.  Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.

            regards, tom lane


On Thu, Apr 26, 2012 at 1:35 AM, Jameison Martin <jameisonb@yahoo.com> wrote:
> Tom, I whipped up some  INSERT/SELECT tests where I selected into a
> temporary table as you suggested. The target temporary table and the source
> table were in cache and I basically disabled things that would cause noise.
> The source table had 5 integer columns, and was populated with 10 million
> rows.
>
> I tried 3 variations:
>   1) target has all nullable columns, all set to non null values: the
> results were the same
>   2) target has all nullable columns, only the first column is set: the
> patch was slightly faster
>   3) target has all non-null columns: the patch maybe was slightly faster,
> probably not statistically relevant
>
> By slightly faster I'm talking on the order of 10 nanoseconds per row.
>
> I think #2 is explained by the reduction in loop iterations in
> heap_fill_tuple().

I see this as a useful use case that I have come across in a few
cases, most typically associated with very large databases.

It will be a win in those cases, but I think your maths is unrealistic
for the common case. In your case, you're saying that you have 750
trailing null columns that will be all-NULL in 90% of cases. Given a
randomly distributed set of col values, I'd expect the last NULL to be
on average around the 400th column, perhaps more. So the savings are
still high, but not as high in the general case as they are for you.

The performance tests Tom asks for are essential, otherwise we cannot
proceed. Thanks for starting those.

Please post your test code, any environment notes and your exact test
results. The important point is that we need objectively confirmable
tests, not just your word it was faster. Everybody is held to the same
level of proof here, so it's not a personal doubt.

It would be useful to post sizes of databases also, to confirm that
the patch really does reduce database size.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


On Thu, Apr 26, 2012 at 8:27 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The source table had 5 integer columns, and was populated with 10 million
>> rows.
...
>>   2) target has all nullable columns, only the first column is set: the
>> patch was slightly faster
...
>> By slightly faster I'm talking on the order of 10 nanoseconds per row.
>>
>> I think #2 is explained by the reduction in loop iterations in
>> heap_fill_tuple().
>
> I see this as a useful use case that I have come across in a few
> cases, most typically associated with very large databases.

Indeed, if this result holds up then I think that would be pretty
convincing evidence. But I'm pretty skeptical. You're talking about
five bitmap tests in the middle of a loop involving much more
expensive steps. Can we see the raw numbers and the actual test case?

What I think would be strong evidence it's a real effect is if you
repeat the comparison with larger numbers and see the speedup scale
up. For instance if you create a table with 100 nullable columns and
store a single non-null value in various column positions. Is the difference
between the runtimes for the 95th column and 100th column doubled when
you compare the 90th and 100th column cases? And is it doubled again
when you compare the 80th column and the 100th column cases? (Off the
top of my head I think the null bitmap would take the same storage
space for those four)
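
A sketch of that setup, with illustrative names:

-- Build a 100-column table of nullable ints.
DO $$
BEGIN
  EXECUTE format('CREATE TABLE wide100 (%s)',
                 (SELECT string_agg(format('c%s int', i), ', ' ORDER BY i)
                  FROM generate_series(1, 100) i));
END $$;

\timing on
-- The null bitmap occupies the same MAXALIGNed space in all of these;
-- only the number of trailing nulls differs.
INSERT INTO wide100 (c80)  SELECT g FROM generate_series(1, 1000000) g;
TRUNCATE wide100;
INSERT INTO wide100 (c90)  SELECT g FROM generate_series(1, 1000000) g;
TRUNCATE wide100;
INSERT INTO wide100 (c100) SELECT g FROM generate_series(1, 1000000) g;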

--
greg


Simon and Greg,

The math on space savings is assuming that columns will be used roughly from first to last as declared in the DDL, not a random distribution of column values. This is the case for the particular schema that I'm looking at. I'm not asserting that it is the common case in general, though it may be more common than not given the fact that several commercial databases optimize for trailing null column values and developers often pay attention to this.

If there is an exact standard as to how this group does performance analysis (e.g. removing outliers beyond a certain standard deviation, number of repetitions, machine isolation requirements and so forth), please let me know. I can submit my results as is but in the interest of avoiding a lot of duplicate postings perhaps someone can point me to an example of what kinds of numbers are desired so I can make sure my posting conforms to that. For what it is worth I ran the 3 tests 10 times each and removed the outliers, but I can run 100 times or do something different if need be (e.g. post a csv for easy consumption in a spreadsheet). Also, Simon, you mentioned posting "environment notes", can you let me know what kind of environment notes are desired? For example, are you thinking about changes to the vanilla postgresql.conf, hardware information, OS config, etc?

Greg, all I'm trying to establish is that this change doesn't hurt insert performance for the common case as per Tom's comments. I'll try to add some additional test cases with varying trailing null column values to see if we can establish the potential salutary effect with a bit more data, but I'm not actually asserting that this is significant or a justification for the patch. It would be interesting to see what the performance benefit is with real queries against rows that have much smaller bitmaps, but I'd prefer not to get into that.

As for proof of the size reduction, I'd actually like to codify something in a regression test to ensure there are no regressions in the behavior of the patch. I was a little leery of creating a regression test that is dependent on internals that might cause the test to break over time, so I punted on it. Does anyone have a good suggestion as to a safe way to codify that the proposed behavioral change is working as intended in the form of a test that is unlikely to break over time? The best thing I could come up with was to create a very wide table and insert some sparse rows (trailing nulls) and verify that the pages reflect the expected space savings. In any event, I'll also post a comparative relation size number and test as well.
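
Something along these lines is the shape of what I had in mind (a sketch; names, row counts, and the expected delta are illustrative):

-- A wide table where only the first 2 of 800 int columns are ever set.
DO $$
BEGIN
  EXECUTE format('CREATE TABLE wide800 (%s)',
                 (SELECT string_agg(format('c%s int', i), ', ' ORDER BY i)
                  FROM generate_series(1, 800) i));
END $$;

INSERT INTO wide800 (c1, c2) SELECT g, g FROM generate_series(1, 100000) g;

-- Unpatched, every tuple carries a MAXALIGNed 100-byte null bitmap;
-- patched, the bitmap only covers the two leading columns, so this
-- should come out markedly smaller.
SELECT pg_relation_size('wide800');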

Cheers.

-Jamie


From: Simon Riggs <simon@2ndQuadrant.com>
To: Jameison Martin <jameisonb@yahoo.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Thursday, April 26, 2012 12:27 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

On Thu, Apr 26, 2012 at 1:35 AM, Jameison Martin <jameisonb@yahoo.com> wrote:
> Tom, I whipped up some  INSERT/SELECT tests where I selected into a
> temporary table as you suggested. The target temporary table and the source
> table were in cache and I basically disabled things that would cause noise.
> The source table had 5 integer columns, and was populated with 10 million
> rows.
>
> I tried 3 variations:
>   1) target has all nullable columns, all set to non null values: the
> results were the same
>   2) target has all nullable columns, only the first column is set: the
> patch was slightly faster
>   3) target has all non-null columns: the patch maybe was slightly faster,
> probably not statistically relevant
>
> By slightly faster I'm talking on the order of 10 nanoseconds per row.
>
> I think #2 is explained by the reduction in loop iterations in
> heap_fill_tuple().

I see this as a useful use case that I have come across in a few
cases, most typically associated with very large databases.

It will be a win in those cases, but I think your maths is unrealistic
for the common case. In your case, you're saying that you have 750
trailing null columns that will be all-NULL in 90% of cases. Given a
randomly distributed set of col values, I'd expect the last NULL to be
on average around the 400th column, perhaps more. So the savings are
still high, but not as high in the general case as they are for you.

The performance tests Tom asks for are essential, otherwise we cannot
proceed. Thanks for starting those.

Please post your test code, any environment notes and your exact test
results. The important point is that we need objectively confirmable
tests, not just your word it was faster. Everybody is held to the same
level of proof here, so it's not a personal doubt.

It would be useful to post sizes of databases also, to confirm that
the patch really does reduce database size.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Tom,

> I can't help thinking that (a) this is an incredibly narrow use-case,
> and (b) you'd be well advised to rethink your schema design anyway.

It's more common than you'd think.  Both EAV and Hstore have their own
(severe) drawbacks.

For example, I'm working on an application which collects telemetry data
from hardware.  This can involve up to 700 columns of data, most of
which are booleans, and an awful lot of which is NULL.

Also, adding lots of columns *is* following "proper" relational design
like we urge users to do, so it would be nice to make it performant.

Now, the other issue I'd be worried about for this optimization is what
happens when the nulls become non-trailing?  For example, this pattern:

1. Out of 700 columns, columns 301+ are all Null, so we map them away.
2. User updates column 688 to non-null
3. Suddenly we have a MUCH larger row which will no longer fit on the page.

If your application had a lot of that kind of update pattern, I'd be
concerned that this would be a deoptimization.

> If there is an exact standard as to how this group does performance
> analysis (e.g. removing outliers beyond a certain standard deviation,
> number of repetitions, machine isolation requirements and so forth),
> please let me know.

Oh, don't I wish!  We're a lot more "cowboy" than that.  Greg Smith and
Mark Wong have been trying to build a performance testing
infrastructure, but right now our test software and methodology are
*very* primitive.  You're welcome to help and suggest.

> I can submit my results as is but in the interest
> of avoiding a lot of duplicate postings perhaps someone can point me
> to an example of what kinds of numbers are desired so I can make sure
> my posting conforms to that. For what it is worth I ran the 3 tests
> 10 times each and removed the outliers, but I can run 100 times or do
> something different if need be (e.g. post a csv for easy consumption
> in a spreadsheet).

Actually, I think just doing a battery of pgbench tests, for both the
bigger and smaller than memory cases, with the patch installed, would
give us some results for the non-NULL case.  Something more
sophisticated like DVDstore or DBT2 would be even better, since the
tables there have more columns.

> I tried 3 variations:
>   1) target has all nullable columns, all set to non null values: the
> results were the same
>   2) target has all nullable columns, only the first column is set:
> the patch was slightly faster
>   3) target has all non-null columns: the patch maybe was slightly
> faster, probably not statistically relevant

This seems pretty on-target; can you share the numbers, the nature of
the test, and the setup with us so that we can evaluate it?

> Also, Simon, you mentioned posting "environment
> notes", can you let me know what kind of environment notes are
> desired? For example, are you thinking about changes to the vanilla
> postgresql.conf, hardware information, OS config, etc?

Yes, exactly.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


On Fri, Apr 27, 2012 at 1:51 AM, Josh Berkus <josh@agliodbs.com> wrote:

> Now, the other issue I'd be worried about for this optimization is what
> happens when the nulls become non-trailing?  For example, this pattern:
>
> 1. Out of 700 columns, columns 301+ are all Null, so we map them away.
> 2. User updates column 688 to non-null
> 3. Suddenly we have a MUCH larger row which will no longer fit on the page.
>
> If your application had a lot of that kind of update pattern, I'd be
> concerned that this would be a deoptimization.

Currently, we have a long row before and a long row after. Jamie's
proposal would give us a short row before and a long row after.

Since we don't ever update in place, we're much more likely to fit on
the same page with this optimisation than without it. I guess we can
check that with a performance test.

(Perhaps a more obvious optimisation would be to use a compressed NULL
bitmap. That would respond better in a wider range of use cases than
just truncation of trailing zeroes.)

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


On Fri, Apr 27, 2012 at 1:51 AM, Josh Berkus <josh@agliodbs.com> wrote:
> 1. Out of 700 columns, columns 301+ are all Null, so we map them away.
> 2. User updates column 688 to non-null
> 3. Suddenly we have a MUCH larger row which will no longer fit on the page.

Note that this is only actually 48 bytes more in the null bitmap in
this scenario. That's part of Tom's point that the null bitmap is so
dense that you have to be talking about some pretty huge number of
columns before the size savings are noticeable. Saving 48 bytes is
nothing to sneeze at but it's hardly an impractical update to handle.

-- 
greg


On Sat, Apr 28, 2012 at 6:23 PM, Greg Stark <stark@mit.edu> wrote:
> On Fri, Apr 27, 2012 at 1:51 AM, Josh Berkus <josh@agliodbs.com> wrote:
>> 1. Out of 700 columns, columns 301+ are all Null, so we map them away.
>> 2. User updates column 688 to non-null
>> 3. Suddenly we have a MUCH larger row which will no longer fit on the page.
>
> Note that this is only actually 48 bytes more in the null bitmap in
> this scenario. That's part of Tom's point that the null bitmap is so
> dense that you have to be talking about some pretty huge number of
> columns before the size savings are noticeable. Saving 48 bytes is
> nothing to sneeze at but it's hardly an impractical update to handle.

More to the point, if the old row were 48 bytes larger, that would not
increase the chance of the new row fitting on the page.  You've got to
store the old and new row no matter what: if the old one can be made
smaller than otherwise, that's a win regardless of whether the new one
is also smaller or not.

The other point I feel we're overlooking here is...  according to
Jamie, the patch actually made things FASTER in every case he thought
to test, and those cases don't appear to be anything particularly
favorable to the patch, so that's probably a good sign.  I'd like to
see the exact numbers from each test run, and a complete reproducible
test case, but at present all signs seem to point to this change being
a savings in both time and space.  Let's not go looking for reasons to
reject the approach just because we didn't expect it to work as well
as it does.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Sun, Apr 29, 2012 at 12:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> Let's not go looking for reasons to
> reject the approach just because we didn't expect it to work as well
> as it does.

Who here, in your opinion, is looking for reasons to reject anything?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


There might be a patch available for this already. In the worst case articulated above (fewer than 64 columns), if all the nulls are trailing nulls, the bitmap need not be saved at all. Actually the boundary is not 64 but 72 bits, as the Postgres HeapTupleHeader is only 23 bytes, so one byte before the MAXALIGN boundary is left over for the start of the bitmap.

The same principle can be considered for index tuples as an extension.

Thanks,
Gokul.

On Sun, Apr 29, 2012 at 7:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Sun, Apr 29, 2012 at 12:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Let's not go looking for reasons to
>> reject the approach just because we didn't expect it to work as well
>> as it does.
>
> Who here, in your opinion, is looking for reasons to reject anything?

I'm just saying that there seems to be a bit more skepticism here than
can be justified considering that the test results are all on one
side.  It wouldn't take a lot of evidence to convince me that this is
a bad idea, but it will take more than none, which is the amount we
have now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Attached are the following as per various requests:
  • test_results.txt: the performance benchmarking results, 
  • TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests
  • hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run, and
  • postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.conf can be identified by looking for the string 'jamie' in the file I attached (there aren't that many)
I ran the tests against a recent pull from git that I made a week ago or so, both with and without my patch. The results are marked as BASELINE (without my patch) and PATCH (with my patch). As I mentioned previously, I took Tom's advice and ran INSERT SELECT into a temporary table to get some idea of the impact of the proposed patch on the INSERT codepath. The DDL that the test ran is stated in the results along with the time the test took and the size of the target table. The INSERT SELECT always inserted 10 million rows per iteration.  I mostly focused on smaller schemas to address Tom's concerns. I also added some wider schemas as per Simon and Greg. Note that the smaller schema runs fit in memory whereas the wider ones did not necessarily fit in memory; the wider schemas are primarily intended to clearly demonstrate the space savings.

When inserting rows with trailing nulls the patch always improves insert performance. Row size is decreased when the row bitmap can be truncated to something smaller. I'm not seeing a performance degradation without trailing nulls. I'm not asserting that the performance improvement justifies the change, just that the patch can have a significant impact on row size in the scenarios that I have outlined in previous emails (800 nullable columns with only the first 50 set). The fact that it improves insert performance in some cases is gravy in my opinion because this is a micro benchmark and we aren't talking about significant performance differences (in general we are talking about nanoseconds per row).

Hopefully the test output and the code is pretty self-explanatory.

If anyone wants to run TestTrailingNull.java for themselves you'll need the postgres jdbc driver and junit in your classpath.

Thanks.

-Jamie




From: Jameison Martin <jameisonb@yahoo.com>
To: Simon Riggs <simon@2ndQuadrant.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Thursday, April 26, 2012 11:59 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

Simon and Greg,

The math on space savings is assuming that columns will be used roughly from first to last as declared in the DDL, not a random distribution of column values. This is the case for the particular schema that I'm looking at. I'm not asserting that it is the common case in general, though it may be more common than not given the fact that several commercial databases optimize for trailing null column values and developers often pay attention to this.

If there is an exact standard as to how this group does performance analysis (e.g. removing outliers beyond a certain standard deviation, number of repetitions, machine isolation requirements and so forth), please let me know. I can submit my results as is but in the interest of avoiding a lot of duplicate postings perhaps someone can point me to an example of what kinds of numbers are desired so I can make sure my posting conforms to that. For what it is worth I ran the 3 tests 10 times each and removed the outliers, but I can run 100 times or do something different if need be (e.g. post a csv for easy consumption in a spreadsheet). Also, Simon, you mentioned posting "environment notes", can you let me know what kind of environment notes are desired? For example, are you thinking about changes to the vanilla postgresql.conf, hardware information, OS config, etc?

Greg, all I'm trying to establish is that this change doesn't hurt insert performance for the common case as per Tom's comments. I'll try to add some additional test cases with varying trailing null column values to see if we can establish the potential salutary effect with a bit more data, but I'm not actually asserting that this is significant or a justification for the patch. It would be interesting to see what the performance benefit is with real queries against rows that have much smaller bitmaps, but I'd prefer not to get into that.

As for proof of the size reduction, I'd actually like to codify something in a regression test to ensure there are no regressions in the behavior of the patch. I was a little leery of creating a regression test that is dependent on internals that might cause the test to break over time, so I punted on it. Does anyone have a good suggestion as to a safe way to codify that the proposed behavioral change is working as intended in the form of a test that is unlikely to break over time? The best thing I could come up with was to create a very wide table and insert some sparse rows (trailing nulls) and verify that the pages reflect the expected space savings. In any event, I'll also post a comparative relation size number and test as well.

Cheers.

-Jamie


From: Simon Riggs <simon@2ndQuadrant.com>
To: Jameison Martin <jameisonb@yahoo.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Sent: Thursday, April 26, 2012 12:27 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

On Thu, Apr 26, 2012 at 1:35 AM, Jameison Martin <jameisonb@yahoo.com> wrote:
> Tom, I whipped up some  INSERT/SELECT tests where I selected into a
> temporary table as you suggested. The target temporary table and the source
> table were in cache and I basically disabled things that would cause noise.
> The source table had 5 integer columns, and was populated with 10 million
> rows.
>
> I tried 3 variations:
>   1) target has all nullable columns, all set to non null values: the
> results were the same
>   2) target has all nullable columns, only the first column is set: the
> patch was slightly faster
>   3) target has all non-null columns: the patch maybe was slightly faster,
> probably not statistically relevant
>
> By slightly faster I'm talking on the order of 10 nanoseconds per row.
>
> I think #2 is explained by the reduction in loop iterations in
> heap_fill_tuple().

I see this as a useful use case that I have come across in a few
cases, most typically associated with very large databases.

It will be a win in those cases, but I think your maths is unrealistic
for the common case. In your case, you're saying that you have 750
trailing null columns that will be all-NULL in 90% of cases. Given a
randomly distributed set of col values, I'd expect the last NULL to be
on average around the 400th column, perhaps more. So the savings are
still high, but not as high in the general case as they are for you.

The performance tests Tom asks for are essential, otherwise we cannot
proceed. Thanks for starting those.

Please post your test code, any environment notes and your exact test
results. The important point is that we need objectively confirmable
tests, not just your word it was faster. Everybody is held to the same
level of proof here, so it's not a personal doubt.

It would be useful to post sizes of databases also, to confirm that
the patch really does reduce database size.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




On 5/2/12 10:20 AM, Jameison Martin wrote:
> Attached are the following as per various requests:
>     * test_results.txt: the performance benchmarking results,
>
>     * TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests
>
>     * hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run, and
>
>     * postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.conf can be identified by looking for the string 'jamie' in the file I attached (there aren't that many)

Nice, thanks.  I'll try some of my own tests when I get a chance; I have
a really good use-case for this optimization.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Tom,

So that I can test this properly, what is the specific use-case we'd
expect to be slow with this patch?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


On Wed, May 2, 2012 at 9:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 5/2/12 10:20 AM, Jameison Martin wrote:
>> Attached are the following as per various requests:
>>       * test_results.txt: the performance benchmarking results,
>>
>>       * TestTrailingNull.java: the performance benchmarking code, with a few additional scenarios as per various requests
>>
>>       * hardinfo_report.txt: some information about the hardware and OS of the system on which the benchmarks were run, and
>>
>>       * postgresql.conf: the postgresql.conf used when running benchmarks. Note that the changes made to the vanilla postgresql.conf can be identified by looking for the string 'jamie' in the file I attached (there aren't that many)
>
> Nice, thanks.  I'll try some of my own tests when I get a chance; I have
> a really good use-case for this optimization.

Josh,

The CommitFest application lists you as the reviewer for this patch.
Are you (I hope) planning to review it?

I see you posted up a follow-up email asking Tom what he had in mind.
Personally, I don't think this needs incredibly complicated testing.
I think you should just test a workload involving inserting and/or
updating rows with lots of trailing NULL columns, and then another
workload with a table of similar width that... doesn't.  If we can't
find a regression - or, better, we find a win in one or both cases -
then I think we're done here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On 17 April 2012 17:22, Jameison Martin <jameisonb@yahoo.com> wrote:

> The following patch truncates trailing null attributes from heap rows to
> reduce the size of the row bitmap.

> The intuition for this change is that ALTER TABLE t ADD COLUMN c type NULL
> is a metadata only change. Postgres works fine when a row's metadata (tuple
> descriptor) is inconsistent with the actual row data: extra columns are
> assumed to be null. This change just adjusts the number of attributes for a
> row and the row bitmap to only track up to the last non-null attribute.

This is an interesting patch, but it has had various comments made about it.

When I look at this I see that it would change the NULL bitmap for all
existing rows, which means it forces a complete unload/reload of data.
We've moved away from doing things like that, so in its current form
we'd probably want to reject that.

If I might suggest a way forward?

Keep NULL bitmaps as they are now. Have another flag which indicates
when a partial, trailing-col-trimmed NULL bitmap is in use. Then we can
decide whether a table will benefit from full or partial bitmap and
set that in the tupledesc. That way the tupledesc will show
heap_form_tuple which kind of null bitmap is preferred for new tuples.
That preference might be settable by user on or off, but the default
would be for postgres to decide that for us based upon null stats etc,
which we would decide at ANALYZE time.

That mechanism is both compatible with existing on-disk formats and
means that the common path for smaller tables is unaffected, yet we
gain the benefit of the patch for larger tables.

It would be good to see you take this all the way.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Simon Riggs <simon@2ndQuadrant.com> writes:
> On 17 April 2012 17:22, Jameison Martin <jameisonb@yahoo.com> wrote:
>> The following patch truncates trailing null attributes from heap rows to
>> reduce the size of the row bitmap.

> This is an interesting patch, but it has had various comments made about it.

> When I look at this I see that it would change the NULL bitmap for all
> existing rows, which means it forces a complete unload/reload of data.

Huh?  I thought it would only change how *new* tuples were stored.
Old tuples ought to continue to work fine.

I'm not really convinced that it's a good idea in the larger scheme
of things --- your point in a nearby thread that micro-optimizing
storage space at the expense of all else is not good engineering
applies here.  But I don't see that it forces data reload.  Or if
it does, that should be easily fixable.

> ...  Have another flag which indicates
> when a partial, trailing-col-trimmed NULL bitmap is in use.

That might be useful for forensic purposes, but on the whole I suspect
it's just added complexity (and eating up a valuable infomask bit)
for relatively little gain.

> ... decide whether a table will benefit from full or partial bitmap and
> set that in the tupledesc. That way the tupledesc will show
> heap_form_tuple which kind of null bitmap is preferred for new tuples.
> That preference might be settable by user on or off, but the default
> would be for postgres to decide that for us based upon null stats etc,
> which we would decide at ANALYZE time.

And that seems like huge overcomplication.  I think we could probably
do fine with some very simple fixed policy, like "don't bother with
this for tables of less than N columns", where N is maybe 64 or so
and chosen to match the MAXALIGN boundary where there actually could
be some savings from trimming the null bitmap.

(Note: I've not read the patch, so maybe Jameison already did something
of the sort.)
        regards, tom lane


On 9 August 2012 15:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 17 April 2012 17:22, Jameison Martin <jameisonb@yahoo.com> wrote:
>>> The following patch truncates trailing null attributes from heap rows to
>>> reduce the size of the row bitmap.
>
>> This is an interesting patch, but it has had various comments made about it.
>
>> When I look at this I see that it would change the NULL bitmap for all
>> existing rows, which means it forces a complete unload/reload of data.
>
> Huh?  I thought it would only change how *new* tuples were stored.
> Old tuples ought to continue to work fine.

That wasn't my understanding, but that could be wrong.

> I'm not really convinced that it's a good idea in the larger scheme
> of things --- your point in a nearby thread that micro-optimizing
> storage space at the expense of all else is not good engineering
> applies here.  But I don't see that it forces data reload.  Or if
> it does, that should be easily fixable.

Large numbers of columns are surprisingly common and tables with large
numbers of columns usually have many rows as well. So this doesn't
matter for most tables, but the few that need this can often represent
>80% of database volume, so it is important.

(Next challenge is how to cope with 1000s of columns.)

> And that seems like huge overcomplication.  I think we could probably
> do fine with some very simple fixed policy, like "don't bother with
> this for tables of less than N columns", where N is maybe 64 or so
> and chosen to match the MAXALIGN boundary where there actually could
> be some savings from trimming the null bitmap.

"One simple tweak" works for me.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


<div style="color:; background-color:; font-family:tahoma, new york, times, serif;font-size:10pt"><div
style="font-family:tahoma, 'new york', times, serif; font-size: 10pt; "><span>Simon, Tom is correct, the patch doesn't
changethe existing row format contract or the format of the null bitmap. The change only affects how new rows are
writtenout. And it uses the same supported format that has always been there (which is why alter table add col null
worksthe way it does). And it keeps to the same MAXALIGN boundaries that are there today. </span></div><div
style="font-family:tahoma, 'new york', times, serif; font-size: 13px; color: rgb(0, 0, 0); background-color:
transparent;font-style: normal; "><span><br /></span></div><div style="background-color: transparent; "><span><font
size="2">Onecould argue that different row formats could make sense in different circumstances, and I'm certainly open
tothat kind of discussion, but this change is far more modest and perhaps can be made on its own since it
doesn't perturb thecode base much, improves performance (marginally) and improves the size of rows with lots of
trailingnulls.</font></span></div><div style="background-color: transparent; color: rgb(0, 0, 0); font-size: 13px;
font-family:tahoma, 'new york', times, serif; font-style: normal; "><span><font size="2"><br /></font></span></div><div
style="background-color:transparent; color: rgb(0, 0, 0); font-size: 13px; font-family: tahoma, 'new york', times,
serif;font-style: normal; "><span><font size="2">[separate topic: pluggable heap manager]</font></span></div><div
style="background-color:transparent; color: rgb(0, 0, 0); font-size: 13px; font-family: tahoma, 'new york', times,
serif;font-style: normal; "><span><font size="2">I'm quite interested in pursuing more aggressive compression
strategies,and I'd like to do so in the context of the heap manager. I'm exploring having a pluggable heap manager
implementationand would be interested in feedback on that as a general approach. My thinking is that I'd like to be
ableto have PostgreSQL support multiple heap implementations along the lines of how multiple index types are supported,
thoughprobably only the existing heap manager implementation would be part of the actual codeline. I've done a little
exploratorywork of looking at the heap interface. I was planning on doing a little prototyping before suggesting
anythingconcrete, but, assuming the concept of a layered heap manager is not inherently objectionable, I was thinking
ofcleaning up the heap interface a little (e.g. some HOT stuff has bled across a little), then taking a whack at
formalizingthe interface along the lines of the index layering. So ideally I'd make a few separate submissions and if
allgoes according to plan I'd be able to have a pluggable heap manager implementation that I could work on
independentlyand which could in theory use the same hooks as the existing heap implementation. And if it turns out that
myimplementation is deemed to be general enough it could be released to the community.</font></span></div><div
style="background-color:transparent; color: rgb(0, 0, 0); font-size: 13px; font-family: tahoma, 'new york', times,
serif;font-style: normal; "><span><font size="2"><br /></font></span></div><div style="background-color: transparent;
color:rgb(0, 0, 0); font-family: tahoma, 'new york', times, serif; font-style: normal; "><font size="2">If I do decide
topursue this, can anyone suggest the best way solicit feedback? I see that some proposals get shared on the postgres
wiki.I could put something up there to frame the issue and encourage some back and forth dialog. Or is email the way
thatthis kind of exchange tends to happen? Ultimately I'd like to get into a bit of detail about what the actual heap
managercontract is and so forth.</font></div><div style="background-color: transparent; color: rgb(0, 0, 0);
font-family:tahoma, 'new york', times, serif; font-style: normal; font-size: 13px; "><font size="2"><br
/></font></div><divstyle="background-color: transparent; color: rgb(0, 0, 0); font-family: tahoma, 'new york', times,
serif;font-style: normal; font-size: 13px; "><font size="2">Note that I'm a ways from really knowing if this is
feasibleon my end, so this is quite speculative at this point. But I'd like to introduce the topic and get some
feedbackon the right way to communicate as early as possible.</font></div><div style="background-color: transparent;
color:rgb(0, 0, 0); font-family: tahoma, 'new york', times, serif; font-style: normal; font-size: 13px; "><font
size="2"><br/></font></div><div style="background-color: transparent; color: rgb(0, 0, 0); font-family: tahoma, 'new
york',times, serif; font-style: normal; font-size: 13px; "><font size="2">Thanks.</font></div><div
style="background-color:transparent; color: rgb(0, 0, 0); font-family: tahoma, 'new york', times, serif; font-style:
normal;font-size: 13px; "><font size="2"><br /></font></div><div style="background-color: transparent; color: rgb(0, 0,
0);font-family: tahoma, 'new york', times, serif; font-style: normal; font-size: 13px; "><font
size="2">-Jamie</font></div><divstyle="background-color: transparent; color: rgb(0, 0, 0); font-family: tahoma, 'new
york',times, serif; font-style: normal; font-size: 13px; "><font size="2"><br /></font></div><div style="font-family:
tahoma,'new york', times, serif; font-size: 10pt; "><div style="font-family: 'times new roman', 'new york', times,
serif;font-size: 12pt; "><div dir="ltr"><font face="Arial" size="2"><hr size="1" /><b><span
style="font-weight:bold;">From:</span></b>Tom Lane <tgl@sss.pgh.pa.us><br /><b><span style="font-weight:
bold;">To:</span></b>Simon Riggs <simon@2ndQuadrant.com> <br /><b><span style="font-weight: bold;">Cc:</span></b>
JameisonMartin <jameisonb@yahoo.com>; "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org> <br
/><b><spanstyle="font-weight: bold;">Sent:</span></b> Thursday, August 9, 2012 7:27 AM<br /><b><span
style="font-weight:bold;">Subject:</span></b> Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to
reducethe size of the null bitmap<br /></font></div><br /> Simon Riggs <<a href="mailto:simon@2ndQuadrant.com"
ymailto="mailto:simon@2ndQuadrant.com">simon@2ndQuadrant.com</a>>writes:<br />> On 17 April 2012 17:22, Jameison
Martin<<a href="mailto:jameisonb@yahoo.com" ymailto="mailto:jameisonb@yahoo.com">jameisonb@yahoo.com</a>>
wrote:<br/>>> The following patch truncates trailing null attributes from heap rows to<br />>> reduce the
sizeof the row bitmap.<br /><br />> This is an interesting patch, but its has had various comments made about it.<br
/><br/>> When I look at this I see that it would change the NULL bitmap for all<br />> existing rows, which means
itforces a complete unload/reload of data.<br /><br />Huh?  I thought it would only change how *new* tuples were
stored.<br/>Old tuples ought to continue to work fine.<br /><br />I'm not really convinced that it's a good idea in the
largerscheme<br />of things --- your point in a nearby thread that micro-optimizing<br />storage space at the expense
ofall else is not good engineering<br />applies here.  But I don't see that it forces data reload.  Or if<br />it does,
thatshould be easily fixable.<br /><br />> ...  Have another flag which indicates<br />> when a partial trailing
coltrimmed NULL bitmap is in use.<br /><br />That might be useful for forensic purposes, but on the whole I suspect<br
/>it'sjust added complexity (and eating up a valuable infomask bit)<br />for relatively little gain.<br /><br />>
...decide whether a table will benefit from full or partial bitmap and<br />> set that in the tupledesc. That way
thetupledesc will show<br />> heap_form_tuple which kind of null bitmap is preferred for new tuples.<br />> That
preferencemight be settable by user on or off, but the default<br />> would be for postgres to decide that for us
basedupon null stats etc,<br />> which we would decide at ANALYZE time.<br /><br />And that seems like huge
overcomplication. I think we could probably<br />do fine with some very simple fixed policy, like "don't bother with<br
/>thisfor tables of less than N columns", where N is maybe 64 or so<br />and chosen to match the MAXALIGN boundary
wherethere actually could<br />be some savings from trimming the null bitmap.<br /><br />(Note: I've not read the
patch,so maybe Jameison already did something<br />of the sort.)<br /><br />            regards, tom lane<br /><br
/><br/></div></div></div> 
On 8/9/12 10:56 AM, Jameison Martin wrote:
> [separate topic: pluggable heap manager]
> I'm quite interested in pursuing more aggressive compression strategies, and I'd like to do so in the context of the heap manager. I'm exploring having a pluggable heap manager implementation and would be interested in feedback on that as a general approach. My thinking is that I'd like to be able to have PostgreSQL support multiple heap implementations along the lines of how multiple index types are supported, though probably only the existing heap manager implementation would be part of the actual codeline. I've done a little exploratory work of looking at the heap interface. I was planning on doing a little prototyping before suggesting anything concrete, but, assuming the concept of a layered heap manager is not inherently objectionable, I was thinking of cleaning up the heap interface a little (e.g. some HOT stuff has bled across a little), then taking a whack at formalizing the interface along the lines of the index layering. So ideally I'd make a few separate submissions and if all goes according to plan I'd be able to have a pluggable heap manager implementation that I could work on independently and which could in theory use the same hooks as the existing heap implementation. And if it turns out that my implementation is deemed to be general enough it could be released to the community.

I'm definitely interested in things that can shrink our working-set-size; things that others might not be keen on. (Like having the on-disk format be tighter than the in-memory one.) Having the ability to put in different heap storage could be a good way to accommodate that. Especially if you could change it on a per-table basis.

-- 
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


Jameison Martin <jameisonb@yahoo.com> writes:
> [separate topic: pluggable heap manager]
> I'm quite interested in pursuing more aggressive compression
> strategies, and I'd like to do so in the context of the heap
> manager. I'm exploring having a pluggable heap manager implementation
> and would be interested in feedback on that as a general approach. My
> thinking is that I'd like to be able to have PostgreSQL support
> multiple heap implementations along the lines of how multiple index
> types are supported, though probably only the existing heap manager
> implementation would be part of the actual codeline.

There's been some previous talk about "pluggable heap managers"; you
might try searching our archives.  Unfortunately the story is not very
good, especially if what you are really interested in is playing games
with the format of heap tuples.  We just haven't got an abstraction
boundary that isolates that very well.  I think the first thing that
would have to be worked out before we could even discuss having multiple
heap managers is where the abstraction boundary would get drawn and how
we'd have to refactor existing code to make it fly.

> If I do decide to pursue this, can anyone suggest the best way to solicit
> feedback?

Usually people just post proposals for discussion on pgsql-hackers.
Obviously, at the early stages such a proposal might be pretty vague,
but that's fine as long as you can explain what kind of feedback you're
hoping for.
        regards, tom lane