Thread: Fast ALTER TABLE ... ADD COLUMN ... DEFAULT xxx?
Hello.<br /><br />PostgreSQL is very fast when we perform (even on a huge table)<br /><br />ALTER TABLE ... ADD COLUMN ...NULL;<br /><br />(nullable without a default value). This is because of NULL bitmap in tuples. And it's greatest featurefor a developer!<br /><br /><br />But another very common-case query like<br /><br />ALTER TABLE ... ADD COLUMN ...BOOLEAN NOT NULL DEFAULT false;<br />or<br />ALTER TABLE ... ADD COLUMN ... INT NOT NULL DEFAULT 0;<br /><br />for a hugetable is performed very slow - this is because PostgreSQL have to re-create all tuples assigning the default value tothem. If I have a table with 1 billion rows (for example), I have no chance to perform this query at all - too slow.<br/><br />(In most cases NOT NULL DEFAULT xxx fields are BOOLEAN, flags: it is not handy to have 3-way flags.)<br /><br/><br />So, are there plans to optimize such kind of queries? This could be done by many ways:<br /><br />1. Store theDEFAULT flag directly in NULL BITMAP (add a bit to NULL bitmap not only for NULLable fields, but also for NOT NULL DEFAULT... fields).<br /> 2. Add another bitmap for each tuple (DEFAULT bitmap). Bit value 0 means that there is a real valuein a cell, 1 - that the value is default.<br />3. The same as (1), but always force default value to be 0 (or falseor any other values with meaning "zero") and optimize only these cases.<br /><br /><br />
On Thursday 21 May 2009 11:06:29 Dmitry Koterov wrote: > 1. Store the DEFAULT flag directly in NULL BITMAP (add a bit to NULL bitmap > not only for NULLable fields, but also for NOT NULL DEFAULT ... fields). > 2. Add another bitmap for each tuple (DEFAULT bitmap). Bit value 0 means > that there is a real value in a cell, 1 - that the value is default. > 3. The same as (1), but always force default value to be 0 (or false or any > other values with meaning "zero") and optimize only these cases. It seems to me that these solutions would require everyone in the world to pay the overhead of a few to many bits in every tuple to optimize what appears to be a relatively rare circumstance after all. Doesn't seem worth it to me.
On Thu, May 21, 2009 at 12:06:29PM +0400, Dmitry Koterov wrote: > ALTER TABLE ... ADD COLUMN ... NULL; > > (nullable without a default value). This is because of NULL bitmap in > tuples. And it's greatest feature for a developer! I don't think this is because of the "NULL bitmap". PG just never needs to flush the changes to every tuple because it knows that all "old" tuples (i.e. ones that were created before this column was added) are supposed to be NULL. > But another very common-case query like > > ALTER TABLE ... ADD COLUMN ... BOOLEAN NOT NULL DEFAULT false; > or > ALTER TABLE ... ADD COLUMN ... INT NOT NULL DEFAULT 0; > So, are there plans to optimize such kind of queries? This could be done by > many ways: I think this hasn't been done before because it's been considered too difficult to keep track of everything, but I've just tried to come up with an example of why it's difficult and failed. If I'm interpreting things correctly it's not nearly as difficult as I thought it should be. All that needs to be tracked is the "first" default value (this is currently assumed to be NULL). All subsequent INSERTs will have this value in the tuple and things should just work out. CREATE TABLE t ( i INTEGER PRIMARY KEY ); INSERT INTO t (i) VALUES (1); ALTER TABLE t ADD COLUMN j INTEGER DEFAULT 1; INSERTINTO t (i) VALUES (2); ALTER TABLE t ALTER j SET DEFAULT 2; INSERT INTO t (i) VALUES (3); ALTER TABLE t ALTER j DROPDEFAULT; INSERT INTO t (i) VALUES (4); After this we will have the following tuples: (1) (2,1) (3,2) (4,NULL) All that needs to be done is to fill in the "default" for i=1 to the first default (i.e. the integer 1) and everything is done. Who wants to tell me what I've missed? -- Sam http://samason.me.uk/
> (In most cases NOT NULL DEFAULT xxx fields are BOOLEAN, flags: it is not > handy to have 3-way flags.) This is certainly not true for me. I have both nullable booleans and not-nullable, defaulted columns of other types. > So, are there plans to optimize such kind of queries? This could be done by > many ways: > > 1. Store the DEFAULT flag directly in NULL BITMAP (add a bit to NULL bitmap > not only for NULLable fields, but also for NOT NULL DEFAULT ... fields). > 2. Add another bitmap for each tuple (DEFAULT bitmap). Bit value 0 means > that there is a real value in a cell, 1 - that the value is default. Both of these options would expand storage usage by a not-inconsiderable amount. > 3. The same as (1), but always force default value to be 0 (or false or any > other values with meaning "zero") and optimize only these cases. I'm not sure there's any way to know this for an arbitrary data type. ...Robert
Sam Mason <sam@samason.me.uk> writes: > On Thu, May 21, 2009 at 12:06:29PM +0400, Dmitry Koterov wrote: >> ALTER TABLE ... ADD COLUMN ... NULL; >> >> (nullable without a default value). This is because of NULL bitmap in >> tuples. And it's greatest feature for a developer! > I don't think this is because of the "NULL bitmap". No, it isn't. It's because each tuple includes the actual count of fields it contains (t_natts or HeapTupleHeaderGetNatts), and the value extraction routines are coded to assume that references to fields beyond that number should yield NULL. So the ALTER can just leave the existing rows alone --- only when you update a row will it change to include the newly added field(s). AFAICS there's no good way to scale that solution up to handling non-null values. > All that needs to be tracked is the "first" default value (this is > currently assumed to be NULL). You're being a bit vague, but in any case I don't think it can work for non-constant defaults (consider DEFAULT NOW()). And what about ALTER COLUMN DEFAULT? (BTW, I'm quite sure schemes like this have been discussed before. Check the archives...) regards, tom lane
On Thu, May 21, 2009 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sam Mason <sam@samason.me.uk> writes: >> On Thu, May 21, 2009 at 12:06:29PM +0400, Dmitry Koterov wrote: >>> ALTER TABLE ... ADD COLUMN ... NULL; >>> >>> (nullable without a default value). This is because of NULL bitmap in >>> tuples. And it's greatest feature for a developer! > >> I don't think this is because of the "NULL bitmap". > > No, it isn't. It's because each tuple includes the actual count of > fields it contains (t_natts or HeapTupleHeaderGetNatts), and the value > extraction routines are coded to assume that references to fields > beyond that number should yield NULL. So the ALTER can just leave > the existing rows alone --- only when you update a row will it change > to include the newly added field(s). > > AFAICS there's no good way to scale that solution up to handling > non-null values. > >> All that needs to be tracked is the "first" default value (this is >> currently assumed to be NULL). > > You're being a bit vague, but in any case I don't think it can work > for non-constant defaults (consider DEFAULT NOW()). And what about > ALTER COLUMN DEFAULT? I think what Sam is proposing is that, in addition to storing the default value for a column, you could store the value at the time the column was added. These might be the same if the default is a constant and has never been modified, or they might be different. This would even work for now() because it's stable, but it couldn't be used for a volatile function like random() or timeofday(), of course. ...Robert
On Thu, May 21, 2009 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> All that needs to be tracked is the "first" default value (this is >> currently assumed to be NULL). > > You're being a bit vague, but in any case I don't think it can work > for non-constant defaults (consider DEFAULT NOW()). And what about > ALTER COLUMN DEFAULT? > > (BTW, I'm quite sure schemes like this have been discussed before. > Check the archives...) Schemes like this have been discussed before but I don't think we considered applying the limitation that only the "first" default value would be covered. We always wanted to be able to handle new defaults or making a non-null column nullable later. I think the main motivation in the past was space savings for default values rather than making adding new non-null columns cheap. AFAICS as long as we only want to handle newly created non-nullable columns with initial defaults that are applied when the column is added then we could actually handle it by storing the initial default value in the catalog and keeping it in the tupledesc at all times. If you later change the default then it would only affect newly inserted rows which will have the new default value explicitly included anyways. And if likewise if you later make the row nullable the column will be explicitly listed in natts and the null bitmap. One gotcha I can think of is if the default expression depends on something like a type. If the default is later changed then people might be surprised to find there's still an invisible dependency on the type. But if it's limited to simple constants that should minimize that issue quite a bit. At least for types the column itself will have the same dependency anyways. -- greg
Greg Stark <stark@enterprisedb.com> writes: > Schemes like this have been discussed before but I don't think we > considered applying the limitation that only the "first" default value > would be covered. We always wanted to be able to handle new defaults > or making a non-null column nullable later. Yeah ... I don't see exactly what it would buy to restrict it to just the first such value. regards, tom lane
On Thu, May 21, 2009 at 11:13 AM, Greg Stark <stark@enterprisedb.com> wrote: > On Thu, May 21, 2009 at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >>> All that needs to be tracked is the "first" default value (this is >>> currently assumed to be NULL). >> >> You're being a bit vague, but in any case I don't think it can work >> for non-constant defaults (consider DEFAULT NOW()). And what about >> ALTER COLUMN DEFAULT? >> >> (BTW, I'm quite sure schemes like this have been discussed before. >> Check the archives...) > > Schemes like this have been discussed before but I don't think we > considered applying the limitation that only the "first" default value > would be covered. We always wanted to be able to handle new defaults > or making a non-null column nullable later. I think the main How would this scheme prevent you from doing that? The tuples that are added subsequent to the initial ADD COLUMN will have the value for that column in it, whether NULL or otherwise, default or not. The only thing you need to store is the value to be used (in place of NULL) when reading an old, short tuple. > One gotcha I can think of is if the default expression depends on > something like a type. If the default is later changed then people > might be surprised to find there's still an invisible dependency on > the type. But if it's limited to simple constants that should minimize > that issue quite a bit. At least for types the column itself will have > the same dependency anyways. I was also wondering whether there might be some gotchas in this area. ...Robert
On Thu, May 21, 2009 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <stark@enterprisedb.com> writes: >> Schemes like this have been discussed before but I don't think we >> considered applying the limitation that only the "first" default value >> would be covered. We always wanted to be able to handle new defaults >> or making a non-null column nullable later. > > Yeah ... I don't see exactly what it would buy to restrict it to just > the first such value. Well it wouldn't buy you steady-state space savings or performance improvements. What it would buy you is a much narrowed set of circumstances where ALTER TABLE ADD COLUMN goes from a fast O(1) catalog change to a complete table rewrite. The use cases covered such as "boolean DEFAULT false" or "integer DEFAULT 0" are extremely common. I think users today often avoid the full table rewrite either make their application treat null as implicitly the default value or do a piecemeal rewrite using updates. I think Robert Haas is right that we could handle any stable expression by evaluating the expression once and storing only the final resulting value as a constant. That would avoid the problems with dependencies and later changes to functions. Another gotcha is that the default value might be very large.... It can't be very common but I suppose we would have to take some care around that. -- greg
Greg Stark <stark@enterprisedb.com> writes: > On Thu, May 21, 2009 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah ... I don't see exactly what it would buy to restrict it to just >> the first such value. > Well it wouldn't buy you steady-state space savings or performance improvements. > What it would buy you is a much narrowed set of circumstances where > ALTER TABLE ADD COLUMN goes from a fast O(1) catalog change to a > complete table rewrite. The use cases covered such as "boolean DEFAULT > false" or "integer DEFAULT 0" are extremely common. No, you missed my point --- what's the value of having an implementation of this that only works for one column? If we do it, I'd envision it as an extra column in pg_attribute, and it would work for any column(s). There's nothing to be gained by restricting it. > I think Robert Haas is right that we could handle any stable > expression by evaluating the expression once and storing only the > final resulting value as a constant. That would avoid the problems > with dependencies and later changes to functions. Right, that's *necessary* to avoid changing semantics compared to the non-optimized behavior. > Another gotcha is that the default value might be very large.... It > can't be very common but I suppose we would have to take some care > around that. Yeah, that occurred to me too. We'd probably not be able to toast the pg_attribute column (depending on exactly how it's declared/represented) so we'd have to put a limit on the width of data value that we'd be willing to handle this way. Seems doable though. regards, tom lane
-- Greg On 21 May 2009, at 12:26, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <stark@enterprisedb.com> writes: >> On Thu, May 21, 2009 at 4:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yeah ... I don't see exactly what it would buy to restrict it to >>> just >>> the first such value. > >> Well it wouldn't buy you steady-state space savings or performance >> improvements. > >> What it would buy you is a much narrowed set of circumstances where >> ALTER TABLE ADD COLUMN goes from a fast O(1) catalog change to a >> complete table rewrite. The use cases covered such as "boolean >> DEFAULT >> false" or "integer DEFAULT 0" are extremely common. > > No, you missed my point --- what's the value of having an > implementation > of this that only works for one column? If we do it, I'd envision it > as an extra column in pg_attribute, and it would work for any > column(s). > There's nothing to be gained by restricting it. > Oh, I never meant to restrict it to one column. It might be nice to have vacuum notice the minimum natts in the table and trim the old unneeded ones. But I can't think of a very compelling reason to. Perhaps to save memory used in tuplestores? >> I think Robert Haas is right that we could handle any stable >> expression by evaluating the expression once and storing only the >> final resulting value as a constant. That would avoid the problems >> with dependencies and later changes to functions. > > Right, that's *necessary* to avoid changing semantics compared to > the non-optimized behavior. I'm coming at it from the other direction. I was assuming we could only handle simple constants and am trying to see how wide we can expand it. Doing all stable expressions would seem pretty convincingly wide to me. >> Another gotcha is that the default value might be very large.... It >> can't be very common but I suppose we would have to take some care >> around that. > > Yeah, that occurred to me too. We'd probably not be able to toast > the pg_attribute column (depending on exactly how it's > declared/represented) so we'd have to put a limit on the width of > data value that we'd be willing to handle this way. Seems doable > though. > > regards, tom lane
On Thu, May 21, 2009 at 12:26:08PM -0400, Tom Lane wrote: > I'd envision it > as an extra column in pg_attribute, and it would work for any column(s). > There's nothing to be gained by restricting it. Yes, when I said "first" I meant the only thing that needs to be stored is the first default value for the column. The code currently assumes that this first default value is NULL and is hence able to optimize the case where you adding a NULLable column. Adding this first default value to pg_attribute would allow you to break this assumption and allow the user to have non-NULL default values without complete table rewrites. I think the discussion covered this, but I don't think I was particularly clear in my first message and thought I should attempt to explain what I was saying. Other comments about only allowing STABLE expressions and non-toastable values make sense and were the sorts of things I thought I would miss. -- Sam http://samason.me.uk/
<div class="gmail_quote">On Thu, May 21, 2009 at 6:45 PM, Tom Lane <span dir="ltr"><<a href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>></span>wrote:<br /><blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="im">Sam Mason<<a href="mailto:sam@samason.me.uk">sam@samason.me.uk</a>> writes:<br /> > On Thu, May 21, 2009 at 12:06:29PM+0400, Dmitry Koterov wrote:<br /> >> ALTER TABLE ... ADD COLUMN ... NULL;<br /> >><br /> >>(nullable without a default value). This is because of NULL bitmap in<br /> >> tuples. And it's greatest featurefor a developer!<br /><br /> > I don't think this is because of the "NULL bitmap".<br /><br /></div>No, it isn't. It's because each tuple includes the actual count of<br /> fields it contains (t_natts or HeapTupleHeaderGetNatts),and the value<br /> extraction routines are coded to assume that references to fields<br /> beyondthat number should yield NULL. So the ALTER can just leave<br /> the existing rows alone --- only when you updatea row will it change<br /> to include the newly added field(s).<br /></blockquote></div><br />No, I meant that in caseof the row (1, NULL, NULL, 2, 3, NULL):<br />- the corresponding NULL bitmap is (100110...)<br />- the correspondingtuple is (1, 2, 3)<br />- t_natts=3 (if I am not wrong here)<br /><br />And in case of the row (5, 6, NULL,7, 8, 9):<br />- the corresponding NULL bitmap is (110111...)<br /> - the corresponding tuple is (5, 6, 7, 9)<br />- t_natts=4<br /><br />So, without a NULL bitmap, we cannot handle this kind of rows at all by t_natts only. And the NULLbitmap plays very important role in tuple saving, and I meant exactly that point. <br /><br /><br />
Dmitry Koterov <dmitry@koterov.ru> writes: > No, I meant that in case of the row (1, NULL, NULL, 2, 3, NULL): > - the corresponding NULL bitmap is (100110...) > - the corresponding tuple is (1, 2, 3) > - t_natts=3 (if I am not wrong here) You are wrong --- t_natts would be six here. In general the length of the null bitmap in a tuple (if it has one at all) is always exactly equal to its t_natts value. regards, tom lane
<div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt0pt 0.8ex; padding-left: 1ex;"><div class="im">Dmitry Koterov <<a href="mailto:dmitry@koterov.ru">dmitry@koterov.ru</a>>writes:<br /> > No, I meant that in case of the row (1, NULL,NULL, 2, 3, NULL):<br /> > - the corresponding NULL bitmap is (100110...)<br /> > - the corresponding tuple is(1, 2, 3)<br /> > - t_natts=3 (if I am not wrong here)<br /><br /></div>You are wrong --- t_natts would be six here. In general the length of<br /> the null bitmap in a tuple (if it has one at all) is always exactly<br /> equal to itst_natts value.<br /></blockquote></div><br />And so, the real number of values in the tuple - (1, 2, 3) above - is equalto the number of 1-bits in NULL bitmap. And the size of NULL bitmap is held in t_natts. I meant that when I said "thanksto NULL bitmap, adding a new nullable column is cheap". :-) And, of course, thanks to t_natts (HeapTupleHeaderGetNattsmacro) - too.<br /><br />