Thread: RFC: Restructuring pg_aggregate
I was originally planning to revise pg_aggregate along the same lines as pg_proc and so forth: add an aggnamespace column and fix the search code to be namespace-aware. But it seemed a tad annoying that standard function lookups would thereby incur *two* namespace-aware searches: one in pg_aggregate and one in pg_proc. Thinking about that, it occurred to me that there is another way. Why shouldn't aggregate functions have entries in pg_proc? Then one search would cover both possibilities, and we could eliminate some duplicate code in the parser. Doing things this way would mean that one could not create an aggregate function with the same name and input arguments as a regular function (at least, not in the same namespace). However, doing so has always been a bad idea, and it seems like it'd be a step forward not back for the system to reject it as a naming conflict. A more serious objection is that this will break client applications that know about the pg_aggregate table. However, 7.3 is already going to force a lot of reprogramming of clients that inspect system tables, because of the addition of namespaces. Restructuring pg_aggregate doesn't seem like it makes life all that much worse. I would envision this working like so: In pg_proc: add a boolean column "proisagg" to mark function entries that are aggregates. A row for an aggregate function would contain a pointer to a dummy C function that would just raise an error if called (which shouldn't ever happen, but just in case some bit of code doesn't get updated, this would be a good safety check). In pg_aggregate: remove the columns aggname, aggowner, aggbasetype, aggfinaltype, and add a column aggfnoid containing the OID of the aggregate's pg_proc row. (pg_aggregate itself doesn't need OIDs anymore, and its only index will be on aggfnoid.) Essentially this reduces pg_aggregate to an auxiliary extension of pg_proc, carrying the fields aggtransfn, aggfinalfn, aggtranstype, agginitval for those pg_proc rows that need them. An interesting aspect of this is that the catalog structure would now be prepared to support aggregate functions with more than one input, which is a feature we've been asked for occasionally. I am *not* volunteering to make that happen right now ... but the catalog structures would be ready for it. Comments, objections, better ideas? regards, tom lane
> A more serious objection is that this will break client applications > that know about the pg_aggregate table. However, 7.3 is already going > to force a lot of reprogramming of clients that inspect system tables, > because of the addition of namespaces. Restructuring pg_aggregate > doesn't seem like it makes life all that much worse. How about putting a note in the 7.3 release that tells them not to rely on sequential attnums in tn pg_attribute anymore. That should make it easier to implement column dropping in the future? Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > How about putting a note in the 7.3 release that tells them not to rely on > sequential attnums in tn pg_attribute anymore. That should make it easier > to implement column dropping in the future? That seems like pure speculation to me, if not outright wrong. If we can't renumber the attnums it'll be because the table's tuples still have data at a particular column position. Since we'll need to know the datatype of that data (if only to skip over it correctly), there will still have to be a pg_attribute entry for the dropped column. Thus, what people will more likely have to watch out for is pg_attribute rows marked "deleted" in some fashion. We are actually not that far away from being able to do DROP COLUMN, if people don't mind being slow to recover the space used by a dropped column. It'd work like this: 1. Add an "attisdropped" boolean to pg_attribute. 2. DROP COLUMN sets this flag and changes attname to something like "***deleted_NNN". (Changing attname is only necessary to allow the same column name to be reused without drawing a unique-index error.) That's it --- it's done. 3. Column lookup, expansion of *, etc have to be taught to ignore columns marked attisdropped. The idea is that the extant data sits there but is invisible. Inserts of new rows in the table would always insert a NULL in the dropped column (which'd fall out more or less for free, there being no way to tell the system to insert anything else). Over time, UPDATEs of extant rows would also replace the dropped data with NULLs. I suspect there are only about half a dozen key places that would have to explicitly check attisdropped. None of the low-level executor machinery would care at all, since it's dealing with "real" tuples where the attribute is still there, at least as a NULL. Hiroshi's "DROP_COLUMN_HACK" was essentially along this line, but I think he made a representational mistake by trying to change the attnums of dropped columns to be negative values. That means that a lot of low-level places *do* need to know about the dropped-column convention, else they can't make any sense of tuple layouts. The negative-attnum idea might have been a little easier for clients inspecting pg_attribute to cope with, but in practice I think they'd need to be taught about dropped columns anyway --- as evidenced by your remark suggesting that gaps in the sequence of positive attnums would break clients. regards, tom lane PS: Once you have that, try this on for size: ALTER COLUMN is ALTER DROP COLUMN;ALTER ADD COLUMN newtype;UPDATE foo SET newcol = coercion_fn(oldcol); That last couldn't be expressed as an SQL statement because the parser wouldn't allow access to oldcol, but there's nothing stopping it at the implementation level. This approach changes the user-visible column ordering, which'd be a tad annoying, so probably something based on building a new version of the table would be better. But as a quick hack this would be doable. Actually, given the DROP feature a user could do it for himself: ALTER ADD COLUMN tempcol newtype;UPDATE foo SET tempcol = coercion_fn(oldcol);ALTER DROP COLUMN oldcol;ALTER RENAME COLUMNtempcol to oldcol; which seems like an okay approach, especially since it'd allow the UPDATE computing the new column values to be of arbitrary complexity, not just a simple coercion of one existing column.
Tom Lane writes: > Why shouldn't aggregate functions have entries in pg_proc? Then one > search would cover both possibilities, and we could eliminate some > duplicate code in the parser. Furthermore, we could make the new function privileges apply to aggregates as well. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane writes: >> Why shouldn't aggregate functions have entries in pg_proc? > Furthermore, we could make the new function privileges apply to aggregates > as well. Good point. Another thing that would fall out for free is that the aggregate type-coercion rules would become exactly like the function type-coercion rules; right now they are a tad stupider. regards, tom lane
> That seems like pure speculation to me, if not outright wrong. If we > can't renumber the attnums it'll be because the table's tuples still > have data at a particular column position. Since we'll need to know > the datatype of that data (if only to skip over it correctly), there > will still have to be a pg_attribute entry for the dropped column. > Thus, what people will more likely have to watch out for is pg_attribute > rows marked "deleted" in some fashion. You know there is a way to do this and not break client compatibility. Rename the current pg_attribute relation to pg_baseatt or something. Fix all references to it in the code. Create a system view called pg_attribute which is SELECT * (except attisdropped) FROM pg_baseattr WHERE NOT attisdropped. More work though, of course. > We are actually not that far away from being able to do DROP COLUMN, > if people don't mind being slow to recover the space used by a dropped > column. It'd work like this: Logical vs. physical column numbers would still be quite handy tho. If you're going to break compatibility, may as well do all breaks at once? > 1. Add an "attisdropped" boolean to pg_attribute. > > 2. DROP COLUMN sets this flag and changes attname to something like > "***deleted_NNN". (Changing attname is only necessary to allow the > same column name to be reused without drawing a unique-index error.) > That's it --- it's done. > > 3. Column lookup, expansion of *, etc have to be taught to ignore > columns marked attisdropped. > > The idea is that the extant data sits there but is invisible. Inserts > of new rows in the table would always insert a NULL in the dropped > column (which'd fall out more or less for free, there being no way > to tell the system to insert anything else). Over time, UPDATEs of > extant rows would also replace the dropped data with NULLs. Would it be possible to modify VACUUM FULL in some way so as to permanently remove these tuples? Surely people would like an actual space-saving column drop? Chris
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > You know there is a way to do this and not break client compatibility. > Rename the current pg_attribute relation to pg_baseatt or something. Fix > all references to it in the code. Create a system view called pg_attribute > which is SELECT * (except attisdropped) FROM pg_baseattr WHERE NOT > attisdropped. Wasn't your original concern that the attnums wouldn't be consecutive? How is this view going to hide that? > Logical vs. physical column numbers would still be quite handy tho. But confusing as all hell, at *all* levels of the code ... I've thought about that quite a bit, and I can't see that we could expect to make it work without a lot of hard-to-find bugs. Too many places where it's not instantly obvious which set of numbers you should be using. regards, tom lane
Tom Lane wrote: > Hiroshi's "DROP_COLUMN_HACK" was essentially along this line, but > I think he made a representational mistake by trying to change the > attnums of dropped columns to be negative values. Negative attnums had 2 advantages then. It had a big advantage that initdb isn't needed. Note that it was only a trial hack and there was no consensus on the way. It was very easy to change the implementation to use attisdropped. OTOH physical/logical attnums approach needed the change on pg_class, pg_attribute and so I've never had a chance to open the patch to public. It was also more sensitive about oversights of needed changes than the attisdropped flag approach. > That means that > a lot of low-level places *do* need to know about the dropped-column > convention, else they can't make any sense of tuple layouts. Why ? As you already mentioned, there were not that many places to be changed. Well what's changed since then ? regards, Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > Tom Lane wrote: >> That means that >> a lot of low-level places *do* need to know about the dropped-column >> convention, else they can't make any sense of tuple layouts. > Why ? As you already mentioned, there were not that many places > to be changed. There are not many places to change if the implementation uses attisdropped, because we *only* have to hide the existence of the column at the parser level. The guts of the system don't know anything funny is going on; a dropped column looks the same as an undropped one throughout the executor. But with negative attnums, even such basic routines as heap_formtuple have to know about it, no? regards, tom lane
Hiroshi Inoue wrote: > Tom Lane wrote: > > > Hiroshi's "DROP_COLUMN_HACK" was essentially along this line, but > > I think he made a representational mistake by trying to change the > > attnums of dropped columns to be negative values. > > Negative attnums had 2 advantages then. It had a big > advantage that initdb isn't needed. Note that it was > only a trial hack and there was no consensus on the way. > It was very easy to change the implementation to use > attisdropped. OTOH physical/logical attnums approach > needed the change on pg_class, pg_attribute and so > I've never had a chance to open the patch to public. > It was also more sensitive about oversights of needed > changes than the attisdropped flag approach. > > > That means that > > a lot of low-level places *do* need to know about the dropped-column > > convention, else they can't make any sense of tuple layouts. > > Why ? As you already mentioned, there were not that many places > to be changed. > > Well what's changed since then ? Here is an old email from me that outlines the idea of having a physical/logical attribute numbering system, and the advantages. For implementation, I thought we could do most of the work by filtering what the client saw, and let the server just worry about physical numbering, except for 'SELECT *' expansion. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Tom Lane wrote: > > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > > Tom Lane wrote: > >> That means that > >> a lot of low-level places *do* need to know about the dropped-column > >> convention, else they can't make any sense of tuple layouts. > > > Why ? As you already mentioned, there were not that many places > > to be changed. > > There are not many places to change if the implementation uses > attisdropped, because we *only* have to hide the existence of the column > at the parser level. The guts of the system don't know anything funny > is going on; a dropped column looks the same as an undropped one > throughout the executor. But with negative attnums, even such basic > routines as heap_formtuple have to know about it, no? When a tuple descriptor is made, the info of dropped columns are placed at (their physical position - 1) index in the same way as ordinary columns. There are few places where conversions between negative attnums and the physical positions are needed. The following is my posting more than 2 years ago. What's changed since then. regards, Hiroshi Inoue I don't want a final implementation this time. What I want is to provide a quick hack for both others and me to judge whetherthis direction is good or not. My idea is essentially an invisible column implementation. DROP COLUMN would change the target pg_attribute tuple as follows..attnum -> an offset - attnum;atttypid -> 0 We would be able to see where to change by tracking error/ crashes caused by this change. I would also change attname to '*already dropped %d' for examle to avoid duplicate attname.
Bruce Momjian wrote: > > Hiroshi Inoue wrote: > > Tom Lane wrote: > > > > > Hiroshi's "DROP_COLUMN_HACK" was essentially along this line, but > > > I think he made a representational mistake by trying to change the > > > attnums of dropped columns to be negative values. > > > > Negative attnums had 2 advantages then. It had a big > > advantage that initdb isn't needed. Note that it was > > only a trial hack and there was no consensus on the way. > > It was very easy to change the implementation to use > > attisdropped. OTOH physical/logical attnums approach > > needed the change on pg_class, pg_attribute and so > > I've never had a chance to open the patch to public. > > It was also more sensitive about oversights of needed > > changes than the attisdropped flag approach. > > > > > That means that > > > a lot of low-level places *do* need to know about the dropped-column > > > convention, else they can't make any sense of tuple layouts. > > > > Why ? As you already mentioned, there were not that many places > > to be changed. > > > > Well what's changed since then ? > > Here is an old email from me that outlines the idea of having a > physical/logical attribute numbering system, and the advantages. I already tried physical/logical attribute implementation pretty long ago. Where are new ideas to solve the problems that the approach has ? regards, Hiroshi Inoue
Hiroshi Inoue wrote: > > > Why ? As you already mentioned, there were not that many places > > > to be changed. > > > > > > Well what's changed since then ? > > > > Here is an old email from me that outlines the idea of having a > > physical/logical attribute numbering system, and the advantages. > > I already tried physical/logical attribute implementation > pretty long ago. Where are new ideas to solve the problems > that the approach has ? Good question. I am suggesting more than just the drop column fix. It could be used for smaller data files to reduce padding, fix for inheritance problems with ADD COLUMN, and performance of moving varlena's to the end of the row. Also, my idea was to have the physical/logical mapping happen closer to the client, so the backend mostly only deals with physical. I was thinking of having the libpq backend communication layer actually do the reordering of the return results. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > Hiroshi Inoue wrote: > > > > Why ? As you already mentioned, there were not that many places > > > > to be changed. > > > > > > > > Well what's changed since then ? > > > > > > Here is an old email from me that outlines the idea of having a > > > physical/logical attribute numbering system, and the advantages. > > > > I already tried physical/logical attribute implementation > > pretty long ago. Where are new ideas to solve the problems > > that the approach has ? > > Good question. I am suggesting more than just the drop column fix. It > could be used for smaller data files to reduce padding, fix for > inheritance problems with ADD COLUMN, and performance of moving > varlena's to the end of the row. > > Also, my idea was to have the physical/logical mapping happen closer to > the client, so the backend mostly only deals with physical. If the client has to bear the some part, isn't the invisible column approach much simpler ? I've put a pretty much time into DROP COLUMN feature but I am really disappointed to see the comments in this thread. What DROP COLUMN has brought me seems only a waste of time. Possibly I must have introduced either implementation forcibly. regards, Hiroshi Inoue
> If the client has to bear the some part, isn't the invisible > column approach much simpler ? > > I've put a pretty much time into DROP COLUMN feature but > I am really disappointed to see the comments in this thread. > What DROP COLUMN has brought me seems only a waste of time. I kind of agree with Hiroshi here. All I want to be able to do is drop columns from my tables, and reclaim the space. I've got all sorts of production tables with columns just sitting there doing nothing, awaiting the time that I can happily drop them. It seems to me that whatever we do will require some kind of client breakage. Chris
Hiroshi Inoue wrote: > If the client has to bear the some part, isn't the invisible > column approach much simpler ? > > I've put a pretty much time into DROP COLUMN feature but > I am really disappointed to see the comments in this thread. > What DROP COLUMN has brought me seems only a waste of time. > > Possibly I must have introduced either implementation forcibly. I understand. I personally think maybe we have been a little to picky about patches being accepted. Sometimes when something is not 100% perfect, we do nothing rather than accept the patch, and replace or improve it later. The DROP COLUMN approach you had clearly is one of them. Personally, now that we have relfilenode, I think we should implement drop of columns by just recreating the table without the column. The big problem with DROP COLUMN was that we couldn't decide on what to do, so we did nothing, which is probably worse than just choosing one and doing it. Our code is littered with my 80% solutions for LIKE optimization, optimizer statistics, BETWEEN, and lots of other solutions that have met a need and are now being replaced with better code. My code was not great, but I hadn't dont them, PostgreSQL would have had even more missing features than we do now. DROP COLUMN is clearly one where we missed getting something that works and would keep people happy. As far as my proposal, my idea was not to do it in the client, but rather to do it just before the data is sent from/to the client. Maybe that is a stupid idea. I never really researched it. My idea was more to make the physical/logical column numbers distinct so certain tricks could be performed. It wasn't for DROP COLUMN specifically, and in fact to do DROP COLUMN with my code, there would have to be more code similar to what you had where clients would see a column and have to skip it. I was focusing more on physical/logical to enable other features. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Christopher Kings-Lynne wrote: > > If the client has to bear the some part, isn't the invisible > > column approach much simpler ? > > > > I've put a pretty much time into DROP COLUMN feature but > > I am really disappointed to see the comments in this thread. > > What DROP COLUMN has brought me seems only a waste of time. > > I kind of agree with Hiroshi here. All I want to be able to do is drop > columns from my tables, and reclaim the space. I've got all sorts of > production tables with columns just sitting there doing nothing, awaiting > the time that I can happily drop them. It seems to me that whatever we do > will require some kind of client breakage. Actually, what we need to do to reclaim space is to enable table recreation without the column, now that we have relfilenode for file renaming. It isn't hard to do, but no one has focused on it. I want to focus on it, but have not had the time, obviously, and would be very excited to assist someone else. Hiroshi's fine idea of marking certain columns as unused would not have reclaimed the missing space, just as my idea of physical/logical column distinction would not reclaim the space either. Again, my physical/logical idea is more for fixing other problems and optimization, not DROP COLUMN. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Actually, what we need to do to reclaim space is to enable table > recreation without the column, now that we have relfilenode for file > renaming. It isn't hard to do, but no one has focused on it. I want to > focus on it, but have not had the time, obviously, and would be very > excited to assist someone else. I'm happy to help - depends if it's within my skill level or not tho. Most of the time the problem I have is finding where to make the changes, not actually making the changes themselves. So, count me in. > Hiroshi's fine idea of marking certain columns as unused would not have > reclaimed the missing space, just as my idea of physical/logical column > distinction would not reclaim the space either. Again, my > physical/logical idea is more for fixing other problems and > optimization, not DROP COLUMN. Question: Is it _possible_ to reclaim the space during a VACUUM FULL? I do not know enough about the file format to know this. What happens if the VACUUM is stopped halfway thru reclaiming a column in a table? Bruce: WRT modifying libpq to do the translation - won't this cause probs for JDBC and ODBC people? Chris
Christopher Kings-Lynne wrote: > > Actually, what we need to do to reclaim space is to enable table > > recreation without the column, now that we have relfilenode for file > > renaming. It isn't hard to do, but no one has focused on it. I want to > > focus on it, but have not had the time, obviously, and would be very > > excited to assist someone else. > > I'm happy to help - depends if it's within my skill level or not tho. Most > of the time the problem I have is finding where to make the changes, not > actually making the changes themselves. So, count me in. OK, let me mention that I have had great success with chat sessions with PostgreSQL developers. They can code and ask questions and I can answer quickly. Seems to be speeding things along for some people. I am:AIM bmomjianICQ 151255111Yahoo bmomjianMSN root@candle.pha.pa.us I am also on the PostgreSQL IRC channel. As far as where to start, I think the CLUSTER command would be a good start because it just reorders the existing table. Then DROP COLUMN can come out of that by removing the column during the copy, and removing mention of the column from pg_attribute, and of course renumbering the gap. > > Hiroshi's fine idea of marking certain columns as unused would not have > > reclaimed the missing space, just as my idea of physical/logical column > > distinction would not reclaim the space either. Again, my > > physical/logical idea is more for fixing other problems and > > optimization, not DROP COLUMN. > > Question: Is it _possible_ to reclaim the space during a VACUUM FULL? I do > not know enough about the file format to know this. What happens if the > VACUUM is stopped halfway thru reclaiming a column in a table? Not really. I moves only whole tuples, and only certain ones. > Bruce: WRT modifying libpq to do the translation - won't this cause probs > for JDBC and ODBC people? No, not in libpq, but rather in backend/libpq, the backend part of the connection. My idea is for the user to think things are in a different order in the row than they actually appear on disk. I haven't really researched it enough to understand its validity. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Christopher Kings-Lynne wrote: > > > If the client has to bear the some part, isn't the invisible > > column approach much simpler ? > > > > I've put a pretty much time into DROP COLUMN feature but > > I am really disappointed to see the comments in this thread. > > What DROP COLUMN has brought me seems only a waste of time. > > I kind of agree with Hiroshi here. All I want to be able to do is drop > columns from my tables, and reclaim the space. I've got all sorts of > production tables with columns just sitting there doing nothing, awaiting > the time that I can happily drop them. > It seems to me that whatever we do > will require some kind of client breakage. Physical/logical attnum approach was mainly to not break clients. I implemented it on trial but the implementation was hard to maintain unfortunately. It's pretty difficult to decide whether the number is physical or logical in many cases. regards, Hiroshi Inoue
Hiroshi Inoue wrote: > > It seems to me that whatever we do > > will require some kind of client breakage. > > Physical/logical attnum approach was mainly to not break > clients. I implemented it on trial but the implementation > was hard to maintain unfortunately. It's pretty difficult > to decide whether the number is physical or logical in > many cases. How many cases do we have that use logical numering? Hiroshi, I know you are the expert on this. I know 'SELECT *' uses it, but are their other places that need to know about the logical ordering of the columns? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > Hiroshi Inoue wrote: > > If the client has to bear the some part, isn't the invisible > > column approach much simpler ? > > > > I've put a pretty much time into DROP COLUMN feature but > > I am really disappointed to see the comments in this thread. > > What DROP COLUMN has brought me seems only a waste of time. > > > > Possibly I must have introduced either implementation forcibly. > > I understand. I personally think maybe we have been a little to picky > about patches being accepted. Sometimes when something is not 100% > perfect, we do nothing rather than accept the patch, and replace or > improve it later. The DROP COLUMN approach you had clearly is one of > them. I don't complain about the rejection of my patch. If it has an essential flaw we had better reject it. What I'm complaining is why it is OK now whereas there's nothing new. regards, Hiroshi Inoue
Hiroshi Inoue wrote: > Bruce Momjian wrote: > > > > Hiroshi Inoue wrote: > > > If the client has to bear the some part, isn't the invisible > > > column approach much simpler ? > > > > > > I've put a pretty much time into DROP COLUMN feature but > > > I am really disappointed to see the comments in this thread. > > > What DROP COLUMN has brought me seems only a waste of time. > > > > > > Possibly I must have introduced either implementation forcibly. > > > > I understand. I personally think maybe we have been a little to picky > > about patches being accepted. Sometimes when something is not 100% > > perfect, we do nothing rather than accept the patch, and replace or > > improve it later. The DROP COLUMN approach you had clearly is one of > > them. > > I don't complain about the rejection of my patch. > If it has an essential flaw we had better reject it. > What I'm complaining is why it is OK now whereas > there's nothing new. Sure, I understand. My physical/logical idea may have the same problems as your DROP COLUMN idea, and may be as rapidly rejected. I am just throwing it out for discussion. I am not sure I like it. :-) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> Actually, what we need to do to reclaim space is to enable table > recreation without the column, now that we have relfilenode for file > renaming. It isn't hard to do, but no one has focused on it. I want to > focus on it, but have not had the time, obviously, and would be very > excited to assist someone else. > > Hiroshi's fine idea of marking certain columns as unused would not have > reclaimed the missing space, just as my idea of physical/logical column > distinction would not reclaim the space either. Again, my > physical/logical idea is more for fixing other problems and > optimization, not DROP COLUMN. Hmmm. Personally, I think that a DROP COLUMN that cannot reclaim space is kinda useless - you may as well just use a view!!! So how would this occur?: 1. Lock target table for writing (allow reads) 2. Begin a table scan on target table, writing a new file with a particular filenode 3. Delete the attribute row from pg_attribute 4. Point the table in the catalog to the new filenode 5. Release locks 6. Commit transaction 7. Delete orhpan filenode i. Upon postmaster startup, remove any orphaned filenodes The real problem here is the fact that there are now missing attnos in pg_attribute. Either that's handled or we renumber the attnos - which is also quite hard? This, of course, suffers from the double size data problem - but I believe that it does not matter - we just need to document it. Interestingly enough, Oracle support ALTER TABLE foo SET UNUSED col; Which invalidates the attribute entry, and: ALTER TABLE foo DROP col CHECKPOINT 1000; Which actually reclaims the space. The optional CHECKPOINT [n] clause tells Oracle to do a checkpoint every [n] rows. "Checkpointing cuts down the amount of undo logs accumulated during the drop column operation to avoid running out of rollback segment space. However, if this statement is interrupted after a checkpoint has been applied, the table remains in an unusable state. While the table is unusable, the only operations allowed on it are DROP TABLE, TRUNCATE TABLE, and ALTER TABLE DROP COLUMNS CONTINUE (described below). " Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > The real problem here is the fact that there are now missing attnos in > pg_attribute. Either that's handled or we renumber the attnos - which is > also quite hard? Updating pg_attribute per se is not so hard --- just store new copies of all the rows for the table. However, propagating the changes into other places could be quite painful (I'm thinking of column numbers in stored constraints, rules, etc). It seems to me that reducing the column to NULLs already gets you the majority of the space savings. I don't think there is a case to be made that getting back that last bit is worth the pain involved, either in implementation effort or direct runtime costs (do you really want a DROP COLUMN to force an immediate rewrite of the whole table?) regards, tom lane
Christopher Kings-Lynne wrote: > > Actually, what we need to do to reclaim space is to enable table > > recreation without the column, now that we have relfilenode for file > > renaming. It isn't hard to do, but no one has focused on it. I want to > > focus on it, but have not had the time, obviously, and would be very > > excited to assist someone else. > > > > Hiroshi's fine idea of marking certain columns as unused would not have > > reclaimed the missing space, just as my idea of physical/logical column > > distinction would not reclaim the space either. Again, my > > physical/logical idea is more for fixing other problems and > > optimization, not DROP COLUMN. > > Hmmm. Personally, I think that a DROP COLUMN that cannot reclaim space is > kinda useless - you may as well just use a view!!! Yep, kind of a problem. It is a tradeoff between double diskspace/speed and removing column from disk. I guess that's why Oracle has both. > > So how would this occur?: > > 1. Lock target table for writing (allow reads) > 2. Begin a table scan on target table, writing > a new file with a particular filenode > 3. Delete the attribute row from pg_attribute > 4. Point the table in the catalog to the new filenode > 5. Release locks > 6. Commit transaction > 7. Delete orhpan filenode Yep, something like that. CLUSTER is a good start. DROP COLUMN just deals with the attno too. You would have to renumber them to fill the gap. > i. Upon postmaster startup, remove any orphaned filenodes Actually, we don't have a good solution for finding orphaned filenodes right now. I do have some code that tries to do this as part of VACUUM but it was not 100% perfect, so it was rejected. I am willing to open the discussion to see if a perfect solution can be found. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane wrote: > Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > > The real problem here is the fact that there are now missing attnos in > > pg_attribute. Either that's handled or we renumber the attnos - which is > > also quite hard? > > Updating pg_attribute per se is not so hard --- just store new copies of > all the rows for the table. However, propagating the changes into other > places could be quite painful (I'm thinking of column numbers in stored > constraints, rules, etc). > > It seems to me that reducing the column to NULLs already gets you the > majority of the space savings. I don't think there is a case to be made > that getting back that last bit is worth the pain involved, either in > implementation effort or direct runtime costs (do you really want a DROP > COLUMN to force an immediate rewrite of the whole table?) That is an excellent point about having to fix all the places that refer to attno. In fact, we have been moving away from attname references to attno references for a while, so it only gets worse. Tom is also correct that setting it to NULL removes the problem of disk space usage quite easily. That only leaves the problem of having gaps in the pg_attribute for that relation, and as I remember, that was the problem for Hiroshi's DROP COLUMN change, but at this point, after years of delay with no great solution on the horizon, we may as well just get this working. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane writes: >> Why shouldn't aggregate functions have entries in pg_proc? Then one >> search would cover both possibilities, and we could eliminate some >> duplicate code in the parser. > Furthermore, we could make the new function privileges apply to aggregates > as well. GRANT/REVOKE FUNCTION will now work on aggregate functions too (is there any value in making a variant syntax for aggregates?). However, I didn't implement enforcement of the EXECUTE privilege yet. I was slightly bemused to notice that your implementation of it for regular functions tests the privilege at plan startup but doesn't actually throw the error until the function is called. What's the point of that? Seems like we might as well throw the error in init_fcache and not bother with storing a boolean. regards, tom lane
Tom Lane writes: > I was slightly bemused to notice that your implementation of it for > regular functions tests the privilege at plan startup but doesn't > actually throw the error until the function is called. What's the > point of that? Seems like we might as well throw the error in > init_fcache and not bother with storing a boolean. Yeah, it's a bit funny. I wanted to keep the fcache code from doing anything not to do with caching, and I wanted to keep the permission check in the executor, like it is for tables. There were a couple of cases, which I have not fully explored yet, for which this seemed like a good idea, such as some functions being in the plan but not being executed, or the permission check being avoided for some functions (e.g., cast functions). -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane writes: >> I was slightly bemused to notice that your implementation of it for >> regular functions tests the privilege at plan startup but doesn't >> actually throw the error until the function is called. What's the >> point of that? Seems like we might as well throw the error in >> init_fcache and not bother with storing a boolean. > Yeah, it's a bit funny. I wanted to keep the fcache code from doing > anything not to do with caching, and I wanted to keep the permission check > in the executor, like it is for tables. Well, init_fcache is part of executor startup, so that seems reasonably parallel to the table case. I do not buy the idea that we shouldn't throw an error if the function happens not to be called --- that makes the behavior dependent on both planner choices and data conditions, which seems like a bad idea. For comparison, we *will* throw a permission error on tables even if the actual execution of the plan turns out never to read a single row from that table. (After a bit of code reading --- actually, at present init_fcache doesn't get called until first use of the function, so it's a distinction without a difference. But I have been thinking about revising this so that function caches are set up during plan initialization, which is why I'm questioning the point now.) > There were a couple of cases, which I have not fully explored yet, for > which this seemed like a good idea, such as some functions being in the > plan but not being executed, or the permission check being avoided for > some functions (e.g., cast functions). Cast functions? Not sure that I see the point of excluding them. What I'm inclined to do for aggregates is to check and throw the error (if any) during ExecInitAgg, ie, plan startup for the Agg plan node. I was just a tad startled to notice that the implementation wasn't parallel for plain functions ... regards, tom lane
Christopher Kings-Lynne wrote: > > Hmmm. Personally, I think that a DROP COLUMN that cannot reclaim space is > kinda useless - you may as well just use a view!!! > > So how would this occur?: > > 1. Lock target table for writing (allow reads) > 2. Begin a table scan on target table, writing > a new file with a particular filenode > 3. Delete the attribute row from pg_attribute > 4. Point the table in the catalog to the new filenode > 5. Release locks > 6. Commit transaction > 7. Delete orhpan filenode > > i. Upon postmaster startup, remove any orphaned filenodes > > The real problem here is the fact that there are now missing attnos in > pg_attribute. Either that's handled or we renumber the attnos - which is > also quite hard? The attnos should be renumbered and it's easy. But the above seems only 20% of the total implementation. If the attnos are renumbered, all objects which refer to the numbers must be invalidated or re-compiled ... For example the parameters of foreign key constraints triggers are consist of relname and colnames currently. There has been a proposal that change to use relid or column numbers instead. Certainly it makes RENAME happy but DROP COLUMN unhappy. If there's a foreign key a_rel/1/3 and the second column of the relation is dropped the parameter must be changed to be a_rel/1/2. If neither foreign key stuff nor DROP COLUMN take the other into account, the consistency is easily broken. regards, Hiroshi Inoue
Hiroshi Inoue wrote: > Christopher Kings-Lynne wrote: > > > > Hmmm. Personally, I think that a DROP COLUMN that cannot reclaim space is > > kinda useless - you may as well just use a view!!! > > > > So how would this occur?: > > > > 1. Lock target table for writing (allow reads) > > 2. Begin a table scan on target table, writing > > a new file with a particular filenode > > 3. Delete the attribute row from pg_attribute > > 4. Point the table in the catalog to the new filenode > > 5. Release locks > > 6. Commit transaction > > 7. Delete orhpan filenode > > > > i. Upon postmaster startup, remove any orphaned filenodes > > > > The real problem here is the fact that there are now missing attnos in > > pg_attribute. Either that's handled or we renumber the attnos - which is > > also quite hard? > > The attnos should be renumbered and it's easy. > But the above seems only 20% of the total implementation. > If the attnos are renumbered, all objects which refer to > the numbers must be invalidated or re-compiled ... > For example the parameters of foreign key constraints > triggers are consist of relname and colnames currently. > There has been a proposal that change to use relid or > column numbers instead. Certainly it makes RENAME happy > but DROP COLUMN unhappy. If there's a foreign key a_rel/1/3 > and the second column of the relation is dropped the > parameter must be changed to be a_rel/1/2. If neither > foreign key stuff nor DROP COLUMN take the other into > account, the consistency is easily broken. I think that is why Tom was suggesting making all the column values NULL and removing the pg_attribute row for the column. With a NULL value, it doesn't take up any room in the tuple, and with the pg_attribute column gone, no one will see that row. The only problem is the gap in attno numbering. How big a problem is that? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I think that is why Tom was suggesting making all the column values NULL > and removing the pg_attribute row for the column. That was not my suggestion. > With a NULL value, it > doesn't take up any room in the tuple, and with the pg_attribute column > gone, no one will see that row. The only problem is the gap in attno > numbering. How big a problem is that? You can't do it that way unless you're intending to rewrite all rows of the relation before committing the ALTER; which would be the worst of both worlds. The pg_attribute row *must* be retained to show the datatype of the former column, so that we can correctly skip over it in tuples where the column isn't yet nulled out. Hiroshi did this by renumbering the attnum; I propose leaving attnum alone and instead adding an attisdropped flag. That would avoid creating a gap in the column numbers, but either way is likely to affect some applications that inspect pg_attribute. regards, tom lane
Bruce Momjian wrote: > > Hiroshi Inoue wrote: > > Christopher Kings-Lynne wrote: > > > > I think that is why Tom was suggesting making all the column values NULL > and removing the pg_attribute row for the column. With a NULL value, it > doesn't take up any room in the tuple, and with the pg_attribute column > gone, no one will see that row. The only problem is the gap in attno > numbering. How big a problem is that? There's no problem with applications which don't inquire of system catalogs(pg_attribute). Unfortunately we have been very tolerant of users' access on system tables and there would be pretty many applications which inquire of pg_attribute. regards, Hiroshi Inoue
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: 12 April 2002 03:54 > To: Bruce Momjian > Cc: Hiroshi Inoue; Christopher Kings-Lynne; > pgsql-hackers@postgresql.org > Subject: Re: RFC: Restructuring pg_aggregate > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I think that is why Tom was suggesting making all the column values > > NULL and removing the pg_attribute row for the column. > > That was not my suggestion. > > > With a NULL value, it > > doesn't take up any room in the tuple, and with the pg_attribute > > column gone, no one will see that row. The only problem is > the gap in > > attno numbering. How big a problem is that? > > You can't do it that way unless you're intending to rewrite > all rows of the relation before committing the ALTER; which > would be the worst of both worlds. The pg_attribute row > *must* be retained to show the datatype of the former column, > so that we can correctly skip over it in tuples where the > column isn't yet nulled out. > > Hiroshi did this by renumbering the attnum; I propose leaving > attnum alone and instead adding an attisdropped flag. That > would avoid creating a gap in the column numbers, but either > way is likely to affect some applications that inspect pg_attribute. Applications like pgAdmin that inspect pg_attribute are being seriously hacked to incorporate schema support anyway for 7.3. Personnally I'd be glad to spend some time re-coding to allow for this, just to not have to answer the numerous 'how do I drop a column' emails I get reguarly. Regards, Dave.
> Updating pg_attribute per se is not so hard --- just store new copies of > all the rows for the table. However, propagating the changes into other > places could be quite painful (I'm thinking of column numbers in stored > constraints, rules, etc). > > It seems to me that reducing the column to NULLs already gets you the > majority of the space savings. I don't think there is a case to be made > that getting back that last bit is worth the pain involved, either in > implementation effort or direct runtime costs (do you really want a DROP > COLUMN to force an immediate rewrite of the whole table?) OK, sounds fair. However, is there a more aggressive way of reclaiming the space? The problem with updating all the rows to null for that column is that the on-disk size is doubled anyway, right? So, could a VACUUM FULL process do the nulling for us? Vacuum works outside of normal transaction constraints anyway...? Also, it seems to me that at some point we are forced to break client compatibility. Either we add attisdropped field to pg_attribute, or we use Hiroshi's (-1 * attnum - offset) idea. Both Tom and Hiroshi have good reasons for each of these - would it be possible for you guys to post with your reasons for and against both the techniques. I just want to get to an implementation specification we all agree on that can be written up and then the coding can proceed. Maybe we should add it to the 'Postgres Major Projects' page - and remove those old ones that have already been implemented. Chris
[ way past time to change the title of this thread ] "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > OK, sounds fair. However, is there a more aggressive way of reclaiming the > space? The problem with updating all the rows to null for that column is > that the on-disk size is doubled anyway, right? So, could a VACUUM FULL > process do the nulling for us? Vacuum works outside of normal transaction > constraints anyway...? No, VACUUM has the same transactional constraints as everyone else (unless you'd like a crash during VACUUM to trash your table...) I do not think that we necessarily need to provide a special mechanism for this at all. The docs for DROP COLUMN could simply explain that the DROP itself doesn't reclaim the space, but that the space will be reclaimed over time as extant rows are updated or deleted. If you want to hurry the process along you could doUPDATE table SET othercol = othercolVACUUM FULL to force all the rows to be updated and then reclaim space. But given the peak-space-is-twice-as-much behavior, this is not obviously a win. I'd sure object to an implementation that *forced* that approach on me, whether during DROP itself or the next VACUUM. > Also, it seems to me that at some point we are forced to break client > compatibility. Either we add attisdropped field to pg_attribute, or we use > Hiroshi's (-1 * attnum - offset) idea. Both Tom and Hiroshi have good > reasons for each of these - would it be possible for you guys to post with > your reasons for and against both the techniques. Er, didn't we do that already? regards, tom lane
On Sat, 2002-04-13 at 17:29, Tom Lane wrote: > [ way past time to change the title of this thread ] > > "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > > OK, sounds fair. However, is there a more aggressive way of reclaiming the > > space? The problem with updating all the rows to null for that column is > > that the on-disk size is doubled anyway, right? So, could a VACUUM FULL > > process do the nulling for us? Vacuum works outside of normal transaction > > constraints anyway...? > > No, VACUUM has the same transactional constraints as everyone else > (unless you'd like a crash during VACUUM to trash your table...) But can't it do the SET TO NULL thing if it knows that the transaction that dropped the column has committed. This could probably even be done in the light version of vacuum with a special flag (VACUUM RECLAIM). Of course running this this makes sense only if the dropped column had some significant amount of data . > I do not think that we necessarily need to provide a special mechanism > for this at all. The docs for DROP COLUMN could simply explain that > the DROP itself doesn't reclaim the space, but that the space will be > reclaimed over time as extant rows are updated or deleted. If you want > to hurry the process along you could do > UPDATE table SET othercol = othercol > VACUUM FULL If only we could do it in namageable chunks: FOR i IN 0 TO (size(table)/chunk) DO UPDATE table SET othercol = othercol OFFSET i*chunk LIMIT chunk VACUUM FULL; END FOR; or even better - "VACUUM FULL OFFSET i*chunk LIMIT chunk" and then make chunk == 1 :) -------------- Hannu
Hannu Krosing <hannu@tm.ee> writes: >> No, VACUUM has the same transactional constraints as everyone else >> (unless you'd like a crash during VACUUM to trash your table...) > But can't it do the SET TO NULL thing if it knows that the transaction > that dropped the column has committed. Hmm, you're thinking of allowing VACUUM to overwrite tuples in-place? Strikes me as unsafe, but I'm not really sure. In any case it's not that easy. If the column is wide enough that reclaiming its space is actually worth doing, then presumably most of its entries are just TOAST links, and what has to be done is not just rewrite the main tuple but mark the TOAST rows deleted. This is not something that VACUUM does now; I'd be rather concerned about the locking implications (especially for lightweight VACUUM). regards, tom lane
> No, VACUUM has the same transactional constraints as everyone else > (unless you'd like a crash during VACUUM to trash your table...) Seriously, you can run VACUUM in a transaction and rollback the movement of a tuple on disk? What do you mean by same transactional constraints? Chris
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: >> No, VACUUM has the same transactional constraints as everyone else >> (unless you'd like a crash during VACUUM to trash your table...) > Seriously, you can run VACUUM in a transaction and rollback the movement of > a tuple on disk? What do you mean by same transactional constraints? In VACUUM FULL, tuples moved to compact the table aren't good until you commit. In this hypothetical column-drop-implementing VACUUM, I think there'd need to be some similar rule --- otherwise it's not clear what happens to TOASTED data if you crash partway through. (In particular, if we tried overwriting main tuples in place as Hannu was suggesting, we'd need a way of being certain the deletion of the corresponding TOAST rows occurs *before* we overwrite the only reference to them.) regards, tom lane
Christopher Kings-Lynne wrote: > > Also, it seems to me that at some point we are forced to break client > compatibility. It's not a users' consensus at all. I'm suspicious if DROP COLUMN is such a significant feature to break client compatibility at our ease. > Either we add attisdropped field to pg_attribute, or we use > Hiroshi's (-1 * attnum - offset) idea. Both Tom and Hiroshi have good > reasons for each of these - would it be possible for you guys to post with > your reasons for and against both the techniques. I don't object to adding attisdropped field. What I meant to say is that the differene is very small. regards, Hiroshi Inoue
Thread added to TODO.detail/drop. --------------------------------------------------------------------------- Christopher Kings-Lynne wrote: > > Actually, what we need to do to reclaim space is to enable table > > recreation without the column, now that we have relfilenode for file > > renaming. It isn't hard to do, but no one has focused on it. I want to > > focus on it, but have not had the time, obviously, and would be very > > excited to assist someone else. > > > > Hiroshi's fine idea of marking certain columns as unused would not have > > reclaimed the missing space, just as my idea of physical/logical column > > distinction would not reclaim the space either. Again, my > > physical/logical idea is more for fixing other problems and > > optimization, not DROP COLUMN. > > Hmmm. Personally, I think that a DROP COLUMN that cannot reclaim space is > kinda useless - you may as well just use a view!!! > > So how would this occur?: > > 1. Lock target table for writing (allow reads) > 2. Begin a table scan on target table, writing > a new file with a particular filenode > 3. Delete the attribute row from pg_attribute > 4. Point the table in the catalog to the new filenode > 5. Release locks > 6. Commit transaction > 7. Delete orhpan filenode > > i. Upon postmaster startup, remove any orphaned filenodes > > The real problem here is the fact that there are now missing attnos in > pg_attribute. Either that's handled or we renumber the attnos - which is > also quite hard? > > This, of course, suffers from the double size data problem - but I believe > that it does not matter - we just need to document it. > > Interestingly enough, Oracle support > > ALTER TABLE foo SET UNUSED col; > > Which invalidates the attribute entry, and: > > ALTER TABLE foo DROP col CHECKPOINT 1000; > > Which actually reclaims the space. The optional CHECKPOINT [n] clause > tells Oracle to do a checkpoint every [n] rows. > > "Checkpointing cuts down the amount of undo logs accumulated during the > drop column operation to avoid running out of rollback segment space. > However, if this statement is interrupted after a checkpoint has been > applied, the table remains in an unusable state. While the table is > unusable, the only operations allowed on it are DROP TABLE, TRUNCATE > TABLE, and ALTER TABLE DROP COLUMNS CONTINUE (described below). " > > Chris > > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026