Thread: psql and Control-C
Some people have indicated that they don't like how psql currently handles Control-C if no query is in progress. I consider the behaviour of the shells desirable but, quite frankly, I don't know how to do it.

For some reason a readline'd session always wants me to press one more key after Control-C before getting back to a clean prompt. A much bigger problem is that if I don't use/have readline then I don't see a way to preempt the fgets() call.

So unless someone has a hint or wants to look at it, I could offer ignoring the signal altogether in interactive mode, and perhaps make it stop scripts in the other case. (Leaving the query cancelling as is, of course.)

Actually, shouldn't a Ctrl-C in a script cancel the query *and* stop the script at all times?

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
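To make the fgets() problem concrete, here is a minimal sketch of the non-readline input path (illustrative only, not psql source). With a plain SIGINT handler installed, the blocked read() underneath fgets() is either restarted (SA_RESTART, so the prompt never comes back on its own) or fails with EINTR, in which case fgets() returns NULL and is indistinguishable here from end-of-file (Ctrl-D):

    #include <stdio.h>
    #include <signal.h>

    static void
    handle_sigint(int signo)
    {
        (void) signo;
        /* nothing useful we can do here to get fgets() back to a prompt */
    }

    int
    main(void)
    {
        char buf[1024];

        signal(SIGINT, handle_sigint);
        for (;;)
        {
            fputs("=> ", stdout);
            fflush(stdout);
            if (fgets(buf, sizeof(buf), stdin) == NULL)
                break;          /* EOF -- or an interrupted read() */
            /* ... append the line to the query buffer ... */
        }
        return 0;
    }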
> Some people have indicated that they don't like how psql currently handles
> Control-C if no query is in progress. I consider the behaviour of the
> shells desirable but, quite frankly, I don't know how to do it.
>
> For some reason a readline'd session always wants me to press one more key
> after Control-C before getting back to a clean prompt. A much bigger
> problem is that if I don't use/have readline then I don't see a way to
> preempt the fgets() call.
>
> So unless someone has a hint or wants to look at it, I could offer
> ignoring the signal altogether in interactive mode, and perhaps make it
> stop scripts in the other case. (Leaving the query cancelling as is, of
> course.)
>
> Actually, shouldn't a Ctrl-C in a script cancel the query *and* stop the
> script at all times?

Seems we can just ignore ^C if a query is not being run. Is that OK with everyone? Looks easy to do.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Peter Eisentraut wrote:
>
> Some people have indicated that they don't like how psql currently handles
> Control-C if no query is in progress. I consider the behaviour of the
> shells desirable but, quite frankly, I don't know how to do it.

The typical way to do this sort of thing is to longjmp back to the main loop. And I think if you look at sig.c in bash, this is probably what they are doing.

> Actually, shouldn't a Ctrl-C in a script cancel the query *and* stop the
> script at all times?

Yes.
> Seems we can just ignore ^C if a query is not being run. Is that OK
> with everyone? Looks easy to do.

It would be a trap for new users (some old ones too) who may not know how to escape. longjmp should be easy too, if it works.
* Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> [000217 16:20] wrote:
> Peter Eisentraut wrote:
> >
> > Some people have indicated that they don't like how psql currently handles
> > Control-C if no query is in progress. I consider the behaviour of the
> > shells desirable but, quite frankly, I don't know how to do it.
>
> The typical way to do this sort of thing is to longjmp back to the main
> loop. And I think if you look at sig.c in bash, this is probably what
> they are doing.
>
> > Actually, shouldn't a Ctrl-C in a script cancel the query *and* stop the
> > script at all times?
>
> Yes.

Whoa whoa... It's a bit more complicated than you think, there's a lot of state that gets put into libpq. I guess the simplest way would be to do so and also cancel the transaction, but a simple longjmp won't work reliably and you'd also have to take very careful steps to make sure you handle everything _just right_ from a signal context.

I'd rather have the inconvenience of psql exiting than a not entirely thought out mechanism for doing this properly potentially having psql run amok on my database. :)

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
> > Seems we can just ignore ^C if a query is not being run. Is that OK
> > with everyone? Looks easy to do.
>
> It would be a trap for new users (some old ones too) who may not know
> how to escape. longjmp should be easy too, if it works.

If they don't know ^D exits, they really are going to have trouble with Unix.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On 2000-02-18, Chris Bitmead mentioned:

> The typical way to do this sort of thing is to longjmp back to the main
> loop. And I think if you look at sig.c in bash, this is probably what
> they are doing.

Don't wanna look at that GPL'd code ... ;)

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
Bruce Momjian wrote:
>
> > > Seems we can just ignore ^C if a query is not being run. Is that OK
> > > with everyone? Looks easy to do.
> >
> > It would be a trap for new users (some old ones too) who may not know
> > how to escape. longjmp should be easy too, if it works.
>
> If they don't know ^D exits, they really are going to have trouble with
> Unix.

I mean escape from a half-typed in query, not escape from psql altogether.
Peter Eisentraut wrote:
>
> On 2000-02-18, Chris Bitmead mentioned:
>
> > The typical way to do this sort of thing is to longjmp back to the main
> > loop. And I think if you look at sig.c in bash, this is probably what
> > they are doing.
>
> Don't wanna look at that GPL'd code ... ;)

If you don't know how to interact with readline, I think you're gonna have to look at some GPL code ;)
Alfred Perlstein wrote:

> Whoa whoa... It's a bit more complicated than you think, there's a lot
> of state that gets put into libpq,

I don't think this has anything to do with libpq. This has got to do with psql's reading of commands _before_ they get shoved into libpq. As such it shouldn't be that dangerous.

> I guess the simplest way would be
> to do so and also cancel the transaction, but a simple longjmp won't
> work reliably and you'd also have to take very careful steps to make
> sure you handle everything _just right_ from a signal context.
>
> I'd rather have the inconvenience of psql exiting than a not entirely
> thought out mechanism for doing this properly potentially having psql
> run amok on my database. :)
>
> --
> -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
* Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> [000217 17:56] wrote:
> Peter Eisentraut wrote:
> >
> > On 2000-02-18, Chris Bitmead mentioned:
> >
> > > The typical way to do this sort of thing is to longjmp back to the main
> > > loop. And I think if you look at sig.c in bash, this is probably what
> > > they are doing.
> >
> > Don't wanna look at that GPL'd code ... ;)
>
> If you don't know how to interact with readline, I think you're gonna
> have to look at some GPL code ;)

Actually, FreeBSD(*) has 'libedit' which is pretty nice, it could be made to work under other systems and has a nice unencumbered license. :)

-Alfred

(*)
 * Copyright (c) 1992, 1993
 *      The Regents of the University of California.  All rights reserved.
 *
 * This code is derived from software contributed to Berkeley by
 * Christos Zoulas of Cornell University.
Chris Bitmead <chrisb@nimrod.itg.telstra.com.au> writes:
> Alfred Perlstein wrote:
>> Whoa whoa... It's a bit more complicated than you think, there's a lot
>> of state that gets put into libpq,

> I don't think this has anything to do with libpq. This has got to do
> with psql's reading of commands _before_ they get shoved into
> libpq. As such it shouldn't be that dangerous.

Chris is right that this is not a libpq issue. psql would be set up so that the signal-catching routine either issues a cancel request (if a query is in progress) or attempts a longjmp (if not). If properly implemented, there is zero chance of screwing up libpq.

However, there is some chance of screwing up libreadline --- I don't know enough about its innards to know if it can survive losing control at a random point. If we can confine the region where longjmp will be attempted to just the point where the program is blocked waiting for user input, it'd probably be pretty safe.

Something I've noticed that might or might not be related to this issue is that if psql exits due to backend crash, it fails to save the last few lines of command history into the history file. Not closing down libreadline, maybe?

			regards, tom lane
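Sketched in C, that dispatch might look like the following. Only PQrequestCancel() is real libpq API (it was designed to be callable from a signal handler); the connection variable, the flags, and their maintenance by the mainline code are assumptions for illustration, not psql source:

    #include <setjmp.h>
    #include <signal.h>
    #include <libpq-fe.h>

    extern PGconn *conn;                        /* assumed: the session's connection */

    volatile sig_atomic_t query_in_progress = 0;    /* set around PQexec() */
    volatile sig_atomic_t jump_armed = 0;           /* set only while reading input */
    sigjmp_buf main_loop;

    static void
    handle_sigint(int signo)
    {
        (void) signo;
        if (query_in_progress)
            PQrequestCancel(conn);          /* ask the backend to cancel the query */
        else if (jump_armed)
            siglongjmp(main_loop, 1);       /* abandon the half-typed input line */
        /* otherwise ignore: we are somewhere it isn't safe to interrupt */
    }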
Alfred Perlstein wrote:
>
> Actually, FreeBSD(*) has 'libedit' which is pretty nice, it could
> be made to work under other systems and has a nice unencumbered
> license. :)

As an option - fine, but most things these days use readline, and I would want my ~/.inputrc to work with all apps the same way.
Tom Lane wrote:

> Something I've noticed that might or might not be related to this
> issue is that if psql exits due to backend crash, it fails to save the
> last few lines of command history into the history file. Not closing
> down libreadline, maybe?

This is actually the gnu history library. Currently psql issues a save command when you exit or when you enter \s. I'm not sure why gnu history appears to require you to write the whole history at once rather than appending to a file but anyway...

A better way of doing it might be to have psql's finishInput() passed to atexit(), just to make sure it's always called, and have signal handlers in place for *every* signal whose handlers call exit(), so finishing up is done neatly. It might even be worthwhile to write the history after every command.

BTW, is it necessary to print "\q" when you hit ctrl-d ? Seems just a little tacky to me.
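One possible wiring of that suggestion (a sketch: write_history(), read_history(), and using_history() are the GNU history library calls, while the file path and setup function here are placeholders):

    #include <stdlib.h>
    #include <readline/history.h>

    static const char *history_file = "/home/user/.psql_history";  /* placeholder */

    static void
    finishInput(void)
    {
        write_history((char *) history_file);   /* save the whole history at once */
    }

    static void
    setup_history(void)
    {
        using_history();
        read_history((char *) history_file);
        atexit(finishInput);    /* runs on any exit() path, not just \q */
    }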
On Fri, 18 Feb 2000, Chris Bitmead wrote:

> BTW, is it necessary to print "\q" when you hit ctrl-d ? Seems just a
> little tacky to me.

That's similar to what bash and tcsh do. If you don't like it, I'm not married to it.

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
On Fri, 18 Feb 2000, Tom Lane wrote:

> However, there is some chance of screwing up libreadline --- I don't
> know enough about its innards to know if it can survive losing
> control at a random point. If we can confine the region where longjmp
> will be attempted to just the point where the program is blocked
> waiting for user input, it'd probably be pretty safe.

Readline has an official way to preempt it, namely setting rl_done to non-zero. I can take a look how it copes with a longjmp from a signal handler, but I wouldn't set my hopes too high.

> Something I've noticed that might or might not be related to this
> issue is that if psql exits due to backend crash, it fails to save the
> last few lines of command history into the history file. Not closing
> down libreadline, maybe?

As someone else pointed out, I might as well move write_history() into an atexit hook.

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
On Thu, 17 Feb 2000, Alfred Perlstein wrote:

> Actually, FreeBSD(*) has 'libedit' which is pretty nice, it could
> be made to work under other systems and has a nice unencumbered
> license. :)

Someone else mentioned this to me when I started on this. However, there's not even a common version among the *BSD's, let alone ports to other platforms, so I don't see this happening anytime soon.

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
On Fri, 18 Feb 2000, Chris Bitmead wrote:

> If you don't know how to interact with readline, I think you're gonna
> have to look at some GPL code ;)

There's a difference between copying ideas and code from bash's sig.c (not good) and snooping around in code to see how to interface with it (why not).

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
On Fri, 18 Feb 2000, Chris Bitmead wrote:

> I mean escape from a half-typed in query, not escape from psql
> altogether.

If you don't read the documentation before running a program, I can't help you. Also, the welcome message points out both \? and \q.

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
Peter Eisentraut <e99re41@DoCS.UU.SE> writes:
> On Fri, 18 Feb 2000, Tom Lane wrote:
>> However, there is some chance of screwing up libreadline --- I don't
>> know enough about its innards to know if it can survive losing
>> control at a random point. If we can confine the region where longjmp
>> will be attempted to just the point where the program is blocked
>> waiting for user input, it'd probably be pretty safe.

> Readline has an official way to preempt it, namely setting rl_done to
> non-zero. I can take a look how it copes with a longjmp from a signal
> handler, but I wouldn't set my hopes too high.

Oh? Maybe we don't *need* a longjmp: maybe the signal handler just needs to do either send-a-cancel or set-rl_done depending on the current state of a flag that's set by the main line code.

			regards, tom lane
On 2000-02-18, Tom Lane mentioned:

> > Readline has an official way to preempt it, namely setting rl_done to
> > non-zero. I can take a look how it copes with a longjmp from a signal
> > handler, but I wouldn't set my hopes too high.
>
> Oh? Maybe we don't *need* a longjmp: maybe the signal handler just
> needs to do either send-a-cancel or set-rl_done depending on the
> current state of a flag that's set by the main line code.

I tried that but it doesn't work. On further thought I believe that the purpose of rl_done is for readline extensions, so that, for example, a semicolon handler can scan the current line and then immediately return as if you had pressed enter. When idle, readline hangs on read(), so setting some variable isn't going to interrupt that.

The longjmp seems to work but I need to test it more. I'm concerned how it will work across platforms, esp. Windows (being a POSIX thing). Should there be a configure test or can I assume it on every non-WIN32 system?

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
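A sketch of confining the jump target to the input call, matching the handler sketched above (sigsetjmp/siglongjmp are the POSIX variants that also restore the signal mask; the names are illustrative, not psql source):

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <readline/readline.h>

    extern sigjmp_buf main_loop;                /* target of the SIGINT handler */
    extern volatile sig_atomic_t jump_armed;

    /* The jump is only "armed" while we are blocked in readline(), so the
     * handler never longjmps out of arbitrary code. */
    static char *
    get_input_line(const char *prompt)
    {
        char *line;

        if (sigsetjmp(main_loop, 1) != 0)
            putchar('\n');                      /* interrupted: start a fresh line */

        jump_armed = 1;
        line = readline(prompt);
        jump_armed = 0;

        return line;                            /* NULL on EOF (Ctrl-D) */
    }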
Peter Eisentraut <peter_e@gmx.net> writes:
> The longjmp seems to work but I need to test it more. I'm concerned how it
> will work across platforms, esp. Windows (being a POSIX thing). Should
> there be a configure test or can I assume it on every non-WIN32 system?

longjmp predates POSIX by an eon or two. I doubt you need to worry about it on Unix platforms. (Since we utterly rely on it in the backend, Postgres wouldn't ever work on a platform without it anyway.) Less sure about the situation on Windows or Mac, but configure isn't going to help for those anyway.

			regards, tom lane
Peter Eisentraut wrote:
>
> On Fri, 18 Feb 2000, Chris Bitmead wrote:
>
> > I mean escape from a half-typed in query, not escape from psql
> > altogether.
>
> If you don't read the documentation before running a program, I can't help
> you. Also, the welcome message points out both \? and \q.

I think it takes quite a while to realise that you can intersperse a backslash command anywhere, like..

	select * from \q

which is a bit different to shell where

	while :
	exit

will not have the desired effect.
Marc called me today to discuss ALTER TABLE DROP COLUMN options. Our new idea is to do the ALTER TABLE DROP COLUMN in place in the existing table, rather than make a new one and try and preserve all the table attributes.

You can exclusively lock the table, then do a heap_getnext() scan over the entire table, remove the dropped column, do a heap_insert(), then a heap_delete() on the current tuple, making sure to skip over the tuples inserted by the current transaction. When completed, remove the column from pg_attribute, mark the transaction as committed (if desired), and run vacuum over the table to remove the deleted rows.

Seems this would be a very clean implementation for 7.1. It also would be roll-backable in cases where the operation failed half-way during the process.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> You can exclusively lock the table, then do a heap_getnext() scan over
> the entire table, remove the dropped column, do a heap_insert(), then a
> heap_delete() on the current tuple, making sure to skip over the tuples
> inserted by the current transaction. When completed, remove the column
> from pg_attribute, mark the transaction as committed (if desired), and
> run vacuum over the table to remove the deleted rows.

Hmm, that would work --- the new tuples commit at the same instant that the schema updates commit, so it should be correct. You have the 2x disk usage problem, but there's no way around that without losing rollback ability.

A potentially tricky bit will be persuading the tuple-reading and tuple-writing subroutines to pay attention to different versions of the tuple structure for the same table. I haven't looked to see if this will be difficult or not. If you can pass the TupleDesc explicitly then it shouldn't be a problem.

I'd suggest that the cleanup vacuum *not* be an automatic part of the operation; just recommend that people do it ASAP after dropping a column. Consider needing to drop several columns...

			regards, tom lane
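In outline, the rewrite loop might look as below. This is a sketch only: the simple_* declarations are simplified stand-ins for the heap access method calls named above (whose real signatures take snapshots, scan keys, and command IDs), and the two helpers marked hypothetical do not exist in the backend:

    #include <stddef.h>

    typedef struct RelationData *Relation;          /* stand-in declarations */
    typedef struct TupleDescData *TupleDesc;
    typedef struct HeapScanDescData *HeapScanDesc;
    typedef struct HeapTupleData *HeapTuple;

    HeapScanDesc simple_beginscan(Relation rel);        /* wraps heap_beginscan() */
    HeapTuple    simple_getnext(HeapScanDesc scan);     /* wraps heap_getnext() */
    void         simple_endscan(HeapScanDesc scan);     /* wraps heap_endscan() */
    void         simple_insert(Relation rel, HeapTuple tup);    /* heap_insert() */
    void         simple_delete(Relation rel, HeapTuple tup);    /* heap_delete() */
    int          inserted_by_current_xact(HeapTuple tup);       /* hypothetical */
    HeapTuple    tuple_without_attr(HeapTuple tup, TupleDesc old_desc,
                                    TupleDesc new_desc, int attnum);  /* hypothetical */

    void
    rewrite_without_column(Relation rel, TupleDesc old_desc,
                           TupleDesc new_desc, int dropped_attnum)
    {
        HeapScanDesc scan = simple_beginscan(rel);
        HeapTuple    tup;

        while ((tup = simple_getnext(scan)) != NULL)
        {
            /* skip rows this very transaction inserted during the scan */
            if (inserted_by_current_xact(tup))
                continue;

            /* re-form the tuple against the new descriptor (passed
             * explicitly, as suggested above), minus the dropped column */
            simple_insert(rel, tuple_without_attr(tup, old_desc,
                                                  new_desc, dropped_attnum));
            simple_delete(rel, tup);
        }
        simple_endscan(scan);

        /* afterwards: delete the column's pg_attribute row, commit, and
         * VACUUM at leisure to reclaim the dead tuple versions */
    }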
Tom Lane wrote:
>
> Hmm, that would work --- the new tuples commit at the same instant that
> the schema updates commit, so it should be correct. You have the 2x
> disk usage problem, but there's no way around that without losing
> rollback ability.
>
> A potentially tricky bit will be persuading the tuple-reading and tuple-
> writing subroutines to pay attention to different versions of the tuple
> structure for the same table. I haven't looked to see if this will be
> difficult or not. If you can pass the TupleDesc explicitly then it
> shouldn't be a problem.
>
> I'd suggest that the cleanup vacuum *not* be an automatic part of
> the operation; just recommend that people do it ASAP after dropping
> a column. Consider needing to drop several columns...

Does SQL92 syntax allow dropping several columns, i.e.

	ALTER TABLE mytable DROP COLUMN col1,col5,col6;

If it does, it would be very desirable to implement it to avoid the need for vacuum between each DROP in order to have _only_ 2X disk usage.

-----------
Hannu
At 08:06 PM 2/27/00 +0200, Hannu Krosing wrote:
>Does SQL92 syntax allow dropping several columns, i.e.
>
>ALTER TABLE mytable DROP COLUMN col1,col5,col6;

My reading of the syntax says no, it is not allowed.

>If it does, it would be very desirable to implement it to avoid the need
>for vacuum between each DROP in order to have _only_ 2X disk usage.

However, implementing useful extensions to the standard in an upward-compatible way doesn't bother me. I'm not fond of language implementations that are full of gratuitous extensions, but when extensions address real shortcomings in a standard or intersect with a particular implementation in a useful way, then it makes sense to add them.

In this case, you're asking for an extension that's useful because Postgres doesn't reclaim storage when a tuple's deleted, but only when the table's vacuumed. Seems fair enough. Whether or not it would be hard to implement is another matter...

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
Bruce Momjian writes:

> You can exclusively lock the table, then do a heap_getnext() scan over
> the entire table, remove the dropped column, do a heap_insert(), then a
> heap_delete() on the current tuple,

Wow, that almost seems too easy to be true. I never thought that having tuples of different structures in the table at the same time would be possible. If so then I don't see a reason why this would be too hard to do.

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
> Bruce Momjian writes:
>
> > You can exclusively lock the table, then do a heap_getnext() scan over
> > the entire table, remove the dropped column, do a heap_insert(), then a
> > heap_delete() on the current tuple,
>
> Wow, that almost seems too easy to be true. I never thought that having
> tuples of different structures in the table at the same time would be
> possible. If so then I don't see a reason why this would be too hard to
> do.

If the transaction is not committed, I don't think anything actually reads the tuple columns, so you are safe.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian
>
> > Bruce Momjian writes:
> >
> > > You can exclusively lock the table, then do a heap_getnext() scan over
> > > the entire table, remove the dropped column, do a heap_insert(), then a
> > > heap_delete() on the current tuple,
> >
> > Wow, that almost seems too easy to be true. I never thought that having
> > tuples of different structures in the table at the same time would be
> > possible. If so then I don't see a reason why this would be too hard to
> > do.
>
> If the transaction is not committed, I don't think anything actually
> reads the tuple columns, so you are safe.

Hmm, tuples of multiple versions in a table? This is neither clean nor easy for me. There's no such stuff which takes the case into account, AFAIK.

Seems no one but me objects to it. I'm tired of this issue and it's painful for me to continue discussion further in my poor English. I may be able to provide another implementation on trial and it may be easier than only objecting to your proposal. Is it OK?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > Hmm,tuples of multiple version in a table ? > This is neither clean nor easy for me. I'm worried about it too. I think it could maybe be made to work, but it seems fragile. > I may be able to provide another implementation on trial and it > may be easier than only objecting to your proposal. If you have a better idea, let's hear it! regards, tom lane
> > > Wow, that almost seems too easy to be true. I never thought that having
> > > tuples of different structures in the table at the same time would be
> > > possible. If so then I don't see a reason why this would be too hard to
> > > do.
> >
> > If the transaction is not committed, I don't think anything actually
> > reads the tuple columns, so you are safe.
>
> Hmm, tuples of multiple versions in a table?
> This is neither clean nor easy for me.
> There's no such stuff which takes the case into account, AFAIK.
>
> Seems no one but me objects to it. I'm tired of this issue and it's
> painful for me to continue discussion further in my poor English.
> I may be able to provide another implementation on trial and it
> may be easier than only objecting to your proposal.
> Is it OK?

Sure, whatever you want. No one is going to start coding anything for a while. Seemed like a clean solution with no rename() problems.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > Hmm, tuples of multiple versions in a table?
> > This is neither clean nor easy for me.
>
> I'm worried about it too. I think it could maybe be made to work,
> but it seems fragile.
>
> > I may be able to provide another implementation on trial and it
> > may be easier than only objecting to your proposal.
>
> If you have a better idea, let's hear it!

I don't want a final implementation this time. What I want is to provide a quick hack for both others and me to judge whether this direction is good or not.

My idea is essentially an invisible column implementation. DROP COLUMN would change the target pg_attribute tuple as follows.

	attnum   -> an offset - attnum
	atttypid -> 0

We would be able to see where to change by tracking errors/crashes caused by this change.

I would also change attname to '*already dropped %d' for example to avoid duplicate attname.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
On Mon, 28 Feb 2000, Hiroshi Inoue wrote:

> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> >
> > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > > Hmm, tuples of multiple versions in a table?
> > > This is neither clean nor easy for me.
> >
> > I'm worried about it too. I think it could maybe be made to work,
> > but it seems fragile.
> >
> > > I may be able to provide another implementation on trial and it
> > > may be easier than only objecting to your proposal.
> >
> > If you have a better idea, let's hear it!
>
> I don't want a final implementation this time.
> What I want is to provide a quick hack for both others and me
> to judge whether this direction is good or not.
>
> My idea is essentially an invisible column implementation.
> DROP COLUMN would change the target pg_attribute tuple
> as follows.
>
> 	attnum   -> an offset - attnum
> 	atttypid -> 0
>
> We would be able to see where to change by tracking errors/
> crashes caused by this change.
>
> I would also change attname to '*already dropped %d' for
> example to avoid duplicate attname.

Okay, just curious here, but ... what you are proposing *sounds* to me like half-way to what started this thread. (*Please* correct me if I'm wrong) ...

Essentially, in your proposal, when you drop a column, all subsequent tuples inserted/updated would have ... that one column missing? So, instead of doing a massive sweep through the table and removing that column, only do it when an insert/update happens?

Basically, eliminate the requirement to re-write every tuple, only those that have activity?
> > I would also change attname to '*already dropped %d' for
> > example to avoid duplicate attname.
>
> Okay, just curious here, but ... what you are proposing *sounds* to me
> like half-way to what started this thread. (*Please* correct me if I'm
> wrong) ...
>
> Essentially, in your proposal, when you drop a column, all subsequent
> tuples inserted/updated would have ... that one column missing? So,
> instead of doing a massive sweep through the table and removing that
> column, only do it when an insert/update happens?
>
> Basically, eliminate the requirement to re-write every tuple, only those
> that have activity?

And I think the problem was that there was too much code to modify to allow this.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> -----Original Message-----
> From: The Hermit Hacker [mailto:scrappy@hub.org]
>
> On Mon, 28 Feb 2000, Hiroshi Inoue wrote:
>
> > > -----Original Message-----
> > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> > >
> > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > > > Hmm, tuples of multiple versions in a table?
> > > > This is neither clean nor easy for me.
> > >
> > > I'm worried about it too. I think it could maybe be made to work,
> > > but it seems fragile.
> > >
> > > > I may be able to provide another implementation on trial and it
> > > > may be easier than only objecting to your proposal.
> > >
> > > If you have a better idea, let's hear it!
>
> [snip]
>
> Okay, just curious here, but ... what you are proposing *sounds* to me
> like half-way to what started this thread. (*Please* correct me if I'm
> wrong) ...

My proposal is essentially the same as what I proposed once in this thread.

I don't think the DROP COLUMN feature is very important. The DROP/ADD CONSTRAINT feature seems much more important. Why do you want a heavy implementation like vacuum after 2x disk usage for this feature?

My implementation won't touch the target table at all and would never remove dropped columns practically. It would only make them invisible and NULL would be set for newly inserted/updated columns. If you want a really clean table after a DROP COLUMN command, my proposal is useless.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
At 11:40 PM 2/27/00 -0400, The Hermit Hacker wrote:
>Okay, just curious here, but ... what you are proposing *sounds* to me
>like half-way to what started this thread. (*Please* correct me if I'm
>wrong) ...
>
>Essentially, in your proposal, when you drop a column, all subsequent
>tuples inserted/updated would have ... that one column missing? So,
>instead of doing a massive sweep through the table and removing that
>column, only do it when an insert/update happens?
>
>Basically, eliminate the requirement to re-write every tuple, only those
>that have activity?

Yes, this was one of the ideas that cropped up in previous discussion.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
At 12:21 PM 2/28/00 +0900, Hiroshi Inoue wrote:
>My idea is essentially an invisible column implementation.
>DROP COLUMN would change the target pg_attribute tuple
>as follows.

I don't see such a solution as being mutually exclusive with the other one on the table.

Remember ... Oracle provides both. I suspect that they did so because they were under customer pressure to provide a "real" column drop and a "fast" (and non-2x tablesize!) solution. So they did both. Also keep in mind that being able to drop a column in Oracle is a year 1999 feature ... and both are provided. More evidence of pressure from two points of view.

Of course, PG suffers because the "real" column drop is a 2x space solution, so the "invisibility" approach may more frequently be desired. Still... as time goes on and PG gets adopted by more and more serious, large-scale users (which we all are working towards, right?) I suspect that each camp will want to be served.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
>
> > > I would also change attname to '*already dropped %d' for
> > > example to avoid duplicate attname.
> >
> > Okay, just curious here, but ... what you are proposing *sounds* to me
> > like half-way to what started this thread. (*Please* correct me if I'm
> > wrong) ...
> >
> > Essentially, in your proposal, when you drop a column, all subsequent
> > tuples inserted/updated would have ... that one column missing? So,
> > instead of doing a massive sweep through the table and removing that
> > column, only do it when an insert/update happens?
> >
> > Basically, eliminate the requirement to re-write every tuple, only those
> > that have activity?
>
> And I think the problem was that there was too much code to modify to
> allow this.

Seems my trial would be useless. I would give up the trial.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
Hiroshi Inoue wrote:

> > > Wow, that almost seems too easy to be true. I never thought that having
> > > tuples of different structures in the table at the same time would be
> > > possible. If so then I don't see a reason why this would be too hard to
> > > do.
> >
> > If the transaction is not committed, I don't think anything actually
> > reads the tuple columns, so you are safe.
>
> Hmm, tuples of multiple versions in a table?
> This is neither clean nor easy for me.
> There's no such stuff which takes the case into account, AFAIK.
>
> Seems no one but me objects to it. I'm tired of this issue and it's
> painful for me to continue discussion further in my poor English.
> I may be able to provide another implementation on trial and it
> may be easier than only objecting to your proposal.
> Is it OK?

Consider me on your side. For some good reasons, I added a ReferentialIntegritySnapshotOverride mode, that causes any tuple to be visible when fetched by CTID. Actually, there will be at least a read lock on them, so locking will prevent damage. But I can think of other situations where this kind of "read whatever I want you to" could be needed and would fail then.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
Don Baccus wrote:
>
> At 12:21 PM 2/28/00 +0900, Hiroshi Inoue wrote:
>
> >My idea is essentially an invisible column implementation.
> >DROP COLUMN would change the target pg_attribute tuple
> >as follows.
>
> I don't see such a solution as being mutually exclusive with
> the other one on the table.

Very true, and we will need the hidden columns feature for a clean implementation of inheritance anyway.

> Remember ... Oracle provides both. I suspect that they did so
> because they were under customer pressure to provide a "real"
> column drop and a "fast" (and non-2x tablesize!) solution. So
> they did both. Also keep in mind that being able to drop a
> column in Oracle is a year 1999 feature ... and both are provided.
> More evidence of pressure from two points of view.
>
> Of course, PG suffers because the "real" column drop is a 2x
> space solution, so the "invisibility" approach may more frequently
> be desired.

"update t set id=id+1" is also 2x space, likely even more if referential integrity is used (and checked at the end of the transaction).

And my main use of DROP COLUMN will probably be during development, usually meaning small table sizes.

------------
Hannu
On Mon, 28 Feb 2000, Hiroshi Inoue wrote:

> > -----Original Message-----
> > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> >
> > > > I would also change attname to '*already dropped %d' for
> > > > example to avoid duplicate attname.
> > >
> > > Okay, just curious here, but ... what you are proposing *sounds* to me
> > > like half-way to what started this thread. (*Please* correct me if I'm
> > > wrong) ...
> > >
> > > Essentially, in your proposal, when you drop a column, all subsequent
> > > tuples inserted/updated would have ... that one column missing? So,
> > > instead of doing a massive sweep through the table and removing that
> > > column, only do it when an insert/update happens?
> > >
> > > Basically, eliminate the requirement to re-write every tuple, only those
> > > that have activity?
> >
> > And I think the problem was that there was too much code to modify to
> > allow this.
>
> Seems my trial would be useless.
> I would give up the trial.

Hiroshi ...

Bruce's comment was just an observation ... if it can be done cleanly, I would love to see a version that didn't involve 2x the disk space ... I don't believe that a trial would be useless, I think that Bruce's only concern/warning is that the amount of code modifications that would have to be made in order to accomplish this *might* be larger than the benefit resulting in doing it this way.

If you feel that this can be done more efficiently, *please* proceed with the trial ...

I'm curious about one thing ... several ppl have mentioned that Oracle does it "both ways" ... does anyone know the syntax they use so that someone can do it one way or another?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
At 11:45 AM 2/28/00 +0200, Hannu Krosing wrote:
>"update t set id=id+1" is also 2x space,

And PG doesn't do it correctly anyway...

> likely even more if
>referential integrity is used (and checked at the end of the transaction)

The triggers are all queued so yes, they take memory too. Even better, if "MATCH <unspecified>" or especially "MATCH PARTIAL" is used with multi-column foreign keys containing nulls, it will be impressively slow! We can call these the built-in coffee break feature when used on large tables. (it's inherently slow, not just slow because of the PG implementation)

>And my main use of DROP COLUMN will probably be during development,
>usually meaning small table sizes.

Well, folks who use the web toolkit I've been porting for Oracle will have a use for it, too, because the toolkit has been rapidly evolving (ArsDigita has about 70 employees at the moment, most of them programmers working on the Oracle-based version of the toolkit). ArsDigita provides upgrade .sql files for each version that consist in part of ADD/DROP column statements so users can upgrade in place, a very useful thing.

It doesn't need to be fast in this context, just work. You tell the world your site will be down for an evening on such-and-such date, stop listening on port 80, and upgrade.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
Don Baccus <dhogaza@pacifier.com> writes:
>> "update t set id=id+1" is also 2x space,

> And PG doesn't do it correctly anyway...

? News to me. What's your definition of "correctly"?

			regards, tom lane
At 10:20 AM 2/28/00 -0500, Tom Lane wrote:
>Don Baccus <dhogaza@pacifier.com> writes:
>>> "update t set id=id+1" is also 2x space,
>
>> And PG doesn't do it correctly anyway...
>
>? News to me. What's your definition of "correctly"?

create table foo(i integer unique);

(insert values)

donb=# select * from foo;
 i
---
 1
 2
 3
(3 rows)

donb=# update foo set i=i+1;
ERROR:  Cannot insert a duplicate key into unique index foo_pkey

Shouldn't fail ... the constraint should be applied after the update, but the row-by-row update of the index causes it to fail. At least I presume that this is an artifact of PG implementing the unique constraint by creating a unique index.

Stephan Szabo pointed this out to me awhile ago when we were discussing "alter table add constraint" (he was looking into this when he worked on "alter table add foreign key").

Of course, sometimes PG gets it right. I deleted stuff in foo, then did:

donb=# insert into foo values(3);
INSERT 26907 1
donb=# insert into foo values(2);
INSERT 26908 1
donb=# insert into foo values(1);
INSERT 26909 1
donb=# update foo set i=i+1;
UPDATE 3
donb=#

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
Don Baccus wrote:
>
> donb=# select * from foo;
>  i
> ---
>  1
>  2
>  3
> (3 rows)
>
> donb=# update foo set i=i+1;
> ERROR:  Cannot insert a duplicate key into unique index foo_pkey

I knew it used to misbehave that way, but at some point I got the impression that it was fixed ;(

IIRC, the same behaviour plagued the old foreign key implementation in contrib, which was why it was refused for a long time to be integrated.

I hope that at least the foreign keys don't do it anymore.

---------
Hannu
Hannu Krosing wrote:
> Don Baccus wrote:
>
> > donb=# update foo set i=i+1;
> > ERROR:  Cannot insert a duplicate key into unique index foo_pkey
>
> IIRC, the same behaviour plagued the old foreign key implementation
> in contrib, which was why it was refused for a long time to be
> integrated.
>
> I hope that at least the foreign keys don't do it anymore.

ALL the FK triggers are delayed until after the entire statement (what's wrong for ON DELETE RESTRICT - but that's another story), or until the entire transaction (in deferred mode).

But the UNIQUE constraint is still built upon unique nbtree indices, thus failing on primary key where such a unique index is automatically created for.

I'm far too unfamiliar with our implementation of nbtree to tell whether it would be possible at all to delay unique checking until statement end or XACT commit. At least I assume it would require some similar technique of a deferred queue.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
At 02:04 AM 2/29/00 +0200, Hannu Krosing wrote:
>IIRC, the same behaviour plagued the old foreign key implementation
>in contrib, which was why it was refused for a long time to be
>integrated.
>
>I hope that at least the foreign keys don't do it anymore.

It shouldn't because they're implemented via triggers after all the work is done. In other words, the implementation might have bugs but the bugs should be different :)

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
At 01:43 AM 2/29/00 +0100, Jan Wieck wrote:

> ALL the FK triggers are delayed until after the entire
> statement (what's wrong for ON DELETE RESTRICT - but that's
> another story), or until the entire transaction (in deferred
> mode).

Kind of wrong, just so folks understand - the semantics are right in the sense that the right answer is given (pass or fail), but you need a stopwatch to know that we're not doing what SQL3 suggests should be done (catch the foreign key errors before changes are made and without incurring the cost of a rollback).

The current way we're doing it - identically to "NO ACTION" - is fine for compatibility purposes, though later we'd like to implement a smart ON DELETE RESTRICT, because the efficiency considerations that led to its inclusion in SQL3 are reasonable ones.

> I'm far too unfamiliar with our implementation of nbtree
> to tell whether it would be possible at all to delay unique
> checking until statement end or XACT commit. At least I
> assume it would require some similar technique of a deferred
> queue.

Presumably you'd queue up per-row triggers just like for FK constraints and insert into the unique index at that point.

I have no idea how many other things this would break, if any.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
> ALL the FK triggers are delayed until after the entire
> statement (what's wrong for ON DELETE RESTRICT - but that's
> another story), or until the entire transaction (in deferred
> mode).
>
> But the UNIQUE constraint is still built upon unique nbtree
> indices, thus failing on primary key where such a unique
> index is automatically created for.
>
> I'm far too unfamiliar with our implementation of nbtree
> to tell whether it would be possible at all to delay unique
> checking until statement end or XACT commit. At least I
> assume it would require some similar technique of a deferred
> queue.

We might want to look at what we're doing for all of the constraints, because at some point we'll probably want to let you defer the other constraints as well (I'm pretty sure this is technically legal in SQL92). If we can think of a good way to handle all of the constraints together, that might be worth doing to prevent us from coding the same thing multiple times.
Don Baccus wrote:
> At 01:43 AM 2/29/00 +0100, Jan Wieck wrote:
>
> > ALL the FK triggers are delayed until after the entire
> > statement (what's wrong for ON DELETE RESTRICT - but that's
> > another story), or until the entire transaction (in deferred
> > mode).
>
> Kind of wrong, just so folks understand - the semantics are right in
> the sense that the right answer is given (pass or fail), but you need
> a stopwatch to know ...

Explanatory version of "that other story". But not exactly correct IMHO. If following strictly SQL3 suggestions, an ON DELETE RESTRICT action cannot be deferrable at all. Even if the constraint itself is deferrable and is set explicitly to DEFERRED, the check should be done immediately at ROW level. That's the difference between "NO ACTION" and "RESTRICT".

Actually, a RESTRICT violation can potentially bypass thousands of subsequent queries until COMMIT. Meaningless from the transactional PoV, but from the application programmer's one (looking at the return code of a particular statement) it isn't!

> > I'm far too unfamiliar with our implementation of nbtree
> > to tell whether it would be possible at all to delay unique
> > checking until statement end or XACT commit. At least I
> > assume it would require some similar technique of a deferred
> > queue.
>
> Presumably you'd queue up per-row triggers just like for FK constraints
> and insert into the unique index at that point.
>
> I have no idea how many other things this would break, if any.

At least if deferring the index insert until XACT commit, any subsequent index scan wouldn't see inserted tuples, even if they MUST be visible.

Maybe I'm less far away from knowledge than thought. Inside of a nbtree index, any number of duplicates is accepted. It's the heap tuples' visibility they point to that triggers the dup message.

So it's definitely some kind of "accept duplicates for now but check for final dup's on this key later".

But that requires another index scan later. We can remember the relation's and index's Oid (to get back the relation and index in question) plus the CTID of the added (inserted/updated) tuple to get back the key values (remembering the key itself could blow up memory). Then do an index scan under current (statement end/XACT commit) visibility to check if more than one HeapTupleSatisfies().

It'll be expensive, compared to the current UNIQUE implementation doing it on the fly during btree insert (doesn't it?). But it's the only way I see.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
At 03:24 AM 2/29/00 +0100, Jan Wieck wrote:

> Explanatory version of "that other story". But not exactly
> correct IMHO. If following strictly SQL3 suggestions, an ON
> DELETE RESTRICT action cannot be deferrable at all. Even if
> the constraint itself is deferrable and is set explicitly to
> DEFERRED, the check should be done immediately at ROW level.
> That's the difference between "NO ACTION" and "RESTRICT".
>
> Actually, a RESTRICT violation can potentially bypass
> thousands of subsequent queries until COMMIT. Meaningless
> from the transactional PoV, but from the application
> programmer's one (looking at the return code of a particular
> statement) it isn't!

No, strictly speaking it isn't correct. But without a stopwatch, it will be hard to tell.

Actually, though, since exceptions are only supposed to reject the given SQL-statement and not trigger a PG-style auto-rollback of the transaction, a subsequent "commit" should commit that subsequent work (unless they in turn trigger constraint errors due to dependencies on the first failed constraint). So you don't really get to skip all those subsequent statements unless you're looking for the exception, catch it, and do an explicit rollback. None of that is in place in PG anyway at the moment...

I'm assuming that the exception raised for an FK violation is the same as an exception raised for numeric overflow, etc - I think you missed that earlier discussion. The fact that PG's auto-rollback is wrong was news to me, though obvious in hindsight, and I've not gone back to study RI semantics in light of this new information. So I may be wrong here.

We could always take out "RESTRICT" and claim SQL92 rather than SQL3 referential integrity :) :)

Given that Oracle only implements "MATCH <unspecified>" (as of 8.1.5, anyway), we're not doing too bad!

>> > I'm far too unfamiliar with our implementation of nbtree
>> > to tell whether it would be possible at all to delay unique
>> > checking until statement end or XACT commit. At least I
>> > assume it would require some similar technique of a deferred
>> > queue.
>>
>> Presumably you'd queue up per-row triggers just like for FK constraints
>> and insert into the unique index at that point.
>>
>> I have no idea how many other things this would break, if any.
>
> At least if deferring the index insert until XACT commit, any
> subsequent index scan wouldn't see inserted tuples, even if
> they MUST be visible.

Ugh, of course :(

> Maybe I'm less far away from knowledge than thought. Inside
> of a nbtree index, any number of duplicates is accepted.
> It's the heap tuples' visibility they point to that triggers
> the dup message.
>
> So it's definitely some kind of "accept duplicates for now
> but check for final dup's on this key later".
>
> But that requires another index scan later. We can remember
> the relation's and index's Oid (to get back the relation and
> index in question) plus the CTID of the added
> (inserted/updated) tuple to get back the key values
> (remembering the key itself could blow up memory). Then do an
> index scan under current (statement end/XACT commit)
> visibility to check if more than one HeapTupleSatisfies().
>
> It'll be expensive, compared to the current UNIQUE implementation
> doing it on the fly during btree insert (doesn't it?). But
> it's the only way I see.

The more I learn about SQL92 the more I understand why RDBMS systems have the reputation for being piggy.
But the standard semantics of UPDATE on a column with a UNIQUE constraint are certainly consistent with the paradigm that queries operate on sets of tuples, not sequences of tuples.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
> -----Original Message-----
> From: The Hermit Hacker [mailto:scrappy@hub.org]
>
> [snip]
>
> Hiroshi ...
>
> Bruce's comment was just an observation ... if it can be done
> cleanly, I would love to see a version that didn't involve 2x the disk
> space ... I don't believe that a trial would be useless, I think that
> Bruce's only concern/warning is that the amount of code modifications that
> would have to be made in order to accomplish this *might* be larger than
> the benefit resulting in doing it this way.
>
> If you feel that this can be done more efficiently, *please*
> proceed with the trial ...

OK, I may be able to provide a trial patch in a week or so if I'm lucky. How should I commit the patch? With #ifdef?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
wieck@debis.com (Jan Wieck) writes:
> So it's definitely some kind of "accept duplicates for now
> but check for final dup's on this key later".

> But that requires another index scan later. We can remember
> the relation's and index's Oid (to get back the relation and
> index in question) plus the CTID of the added
> (inserted/updated) tuple to get back the key values
> (remembering the key itself could blow up memory). Then do an
> index scan under current (statement end/XACT commit)
> visibility to check if more than one HeapTupleSatisfies().

> It'll be expensive, compared to the current UNIQUE implementation
> doing it on the fly during btree insert (doesn't it?). But
> it's the only way I see.

How about:

1. During INSERT into unique index, notice whether any other index entries have the same key. If so, add that key value to a queue of possibly-duplicate keys to check later.

2. At commit, or whenever consistency should be checked, scan the queue. For each entry, use the index to look up all the matching tuples, and check that only one will be valid if the transaction commits.

This avoids a full index scan in the normal case, although it could be pretty slow in the update-every-tuple scenario...

			regards, tom lane
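Sketched in C with hypothetical names (none of this is backend API; count_visible_matches() stands in for the later index scan under commit-time visibility, and real keys would be Datums, with pass-by-reference types needing a proper copy):

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct PendingDup
    {
        unsigned int       index_oid;   /* which unique index saw an equal key */
        long               key;         /* the suspect key value (simplified) */
        struct PendingDup *next;
    } PendingDup;

    static PendingDup *pending_dups = NULL;

    /* hypothetical: count tuples matching key that would remain valid
     * if the current transaction commits */
    extern int count_visible_matches(unsigned int index_oid, long key);

    /* Step 1: called from the btree insert path on seeing an equal key. */
    static void
    remember_possible_dup(unsigned int index_oid, long key)
    {
        PendingDup *p = malloc(sizeof(PendingDup));

        p->index_oid = index_oid;
        p->key = key;
        p->next = pending_dups;
        pending_dups = p;
    }

    /* Step 2: at commit, or whenever consistency should be checked,
     * re-check every remembered key; two or more valid tuples is an error. */
    static int
    check_pending_dups(void)
    {
        PendingDup *p;

        for (p = pending_dups; p != NULL; p = p->next)
        {
            if (count_visible_matches(p->index_oid, p->key) > 1)
            {
                fprintf(stderr, "Cannot insert a duplicate key into unique index\n");
                return -1;
            }
        }
        return 0;
    }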
On Tue, 29 Feb 2000, Hiroshi Inoue wrote:

> > Bruce's comment was just an observation ... if it can be done
> > cleanly, I would love to see a version that didn't involve 2x the disk
> > space ... I don't believe that a trial would be useless, I think that
> > Bruce's only concern/warning is that the amount of code modifications that
> > would have to be made in order to accomplish this *might* be larger than
> > the benefit resulting in doing it this way.
> >
> > If you feel that this can be done more efficiently, *please*
> > proceed with the trial ...
>
> OK, I may be able to provide a trial patch in a week or so if I'm lucky.
> How should I commit the patch?
> With #ifdef?

Nope, but it will have to wait until *after* 7.0 is released, so don't push yourself on it ...

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
> -----Original Message-----
> From: The Hermit Hacker [mailto:scrappy@hub.org]
>
> On Tue, 29 Feb 2000, Hiroshi Inoue wrote:
>
> > OK, I may be able to provide a trial patch in a week or so if I'm lucky.
> > How should I commit the patch?
> > With #ifdef?
>
> Nope, but it will have to wait until *after* 7.0 is released, so don't
> push yourself on it ...

Hmm, until the 7.0 release? I don't want to keep my private branch so long. Is #ifdef bad to separate it from 7.0 release stuff?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
Jan Wieck wrote:
>
> Explanatory version of "that other story". But not exactly
> correct IMHO. If following strictly SQL3 suggestions, an ON
> DELETE RESTRICT action cannot be deferrable at all. Even if
> the constraint itself is deferrable and is set explicitly to
> DEFERRED, the check should be done immediately at ROW level.
> That's the difference between "NO ACTION" and "RESTRICT".
>
> Actually, a RESTRICT violation can potentially bypass
> thousands of subsequent queries until COMMIT. Meaningless
> from the transactional PoV, but from the application
> programmer's one (looking at the return code of a particular
> statement) it isn't!
...
> It'll be expensive, compared to the current UNIQUE implementation
> doing it on the fly during btree insert (doesn't it?). But
> it's the only way I see.

So currently we have ON UPDATE RESTRICT foreign keys :)

-------------
Hannu
Hannu Krosing wrote:
>
> Jan Wieck wrote:
> >
> > It'll be expensive, compared to the current UNIQUE implementation
> > doing it on the fly during btree insert (doesn't it?). But
> > it's the only way I see.
>
> So currently we have ON UPDATE RESTRICT foreign keys :)

I meant of course ON UPDATE RESTRICT PRIMARY KEYS ..

----------
Hannu
Don Baccus wrote:
> At 03:24 AM 2/29/00 +0100, Jan Wieck wrote:
>
> > Actually, a RESTRICT violation can potentially bypass
> > thousands of subsequent queries until COMMIT. Meaningless
> > from the transactional PoV, but from the application
> > programmer's one (looking at the return code of a particular
> > statement) it isn't!
>
> No, strictly speaking it isn't correct. But without a stopwatch,
> it will be hard to tell.

It is easy to tell:

    CREATE TABLE t1 (a integer PRIMARY KEY);
    CREATE TABLE t2 (a integer REFERENCES t1 ON DELETE RESTRICT DEFERRABLE);

    INSERT INTO t1 VALUES (1);
    INSERT INTO t1 VALUES (2);
    INSERT INTO t1 VALUES (3);
    INSERT INTO t2 VALUES (1);
    INSERT INTO t2 VALUES (2);

    BEGIN TRANSACTION;
    SET CONSTRAINTS ALL DEFERRED;
    DELETE FROM t1 WHERE a = 2;
    DELETE FROM t1 WHERE a = 3;
    COMMIT TRANSACTION;

In this case, the first DELETE from t1 must already bomb the exception, setting the transaction block into error state and rejecting all further queries until COMMIT/ROLLBACK. The SET DEFERRED should only affect a check for key existence on INSERT to t2, not the RESTRICT action on DELETE to t1.

The end result will be the same, both DELETEs get rolled back. But the application will see it at COMMIT, not at the first DELETE. So the system behaves exactly like for NO ACTION.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
Hannu Krosing wrote: > Jan Wieck wrote: > > > > Actually, a RESTRICT violation can potentially bypass > > thousands of subsequent queries until COMMIT. Meaningless > > from the transactional PoV, but from the application > > programmers one (looking at the return code of a particular > > statement) it isn't! > ... > > It'll be expensive, compared to current UNIQUE implementation > > doing it on the fly during btree insert (doesn't it?). But > > the only way I see. > > So currently we have ON UPDATE RESTRICT foreign keys :) For foreign keys we actually have ON UPDATE/DELETE NO ACTION (plus SET NULL and SET DEFAULT). Only the RESTRICT isn't fully SQL3. I just had an idea that might easily turn it to do the right thing. For the UNIQUE constraint, it's totally wrong (and not related to FOREIGN KEY stuff at all). The UNIQUE constraint isn't deferrable at all, and it looks for violations on a per row level, not on a per set level as it should. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
I wrote: > fully SQL3. I just had an idea that might easily turn it to > do the right thing. ON <event> RESTRICT triggers are now executed after the statement always, ignoring any explicitly set deferred mode. This is pretty close to Date's SQL3 interpretation, or IMHO better. Date says that they are checked BEFORE each ROW, but that would ignore the SET character of a statement. Now we have correct semantics for all 4 possible referential actions. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
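Spelled out as SQL, the difference comes down to the following pair of transactions (a minimal sketch, assuming the behavior described above: RESTRICT checked at the end of each statement even when deferred, NO ACTION waiting until COMMIT; all table names are invented for the example):

    CREATE TABLE pk (a integer PRIMARY KEY);
    CREATE TABLE fk_r (a integer REFERENCES pk ON DELETE RESTRICT DEFERRABLE);
    CREATE TABLE fk_n (a integer REFERENCES pk ON DELETE NO ACTION DEFERRABLE);
    INSERT INTO pk VALUES (1);
    INSERT INTO pk VALUES (2);
    INSERT INTO fk_r VALUES (1);
    INSERT INTO fk_n VALUES (2);

    BEGIN TRANSACTION;
    SET CONSTRAINTS ALL DEFERRED;
    DELETE FROM pk WHERE a = 1;   -- RESTRICT: error raised right here
    COMMIT TRANSACTION;

    BEGIN TRANSACTION;
    SET CONSTRAINTS ALL DEFERRED;
    DELETE FROM pk WHERE a = 2;   -- NO ACTION: accepted for now
    INSERT INTO pk VALUES (2);    -- referenced key restored before commit
    COMMIT TRANSACTION;           -- deferred check finds no violation

The second transaction is the case deferral exists for: the application gets a chance to repair the reference before the check runs.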
On Tue, 29 Feb 2000, Hiroshi Inoue wrote: > > Nope, but it will have to wait until *after* 7.0 is released, so don't > > push yourself on it ... > > > > Hmm,until 7.0 release ? > I don't want to keep my private branch so long. > Is #ifdef bad to separate it from 7.0 release stuff ? Go for it and submit a patch ... if it's totally innocuous, then we can try and plug her in, but I won't guarantee it. Since there should be no major changes between now and 7.0 release, we can store any patch until the release also ...
At 01:22 PM 2/29/00 +0100, Jan Wieck wrote: >I wrote: > >> fully SQL3. I just had an idea that might easily turn it to >> do the right thing. > > ON <event> RESTRICT triggers are now executed after the > statement always, ignoring any explicitly set deferred mode. > This is pretty close to Date's SQL3 interpretation, or IMHO > better. Date says that they are checked BEFORE each ROW, but > that would ignore the SET character of a statement. Perhaps that's actually the point of RESTRICT? Sacrifice the set character of a statement in this special case in order to return an error quickly? Since RESTRICT wasn't in SQL92, and since it's very close to NO ACTION, it reeks of being an efficiency hack. I dread digging into that part of the standard again...this is a case where the various proposals and justifications that were before the committee at the time would be useful since the actual words that made it to the standard are opaque. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
At 11:22 AM 2/29/00 +0100, Jan Wieck wrote: >Don Baccus wrote: > >> At 03:24 AM 2/29/00 +0100, Jan Wieck wrote: >> >> > Actually, a RESTRICT violation can potentially bypass >> > thousands of subsequent queries until COMMIT. Meaningless >> > from the transactional PoV, but from the application >> > programmers one (looking at the return code of a particular >> > statement) it isn't! >> >> No, strictly speaking it isn't correct. But without a stopwatch, >> it will be hard to tell. > > It is easy to tell: > > CREATE TABLE t1 (a integer PRIMARY KEY); > CREATE TABLE t2 (a integer REFERENCES t1 > ON DELETE RESTRICT > DEFERRABLE); > > INSERT INTO t1 VALUES (1); > INSERT INTO t1 VALUES (2); > INSERT INTO t1 VALUES (3); > > INSERT INTO t2 VALUES (1); > INSERT INTO t2 VALUES (2); > > BEGIN TRANSACTION; > SET CONSTRAINTS ALL DEFERRED; > DELETE FROM t1 WHERE a = 2; > DELETE FROM t1 WHERE a = 3; > COMMIT TRANSACTION; > > In this case, the first DELETE from t1 must already bomb the > exception, setting the transaction block into error state and > rejecting all further queries until COMMIT/ROLLBACK. Ahhh...but the point you're missing, which was brought up a few days ago, is that this PG-ism of rejecting all further queries until COMMIT/ROLLBACK is in itself NONSTANDARD. As far as the effect of DEFERRED on RESTRICT with STANDARD, not PG, transaction semantics I've not investigated it. Neither one of us has a particularly great record at correctly interpreting the SQL3 standard regarding the subtleties of foreign key semantics, since we both had differing interpretations of RESTRICT/NO ACTION and (harumph) we were BOTH wrong :) Date implies that there's no difference other than RESTRICT's returning an error more quickly, but he doesn't talk about the DEFERRED case. Anyway, it's moot at the moment since neither RESTRICT nor standard SQL92 transaction semantics are implemented. > The end result will be the same, Which is what I mean when I say you pretty much need a stopwatch to tell the difference - OK, in PG you can look at the non-standard error messages due to the non-standard rejection of subsequent queries, but I was thinking in terms of standard transaction semantics. > both DELETEs get rolled > back. But the application will see it at COMMIT, not at the > first DELETE. So the system behaves exactly like for NO > ACTION. Yes. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
Seems we have 4 DROP COLUMN ideas:

    Method                                                Advantage
    -----------------------------------------------------------------
    1  invisible column marked by negative attnum         fast
    2  invisible column marked by is_dropped column       fast
    3  make copy of table without column                  col removed
    4  make new tuples in existing table without column   col removed

Folks, we had better choose one and get started. Number 1 Hiroshi has ifdef'ed out in the code. Items 1 and 2 have problems with backend code and 3rd party code not seeing the dropped columns, or having gaps in the attno numbering. Number 3 has problems with making it an atomic operation, and number 4 is described below. --------------------------------------------------------------------------- > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > You can exclusively lock the table, then do a heap_getnext() scan over > > the entire table, remove the dropped column, do a heap_insert(), then a > > heap_delete() on the current tuple, making sure to skip over the tuples > > inserted by the current transaction. When completed, remove the column > > from pg_attribute, mark the transaction as committed (if desired), and > > run vacuum over the table to remove the deleted rows. > > Hmm, that would work --- the new tuples commit at the same instant that > the schema updates commit, so it should be correct. You have the 2x > disk usage problem, but there's no way around that without losing > rollback ability. > > A potentially tricky bit will be persuading the tuple-reading and > tuple-writing subroutines to pay attention to different versions of the tuple > structure for the same table. I haven't looked to see if this will be > difficult or not. If you can pass the TupleDesc explicitly then it > shouldn't be a problem. > > I'd suggest that the cleanup vacuum *not* be an automatic part of > the operation; just recommend that people do it ASAP after dropping > a column. Consider needing to drop several columns... > > regards, tom lane > > ************ > -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> -----Original Message----- > From: pgsql-hackers-owner@hub.org > [mailto:pgsql-hackers-owner@hub.org]On Behalf Of Bruce Momjian > > Seems we have 4 DROP COLUMN ideas: > > Method Advantage > ----------------------------------------------------------------- > 1 invisible column marked by negative attnum fast > 2 invisible column marked by is_dropped column fast > 3 make copy of table without column col removed > 4 make new tuples in existing table without column col removed > > Folks, we had better choose one and get started. > > Number 1 Hiroshi has ifdef'ed out in the code. Items 1 and 2 have > problems with backend code and 3rd party code not seeing the dropped > columns, Hmm,doesn't *not seeing* mean the column is dropped ? > or having gaps in the attno numbering. Number 3 has problems > with making it an atomic operation, and number 4 is described below. > Don't forget another important point. Currently even DROP TABLE doesn't remove related objects completely. And I don't think I could remove objects related to the dropping column completely using 1)2) in ALTER TABLE DROP COLUMN implementation. Using 3)4) we should not only remove objects as 1)2) but also change attnum-s in all objects related to the relation. Otherwise PostgreSQL would do the wrong thing silently. Regards. Hiroshi Inoue Inoue@tpf.co.jp
At 01:43 PM 6/10/00 +0900, Hiroshi Inoue wrote: >> -----Original Message----- >> From: pgsql-hackers-owner@hub.org >> [mailto:pgsql-hackers-owner@hub.org]On Behalf Of Bruce Momjian >> >> Seems we have 4 DROP COLUMN ideas: >> >> Method Advantage >> ----------------------------------------------------------------- >> 1 invisible column marked by negative attnum fast >> 2 invisible column marked by is_dropped column fast >> 3 make copy of table without column col removed >> 4 make new tuples in existing table without column col removed >> >> Folks, we had better choose one and get started. Oracle gives you the choice between the "cheating" fast method(s) and the "real" slow (really slow?) real method. So there's at least real world experience by virtue of example by the world's most successful database supplier that user control over "hide the column" and "really delete the column" is valuable. It really makes a lot of sense to give such a choice. If one does so by "hiding", at a later date one would think the choice of "really deleting" would be a possibility. I don't know if Oracle does this... If not, they might not care. In today's world, there are bazillions of dollars for Oracle to scoop up from users who could just as easily be PG users - all those "we'll fail if we don't IPO 'cause we'll never have any customers" database-backed websites :) - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
Bruce Momjian wrote: > > Seems we have 4 DROP COLUMN ideas: > > Method Advantage > ----------------------------------------------------------------- > 1 invisible column marked by negative attnum fast > 2 invisible column marked by is_dropped column fast > 3 make copy of table without column col removed > 4 make new tuples in existing table without column col removed IIRC there was a fifth idea, a variation of 2 that would work better with inheritance - 5 all columns have an is_real_column attribute that is true for all columns present in that relation, so situations like

    create table tab_a(a_i int);
    create table tab_b(b_i int) inherits(tab_a);
    alter table tab_a add column c_i int;

can be made to work. It would also require clients to ignore all missing columns that the backend can pass to them as nulls (which is usually quite cheap in bandwidth usage) in case of "SELECT **" queries. We could even rename attno to attid to make folks aware that it is not to be assumed to be continuous. > Folks, we had better choose one and get started. > > Number 1 Hiroshi has ifdef'ed out in the code. Items 1 and 2 have > problems with backend code and 3rd party code not seeing the dropped > columns, or having gaps in the attno numbering. If we want to make ADD COLUMN work with inheritance without having to rewrite every single tuple in both parent and inherited tables, we will have to accept the fact that there are gaps in attno numbering. > Number 3 has problems > with making it an atomic operation, and number 4 is described below. Nr 4 still has problems with attno numbering _changing_ during drop, which could either be better or worse for client software than having gaps - in both cases the client must be prepared to deal with runtime changes in attribute definition. -------------- Hannu
Don Baccus <dhogaza@pacifier.com> writes: > Oracle gives you the choice between the "cheating" fast method(s) and > the "real" slow (really slow?) real method. > So there's at least real world experience by virtue of example by > the world's most successful database supplier that user control > over "hide the column" and "really delete the column" is valuable. Sure, but you don't need any help from the database to do "really delete the column". SELECT INTO... is enough, and it's not even any slower than the implementations under discussion. So I'm satisfied if we offer the "hide the column" approach. Has anyone thought about what happens to table constraints that use the doomed column? Triggers, RI rules, yadda yadda? Has anyone thought about undoing a DELETE COLUMN? The data is still there, at least in tuples that have not been updated, so it's not totally unreasonable. regards, tom lane
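The "really delete" route Tom mentions needs no new backend support at all. A minimal sketch, assuming a table foo(a, b, c) from which c is to disappear (all names invented for the example):

    SELECT a, b INTO TABLE foo_new FROM foo;  -- copy everything except c
    DROP TABLE foo;
    ALTER TABLE foo_new RENAME TO foo;
    -- indexes, constraints, triggers and grants on foo are not copied,
    -- so they have to be re-created by hand afterwards

Which is also a fair summary of why it is a workaround rather than a feature.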
At 01:14 AM 6/10/00 -0400, Tom Lane wrote: >Don Baccus <dhogaza@pacifier.com> writes: >> Oracle gives you the choice between the "cheating" fast method(s) and >> the "real" slow (really slow?) real method. > >> So there's at least real world experience by virtue of example by >> the world's most successful database supplier that user control >> over "hide the column" and "really delete the column" is valuable. > >Sure, but you don't need any help from the database to do "really delete >the column". SELECT INTO... is enough, and it's not even any slower >than the implementations under discussion. > >So I'm satisfied if we offer the "hide the column" approach. <shrug> I wouldn't put a "real" drop column at the top of my list of priorities, but there is something to be said for user convenience. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Service and other goodies at http://donb.photo.net.
> > -----Original Message----- > > From: pgsql-hackers-owner@hub.org > > [mailto:pgsql-hackers-owner@hub.org]On Behalf Of Bruce Momjian > > > > Seems we have 4 DROP COLUMN ideas: > > > > Method Advantage > > ----------------------------------------------------------------- > > 1 invisible column marked by negative attnum fast > > 2 invisible column marked by is_dropped column fast > > 3 make copy of table without column col removed > > 4 make new tuples in existing table without column col removed > > > > Folks, we had better choose one and get started. > > > > Number 1 Hiroshi has ifdef'ed out in the code. Items 1 and 2 have > > problems with backend code and 3rd party code not seeing the dropped > > columns, > > Hmm,doesn't *not seeing* mean the column is dropped ? I meant problems of backend code and 3rd party code _seeing_ the dropped column in pg_attribute. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
>> Seems we have 4 DROP COLUMN ideas: >> Method Advantage >> ----------------------------------------------------------------- >> 1 invisible column marked by negative attnum fast >> 2 invisible column marked by is_dropped column fast >> 3 make copy of table without column col removed >> 4 make new tuples in existing table without column col removed Bruce and I talked about this by phone yesterday, and we realized that none of these are very satisfactory. #1 and #2 both have the flaw that applications that examine pg_attribute will probably break: they will see a sequence of attnum values with gaps in it. And what should the rel's relnatts field be set to? #3 and #4 are better on that point, but they leave us with the problem of renumbering references to columns after the dropped one in constraints, rules, PL functions, etc. Furthermore, there is a closely related problem that none of these approaches give us much help on: recursive ALTER TABLE ADD COLUMN. Right now, ADD puts the new column at the end of each table it's added to, which often means that it gets a different column number in child tables than in parent tables. That leads to havoc for pg_dump. I think the only clean solution is to create a clear distinction between physical and logical column numbers. Each pg_attribute tuple would need two attnum fields, and pg_class would need two relnatts fields as well. A column once created would never change its physical column number, but its logical column number might change as a consequence of adding or dropping columns before it. ADD COLUMN would ensure that a column added to child tables receives the same logical column number as it has in the parent table, thus solving the dump/reload problem. DROP COLUMN would assign an invalid logical column number to dropped columns. They could be numbered zero except that we'd probably still want a unique index on attrelid+attnum, and the index would complain. I'd suggest using Hiroshi's idea: give a dropped column a logical attnum equal to -(physical_attnum + offset). With this approach, internal operations on tuples would all use physical column numbers, but operations that interface to the outside world would present a view of only the valid logical columns. For example, the parser would only allow logical columns to be referenced by name; "SELECT *" would expand to valid logical columns in logical-column-number order; COPY would send or receive valid logical columns in logical-column-number order; etc. Stored rules and so forth probably should store physical column numbers so that they need not be modified during column add/drop. This would require looking at all the places in the backend to determine whether they should be working with logical or physical column numbers, but the design is such that most all places would want to be using physical numbers, so I don't think it'd be too painful. Although I'd prefer to give the replacement columns two new names (eg, "attlnum" and "attpnum") to ensure we find all uses, this would surely break applications that examine pg_attribute. For compatibility we'd have to recycle "attnum" and "relnatts" to indicate logical column number and logical column count, while adding new fields (say "attpnum" and "relnpatts") for the physical number and count. Comments? regards, tom lane
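To make the two-numbering scheme concrete, here is a sketch of what pg_attribute might hold for a table created as (a, b, c) after DROP COLUMN b, using the attlnum/attpnum names suggested above ("offset" stands for whatever constant gets chosen; none of this is settled catalog layout):

    attname | attpnum | attlnum
    --------+---------+---------------
    a       |    1    |  1
    b       |    2    | -(2 + offset)   -- dropped: no valid logical number
    c       |    3    |  2

"SELECT * FROM t" would then expand to a, c (the valid logical columns in attlnum order), while a stored rule referencing c would keep using physical number 3 no matter how many columns are later added or dropped before it.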
> >> Seems we have 4 DROP COLUMN ideas: > >> Method Advantage > >> ----------------------------------------------------------------- > >> 1 invisible column marked by negative attnum fast > >> 2 invisible column marked by is_dropped column fast > >> 3 make copy of table without column col removed > >> 4 make new tuples in existing table without column col removed > > Bruce and I talked about this by phone yesterday, and we realized that > none of these are very satisfactory. #1 and #2 both have the flaw that > applications that examine pg_attribute will probably break: they will > see a sequence of attnum values with gaps in it. And what should the > rel's relnatts field be set to? #3 and #4 are better on that point, > but they leave us with the problem of renumbering references to columns > after the dropped one in constraints, rules, PL functions, etc. Yes, glad you summarized. > > Furthermore, there is a closely related problem that none of these > approaches give us much help on: recursive ALTER TABLE ADD COLUMN. > Right now, ADD puts the new column at the end of each table it's added > to, which often means that it gets a different column number in child > tables than in parent tables. That leads to havoc for pg_dump. Also good point. > > I think the only clean solution is to create a clear distinction between > physical and logical column numbers. Each pg_attribute tuple would need > two attnum fields, and pg_class would need two relnatts fields as well. Excellent idea. > A column once created would never change its physical column number, but > its logical column number might change as a consequence of adding or > dropping columns before it. ADD COLUMN would ensure that a column added > to child tables receives the same logical column number as it has in the > parent table, thus solving the dump/reload problem. DROP COLUMN would > assign an invalid logical column number to dropped columns. They could > be numbered zero except that we'd probably still want a unique index on > attrelid+attnum, and the index would complain. I'd suggest using > Hiroshi's idea: give a dropped column a logical attnum equal to > -(physical_attnum + offset). My guess is that we would need a unique index on the physical attno, not the logical one. Multiple zero attno's may be fine. > > With this approach, internal operations on tuples would all use > physical column numbers, but operations that interface to the outside > world would present a view of only the valid logical columns. For > example, the parser would only allow logical columns to be referenced > by name; "SELECT *" would expand to valid logical columns in logical- > column-number order; COPY would send or receive valid logical columns > in logical-column-number order; etc. Yes, the only hard part will be taking values supplied in logical order and moving them into physical order. Not too hard with dropped columns, because they are only gaps, but inheritance would require re-ordering some of the values supplied by the user. Not hard, just something additional that is needed. > > Stored rules and so forth probably should store physical column numbers > so that they need not be modified during column add/drop. Yes! > > This would require looking at all the places in the backend to determine > whether they should be working with logical or physical column numbers, > but the design is such that most all places would want to be using > physical numbers, so I don't think it'd be too painful. Agreed. Most are physical.
> > Although I'd prefer to give the replacement columns two new names > (eg, "attlnum" and "attpnum") to ensure we find all uses, this would > surely break applications that examine pg_attribute. For compatibility > we'd have to recycle "attnum" and "relnatts" to indicate logical column > number and logical column count, while adding new fields (say "attpnum" > and "relnpatts") for the physical number and count. Can I recommend keeping attnum and relnatts as logical, and adding attheapnum and relheapatts so that it clearly shows these are the heap values, not the user values. Great idea. I was seeing things blocked in every option until your idea. Also, my guess is that Hiroshi's #ifdef's mark the places we need to start looking at. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > >> Seems we have 4 DROP COLUMN ideas: > >> Method Advantage > >> ----------------------------------------------------------------- > >> 1 invisible column marked by negative attnum fast > >> 2 invisible column marked by is_dropped column fast > >> 3 make copy of table without column col removed > >> 4 make new tuples in existing table without column col removed > Hmm,I've received no pg-ML mails for more than 1 day. What's happened with pgsql ML ? > Bruce and I talked about this by phone yesterday, and we realized that > none of these are very satisfactory. #1 and #2 both have the flaw that > applications that examine pg_attribute will probably break: they will > see a sequence of attnum values with gaps in it. And what should the > rel's relnatts field be set to? #3 and #4 are better on that point, > but they leave us with the problem of renumbering references to columns > after the dropped one in constraints, rules, PL functions, etc. > > Furthermore, there is a closely related problem that none of these > approaches give us much help on: recursive ALTER TABLE ADD COLUMN. > Right now, ADD puts the new column at the end of each table it's added > to, which often means that it gets a different column number in child > tables than in parent tables. That leads to havoc for pg_dump. > Inheritance is one of the reasons why I didn't take #2. I don't understand whether marking is_dropped is needed or not when pg_attribute is overhauled for inheritance. I myself have never wanted to use current inheritance functionality mainly because of this big flaw. Judging from the recent discussion about oo (though I don't understand the details), the change seems to be needed in order to make inheritance functionality really useful. > I think the only clean solution is to create a clear distinction between > physical and logical column numbers. Each pg_attribute tuple would need > two attnum fields, and pg_class would need two relnatts fields as well. > A column once created would never change its physical column number, but I don't understand inheritance well. In the near future wouldn't the implementation require e.g. attid which is common to all children of a parent and is never changed ? If so, we would need the third attid field which is irrelevant to physical/logical position. If not, physical column number would be sufficient. > its logical column number might change as a consequence of adding or > dropping columns before it. ADD COLUMN would ensure that a column added > to child tables receives the same logical column number as it has in the > parent table, thus solving the dump/reload problem. DROP COLUMN would > assign an invalid logical column number to dropped columns. They could > be numbered zero except that we'd probably still want a unique index on > attrelid+attnum, and the index would complain. I'd suggest using > Hiroshi's idea: give a dropped column a logical attnum equal to > -(physical_attnum + offset). > > With this approach, internal operations on tuples would all use > physical column numbers, but operations that interface to the outside > world would present a view of only the valid logical columns. For > example, the parser would only allow logical columns to be referenced > by name; "SELECT *" would expand to valid logical columns in logical- > column-number order; COPY would send or receive valid logical columns > in logical-column-number order; etc.
> > Stored rules and so forth probably should store physical column numbers > so that they need not be modified during column add/drop. > > This would require looking at all the places in the backend to determine > whether they should be working with logical or physical column numbers, > but the design is such that most all places would want to be using > physical numbers, so I don't think it'd be too painful. > > Although I'd prefer to give the replacement columns two new names > (eg, "attlnum" and "attpnum") to ensure we find all uses, this would > surely break applications that examine pg_attribute. For compatibility > we'd have to recycle "attnum" and "relnatts" to indicate logical column > number and logical column count, while adding new fields (say "attpnum" > and "relnpatts") for the physical number and count. > I agree with you that we would add attpnum and change the meaning of attnum as logical column number for backward compatibility. Regards. Hiroshi Inoue Inoue@tpf.co.jp
I don't know if this is one of the 5, but my idea of a good implementation is to do the fast invisible approach, and then update individual tuples to the new format the next time they happen to be UPDATEd. Therefore, ALTER TABLE DROP COLUMN, followed by UPDATE foo SET bar=bar; would cause the equiv of (4). -- Chris Bitmead mailto:chris@bitmead.com Hannu Krosing wrote: > > Bruce Momjian wrote: > > > > Seems we have 4 DROP COLUMN ideas: > > > > Method Advantage > > ----------------------------------------------------------------- > > 1 invisible column marked by negative attnum fast > > 2 invisible column marked by is_dropped column fast > > 3 make copy of table without column col removed > > 4 make new tuples in existing table without column col removed > > IIRC there was a fifth idea, a variation of 2 that would work better > with > inheritance - > > 5 all columns have an is_real_column attribute that is true for all > columns > present in that relation, so situations like > > create table tab_a(a_i int); > create table tab_b(b_i int) inherits(tab_a); > alter table tab_a add column c_i int; > > can be made to work. > > It would also require clients to ignore all missing columns that the backend > can > pass to them as nulls (which is usually quite cheap in bandwidth usage) > in > case of "SELECT **" queries. > > We could even rename attno to attid to make folks aware that it is not to be > assumed to be continuous. > > > Folks, we had better choose one and get started. > > > > Number 1 Hiroshi has ifdef'ed out in the code. Items 1 and 2 have > > problems with backend code and 3rd party code not seeing the dropped > > columns, or having gaps in the attno numbering. > > If we want to make ADD COLUMN work with inheritance without having to > rewrite every single tuple in both parent and inherited tables, we will > have to accept the fact that there are gaps in attno numbering. > > > Number 3 has problems > > with making it an atomic operation, and number 4 is described below. > > Nr 4 still has problems with attno numbering _changing_ during drop > which > could either be better or worse for client software than having gaps - > in both cases the client must be prepared to deal with runtime changes in > attribute definition. > > -------------- > Hannu
Chris Bitmead <chris@bitmead.com> writes: > I don't know if this is one of the 5, but my idea of a good > implementation is to do the fast invisible approach, and then update > individual tuples to the new format the next time they happen to be > UPDATEd. How would you tell whether a particular tuple has been updated or not? Furthermore, how would you remember the old tuple format (or formats) so that you'd know how to make the conversion? Seems to me this approach would require adding some sort of table version number to every tuple header, plus storing a complete set of system catalog entries for every past version of each table's schema. That's a heck of a high price, in both storage and complexity, for a feature of dubious value... regards, tom lane
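To give a feel for what that entails, a rough sketch of the catalog-side bookkeeping Tom is describing; every name here is hypothetical (no such catalog exists), and the per-tuple version tag would live in the tuple header, which SQL cannot show:

    -- one row per column per past schema version of a table
    CREATE TABLE pg_attribute_history (
        attrelid  oid,    -- table this entry belongs to
        schemaver int4,   -- schema version being described
        attnum    int2,   -- column number within that version
        attname   name,
        atttypid  oid     -- column type in that version
    );

On UPDATE, a tuple tagged with an old schemaver would first be converted to the current format using the matching rows here, which is where the storage and complexity cost he objects to comes from.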
Yes, it would need to work as you describe below. Such a scheme is used in several object databases I know of (Versant being one), where it works great. It's not just useful for drop column, but also things like add column with default value. It means that you can add and drop columns to your heart's content in the blink of an eye, and yet ultimately not pay the price in terms of storage costs. But yep, it's a lot more work, and understandable if there isn't enthusiasm for doing it. Tom Lane wrote: > > Chris Bitmead <chris@bitmead.com> writes: > > I don't know if this is one of the 5, but my idea of a good > > implementation is to do the fast invisible approach, and then update > > individual tuples to the new format the next time they happen to be > > UPDATEd. > > How would you tell whether a particular tuple has been updated or not? > > Furthermore, how would you remember the old tuple format (or formats) > so that you'd know how to make the conversion? > > Seems to me this approach would require adding some sort of table > version number to every tuple header, plus storing a complete set of > system catalog entries for every past version of each table's schema. > That's a heck of a high price, in both storage and complexity, for a > feature of dubious value...
Hiroshi Inoue wrote: > I don't understand inheritance well. In the near future wouldn't the > implementation require e.g. attid which is common to all children > of a parent and is never changed ? If so,we would need the third > attid field which is irrevalent to physical/logical position. If not, > physical column number would be sufficient . We only need something like a unique attid of course if we support column renaming in child tables. Otherwise the attname is sufficient to match up child-parent columns. If/when we support renaming, probably a parent_column_oid in pg_attribute might be one way to go. Your idea seems fine Tom.
> -----Original Message----- > From: Chris Bitmead > > Hiroshi Inoue wrote: > > > I don't understand inheritance well. In the near future wouldn't the > > implementation require e.g. attid which is common to all children > > of a parent and is never changed ? If so, we would need the third > > attid field which is irrelevant to physical/logical position. If not, > > physical column number would be sufficient. > > We only need something like a unique attid of course if we support > column renaming in child tables. Otherwise the attname is sufficient to > match up child-parent columns. > There are some objects which keep plans etc as compiled state.

    create table t1 (i1 int4);
    create table t2 (i2 int4) inherits t1;
    create table t3 (i3 int4) inherits t2;
    alter table t1 add column i4 int4;

For each table, the list of (column, logical number, physical number) would be as follows.

    t1 (i1, 1, 1) (i4, 2, 2)
    t2 (i1, 1, 1) (i4, 2, 3) (i2, 3, 2)
    t3 (i1, 1, 1) (i4, 2, 4) (i2, 3, 2) (i3, 4, 3)

At this point the compilation of 'select * from t1(*?)' would mean

    select (physical #1),(physical #2) from t1 +
    select (physical #1),(physical #3) from t2 +
    select (physical #1),(physical #4) from t3

Note that physical # aren't common for column i4. I've wanted to confirm that the above compilation would be OK for the (near) future enhancement of inheritance functionality. Regards. Hiroshi Inoue Inoue@tpf.co.jp
On thinking about it I can definitely see your point about wanting an attrid that is common across the hierarchy, regardless of compiled plans. There would be some merit in splitting up pg_attribute into two parts. One part is common across all classes in the hierarchy, the other part is specific to one class. Then the oid of the common part is the attrid you refer to. However, I'm not sure this directly affects Tom's proposal. Selects from hierarchies are implemented in terms of a union of all the classes in the hierarchy. Wouldn't the compiled plan refer to physical ids? In any case, if UNION can be made to work, I would think select hierarchies automatically would work too. Hiroshi Inoue wrote: > > > -----Original Message----- > > From: Chris Bitmead > > > > Hiroshi Inoue wrote: > > > > > I don't understand inheritance well. In the near future wouldn't the > > > implementation require e.g. attid which is common to all children > > > of a parent and is never changed ? If so,we would need the third > > > attid field which is irrevalent to physical/logical position. If not, > > > physical column number would be sufficient . > > > > We only need something like a unique attid of course if we support > > column renaming in child tables. Otherwise the attname is sufficient to > > match up child-parent columns. > > > > There are some objects which keep plans etc as compiled > state. > > create table t1 (i1 int4); > create table t2 (i2 int4) inherits t1; > create table t3 (i3 int4) inherits t2; > alter table t1 add column i4 int4; > > For each table,the list of (column, logical number, physical number) > would be as follows. > > t1 (i1, 1, 1) (i4, 2, 2) > t2 (i1, 1, 1) (i4, 2, 3) (i2, 3, 2) > t3 (i1, 1, 1) (i4, 2, 4) (i2, 3, 2) (i3, 4, 3) > > At this point the compilation of 'select * from t1(*?)' would mean > select (physical #1),(physical #2) from t1 + > select (physical #1),(physical #3) from t2 + > select (physical #1),(physical #4) from t3 > > Note that physical # aren't common for column i4. > I've wanted to confirm that above compilation would be OK for > the (near) future enhancement of inheritance functionality. > > Regards. > > Hiroshi Inoue > Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > create table t1 (i1 int4); > create table t2 (i2 int4) inherits t1; > create table t3 (i3 int4) inherits t2; > alter table t1 add column i4 int4; > For each table,the list of (column, logical number, physical number) > would be as follows. > t1 (i1, 1, 1) (i4, 2, 2) > t2 (i1, 1, 1) (i4, 2, 3) (i2, 3, 2) > t3 (i1, 1, 1) (i4, 2, 4) (i2, 3, 2) (i3, 4, 3) > At this point the compilation of 'select * from t1(*?)' would mean > select (physical #1),(physical #2) from t1 + > select (physical #1),(physical #3) from t2 + > select (physical #1),(physical #4) from t3 > Note that physical # aren't common for column i4. That's no different from the current situation: the planner already must (and does) adjust column numbers for each derived table while expanding an inherited query. It's kind of a pain but hardly an insurmountable problem. Currently the matching is done by column name. We could possibly match on logical column position instead --- not sure if that's better or worse. regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > create table t1 (i1 int4); > > create table t2 (i2 int4) inherits t1; > > create table t3 (i3 int4) inherits t2; > > alter table t1 add column i4 int4; > > > For each table,the list of (column, logical number, physical number) > > would be as follows. > > > t1 (i1, 1, 1) (i4, 2, 2) > > t2 (i1, 1, 1) (i4, 2, 3) (i2, 3, 2) > > t3 (i1, 1, 1) (i4, 2, 4) (i2, 3, 2) (i3, 4, 3) > > > At this point the compilation of 'select * from t1(*?)' would mean > > select (physical #1),(physical #2) from t1 + > > select (physical #1),(physical #3) from t2 + > > select (physical #1),(physical #4) from t3 > > > Note that physical # aren't common for column i4. > > That's no different from the current situation: Yes your proposal has no problem currently. I'm only anxious about oo feature. Recently there has been a discussion around oo and we would be able to expect the progress in the near future. If oo people never mind, your proposal would be OK. Regards. Hiroshi Inoue Inoue@tpf.co.jp
OK, I am opening this can of worms again. I personally would like to see this code activated, even if it does take 2x the disk space to alter a column. Hiroshi had other ideas. Where did we leave this? We have one month to decide on a plan. > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > You can exclusively lock the table, then do a heap_getnext() scan over > > the entire table, remove the dropped column, do a heap_insert(), then a > > heap_delete() on the current tuple, making sure to skip over the tuples > > inserted by the current transaction. When completed, remove the column > > from pg_attribute, mark the transaction as committed (if desired), and > > run vacuum over the table to remove the deleted rows. > > Hmm, that would work --- the new tuples commit at the same instant that > the schema updates commit, so it should be correct. You have the 2x > disk usage problem, but there's no way around that without losing > rollback ability. > > A potentially tricky bit will be persuading the tuple-reading and > tuple-writing subroutines to pay attention to different versions of the tuple > structure for the same table. I haven't looked to see if this will be > difficult or not. If you can pass the TupleDesc explicitly then it > shouldn't be a problem. > > I'd suggest that the cleanup vacuum *not* be an automatic part of > the operation; just recommend that people do it ASAP after dropping > a column. Consider needing to drop several columns... > > regards, tom lane > > ************ > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > OK, I am opening this can of worms again. I personally would like to > see this code activated, even if it does take 2x the disk space to alter > a column. Hiroshi had other ideas. Where did we leave this? We have > one month to decide on a plan. I think the plan should be to do nothing for 7.1. ALTER DROP COLUMN isn't an especially pressing feature, and so I don't feel that we should be hustling to squeeze it in just before beta. We're already overdue for beta. regards, tom lane
> -----Original Message----- > From: Tom Lane > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > OK, I am opening this can of worms again. I personally would like to > > see this code activated, even if it does take 2x the disk space to alter > > a column. Hiroshi had other ideas. Where did we leave this? We have > > one month to decide on a plan. > > I think the plan should be to do nothing for 7.1. ALTER DROP COLUMN > isn't an especially pressing feature, and so I don't feel that we > should be hustling to squeeze it in just before beta. We're already > overdue for beta. > Seems some people expect the implementation in 7.1. (recent [GENERAL] drop column?) I could commit my local branch if people don't mind backward incompatibility. I've maintained the branch for more than 1 month and it implements the following TODOs.

* Add ALTER TABLE DROP COLUMN feature
* ALTER TABLE ADD COLUMN to inherited table put column in wrong place
* Prevent column dropping if column is used by foreign key

Comments ? Hiroshi Inoue P.S. I've noticed that get_rte_attribute_name() seems to break my implementation. I'm not sure if I could solve it.
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > Seems some people expect the implementation in 7.1. > (recent [GENERAL} drop column?) > I could commit my local branch if people don't mind > backward incompatibility. I've lost track --- is this different from the _DROP_COLUMN_HACK__ code that's already in CVS? I really really didn't like that implementation :-(, but I forget what other methods were being discussed. > P.S. I've noticed that get_rte_attribute_name() seems to > break my implementation. I'm not sure if I could solve it. That would be a problem --- rule dumping depends on that code to produce correct aliases, so making it work is not optional. regards, tom lane
Tom Lane wrote: > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > Seems some people expect the implementation in 7.1. > > (recent [GENERAL] drop column?) > > I could commit my local branch if people don't mind > > backward incompatibility. > > I've lost track --- is this different from the _DROP_COLUMN_HACK__ > code that's already in CVS? I really really didn't like that > implementation :-(, but I forget what other methods were being > discussed. > My current local trial implementation follows your idea (logical/physical attribute numbers). > > P.S. I've noticed that get_rte_attribute_name() seems to > > break my implementation. I'm not sure if I could solve it. > > That would be a problem --- rule dumping depends on that code to > produce correct aliases, so making it work is not optional. > Your change has no problem if logical==physical attribute numbers. Regards. Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes: >>>> P.S. I've noticed that get_rte_attribute_name() seems to >>>> break my implementation. I'm not sure if I could solve it. >> >> That would be a problem --- rule dumping depends on that code to >> produce correct aliases, so making it work is not optional. > Your change has no problem if logical==physical attribute > numbers. But if they're not, what do we do? Can we define the order of the alias-name lists as being one or the other numbering? (Offhand I'd say it should be logical numbering, but I haven't chased the details.) If neither of those work, we'll need some more complex datastructure than a simple list. regards, tom lane
Tom Lane wrote: > Hiroshi Inoue <Inoue@tpf.co.jp> writes: > >>>> P.S. I've noticed that get_rte_attribute_name() seems to > >>>> break my implementation. I'm not sure if I could solve it. > >> > >> That would be a problem --- rule dumping depends on that code to > >> produce correct aliases, so making it work is not optional. > > > Your change has no problem if logical==physical attribute > > numbers. > > But if they're not, what do we do? Can we define the order of the > alias-name lists as being one or the other numbering? (Offhand I'd > say it should be logical numbering, but I haven't chased the details.) > If neither of those work, we'll need some more complex datastructure > than a simple list. > I'm not sure if we could keep invariant attribute numbers. Though I've used physical attribute numbers as many as possible in my trial implementation,there's already an exception. I had to use logical attribute numbers for FieldSelect node. Regards. Hiroshi Inoue
At 12:05 6/10/00 +0900, Hiroshi Inoue wrote: > >Tom Lane wrote: > >> Hiroshi Inoue <Inoue@tpf.co.jp> writes: >> >>>> P.S. I've noticed that get_rte_attribute_name() seems to >> >>>> break my implementation. I'm not sure if I could solve it. >> >> >> >> That would be a problem --- rule dumping depends on that code to >> >> produce correct aliases, so making it work is not optional. >> >> > Your change has no problem if logical==physical attribute >> > numbers. >> >> But if they're not, what do we do? Can we define the order of the >> alias-name lists as being one or the other numbering? (Offhand I'd >> say it should be logical numbering, but I haven't chased the details.) >> If neither of those work, we'll need some more complex datastructure >> than a simple list. >> > >I'm not sure if we could keep invariant attribute numbers. >Though I've used physical attribute numbers as many as possible >in my trial implementation,there's already an exception. >I had to use logical attribute numbers for FieldSelect node. > Not really a useful suggestion at this stage, but it seems to me that storing plans and/or parse trees is possibly a false economy. Would it be worth considering storing the relevant SQL (or a parse tree with field & table names) and compiling the rule in each backend the first time it is used? (and keep it for the life of the backend). This would allow underlying view tables to be deleted/added as well as make the above problem go away. The 'parse tree with names' would also enable easy construction of dependency information when and if that is implemented... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
seconded ... On Fri, 29 Sep 2000, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > OK, I am opening this can of worms again. I personally would like to > > see this code activated, even if it does take 2x the disk space to alter > > a column. Hiroshi had other ideas. Where did we leave this? We have > > one month to decide on a plan. > > I think the plan should be to do nothing for 7.1. ALTER DROP COLUMN > isn't an especially pressing feature, and so I don't feel that we > should be hustling to squeeze it in just before beta. We're already > overdue for beta. > > regards, tom lane > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Thu, 5 Oct 2000, Tom Lane wrote: > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > Seems some people expect the implementation in 7.1. > > (recent [GENERAL} drop column?) > > I could commit my local branch if people don't mind > > backward incompatibility. there have been several ideas thrown back and forth ... the best one that I saw, forgetting who suggested it, had to do with the idea of locking the table and doing an effective vacuum on that table with a 'row re-write' happening ... Basically, move the first 100 rows to the end of the table file, then take 100 and write it to position 0, 101 to position 1, etc ... that way, at max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table size ... either method is going to lock the file for a period of time, but one is much more friendly as far as disk space is concerned *plus*, if RAM is available for this, it might even be something that the backend could use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and the table is 24Meg in size, it could do it all in memory?
> Basically, move the first 100 rows to the end of the table file, then take > 100 and write it to position 0, 101 to position 1, etc ... that way, at > max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table > size ... either method is going to lock the file for a period of time, but > one is much more friendly as far as disk space is concerned *plus*, if RAM > is available for this, it might even be something that the backend could > use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and > the table is 24Meg in size, it could do it all in memory? Yes, I liked that too. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Basically, move the first 100 rows to the end of the table file, then take >> 100 and write it to position 0, 101 to position 1, etc ... that way, at >> max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table >> size ... either method is going to lock the file for a period of time, but >> one is much more friendly as far as disk space is concerned *plus*, if RAM >> is available for this, it might even be something that the backend could >> use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and >> the table is 24Meg in size, it could do it all in memory? > Yes, I liked that too. What happens if you crash partway through? I don't think it's possible to build a crash-robust rewriting ALTER process that doesn't use 2X disk space: you must have all the old tuples AND all the new tuples down on disk simultaneously just before you commit. The only way around 2X disk space is to adopt some logical renumbering approach to the columns, so that you can pretend the dropped column isn't there anymore when it really still is. regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> Basically, move the first 100 rows to the end of the table file, then take > >> 100 and write it to position 0, 101 to position 1, etc ... that way, at > >> max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table > >> size ... either method is going to lock the file for a period of time, but > >> one is much more friendly as far as disk space is concerned *plus*, if RAM > >> is available for this, it might even be something that the backend could > >> use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and > >> the table is 24Meg in size, it could do it all in memory? > > > Yes, I liked that too. > > What happens if you crash partway through? > > I don't think it's possible to build a crash-robust rewriting ALTER > process that doesn't use 2X disk space: you must have all the old tuples > AND all the new tuples down on disk simultaneously just before you > commit. The only way around 2X disk space is to adopt some logical > renumbering approach to the columns, so that you can pretend the dropped > column isn't there anymore when it really still is. Yes, I liked the 2X disk space, and making the new tuples visible all at once at the end. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Mon, 9 Oct 2000, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> Basically, move the first 100 rows to the end of the table file, then take > >> 100 and write it to position 0, 101 to position 1, etc ... that way, at > >> max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table > >> size ... either method is going to lock the file for a period of time, but > >> one is much more friendly as far as disk space is concerned *plus*, if RAM > >> is available for this, it might even be something that the backend could > >> use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and > >> the table is 24Meg in size, it could do it all in memory? > > > Yes, I liked that too. > > What happens if you crash partway through? what happens if you crash partway through a vacuum? > I don't think it's possible to build a crash-robust rewriting ALTER > process that doesn't use 2X disk space: you must have all the old > tuples AND all the new tuples down on disk simultaneously just before > you commit. The only way around 2X disk space is to adopt some > logical renumbering approach to the columns, so that you can pretend > the dropped column isn't there anymore when it really still is. how about a combination of the two? basically, we're gonna want a vacuum of the table after the alter to clean out those extra columns that we've marked as 'dead' ... basically, anything that avoids that whole 2x disk space option is cool ...
On Mon, 9 Oct 2000, Bruce Momjian wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > >> Basically, move the first 100 rows to the end of the table file, then take > > >> 100 and write it to position 0, 101 to position 1, etc ... that way, at > > >> max, you are using ( tuple * 100 ) bytes of disk space, vs 2x the table > > >> size ... either method is going to lock the file for a period of time, but > > >> one is much more friendly as far as disk space is concerned *plus*, if RAM > > >> is available for this, it might even be something that the backend could > > >> use up to -S blocks of RAM to do it off disk? If I set -S to 64meg, and > > >> the table is 24Meg in size, it could do it all in memory? > > > > > Yes, I liked that too. > > > > What happens if you crash partway through? > > > > I don't think it's possible to build a crash-robust rewriting ALTER > > process that doesn't use 2X disk space: you must have all the old tuples > > AND all the new tuples down on disk simultaneously just before you > > commit. The only way around 2X disk space is to adopt some logical > > renumbering approach to the columns, so that you can pretend the dropped > > column isn't there anymore when it really still is. > > Yes, I liked the 2X disk space, and making the new tuples visible all at > once at the end. man, are you ever wishy-washy on this issue, aren't you? :) you like not using 2x, you like using 2x ... :)
The Hermit Hacker <scrappy@hub.org> writes: >> What happens if you crash partway through? > what happens if you crash partway through a vacuum? Nothing. Vacuum is crash-safe. ALTER TABLE should be too. regards, tom lane
On Mon, 9 Oct 2000, Tom Lane wrote: > The Hermit Hacker <scrappy@hub.org> writes: > >> What happens if you crash partway through? > > > what happens if you crash partway through a vacuum? > > Nothing. Vacuum is crash-safe. ALTER TABLE should be too. Sorry, that's what I meant ... why should marking a column as 'deleted' and running a 'vacuum' to clean up the physical table be any less crash-safe?
> On Mon, 9 Oct 2000, Tom Lane wrote: > > > The Hermit Hacker <scrappy@hub.org> writes: > > >> What happens if you crash partway through? > > > > > what happens if you crash partway through a vacuum? > > > > Nothing. Vacuum is crash-safe. ALTER TABLE should be too. > > Sorry, that's what I meant ... why should marking a column as 'deleted' > and running a 'vacuum' to clean up the physical table be any less > crash-safe? It is not. The only downside is 2x disk space to make new versions of the tuple. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Mon, 9 Oct 2000, Bruce Momjian wrote: > > On Mon, 9 Oct 2000, Tom Lane wrote: > > > > > The Hermit Hacker <scrappy@hub.org> writes: > > > >> What happens if you crash partway through? > > > > > > > what happens if you crash partway through a vacuum? > > > > > > Nothing. Vacuum is crash-safe. ALTER TABLE should be too. > > > > Sorry, that's what I meant ... why should marking a column as 'deleted' > > and running a 'vacuum' to clean up the physical table be any less > > crash-safe? > > It is not. The only downside is 2x disk space to make new versions of > the tuple. huh? vacuum moves/cleans up tuples, as well as compresses them, so that the end result is a smaller table than what it started with, at/with very little increase in the total size/space needed to perform the vacuum ... if we reduced vacuum such that it compressed at the field level vs tuple, we could move a few tuples to the end of the table (crash safe) and then move N+1 to position 1 minus that extra field. If we mark the column as being deleted, then if the system crashes part way through, it should be possible to continue after the system is brought up, no?
> > > Sorry, that's what I meant ... why should marking a column as 'deleted'
> > > and running a 'vacuum' to clean up the physical table be any less
> > > crash-safe?
> >
> > It is not.  The only downside is 2x disk space to make new versions of
> > the tuple.
>
> huh?  vacuum moves/cleans up tuples, as well as compresses them, so that
> the end result is a smaller table than what it started with, with very
> little increase in the total size/space needed to perform the vacuum ...
>
> if we reduced vacuum such that it compressed at the field level instead of
> the tuple level, we could move a few tuples to the end of the table (crash
> safe) and then move N+1 to position 1 minus that extra field.  If we mark
> the column as being deleted, then if the system crashes part way through,
> it should be possible to continue after the system is brought up, no?

If it crashes in the middle, some rows have the column removed, and some
do not.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Mon, 9 Oct 2000, Bruce Momjian wrote:

> > > > Sorry, that's what I meant ... why should marking a column as 'deleted'
> > > > and running a 'vacuum' to clean up the physical table be any less
> > > > crash-safe?
> > >
> > > It is not.  The only downside is 2x disk space to make new versions of
> > > the tuple.
> >
> > huh?  vacuum moves/cleans up tuples, as well as compresses them, so that
> > the end result is a smaller table than what it started with, with very
> > little increase in the total size/space needed to perform the vacuum ...
> >
> > if we reduced vacuum such that it compressed at the field level instead of
> > the tuple level, we could move a few tuples to the end of the table (crash
> > safe) and then move N+1 to position 1 minus that extra field.  If we mark
> > the column as being deleted, then if the system crashes part way through,
> > it should be possible to continue after the system is brought up, no?
>
> If it crashes in the middle, some rows have the column removed, and some
> do not.

hrmm .. mvcc uses a timestamp, no?  is there no way of using that
timestamp to determine which columns have/haven't been cleaned up
following a crash?  maybe some way of marking a table as being in a 'drop
column' mode, so that when it gets brought back up again, it is scanned
for any tuples older than that date?
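One way to flesh out the 'drop column mode' suggestion — a purely
hypothetical sketch; no such catalog machinery exists, and every name here
is invented for illustration:

    -- Record the in-progress drop before the physical scrub begins, so
    -- that crash recovery can find the table and resume the cleanup,
    -- using the start time as the cutoff for not-yet-rewritten tuples.
    CREATE TABLE pg_drop_column_in_progress (
        relid      oid,        -- table being scrubbed
        attnum     int2,       -- column being removed
        started_at timestamp   -- tuples older than this still carry the column
    );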
The Hermit Hacker <scrappy@hub.org> writes:
>> It is not.  The only downside is 2x disk space to make new versions of
>> the tuple.

> huh?  vacuum moves/cleans up tuples, as well as compresses them, so that
> the end result is a smaller table than what it started with, with very
> little increase in the total size/space needed to perform the vacuum ...

Huh?  Right back at you ;-).  Vacuum is very careful to make sure that it
always has two copies of any tuple it moves.  The reason it's not 2x disk
space is that it only moves tuples to fill free space in existing pages
of the file.  So the moved tuples take up space-that-was-free as well as
the space they were originally in.

But this has nothing whatever to do with the requirements of ALTER DROP
COLUMN --- to be safe, that must have two copies of every tuple, free
space or no free space.

			regards, tom lane
The Hermit Hacker <scrappy@hub.org> writes:
> hrmm .. mvcc uses a timestamp, no?  is there no way of using that
> timestamp to determine which columns have/haven't been cleaned up
> following a crash?  maybe some way of marking a table as being in a 'drop
> column' mode, so that when it gets brought back up again, it is scanned
> for any tuples older than that date?

WAL would provide the framework to do something like that, but I still
say it'd be a bad idea.  What you're describing is
irrevocable-once-it-starts DROP COLUMN; there is no way to roll it back.
We're trying to get rid of statements that act that way, not add more.

I am not convinced that a 2x penalty for DROP COLUMN is such a huge
problem that we should give up all the normal safety features of SQL
in order to avoid it.  Seems to me that DROP COLUMN is only a big
issue during DB development, when you're usually working with
relatively small amounts of test data anyway.

			regards, tom lane
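Tom's rollback point, made concrete — a sketch assuming the 2X rewriting
implementation and hypothetical names t and c:

    -- Because old and new tuple versions coexist on disk until commit,
    -- the drop is an ordinary transactional statement and can be abandoned.
    BEGIN;
    ALTER TABLE t DROP COLUMN c;
    -- ... realize c was the wrong column ...
    ROLLBACK;   -- t is exactly as it was before
    -- An in-place, resume-after-crash scrub could not offer this ROLLBACK.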
On Mon, 9 Oct 2000, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > hrmm .. mvcc uses a timestamp, no?  is there no way of using that
> > timestamp to determine which columns have/haven't been cleaned up
> > following a crash?  maybe some way of marking a table as being in a 'drop
> > column' mode, so that when it gets brought back up again, it is scanned
> > for any tuples older than that date?
>
> WAL would provide the framework to do something like that, but I still
> say it'd be a bad idea.  What you're describing is
> irrevocable-once-it-starts DROP COLUMN; there is no way to roll it back.
> We're trying to get rid of statements that act that way, not add more.

Hrmmmm ... this one I can't really argue, or, at least, can't think of
anything right now :(

> I am not convinced that a 2x penalty for DROP COLUMN is such a huge
> problem that we should give up all the normal safety features of SQL
> in order to avoid it.  Seems to me that DROP COLUMN is only a big
> issue during DB development, when you're usually working with
> relatively small amounts of test data anyway.

Actually, I could see DROP COLUMN being useful in a few other places ...
recently, I spent several hours re-structuring a client's database that
had been built by someone else who didn't know what 'relational' means in
RDBMS ... or how about an application developer that decides to
restructure their schemas in a new release and provides an 'upgrade.sql'
script that is designed to do this?

A good example might be the UDMSearch stuff, where you have tables that
are quite large, but they decide that they want to remove the 'base URL'
component of one table and put it into another table?  a nice update
script could go something like (pseudo-code):

    ADD COLUMN base_url int;
    INSERT INTO new_table SELECT base_url_text FROM table;
    DROP COLUMN base_url_text;

That would make for a very painful upgrade process if I have to go through
the trouble of upgrading my hardware to add more space ...
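Filled out, an upgrade script along those lines might look like the
following — a hedged sketch only; the table and column names (url_word,
base_url, base_url_text) are hypothetical stand-ins for the UDMSearch-style
schema described above:

    BEGIN;
    -- Factor the repeated base-URL text out into its own table ...
    CREATE TABLE base_url (id serial PRIMARY KEY, url text);
    INSERT INTO base_url (url)
        SELECT DISTINCT base_url_text FROM url_word;
    -- ... point each row at its new parent ...
    ALTER TABLE url_word ADD COLUMN base_url_id int;
    UPDATE url_word
       SET base_url_id = b.id
      FROM base_url b
     WHERE b.url = url_word.base_url_text;
    -- ... and drop the old column: the step that costs 2x the table size
    -- under the rewriting implementation discussed in this thread.
    ALTER TABLE url_word DROP COLUMN base_url_text;
    COMMIT;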
> The Hermit Hacker <scrappy@hub.org> writes:
> > hrmm .. mvcc uses a timestamp, no?  is there no way of using that
> > timestamp to determine which columns have/haven't been cleaned up
> > following a crash?  maybe some way of marking a table as being in a 'drop
> > column' mode, so that when it gets brought back up again, it is scanned
> > for any tuples older than that date?
>
> WAL would provide the framework to do something like that, but I still
> say it'd be a bad idea.  What you're describing is
> irrevocable-once-it-starts DROP COLUMN; there is no way to roll it back.
> We're trying to get rid of statements that act that way, not add more.
>
> I am not convinced that a 2x penalty for DROP COLUMN is such a huge
> problem that we should give up all the normal safety features of SQL
> in order to avoid it.  Seems to me that DROP COLUMN is only a big issue
> during DB development, when you're usually working with relatively small
> amounts of test data anyway.

Bingo!

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Mon, 9 Oct 2000, Bruce Momjian wrote:

> > The Hermit Hacker <scrappy@hub.org> writes:
> > > hrmm .. mvcc uses a timestamp, no?  is there no way of using that
> > > timestamp to determine which columns have/haven't been cleaned up
> > > following a crash?  maybe some way of marking a table as being in a 'drop
> > > column' mode, so that when it gets brought back up again, it is scanned
> > > for any tuples older than that date?
> >
> > WAL would provide the framework to do something like that, but I still
> > say it'd be a bad idea.  What you're describing is
> > irrevocable-once-it-starts DROP COLUMN; there is no way to roll it back.
> > We're trying to get rid of statements that act that way, not add more.
> >
> > I am not convinced that a 2x penalty for DROP COLUMN is such a huge
> > problem that we should give up all the normal safety features of SQL
> > in order to avoid it.  Seems to me that DROP COLUMN is only a big issue
> > during DB development, when you're usually working with relatively small
> > amounts of test data anyway.
>
> Bingo!

you are jumping on your 'I agree/Bingo' much, much too fast :)
> On Mon, 9 Oct 2000, Bruce Momjian wrote:
>
> > > The Hermit Hacker <scrappy@hub.org> writes:
> > > > hrmm .. mvcc uses a timestamp, no?  is there no way of using that
> > > > timestamp to determine which columns have/haven't been cleaned up
> > > > following a crash?  maybe some way of marking a table as being in a 'drop
> > > > column' mode, so that when it gets brought back up again, it is scanned
> > > > for any tuples older than that date?
> > >
> > > WAL would provide the framework to do something like that, but I still
> > > say it'd be a bad idea.  What you're describing is
> > > irrevocable-once-it-starts DROP COLUMN; there is no way to roll it back.
> > > We're trying to get rid of statements that act that way, not add more.
> > >
> > > I am not convinced that a 2x penalty for DROP COLUMN is such a huge
> > > problem that we should give up all the normal safety features of SQL
> > > in order to avoid it.  Seems to me that DROP COLUMN is only a big issue
> > > during DB development, when you're usually working with relatively small
> > > amounts of test data anyway.
> >
> > Bingo!
>
> you are jumping on your 'I agree/Bingo' much, much too fast :)

You know this DROP COLUMN is a hot button for me.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Mon, 9 Oct 2000, Bruce Momjian wrote:

> > On Mon, 9 Oct 2000, Bruce Momjian wrote:
> >
> > > > The Hermit Hacker <scrappy@hub.org> writes:
> > > > > hrmm .. mvcc uses a timestamp, no?  is there no way of using that
> > > > > timestamp to determine which columns have/haven't been cleaned up
> > > > > following a crash?  maybe some way of marking a table as being in a 'drop
> > > > > column' mode, so that when it gets brought back up again, it is scanned
> > > > > for any tuples older than that date?
> > > >
> > > > WAL would provide the framework to do something like that, but I still
> > > > say it'd be a bad idea.  What you're describing is
> > > > irrevocable-once-it-starts DROP COLUMN; there is no way to roll it back.
> > > > We're trying to get rid of statements that act that way, not add more.
> > > >
> > > > I am not convinced that a 2x penalty for DROP COLUMN is such a huge
> > > > problem that we should give up all the normal safety features of SQL
> > > > in order to avoid it.  Seems to me that DROP COLUMN is only a big issue
> > > > during DB development, when you're usually working with relatively small
> > > > amounts of test data anyway.
> > >
> > > Bingo!
> >
> > you are jumping on your 'I agree/Bingo' much, much too fast :)
>
> You know this DROP COLUMN is a hot button for me.

Ya, but in one email, you appear to agree with me ... then Tom posts a
good point and you jump over to that side ... at least pick a side? :)  I
too wish to see it implemented, I just don't want to have to double my
disk space if at some point I decide to upgrade an application and find
out that they decided to change their schema(s) :(
> > > > > I am not convinced that a 2x penalty for DROP COLUMN is such a huge
> > > > > problem that we should give up all the normal safety features of SQL
> > > > > in order to avoid it.  Seems to me that DROP COLUMN is only a big issue
> > > > > during DB development, when you're usually working with relatively small
> > > > > amounts of test data anyway.
> > > >
> > > > Bingo!
> > >
> > > you are jumping on your 'I agree/Bingo' much, much too fast :)
> >
> > You know this DROP COLUMN is a hot button for me.
>
> Ya, but in one email, you appear to agree with me ... then Tom posts a
> good point and you jump over to that side ... at least pick a side? :)  I
> too wish to see it implemented, I just don't want to have to double my
> disk space if at some point I decide to upgrade an application and find
> out that they decided to change their schema(s) :(

Sorry, I liked the vacuum idea, but with 2x disk usage, not 100 rows at a
time.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
At 07:55 PM 10/9/00 -0300, The Hermit Hacker wrote:

>> I am not convinced that a 2x penalty for DROP COLUMN is such a huge
>> problem that we should give up all the normal safety features of SQL
>> in order to avoid it.  Seems to me that DROP COLUMN is only a big
>> issue during DB development, when you're usually working with
>> relatively small amounts of test data anyway.
>
>Actually, I could see DROP COLUMN being useful in a few other places
>... recently, I spent several hours re-structuring a client's database that
>had been built by someone else who didn't know what 'relational' means in
>RDBMS ... or how about an application developer that decides to
>restructure their schemas in a new release and provides an 'upgrade.sql'
>script that is designed to do this?

This last example is one reason DROP COLUMN would be a great help to the
OpenACS development effort.  However, upgrades (new releases) are fairly
infrequent, and many users of current versions won't bother unless they've
run into toolkit bugs (same goes for updating PG).  Those who do know that
doing an upgrade will require planning, testing on a system that's not
running their "live" website, and some amount of downtime.  So I don't
think a 2x penalty is a huge problem.

>That would make for a very painful upgrade process if I have to go through
>the trouble of upgrading my hardware to add more space ...

For many folks, if eating 2x the size of a single table runs their system
out of disk space, clearly they should've upgraded long, long ago.  An
OpenACS site has hundreds of tables; I can't imagine running my disk space
so tight that I couldn't double the size of one of them long enough to do
a DROP COLUMN.

Obviously, some folks doing other things will have single tables that are
huge, but after all they can always do what they do now - not drop columns.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
The Hermit Hacker wrote:
>
> On Mon, 9 Oct 2000, Bruce Momjian wrote:
>
> Ya, but in one email, you appear to agree with me ... then Tom posts a
> good point and you jump over to that side ... at least pick a side? :)  I
> too wish to see it implemented, I just don't want to have to double my
> disk space if at some point I decide to upgrade an application and find
> out that they decided to change their schema(s) :(

As Don already pointed out, if you don't have enough room to double your
table size you must be running a one-table, append-only application where
you can only do a very limited set of queries.

    select * from that_table order by some_column_without_an_index;

is definitely out, as it takes many times the space of that_table anyway.

There _may_ be some cases where 2x is unacceptable, but without radically
changing tables' on-disk structure there is no way to avoid it and still
be able to roll back or even crash cleanly ;)

-----------------
Hannu