Thread: DROP TABLE inside a transaction block
I see the following in HISTORY:

	Disallow DROP TABLE/DROP INDEX inside a transaction block

However, it seems that this is not done with current?

test=# create table t1(i int);
CREATE
test=# begin;
BEGIN
test=# drop table t1;
NOTICE:  Caution: DROP TABLE cannot be rolled back, so don't abort now
DROP
test=# end;
COMMIT
test=# \d
No relations found.
--
Tatsuo Ishii
Tatsuo Ishii writes:

> I see following in HISTORY:
>
> 	Disallow DROP TABLE/DROP INDEX inside a transaction block
>
> However, it seems that this is not done with current?
>
> test=# create table t1(i int);
> CREATE
> test=# begin;
> BEGIN
> test=# drop table t1;
> NOTICE:  Caution: DROP TABLE cannot be rolled back, so don't abort now

Wow, with all due respect, that's pretty sh^H^Hpoor. That's like saying
"Haha, either you commit your transaction or your database is fried."
Any reason that's not an ERROR before anything destructive is done?

> DROP
> test=# end;
> COMMIT
> test=# \d
> No relations found.

--
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden
Peter Eisentraut wrote:

> Tatsuo Ishii writes:
>
> > I see following in HISTORY:
> >
> > 	Disallow DROP TABLE/DROP INDEX inside a transaction block
> >
> > However, it seems that this is not done with current?
> >
> > test=# begin;
> > BEGIN
> > test=# drop table t1;
> > NOTICE:  Caution: DROP TABLE cannot be rolled back, so don't abort now
>
> Wow, with all due respect, that's pretty sh^H^Hpoor. That's like saying
> "Haha, either you commit your transaction or your database is fried."
> Any reason that's not an ERROR before anything destructive is done?

We had an elaborate discussion on this very topic several months ago.
What it comes down to is three possible options:

1) Allow DDL statements in transactions. If the transaction aborts,
   currently, corruption can result. Some DDL statements (such as
   TRUNCATE) make no sense with respect to ROLLBACK. So, I guess, the
   idea is that SOME DDL statements will be ROLLBACK-able and some
   won't - yuck.

2) Disallow DDL statements in transactions. This would break code that
   is working for people right now -- but only because their
   transactions are being committed between the time they issue the DDL
   statement and the COMMIT (or END), instead of aborting and causing
   their database to become corrupt, or requiring manual removal of
   files when the catalogue gets out of sync with the filesystem.

3) Implicitly commit the running transaction and begin a new one. Only
   Vadim and I support this notion, although this is precisely what
   Oracle does (not that that should define PostgreSQL's behavior, of
   course).

Everyone else, it seems, wants to try to implement #1 successfully...
(I don't see it happening any time soon). So, as some sort of
compromise, a NOTICE was issued.

Mike Mascari
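[Editorial illustration] The hazard Mike describes under option 1 can be sketched as a psql session. The table name and the final error text are made up for illustration; the point is the failure mode under the 7.0-era implementation, where DROP TABLE unlinks the file immediately:

```sql
BEGIN;
DROP TABLE t1;      -- the backend unlinks t1's data file right away
ROLLBACK;           -- the pg_class entry for t1 reappears...
SELECT * FROM t1;   -- ...but the underlying file is gone, so this fails
```

The catalogue and the filesystem now disagree, which is exactly the "corruption" and "manual removal of files" scenario the three options are trying to avoid.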
On Sun, 5 Mar 2000, Mike Mascari wrote:

> 1) Allow DDL statements in transactions. If the transaction aborts,
>    currently, corruption can result. Some DDL statements
                ^^^^^^^^^^^^^^^^^^^^^
I think those are the key words.

> (such as TRUNCATE) make no sense with respect to ROLLBACK. So, I
> guess, the idea is that SOME DDL statements will be ROLLBACK-able
> and some won't - yuck.

I don't see a problem with disallowing some DDL commands in a
transaction as long as they throw an error and the transaction aborts.
Users see this and don't do it next time. Sure it's inconsistent, but
the current state is plain bad, sorry.

> 3) Implicitly commit the running transaction and begin a new one.
>    Only Vadim and I support this notion, although this is precisely
>    what Oracle does (not that that should define PostgreSQL's
>    behavior, of course). Everyone else, it seems, wants to try to
>    implement #1 successfully... (I don't see it happening any time
>    soon).

I support that too, since it also happens to be SQL's idea more or
less. One of these days we'll have to offer this as an option. At
least for commands for which #1 doesn't work yet.

--
Peter Eisentraut <peter_e@gmx.net>
At 07:59 6/03/00 +0100, Peter Eisentraut wrote:

>> (such as TRUNCATE) make no sense with respect to ROLLBACK. So, I
>> guess, the idea is that SOME DDL statements will be ROLLBACK-able
>> and some won't - yuck.
>
> I don't see a problem with disallowing some DDL commands in a transaction
> as long as they throw an error and the transaction aborts.

Is it really necessary to abort the TX? Seems a little antisocial -
can't you just return an error, and let the user/application decide if
it needs to abort?

>> 3) Implicitly commit the running transaction and begin a new one.
>>    Only Vadim and I support this notion, although this is precisely
>>    what Oracle does (not that that should define PostgreSQL's
>>    behavior, of course). Everyone else, it seems, wants to try to
>>    implement #1 successfully... (I don't see it happening any time
>>    soon).
>
> I support that too since it also happens to be SQL's idea more or less.
> One of these days we'll have to offer this as an option. At least for
> commands for which #1 doesn't work yet.

Do you really mean it when you say 'Implicitly commit the running
transaction'? I would be deeply opposed to this philosophically, if so.
No TX should ever be committed unless the user requests it.

Just my 0.02c

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |       /(@)   ______---_
Tel: +61-03-5367 7422            |            _________  \
Fax: +61-03-5367 7430            |   ___________  |
Http://www.rhyme.com.au          |                /    \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/
Philip Warner wrote:

> Is it really necessary to abort the TX? Seems a little antisocial -
> can't you just return an error, and let the user/application decide
> if it needs to abort?
>
> Do you really mean it when you say 'Implicitly commit the running
> transaction'? I would be deeply opposed to this philosophically, if
> so. No TX should ever be committed unless the user requests it.
>
> Just my 0.02c

Philosophically, I agree with you 100%. And apparently, from the
previous discussion on this issue, databases like Informix are
completely capable of rolling back DDL statements like DROP TABLE,
ALTER TABLE RENAME, etc. However, the complexity involved apparently
was too much for Oracle:

   "ORACLE implicitly commits the current transaction before and after
   every Data Definition Language statement."

It's just my feeling that robustness is the number one priority, and
that the current state is kind of "riding the fence" between ORACLE
and Informix. On either side of the fence it is safe, but in the
middle, you risk corruption.
Naturally, I'd like to see PostgreSQL on the Informix side of the
fence, but I don't see it happening any time soon. And the ORACLE side
of the fence is far easier to implement. Or, of course, you could
choose Peter's suggestion and disallow the DDL statement entirely. But
as soon as that happened, all those people who begin their .cgi
programs with BEGIN and end them with END will say, "Hey, if we can't
use DDL statements in transactions, can't we at least do what Oracle
does so we don't have to change our code?"

Mike Mascari
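[Editorial illustration] The Oracle behavior Mike quotes can be sketched as a session; the table names are made up, and this shows the semantics of option 3, not anything PostgreSQL does:

```sql
BEGIN;
INSERT INTO accounts VALUES (1, 100);
DROP TABLE scratch;   -- under Oracle rules, this implicitly COMMITs the
                      -- INSERT before the DDL runs, then starts a new TX
ROLLBACK;             -- too late: the INSERT was already committed
```

This is why Philip objects below: the user asked for a rollback and got a partial commit instead. On the other hand, the catalogue and the filesystem can never disagree.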
On Mon, 6 Mar 2000, Philip Warner wrote:

> > I don't see a problem with disallowing some DDL commands in a
> > transaction as long as they throw an error and the transaction
> > aborts.
>
> Is it really necessary to abort the TX? Seems a little antisocial -
> can't you just return an error, and let the user/application decide
> if it needs to abort?

I'm afraid yes, it is necessary. Either the whole transaction or none
of it. Anything else is opening a can of worms that you can't control
unless you have a Ph.D. in fancy databases or something.

(Incidentally, I know that a non-zero number of people around here
have one of those, but that won't help the rest of us much. :{ )

--
Peter Eisentraut <peter_e@gmx.net>
Mike Mascari <mascarm@mascari.com> writes:

> So, as some sort of compromise, a NOTICE was issued.

It seems everybody but Mike has forgotten the previous go-round on
this issue. I had in fact put in an ERROR for DROP TABLE inside a
transaction block, and was beat up for it --- on the very reasonable
grounds that it's useful to be able to drop a table and do some other
things inside a transaction. Although we can't support
rollback-ability for such a transaction right now, we *do* support the
atomic nature of such a transaction. It's not reasonable to take away
a capability that was available in prior releases just because it's
got deficiencies. So the compromise was to issue a NOTICE instead of
an ERROR.

BTW, we are not *that* far from being able to roll back a DROP TABLE.
The only thing that's really needed is for everyone to take a deep
breath and let go of the notion that table files ought to be named
after the tables. If we named table files after the OIDs of their
tables, then rollback-able DROP or RENAME TABLE would be pretty
straightforward. If you don't recall why this is, consult the
pghackers archives...

			regards, tom lane
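[Editorial illustration] The use case Tom defends -- atomic, even if not rollback-able -- looks something like this; the table and column names are invented for the example:

```sql
BEGIN;
DROP TABLE prices;                              -- old version goes away
CREATE TABLE prices (item text, amount numeric);
INSERT INTO prices
    SELECT item, amount * 1.1 FROM prices_staging;
COMMIT;   -- other sessions never observe the table half-rebuilt
```

Concurrent readers see either the old table or the new one, never the intermediate state; what is missing is only the ability to ROLLBACK after the DROP.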
> It seems everybody but Mike has forgotten the previous go-round on
> this issue. I had in fact put in an ERROR for DROP TABLE inside a
> transaction block, and was beat up for it --- on the very reasonable
> grounds that it's useful to be able to drop a table and do some other
> things inside a transaction. Although we can't support
> rollback-ability for such a transaction right now, we *do* support
> the atomic nature of such a transaction. It's not reasonable to take
> away a capability that was available in prior releases just because
> it's got deficiencies. So the compromise was to issue a NOTICE
> instead of an ERROR.
>
> BTW, we are not *that* far from being able to roll back a DROP TABLE.
> The only thing that's really needed is for everyone to take a deep
> breath and let go of the notion that table files ought to be named
> after the tables. If we named table files after the OIDs of their
> tables, then rollback-able DROP or RENAME TABLE would be pretty
> straightforward. If you don't recall why this is, consult the
> pghackers archives...

So what was the conclusion for 7.0?

> 	Disallow DROP TABLE/DROP INDEX inside a transaction block

We should remove the above from HISTORY, no?
--
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:

>> BTW, we are not *that* far from being able to roll back a DROP TABLE.
>> The only thing that's really needed is for everyone to take a deep
>> breath and let go of the notion that table files ought to be named
>> after the tables. If we named table files after the OIDs of their
>> tables, then rollback-able DROP or RENAME TABLE would be pretty
>> straightforward. If you don't recall why this is, consult the
>> pghackers archives...

> So what was the conclusion for 7.0?

Too late to consider it for 7.0, I think. I'd like to see it happen in
7.1 or 7.2 or so.

>> 	Disallow DROP TABLE/DROP INDEX inside a transaction block

> We should remove above from HISTORY, no?

Yes, it's not correct.

			regards, tom lane
On Tue, Mar 07, 2000 at 02:53:49AM -0500, Tom Lane wrote:

> Mike Mascari <mascarm@mascari.com> writes:
> > So, as some sort of compromise, a NOTICE was issued.
>
> BTW, we are not *that* far from being able to roll back a DROP TABLE.
> The only thing that's really needed is for everyone to take a deep
> breath and let go of the notion that table files ought to be named
> after the tables. If we named table files after the OIDs of their
> tables, then rollback-able DROP or RENAME TABLE would be pretty
> straightforward. If you don't recall why this is, consult the
> pghackers archives...

Another data point regarding table filenames: I've been looking into
what's needed to support SQL92 schemas. The standard defines a
hierarchy of what are essentially scopes for database objects. They
are (in order from high to low):

	Cluster of catalogs -> catalog -> schema -> (DB objects)

"Cluster of catalogs" is defined as (section 4.13):

	Exactly one cluster is associated with an SQL-session and it
	defines the totality of the SQL-data that is available to that
	SQL-session.

A catalog is (section 4.12):

	Catalogs are named collections of schemas in an SQL-environment.
	An SQL-environment contains zero or more catalogs. A catalog
	contains one or more schemas, but always contains a schema named
	INFORMATION_SCHEMA that contains the views and domains of the
	Information Schema.

Catalog and schema names show up in the syntax, e.g.
the BNF for table names (section 5.4):

	<table name> ::= <qualified name>

	<qualified name> ::=
		[ <schema name> <period> ] <qualified identifier>

	<qualified identifier> ::= <identifier>

	<schema name> ::=
		[ <catalog name> <period> ] <unqualified schema name>

Which collapses to (using unique names for the various identifiers):

	<table name> ::=
		[ <catalog identifier> <period> ]
		[ <schema identifier> <period> ] <table identifier>

and makes a fully qualified column name BNF:

	[ <catalog identifier> <period> ]
	[ <schema identifier> <period> ]
	[ <table identifier> <period> ] <column identifier>

so:

	foo.bar.baz.bongo

is a well-formed column identifier for column bongo of table baz in
schema bar in catalog foo.

What's all this mean for pgsql? Well, SCHEMA is an Entry SQL
requirement. So, the syntax:

	schema.table

needs to be supported. Both schema and catalog define persistent
visibility scopes, and we need to support identical table names in
multiple schemas. I see two possibilities:

1) Map a pgsql database to a SQL schema.

   Since we need to support identical table names in multiple schemas,
   it might be tempting to map a pgsql database to a schema. In fact,
   since Entry SQL requires the syntax:

	CREATE SCHEMA <schema authorization identifier>

   and, in practice, the SCHEMA name seems to be equal to the database
   user name, the pgsql default of creating (and accessing) a DB
   matching the username implies this mapping. However, that means we
   need to solve the one-backend-accessing-multiple-DBs problem. I
   have a feeling that there may be 'gotchas' in the current backend
   code that presume that all the tuples are coming from one DB.

2) Map a pgsql DB to a SQL catalog.

   If we do this, the multi-DB access problem can be pushed down the
   road, since cross-catalog access (<catalog name> in identifiers) is
   not even required by Intermediate SQL, only Full SQL. In addition,
   almost everything about catalogs is 'implementation defined', so we
   get to claim them as done.
;-)

2a) However, if a single pgsql database is a catalog, then each DB
    needs to be able to contain tables in multiple schemas,
    potentially with identical table names. One solution would be to
    do what we do for DBs: create separate subdirs for each schema,
    and put the table files in there. Changes are probably isolated to
    the storage manager code, but I haven't looked in detail.

2b) Another possibility is what Tom has suggested to solve the
    DDL-statements-in-a-transaction problem: use some other unique
    identifier for table filenames, perhaps based on OID. Then,
    supporting schemas means supporting the syntax in the parser, and
    that's it, I think. This would seem to minimize the changes needed
    to implement this Entry SQL92 requirement.

So, what do y'all think?

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
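[Editorial illustration] Concretely, the Entry SQL syntax Ross describes would allow like-named tables to coexist in different schemas within one database; the schema and table names here are invented:

```sql
CREATE SCHEMA reedstrm;
CREATE TABLE reedstrm.experiments (id int, notes text);

-- same table name, different schema: no conflict
CREATE SCHEMA archive;
CREATE TABLE archive.experiments (id int, notes text);

SELECT count(*) FROM reedstrm.experiments;  -- schema-qualified reference
```

Under option 2b, neither CREATE SCHEMA nor the qualified names would require any filesystem layout changes, since the files would be identified by OID rather than by name.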
> I see following in HISTORY:
>
> 	Disallow DROP TABLE/DROP INDEX inside a transaction block
>
> However, it seems that this is not done with current?
>
> test=# create table t1(i int);
> CREATE
> test=# begin;
> BEGIN
> test=# drop table t1;
> NOTICE:  Caution: DROP TABLE cannot be rolled back, so don't abort now
> DROP
> test=# end;
> COMMIT
> test=# \d
> No relations found.

OK, seems it is fixed. I will remove the item.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> Tatsuo Ishii writes:
>
> > test=# drop table t1;
> > NOTICE:  Caution: DROP TABLE cannot be rolled back, so don't abort now
>
> Wow, with all due respect, that's pretty sh^H^Hpoor. That's like
> saying "Haha, either you commit your transaction or your database is
> fried." Any reason that's not an ERROR before anything destructive is
> done?

I tried it and the ABORT worked, so I have no idea now what is
happening here.

--
Bruce Momjian <pgman@candle.pha.pa.us>
> So what was the conclusion for 7.0?
>
> > 	Disallow DROP TABLE/DROP INDEX inside a transaction block
>
> We should remove above from HISTORY, no?

Yes, removed.

--
Bruce Momjian <pgman@candle.pha.pa.us>
> BTW, we are not *that* far from being able to roll back a DROP TABLE.
> The only thing that's really needed is for everyone to take a deep
> breath and let go of the notion that table files ought to be named
> after the tables. If we named table files after the OIDs of their
> tables, then rollback-able DROP or RENAME TABLE would be pretty
> straightforward. If you don't recall why this is, consult the
> pghackers archives...

The oid will be appended to the base file name.

--
Bruce Momjian <pgman@candle.pha.pa.us>
Bruce Momjian <pgman@candle.pha.pa.us> writes:

> I tried it and the ABORT worked, so I have no idea now what is
> happening here.

Is the table file still there after the ABORT? If not, it won't work
for long...

			regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:

>> BTW, we are not *that* far from being able to roll back a DROP TABLE.
>> The only thing that's really needed is for everyone to take a deep
>> breath and let go of the notion that table files ought to be named
>> after the tables. If we named table files after the OIDs of their
>> tables, then rollback-able DROP or RENAME TABLE would be pretty
>> straightforward. If you don't recall why this is, consult the
>> pghackers archives...

> The oid will be appended to the base file name.

If we do it that way, then RENAME TABLE will be kinda complicated...
not impossible, but is it worth it?

			regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I tried it and the ABORT worked, so I have no idea now what is
> > happening here.
>
> Is the table file still there after the ABORT? If not, it won't work
> for long...

Oh, well.

--
Bruce Momjian <pgman@candle.pha.pa.us>
> > The oid will be appended to the base file name.
>
> If we do it that way, then RENAME TABLE will be kinda complicated...
> not impossible, but is it worth it?

100% worth it. Ingres doesn't use table names in the file name, and
administration is a mess.

--
Bruce Momjian <pgman@candle.pha.pa.us>
On Tue, 07 Mar 2000, Tom Lane wrote:

> > The oid will be appended to the base file name.
>
> If we do it that way, then RENAME TABLE will be kinda complicated...
> not impossible, but is it worth it?

You know, I really hate to disagree with Bruce, but, Tom, you have a
point. IMHO, the benefits of having tables named by OID are going to
be numerous -- the schema idea included. Of course, Bruce's concerns
are good concerns as well. What to do...

Why not do this:

Let the tables be named by OID, and only OID. Then, for admins'
convenience, put in a flat file that is updated periodically,
similarly to pg_pwd being a flat text dump of pg_shadow. Since there's
going to have to be a system table mapping table names to OIDs anyway,
a flat dump of said system table should be similarly done as pg_pwd.
Call it pg_realnames or something. Have it have two columns: OID, and
pathname (relative to PGDATA) of the table.

This would help admins who might want to restore single tables -- as
long as they have a snapshot of pg_realnames taken at the same time
each table is dumped. Then, after a restore of pg_realnames into a
temp dir, the actual table can be restored by its OID (of course, its
OID might have changed in the interim, but you would simply restore it
on top of the OID that that table now maps to).

Besides, the OID for the table itself is not likely to change often.

Bruce, would this allay some of your (entirely valid) concerns?
(I read the thread about this the first time around, when Vadim said
the table names _would_ go to OIDs.)

Just my two cents.

--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
> Let the tables be named by OID, and only OID. Then, for admins'
> convenience, put in a flat file that is updated periodically,
> similarly to pg_pwd being a flat text dump of pg_shadow. Since
> there's going to have to be a system table mapping table names to
> OIDs anyway, a flat dump of said system table should be similarly
> done as pg_pwd. Call it pg_realnames or something. Have it have two
> columns: OID, and pathname (relative to PGDATA) of the table.
>
> Bruce, would this allay some of your (entirely valid) concerns?

I will fight this to my death. :-)

I have cursed Ingres every time I needed to look at the Ingres data
directory to find out which tables match which files. Even a lookup
file is a pain. Right now, I can do ls -l to see which tables are
taking disk space. Considering the number of times administrators have
to do this, it is a pain. It is only lazy database internals
programmers that would advance an oid-only file name concept.
FYI, Informix SE uses the tablename_oid concept in their
implementation.

Keeping that flat file in sync with the database will be a royal pain.
Imagine the concurrency problems keeping it up to date.

Guess everyone knows where I stand on this one.

--
Bruce Momjian <pgman@candle.pha.pa.us>
> I will fight this to my death. :-)
>
> I have cursed Ingres every time I needed to look at the Ingres data
> directory to find out which tables match which files. Even a lookup
> file is a pain. Right now, I can do ls -l to see which tables are
> taking disk space.

I had Ingres also, and found their scheme to be a royal pain. But that
was really only because they had such a *bad* schema that I'd have to
poke around forever to reconstruct a query which would give me file
names and table names. And then I'd have to print that and compare it
to the directories, which were buried way down in a directory tree.

But with Postgres, we can write a utility to do this for us, so I
think that it isn't so much of an issue. In fact, perhaps we could
have a backend function which could do this, so we could query the
sizes directly.

                     - Thomas

--
Thomas Lockhart                 lockhart@alumni.caltech.edu
South Pasadena, California
On Tue, 7 Mar 2000, Lamar Owen wrote:

> Let the tables be named by OID, and only OID. Then, for admins'
> convenience, put in a flat file that is updated periodically,
> similarly to pg_pwd being a flat text dump of pg_shadow. Since
> there's going to have to be a system table mapping table names to
> OIDs anyway, a flat dump of said system table should be similarly
> done as pg_pwd. Call it pg_realnames or something. Have it have two
> columns: OID, and pathname (relative to PGDATA) of the table.

This I would be against ... I personally hate the whole pg_hba.conf,
pg_pwd, etc. 'flatfiles' ... But could there not be some way of
'extracting extended data' from the backend? i.e. some sort of \d
command that would provide you with tablename + path + disk size + ??

Marc G. Fournier                   ICQ#7615664
IRC Nick: Scrappy                  Systems Administrator @ hub.org
primary: scrappy@hub.org
secondary: scrappy@{freebsd|postgresql}.org
> But with Postgres, we can write a utility to do this for us, so I
> think that it isn't so much of an issue. In fact, perhaps we could
> have a backend function which could do this, so we could query the
> sizes directly.

Does not work if the table was accidentally deleted. Also requires the
backend to be running.

--
Bruce Momjian <pgman@candle.pha.pa.us>
At 21:57 7/03/00 -0500, Lamar Owen wrote:

> Let the tables be named by OID, and only OID. Then, for admins'
> convenience, put in a flat file that is updated periodically,
> similarly to pg_pwd being a flat text dump of pg_shadow. Since
> there's going to have to be a system table mapping table names to
> OIDs anyway, a flat dump of said system table should be similarly
> done as pg_pwd. Call it pg_realnames or something. Have it have two
> columns: OID, and pathname (relative to PGDATA) of the table.

For the ignorant, are you able to explain why naming files
'<table_name>_<OID>' is not acceptable? This seems to satisfy both
requirements (and seemed to be the conclusion of the previous
discussion).

I presume I have missed something, and assume there is a good reason
for the '<OID>' naming convention. So if that is the final choice,
would it be hard to have a file header containing details about the
table/index/thing that the file contains and its OID? In this way, a
future pg_dumpfile command could tell us what our backed-up file
'1FA12347.dat' is supposed to contain.

--
Philip Warner <pjw@rhyme.com.au>
Albatross Consulting Pty. Ltd.
Bruce Momjian wrote:

> I will fight this to my death. :-)
>
> I have cursed Ingres every time I needed to look at the Ingres data
> directory to find out which tables match which files. Even a lookup
> file is a pain. Right now, I can do ls -l to see which tables are
> taking disk space.

Assuming a script "tableoid", is

	ls -l `tableoid foobar`

or

	tableoid | xargs ls -l

so bad?
Philip Warner <pjw@rhyme.com.au> writes:

> For the ignorant, are you able to explain why naming files
> '<table_name>_<OID>' is not acceptable? This seems to satisfy both
> requirements (and seemed to be the conclusion of the previous
> discussion).

Well, it's pretty simple: consider what has to happen to make RENAME
TABLE be rollback-able. You clearly have to update the pg_class tuple
whose relname field contains the table name. That's no problem,
because the normal tuple commit mechanics will take care of making
that tuple update visible or not. But, in the current implementation,
renaming a table also requires renaming the physical files that hold
the table's data --- and last I checked, Unix filesystems don't know
anything about Postgres transactions. Our current code renames the
files instantly when the table rename command is done, and there isn't
any code for undoing that rename. Thus, aborting the xact afterwards
fails, because the pg_class entries revert to their pre-xact values,
but the physical files don't revert to their prior names.

If we change the implementation so that the files are named after the
(fixed, never-changed-after-creation) table OID, then RENAME TABLE is
no problem: it affects *nothing* except the relname field of the
table's pg_class row, and either that row update is committed or it
ain't.

But if the physical file names contain the logical table name, we have
to be prepared to rename those files in sync with the transaction
commit that makes the pg_class update valid. Quite aside from any
implementation effort involved, the critical point is this: it is *not
possible* to ensure that that collection of changes is atomic. At
best, we can make the window for failure small.

Bruce seems to be willing to accept a window of failure for RENAME
TABLE in order to make database admin easier. That is very possibly
the right tradeoff --- but it is *not* an open-and-shut decision. We
need to talk about it.

			regards, tom lane
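[Editorial illustration] Under OID-based file naming, a rename really is just a catalog update, so a sequence like the following would behave sanely. This is a sketch of the intended behavior, not what 7.0 does; the table names are made up:

```sql
BEGIN;
ALTER TABLE old_name RENAME TO new_name;  -- touches only pg_class.relname;
                                          -- the data file (named by OID)
                                          -- never moves
ROLLBACK;                                 -- catalog row reverts; there is
                                          -- no file rename to undo
SELECT * FROM old_name;                   -- table is intact under its
                                          -- original name
```

With name-based files, the ROLLBACK above would have to rename the file back in exact sync with the catalog reversion, which is the non-atomic step Tom identifies.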
> Bruce Momjian wrote: > > > I will fight this to my death. :-) > > > > I have cursed Ingres every time I needed to look at the Ingres data > > directory to find out which tables match which files. Even a lookup > > file is a pain. Right now, I can do ls -l to see which tables are > > taking disk space. > > Assuming a script "tableoid", is.. > > ls -l `tableoid foobar` > > or > > tableoid | xargs ls -l Give me a reason we don't put the table name in the file name? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
At 01:54 8/03/00 -0500, Tom Lane wrote: >Philip Warner <pjw@rhyme.com.au> writes: >> For the ignorant, are you able to explain why naming files >> '<table_name>_<IOD>' is not acceptable? This seems to satisfy both >> requirements (and seemed to be the conclusion of the previous discussion). > >Well, it's pretty simple: consider what has to happen to make RENAME >TABLE be rollback-able. ...etc Sorry for the stupid question. I was confusing the previous discussions over 'DROP COLUMN' with this one, without actually engaging my brain. Your response was admirably patient. FWIW, without a 'storage area' or 'table space' concept, I agree that table names based on OID's are TWTG. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
> If we change the implementation so that the files are named after > the (fixed, never-changed-after-creation) table OID, then RENAME > TABLE is no problem: it affects *nothing* except the relname field > of the table's pg_class row, and either that row update is committed > or it ain't. > > But if the physical file names contain the logical table name, we > have to be prepared to rename those files in sync with the transaction > commit that makes the pg_class update valid. Quite aside from any > implementation effort involved, the critical point is this: it is > *not possible* to ensure that that collection of changes is atomic. > At best, we can make the window for failure small. > > Bruce seems to be willing to accept a window of failure for RENAME > TABLE in order to make database admin easier. That is very possibly > the right tradeoff --- but it is *not* an open-and-shut decision. > We need to talk about it. How about creating a hard link during RENAME, and you can just remove the old link on commit or remove the new link on transaction rollback? We can register this in the at_exit processing too if you think it is necessary to clean it up on a backend crash that never gets to an abort, though I think abort is always called. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
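Bruce's link-juggling idea can be simulated at the shell level in a scratch directory (a toy sketch with made-up file names, not actual backend code):

```shell
# Toy simulation of the hard-link idea: RENAME adds a second link,
# and commit/rollback decide which link survives. Paths are illustrative.
dir=$(mktemp -d)
echo tabledata > "$dir/old_relname"       # the table's existing file

ln "$dir/old_relname" "$dir/new_relname"  # RENAME: add a link, keep the old

committed=true
if $committed; then
    rm "$dir/old_relname"   # commit: drop the old name
else
    rm "$dir/new_relname"   # rollback: drop the new name
fi

result=$(cat "$dir/new_relname")          # data is intact under the new name
echo "$result"
rm -r "$dir"
```

The data stays reachable under at least one name throughout, though as Tom observes in his reply, the final unlink still can't be made atomic with the catalog commit; the scheme only narrows the failure window.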
Can I throw one more question out there on this subject? There's something that I view as inconsistent behavior with respect to DDL statements and MVCC and was wondering if this would have any impact on the discussion (the following is with 6.5.3): Session #1: emptoris=> begin; BEGIN emptoris=> select * from test; value ----- 1 (1 row) Session #2: emptoris=> begin; BEGIN emptoris=> select * from test; value ----- 1 (1 row) Session #1: emptoris=> drop table test; DROP Session #2: emptoris=> select * from test; ERROR: mdopen: couldn't open test: No such file or directory Now it would seem to me that if DROP TABLE is going to be ROLLBACK-able, then Session #2, in an MVCC environment, should never see: ERROR: mdopen: couldn't open test: No such file or directory but it does, because the "effect" of the drop table is an action that is seen by all sessions, as though it were "committed". So I am now wondering, are there any Multi-Versioning/Multi-Generational RDBMS that support ROLLBACK-able DDL statements in transactions... Just curious, Mike Mascari
Mike Mascari <mascarm@mascari.com> writes: > Now it would seem to me that if DROP TABLE is going to be > ROLLBACK-able, then Session #2, in a MVCC environment should > never see: > ERROR: mdopen: couldn't open test: No such file or directory Check. We didn't say this worked yet ;-) regards, tom lane
>but it does, because the "effect" of the drop table is an action >that is seen by all sessions, as though it were "committed". So I >am now wondering, are there any >Multi-Versioning/Multi-Generational RDBMS that support >ROLLBACK-able DDL statements in transactions... > Dec/Rdb for one. They do, however, make their lives easier by 'locking the metadata' when a user does a select. This means the 'drop table' would hang until the first user commits. I think it even hangs until the first user exits - basically if they have referenced that table, you can't touch it until they exit. But they do allow rollback on all DDL statements. They do not allow rollback on 'management' functions like moving storage areas (where one or more tables are stored) across disks, doing vacuum-like functions etc. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Bruce seems to be willing to accept a window of failure for RENAME >> TABLE in order to make database admin easier. That is very possibly >> the right tradeoff --- but it is *not* an open-and-shut decision. >> We need to talk about it. > How about creating a hard link during RENAME, and you can just remove > the old link on commit or remove the new link on transaction rollback? Still non-atomic as far as I can see... regards, tom lane
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Mike Mascari > > Can I throw one more question out there on this subject? > > There's something that I view as inconsistent behavior with > respect to DDL statements and MVCC and was wondering if this > would have any impact on the discussion (the following is with > 6.5.3): > > Session #1: > > emptoris=> begin; > BEGIN > emptoris=> select * from test; > value > ----- > 1 > (1 row) > > Session #2: > > emptoris=> begin; > BEGIN > emptoris=> select * from test; > value > ----- > 1 > (1 row) > > Session #1: > > emptoris=> drop table test; > DROP > > Session #2: > > emptoris=> select * from test; > ERROR: mdopen: couldn't open test: No such file or directory > > Now it would seem to me that if DROP TABLE is going to be > ROLLBACK-able, then Session #2, in a MVCC environment should > never see: > > ERROR: mdopen: couldn't open test: No such file or directory > > but it does, because the "effect" of the drop table is an action > that is seen by all sessions, as though it were "committed". The inconsistency is due to the current implementation of DROP TABLE, which immediately unlinks the base relation file physically. Though the definition (i.e. the pg_class tuple) of the test relation still exists (logically), the base relation file doesn't exist. PostgreSQL has a standard mechanism of transaction control for tuples but there's no such mechanism for relation files. Currently even a single DDL command outside a transaction doesn't have atomicity. I have really disliked this feature(? bug) for a long time. Flexible mapping from a relation to the relation file name is needed in order to enable transaction control for relation files. Regards. Hiroshi Inoue Inoue@tpf.co.jp
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane > > Philip Warner <pjw@rhyme.com.au> writes: > > For the ignorant, are you able to explain why naming files > > '<table_name>_<IOD>' is not acceptable? This seems to satisfy both > > requirements (and seemed to be the conclusion of the previous > discussion). > > Well, it's pretty simple: consider what has to happen to make RENAME > TABLE be rollback-able. > Is it necessary to derive the relation path name from the relation name/oid etc. each time? Is it bad to keep the relation path name in pg_class (or another relation)? If a new vessel is needed for copying (etc.) existent tuples, we would have to allocate another unique path name; otherwise we can use the already allocated file name. And is it good to decide the unique path name from oid/relname etc.? Regards. Hiroshi Inoue Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > Is it necessary to get the relation path name from the relation name/oid etc > each time ? > Is it bad to keep the relation path name in pg_class(or another relation) ? Hmm, we could maybe do that for user relations, but it obviously would not work for pg_class itself. I'm a little worried about trying to do it for the other critical system relations, too. We'd want to keep the relation's pathname in its relcache entry, so any system relation that is read while setting up a relcache entry has to have a fixed path that can be determined without a relcache entry. Perhaps it would be good enough to say that all system relations live in the database's primary directory, and only user relations have pathnames specified in their pg_class entries. Renaming a system table would be a Really Bad Idea anyway ;-) regards, tom lane
pgman@candle.pha.pa.us (Bruce Momjian) writes: > I have cursed Ingres every time I needed to look at the Ingres data > directory to find out which tables match which files. Even a lookup > file is a pain. Right now, I can do ls -l to see which tables are > taking disk space. > > Considering the number of times administrators have to do this, it is a > pain. It is only lazy database internals programmers that would advance > an oid-only file name concept. FYI, Informix SE uses the > tablename_oid concept in their implementation. > > Keeping that flat file in sync with the database will be a royal pain. > Imagine the concurrency problems keeping it up to date. I'm just a lurker here, and not familiar with postgres internals, so this suggestion could easily be so ignorant that it's simply not worth it to even respond. Caveats aside, it occurs to me that a possible compromise might be to name table files by OID, but put them in directories that are named for the tables themselves. Bruce has to use du -s rather than ls -l, but he can still achieve the same end (measuring space used by particular tables), while allowing the tables to be named by OID, which means that we could get the benefits that accrue from that. This actually occurred to me when the whole "tablespace" discussion came up a couple of months ago---by using a different directory for each table, it would suddenly become very easy to split up a database across spindles with what would seem to be at least adequate control. Again, I am sufficiently ignorant of postgres internals---I just read the list, I haven't really looked at the code---that this is probably utterly untenable. Mike.
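Mike's directory-per-table layout is easy to mock up; in this sketch every name (foobar, the oid 16384, the segment suffix) is made up for illustration:

```shell
# Mock of a directory-per-table layout: the directory carries the table
# name, while the files inside are named by oid and segment number.
db=$(mktemp -d)                      # stands in for a database directory
mkdir "$db/foobar"                   # one directory per table
echo seg0 > "$db/foobar/16384"       # primary segment, named by oid
echo seg1 > "$db/foobar/16384.1"     # a later extent segment

# Per-table disk usage, the admin-friendly way:
du -s "$db/foobar"

nfiles=$(ls "$db/foobar" | wc -l | tr -d ' ')
rm -r "$db"
```

Note this recovers the du -s view, but a RENAME would still have to rename the directory, so by itself it doesn't answer the atomicity question Tom raises.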
Tom Lane wrote: > > > If we change the implementation so that the files are named after > the (fixed, never-changed-after-creation) table OID, then RENAME > TABLE is no problem: it affects *nothing* except the relname field > of the table's pg_class row, and either that row update is committed > or it ain't. > > But if the physical file names contain the logical table name, we > have to be prepared to rename those files in sync with the transaction > commit that makes the pg_class update valid. Quite aside from any > implementation effort involved, the critical point is this: it is > *not possible* to ensure that that collection of changes is atomic. > At best, we can make the window for failure small. > How about using hard links? The transaction that created the change would see the new link along with the new tuple. Other transactions would see the old directory entry and the old tuple. Rollback drops the new tuple and the new directory entry. Commit does the obvious. Does WinNT have something similar to a hard link? -- Mark Hollomon mhh@nortelnetworks.com ESN 451-9008 (302)454-9008
On Wed, 8 Mar 2000, Bruce Momjian wrote: > > But with Postgres, we can write a utility to do this for us, so I > > think that it isn't so much of an issue. In fact, perhaps we could > > have a backend function which could do this, so we could query the > > sizes directly. > > Does not work if the table was accidentally deleted. Also requires the > backend to be running. For ppl like us who aim at providing data integrity, we sure have a lot of "if the table was accidentally deleted" problems with poor solutions, no? :) IMHO, we are basically supporting ppl *not* doing regular backups of their data ... most, if not all, of the problems that ppl feel require the use of 'flat files', IMHO, aren't big problems if proper backup procedures are followed ... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker wrote: > IMHO, we are basically supporting ppl *not* doing regular backups of their > data ... most, if not all, of the problems that ppl feel require > the use of 'flat files', IMHO, aren't big problems if proper > backup procedures are followed ... I suggested the 'flat-file' more as a compromise than anything else. (Although it kind of backfired :-(). Technically speaking, my 'flat-file' is trading the flat-file in the OS's filesystem (the directory) for a separate flat-file. Little to no admin difference from my point of view. The problem that Bruce is talking about occurs when you try to restore (from a properly built off-line binary backup) a single table or small set of tables. It doesn't have anything to do with supporting people who won't do proper backups, IMO. Of course, I personally use on-line pg_dump backups and feed into psql for on-line restore -- which doesn't require knowing anything about the underlying filesystem structures. So, the dichotomy is between those who want to admin at the OS file level versus those who feel the backend should hide all those details from the admin. At least that's how istm. -- Lamar Owen WGCR Internet Radio 1 Peter 4:11
On Wed, Mar 08, 2000 at 03:41:17AM -0500, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> Bruce seems to be willing to accept a window of failure for RENAME > >> TABLE in order to make database admin easier. That is very possibly > >> the right tradeoff --- but it is *not* an open-and-shut decision. > >> We need to talk about it. > > > How about creating a hard link during RENAME, and you can just remove > > the old link on commit or remove the new link on transaction rollback? > > Still non-atomic as far as I can see... > And there doesn't seem to be an obvious way to extend it to the DROP TABLE case. Hmm, on second thought, to roll back DROP TABLE we'll need to 'hide' the table from the current transaction: one way would be to rename it, then do the drop at commit time. Regardless, since I think there are other, SQL92 standard driven reasons to break the relname == filename link, I decided to go ahead and see how hard coding it would be, and how much code might be depending on that behavior. Looked like it was going to be very simple: the RelationGetRelationName and RelationGetPhysicalRelationName macros encapsulate access to the (relation)->rd_rel->relname structure member pretty effectively (thanks to Bruce's temp. relation work, I presume) As a first crack, I decided to use the oid for the filename, just because it simplified the changes to the macro, and there was already an oidout() builtin that'd do the palloc for me ;-) <some time later...> Well, ... it is, as they say, a Small Matter of Programming. I now know a lot more about the bootstrap process, and the relcache, I can tell you! Most problems were code that used RelationGetPhysicalRelationName when it should have used RelationGetRelationName. In several cases, the code assumed RelationGetPhysicalRelationName handed them a pointer to rd_rel->relname, which they copy into! I substituted RelationGetRelationName for all these cases.
There's some ugliness with SharedSystemRelations, as well. I just hacked in hard-coded numbers wherever I found hardcoded relation names, for an initial test of principle. I've got a version running, and I can type at a standalone backend: still some problems with initdb: the pg_log, pg_shadow and pg_user relations don't get created: I cheated and copied the first two from my 'current' install. That got the backend up, either standalone, or as postmaster. It'll even accept connections from psql: just errors a lot, since pg_user isn't there! However, typing at the backend, I can create tables, insert, delete, start transactions, rollback, etc. Basically, everything works. Suffice it to say, altering the physical storage name is not too difficult, if _I_ can get this far in just a few hours. Whatever we decide for 'policy' on the name issue (and now that I've generated it, I can tell you that a directory full of numbers is _really_ ugly), implementation should go easily. Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
> Looked like it was going to be very simple: the RelationGetRelationName > and RelationGetPhysicalRelationName macros encapsulate access to the > (relation)->rd_rel->relname structure member pretty effectively (thanks > to Bruce's temp. relation work, I presume) Yes. > As a first crack, I decided to use the oid for the filename, just because > it simplified the changes to the macro, and there was already an oidout() > builtin that'd do the palloc for me ;-) > > <some time later...> > > Well, ... it is, as they say, a Small Matter of Programming. I now know > a lot more about the bootstrap process, and the relcache, I can tell you! > > Most problems were code that used RelationGetPhysicalRelationName > when it should have used RelationGetRelationName. In several cases, > the code assumed RelationGetPhysicalRelationName handed them a > pointer to rd_rel->relname, which they copy into! I substituted > RelationGetRelationName for all these cases. Please send in a patch on those if they need to be corrected, OK? -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Wed, Mar 08, 2000 at 06:24:35PM -0500, Bruce Momjian wrote: > > Looked like it was going to be very simple: the RelationGetRelationName > > and RelationGetPhysicalRelationName macros encapsulate access to the > > (relation)->rd_rel->relname structure member pretty effectively (thanks > > to Bruce's temp. relation work, I presume) > > Yes. Well, thank you, then ;-) > > > > > Most problems were code that used RelationGetPhysicalRelationName > > when it should have used RelationGetRelationName. In several cases, > > the code assumed RelationGetPhysicalRelationName handed them a > > pointer to rd_rel->relname, which they copy into! I substituted > > RelationGetRelationName for all these cases. > > Please send in a patch on those if they need to be corrected, OK? > Once I'm sure it's the Right Thing To Do, I will. That's probably the only clean part of the ugly hack I've done so far. I've got a complete system up, now. For some reason, the bootstrapping in initdb doesn't create the pg_log (or pg_shadow!) relations, even though the same step on a clean CVS tree does. Can't quite find why. So, the non-bootstrap connections in initdb (creating all the system views) then fail. If I manually copy the pg_log and pg_shadow files over from 'current' to 'hacked', I can then run the code from initdb by hand, and get a fully functional system. I went ahead and ifdefed out the rename() in renamerel(). Lo and behold, I can roll back an ALTER TABLE RENAME, and have a concurrent session see the right thing. The concurrent session hangs, though, because of the exclusive lock. Based on comments in that function, I think the lock is still needed to handle the buffer cache, which is indexed by relname. Should probably make that indexed by PhysicalName, since they're disk buffers, after all. Haven't touched DROP TABLE yet, though. My real goal with all this is to now look at the parser, and see how hard it will be to do something with schemas.
I think that's going to require another field in the relation structure, to indicate which schema a relation belongs to, then control access based on what the current default schema is. All the relname stuff was just so different schemas can have different tables with the same name. ;-) Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian > > > Looked like it was going to be very simple: the RelationGetRelationName > > and RelationGetPhysicalRelationName macros encapsulate access to the > > (relation)->rd_rel->relname structure member pretty effectively (thanks > > to Bruce's temp. relation work, I presume) > > Yes. > > > As a first crack, I decided to use the oid for the filename, > just because > > it simplified the changes to the macro, and there was already > an oidout() > > builtin that'd do the palloc for me ;-) > > I object to this proposal. I have been suspicious about why a mapping algorithm from relations to the relation file names is needed for existent relations. This should be changed first. And plural relation file names are needed for a relation oid/relname. Why do you prefer a fixed mapping oid/relname --> relation file name? Regards. Hiroshi Inoue Inoue@tpf.co.jp
> > > Most problems were code that used RelationGetPhysicalRelationName > > > when it should have used RelationGetRelationName. In several cases, > > > the code assumed RelationGetPhysicalRelationName handed them a > > > pointer to rd_rel->relname, which they copy into! I substituted > > > RelationGetRelationName for all these cases. > > > > Please send in a patch on those if they need to be corrected, OK? > > > > Once I'm sure it's the Right Thing To Do, I will. That's probably > the only clean part of the ugly hack I've done so far. I was just really interested in places where RelationGetPhysicalRelationName() and RelationGetRelationName() were called incorrectly. That can go into 7.0. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026