Thread: Re: RE: [PATCHES] relation filename patch
On Sun, Apr 30, 2000 at 10:18:56PM -0400, Bruce Momjian wrote: > > > If I recognize correctly,RelationGetRelationName() macro returns the > > > relation name local to the backend and RelationGetPhysicalRelation- > > > Name() macro returns pg_class entry name currently. Those names > > > are same unless the relation is a temporary relation. They may be > > > able to have separate entries in pg_class. I don't know why they > > > don't currently. > > > > Different backends can have the same temp file names at the same time, > > so you would have to have a pg_class tuple that is visible only to the > > current transactions, and allow multiple duplicate ones. Easier not to > > have it in pg_class and just hack the syscache code. > > I didn't go to great trouble to keep the existing hack to the syscache code working for temp tables, because I thought that the primary purpose of that hack was to make sure the physical relation name, i.e. the file name, would not collide for multiple backends. Tacking the OID onto the filename fixes that. Getting the right table for the user specific namespace sounds like ... schemas, so I figured I could fix that in the schema implementation. However, I see now that the hack to the syscache actually implements temp table override of an existing persistent relation, as well. Hmm, is this override SQL92, or an extension? It seems to me that TEMPORARY tables are not completely defined in the draft I have. For example, there's syntax defined for ON COMMIT PRESERVE ROWS, but the only semantics seem to be some restrictions on check constraint definitions, not how the temp table rows are supposed to be preserved. Hmm. Further perusal leads me to believe that this override is a standards extension, as the clause about the tablename being unique in the current namespace does not have an exception for temporary tables. Nothing wrong with that, just making it clear. What's the use/case for this feature? Does it come from some other DMBS? > > Second, I trust the patch keeps the on-disk file names similar to the > > table names. Doing some all-numeric file names just to work around NT > > broken-ness is not going to fly. The patch uses Tom Lane's makeObjectName code, to tack the oid on the end of the relname, but keep it under NAMEDATALEN by trimming the relname part. > > I should add I have not seen this patch personally. If it was sent to a > mailing list, I somehow missed it. It went to the patches list, although I'm still not getting mail from there. BTW, it's clearly marked 'experimental, not to include in CVS' I'm not suggesting that this is the way to go: just testing how much the assumption relname == filename has crept up through the code. Not too bad, actually. The bigger problem, for schema implementation, is that the assumption 'relname+dbname' is sufficent to uniquely identify a relation is more pervasive: it's the basis of the relcache hash, isn't it? That's why I changed the key for that hash. I'm not sure I got all the places that make that assumption: in fact, the extra errors I get from the temptable regresion tests strongly imply that I missed a few (attempt to delete non existent relcache entry.) In addition, doing it the way I did does require that all storage managers assure uniqueness of the relphysname across the entire DB. The current patch does that: the oid is unique. Another way to go is to revert that change, but adding the schema to the relcache key. Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
> Hmm. Further perusal leads me to believe that this override is a > standards extension, as the clause about the tablename being unique in > the current namespace does not have an exception for temporary tables. > Nothing wrong with that, just making it clear. What's the use/case for > this feature? Does it come from some other DMBS? I am quite surprised people don't like that feature, or at least one person doesn't. If someone else creates a table, it should not prevent me from creating a temporary table of the same name. I know Informix doesn't implement it that way, and they complained because a program started not working. Research showed that someone had created a real table with the same name as the temp table. Our code even masks a real table created in the same session. Once the temp table is dropped, the real table becomes visible again. See the regression tests for an example of this. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > Hmm. Further perusal leads me to believe that this override is a > > standards extension, as the clause about the tablename being unique in > > the current namespace does not have an exception for temporary tables. > > Nothing wrong with that, just making it clear. What's the use/case for > > this feature? Does it come from some other DMBS? > > I am quite surprised people don't like that feature, or at least one > person doesn't. If someone else creates a table, it should not prevent > me from creating a temporary table of the same name. > > I know Informix doesn't implement it that way, and they complained Sorry, I should have said "One of my users complained..." > because a program started not working. Research showed that someone had > created a real table with the same name as the temp table. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes: > I didn't go to great trouble to keep the existing hack to the syscache > code working for temp tables, because I thought that the primary purpose > of that hack was to make sure the physical relation name, i.e. the file > name, would not collide for multiple backends. Tacking the OID onto > the filename fixes that. Not good enough --- the logical names of the temp tables (ie, the names they are given in pg_class) can't collide either, or we'll get failures from the unique index on pg_class relname. It could be that schemas will provide a better answer to that point though. I concur with your observation that the semantics we give temp tables are more than the standard requires --- but I also concur with Bruce that it's an exceedingly useful extension. ISTM if an application has to worry about whether its temp table name will collide with something else, then much of the value of the feature is lost. So masking permanent table names is the right thing, just as local vars in subroutines mask global vars of the same name. Again, maybe we can define schemas in such a way that the same effect can be gotten... > The bigger problem, for schema implementation, is that the assumption > 'relname+dbname' is sufficent to uniquely identify a relation is more > pervasive: it's the basis of the relcache hash, isn't it? That's why I > changed the key for that hash. I didn't see what you did here, but I doubt that it was the right thing. There are two things going on: rel name and rel OID must both be unique within a database, but there can be duplicates across databases within an installation. So, most of the backend assumes that rel name or rel OID is alone sufficient to identify a relation, and there are just a couple of places that interface to installation-wide structures (eg, the buffer manager and sinval-queue manager) that know they must attach the current database's name or OID to make the identifier globally unique. This is one reason why a backend can't reconnect to another database on the fly; we've got no way to find all those database-dependent cache entries... It's going to be *extremely* painful if more than one name/OID is needed to refer to a relation in many places in the backend. I think we want to avoid that if possible. It seems to me that what we want is to have schemas control the mapping from rel name to rel OID, but to keep rel OID unique within a database. So schemas are more closely related to the "temp table hack" than you think. regards, tom lane
Then <pgman@candle.pha.pa.us> spoke up and said: > > Hmm. Further perusal leads me to believe that this override is a > > standards extension, as the clause about the tablename being unique in > > the current namespace does not have an exception for temporary tables. > > Nothing wrong with that, just making it clear. What's the use/case for > > this feature? Does it come from some other DMBS? > > I know Informix doesn't implement it that way, and they complained > because a program started not working. Research showed that someone had > created a real table with the same name as the temp table. > > Our code even masks a real table created in the same session. Once the > temp table is dropped, the real table becomes visible again. See the > regression tests for an example of this. Personally, I also like the Ingres table override feature, where if I reference table "foo", Ingres first looks for a table "foo" owned by me, and then one owned by the database owner. I've not explored what happens if neither I nor the DBA owns a "foo". It's also unclear what would happen in that case where multiple other had tables named "foo" and sufficient permits on them to permit my access. -- ===================================================================== | JAVA must have been developed in the wilds of West Virginia. | | After all, why else would it support only single inheritance?? | ===================================================================== | Finger geek@cmu.edu for my public key. | =====================================================================
On Mon, May 01, 2000 at 03:12:44PM -0400, Tom Lane wrote: > "Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes: > > I didn't go to great trouble to keep the existing hack to the syscache > > code working for temp tables, because I thought that the primary purpose > > of that hack was to make sure the physical relation name, i.e. the file > > name, would not collide for multiple backends. Tacking the OID onto > > the filename fixes that. > > Not good enough --- the logical names of the temp tables (ie, the names > they are given in pg_class) can't collide either, or we'll get failures > from the unique index on pg_class relname. It could be that schemas > will provide a better answer to that point though. Well, that unique index will have to go away, one way or another, since schema's require multiple persistent tables with the same undecorated relname in the same db. Adding the schema to the pg_class table is probably where this has to go, then making the index over both fields. Then, temp tables can be implemented as a session specific temp schema. > > I concur with your observation that the semantics we give temp tables > are more than the standard requires --- but I also concur with Bruce > that it's an exceedingly useful extension. ISTM if an application has > to worry about whether its temp table name will collide with something > else, then much of the value of the feature is lost. So masking > permanent table names is the right thing, just as local vars in > subroutines mask global vars of the same name. Again, maybe we can > define schemas in such a way that the same effect can be gotten... Agreed. The masking does require more than just having a special schema for temp tables, though. SO, some version of Bruce's macro hack is still required. > > I didn't see what you did here, but I doubt that it was the right thing. Probably not: this is my first extensive digging into the backend code. And the whole reason this is a 'lets try and implement somne of this, and go back to the discussion' patch, instead of a proposed addition. In defense of what I _did_: The temp table relname hacking is still in place, and seems to work, and could be left in place. However, I knew that relname would not stay unique, once schema are implemented, but physrelname would (since the smgr needs it). > There are two things going on: rel name and rel OID must both be unique > within a database, but there can be duplicates across databases within > an installation. So, most of the backend assumes that rel name or > rel OID is alone sufficient to identify a relation, and there are just > a couple of places that interface to installation-wide structures > (eg, the buffer manager and sinval-queue manager) that know they must > attach the current database's name or OID to make the identifier > globally unique. This is one reason why a backend can't reconnect to > another database on the fly; we've got no way to find all those > database-dependent cache entries... > Right. But it seems to me that everywhere the code uses just a relname, it assumes the currently connected DB, when eventually gets to the reference by the code that interfaces to the installation-wide structures, no? Once schema are in place, this can no longer work. One hack would be to have a 'current schema', like current db, but that won't work, one query will need to refery to schema other than the default for table references (otherwise, they're not very useful, are they?) > It's going to be *extremely* painful if more than one name/OID is needed > to refer to a relation in many places in the backend. I think we want > to avoid that if possible. > > It seems to me that what we want is to have schemas control the mapping > from rel name to rel OID, but to keep rel OID unique within a database. > So schemas are more closely related to the "temp table hack" than you > think. > So, relname cannot be enough. Isn't OID already sufficent though? I thought oids are unique across the entire installation, not just a particular db. In any case, the solution may be to convert relname (+default or user supplied schema) to rel oid, as early as possible, then indexing (and caching) on that. If oids are db specific, the code that needs system wide uniqueness needs to add the db name/OID, just like it does for relname. Actually, I know that schemas are _closely_ related to the temp table hack (which was Bruce's own word for it, BTW, not mine;-), hence my hijacking of his RelationGetRelationName and RelationGetPhysicalRelationName macros. I was hoping to get temp tables back (mostly) for free once schemas are in, but I think the extended semantics will require leaving something like the existing code in place. Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes: > Well, that unique index will have to go away, one way or another, since > schema's require multiple persistent tables with the same undecorated > relname in the same db. Adding the schema to the pg_class table is > probably where this has to go, then making the index over both fields. > Then, temp tables can be implemented as a session specific temp schema. Right, there would probably be a unique index on schema name + relname. > Right. But it seems to me that everywhere the code uses just a relname, it > assumes the currently connected DB, when eventually gets to the reference > by the code that interfaces to the installation-wide structures, no? Once > schema are in place, this can no longer work. One hack would be to have a > 'current schema', like current db, but that won't work, one query will > need to refery to schema other than the default for table references > (otherwise, they're not very useful, are they?) I was sort of envisioning a search path of schema names. Temp table masking could be implemented by pushing the session-local schema onto the front of the search path. Not sure how that relates to SQL3's ideas about schemas, however. > So, relname cannot be enough. Isn't OID already sufficent though? I > thought oids are unique across the entire installation, not just a > particular db. Er, well, no. Consider pg_class, which has both the same name and the same OID in every DB in the installation --- but we have to treat it as separately instantiated in each DB. Most of the system tables work that way. OTOH we have a couple of top-level tables like pg_shadow, which are the same table for all DBs in the installation. It could be that we ought to eliminate the whole notion of separate databases within an installation, or more accurately merge it with the concept of schemas. Really, the existing database mechanism is sort of a limited implementation of schemas. > In any case, the solution may be to convert relname > (+default or user supplied schema) to rel oid, as early as possible, > then indexing (and caching) on that. Right, but there wouldn't be only one default schema, there'd be some kind of search path in which an unqualified relname would be sought. regards, tom lane
On Mon, May 01, 2000 at 05:27:04PM -0400, Tom Lane wrote: > > I was sort of envisioning a search path of schema names. Temp table > masking could be implemented by pushing the session-local schema onto > the front of the search path. Not sure how that relates to SQL3's ideas > about schemas, however. > > > So, relname cannot be enough. Isn't OID already sufficent though? I > > thought oids are unique across the entire installation, not just a > > particular db. > > Er, well, no. Consider pg_class, which has both the same name and the > same OID in every DB in the installation --- but we have to treat it > as separately instantiated in each DB. Most of the system tables work > that way. OTOH we have a couple of top-level tables like pg_shadow, > which are the same table for all DBs in the installation. > Well, in schema-land this taken care of by the fact that all those system tables live in a schema named information_schema, which is defined as view on tables in the schema definition_schema. To some extent, our use of pg_ for all the system tables simulates this. > It could be that we ought to eliminate the whole notion of separate > databases within an installation, or more accurately merge it with the > concept of schemas. Really, the existing database mechanism is sort > of a limited implementation of schemas. > See the discussion about this between Peter and I (and Jan?) last time schemas came up. We agreed that pg's databases map to SQL92 Catalogs rather nicely, with the whole installation being a 'cluster of catalogs'. Now, If some one can explain to me what a 'module' is ... > > In any case, the solution may be to convert relname > > (+default or user supplied schema) to rel oid, as early as possible, > > then indexing (and caching) on that. > > Right, but there wouldn't be only one default schema, there'd be some > kind of search path in which an unqualified relname would be sought. > Perhaps, but that's an extension. SQL92 defines a default SCHEMA for a session, which is set via the SET SCHEMA statement, strangely enough. Having nesting SCHEMA's might be useful, but I'm not not sure how. Getting any at all in there would help a lot. I'd suggest the default be configurable on a per user basis. That'd allow some nifty access controls, with just the existing VIEW code. Turns out I needed these kind of schema's last week: I wanted to create filtered access to a set of tables. Simple, right? Add booleans, create views that test the boolean, remove SELECT privilege on the tables. Only problem, is now I had to go and edit all 200+ SELECT statements in the application code, to point at the views instead of the tables, or rename every table, and edit the 200+ SELECTs in the other apps, that do data entry and maintenance. If I had schema, I'd have changed the default schema for the 'web' login, and created views that had the same name as the tables, selecting back into the real tables in their own schema. 10 mins, and done, don't touch the tested app. code at all! That's what got me a round-toit, getting this patch off to be discussed. Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
> -----Original Message----- > From: Ross J. Reedstrom [mailto:reedstrm@wallace.ece.rice.edu] > > > > I didn't see what you did here, but I doubt that it was the right thing. > > Probably not: this is my first extensive digging into the backend code. > And the whole reason this is a 'lets try and implement somne of this, > and go back to the discussion' patch, instead of a proposed addition. > > In defense of what I _did_: The temp table relname hacking is still > in place, and seems to work, and could be left in place. Yes,pararell regression tests all pass here if relacache hashes on pg_class entry name. > However, I > knew that relname would not stay unique, once schema are implemented, > but physrelname would (since the smgr needs it). > It is dangerous to combine logical and physical concepts. So it seems difficult to use physrelname both as a storage location and as an unique relation name. Regards. Hiroshi Inoue Inoue@tpf.co.jp
Bruce Momjian writes: > > Hmm. Further perusal leads me to believe that this override is a > > standards extension, > I am quite surprised people don't like that feature, or at least one > person doesn't. FWIW, I always considered that behaviour kind of suspicious. I don't think it's a big problem in practice and could be useful in certain situations, but personally I would have chosen not to do it this way. > If someone else creates a table, it should not prevent me from > creating a temporary table of the same name. The proper solution to this is that "somebody" should not be allowed to create tables wildly. Until we can properly do that it's not worth arguing this out. > Our code even masks a real table created in the same session. Once > the temp table is dropped, the real table becomes visible again. See > the regression tests for an example of this. The real problem here is that there's no way of finding out whether you just dropped the temporary table or the "real" one or whether a table is temporary at all. Sure you can perhaps look into pg_class but tell that to users. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
> > Our code even masks a real table created in the same session. Once > > the temp table is dropped, the real table becomes visible again. See > > the regression tests for an example of this. > > The real problem here is that there's no way of finding out whether you > just dropped the temporary table or the "real" one or whether a table is > temporary at all. Sure you can perhaps look into pg_class but tell that to > users. Do a \d on the table. If it doesn't show up, it is temporary. ;-) -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Tue, 2 May 2000, Bruce Momjian wrote: > Do a \d on the table. If it doesn't show up, it is temporary. ;-) But if the temporary is shadowing a permanent table then I know less than nothing. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden