Thread: Re: [GENERAL] Question about databases in alternate locations...
Thomas Lockhart writes: > Peter E (if I recall right) was proposing some changes to remove the > environment variable capabilities in Postgres. He also proposed making > a *list* of allowed locations as an environment variable as a way of > managing or controlling the allowed locations. That was an interesting line of thought until the system catalog idea came up. I believe everyone would agree that keeping things system catalog controlled is the generally preferred choice. If you create a system catalog pg_location(locname name, locpath text) then you still have in fact a list of allowed locations, but one that can be changed while the server is up, that can be queried, that can easily be joined against pg_database, etc. Heck, finely grained permissions are the next logical step. Table spaces are another point of consideration. Surely you would eventually want table space administration to be via query language commands. In essence, the alternative locations are a table space kind of thingy. The only difference is that the granularity of control stops at the database level, but that's only a difference of degree, not kind. In fact, if someone comes around to reworking the logical->physical relation name mapping then you could add a field pg_class.rellocation and voilà, there's your table spaces. So all in all I do like the system catalog driven model much better in terms of ease of use, functionality, extensibility, everything. And no, there's no chicken-and-egg problem because the relation name mapping for shared system relations would presumably not be changed. (How would that work anyway?) > Putting all of this stuff in a table is a possibility, but > 1) Ingres did this, but they had way too many tables involved in > defining and using tables imho. We should do better. Well, so far we'd have one table. Is there any reason why we would need more? Why did they have so many? I don't mind many tables if they give more functionality. > 2) If a dbadmin wants to *carefully* move database locations around, > the environment variables allow this to happen by just shutting down > the backend, tarring/untarring a disk area, redefining the environment > variable, and restarting the backend. 1. shut down database 2. move data area 3. connect to template1 4. update pg_location 5. connect to the moved database That's not very different. > 3) We don't (yet) have a way to move tables from within Postgres. So > hardcoding or "hard storing" absolute paths would make it pretty > difficult to accomplish (2). I don't know what you mean with "hard storing". All in all this might be a relatively small job for great immediate and future benefit. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
> > Peter E (if I recall right) was proposing some changes to remove the > > environment variable capabilities in Postgres. He also proposed making > > a *list* of allowed locations as an environment variable as a way of > > managing or controlling the allowed locations. > That was an interesting line of thought until the system catalog idea came > up. I believe everyone would agree that keeping things system catalog > controlled is the generally preferred choice. If you create a system > catalog pg_location(locname name, locpath text) then you still have in > fact a list of allowed locations, but one that can be changed while the > server is up, that can be queried, that can easily be joined against > pg_database, etc. Heck, finely grained permissions are the next logical > step. So pg_location would hold the full path (absolute or logical) to every file resource in every database? Or would it hold only a list of allowed paths? Or only a list of resources for each database (~1 row per database) and then table-specific info would be stored somewhere local to the database itself? > Table spaces are another point of consideration. Surely you would > eventually want table space administration to be via query language > commands. In essence, the alternative locations are a table space kind of > thingy. The only difference is that the granularity of control stops at > the database level, but that's only a difference of degree, not kind. In > fact, if someone comes around to reworking the logical->physical relation > name mapping then you could add a field pg_class.rellocation and voilà, > there's your table spaces. Yes, this capability will be great. ALTER TABLE SET LOCATION=... and/or ALTER DATABASE SET LOCATION=... should help administration and scalability. > So all in all I do like the system catalog driven model much better in > terms of ease of use, functionality, extensibility, everything. And no, > there's no chicken-and-egg problem because the relation name mapping for > shared system relations would presumably not be changed. (How would that > work anyway?) > > Putting all of this stuff in a table is a possibility, but > > 1) Ingres did this, but they had way too many tables involved in > > defining and using tables imho. We should do better. > Well, so far we'd have one table. Is there any reason why we would need > more? Why did they have so many? I don't mind many tables if they give > more functionality. I have no idea why they had so many. Probably because it grew incrementally, or possibly because they normalized their tables to the theoretically correct point. It was ugly either way (right Bruce?). > > 2) If a dbadmin wants to *carefully* move database locations around, > > the environment variables allow this to happen by just shutting down > > the backend, tarring/untarring a disk area, redefining the environment > > variable, and restarting the backend. > 1. shut down database > 2. move data area > 3. connect to template1 > 4. update pg_location > 5. connect to the moved database > That's not very different. But hard to do? If pg_location has 5000 entries, and you've scattered tables all over the place (perhaps a bad decision, but we *should* have the flexibility to do that) then it might be very error prone when working with absolute paths imho. > > 3) We don't (yet) have a way to move tables from within Postgres. So > > hardcoding or "hard storing" absolute paths would make it pretty > > difficult to accomplish (2). > I don't know what you mean with "hard storing". Putting absolute path names as pointers to tables or data areas. I'm getting the sense I'm in a minority (in a group of 3? ;) in this discussion, but imho having some decoupling between logical paths in the database and actual paths outside is A Good Thing. Always has been a mark of good design in my experience. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart writes: > So pg_location would hold the full path (absolute or logical) to every > file resource in every database? Or would it hold only a list of > allowed paths? Or only a list of resources for each database (~1 row > per database) and then table-specific info would be stored somewhere > local to the database itself? > Is a list of allowed paths really necessary? If initlocation has already been run so a directory tree with the proper structure and permissions exists there'd be no new security hole (ie, I couldn't ask the backend to create a database on any arbitrary partition; only one that's already been prepared by the administrator). I'd like to see a list of resources per database, with any table-specific info stored locally. > ALTER TABLE SET LOCATION=... > and/or > ALTER DATABASE SET LOCATION=... > should help administration and scalability. > Definitely. Of course, I'd want to make sure any new LOCATION had been prepared by the administrator. > But hard to do? If pg_location has 5000 entries, and you've scattered > tables all over the place (perhaps a bad decision, but we *should* > have the flexibility to do that) then it might be very error prone > when working with absolute paths imho. > I'd think that a pg_location entry wouldn't be necessary for the majority of tables -- the default location would be just like it is now, under the database directory. Creating a database directory in one place and scattering the tables all over creation would definitely be a Bad Decision, IMHO, but it would be doable. > Putting absolute path names as pointers to tables or data areas. I'm > getting the sense I'm in a minority (in a group of 3? ;) in this > discussion, but imho having some decoupling between logical paths in > the database and actual paths outside is A Good Thing. Always has been > a mark of good design in my experience. > How about requiring an absolute path for the data(base) area, and allowing relative paths for the tables? Actually, if you want ALTER DATABASE SET LOCATION=... to move tables, you'd either have to require relative paths for the tables or ignore tables that have absolute paths, right? Hmm. And all I originally wanted was an easier way to create a database in an alternate location :-). - Rich -- Richard Kuhns rjk@grauel.com PO Box 6249 Tel: (765)477-6000 \ 100 Sawmill Road x319 Lafayette, IN 47903 (800)489-4891 /
Thomas Lockhart writes: > So pg_location would hold the full path (absolute or logical) to every > file resource in every database? Or would it hold only a list of > allowed paths? The way I imagined it it would hold data like this: locname | locpath ----------------+------------------- alt1 | /mnt/foo/db joes alt store | /home/joe/storage When I create a database I would then do CREATE DATABASE "my_db" WITH LOCATION = "alt1"; which would place the database at /mnt/foo/db/data/base/my_db. Then if I create another that I want at the same place I do CREATE DATABASE "another" WITH LOCATION = "alt1";. pg_database would presumably contain a reference to pg_location.oid instead of the current datpath attribute. So one could say I'm really just normalizing pg_database. In some future life you might be able to do CREATE TABLE xxx (...) WITH LOCATION = "joes alt store" but then we'd have to think about how to resolve the path. One idea would be to get rid of per-database subdirectories and just store all heap files in one directory, but I'm sure Bruce would hate that. :) But that's another day's story. So yes, it is a list of allowed locations associated with freely choosable descriptive names. Environment variables do essentially provide a similar service but I find this much more administration friendly and flexible. (E.g., "What sort of stuff is being stored at /var/abc/def?" -- use a query) > > 1. shut down database > > 2. move data area > > 3. connect to template1 > > 4. update pg_location > > 5. connect to the moved database > > That's not very different. > > But hard to do? ALTER LOCATION "name" SET PATH TO '/new/path';? (Alternatively, use update pg_location set locpath='/new/path' where locname='name'.) That isn't any harder than setting environment variables. It might in fact be easier. > but imho having some decoupling between logical paths in the database > and actual paths outside is A Good Thing. Always has been a mark of > good design in my experience. Sure, that's exactly what this would provide. locname is the logical name of the "storage location", locpath is the physical path. It's just a matter of whether you maintain that information in environment variables (which might get unset, forgotten, require postmaster shutdown, are subject to certain rules we don't control) or in the database (which comes with all the conveniences you might imagine). -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden