Thread: Thanks, naming conventions, and count()
First off I just wanted to give a big 'thank you' to all the developers
and contributors who have made PostgreSQL what it is today. I haven't
come across a single thing since my first experience with it a few years
ago that hasn't been corrected, sped up, or otherwise positively
enhanced!

In working with 7.1 over the past couple weeks, I've noted the following
mods that may add to usability and speed:

o v7.1 changed the database naming convention to be all numeric; I
  suggest having the DB engine create symbolic links when creating a new
  DB and subsequent tables. For instance, in creating a database 'foo'
  with table 'bar', the /path/to/pgsql/data/base folder will have a new
  folder named something like '18720'; this folder could also have a
  symbolic link 'foo' pointing to it. Then in the '18720' folder, rather
  than just having numeric files for each table, pk, index, etc., there
  could be symbolic links following the naming convention 'bar',
  'pk_foo_pkey', 'field1_foo_ukey', 'field2_foo_key'.

  Maybe this would work best as a configurable flag that could be set
  during compilation or in the conf file.

o count() should use index scans for tables with a PK; scans would be on
  the PK index. Even after running 'vacuum analyze', such a query still
  uses a sequential scan. For instance, "select count(*) from bar" and
  even "select count(pk_name) from bar" both use sequential scans.
  Likewise, scans on fields with indexes should use the index.

I hope this input is useful; keep up the excellent work,

Casey Lyon
Systems Engineer
Earthcars.com, Inc
www.earthcars.com
casey@earthcars.com
> First off I just wanted to give a big 'thank you' to all the
> developers and contributors who have made PostgreSQL what it is
> today. I haven't come across a single thing since my first
> experience with it a few years ago that hasn't been corrected,
> sped up, or otherwise positively enhanced!
>
> In working with 7.1 over the past couple weeks, I've noted the
> following mods that may add to usability and speed:
>
> o v7.1 changed the database naming convention to be all numeric;
>   I suggest having the DB engine create symbolic links when
>   creating a new DB and subsequent tables. For instance, in
>   creating a database 'foo' with table 'bar', the
>   /path/to/pgsql/data/base folder will have a new folder named
>   something like '18720'; this folder could also have a symbolic
>   link 'foo' pointing to it. Then in the '18720' folder, rather
>   than just having numeric files for each table, pk, index,
>   etc., there could be symbolic links following the naming
>   convention 'bar', 'pk_foo_pkey', 'field1_foo_ukey',
>   'field2_foo_key'.
>
>   Maybe this would work best as a configurable flag that could
>   be set during compilation or in the conf file.

I think this is an excellent idea, and will add it to the TODO list. We
agonized over moving to numeric names, and we couldn't think of a good
way to let administrators know which table matched which files. The big
problem is that there is no good way to make the symlinks reliable: in a
crash, the symlink could point to a table creation that got rolled back,
or to the renaming of a table that got rolled back. I think symlinks
with some postmaster cleanup script that fixed bad symlinks would be
great for 7.2.

I have added this to the TODO list. If someone objects, I will remove
it:

	* Add tablename symlinks for numeric file names

> o count() should use index scans for tables with a PK; scans
>   would be on the PK index. Even after running 'vacuum analyze',
>   such a query still uses a sequential scan. For instance,
>   "select count(*) from bar" and even "select count(pk_name)
>   from bar" both use sequential scans. Likewise, scans on fields
>   with indexes should use the index.

The problem here is that we don't currently have commit status in the
index rows, so they have to check the heap for every row. One idea is to
update the index status on an index scan, and if we can do that, we can
easily use the index. However, the table scan is pretty quick.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
doesn't this defeat the reasons for going to numerics? is there a reason
why it's such a difficult thing to do a SELECT oid on pg_database and
pg_class to get this information? that's what I've been doing when I
need to know *shrug*

On Sun, 29 Apr 2001, Bruce Momjian wrote:

> > First off I just wanted to give a big 'thank you' to all the
> > developers and contributors who have made PostgreSQL what it is
> > today. I haven't come across a single thing since my first
> > experience with it a few years ago that hasn't been corrected,
> > sped up, or otherwise positively enhanced!
> >
> > In working with 7.1 over the past couple weeks, I've noted the
> > following mods that may add to usability and speed:
> >
> > o v7.1 changed the database naming convention to be all numeric;
> >   I suggest having the DB engine create symbolic links when
> >   creating a new DB and subsequent tables. For instance, in
> >   creating a database 'foo' with table 'bar', the
> >   /path/to/pgsql/data/base folder will have a new folder named
> >   something like '18720'; this folder could also have a symbolic
> >   link 'foo' pointing to it. Then in the '18720' folder, rather
> >   than just having numeric files for each table, pk, index,
> >   etc., there could be symbolic links following the naming
> >   convention 'bar', 'pk_foo_pkey', 'field1_foo_ukey',
> >   'field2_foo_key'.
> >
> >   Maybe this would work best as a configurable flag that could
> >   be set during compilation or in the conf file.
>
> I think this is an excellent idea, and will add it to the TODO list.
> We agonized over moving to numeric names, and we couldn't think of a
> good way to let administrators know which table matched which files.
> The big problem is that there is no good way to make the symlinks
> reliable: in a crash, the symlink could point to a table creation
> that got rolled back, or to the renaming of a table that got rolled
> back. I think symlinks with some postmaster cleanup script that fixed
> bad symlinks would be great for 7.2.
>
> I have added this to the TODO list. If someone objects, I will remove
> it:
>
> 	* Add tablename symlinks for numeric file names
>
> > o count() should use index scans for tables with a PK; scans
> >   would be on the PK index. Even after running 'vacuum analyze',
> >   such a query still uses a sequential scan. For instance,
> >   "select count(*) from bar" and even "select count(pk_name)
> >   from bar" both use sequential scans. Likewise, scans on fields
> >   with indexes should use the index.
>
> The problem here is that we don't currently have commit status in the
> index rows, so they have to check the heap for every row. One idea is
> to update the index status on an index scan, and if we can do that,
> we can easily use the index. However, the table scan is pretty quick.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
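The lookups Marc describes are ordinary catalog queries; a minimal
sketch via psql, with 'foo' standing in for an actual database name:

    # which directory under data/base belongs to which database
    psql -c "SELECT oid, datname FROM pg_database;" template1

    # which numeric file belongs to which table or index in database 'foo'
    psql -c "SELECT oid, relname FROM pg_class;" foo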
> doesn't this defeat the reasons for going to numerics? is there a
> reason why it's such a difficult thing to do a SELECT oid on
> pg_database and pg_class to get this information? that's what I've
> been doing when I need to know *shrug*

Yes, but you can't do that if you can't start the database or can't
connect for some reason. If people don't think it is worthwhile, we can
delete the TODO item.

For example, when someone has trouble figuring out which directory is
which database, they can just ls and look at the symlinks. Seems like a
nice feature.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> > doesn't this defeat the reasons for going to numerics? is there a
> > reason why it's such a difficult thing to do a SELECT oid on
> > pg_database and pg_class to get this information? that's what I've
> > been doing when I need to know *shrug*
>
> Yes, but you can't do that if you can't start the database or can't
> connect for some reason. If people don't think it is worthwhile, we
> can delete the TODO item.
>
> For example, when someone has trouble figuring out which directory is
> which database, they can just ls and look at the symlinks. Seems like
> a nice feature.

I will admit we are not getting flooded with problems due to the new
numeric file names like I thought we would, so maybe it is not worth the
symlinks.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Sun, 29 Apr 2001, Bruce Momjian wrote:

> > doesn't this defeat the reasons for going to numerics? is there a
> > reason why it's such a difficult thing to do a SELECT oid on
> > pg_database and pg_class to get this information? that's what I've
> > been doing when I need to know *shrug*
>
> Yes, but you can't do that if you can't start the database or can't
> connect for some reason. If people don't think it is worthwhile, we
> can delete the TODO item.

Okay, what does being able to ls the directory give you if you can't
start the database? the only thing I do it for is to figure out which
tables are taking up so much disk space, or which databases ...

> For example, when someone has trouble figuring out which directory is
> which database, they can just ls and look at the symlinks. Seems like
> a nice feature.

Ya, but I thought that the reason for going numeric had to do with being
transaction safe ... something about being able to safely RENAME a
table, if my recollection remotely comes close ... as soon as you start
throwing around symlinks, do we break that once more? what about if
someone wants to physically move a table to a separate file system,
which is something that has been suggested as a way around the fact that
all files are in the same subdirectory? You have a symlink to the
symlink?

I don't know the answers to these questions, which is why I'm asking
them ... if this is something safe to do, and doesn't break us again,
then sounds like a good idea to me too ...
> On Sun, 29 Apr 2001, Bruce Momjian wrote:
>
> > > doesn't this defeat the reasons for going to numerics? is there a
> > > reason why it's such a difficult thing to do a SELECT oid on
> > > pg_database and pg_class to get this information? that's what
> > > I've been doing when I need to know *shrug*
> >
> > Yes, but you can't do that if you can't start the database or can't
> > connect for some reason. If people don't think it is worthwhile, we
> > can delete the TODO item.
>
> Okay, what does being able to ls the directory give you if you can't
> start the database? the only thing I do it for is to figure out which
> tables are taking up so much disk space, or which databases ...

Yes, it is just for admin convenience, and if you pull back a database
from a tar backup, you can know which files are which without starting
the database.

> > For example, when someone has trouble figuring out which directory
> > is which database, they can just ls and look at the symlinks. Seems
> > like a nice feature.
>
> Ya, but I thought that the reason for going numeric had to do with
> being transaction safe ... something about being able to safely
> RENAME a table, if my recollection remotely comes close ... as soon
> as you start throwing around symlinks, do we break that once more?
> what about if someone wants to physically move a table to a separate
> file system, which is something that has been suggested as a way
> around the fact that all files are in the same subdirectory? You have
> a symlink to the symlink?
>
> I don't know the answers to these questions, which is why I'm asking
> them ... if this is something safe to do, and doesn't break us again,
> then sounds like a good idea to me too ...

I was suggesting the symlinks purely for admin convenience. The database
would use only the numeric names.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Sun, 29 Apr 2001, Bruce Momjian wrote:

> > I don't know the answers to these questions, which is why I'm
> > asking them ... if this is something safe to do, and doesn't break
> > us again, then sounds like a good idea to me too ...
>
> I was suggesting the symlinks purely for admin convenience. The
> database would use only the numeric names.

Except that the database would have to maintain those links ... now
you've given something ppl are relying on being there, but, for some
reason, a symlink wasn't created, so they think their table doesn't
exist?

I can even think of a situation, as unlikely as it can be, where this
could happen ... run out of inodes on the file system ... last inode
used by the table, no inode to stick the symlink onto ...

it's a remote situation, but I've personally had it happen ...

I'd personally prefer to see some text file created in the database
directory itself that contains the mappings ... so that each time there
is a change, it just redumps that data to the text file ... less to
maintain overall ...
> I can even think of a situation, as unlikely as it can be, where this
> could happen ... run out of inodes on the file system ... last inode
> used by the table, no inode to stick the symlink onto ...

If you run out of inodes, you are going to have much bigger problems
than symlinks. Sort file creation would fail too.

> it's a remote situation, but I've personally had it happen ...
>
> I'd personally prefer to see some text file created in the database
> directory itself that contains the mappings ... so that each time
> there is a change, it just redumps that data to the text file ...
> less to maintain overall ...

Yes, I like that idea, but the problem is that it is hard to update just
one table in the file. You sort of have to update the entire file each
time a table changes. That is why I liked symlinks, because they are
per-table, but you are right that the symlink creation could fail
because the new table file was never created or something, leaving the
symlink pointing to nothing. Not sure how to address this. Is there a
way to update a flat file when a single table changes?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
* Bruce Momjian <pgman@candle.pha.pa.us> [010429 20:14] wrote:

> Yes, I like that idea, but the problem is that it is hard to update
> just one table in the file. You sort of have to update the entire
> file each time a table changes. That is why I liked symlinks, because
> they are per-table, but you are right that the symlink creation could
> fail because the new table file was never created or something,
> leaving the symlink pointing to nothing. Not sure how to address
> this. Is there a way to update a flat file when a single table
> changes?

Sort of, if that flat file is in the form of:

  123456;"tablename      "
  000033;"another_table  "

ie, each line is a fixed length.

--
-Alfred Perlstein - [alfred@freebsd.org]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/
> * Bruce Momjian <pgman@candle.pha.pa.us> [010429 20:14] wrote:
>
> > Yes, I like that idea, but the problem is that it is hard to update
> > just one table in the file. You sort of have to update the entire
> > file each time a table changes. That is why I liked symlinks,
> > because they are per-table, but you are right that the symlink
> > creation could fail because the new table file was never created or
> > something, leaving the symlink pointing to nothing. Not sure how to
> > address this. Is there a way to update a flat file when a single
> > table changes?
>
> Sort of, if that flat file is in the form of:
>
>   123456;"tablename      "
>   000033;"another_table  "
>
> ie, each line is a fixed length.

Yea, after I posted, I realized that using a fixed-length line would
solve the problem.

The larger problem, though, I think, is concurrency. Can multiple
backends update that single flat file reliably? I suppose they could do
append-only to the file, and you could grab the last entry, but again,
sometimes it is rolled back, so I think there has to be a way to clean
it up.

Of course, Tom or Vadim may come along and say this is a stupid idea,
and we would be done discussing it. :-)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Sun, 29 Apr 2001, Bruce Momjian wrote:

> > I can even think of a situation, as unlikely as it can be, where
> > this could happen ... run out of inodes on the file system ... last
> > inode used by the table, no inode to stick the symlink onto ...
>
> If you run out of inodes, you are going to have much bigger problems
> than symlinks. Sort file creation would fail too.
>
> > it's a remote situation, but I've personally had it happen ...
> >
> > I'd personally prefer to see some text file created in the database
> > directory itself that contains the mappings ... so that each time
> > there is a change, it just redumps that data to the text file ...
> > less to maintain overall ...
>
> Yes, I like that idea, but the problem is that it is hard to update
> just one table in the file. You sort of have to update the entire
> file each time a table changes. That is why I liked symlinks, because
> they are per-table, but you are right that the symlink creation could
> fail because the new table file was never created or something,
> leaving the symlink pointing to nothing. Not sure how to address
> this. Is there a way to update a flat file when a single table
> changes?

Why not just dump the whole file? That way, if a previous dump failed
for whatever reason, the new dump would correct that omission ...

Then again, why not some sort of 'lsdb' command that looks at where it
is and gives you info as appropriate?

if in data/base, then do a connect to template1 using postgres so that
you can dump and parse the raw data from pg_database ... if in a
directory, you should be able to connect to that database in a similar
way to grab the contents of pg_class ...

no server would need to be running for this to work, and if it was
readonly, it should be workable if a server is running, no?
2 points:

- I thought that a big part of the reason we got rid of filenames was so
  we could use arbitrary table / db names that were not restricted by
  the file system / OS. Using links would then return this restriction.

- What is the format for the table? Could we write a tool that can read
  the tables raw in case of a 'HARD' crash? One could then walk the
  table by hand as needed. Can someone give me information on the schema
  for the files? I'll take a look at it. There may also be a way to then
  use WAL files to do some more serious recovery.

Thoughts?

- brandon
> > Yes, I like that idea, but the problem is that it is hard to update
> > just one table in the file. You sort of have to update the entire
> > file each time a table changes. That is why I liked symlinks,
> > because they are per-table, but you are right that the symlink
> > creation could fail because the new table file was never created or
> > something, leaving the symlink pointing to nothing. Not sure how to
> > address this. Is there a way to update a flat file when a single
> > table changes?
>
> Why not just dump the whole file? That way, if a previous dump failed
> for whatever reason, the new dump would correct that omission ...

Yes, you can do that, but it is only updated during a dump, right?
Makes it hard to use during the day, no?

> Then again, why not some sort of 'lsdb' command that looks at where
> it is and gives you info as appropriate?

I want to do that for oid2name. I had the plan laid out, but never got
to it.

> if in data/base, then do a connect to template1 using postgres so
> that you can dump and parse the raw data from pg_database ... if in a
> directory, you should be able to connect to that database in a
> similar way to grab the contents of pg_class ...
>
> no server would need to be running for this to work, and if it was
> readonly, it should be workable if a server is running, no?

I think parsing the file contents is too hard. The database would have
to be running and I would use psql.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
I could even see a utility that does a dump of this info into a flat
file, entirely overwriting the file every time. This would be quick to
reference and usable in a meltdown scenario. Could easily be
incorporated into vacuum and other db maintenance cron scripts.

-Casey

Bruce Momjian wrote:

>>> Yes, I like that idea, but the problem is that it is hard to update
>>> just one table in the file. You sort of have to update the entire
>>> file each time a table changes. That is why I liked symlinks,
>>> because they are per-table, but you are right that the symlink
>>> creation could fail because the new table file was never created or
>>> something, leaving the symlink pointing to nothing. Not sure how to
>>> address this. Is there a way to update a flat file when a single
>>> table changes?
>>
>> Why not just dump the whole file? That way, if a previous dump
>> failed for whatever reason, the new dump would correct that omission
>> ...
>
> Yes, you can do that, but it is only updated during a dump, right?
> Makes it hard to use during the day, no?
>
>> Then again, why not some sort of 'lsdb' command that looks at where
>> it is and gives you info as appropriate?
>
> I want to do that for oid2name. I had the plan laid out, but never
> got to it.
>
>> if in data/base, then do a connect to template1 using postgres so
>> that you can dump and parse the raw data from pg_database ... if in
>> a directory, you should be able to connect to that database in a
>> similar way to grab the contents of pg_class ...
>>
>> no server would need to be running for this to work, and if it was
>> readonly, it should be workable if a server is running, no?
>
> I think parsing the file contents is too hard. The database would
> have to be running and I would use psql.
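A minimal sketch of the sort of cron dump Casey describes, assuming a
hypothetical database 'foo' and output path (psql's -A and -t flags
produce unaligned, tuples-only output):

    #!/bin/sh
    # Overwrite the mapping file on every run; stale copies are simply replaced.
    psql -A -t -F ' ' -c "SELECT oid, relname FROM pg_class ORDER BY oid;" foo \
        > /usr/local/pgsql/foo.mapping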
Bruce Momjian wrote:

> The problem here is that we don't currently have commit status in the
> index rows, so they have to check the heap for every row. One idea is
> to update the index status on an index scan, and if we can do that,
> we can easily use the index. However, the table scan is pretty quick.

It certainly works quickly for smaller tables; however, the 21.7 million
record table I ran this on takes a touch longer, as shown here:

database=# explain select count(*) from table;
NOTICE:  QUERY PLAN:

Aggregate  (cost=478056.20..478056.20 rows=1 width=0)
  ->  Seq Scan on table  (cost=0.00..423737.76 rows=21727376 width=0)

EXPLAIN

However, I noted that EXPLAIN provides a row count as part of its
output; from what I've seen this loses precision over time or with large
data imports, though, at least until the table is vacuumed again.

-Casey
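The row estimate EXPLAIN shows comes from the planner statistics, which
can also be read directly from pg_class; a hedged one-liner ('table' and
'database' are the placeholders from above, and reltuples is only as
fresh as the last vacuum/analyze):

    # approximate row count without scanning the table
    psql -c "SELECT reltuples FROM pg_class WHERE relname = 'table';" database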
On Sun, 29 Apr 2001, Bruce Momjian wrote:

> > > Yes, I like that idea, but the problem is that it is hard to
> > > update just one table in the file. You sort of have to update the
> > > entire file each time a table changes. That is why I liked
> > > symlinks, because they are per-table, but you are right that the
> > > symlink creation could fail because the new table file was never
> > > created or something, leaving the symlink pointing to nothing.
> > > Not sure how to address this. Is there a way to update a flat
> > > file when a single table changes?
> >
> > Why not just dump the whole file? That way, if a previous dump
> > failed for whatever reason, the new dump would correct that
> > omission ...
>
> Yes, you can do that, but it is only updated during a dump, right?
> Makes it hard to use during the day, no?
>
> > Then again, why not some sort of 'lsdb' command that looks at where
> > it is and gives you info as appropriate?
>
> I want to do that for oid2name. I had the plan laid out, but never
> got to it.
>
> > if in data/base, then do a connect to template1 using postgres so
> > that you can dump and parse the raw data from pg_database ... if in
> > a directory, you should be able to connect to that database in a
> > similar way to grab the contents of pg_class ...
> >
> > no server would need to be running for this to work, and if it was
> > readonly, it should be workable if a server is running, no?
>
> I think parsing the file contents is too hard. The database would
> have to be running and I would use psql.

I don't know, I recovered someone's database using a "raw" connection
... wasn't that difficult once I figured out the format *shrug*

the following gets the oid/relname pairs for a database:

  echo "select oid,relname from pg_class" | postgres -L -D /usr/local/pgsql/data eceb | egrep "oid|relname"

then just parse the output, which looks like this, using a simple perl
script:

   1: oid = "163338" (typeid = 26, len = 4, typmod = -1, byval = t)
   2: relname = "auth_info_uid_key" (typeid = 19, len = 32, typmod = -1, byval = f)
   1: oid = "163341" (typeid = 26, len = 4, typmod = -1, byval = t)
   2: relname = "auth_info_id" (typeid = 19, len = 32, typmod = -1, byval = f)
   1: oid = "56082" (typeid = 26, len = 4, typmod = -1, byval = t)
   2: relname = "auth_info" (typeid = 19, len = 32, typmod = -1, byval = f)

the above won't work on a live database (I did try that), so best is to
test for a connection first and use this as a fallback ... but you'd at
least have a live *and* non-live way of parsing the data *shrug*
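As a rough stand-in for that perl script, a shell pipeline that extracts
the oid/relname pairs (assuming exactly the output format shown above):

    echo "select oid,relname from pg_class" \
      | postgres -L -D /usr/local/pgsql/data eceb \
      | egrep "oid|relname" \
      | sed 's/.*= "\([^"]*\)".*/\1/' \
      | paste - -      # join each oid line with its following relname line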
----- Original Message -----
From: Alfred Perlstein <bright@wintelcom.net>
To: Bruce Momjian <pgman@candle.pha.pa.us>
Cc: The Hermit Hacker <scrappy@hub.org>; Casey Lyon <casey@earthcars.com>; <pgsql-hackers@postgresql.org>
Sent: Sunday, April 29, 2001 11:17 PM
Subject: Re: [HACKERS] Thanks, naming conventions, and count()

> * Bruce Momjian <pgman@candle.pha.pa.us> [010429 20:14] wrote:
>
> > Yes, I like that idea, but the problem is that it is hard to update
> > just one table in the file. You sort of have to update the entire
> > file each time a table changes. That is why I liked symlinks,
> > because they are per-table, but you are right that the symlink
> > creation could fail because the new table file was never created or
> > something, leaving the symlink pointing to nothing. Not sure how to
> > address this. Is there a way to update a flat file when a single
> > table changes?
>
> Sort of, if that flat file is in the form of:
>
>   123456;"tablename      "
>   000033;"another_table  "
>
> ie, each line is a fixed length.

What if we put each such line in a separate file in one dir? Then there
is no restriction on field length, you don't need to dump the whole file
each time, and you don't have to maintain the real symlinks. The 'lsdb'
command (courtesy of The Hermit Hacker :)) would assemble all of them
together and show the DBA where to look for a specific table. File names
can be your OIDs again; just keep the table name inside the file. Keep
these files under a certain dir, and let lsdb display them appropriately
when needed.

Or another idea is to create 'deferred' symlinks. The (real) symlinks
are only created when the DBA issues the 'lsdb' command and lists them,
and this list is maintained only when 'lsdb' is invoked....

Maybe this sounds stupid, but just a thought...

Serguei
> It certainly works quickly for smaller tables; however, the 21.7
> million record table I ran this on takes a touch longer, as shown
> here:
>
> database=# explain select count(*) from table;
> NOTICE:  QUERY PLAN:
>
> Aggregate  (cost=478056.20..478056.20 rows=1 width=0)
>   ->  Seq Scan on table  (cost=0.00..423737.76 rows=21727376 width=0)
>
> EXPLAIN
>
> However, I noted that EXPLAIN provides a row count as part of its
> output; from what I've seen this loses precision over time or with
> large data imports, though, at least until the table is vacuumed
> again.

I guess I was saying that an index scan could take longer because it has
to walk the btree. However, the index has only one column of the table,
so it may be faster. I never measured the two, but the heap access
needed for the index scan currently is a performance killer. Sequential
is faster than all those random heap lookups from the index.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> > I think parsing the file contents is too hard. The database would
> > have to be running and I would use psql.
>
> I don't know, I recovered someone's database using a "raw" connection
> ... wasn't that difficult once I figured out the format *shrug*
>
> the following gets the oid/relname pairs for a database:
>
>   echo "select oid,relname from pg_class" | postgres -L -D /usr/local/pgsql/data eceb | egrep "oid|relname"
>
> then just parse the output, which looks like this, using a simple
> perl script:
>
>    1: oid = "163338" (typeid = 26, len = 4, typmod = -1, byval = t)
>    2: relname = "auth_info_uid_key" (typeid = 19, len = 32, typmod = -1, byval = f)
>    1: oid = "163341" (typeid = 26, len = 4, typmod = -1, byval = t)
>    2: relname = "auth_info_id" (typeid = 19, len = 32, typmod = -1, byval = f)
>    1: oid = "56082" (typeid = 26, len = 4, typmod = -1, byval = t)
>    2: relname = "auth_info" (typeid = 19, len = 32, typmod = -1, byval = f)

Oh, you did a direct postgres backend connect. Yes, that will work fine.
Good idea if the postmaster is down. I originally thought you meant
reading the pg_class file raw. Of course, that would be really hard
because there is no way to know which numeric file is pg_class!

Actually, it seems it is always 1259. I see this in
include/catalog/pg_class.h:

DATA(insert OID = 1259 (  pg_class 83 PGUID 0 1259 0 0 0 0 f f r 22 0 0 0 0 0 f f f _null_ ));

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
If this isn't incorporated into a utility, it would certainly be prime
material for the yet-to-be-written chapter 11 of the PG Admin Manual,
"Database Recovery."

Thanks for your responses,

-Casey

The Hermit Hacker wrote:

> On Sun, 29 Apr 2001, Bruce Momjian wrote:
>
>>>> Yes, I like that idea, but the problem is that it is hard to
>>>> update just one table in the file. You sort of have to update the
>>>> entire file each time a table changes. That is why I liked
>>>> symlinks, because they are per-table, but you are right that the
>>>> symlink creation could fail because the new table file was never
>>>> created or something, leaving the symlink pointing to nothing. Not
>>>> sure how to address this. Is there a way to update a flat file
>>>> when a single table changes?
>>>
>>> Why not just dump the whole file? That way, if a previous dump
>>> failed for whatever reason, the new dump would correct that
>>> omission ...
>>
>> Yes, you can do that, but it is only updated during a dump, right?
>> Makes it hard to use during the day, no?
>>
>>> Then again, why not some sort of 'lsdb' command that looks at where
>>> it is and gives you info as appropriate?
>>
>> I want to do that for oid2name. I had the plan laid out, but never
>> got to it.
>>
>>> if in data/base, then do a connect to template1 using postgres so
>>> that you can dump and parse the raw data from pg_database ... if in
>>> a directory, you should be able to connect to that database in a
>>> similar way to grab the contents of pg_class ...
>>>
>>> no server would need to be running for this to work, and if it was
>>> readonly, it should be workable if a server is running, no?
>>
>> I think parsing the file contents is too hard. The database would
>> have to be running and I would use psql.
>
> I don't know, I recovered someone's database using a "raw" connection
> ... wasn't that difficult once I figured out the format *shrug*
>
> the following gets the oid/relname pairs for a database:
>
>   echo "select oid,relname from pg_class" | postgres -L -D /usr/local/pgsql/data eceb | egrep "oid|relname"
>
> then just parse the output, which looks like this, using a simple
> perl script:
>
>    1: oid = "163338" (typeid = 26, len = 4, typmod = -1, byval = t)
>    2: relname = "auth_info_uid_key" (typeid = 19, len = 32, typmod = -1, byval = f)
>    1: oid = "163341" (typeid = 26, len = 4, typmod = -1, byval = t)
>    2: relname = "auth_info_id" (typeid = 19, len = 32, typmod = -1, byval = f)
>    1: oid = "56082" (typeid = 26, len = 4, typmod = -1, byval = t)
>    2: relname = "auth_info" (typeid = 19, len = 32, typmod = -1, byval = f)
>
> the above won't work on a live database (I did try that), so best is
> to test for a connection first and use this as a fallback ... but
> you'd at least have a live *and* non-live way of parsing the data
> *shrug*
Here is what I suggested for oid2name to do with file names:

---------------------------------------------------------------------------

Just seems like a major pain; not worth the work. If you do an ls and
pipe it, here is what you would need to do:

- find out where $PWD is
- in that database (found from PID), for each file in the dir, look it
  up using oid2name
- print that out

problems:

- ls -l vs ls: columns are different for differing OSs / filesystems
- du will REALLY suck
- what if the user tries to do "ls /var/postgres/data/base/12364"

Will you try to parse out the request? Ugh. no thanks.

I also don't think people will have much reason to use the script.
oid2name will have little enough use; what use will the script have? Who
knows.. I guess keep it on the back burner till there is a demand.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> big problem is that there is no good way to make the symlinks
> reliable because in a crash, the symlink could point to a table
> creation that got rolled back or the renaming of a table that got
> rolled back.

Yes. Have you already forgotten the very long discussion we had about
this some months back? There is no way to provide a reliable symlink
mapping without re-introducing all the same problems that we went to
numeric filenames to avoid.

Now if you want an *UNRELIABLE* symlink mapping, maybe we could talk
about it ... but IMHO such a feature would be worse than useless.
Murphy's law says that the symlinks would be right often enough to
mislead dbadmins into trusting them, and wrong exactly when it would do
the most damage to trust them. The same goes for other methods of
unreliably exporting the name-to-number mapping, such as dumping it into
a flat file.

We do need to document how to get the mapping (ie, select relfilenode,
relname from pg_class). But I really doubt that an automated method for
exporting the mapping would be worth the cycles it would cost, even if
it could be made reliable, which it can't.

			regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [010429 23:12] wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > big problem is that there is no good way to make the symlinks
> > reliable because in a crash, the symlink could point to a table
> > creation that got rolled back or the renaming of a table that got
> > rolled back.
>
> Yes. Have you already forgotten the very long discussion we had about
> this some months back? There is no way to provide a reliable symlink
> mapping without re-introducing all the same problems that we went to
> numeric filenames to avoid. Now if you want an *UNRELIABLE* symlink
> mapping, maybe we could talk about it ... but IMHO such a feature
> would be worse than useless. Murphy's law says that the symlinks
> would be right often enough to mislead dbadmins into trusting them,
> and wrong exactly when it would do the most damage to trust them. The
> same goes for other methods of unreliably exporting the
> name-to-number mapping, such as dumping it into a flat file.
>
> We do need to document how to get the mapping (ie, select
> relfilenode, relname from pg_class). But I really doubt that an
> automated method for exporting the mapping would be worth the cycles
> it would cost, even if it could be made reliable, which it can't.

Perhaps an external tool to rebuild the symlink state that could be run
on an offline database. But I'm sure you have more important things to
do. :)

--
-Alfred Perlstein - [alfred@freebsd.org]
Daemon News Magazine in your snail-mail! http://magazine.daemonnews.org/
On Sun, 29 Apr 2001, Bruce Momjian wrote:

> > > I think parsing the file contents is too hard. The database would
> > > have to be running and I would use psql.
> >
> > I don't know, I recovered someone's database using a "raw"
> > connection ... wasn't that difficult once I figured out the format
> > *shrug*
> >
> > the following gets the oid/relname pairs for a database:
> >
> >   echo "select oid,relname from pg_class" | postgres -L -D /usr/local/pgsql/data eceb | egrep "oid|relname"
> >
> > then just parse the output, which looks like this, using a simple
> > perl script:
> >
> >    1: oid = "163338" (typeid = 26, len = 4, typmod = -1, byval = t)
> >    2: relname = "auth_info_uid_key" (typeid = 19, len = 32, typmod = -1, byval = f)
>
> Oh, you did a direct postgres backend connect. Yes, that will work
> fine. Good idea if the postmaster is down. I originally thought you
> meant reading the pg_class file raw. Of course, that would be really
> hard because there is no way to know which numeric file is pg_class!

But would it work on a crashed database that won't come up, or doesn't
the direct connect care about any other tables in this usage?

Vince.

--
==========================================================================
Vince Vielhaber -- KA8CSH   email: vev@michvhf.com   http://www.pop4.net
 56K Nationwide Dialup from $16.00/mo at Pop4 Networking
        Online Campground Directory    http://www.camping-usa.com
       Online Giftshop Superstore    http://www.cloudninegifts.com
==========================================================================
Vince Vielhaber <vev@michvhf.com> writes:
>> Oh, you did a direct postgres backend connect. Yes, that will work
>> fine. Good idea if the postmaster is down. I originally thought you
>> meant reading the pg_class file raw. Of course, that would be really
>> hard because there is no way to know which numeric file is pg_class!

> But would it work on a crashed database that won't come up

No. It's not that hard to know "which numeric file is pg_class" ---
that info has to be hard-wired in at some level. (The backends cannot
learn pg_class's own relfilenode number by examining its pg_class
entry...)

It might be worth making a simple utility (could be based on Bryan
White's pg_check) to grovel through the raw pg_class bits and extract
relfilenode info the hard way. You'd only need it in certain disaster
scenarios, but when you did need it you'd need it bad.

So far we have not seen a report of a situation where this seemed to be
useful, so I'm not that excited about having it... WAL dump and
interrogation utilities are higher on my want list.

			regards, tom lane
> It might be worth making a simple utility (could be based on Bryan
> White's pg_check) to grovel through the raw pg_class bits and extract
> relfilenode info the hard way. You'd only need it in certain disaster
> scenarios, but when you did need it you'd need it bad.
>
> So far we have not seen a report of a situation where this seemed to
> be useful, so I'm not that excited about having it... WAL dump and
> interrogation utilities are higher on my want list.

OK, updated TODO item:

	* Add table name mapping for numeric file names

I removed the symlink mention, and I agree it is low priority. No one is
really asking for it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Sun, Apr 29, 2001 at 08:17:28PM -0700, Alfred Perlstein wrote:
> Sort of, if that flat file is in the form of:
>
>   123456;"tablename      "
>   000033;"another_table  "

Or better yet, since the flat file is unlikely to be large, you could
just do this dance:

1) open file for reading
2) flock() file exclusively, non-blocking
3) If 2 failed, sleep a bit, then go back to 1; otherwise open a new
   file for writing
4) Write out the new file
5) rename() the temp file over the original file
6) close files, etc

That way, you'll never have the race of 2 programs trying to write the
file at a time (therefore losing changes), and you get total atomicity
of the writing.

You could also do it with an open(O_EXCL) on a fixed temp file, instead
of the flock() call. The semantics should be the same.

Of course, you could always fork() a child to handle this in the
background, as it's hardly important to the running of the database.
(Or if it is, it can become part of the transaction, which means that at
rename() time, there must be no room for other failures, but it mustn't
be too late to roll back.)

--
Michael Samuel <michael@miknet.net>
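A sketch of the open(O_EXCL) variant in shell, where 'set -C'
(noclobber) makes '>' fail if the lock file already exists; the lock
path, output path, and database name 'foo' are all hypothetical:

    #!/bin/sh
    # Take the lock: exclusive-create of the lock file, retrying until we win.
    while ! ( set -C; : > /tmp/mapping.lock ) 2>/dev/null; do
        sleep 1
    done
    # Write the new copy next to the target, then rename() it into place;
    # mv is atomic when source and destination are on the same filesystem.
    psql -A -t -c "SELECT oid, relname FROM pg_class;" foo \
        > /usr/local/pgsql/foo.mapping.tmp
    mv /usr/local/pgsql/foo.mapping.tmp /usr/local/pgsql/foo.mapping
    rm -f /tmp/mapping.lock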
> > Yes, I like that idea, but the problem is that it is hard to update
> > just one table in the file.
>
> why not have just one ever-growing file that is only appended to and
> that has lines of the form
>
>   OID, type (DB/TABLE/INDEX/...), name, time
>
> so when you need the actual info you grep for the name and use the
> last line whose file actually exists. Not too convenient but useful
> enough when you really need it.

Yes, that is one idea, but it is hard to undo a change. You would have
to write to the file only on a commit.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:

> > I can even think of a situation, as unlikely as it can be, where
> > this could happen ... run out of inodes on the file system ... last
> > inode used by the table, no inode to stick the symlink onto ...
>
> If you run out of inodes, you are going to have much bigger problems
> than symlinks. Sort file creation would fail too.
>
> > it's a remote situation, but I've personally had it happen ...
> >
> > I'd personally prefer to see some text file created in the database
> > directory itself that contains the mappings ... so that each time
> > there is a change, it just redumps that data to the text file ...
> > less to maintain overall ...
>
> Yes, I like that idea, but the problem is that it is hard to update
> just one table in the file.

why not have just one ever-growing file that is only appended to and
that has lines of the form

  OID, type (DB/TABLE/INDEX/...), name, time

so when you need the actual info you grep for the name and use the last
line whose file actually exists. Not too convenient but useful enough
when you really need it.

-------------------
Hannu