Thread: RE: [HACKERS] mdnblocks is an amazing time sink in huge relations
> -----Original Message----- > From: Hiroshi Inoue [mailto:Inoue@tpf.co.jp] > Sent: Tuesday, October 19, 1999 6:45 PM > To: Tom Lane > Cc: pgsql-hackers@postgreSQL.org > Subject: RE: [HACKERS] mdnblocks is an amazing time sink in huge > relations > > > > > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > [snip] > > > > > > Deletion is necessary only not to consume disk space. > > > > > > For example vacuum could remove not deleted files. > > > > Hmm ... interesting idea ... but I can hear the complaints > > from users already... > > > > My idea is only an analogy of PostgreSQL's simple recovery > mechanism of tuples. > > And my main point is > "delete fails after commit" doesn't harm the database > except that not deleted files consume disk space. > > Of cource,it's preferable to delete relation files immediately > after(or just when) commit. > Useless files are visible though useless tuples are invisible. > Anyway I don't need "DROP TABLE inside transactions" now and my idea is originally for that issue. After a thought,I propose the following solution. 1. mdcreate() couldn't create existent relation files. If the existent file is of length zero,we would overwrite thefile.(seems the comment in md.c says so but the code doesn't do so). If the file is an Index relation file,we wouldoverwrite the file. 2. mdunlink() couldn't unlink non-existent relation files. mdunlink() doesn't call elog(ERROR) even if the file doesn'texist,though I couldn't find where to change now. mdopen() doesn't call elog(ERROR) even if the file doesn'texist and leaves the relation as CLOSED. Comments ? Regards. Hiroshi Inoue Inoue@tpf.co.jp
> > After a thought,I propose the following solution. > > 1. mdcreate() couldn't create existent relation files. > If the existent file is of length zero,we would overwrite > the file.(seems the comment in md.c says so but the > code doesn't do so). > If the file is an Index relation file,we would overwrite > the file. > This may allow to CREATE TABLE simultaneously for the same table name. I would change to check the existence of the same table name correctly in heap_create_with_ca talog(). > 2. mdunlink() couldn't unlink non-existent relation files. > mdunlink() doesn't call elog(ERROR) even if the file > doesn't exist,though I couldn't find where to change > now. _mdfd_getrelnfd(),mdnblocks() doesn't call elog(). Return code will be checked. > mdopen() doesn't call elog(ERROR) even if the file > doesn't exist and leaves the relation as CLOSED. > > Comments ? > Recently I saw 2 postings about this in pgsql MLs. So I want to change as above. 2. was changed by Tom(mdunlink/mdopen) and Tatsuo(mdopen) recently. Any Problems ? Regards. Hiroshi Inoue Inoue@tpf.co.jp
System indexes are never unique indexes( was RE: [HACKERS] mdnblocks is an amazing time sink in huge relations)
From
"Hiroshi Inoue"
Date:
> > > > After a thought,I propose the following solution. > > > > 1. mdcreate() couldn't create existent relation files. > > If the existent file is of length zero,we would overwrite > > the file.(seems the comment in md.c says so but the > > code doesn't do so). > > If the file is an Index relation file,we would overwrite > > the file. > > > > This may allow to CREATE TABLE simultaneously for the > same table name. I would change to check the existence As I was afraid,2 tables of a same name could be made. After a short investigating,I found that system indexes are never unique indexes. Why ? Without duplicate index check,it's very difficult to prevent objects from having same name. Comments ? Hiroshi Inoue Inoue@tpf.co.jp
Re: System indexes are never unique indexes( was RE: [HACKERS] mdnblocks is an amazing time sink in huge relations)
From
Bruce Momjian
Date:
> As I was afraid,2 tables of a same name could be made. > After a short investigating,I found that system indexes are > never unique indexes. > Why ? > Without duplicate index check,it's very difficult to prevent > objects from having same name. They certainly should be unique. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
RE: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis an amazing time sink in huge relations)
From
"Hiroshi Inoue"
Date:
> > > As I was afraid,2 tables of a same name could be made. > > After a short investigating,I found that system indexes are > > never unique indexes. > > Why ? > > Without duplicate index check,it's very difficult to prevent > > objects from having same name. > > They certainly should be unique. > All should be unique ? I don't know system indexes well. Regards. Hiroshi Inoue Inoue@tpf.co.jp
Re: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis an amazing time sink in huge relations)
From
Bruce Momjian
Date:
[Charset iso-8859-1 unsupported, filtering to ASCII...] > > > > > As I was afraid,2 tables of a same name could be made. > > > After a short investigating,I found that system indexes are > > > never unique indexes. > > > Why ? > > > Without duplicate index check,it's very difficult to prevent > > > objects from having same name. > > > > They certainly should be unique. > > > > All should be unique ? > I don't know system indexes well. Not sure. I don't remember which ones. I can take a look when I add more indexes for 7.0. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Re: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis
From
wieck@debis.com (Jan Wieck)
Date:
> > [Charset iso-8859-1 unsupported, filtering to ASCII...] > > > > > > > As I was afraid,2 tables of a same name could be made. > > > > After a short investigating,I found that system indexes are > > > > never unique indexes. > > > > Why ? > > > > Without duplicate index check,it's very difficult to prevent > > > > objects from having same name. > > > > > > They certainly should be unique. > > > > > > > All should be unique ? > > I don't know system indexes well. > > Not sure. I don't remember which ones. I can take a look when I add > more indexes for 7.0. Don't remember if really or what, but wasn't there some problem with cached system relations, unique indices and concurrency? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
> > Not sure. I don't remember which ones. I can take a look when I add > > more indexes for 7.0. > > Don't remember if really or what, but wasn't there some > problem with cached system relations, unique indices and > concurrency? > I don't remember anything about that. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
RE: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis
From
"Hiroshi Inoue"
Date:
> > > > Not sure. I don't remember which ones. I can take a look when I add > > > more indexes for 7.0. > > > > Don't remember if really or what, but wasn't there some > > problem with cached system relations, unique indices and > > concurrency? > > > > I don't remember anything about that. > I don't know old PostgreSQL at all. Only one thing I could suppose is the following. Before MVCC it was unnecessary to read dirty(uncommited) tuples to check uniqueness because a table level exclusive lock was acquired automatically. As for user tuples,the consistency was perserved because the lock was held until transaction end. As for system tuples,the consistency could be broken if the lock was a short term lock. After MVCC,dirty(uncommitted) tuples are taken into account to check uniqueness and any lock is no longer needed. AFAIK,there are no other means to check(lock ?) (logically) non-existent rows now(Referencial Integrity would provide the second one). So probably PostgreSQL couldn't guarantee the uniquness of system tuples in many cases. Anyway,I want to change the implementation of mdcreate() to reuse existent files but the uniqueness of table name is preserved by the current implementation narrowly. First of all,I would change pg_type,pg_class. It's OK ? Regards. Hiroshi Inoue Inoue@tpf.co.jp
> I don't know old PostgreSQL at all. > Only one thing I could suppose is the following. > > Before MVCC it was unnecessary to read dirty(uncommited) tuples > to check uniqueness because a table level exclusive lock was acquired > automatically. As for user tuples,the consistency was perserved because > the lock was held until transaction end. As for system tuples,the > consistency > could be broken if the lock was a short term lock. > > After MVCC,dirty(uncommitted) tuples are taken into account to check > uniqueness and any lock is no longer needed. > > AFAIK,there are no other means to check(lock ?) (logically) non-existent > rows now(Referencial Integrity would provide the second one). > So probably PostgreSQL couldn't guarantee the uniquness of system > tuples in many cases. > > Anyway,I want to change the implementation of mdcreate() to reuse > existent files but the uniqueness of table name is preserved by the > current implementation narrowly. > > First of all,I would change pg_type,pg_class. > It's OK ? Sure. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
RE: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis
From
"Hiroshi Inoue"
Date:
> > > > Anyway,I want to change the implementation of mdcreate() to reuse > > existent files but the uniqueness of table name is preserved by the > > current implementation narrowly. > > > > First of all,I would change pg_type,pg_class. > > It's OK ? > > Sure. > I made a patch. But I'm not sure my solution is right. Is there a better way ? Regards. Hiroshi Inoue Inoue@tpf.co.jp *** ../../head/pgcurrent/backend/bootstrap/bootscanner.l Tue Sep 14 12:17:34 1999 --- backend/bootstrap/bootscanner.l Tue Oct 26 23:36:08 1999 *************** *** 90,95 **** --- 90,96 ---- "declare" { return(XDECLARE); } "build" { return(XBUILD); } "indices" { return(INDICES);} + "unique" { return(UNIQUE); } "index" { return(INDEX); } "on" { return(ON); } "using" { return(USING); } *** ../../head/pgcurrent/backend/bootstrap/bootparse.y Mon Jul 26 12:44:44 1999 --- backend/bootstrap/bootparse.y Tue Oct 26 23:47:20 1999 *************** *** 80,86 **** %token <ival> CONST ID %token OPEN XCLOSE XCREATE INSERT_TUPLE %token STRING XDEFINE ! %token XDECLARE INDEX ON USING XBUILD INDICES %token COMMA EQUALS LPAREN RPAREN %token OBJ_ID XBOOTSTRAP NULLVAL %startTopLevel --- 80,86 ---- %token <ival> CONST ID %token OPEN XCLOSE XCREATE INSERT_TUPLE %token STRING XDEFINE ! %token XDECLARE INDEX ON USING XBUILD INDICES UNIQUE %token COMMA EQUALS LPAREN RPAREN %token OBJ_ID XBOOTSTRAP NULLVAL%start TopLevel *************** *** 106,111 **** --- 106,112 ---- | Boot_CreateStmt | Boot_InsertStmt | Boot_DeclareIndexStmt + | Boot_DeclareUniqueIndexStmt | Boot_BuildIndsStmt ; *************** *** 226,231 **** --- 227,245 ---- LexIDStr($3), LexIDStr($7), $9, NIL, 0, 0, 0, NIL); + DO_END; + } + ; + + Boot_DeclareUniqueIndexStmt: + XDECLARE UNIQUE INDEX boot_ident ON boot_ident USING boot_ident LPAREN boot_index_params RPAREN + { + DO_START; + + DefineIndex(LexIDStr($6), + LexIDStr($4), + LexIDStr($8), + $10, NIL, 1, 0, 0, NIL); DO_END; } ; *** ../../head/pgcurrent/backend/catalog/genbki.sh.in Mon Jul 26 12:44:44 1999 --- backend/catalog/genbki.sh.in Tue Oct 26 22:03:43 1999 *************** *** 164,169 **** --- 164,183 ---- print "declare index " data } + /^DECLARE_UNIQUE_INDEX\(/ { + # ---- + # end any prior catalog data insertions before starting a define unique index + # ---- + if (reln_open == 1) { + # print "show"; + print "close " catalog; + reln_open = 0; + } + + data = substr($0, 22, length($0) - 22); + print "declare unique index " data + } + /^BUILD_INDICES/ { print "build indices"; } # ---------------- *** ../../head/pgcurrent/include/postgres.h Mon Oct 25 22:13:13 1999 --- include/postgres.h Tue Oct 26 21:45:27 1999 *************** *** 138,143 **** --- 138,144 ---- #define DATA(x) extern int errno #define DESCR(x) extern int errno #define DECLARE_INDEX(x) extern int errno + #define DECLARE_UNIQUE_INDEX(x) extern int errno #define BUILD_INDICES #define BOOTSTRAP *** ../../head/pgcurrent/include/catalog/indexing.h Mon Oct 4 14:25:34 1999 --- include/catalog/indexing.h Tue Oct 26 21:47:24 1999 *************** *** 102,112 **** DECLARE_INDEX(pg_proc_oid_index on pg_proc using btree(oid oid_ops)); DECLARE_INDEX(pg_proc_proname_narg_type_indexon pg_proc using btree(proname name_ops, pronargs int2_ops, proargtypes oid8_ops)); ! DECLARE_INDEX(pg_type_oid_index on pg_type using btree(oid oid_ops)); ! DECLARE_INDEX(pg_type_typname_index on pg_type using btree(typname name_ops)); ! DECLARE_INDEX(pg_class_oid_index on pg_class using btree(oid oid_ops)); ! DECLARE_INDEX(pg_class_relname_index on pg_class using btree(relname name_ops)); DECLARE_INDEX(pg_attrdef_adrelid_index on pg_attrdef using btree(adrelid oid_ops)); --- 102,112 ---- DECLARE_INDEX(pg_proc_oid_index on pg_proc using btree(oid oid_ops)); DECLARE_INDEX(pg_proc_proname_narg_type_indexon pg_proc using btree(proname name_ops, pronargs int2_ops, proargtypes oid8_ops)); ! DECLARE_UNIQUE_INDEX(pg_type_oid_index on pg_type using btree(oid oid_ops)); ! DECLARE_UNIQUE_INDEX(pg_type_typname_index on pg_type using btree(typname name_ops)); ! DECLARE_UNIQUE_INDEX(pg_class_oid_index on pg_class using btree(oid oid_ops)); ! DECLARE_UNIQUE_INDEX(pg_class_relname_index on pg_class using btree(relname name_ops)); DECLARE_INDEX(pg_attrdef_adrelid_index on pg_attrdef using btree(adrelid oid_ops));
[Charset iso-8859-1 unsupported, filtering to ASCII...] > > > > > > Anyway,I want to change the implementation of mdcreate() to reuse > > > existent files but the uniqueness of table name is preserved by the > > > current implementation narrowly. > > > > > > First of all,I would change pg_type,pg_class. > > > It's OK ? > > > > Sure. > > > > I made a patch. > But I'm not sure my solution is right. > Is there a better way ? Looks perfect to me. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
RE: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis
From
"Hiroshi Inoue"
Date:
> > [Charset iso-8859-1 unsupported, filtering to ASCII...] > > > > > > > > Anyway,I want to change the implementation of mdcreate() to reuse > > > > existent files but the uniqueness of table name is preserved by the > > > > current implementation narrowly. > > > > > > > > First of all,I would change pg_type,pg_class. > > > > It's OK ? > > > > > > Sure. > > > > > > > I made a patch. > > But I'm not sure my solution is right. > > Is there a better way ? > > Looks perfect to me. > Thanks. I would commit it with some other changes. Regards. Hiroshi Inoue Inoue@tpf.co.jp
> > Looks perfect to me. > > > > Thanks. > I would commit it with some other changes. > > Regards. > > Hiroshi Inoue > Inoue@tpf.co.jp > I want to add some system indexes to cache lookups are faster, but am having problems with the bootup code. Let me know if you are interested in looking at it. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Hiroshi, there are two things I want to do for 7.0. First, I want to make more of the system indexes unique. Are you aware of any reasons not to do that? I see you have done some of them already. I talked to Tom Lane, and he thinks that it will not cause problems because unique insertions/updates wait for transactions to commit before doing a conflicting change to the index, right? Second, I want to add more system indexes to match all caches. Anything that could cause problems there? Are you working on any of this? -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> -----Original Message----- > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > Sent: Tuesday, November 16, 1999 1:38 PM > To: Hiroshi Inoue > Cc: PostgreSQL-development > Subject: Unique indexes on system tables > > > Hiroshi, there are two things I want to do for 7.0. > > First, I want to make more of the system indexes unique. Are you aware > of any reasons not to do that? No. It should be done to guarantee the uniqueness of system tuples. > I see you have done some of them > already. I talked to Tom Lane, and he thinks that it will not cause > problems because unique insertions/updates wait for transactions to > commit before doing a conflicting change to the index, right? > Yes. > Second, I want to add more system indexes to match all caches. Anything > that could cause problems there? > I am only afraid of index corruption. The more we have system indexes,the more index corruption would happen. How could we recover from the state ? Tom suggested rebuilding indexes in vacuum. According to Jan,there was a utility called reindexdb. WAL by Vadim may be able to recover indexes completely in case of crash. > Are you working on any of this? > No. Regards. Hiroshi Inoue Inoue@tpf.co.jp
On Tue, 16 Nov 1999, Hiroshi Inoue wrote: > I am only afraid of index corruption. > The more we have system indexes,the more index corruption would happen. Just a concerned user question: Why does index corruption seem to happen so often or is a genuine concern? Wouldn't the next thing be table corruption? Or are indices optimized for speed rather than correctness because they don't contain important data? -- Peter Eisentraut Sernanders vaeg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
> > On Tue, 16 Nov 1999, Hiroshi Inoue wrote: > > > I am only afraid of index corruption. > > The more we have system indexes,the more index corruption would happen. > > Just a concerned user question: Why does index corruption seem to happen > so often or is a genuine concern? Wouldn't the next thing be table > corruption? Or are indices optimized for speed rather than correctness > because they don't contain important data? There are more complicated concurrency issues on indices than for regular tables. That's where the corrupt indices but not tables come from. For a user index, this isn't very critical, because a drop/create index sequence will recover to consistent data. For system catalog indices, this is a desaster, because you cannot drop and recreate indices on system tables. At least we need to tackle this problem by reincarnating reindexdb. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #