Thread: RE: [HACKERS] mdnblocks is an amazing time sink in huge relations

RE: [HACKERS] mdnblocks is an amazing time sink in huge relations

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Hiroshi Inoue [mailto:Inoue@tpf.co.jp]
> Sent: Tuesday, October 19, 1999 6:45 PM
> To: Tom Lane
> Cc: pgsql-hackers@postgreSQL.org
> Subject: RE: [HACKERS] mdnblocks is an amazing time sink in huge
> relations 
> 
> 
> > 
> > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> 
> [snip]
>  
> > 
> > > Deletion is necessary only not to consume disk space.
> > >
> > > For example vacuum could remove not deleted files.
> > 
> > Hmm ... interesting idea ... but I can hear the complaints
> > from users already...
> >
> 
> My idea is only an analogy of PostgreSQL's simple recovery
> mechanism of tuples.
> 
> And my main point is
>     "delete fails after commit" doesn't harm the database
>     except that not deleted files consume disk space.
> 
> Of cource,it's preferable to delete relation files immediately
> after(or just when) commit.
> Useless files are visible though useless tuples are invisible.
>

Anyway I don't need "DROP TABLE inside transactions" now
and my idea is originally for that issue.

After a thought,I propose the following solution.

1. mdcreate() couldn't create existent relation files.   If the existent file is of length zero,we would overwrite
thefile.(seems the comment in md.c says so but the   code doesn't do so).    If the file is an Index relation file,we
wouldoverwrite   the file.
 

2. mdunlink() couldn't unlink non-existent relation files.   mdunlink() doesn't call elog(ERROR) even if the file
doesn'texist,though I couldn't find where to change   now.   mdopen() doesn't call elog(ERROR) even if the file
doesn'texist and leaves the relation as CLOSED. 
 

Comments ?

Regards. 

Hiroshi Inoue
Inoue@tpf.co.jp


RE: [HACKERS] mdnblocks is an amazing time sink in huge relations

From
"Hiroshi Inoue"
Date:
> 
> After a thought,I propose the following solution.
> 
> 1. mdcreate() couldn't create existent relation files.
>     If the existent file is of length zero,we would overwrite
>     the file.(seems the comment in md.c says so but the
>     code doesn't do so). 
>     If the file is an Index relation file,we would overwrite
>     the file.
>

This may allow to CREATE TABLE simultaneously for the
same table name. I would change to check the existence
of the same table name correctly in heap_create_with_ca
talog().
> 2. mdunlink() couldn't unlink non-existent relation files.
>     mdunlink() doesn't call elog(ERROR) even if the file
>     doesn't exist,though I couldn't find where to change
>     now.

_mdfd_getrelnfd(),mdnblocks() doesn't call elog().
Return code will be checked.

>     mdopen() doesn't call elog(ERROR) even if the file
>     doesn't exist and leaves the relation as CLOSED. 
> 
> Comments ?
>

Recently I saw 2 postings about this in pgsql MLs.
So I want to change as above.

2. was changed by Tom(mdunlink/mdopen) and
Tatsuo(mdopen) recently.
Any Problems ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp 


> > 
> > After a thought,I propose the following solution.
> > 
> > 1. mdcreate() couldn't create existent relation files.
> >     If the existent file is of length zero,we would overwrite
> >     the file.(seems the comment in md.c says so but the
> >     code doesn't do so). 
> >     If the file is an Index relation file,we would overwrite
> >     the file.
> >
> 
> This may allow to CREATE TABLE simultaneously for the
> same table name. I would change to check the existence

As I was afraid,2 tables of a same name could be made.
After a short investigating,I found that system indexes are
never unique indexes.
Why ?
Without duplicate index check,it's very difficult to prevent
objects from having same name.

Comments ?

Hiroshi Inoue
Inoue@tpf.co.jp



> As I was afraid,2 tables of a same name could be made.
> After a short investigating,I found that system indexes are
> never unique indexes.
> Why ?
> Without duplicate index check,it's very difficult to prevent
> objects from having same name.

They certainly should be unique.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


> 
> > As I was afraid,2 tables of a same name could be made.
> > After a short investigating,I found that system indexes are
> > never unique indexes.
> > Why ?
> > Without duplicate index check,it's very difficult to prevent
> > objects from having same name.
> 
> They certainly should be unique.
>

All should be unique ?
I don't know system indexes well.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp 


[Charset iso-8859-1 unsupported, filtering to ASCII...]
> > 
> > > As I was afraid,2 tables of a same name could be made.
> > > After a short investigating,I found that system indexes are
> > > never unique indexes.
> > > Why ?
> > > Without duplicate index check,it's very difficult to prevent
> > > objects from having same name.
> > 
> > They certainly should be unique.
> >
> 
> All should be unique ?
> I don't know system indexes well.

Not sure.  I don't remember which ones.  I can take a look when I add
more indexes for 7.0.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis

From
wieck@debis.com (Jan Wieck)
Date:
>
> [Charset iso-8859-1 unsupported, filtering to ASCII...]
> > >
> > > > As I was afraid,2 tables of a same name could be made.
> > > > After a short investigating,I found that system indexes are
> > > > never unique indexes.
> > > > Why ?
> > > > Without duplicate index check,it's very difficult to prevent
> > > > objects from having same name.
> > >
> > > They certainly should be unique.
> > >
> >
> > All should be unique ?
> > I don't know system indexes well.
>
> Not sure.  I don't remember which ones.  I can take a look when I add
> more indexes for 7.0.

    Don't  remember  if  really  or  what,  but wasn't there some
    problem with cached  system  relations,  unique  indices  and
    concurrency?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis

From
Bruce Momjian
Date:
> > Not sure.  I don't remember which ones.  I can take a look when I add
> > more indexes for 7.0.
> 
>     Don't  remember  if  really  or  what,  but wasn't there some
>     problem with cached  system  relations,  unique  indices  and
>     concurrency?
> 

I don't remember anything about that.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


RE: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis

From
"Hiroshi Inoue"
Date:
>
> > > Not sure.  I don't remember which ones.  I can take a look when I add
> > > more indexes for 7.0.
> >
> >     Don't  remember  if  really  or  what,  but wasn't there some
> >     problem with cached  system  relations,  unique  indices  and
> >     concurrency?
> >
>
> I don't remember anything about that.
>

I don't know old PostgreSQL at all.
Only one thing I could suppose is the following.

Before MVCC it was unnecessary to read dirty(uncommited) tuples
to check uniqueness because a table level exclusive lock was acquired
automatically. As for user tuples,the consistency was perserved because
the lock was held until transaction end.  As for system tuples,the
consistency
could be broken if the lock was a short term lock.

After MVCC,dirty(uncommitted) tuples are taken into account to check
uniqueness and  any lock is no longer needed.

AFAIK,there are no other means to check(lock ?)  (logically) non-existent
rows now(Referencial Integrity would provide the second one).
So probably PostgreSQL couldn't guarantee the uniquness of system
tuples in many cases.

Anyway,I want to change the implementation of mdcreate() to reuse
existent files but the uniqueness of table name is preserved by the
current implementation narrowly.

First of all,I would change pg_type,pg_class.
It's OK ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis

From
Bruce Momjian
Date:
> I don't know old PostgreSQL at all.
> Only one thing I could suppose is the following.
> 
> Before MVCC it was unnecessary to read dirty(uncommited) tuples
> to check uniqueness because a table level exclusive lock was acquired
> automatically. As for user tuples,the consistency was perserved because
> the lock was held until transaction end.  As for system tuples,the
> consistency
> could be broken if the lock was a short term lock.
> 
> After MVCC,dirty(uncommitted) tuples are taken into account to check
> uniqueness and  any lock is no longer needed.
> 
> AFAIK,there are no other means to check(lock ?)  (logically) non-existent
> rows now(Referencial Integrity would provide the second one).
> So probably PostgreSQL couldn't guarantee the uniquness of system
> tuples in many cases.
> 
> Anyway,I want to change the implementation of mdcreate() to reuse
> existent files but the uniqueness of table name is preserved by the
> current implementation narrowly.
> 
> First of all,I would change pg_type,pg_class.
> It's OK ?

Sure.


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


RE: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis

From
"Hiroshi Inoue"
Date:
> >
> > Anyway,I want to change the implementation of mdcreate() to reuse
> > existent files but the uniqueness of table name is preserved by the
> > current implementation narrowly.
> >
> > First of all,I would change pg_type,pg_class.
> > It's OK ?
>
> Sure.
>

I made a patch.
But I'm not sure my solution is right.
Is there a better way ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

*** ../../head/pgcurrent/backend/bootstrap/bootscanner.l    Tue Sep 14
12:17:34 1999
--- backend/bootstrap/bootscanner.l    Tue Oct 26 23:36:08 1999
***************
*** 90,95 ****
--- 90,96 ---- "declare"        { return(XDECLARE); } "build"            { return(XBUILD); } "indices"        {
return(INDICES);}
 
+ "unique"        { return(UNIQUE); } "index"            { return(INDEX); } "on"            { return(ON); } "using"
      { return(USING); }
 
*** ../../head/pgcurrent/backend/bootstrap/bootparse.y    Mon Jul 26 12:44:44
1999
--- backend/bootstrap/bootparse.y    Tue Oct 26 23:47:20 1999
***************
*** 80,86 **** %token <ival> CONST ID %token OPEN XCLOSE XCREATE INSERT_TUPLE %token STRING XDEFINE
! %token XDECLARE INDEX ON USING XBUILD INDICES %token COMMA EQUALS LPAREN RPAREN %token OBJ_ID XBOOTSTRAP NULLVAL
%startTopLevel
 
--- 80,86 ---- %token <ival> CONST ID %token OPEN XCLOSE XCREATE INSERT_TUPLE %token STRING XDEFINE
! %token XDECLARE INDEX ON USING XBUILD INDICES UNIQUE %token COMMA EQUALS LPAREN RPAREN %token OBJ_ID XBOOTSTRAP
NULLVAL%start TopLevel
 
***************
*** 106,111 ****
--- 106,112 ----         | Boot_CreateStmt         | Boot_InsertStmt         | Boot_DeclareIndexStmt
+         | Boot_DeclareUniqueIndexStmt         | Boot_BuildIndsStmt         ;

***************
*** 226,231 ****
--- 227,245 ----                                 LexIDStr($3),                                 LexIDStr($7),
                    $9, NIL, 0, 0, 0, NIL);
 
+                     DO_END;
+                 }
+         ;
+
+ Boot_DeclareUniqueIndexStmt:
+           XDECLARE UNIQUE INDEX boot_ident ON boot_ident USING boot_ident
LPAREN boot_index_params RPAREN
+                 {
+                     DO_START;
+
+                     DefineIndex(LexIDStr($6),
+                                 LexIDStr($4),
+                                 LexIDStr($8),
+                                 $10, NIL, 1, 0, 0, NIL);                     DO_END;                 }         ;
*** ../../head/pgcurrent/backend/catalog/genbki.sh.in    Mon Jul 26 12:44:44
1999
--- backend/catalog/genbki.sh.in    Tue Oct 26 22:03:43 1999
***************
*** 164,169 ****
--- 164,183 ----     print "declare index " data }

+ /^DECLARE_UNIQUE_INDEX\(/ {
+ # ----
+ #  end any prior catalog data insertions before starting a define unique
index
+ # ----
+     if (reln_open == 1) {
+ #        print "show";
+         print "close " catalog;
+         reln_open = 0;
+     }
+
+     data = substr($0, 22, length($0) - 22);
+     print "declare unique index " data
+ }
+ /^BUILD_INDICES/    { print "build indices"; }
 # ----------------
*** ../../head/pgcurrent/include/postgres.h    Mon Oct 25 22:13:13 1999
--- include/postgres.h    Tue Oct 26 21:45:27 1999
***************
*** 138,143 ****
--- 138,144 ---- #define DATA(x) extern int errno #define DESCR(x) extern int errno #define DECLARE_INDEX(x) extern int
errno
+ #define DECLARE_UNIQUE_INDEX(x) extern int errno
 #define BUILD_INDICES #define BOOTSTRAP
*** ../../head/pgcurrent/include/catalog/indexing.h    Mon Oct  4 14:25:34
1999
--- include/catalog/indexing.h    Tue Oct 26 21:47:24 1999
***************
*** 102,112 **** DECLARE_INDEX(pg_proc_oid_index on pg_proc using btree(oid oid_ops));
DECLARE_INDEX(pg_proc_proname_narg_type_indexon pg_proc using
 
btree(proname name_ops, pronargs int2_ops, proargtypes oid8_ops));

! DECLARE_INDEX(pg_type_oid_index on pg_type using btree(oid oid_ops));
! DECLARE_INDEX(pg_type_typname_index on pg_type using btree(typname
name_ops));

! DECLARE_INDEX(pg_class_oid_index on pg_class using btree(oid oid_ops));
! DECLARE_INDEX(pg_class_relname_index on pg_class using btree(relname
name_ops));
 DECLARE_INDEX(pg_attrdef_adrelid_index on pg_attrdef using btree(adrelid
oid_ops));

--- 102,112 ---- DECLARE_INDEX(pg_proc_oid_index on pg_proc using btree(oid oid_ops));
DECLARE_INDEX(pg_proc_proname_narg_type_indexon pg_proc using
 
btree(proname name_ops, pronargs int2_ops, proargtypes oid8_ops));

! DECLARE_UNIQUE_INDEX(pg_type_oid_index on pg_type using btree(oid
oid_ops));
! DECLARE_UNIQUE_INDEX(pg_type_typname_index on pg_type using btree(typname
name_ops));

! DECLARE_UNIQUE_INDEX(pg_class_oid_index on pg_class using btree(oid
oid_ops));
! DECLARE_UNIQUE_INDEX(pg_class_relname_index on pg_class using
btree(relname name_ops));
 DECLARE_INDEX(pg_attrdef_adrelid_index on pg_attrdef using btree(adrelid
oid_ops));




Re: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis

From
Bruce Momjian
Date:
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> > >
> > > Anyway,I want to change the implementation of mdcreate() to reuse
> > > existent files but the uniqueness of table name is preserved by the
> > > current implementation narrowly.
> > >
> > > First of all,I would change pg_type,pg_class.
> > > It's OK ?
> >
> > Sure.
> >
> 
> I made a patch.
> But I'm not sure my solution is right.
> Is there a better way ?

Looks perfect to me.


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


RE: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis

From
"Hiroshi Inoue"
Date:
> 
> [Charset iso-8859-1 unsupported, filtering to ASCII...]
> > > >
> > > > Anyway,I want to change the implementation of mdcreate() to reuse
> > > > existent files but the uniqueness of table name is preserved by the
> > > > current implementation narrowly.
> > > >
> > > > First of all,I would change pg_type,pg_class.
> > > > It's OK ?
> > >
> > > Sure.
> > >
> > 
> > I made a patch.
> > But I'm not sure my solution is right.
> > Is there a better way ?
> 
> Looks perfect to me.
>

Thanks.
I would commit it with some other changes.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp 


Re: System indexes are never unique indexes( was RE: [HACKERS] mdnblocksis

From
Bruce Momjian
Date:
> > Looks perfect to me.
> >
> 
> Thanks.
> I would commit it with some other changes.
> 
> Regards.
> 
> Hiroshi Inoue
> Inoue@tpf.co.jp 
> 

I want to add some system indexes to cache lookups are faster, but am
having problems with the bootup code.  Let me know if you are interested
in looking at it.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Unique indexes on system tables

From
Bruce Momjian
Date:
Hiroshi, there are two things I want to do for 7.0.

First, I want to make more of the system indexes unique.  Are you aware
of any reasons not to do that?  I see you have done some of them
already.  I talked to Tom Lane, and he thinks that it will not cause
problems because unique insertions/updates wait for transactions to
commit before doing a conflicting change to the index, right?
Second, I want to add more system indexes to match all caches.  Anything
that could cause problems there?

Are you working on any of this?


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


RE: Unique indexes on system tables

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
> Sent: Tuesday, November 16, 1999 1:38 PM
> To: Hiroshi Inoue
> Cc: PostgreSQL-development
> Subject: Unique indexes on system tables
> 
> 
> Hiroshi, there are two things I want to do for 7.0.
> 
> First, I want to make more of the system indexes unique.  Are you aware
> of any reasons not to do that?

No.
It should be done to guarantee the uniqueness of system tuples.

> I see you have done some of them
> already.  I talked to Tom Lane, and he thinks that it will not cause
> problems because unique insertions/updates wait for transactions to
> commit before doing a conflicting change to the index, right?
>

Yes. 
> Second, I want to add more system indexes to match all caches.  Anything
> that could cause problems there?
>

I am only afraid of index corruption.
The more we have system indexes,the more index corruption would happen.

How could we recover from the state ?
Tom suggested rebuilding indexes in vacuum.
According to Jan,there was a utility called reindexdb.
WAL by Vadim may be able to recover indexes completely in case of crash.

> Are you working on any of this?
>

No.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] RE: Unique indexes on system tables

From
Peter Eisentraut
Date:
On Tue, 16 Nov 1999, Hiroshi Inoue wrote:

> I am only afraid of index corruption.
> The more we have system indexes,the more index corruption would happen.

Just a concerned user question: Why does index corruption seem to happen
so often or is a genuine concern? Wouldn't the next thing be table
corruption? Or are indices optimized for speed rather than correctness
because they don't contain important data?

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] RE: Unique indexes on system tables

From
wieck@debis.com (Jan Wieck)
Date:
>
> On Tue, 16 Nov 1999, Hiroshi Inoue wrote:
>
> > I am only afraid of index corruption.
> > The more we have system indexes,the more index corruption would happen.
>
> Just a concerned user question: Why does index corruption seem to happen
> so often or is a genuine concern? Wouldn't the next thing be table
> corruption? Or are indices optimized for speed rather than correctness
> because they don't contain important data?

    There are more complicated concurrency issues on indices than
    for regular tables. That's where the corrupt indices but  not
    tables come from.

    For  a  user  index,  this  isn't  very  critical,  because a
    drop/create index sequence will recover to consistent data.

    For system catalog indices, this is a desaster,  because  you
    cannot  drop  and recreate indices on system tables. At least
    we need to tackle this problem by reincarnating reindexdb.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #