Thread: Open 6.5 items

Open 6.5 items

From
Bruce Momjian
Date:
SELECT * FROM test WHERE test IN (SELECT * FROM test) fails with strange error
When creating a table with either type inet or type cidr as a primary,unique
   key, the "198.68.123.0/24" and "198.68.123.0/27" are considered equal
Fix function pointer calls to take Datum args for char and int2 args(ecgs)
Regression test for new Numeric type
Large Object memory problems
refint problems
invalidate cache on aborted transaction
spinlock stuck problem
benchmark performance problem

Make psql \help, man pages, and sgml reflect changes in grammar
Markup sql.sgml, Stefan's intro to SQL
Generate Admin, User, Programmer hardcopy postscript
Generate INSTALL and HISTORY from sgml sources.
Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Open 6.5 items

From
Tatsuo Ishii
Date:
>spinlock stuck problem

This time I have tested on another slower machine with less memory. Seems
things are getting worse. I got:
LockAcquire: xid table corrupted

This comes from:
    /*
     * Find or create an xid entry with this tag
     */
    result = (XIDLookupEnt *) hash_search(xidTable, (Pointer) &item,
                                          HASH_ENTER, &found);
    if (!result)
    {
        elog(NOTICE, "LockAcquire: xid table corrupted");
        return STATUS_ERROR;
    }

As you can see, the acquired master lock is never released, and all
backends get stuck. (Of course, the corrupted xid table is a problem
too.)

Another error was:
out of free buffers: time to abort !

I will do more testing...
---
Tatsuo Ishii



Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>     LockAcquire: xid table corrupted

> This comes from:

>      /*
>       * Find or create an xid entry with this tag
>       */
>      result = (XIDLookupEnt *) hash_search(xidTable, (Pointer) &item,

>        HASH_ENTER, &found);
>      if (!result)
>      {
>          elog(NOTICE, "LockAcquire: xid table corrupted");
>          return STATUS_ERROR;
>      }

> As you can see, the acquired master lock is never released, and all
> backends get stuck. (Of course, the corrupted xid table is a problem too

Actually, corrupted xid table is *the* problem --- whatever happens
after that is just collateral damage.  (The elog should likely be
elog(FATAL) not NOTICE...)

If I recall the dynahash.c code correctly, a null return value
indicates either damage to the structure of the table (ie someone
stomped on memory that didn't belong to them) or running out of memory
to add entries to the table.  The latter should be impossible if we
sized shared memory correctly.  Perhaps the table size estimation code
has been obsoleted by recent changes?
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
"D'Arcy" "J.M." Cain
Date:
Thus spake Bruce Momjian
> When creating a table with either type inet or type cidr as a primary,unique
>    key, the "198.68.123.0/24" and "198.68.123.0/27" are considered equal

So have we decided that this is still to be fixed?  If so, it's an easy fix
but we have to decide which of the following is true.

    198.68.123.0/24 < 198.68.123.0/27
    198.68.123.0/24 > 198.68.123.0/27

Maybe deciding that should be the TODO item.  :-)

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> If I recall the dynahash.c code correctly, a null return value
> indicates either damage to the structure of the table (ie someone
> stomped on memory that didn't belong to them) or running out of memory
> to add entries to the table.  The latter should be impossible if we

Quite different cases and should result in different reactions.
If the structure is corrupted then abort() is the only proper thing.
If we are running out of memory then elog(ERROR) is enough.

> sized shared memory correctly.  Perhaps the table size estimation code
> has been obsoleted by recent changes?

lock.h:

/* ----------------------
 * The following defines are used to estimate how much shared
 * memory the lock manager is going to require.
 * See LockShmemSize() in lock.c.
 *
 * NLOCKS_PER_XACT - The number of unique locks acquired in a transaction
 * NLOCKENTS - The maximum number of lock entries in the lock table.
 * ----------------------
 */

#define NLOCKS_PER_XACT         40
                                ^^
Isn't it too low?

#define NLOCKENTS(maxBackends)  (NLOCKS_PER_XACT*(maxBackends))

And now - LockShmemSize() in lock.c:
    /* lockHash table */
    size += hash_estimate_size(NLOCKENTS(maxBackends),
                               ^^^^^^^^^^^^^^^^^^^^^^
                               SHMEM_LOCKTAB_KEYSIZE,
                               SHMEM_LOCKTAB_DATASIZE);

    /* xidHash table */
    size += hash_estimate_size(maxBackends,
                               ^^^^^^^^^^^
                               SHMEM_XIDTAB_KEYSIZE,
                               SHMEM_XIDTAB_DATASIZE);

Why just maxBackends is here? NLOCKENTS should be used too
(each transaction lock requires its own xidhash entry).
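
As a standalone back-of-the-envelope sketch (the two defines are from
lock.h above; the driver program is illustrative only), sizing xidHash
by maxBackends instead of NLOCKENTS(maxBackends) undersizes it by a
factor of NLOCKS_PER_XACT:

    #include <stdio.h>

    /* From include/storage/lock.h (the 6.5-era values under discussion) */
    #define NLOCKS_PER_XACT         40
    #define NLOCKENTS(maxBackends)  (NLOCKS_PER_XACT * (maxBackends))

    int
    main(void)
    {
        int maxBackends = 64;

        /* current xidHash estimate: one entry per backend */
        printf("xidHash sized for %d entries\n", maxBackends);

        /* proposed: one xid entry per potential transaction lock */
        printf("xidHash should be sized for %d entries\n",
               NLOCKENTS(maxBackends));
        return 0;
    }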

Vadim


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
>> If I recall the dynahash.c code correctly, a null return value
>> indicates either damage to the structure of the table (ie someone
>> stomped on memory that didn't belong to them) or running out of memory
>> to add entries to the table.  The latter should be impossible if we

> Quite different cases and should result in different reactions.

I agree; will see about cleaning up hash_search's call convention after
6.5 is done.  Actually, maybe I should do it now?  I'm not convinced yet
whether the reports we're seeing are due to memory clobber or running
out of space... fixing this may be the easiest way to find out.

> #define NLOCKS_PER_XACT         40
>                                 ^^
> Isn't it too low?

You tell me ... that was the number that was in the 6.4 code, but I
have no idea if it's right or not.  (Does MVCC require more locks
than the old stuff?)  What is a good upper bound on the number
of concurrently existing locks?

>     /* xidHash table */
>     size += hash_estimate_size(maxBackends,
>                                ^^^^^^^^^^^
>                                SHMEM_XIDTAB_KEYSIZE,
>                                SHMEM_XIDTAB_DATASIZE);

> Why just maxBackends is here? NLOCKENTS should be used too
> (each transaction lock requires its own xidhash entry).

Should it be NLOCKENTS(maxBackends) xid entries, or do you mean
NLOCKENTS(maxBackends) + maxBackends?  Feel free to stick in any
estimates that you like better --- what's there now is an interpretation
of what the 6.4 code was trying to do (but it was sufficiently buggy and
unreadable that it was probably coming out with different numbers in
the end...)
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
"D'Arcy" "J.M." Cain <darcy@druid.net> writes:
> but we have to decide  which of the following is true.

>     198.68.123.0/24 < 198.68.123.0/27
>     198.68.123.0/24 > 198.68.123.0/27

I'd say the former, on the same principle that 'abc' < 'abcd'.
Think of the addresses as being bit strings of the specified length,
and compare them the same way character strings are compared.
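
For concreteness, a minimal standalone sketch of that rule (the
function and names here are hypothetical, not the actual network
comparison code): compare the two values as bit strings of their
stated prefix lengths, so the shorter prefix sorts first on a tie.

    #include <stdint.h>
    #include <stdio.h>

    static int
    prefix_cmp(uint32_t a, int abits, uint32_t b, int bbits)
    {
        int      common = abits < bbits ? abits : bbits;
        uint32_t mask = common ? ~0u << (32 - common) : 0;

        /* compare the bits both prefixes have in common */
        if ((a & mask) != (b & mask))
            return (a & mask) < (b & mask) ? -1 : 1;
        /* equal on the common prefix: shorter bit string sorts first */
        return abits - bbits;
    }

    int
    main(void)
    {
        uint32_t net = (198u << 24) | (68u << 16) | (123u << 8);

        /* prints a negative number: /24 < /27 under this ordering */
        printf("%d\n", prefix_cmp(net, 24, net, 27));
        return 0;
    }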

But if Vixie's got a different opinion, I defer to him...
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> Vadim Mikheev <vadim@krs.ru> writes:
> >> If I recall the dynahash.c code correctly, a null return value
> >> indicates either damage to the structure of the table (ie someone
> >> stomped on memory that didn't belong to them) or running out of memory
> >> to add entries to the table.  The latter should be impossible if we
> 
> > Quite different cases and should result in different reactions.
> 
> I agree; will see about cleaning up hash_search's call convention after
> 6.5 is done.  Actually, maybe I should do it now?  I'm not convinced yet
> whether the reports we're seeing are due to memory clobber or running
> out of space... fixing this may be the easiest way to find out.

Imho, we have to fix it in some way before 6.5.
Either by changing dynahash.c (to return 0x1 if table is
corrupted and 0x0 if out of space) or by changing
elog(NOTICE) to elog(ERROR).
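
A minimal standalone sketch of that first option, with hypothetical
names (this is not the dynahash.c interface): return a distinguished
non-NULL sentinel for structural corruption and plain NULL for
out-of-space, so the caller can abort() in one case and raise a
recoverable error in the other.

    #include <stdio.h>
    #include <stdlib.h>

    #define HASH_CORRUPTED ((void *) 0x1)

    static int dummy_entry;

    static void *
    lookup_or_enter(int simulate)
    {
        if (simulate == 1)
            return HASH_CORRUPTED;  /* bucket chains are trashed */
        if (simulate == 2)
            return NULL;            /* no room for a new entry */
        return &dummy_entry;        /* normal case */
    }

    int
    main(void)
    {
        void *result = lookup_or_enter(2);

        if (result == HASH_CORRUPTED)
            abort();                /* shared memory corrupted: bail out */
        else if (result == NULL)
            fprintf(stderr, "LockAcquire: xid table out of memory\n");
        return 0;
    }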

> 
> > #define NLOCKS_PER_XACT         40
> >                                 ^^
> > Isn't it too low?
> 
> You tell me ... that was the number that was in the 6.4 code, but I
> have no idea if it's right or not.  (Does MVCC require more locks
> than the old stuff?)  What is a good upper bound on the number
> of concurrently existing locks?

Probably yes, because writers can continue to work and lock
other tables instead of sleeping on the first lock due to a concurrent
select. I'll change it to 64, but this should be a configurable
thing.

> 
> >     /* xidHash table */
> >     size += hash_estimate_size(maxBackends,
> >                                ^^^^^^^^^^^
> >                                SHMEM_XIDTAB_KEYSIZE,
> >                                SHMEM_XIDTAB_DATASIZE);
> 
> > Why just maxBackends is here? NLOCKENTS should be used too
> > (each transaction lock requires its own xidhash entry).
> 
> Should it be NLOCKENTS(maxBackends) xid entries, or do you mean
> NLOCKENTS(maxBackends) + maxBackends?  Feel free to stick in any
> estimates that you like better --- what's there now is an interpretation
> of what the 6.4 code was trying to do (but it was sufficiently buggy and
> unreadable that it was probably coming out with different numbers in
> the end...)

Just NLOCKENTS(maxBackends) - I'll change it now.

Vadim


Re: [HACKERS] Open 6.5 items

From
"D'Arcy" "J.M." Cain
Date:
Thus spake Tom Lane
> "D'Arcy" "J.M." Cain <darcy@druid.net> writes:
> > but we have to decide  which of the following is true.
> 
> >     198.68.123.0/24 < 198.68.123.0/27
> >     198.68.123.0/24 > 198.68.123.0/27
> 
> I'd say the former, on the same principle that 'abc' < 'abcd'.

And, in fact, that's what happens if you use the operators.  The only
place they are equal is when sorting them so they can't be used as
primary keys.  I guess there is no argument about the sorting order
if we think they should be sorted.  There is still the question of
whether or not they should be sorted.  There seems to be tacit agreement
but could we have a little more discussion?  The question is, when inet
or cidr is used as the primary key on a table, should they be considered
equal?  In fact, think about the question separately as we may want a
different behaviour for each.  Here is my breakdown of the question.

For inet type, the value specifies primarily, I think, the host but
also carries information about its place on the network.  Given an inet
type you can extract the host, broadcast, netmask and even the cidr
that it is part of.  So, 198.68.123.0/24 and 198.68.123.0/27 really
refer to the same host but on different networks.  Since a host can only
be on one network, there is an argument that they can't both be used
as the primary key in the same table.

A cidr type is primarily a network.  In fact, some valid inet values
aren't even valid cidr.  So, the question is, if one network is part
of another then should it be possible to have both as a primary key?

Of course, both of these beg the real question, should either of these
types be used as a primary key, but that is a database design question.

> Think of the addresses as being bit strings of the specified length,
> and compare them the same way character strings are compared.

Not sure that that clarifies it but we do have the code to order them
in any case.  We just need to decide whether we want to.

> But if Vixie's got a different opinion, I defer to him...

Paul's code orders them without regard to netmask which implies "no"
as the answer to the question but his original code only referred to
what we eventually called the cidr type.  The question would still
be open for the inet type anyway.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 424 2871     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
"D'Arcy" "J.M." Cain <darcy@druid.net> writes:
> And, in fact, that's what happens if you use the operators.  The only
> place they are equal is when sorting them so they can't be used as
> primary keys.

Huh?  Indexes and operators are the same thing --- or more specifically,
indexes rely on operators to compare keys.  I don't see how it's even
*possible* that an index would think that two keys are equal when the
underlying = operator says they are not.

A little experimentation shows that's indeed what's happening, though.
Weird.  Is this a deliberate effect, and if so how did you achieve it?
It looks like what could be a serious bug to me.

> I guess there is no argument about the sorting order
> if we think they should be sorted.  There is still the question of
> whether or not they should be sorted.  There seems to be tacit agreement
> but could we have a little more discussion.  The question is, when inet
> or cidr is used as the primary key on a table, should they be considered
> equal.  In fact, think about the question separately as we may want a
> different behaviour for each.

I'd argue that plain indexing ought not try to do anything especially
subtle --- in particular it ought not vary from the behavior of the
comparison operators for the type.  If someone wants a table wherein you
can't enter two spellings of the same hostname, the right way would be
to construct a unique functional index using a function that reduces the
INET type into the simpler form.  A good analogy might be a text field
where you don't want any two entries to be equal on a case-insensitive
basis.  You don't up and change the behavior of indexing to be
case-insensitive, you say
    CREATE UNIQUE INDEX foo_f1_key ON foo (lower(f1) text_ops);
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
I wrote:
> A little experimentation shows that's indeed what's happening, though.
> Weird.  Is this a deliberate effect, and if so how did you achieve it?

Oh, I see it: the network_cmp function is deliberately inconsistent with
the regular comparison functions on network values.

This is *very bad*.  Indexes depend on both the operators and the cmp
support function.  You cannot have inconsistent behavior between these
functions, or indexing will misbehave.  Do I need to gin up an example
where it fails?
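
The invariant at stake, in a standalone miniature (the types and
functions below are hypothetical stand-ins, not the actual network
code): an index support function must return zero exactly when the
type's '=' operator says equal, and agree in sign with '<' and '>'.

    #include <assert.h>

    typedef struct
    {
        unsigned addr;              /* address bits */
        int      bits;              /* prefix length */
    } net;

    static int
    net_cmp(net a, net b)           /* btree support function */
    {
        if (a.addr != b.addr)
            return a.addr < b.addr ? -1 : 1;
        return a.bits - b.bits;
    }

    static int
    net_eq(net a, net b)            /* the '=' operator */
    {
        return a.addr == b.addr && a.bits == b.bits;
    }

    int
    main(void)
    {
        net a = {0xC6447B00, 24};   /* 198.68.123.0/24 */
        net b = {0xC6447B00, 27};   /* 198.68.123.0/27 */

        /* a consistent pair: cmp is zero exactly when eq is true */
        assert((net_cmp(a, b) == 0) == (net_eq(a, b) != 0));
        return 0;
    }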
        regards, tom lane


RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:
Hello all,

> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Vadim Mikheev
> Sent: Saturday, May 29, 1999 2:51 PM
> To: Tom Lane
> Cc: t-ishii@sra.co.jp; PostgreSQL-development
> Subject: Re: [HACKERS] Open 6.5 items
>
>
> Tom Lane wrote:
> >
> > Vadim Mikheev <vadim@krs.ru> writes:
> > >> If I recall the dynahash.c code correctly, a null return value
> > >> indicates either damage to the structure of the table (ie someone
> > >> stomped on memory that didn't belong to them) or running out
> of memory
> > >> to add entries to the table.  The latter should be impossible if we
> >
> > > Quite different cases and should result in different reactions.
> >
> > I agree; will see about cleaning up hash_search's call convention after
> > 6.5 is done.  Actually, maybe I should do it now?  I'm not convinced yet
> > whether the reports we're seeing are due to memory clobber or running
> > out of space... fixing this may be the easiest way to find out.
>
> Imho, we have to fix it in some way before 6.5
> Either by changing dynahash.c (to return 0x1 if table is
> corrupted and 0x0 if out of space) or by changing
> elog(NOTICE) to elog(ERROR).
>

Another case exists which causes stuck spinlock abort.
    status = WaitOnLock(lockmethod, lock, lockmode);

    /*
     * Check the xid entry status, in case something in the ipc
     * communication doesn't work correctly.
     */
    if (!((result->nHolding > 0) && (result->holders[lockmode] > 0)))
    {
        XID_PRINT_AUX("LockAcquire: INCONSISTENT ", result);
        LOCK_PRINT_AUX("LockAcquire: INCONSISTENT ", lock, lockmode);
        /* Should we retry ? */
        return FALSE;
    }

This case returns without releasing LockMgrLock and doesn't even call
elog().
As far as I see, different entries in xidHash have the same key when the
above case occurs.  Moreover, xidHash has been in an abnormal state since
the number of xidHash entries exceeded 256.

Is this bug solved by the change maxBackends->NLOCKENTS(maxBackends)
by Vadim or the change to the hash code by Tom?


As for my test case, xidHash is filled with XactLockTable entries which
have been acquired by XactLockTableWait().
Could those entries be released immediately after they are acquired?

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp



RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:

> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian
> Sent: Friday, May 28, 1999 1:58 PM
> To: PostgreSQL-development
> Subject: [HACKERS] Open 6.5 items
>
>
> SELECT * FROM test WHERE test IN (SELECT * FROM test) fails with
> strange error
> When creating a table with either type inet or type cidr as a
> primary,unique
>    key, the "198.68.123.0/24" and "198.68.123.0/27" are considered equal
> Fix function pointer calls to take Datum args for char and int2 args(ecgs)
> Regression test for new Numeric type
> Large Object memory problems
> refint problems
> invalidate cache on aborted transaction
> spinlock stuck problem
> benchmark performance problem
>
> Make psql \help, man pages, and sgml reflect changes in grammar
> Markup sql.sgml, Stefan's intro to SQL
> Generate Admin, User, Programmer hardcopy postscript
> Generate INSTALL and HISTORY from sgml sources.
> Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.
>

What about mdtruncate() for multi-segment relations?
AFAIK, it has not been solved yet.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Hiroshi Inoue wrote:
> 
> As far as I see,different entries in xidHash have a same key when above
> case occurs.  Moreover xidHash has been in abnormal state since the
> number of xidHash entries exceeded 256.
> 
> Is this bug solved by change maxBackends->NLOCKENTS(maxBackends)
> by Vadim or the change about hash by Tom ?

Should be fixed now.

> 
> As for my test case,xidHash is filled with XactLockTable entries which have
> been acquired by XactLockTableWait().
> Could those entries be released immediately after they are acquired ?

Ops. Thanks! Must be released. 

Vadim


Re: [HACKERS] Open 6.5 items

From
Bruce Momjian
Date:
> > Make psql \help, man pages, and sgml reflect changes in grammar
> > Markup sql.sgml, Stefan's intro to SQL
> > Generate Admin, User, Programmer hardcopy postscript
> > Generate INSTALL and HISTORY from sgml sources.
> > Update ref/lock.sgml, ref/set.sgml to reflect MVCC and locking changes.
> >
> 
> What about mdtruncate() for multi-segments relation ?
> AFAIK,it has not been solved yet.
> 

I thought we decided that file descriptors are kept by backends, and are
still accessible while new backends don't see the files.  Correct?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> Sent: Monday, May 31, 1999 11:15 AM
> To: Hiroshi Inoue
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] Open 6.5 items
> 
> 
> > > Make psql \help, man pages, and sgml reflect changes in grammar
> > > Markup sql.sgml, Stefan's intro to SQL
> > > Generate Admin, User, Programmer hardcopy postscript
> > > Generate INSTALL and HISTORY from sgml sources.
> > > Update ref/lock.sgml, ref/set.sgml to reflect MVCC and 
> locking changes.
> > >
> > 
> > What about mdtruncate() for multi-segments relation ?
> > AFAIK,it has not been solved yet.
> > 
> 
> I thought we decided that file descriptors are kept by backends, and are
> still accessible while new backends don't see the files.  Correct?
>

Yes, other backends could write to unlinked files which would vanish
before long.
I think it's more secure to truncate useless segments to size 0
than to unlink the segments, though vacuum would never remove
useless segments.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] Open 6.5 items

From
Bruce Momjian
Date:
> > I thought we decided that file descriptors are kept by backends, and are
> > still accessible while new backends don't see the files.  Correct?
> >
> 
> Yes,other backends could write to unlinked files which would be 
> vanished before long.
> I think it's more secure to truncate useless segments to size 0 
> than unlinking the segments though vacuum would never remove 
> useless segments.

If you truncate, other backends will see the data gone, and will be
writing into the middle of an empty file.  Better to remove.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:
> 
> > > I thought we decided that file descriptors are kept by 
> backends, and are
> > > still accessible while new backends don't see the files.  Correct?
> > >
> > 
> > Yes,other backends could write to unlinked files which would be 
> > vanished before long.
> > I think it's more secure to truncate useless segments to size 0 
> > than unlinking the segments though vacuum would never remove 
> > useless segments.
> 
> If you truncate, other backends will see the data gone, and will be
> writing into the middle of an empty file.  Better to remove.
>

I couldn't explain more because of my poor English, sorry.

But my test case usually causes backend abort.
My test case is
    While 1 or more sessions frequently insert/update a table,
    vacuum the table.

After vacuum, those sessions abort with message
    ERROR:  cannot open segment .. of relation ...

This ERROR finally causes spinlock freeze as I reported in a posting
[HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another 
example)). 

Comments ?

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] Open 6.5 items

From
Bruce Momjian
Date:
> I couldn't explain more because of my poor English,sorry.
> 
> But my test case usually causes backend abort.
> My test case is
>     While 1 or more sessions frequently insert/update a table,
>     vacuum the table.
> 
> After vacuum, those sessions abort with message 
>     ERROR:  cannot open segment .. of relation ...
> 
> This ERROR finally causes spinlock freeze as I reported in a posting
> [HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another 
> example)). 
> 
> Comments ?

OK, I buy that.  How will truncate fix things?  Isn't that going to be
strange too?  Hard to imagine how we are going to modify these things.
I am now leaning to the truncate option, especially considering that
usually only the last segment is going to be truncated.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Open 6.5 items

From
Tatsuo Ishii
Date:
>Tom Lane wrote:
>> 
>> Vadim Mikheev <vadim@krs.ru> writes:
>> >> If I recall the dynahash.c code correctly, a null return value
>> >> indicates either damage to the structure of the table (ie someone
>> >> stomped on memory that didn't belong to them) or running out of memory
>> >> to add entries to the table.  The latter should be impossible if we
>> 
>> > Quite different cases and should result in different reactions.
>> 
>> I agree; will see about cleaning up hash_search's call convention after
>> 6.5 is done.  Actually, maybe I should do it now?  I'm not convinced yet
>> whether the reports we're seeing are due to memory clobber or running
>> out of space... fixing this may be the easiest way to find out.
>
>Imho, we have to fix it in some way before 6.5
>Either by changing dynahash.c (to return 0x1 if table is
>corrupted and 0x0 if out of space) or by changing
>elog(NOTICE) to elog(ERROR).
>
>> 
>> > #define NLOCKS_PER_XACT         40
>> >                                 ^^
>> > Isn't it too low?
>> 
>> You tell me ... that was the number that was in the 6.4 code, but I
>> have no idea if it's right or not.  (Does MVCC require more locks
>> than the old stuff?)  What is a good upper bound on the number
>> of concurrently existing locks?
>
>Probably yes, because of writers can continue to work and lock
>other tables instead of sleeping of first lock due to concurrent
>select. I'll change it to 64, but this should be configurable
>thing.
>
>> 
>> >     /* xidHash table */
>> >     size += hash_estimate_size(maxBackends,
>> >                                ^^^^^^^^^^^
>> >                                SHMEM_XIDTAB_KEYSIZE,
>> >                                SHMEM_XIDTAB_DATASIZE);
>> 
>> > Why just maxBackends is here? NLOCKENTS should be used too
>> > (each transaction lock requires its own xidhash entry).
>> 
>> Should it be NLOCKENTS(maxBackends) xid entries, or do you mean
>> NLOCKENTS(maxBackends) + maxBackends?  Feel free to stick in any
>> estimates that you like better --- what's there now is an interpretation
>> of what the 6.4 code was trying to do (but it was sufficiently buggy and
>> unreadable that it was probably coming out with different numbers in
>> the end...)
>
>Just NLOCKENTS(maxBackends) - I'll change it now.

I have just done cvs update and saw your changes. I tried the same
testing as I did before (64 concurrent connections, and each
connection executes 100 transactions), but it failed again.

(1) without -B 1024, it failed: out of free buffers: time to abort!

(2) with -B 1024, it went into stuck spin lock

So I looked into sources a little bit, and made a minor change to
include/storage/lock.h:

#define INIT_TABLE_SIZE                 100

to:

#define INIT_TABLE_SIZE                 4096

then restarted postmaster with -B 1024 (this will prevent
out-of-free-buffers problem, I guess). Now everything seems to work
great!

I suspect that the huge INIT_TABLE_SIZE prevented dynamic expansion of
the hash tables, and it seems there's something wrong in the routines
responsible for that.

Comments?
--
Tatsuo Ishii



Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Tatsuo Ishii wrote:
> 
> I have just done cvs update and saw your changes. I tried the same
> testing as I did before (64 concurrent connections, and each
> connection executes 100 transactions), but it failed again.
> 
> (1) without -B 1024, it failed: out of free buffers: time to abort!
> 
> (2) with -B 1024, it went into stuck spin lock
> 
> So I looked into sources a little bit, and made a minor change to
> include/storage/lock.h:
> 
> #define INIT_TABLE_SIZE                 100
> 
> to:
> 
> #define INIT_TABLE_SIZE                 4096
> 
> then restarted postmaster with -B 1024 (this will prevent
> out-of-free-buffers problem, I guess). Now everything seems to work
> great!
> 
> I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
> hash tables and seems there's something wrong in the routines
> responsible for that.

Seems like that -:(

If we'll not fix the expand-hash code before the 6.5 release then
I would recommend not using INIT_TABLE_SIZE in

    lockMethodTable->lockHash = (HTAB *) ShmemInitHash(shmemName,
                                        INIT_TABLE_SIZE, MAX_TABLE_SIZE,
                                        &info, hash_flags);

and

    lockMethodTable->xidHash = (HTAB *) ShmemInitHash(shmemName,
                                        INIT_TABLE_SIZE, MAX_TABLE_SIZE,
                                        &info, hash_flags);

but use NLOCKENTS(maxBackends) instead.

Vadim


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
>> As for my test case,xidHash is filled with XactLockTable entries which have
>> been acquired by XactLockTableWait().
>> Could those entries be released immediately after they are acquired ?

> Ops. Thanks! Must be released. 

Does this account for the "ShmemAlloc: out of memory" errors we've been
seeing?  I spent a good deal of time yesterday grovelling through all
the calls to ShmemAlloc, and concluded that (unless there is a memory
stomp somewhere) it has to be caused by one of the shared hashtables
growing well beyond its size estimate.

I did find that the PROC structures are not counted in the initial
sizing of the shared memory block.  This is no problem at the default
limit of 32 backends, but could get to be an issue for hundreds of
backends.  I will add that item to the size estimate today.
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
>> I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
>> hash tables and seems there's something wrong in the routines
>> responsible for that.

> Seems like that -:(

OK, as the last guy to touch dynahash.c I suppose this is my
bailiwick... I will look into it today.

> If we'll not fix expand hash code before 6.5 release then
> I would recommend to don't use INIT_TABLE_SIZE in

If we can't figure out what's really wrong then that might be
a good kluge solution to get 6.5 out the door.  But I'd rather
try to fix it right first.

Can anyone send me a test case script that reproduces the problem?
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
I wrote:
>>> I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
>>> hash tables and seems there's something wrong in the routines
>>> responsible for that.

> OK, as the last guy to touch dynahash.c I suppose this is my
> bailiwick... I will look into it today.

It's amazing how much easier it is to see a bug when you know it must be
there ;-).

I discovered that the hashtable expansion routine would mess up in the
case that all the records in the bucket being split ended up in the new
bucket rather than the old.  In that case it forgot to clear the old
bucket's chain header, with the result that all the records appeared to
be in both buckets at once.  This would not be a big problem until and
unless the first record in the chain got deleted --- it would only be
correctly removed from the new bucket, leaving the old bucket's chain
header pointing at a now-free record (and failing to link to any records
that had been added to the shared chain on its behalf in the meanwhile).
Disaster ensues.

An actual failure via this path seems somewhat improbable, but that
may just explain why we hadn't seen it happen very much before...
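
A toy version of the repaired split loop shows the crux (hypothetical
structures, not the dynahash.c code): rebuild both chain headers from
scratch, so that even when every record rehashes into the new bucket
the old bucket cannot be left pointing at records it no longer owns.

    #include <stddef.h>

    typedef struct rec
    {
        struct rec *next;
        unsigned    hash;
    } rec;

    static void
    split_bucket(rec **oldb, rec **newb, unsigned newmask)
    {
        rec *r = *oldb;
        rec *next;

        *oldb = NULL;               /* the fix: always clear the old header */
        *newb = NULL;
        while (r != NULL)
        {
            next = r->next;
            if (r->hash & newmask)  /* record moves to the new bucket */
            {
                r->next = *newb;
                *newb = r;
            }
            else                    /* record stays in the old bucket */
            {
                r->next = *oldb;
                *oldb = r;
            }
            r = next;
        }
    }

    int
    main(void)
    {
        rec a = {NULL, 1};
        rec b = {&a, 3};            /* old chain: b -> a */
        rec *old = &b, *new = NULL;

        split_bucket(&old, &new, 2);    /* bit 1 picks the bucket */
        return (old == &a && new == &b) ? 0 : 1;
    }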

I have committed a fix in dynahash.c.  Hiroshi and Tatsuo, would you
please grab latest sources and see whether the problems you are
observing are fixed?
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I have just done cvs update and saw your changes. I tried the same
> testing as I did before (64 concurrent connections, and each
> connection executes 100 transactions), but it failed again.
>
> (1) without -B 1024, it failed: out of free buffers: time to abort!

Right now, the postmaster will let you set any combination of -B and -N
you please.  But it seems obvious that there is some minimum number of
buffers per backend below which things aren't going to work very well.
I wonder whether the postmaster startup code ought to enforce a minimum
ratio, say -B at least twice -N ?  I have no idea what an appropriate
limit would be, however.  Vadim, do you have any thoughts?
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
I said:
> I did find that the PROC structures are not counted in the initial
> sizing of the shared memory block.

Er ... yes they are ... never mind ...

At this point I am not going to make any further changes to the
shared-mem code unless Hiroshi and Tatsuo report that there are still
problems.

I would still like to alter the calling conventions for hash_search
and hash_seq, which are ugly and also dependent on static state
variables.  But as far as I can tell right now, those are just code
beautification items rather than repairs for currently-existing bugs.
So it seems best not to risk breaking anything with so little testing
time left in the 6.5 cycle.  I will put them on my to-do-for-6.6 list
instead.
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> Vadim Mikheev <vadim@krs.ru> writes:
> >> As for my test case,xidHash is filled with XactLockTable entries which have
> >> been acquired by XactLockTableWait().
> >> Could those entries be released immediately after they are acquired ?
> 
> > Ops. Thanks! Must be released.
> 
> Does this account for the "ShmemAlloc: out of memory" errors we've been
> seeing?  I spent a good deal of time yesterday grovelling through all

Yes.

Vadim


Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I have just done cvs update and saw your changes. I tried the same
> > testing as I did before (64 concurrent connections, and each
> > connection executes 100 transactions), but it failed again.
> >
> > (1) without -B 1024, it failed: out of free buffers: time to abort!
> 
> Right now, the postmaster will let you set any combination of -B and -N
> you please.  But it seems obvious that there is some minimum number of
> buffers per backend below which things aren't going to work very well.
> I wonder whether the postmaster startup code ought to enforce a minimum
> ratio, say -B at least twice -N ?  I have no idea what an appropriate
                ^^^^^^^^^^^^^^
It's enough for a select from a single table using an index, so it's
probably a good ratio.

> limit would be, however.  Vadim, do you have any thoughts?

Vadim


Re: [HACKERS] Open 6.5 items

From
Tatsuo Ishii
Date:
Tom,

>It's amazing how much easier it is to see a bug when you know it must be
>there ;-).
>
>I discovered that the hashtable expansion routine would mess up in the
>case that all the records in the bucket being split ended up in the new
>bucket rather than the old.  In that case it forgot to clear the old
>bucket's chain header, with the result that all the records appeared to
>be in both buckets at once.  This would not be a big problem until and
>unless the first record in the chain got deleted --- it would only be
>correctly removed from the new bucket, leaving the old bucket's chain
>header pointing at a now-free record (and failing to link to any records
>that had been added to the shared chain on its behalf in the meanwhile).
>Disaster ensues.
>
>An actual failure via this path seems somewhat improbable, but that
>may just explain why we hadn't seen it happen very much before...
>
>I have committed a fix in dynahash.c.  Hiroshi and Tatsuo, would you
>please grab latest sources and see whether the problems you are
>observing are fixed?

Bingo! Your fix seems to solve the problem! Now 64 concurrent
connections ran 100 transactions each without any problem. Thanks.

BTW, the script I'm using for the heavy load testing is written in
Java (not written by me). Do you want to try it?

>> (1) without -B 1024, it failed: out of free buffers: time to abort!
>
>Right now, the postmaster will let you set any combination of -B and -N
>you please.  But it seems obvious that there is some minimum number of
>buffers per backend below which things aren't going to work very well.
>I wonder whether the postmaster startup code ought to enforce a minimum
>ratio, say -B at least twice -N ?  I have no idea what an appropriate
>limit would be, however.  Vadim, do you have any thoughts?

Just a few questions.
o I observed the backend processes grew ~10MB with -B 1024. Is this
normal?

o Is it possible to let the backend wait for free buffers in case of
insufficient shared buffers? (with reasonable retries, of course)
--
Tatsuo Ishii



Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Tatsuo Ishii wrote:
> 
> Just a few questions.
> 
>  o I observed the backend processes grew ~10MB with -B 1024. Is this
> normal?

Backend attaches to 1024*8K + other shmem, so probably
ps takes it into account.

> o Is it possible to let the backend wait for free buffers in case of
> insufficient shared buffers? (with reasonable retries, of course)

Yes, but not now.

Vadim


RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, June 01, 1999 2:14 AM
> To: Hiroshi Inoue; t-ishii@sra.co.jp
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] Open 6.5 items 
> 
> 
> I wrote:
> >>> I suspect that huge INIT_TABLE_SIZE prevented dynamic expanding the
> >>> hash tables and seems there's something wrong in the routines
> >>> responsible for that.
> 
> > OK, as the last guy to touch dynahash.c I suppose this is my
> > bailiwick... I will look into it today.
> 
> It's amazing how much easier it is to see a bug when you know it must be
> there ;-).
>

[snip] 

> 
> I have committed a fix in dynahash.c.  Hiroshi and Tatsuo, would you
> please grab latest sources and see whether the problems you are
> observing are fixed?
>

It works fine.
The number of xidHash entries exceeded 600 but the spinlock error didn't
occur.

However, when I did vacuum while testing I got the following error
message.
        ERROR:  Child itemid marked as unused

TransactionId-s of tuples in update chain may be out of order.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp


RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Bruce Momjian
> Sent: Monday, May 31, 1999 3:15 PM
> To: Hiroshi Inoue
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] Open 6.5 items
> 
> 
> > I couldn't explain more because of my poor English,sorry.
> > 
> > But my test case usually causes backend abort.
> > My test case is
> >     While 1 or more sessions frequently insert/update a table,
> >     vacuum the table.
> > 
> > After vacuum, those sessions abort with message 
> >     ERROR:  cannot open segment .. of relation ...
> > 
> > This ERROR finally causes spinlock freeze as I reported in a posting
> > [HACKERS] spinlock freeze ?(Re: INSERT/UPDATE waiting (another 
> > example)). 
> > 
> > Comments ?
> 
> OK, I buy that.  How will truncate fix things?  Isn't that going to be
> strange too.  Hard to imagine how we are going to modify these things. 
> I am now leaning to the truncate option, especially considering that
> usually only the last segment is going to be truncated.
>

I made a patch on trial.

1)  Useless segments are never removed by my implementation
    because I call FileTruncate() instead of File(Name)Unlink().
2)  mdfd_lstbcnt member of MdfdVec was abused in mdnblocks().
    I am maintaining the value of mdfd_lstbcnt unwillingly.
    Is it preferable to get rid of mdfd_lstbcnt completely ?

I'm not sure that this patch has no problems.
Please check and comment on my patch.

I don't have > 1G of disk space.
Could someone run it in a > 1G environment?
As Ole Gjerde mentioned, the current implementation from his old
patch is not right.  His new patch seems right if vacuum is
executed alone.
Please run vacuum while other concurrent sessions are
reading or writing, if you would like to see the difference.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp

*** storage/smgr/md.c.orig    Wed May 26 16:05:02 1999
--- storage/smgr/md.c    Wed Jun  2 15:35:35 1999
***************
*** 674,684 ****
      segno = 0;
      for (;;)
      {
!         if (v->mdfd_lstbcnt == RELSEG_SIZE
!             || (nblocks = _mdnblocks(v->mdfd_vfd, BLCKSZ)) == RELSEG_SIZE)
          {
- 
-             v->mdfd_lstbcnt = RELSEG_SIZE;
              segno++;
  
              if (v->mdfd_chain == (MdfdVec *) NULL)
--- 674,685 ----
      segno = 0;
      for (;;)
      {
!         nblocks = _mdnblocks(v->mdfd_vfd, BLCKSZ);
!         if (nblocks > RELSEG_SIZE)
!             elog(FATAL, "segment too big in mdnblocks!");
!         v->mdfd_lstbcnt = nblocks;
!         if (nblocks == RELSEG_SIZE)
          {
              segno++;
  
              if (v->mdfd_chain == (MdfdVec *) NULL)
***************
*** 714,745 ****
      int            curnblk,
                     i,
                     oldsegno,
!                    newsegno;
!     char        fname[NAMEDATALEN];
!     char        tname[NAMEDATALEN + 10];
  
      curnblk = mdnblocks(reln);
      oldsegno = curnblk / RELSEG_SIZE;
      newsegno = nblocks / RELSEG_SIZE;
  
-     StrNCpy(fname, RelationGetRelationName(reln)->data, NAMEDATALEN);
- 
-     if (newsegno < oldsegno)
-     {
-         for (i = (newsegno + 1);; i++)
-         {
-             sprintf(tname, "%s.%d", fname, i);
-             if (FileNameUnlink(tname) < 0)
-                 break;
-         }
-     }
  #endif
  
      fd = RelationGetFile(reln);
      v = &Md_fdvec[fd];
  
      if (FileTruncate(v->mdfd_vfd, nblocks * BLCKSZ) < 0)
          return -1;
  
      return nblocks;
--- 715,766 ----
      int            curnblk,
                     i,
                     oldsegno,
!                    newsegno,
!                    lastsegblocks;
!     MdfdVec            **varray;
  
      curnblk = mdnblocks(reln);
+     if (nblocks > curnblk)
+         return -1;
      oldsegno = curnblk / RELSEG_SIZE;
      newsegno = nblocks / RELSEG_SIZE;
  
  #endif
  
      fd = RelationGetFile(reln);
      v = &Md_fdvec[fd];
  
+ #ifndef LET_OS_MANAGE_FILESIZE
+     varray = (MdfdVec **) palloc((oldsegno + 1) * sizeof(MdfdVec *));
+     for (i = 0; i <= oldsegno; i++)
+     {
+         if (!v)
+             elog(ERROR, "segment isn't open in mdtruncate!");
+         varray[i] = v;
+         v = v->mdfd_chain;
+     }
+     for (i = oldsegno; i > newsegno; i--)
+     {
+         v = varray[i];
+         if (FileTruncate(v->mdfd_vfd, 0) < 0)
+         {
+             pfree(varray);
+             return -1;
+         }
+         v->mdfd_lstbcnt = 0;
+     }
+     /* Calculate the # of blocks in the last segment */
+     lastsegblocks = nblocks - (newsegno * RELSEG_SIZE);
+     v = varray[newsegno];
+     pfree(varray);
+     if (FileTruncate(v->mdfd_vfd, lastsegblocks * BLCKSZ) < 0)
+         return -1;
+     v->mdfd_lstbcnt = lastsegblocks;
+ #else
      if (FileTruncate(v->mdfd_vfd, nblocks * BLCKSZ) < 0)
          return -1;
+     v->mdfd_lstbcnt = nblocks;
+ #endif
  
      return nblocks;




Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Hiroshi Inoue wrote:
> 
> However,when I did vacuum while testing I got the following error
> message.
>         ERROR:  Child itemid marked as unused
> 
> TransactionId-s of tuples in update chain may be out of order.

I see... Need 1-2 days to fix this -:(

Vadim


Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Hiroshi Inoue wrote:
> 
> However,when I did vacuum while testing I got the following error
> message.
>         ERROR:  Child itemid marked as unused
> 
> TransactionId-s of tuples in update chain may be out of order.

1. Fix and explanation in xact.c:CommitTransaction():

     RecordTransactionCommit();

    /*
     * Let others know about no transaction in progress by me.
     * Note that this must be done _before_ releasing locks we hold
     * and SpinAcquire(ShmemIndexLock) is required - or bad (too high)
     * XmaxRecent value might be used by vacuum: UPDATE with xid 0 is
     * blocked by xid 1' UPDATE, xid 1 is doing commit while xid 2
     * gets snapshot - if xid 2' GetSnapshotData sees xid 1 as running
     * then it must see xid 0 as running as well or XmaxRecent = 1
     * might be used by concurrent vacuum causing
     *         ERROR:  Child itemid marked as unused
     * This bug was reported by Hiroshi Inoue and I was able to reproduce
     * it with 3 sessions and gdb.  - vadim 06/03/99
     */
    if (MyProc != (PROC *) NULL)
    {
        SpinAcquire(ShmemIndexLock);
        MyProc->xid = InvalidTransactionId;
        MyProc->xmin = InvalidTransactionId;
        SpinRelease(ShmemIndexLock);
    }

2. It was possible to get two versions of the same row from
   select. Fixed by moving MyProc->xid assignment from
   StartTransaction() inside GetNewTransactionId().
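
In outline, change 2 looks something like this (a condensed sketch of
the description above; the lock name and surrounding details are
assumptions, not the committed varsup.c code). Publishing MyProc->xid
under the same lock that hands out the xid closes the window in which
another backend's snapshot could see a newer xid as running without
seeing ours:

    void
    GetNewTransactionId(TransactionId *xid)
    {
        /* sketch only -- lock and variable names are illustrative */
        SpinAcquire(XidGenLockId);      /* serializes xid assignment */
        *xid = NextTransactionId++;
        if (MyProc != (PROC *) NULL)
            MyProc->xid = *xid;         /* formerly done later, in
                                         * StartTransaction() */
        SpinRelease(XidGenLockId);
    }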
 

Thanks, Hiroshi! And please run your tests - I used just 
3 sessions and gdb.

Vadim


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
>> ERROR:  Child itemid marked as unused
> [ is fixed ]

Great!  Vadim (also Hiroshi and Tatsuo), how many bugs remain on your
must-fix-for-6.5 lists?  I was just wondering over in the "Freezing
docs" thread whether we had any problems severe enough to justify
delaying the release.  It sounds like at least one such problem is
gone...
        regards, tom lane


Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> Vadim Mikheev <vadim@krs.ru> writes:
> >> ERROR:  Child itemid marked as unused
> > [ is fixed ]
> 
> Great!  Vadim (also Hiroshi and Tatsuo), how many bugs remain on your
> must-fix-for-6.5 lists?  I was just wondering over in the "Freezing
> docs" thread whether we had any problems severe enough to justify
> delaying the release.  It sounds like at least one such problem is
> gone...

No one in mine.

There are still some bad things, but they are old:

1. elog(NOTICE) in the lock manager when locking did not succeed.
   Hope that our recent changes will reduce the possibility of this.

Hiroshi wrote:

2. 
>     spinlock io_in_progress_lock of a buffer page is not
>     released by operations called by elog() such as
>     ProcReleaseSpins(),ResetBufferPool() etc.

I tried to fix this before 6.4 but without success (don't
remember why).

3.
> It seems elog(FATAL) doesn't release allocated buffer pages.
> It's OK ?
> AFAIC elog(FATAL) causes proc_exit(0) and proc_exit() doesn't
> call ResetBufferPool().

Seems to me that elog(FATAL) should call siglongjmp(Warn_restart, 1),
like elog(ERROR), but force exit in tcop main loop after
AbortCurrentTransaction(). AbortTransaction() does pretty nice
things like RelationPurgeLocalRelation(false) and DestroyNoNameRels()...

Vadim


Error exits (Re: [HACKERS] Open 6.5 items)

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
>> It seems elog(FATAL) doesn't release allocated buffer pages.
>> It's OK ?
>> AFAIC elog(FATAL) causes proc_exit(0) and proc_exit() doesn't
>> call ResetBufferPool().

> Seems to me that elog(FATAL) should call siglongjmp(Warn_restart, 1),
> like elog(ERROR), but force exit in tcop main loop after
> AbortCurrentTransaction(). AbortTransaction() does pretty nice
> things like RelationPurgeLocalRelation(false) and DestroyNoNameRels()...

Seems reasonable to me.  It seems to me that elog(FATAL) means "this
backend is too messed up to continue, but I think the rest of the
backends can keep going."  So we need to clean up our allocated
resources before quitting.  abort() is for the cases where we think
shared memory may be corrupted and everyone must bail out.  We might
need to revisit the uses of each routine and make sure that different
error conditions are properly classified.

Of course, if things *are* messed up then trying to AbortTransaction
might make it worse.  How bad is that risk?
        regards, tom lane


Re: Error exits (Re: [HACKERS] Open 6.5 items)

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> Vadim Mikheev <vadim@krs.ru> writes:
> >> It seems elog(FATAL) doesn't release allocated buffer pages.
> >> It's OK ?
> >> AFAIC elog(FATAL) causes proc_exit(0) and proc_exit() doesn't
> >> call ResetBufferPool().
> 
> > Seems to me that elog(FATAL) should call siglongjmp(Warn_restart, 1),
> > like elog(ERROR), but force exit in tcop main loop after
> > AbortCurrentTransaction(). AbortTransaction() does pretty nice
> > things like RelationPurgeLocalRelation(false) and DestroyNoNameRels()...
> 
> Seems reasonable to me.  It seems to me that elog(FATAL) means "this
> backend is too messed up to continue, but I think the rest of the
> backends can keep going."  So we need to clean up our allocated
> resources before quitting.  abort() is for the cases where we think
> shared memory may be corrupted and everyone must bail out.  We might
> need to revisit the uses of each routine and make sure that different
> error conditions are properly classified.
> 
> Of course, if things *are* messed up then trying to AbortTransaction
> might make it worse.  How bad is that risk?

Don't know. I think that we have no time to fix this in 6.5, -:(

Vadim


RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
> Mikheev
> Sent: Thursday, June 03, 1999 10:45 PM
> To: Hiroshi Inoue
> Cc: Tom Lane; t-ishii@sra.co.jp; PostgreSQL-development
> Subject: Re: [HACKERS] Open 6.5 items
> 
> 
> Hiroshi Inoue wrote:
> > 
> > However,when I did vacuum while testing I got the following error
> > message.
> >         ERROR:  Child itemid marked as unused
> > 
> > TransactionId-s of tuples in update chain may be out of order.
> 
> 1. Fix and explanation in xact.c:CommitTransaction():
> 
>      RecordTransactionCommit();
> 
>     /*
>      * Let others know about no transaction in progress by me.
>      * Note that this must be done _before_ releasing locks we hold
>      * and SpinAcquire(ShmemIndexLock) is required - or bad (too high)
>      * XmaxRecent value might be used by vacuum: UPDATE with xid 0 is
>      * blocked by xid 1' UPDATE, xid 1 is doing commit while xid 2
>      * gets snapshot - if xid 2' GetSnapshotData sees xid 1 as running
>      * then it must see xid 0 as running as well or XmaxRecent = 1
>      * might be used by concurrent vacuum causing
>      *         ERROR:  Child itemid marked as unused
>      * This bug was reported by Hiroshi Inoue and I was able to reproduce
>      * it with 3 sessions and gdb.  - vadim 06/03/99
>      */
>     if (MyProc != (PROC *) NULL)
>     {
>         SpinAcquire(ShmemIndexLock);
>         MyProc->xid = InvalidTransactionId;
>         MyProc->xmin = InvalidTransactionId;
>         SpinRelease(ShmemIndexLock);
>     }
> 
> 2. It was possible to get two versions of the same row from
>    select. Fixed by moving MyProc->xid assignment from
>    StartTransaction() inside GetNewTransactionId().
> 
> Thanks, Hiroshi! And please run your tests - I used just 
> 3 sessions and gdb.
>

Unfortunately, the error still occurs (I changed xact.c as above
by hand, OK?).

It seems there are cases where tuples are updated by transactions
older than their xmin-s, and only some tuples in the middle
of an update chain may be deleted.

I have no idea to fix this now.
It's OK for me to leave this unsolved because those cases would 
rarely occur.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp     



RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
> Mikheev
> Sent: Friday, June 04, 1999 2:24 AM
> To: Tom Lane
> Cc: Hiroshi Inoue; t-ishii@sra.co.jp; PostgreSQL-development
> Subject: Re: [HACKERS] Open 6.5 items
> 
> Tom Lane wrote:
> > 
> > Vadim Mikheev <vadim@krs.ru> writes:
> > >> ERROR:  Child itemid marked as unused
> > > [ is fixed ]
> > 
> > Great!  Vadim (also Hiroshi and Tatsuo), how many bugs remain on your
> > must-fix-for-6.5 lists?  I was just wondering over in the "Freezing
> > docs" thread whether we had any problems severe enough to justify
> > delaying the release.  It sounds like at least one such problem is
> > gone...
> 
> No one in mine.
>
[snip]

> 
> Hiroshi wrote:
> 
> 2. 
> >     spinlock io_in_progress_lock of a buffer page is not
> >     released by operations called by elog() such as
> >     ProcReleaseSpins(),ResetBufferPool() etc.
> 
> I tried to fix this before 6.4 but without success (don't
> remember why).
>

This is not on my must-fix-for-6.5 list.
For the present this is caused by other bugs, not by bufmgr/smgr itself,
and an easy fix may introduce other bugs.


And on segmented relations.

Ole Gjerde, who provided the patch for the current implementation of
mdtruncate(), says:
"First, please reverse my patch to mdtruncate() in md.c as soon as
 possible.  It does not work properly in some cases."

I also recommend reversing his patch to mdtruncate().

Though we could not shrink segmented relations with the old
implementation, the result of vacuum would never be inconsistent(?).

I think we don't have enough time to fix this.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] Open 6.5 items

From
Vadim Mikheev
Date:
Hiroshi Inoue wrote:
> 
> >
> > 2. It was possible to get two versions of the same row from
> >    select. Fixed by moving MyProc->xid assignment from
> >    StartTransaction() inside GetNewTransactionId().
> >
> > Thanks, Hiroshi! And please run your tests - I used just
> > 3 sessions and gdb.
> >
> 
> Unfortunately,the error still occurs(I changed xact.c as above
> by hand OK ?).

Did you add 2. changes too?
Also, I made some changes in shmem.c:GetSnapshotData() but seems
that they are not relevant. Could you post me your xact.c and
varsup.c or grab current sources?

> It seems there are cases that tuples are updated by older
> transactions than their xmin-s and only some tuples in the middle
> of update chain may be deleted.
> 
> I have no idea to fix this now.
> It's OK for me to leave this unsolved because those cases would
> rarely occur.

I would like to see it fixed anyway.

Vadim


RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: root@sunpine.krs.ru [mailto:root@sunpine.krs.ru]On Behalf Of Vadim
> Mikheev
> Sent: Friday, June 04, 1999 5:47 PM
> To: Hiroshi Inoue
> Cc: Tom Lane; t-ishii@sra.co.jp; PostgreSQL-development
> Subject: Re: [HACKERS] Open 6.5 items
> 
> 
> Hiroshi Inoue wrote:
> > 
> > >
> > > 2. It was possible to get two versions of the same row from
> > >    select. Fixed by moving MyProc->xid assignment from
> > >    StartTransaction() inside GetNewTransactionId().
> > >
> > > Thanks, Hiroshi! And please run your tests - I used just
> > > 3 sessions and gdb.
> > >
> > 
> > Unfortunately,the error still occurs(I changed xact.c as above
> > by hand OK ?).
> 
> Did you add 2. changes too?
> Also, I made some changes in shmem.c:GetSnapshotData() but seems
> that they are not relevant. Could you post me your xact.c and
> varsup.c or grab current sources?
>

OK, I will grab the current sources and retry.

Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp 


Re: [HACKERS] Open 6.5 items

From
Bruce Momjian
Date:
> > 
> > Hiroshi wrote:
> > 
> > 2. 
> > >     spinlock io_in_progress_lock of a buffer page is not
> > >     released by operations called by elog() such as
> > >     ProcReleaseSpins(),ResetBufferPool() etc.
> > 
> > I tried to fix this before 6.4 but without success (don't
> > remember why).
> >
> 
> This is not in my must-fix-for-6.5 lists. 
> For the present this is caused by other bugs not by bufmgr/smgr itself 
> and an easy fix may introduce other bugs.
> 
> 
> And on segmented relations.
> 
> Ole Gjerde who provided the patch for current implementation of 
> mdtruncate() says:
> "First, please reverse my patch to mdtruncate() in md.c as soon as
>  possible.  It does not work properly in some cases."
> 
> I also recommend to reverse his patch to mdtruncate().
> 
> Though we could not shrink segmented relations by old implementation 
> the result by vacuum would never be inconsistent(?).
> 
> I think we don't have enough time to fix this.

So what do we put in its place when we reverse out the patch?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> Right now, the postmaster will let you set any combination of -B and -N
>> you please.  But it seems obvious that there is some minimum number of
>> buffers per backend below which things aren't going to work very well.
>> I wonder whether the postmaster startup code ought to enforce a minimum
>> ratio, say -B at least twice -N ?  I have no idea what an appropriate
>                 ^^^^^^^^^^^^^^
> It's enough for a select from a single table using an index, so it's
> probably a good ratio.

I've added a check in postmaster.c to require -B to be at least twice
-N, per this discussion.  It also enforces a minimum -B of 16 no matter
how small -N is.  I pulled that number out of the air --- anyone have an
idea whether some other number would be better?

The current default values of -B 64, -N 32 are not affected.  However,
since the -N default is easily configurable, I wonder whether we should
move the default -B value into config.h as well...
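
For illustration, a minimal sketch of that check, with illustrative
variable names (NBuffers for -B, MaxBackends for -N); this is not the
actual postmaster.c code:

    /* After option parsing: reject -B/-N combinations that leave
     * too few buffers per backend. */
    if (NBuffers < 2 * MaxBackends || NBuffers < 16)
    {
        fprintf(stderr,
                "postmaster: -B must be at least twice -N and at least 16\n");
        exit(1);
    }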
        regards, tom lane


RE: [HACKERS] Open 6.5 items

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> Sent: Friday, June 04, 1999 11:37 PM
> To: Hiroshi Inoue
> Cc: Vadim Mikheev; Tom Lane; t-ishii@sra.co.jp; PostgreSQL-development
> Subject: Re: [HACKERS] Open 6.5 items
> 
> 
> > > 
> > > Hiroshi wrote:
> > > 
> > 
> > And on segmented relations.
> > 
> > Ole Gjerde, who provided the patch for the current implementation of 
> > mdtruncate(), says:
> > "First, please reverse my patch to mdtruncate() in md.c as soon as
> >  possible.  It does not work properly in some cases."
> > 
> > I also recommend reversing his patch to mdtruncate().
> > 
> > Though we could not shrink segmented relations under the old implementation, 
> > the result of vacuum would never be inconsistent(?).
> > 
> > I think we don't have enough time to fix this.
> 
> So what do we put in its place when we reverse out the patch?
>

Future TODO items?

As far as I can see, there's no consensus on whether we should remove the 
useless segments (I also think that's preferable, if possible) or only 
truncate them (as my trial patch does).

Only Bruce and Ole objected to my opinion, and no one agreed 
with me.
What do other people who would use segmented relations think?
Thanks.

Hiroshi Inoue
Inoue@tpf.co.jp 
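
For reference, a minimal sketch of the truncate-only approach under
discussion; the MdfdVec fields, FileTruncate(), RELSEG_SIZE, and BLCKSZ
follow the 6.5-era md.c loosely, and this is an illustration, not the
trial patch itself:

    static int
    mdtruncate_sketch(MdfdVec *v, int nblocks)
    {
        int priorblocks = 0;          /* blocks held by earlier segments */

        for (; v != (MdfdVec *) NULL; v = v->mdfd_chain)
        {
            if (priorblocks >= nblocks)
            {
                /* Segment lies entirely past the new EOF: truncate it
                 * to zero length but keep the empty file around, so
                 * open file descriptors in other backends stay valid. */
                FileTruncate(v->mdfd_vfd, 0);
                v->mdfd_lstbcnt = 0;
            }
            else if (priorblocks + RELSEG_SIZE > nblocks)
            {
                /* New EOF falls inside this segment: cut it back. */
                int lastsegblocks = nblocks - priorblocks;

                FileTruncate(v->mdfd_vfd, lastsegblocks * BLCKSZ);
                v->mdfd_lstbcnt = lastsegblocks;
            }
            priorblocks += RELSEG_SIZE;
        }
        return nblocks;
    }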


Re: [HACKERS] Open 6.5 items

From
Bruce Momjian
Date:
> Future TODO items?
> 
> As far as I can see, there's no consensus on whether we should remove the 
> useless segments (I also think that's preferable, if possible) or only 
> truncate them (as my trial patch does).
> 
> Only Bruce and Ole objected to my opinion, and no one agreed 
> with me.
> What do other people who would use segmented relations think?

I liked unlinking because it allowed old backends to still see the
segments if they still have open file descriptors, while new backends can
see there is no file there.  That seemed nice, but you clearly
demonstrated it caused major problems.  Maybe truncation is the answer.
I don't know, but we need to resolve this for 6.5.  I can't imagine us
continuing to focus on this the way we have for the past few weeks.  Let's
just figure out an answer.  I am on IRC now if someone can get on to
discuss this.  I will even phone someone in the US or Canada to discuss it.

What is it in the backend that causes a backend to think there is
another segment?  Does it just go off the end of the max segment size
and try to open another, or do we store the number of segments
somewhere?  I thought it was the former, in the smgr() area.  I honestly
don't care if the segment files stay around if that is going to be a
reliable solution.
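
For reference, the mapping being asked about is plain arithmetic; loosely
after the 6.5-era md.c (names here are assumptions), a block number
implies its segment, and a segment file is opened on demand the first
time a block beyond the earlier segments is requested:

    int segno  = blkno / RELSEG_SIZE;   /* which relation.N file holds it */
    int segoff = blkno % RELSEG_SIZE;   /* block offset within that file  */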

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Open 6.5 items

From
The Hermit Hacker
Date:
On Sun, 6 Jun 1999, Bruce Momjian wrote:

> > Future TODO items?
> > 
> > As far as I can see, there's no consensus on whether we should remove the 
> > useless segments (I also think that's preferable, if possible) or only 
> > truncate them (as my trial patch does).
> > 
> > Only Bruce and Ole objected to my opinion, and no one agreed 
> > with me.
> > What do other people who would use segmented relations think?
> 
> I liked unlinking because it allowed old backends to still see the
> segments if they still have open file descriptors, while new backends can
> see there is no file there.  That seemed nice, but you clearly
> demonstrated it caused major problems.  Maybe truncation is the answer.
> I don't know, but we need to resolve this for 6.5.  I can't imagine us
> continuing to focus on this the way we have for the past few weeks.  Let's
> just figure out an answer.  I am on IRC now if someone can get on to
> discuss this.  I will even phone someone in the US or Canada to discuss it.
> 
> What is it in the backend that causes a backend to think there is
> another segment?  Does it just go off the end of the max segment size
> and try to open another, or do we store the number of segments
> somewhere?  I thought it was the former, in the smgr() area.  I honestly
> don't care if the segment files stay around if that is going to be a
> reliable solution.

Other than the inode being used, what is wrong with a zero-length segment
file?

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Open 6.5 items

From
Bruce Momjian
Date:
> > What is it in the backend that causes a backend to think there is
> > another segment?  Does it just go off the end of the max segment size
> > and try to open another, or do we store the number of segments
> > somewhere?  I thought it was the former, in the smgr() area.  I honestly
> > don't care if the segment files stay around if that is going to be a
> > reliable solution.
> 
> Other than the inode being used, what is wrong with a zero-length segment
> file?

Nothing is wrong with it.  I just thought it would be more reliable to
unlink it, but now I am thinking I was wrong.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Open 6.5 items

From
The Hermit Hacker
Date:
On Mon, 7 Jun 1999, Bruce Momjian wrote:

> > > What is it in the backend that causes a backend to think there is
> > > another segment?  Does it just go off the end of the max segment size
> > > and try to open another, or do we store the number of segments
> > > somewhere?  I thought it was the former, in the smgr() area.  I honestly
> > > don't care if the segment files stay around if that is going to be a
> > > reliable solution.
> > 
> > Other than the inode being used, what is wrong with a zero-length segment
> > file?
> 
> Nothing is wrong with it.  I just thought it would be more reliable to
> unlink it, but now I am thinking I was wrong.

Just a thought, but if you left it zero length, the dba could use it as a
means for estimating disk space requirements? :)  If buff.0 and buff.1 are
zero length but buff.2 isn't, we know that we've filled 2x1gig segments plus
a little bit, so we can allocate space accordingly? :)

I'm groping here, help me out ... :)

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Open 6.5 items

From
Bruce Momjian
Date:
> > Nothing is wrong with it.  I just thought it would be more reliable to
> > unlink it, but now I am thinking I was wrong.
> 
> Just a thought, but if you left it zero length, the dba could use it as a
> means for estimating disk space requirements? :)  If buff.0 and buff.1 are
> zero length but buff.2 isn't, we know that we've filled 2x1gig segments plus
> a little bit, so we can allocate space accordingly? :)
> 
> I'm groping here, help me out ... :)
> 

One reason to prefer truncation is that if the file is a symbolic link to
another drive, the link will stay; with unlink it will not, and the file
will be recreated on the same drive.
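
To illustrate (hypothetical paths; truncate(2) follows the link, while
unlink(2) removes the link itself):

    #include <fcntl.h>
    #include <unistd.h>

    int
    main(void)
    {
        /* assume "pgdata/table.1" is a symlink to "/otherdisk/table.1" */
        truncate("pgdata/table.1", 0);  /* empties the target; link stays */

        unlink("pgdata/table.1");       /* removes the symlink itself ... */
        open("pgdata/table.1",          /* ... so re-creation lands on    */
             O_RDWR | O_CREAT, 0600);   /*     the local drive            */
        return 0;
    }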

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


RE: [HACKERS] Open 6.5 items

From
Ole Gjerde
Date:
On Wed, 2 Jun 1999, Hiroshi Inoue wrote:
> I made a patch on trial.
> 1)  Useless segments are never removed by my implementation 
>      because I call FileTruncate() instead of File(Name)Unlink().
> I'm not sure that this patch has no problem.
> Please check and comment on my patch.

I have tried it, and it seems to work.

> As Ole Gjerde mentioned, the current implementation from his old 
> patch is not right.  His new patch seems right if vacuum is 
> executed alone.

Yes, my first patch was horribly wrong :)
The second one, as you mention, only works right if no reading or writing
is going on.

I'll talk more about the new patch in later emails.

Ole Gjerde




Re: [HACKERS] Open 6.5 items

From
Ole Gjerde
Date:
On Sun, 6 Jun 1999, Bruce Momjian wrote:

> I liked unlinking because it allowed old backends to still see the
> segments if they still have open file descriptors, while new backends can
> see there is no file there.  That seemed nice, but you clearly
> demonstrated it caused major problems.  Maybe truncation is the answer.
> I don't know, but we need to resolve this for 6.5.  I can't imagine us
> continuing to focus on this the way we have for the past few weeks.  Let's
> just figure out an answer.  I am on IRC now if someone can get on to
> discuss this.  I will even phone someone in the US or Canada to discuss it.

Personally, I think the right thing is to unlink the unused segments.  For
the most part keeping them around is not going to cause any problems, but
I can't really think of any good reasons to keep them around.  Keeping the
database directories clean is a good thing in my opinion.

> What is it in the backend that causes a backend to think there is
> another segment?  Does it just go off the end of the max segment size
> and try to open another, or do we store the number of segments
> somewhere?  I thought it was the former, in the smgr() area.  I honestly
> don't care if the segment files stay around if that is going to be a
> reliable solution.

The new patch from Hiroshi Inoue <Inoue@tpf.co.jp> works.  I believe it is
a reliable solution; I just don't agree that it's the right one.  That is
probably just a matter of opinion, however.  As his patch doesn't have any
immediate problems, I vote for it to be included in 6.5.

Thanks,
Ole Gjerde






Re: [HACKERS] Open 6.5 items

From
Bruce Momjian
Date:

Folks, do we have anything to revisit here?


> Tatsuo Ishii wrote:
> > 
> > I have just done a cvs update and saw your changes. I tried the same
> > testing as I did before (64 concurrent connections, with each
> > connection executing 100 transactions), but it failed again.
> > 
> > (1) without -B 1024, it failed: out of free buffers: time to abort!
> > 
> > (2) with -B 1024, it went into stuck spin lock
> > 
> > So I looked into sources a little bit, and made a minor change to
> > include/storage/lock.h:
> > 
> > #define INIT_TABLE_SIZE                 100
> > 
> > to:
> > 
> > #define INIT_TABLE_SIZE                 4096
> > 
> > then restarted the postmaster with -B 1024 (this will prevent the
> > out-of-free-buffers problem, I guess). Now everything seems to work
> > great!
> > 
> > I suspect that the huge INIT_TABLE_SIZE prevented dynamic expansion of
> > the hash tables, and it seems there's something wrong in the routines
> > responsible for that.
> 
> Seems like that -:(
> 
> If we don't fix the hash expansion code before the 6.5 release, then
> I would recommend not using INIT_TABLE_SIZE in
> 
>     lockMethodTable->lockHash = (HTAB *) ShmemInitHash(shmemName,
>                                          INIT_TABLE_SIZE, MAX_TABLE_SIZE,
>                                                        &info, hash_flags);
> 
> and
> 
>     lockMethodTable->xidHash = (HTAB *) ShmemInitHash(shmemName,
>                                          INIT_TABLE_SIZE, MAX_TABLE_SIZE,
>                                                       &info, hash_flags);
> 
> but use NLOCKENTS(maxBackends) instead.
> 
> Vadim
> 
> 
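
For illustration, a minimal sketch of that workaround, assuming NLOCKENTS()
and the ShmemInitHash() signature from the 6.5-era headers:

    /* Size both shared hash tables off the backend count up front,
     * so they never have to go through the broken expansion path. */
    int initSize = NLOCKENTS(maxBackends);

    lockMethodTable->lockHash = (HTAB *) ShmemInitHash(shmemName,
                                             initSize, MAX_TABLE_SIZE,
                                             &info, hash_flags);

    lockMethodTable->xidHash = (HTAB *) ShmemInitHash(shmemName,
                                             initSize, MAX_TABLE_SIZE,
                                             &info, hash_flags);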


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Open 6.5 items

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Folks, do we have anything to revisit here?

I believe it is fixed --- I went back and found some more bugs in
dynahash.c after seeing Tatsuo's report ;-)
        regards, tom lane

