Thread: strange behaviour (bug)

strange behaviour (bug)

From
Kovacs Zoltan
Date:
Hi,

I am experiencing a strange error with 7.0.2: certain queries return no
results. For example, a table foo is defined with a few columns,
including an

id_string varchar(100)

column. After I fill the table, it contains e.g. a row with
'something' in the column id_string. I then run the following query:

> select * from foo where id_string = 'something';

I get no result.

> select * from foo where id_string like '%something';

I get the row. Strange. Then, if I try to check the result:

> select substr(id_string,1,1) from foo where id_string like '%something';

I get 's' as expected. If I dump the database and restore it, the
problem disappears, for a while at least. I cannot give an exact
report, but the bug usually shows up after I stop the database and
start it again.

Has anybody else seen such behaviour?

TIA, Zoltan
Kovács, Zoltán
kovacsz@pc10.radnoti-szeged.sulinet.hu
http://www.math.u-szeged.hu/~kovzol
ftp://pc10.radnoti-szeged.sulinet.hu/home/kovacsz
 



Re: strange behaviour (bug)

From
Tom Lane
Date:
Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:
> I get 's' as expected. If I dump the database and restore it, the
> problem disappears, for a while at least. I cannot give an exact
> report, but the bug usually shows up after I stop the database and
> start it again.

Hmm.  Is it possible that when you restart the postmaster, you are
accidentally starting it with a different environment --- in particular,
different LOCALE or LC_xxx settings --- than it had before?

If there is an index on id_string then
> select * from foo where id_string = 'something';
would try to use the index, and so could get messed up by a change
in LOCALE; the index would now appear to be out of order according to
the new LOCALE value.

We really ought to fix things so that all the LOCALE settings are saved
by "initdb" and then re-established during postmaster start, rather than
relying on the user always to start the postmaster with the same
environment.  People have been burnt by this before :-(
        regards, tom lane
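
The index problem described above can be illustrated with a minimal
Python sketch. This is not PostgreSQL code: c_locale_key and
dictionary_key are made-up orderings standing in for two different
LOCALE settings, and a sorted list stands in for the index on
id_string. A binary search that assumes the wrong ordering misses a
value that a sequential scan still finds.

```python
import bisect

# Two hypothetical collations standing in for two LOCALE settings.
def c_locale_key(s):
    # Byte-wise ordering: uppercase letters sort before lowercase ones.
    return s.encode("ascii")

def dictionary_key(s):
    # Dictionary-like ordering: case-insensitive, ties broken byte-wise.
    return (s.lower(), s)

rows = ["Tuesday", "Zebra", "apple", "something", "Xylophone"]

# The "index" is built while the server runs with the byte-wise ordering.
index = sorted(rows, key=c_locale_key)

def index_lookup(index, value, key):
    """Binary search that assumes `index` is sorted according to `key`."""
    keys = [key(v) for v in index]
    pos = bisect.bisect_left(keys, key(value))
    return pos < len(index) and index[pos] == value

def seq_scan(rows, value):
    """Look at every row; no assumption about ordering."""
    return value in rows

# Same ordering as at index build time: the lookup works.
found_same = index_lookup(index, "something", c_locale_key)
# Ordering changed after a restart: the search goes to the wrong place.
found_changed = index_lookup(index, "something", dictionary_key)
# A scan that ignores the index still sees the row.
found_scan = seq_scan(rows, "something")

print(found_same, found_changed, found_scan)
```

In this toy model the index itself is intact; only the comparison
function used to search it has changed. That would be consistent with
the "=" query failing while the LIKE '%something' query, which cannot
use the index, still returns the row.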


RE: strange behaviour (bug)

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane
> 
> Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:
> > I get 's' as expected. If I dump the database and restore it, the
> > problem disappears, for a while at least. I cannot give an exact
> > report, but the bug usually shows up after I stop the database and
> > start it again.
> 
> Hmm.  Is it possible that when you restart the postmaster, you are
> accidentally starting it with a different environment --- in particular,
> different LOCALE or LC_xxx settings --- than it had before?
> 
> If there is an index on id_string then
> > select * from foo where id_string = 'something';
> would try to use the index, and so could get messed up by a change
> in LOCALE; the index would now appear to be out of order according to
> the new LOCALE value.
>

There could be another cause.
If a B-tree page A was split into page A (changed) and a new page B but
the transaction was rolled back, pages A and B would not be written to
disk, and the following could then occur, for example:
1)  The changed non-leaf page of A and B may be written to disk later.
2)  An index entry may be inserted into page B and committed later.

I don't know how often these could occur.

Regards.

Hiroshi Inoue


Re: strange behaviour (bug)

From
Tom Lane
Date:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> If a B-tree page A was split into page A (changed) and a new page B but
> the transaction was rolled back, pages A and B would not be written to
> disk, and the following could then occur, for example:

Yes.  I have been thinking that it's a mistake not to write changed
pages to disk at transaction abort, because that just makes for a longer
window where a system crash might leave you with corrupted indexes.
I don't think fsync is really essential, but leaving the pages unwritten
in shared memory is bad.  (For example, if we next shut down the
postmaster, then the pages will NEVER get written.)

Skipping the update is a bit silly anyway; we aren't really that
concerned about optimizing performance of abort, are we?
        regards, tom lane
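
The failure mode discussed above can be modelled with a small Python
sketch. Everything in it is an assumption made for illustration
(ToyBufferManager, the page names, the flush_on_abort switch); it is
not the real bufmgr and it ignores fsync, locking and the aborted
transaction's own index entry. It only shows what is left on disk when
an aborted split leaves pages A and B unwritten while the parent page
is written later.

```python
class ToyBufferManager:
    """Toy model: `disk` survives a shutdown, `buffers` do not."""

    def __init__(self, disk, flush_on_abort):
        self.disk = disk              # page name -> page contents
        self.buffers = {}             # modified pages not yet written
        self.flush_on_abort = flush_on_abort

    def write(self, name, contents):
        self.buffers[name] = contents  # change in shared memory only

    def flush(self, names=None):
        targets = list(self.buffers) if names is None else names
        for name in targets:
            if name in self.buffers:
                self.disk[name] = self.buffers.pop(name)

    def abort(self):
        # The split itself is not undone; the only question is whether
        # the changed pages reach disk.
        if self.flush_on_abort:
            self.flush()

    def shutdown(self):
        self.buffers.clear()          # unwritten pages are lost for good


def index_lookup(disk, key):
    # Follow the last parent entry whose low key is <= the search key.
    child = None
    for low_key, page in disk["root"]:
        if low_key <= key:
            child = page
    return key in disk.get(child, [])


def run_scenario(flush_on_abort):
    disk = {"root": [("", "A")],
            "A": ["apple", "mango", "something", "zebra"]}
    bufmgr = ToyBufferManager(disk, flush_on_abort)
    # Transaction 1 splits A into A (changed) and B, then aborts.
    bufmgr.write("A", ["apple", "mango"])
    bufmgr.write("B", ["something", "zebra"])
    bufmgr.write("root", [("", "A"), ("something", "B")])
    bufmgr.abort()
    # Later, only the changed parent page happens to be written.
    bufmgr.flush(["root"])
    bufmgr.shutdown()
    return index_lookup(disk, "something")


found_without_flush = run_scenario(flush_on_abort=False)
found_with_flush = run_scenario(flush_on_abort=True)
print(found_without_flush, found_with_flush)
```

Without the write at abort, the on-disk parent points to a page B that
was never written, so the index lookup for 'something' comes back
empty even though the key is still on the old copy of page A.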


RE: strange behaviour (bug)

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > If a B-tree page A was split into page A (changed) and a new page B but
> > the transaction was rolled back, pages A and B would not be written to
> > disk, and the following could then occur, for example:
> 
> Yes.  I have been thinking that it's a mistake not to write changed
> pages to disk at transaction abort, because that just makes for a longer
> window where a system crash might leave you with corrupted indexes.
> I don't think fsync is really essential, but leaving the pages unwritten
> in shared memory is bad.  (For example, if we next shut down the
> postmaster, then the pages will NEVER get written.)
> 
> Skipping the update is a bit silly anyway; we aren't really that
> concerned about optimizing performance of abort, are we?
>

WAL would probably solve this problem by really rolling back the
contents of both disk and shared buffers.
However, if another 7.0.x is released, we had better change
bufmgr, IMHO.

Regards.

Hiroshi Inoue


RE: strange behaviour (bug)

From
"Mikheev, Vadim"
Date:
> WAL would probably solve this problem by really rolling back the
> contents of both disk and shared buffers.
> However, if another 7.0.x is released, we had better change
> bufmgr, IMHO.

I'm going to handle btree splits, but currently there is no way
to roll one back: we unlock the split pages after the parent
is locked, and a concurrent backend may update one or both of the
siblings before we get our locks back.
We either have to continue with the split, or could leave the parent
unchanged and handle "my bits moved..." (i.e. continue the split in
another xaction if we find no parent for a page) ... or we could hold
locks on all split pages until some parent is updated without a
split, but I wouldn't do this.

Vadim



RE: strange behaviour (bug)

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]
> 
> > WAL would probably solve this problem by really rolling back the
> > contents of both disk and shared buffers.
> > However, if another 7.0.x is released, we had better change
> > bufmgr, IMHO.
>
> I'm going to handle btree splits, but currently there is no way
> to roll one back: we unlock the split pages after the parent
> is locked, and a concurrent backend may update one or both of the
> siblings before we get our locks back.
> We either have to continue with the split, or could leave the parent
> unchanged and handle "my bits moved..." (i.e. continue the split in
> another xaction if we find no parent for a page) ... or we could hold
> locks on all split pages until some parent is updated without a
> split, but I wouldn't do this.
>

It seems to me that btree split operations must always be
rolled forward, even in case of abort or crash. Do you have
other ideas?

Regards.

Hiroshi Inoue


RE: strange behaviour (bug)

From
"Mikheev, Vadim"
Date:
> > I'm going to handle btree splits, but currently there is no way
> > to roll one back: we unlock the split pages after the parent
> > is locked, and a concurrent backend may update one or both of the
> > siblings before we get our locks back.
> > We either have to continue with the split, or could leave the parent
> > unchanged and handle "my bits moved..." (i.e. continue the split in
> > another xaction if we find no parent for a page) ... or we could hold
> > locks on all split pages until some parent is updated without a
> > split, but I wouldn't do this.
> >
>
> It seems to me that btree split operations must always be
> rolled forward, even in case of abort or crash. Do you have
> other ideas?

Yes, they should, but that is hard to implement, especially for the
abort case. So, for the moment, I would proceed with handling
"my bits moved...": there is no reason to elog(FATAL) here - we can try
to insert the missing pointers into the parent page(s). WAL will
guarantee that btitems moved to the right sibling will not be lost
(level consistency), and missing some pointers in the parent level is
acceptable - scans will still work.

Vadim
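
A rough Python sketch of the idea above, under stated assumptions: it
models leaf pages with a high key and a right-sibling link, so that a
search which lands on a page that has been split can move right and
then add the missing parent pointer. The names Leaf, Parent, descend
and search are invented for this example; it is not the nbtree code and
it ignores locking and WAL entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[str]
    high_key: Optional[str] = None   # keys >= high_key live further right
    right: Optional["Leaf"] = None   # link to the right sibling


@dataclass
class Parent:
    # (low_key, child) pairs, kept sorted by low_key.
    entries: list = field(default_factory=list)


def descend(parent, key):
    # Follow the last entry whose low key is <= the search key.
    child = parent.entries[0][1]
    for low_key, page in parent.entries:
        if low_key <= key:
            child = page
    return child


def search(parent, key, repair=True):
    page = descend(parent, key)
    # "My bits moved": the key lies beyond this page's high key, so the
    # page was split and the parent does not know about it yet.
    while page.high_key is not None and key >= page.high_key:
        low_key_of_sibling = page.high_key
        page = page.right
        known = any(child is page for _, child in parent.entries)
        if repair and not known:
            # Insert the missing pointer instead of raising an error.
            parent.entries.append((low_key_of_sibling, page))
            parent.entries.sort(key=lambda entry: entry[0])
    return key in page.keys


# State after an interrupted split: the keys moved to `right`, but the
# parent still has a pointer to `left` only.
right = Leaf(keys=["something", "zebra"])
left = Leaf(keys=["apple", "mango"], high_key="something", right=right)
parent = Parent(entries=[("", left)])

found = search(parent, "something")
pointer_count = len(parent.entries)
print(found, pointer_count)
```

The search succeeds even though the parent pointer was missing, because
the sibling link keeps the leaf level consistent; the repair step only
saves later searches the extra hop.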


RE: strange behaviour (bug)

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]
> 
> > > I'm going to handle btree splits, but currently there is no way
> > > to roll one back: we unlock the split pages after the parent
> > > is locked, and a concurrent backend may update one or both of the
> > > siblings before we get our locks back.
> > > We either have to continue with the split, or could leave the parent
> > > unchanged and handle "my bits moved..." (i.e. continue the split in
> > > another xaction if we find no parent for a page) ... or we could hold
> > > locks on all split pages until some parent is updated without a
> > > split, but I wouldn't do this.
> > >
> >
> > It seems to me that btree split operations must always be
> > rolled forward, even in case of abort or crash. Do you have
> > other ideas?
>
> Yes, they should, but that is hard to implement, especially for the
> abort case. So, for the moment, I would proceed with handling
> "my bits moved...": there is no reason to elog(FATAL) here - we can try
> to insert the missing pointers into the parent page(s). WAL will
> guarantee that btitems moved to the right sibling will not be lost
> (level consistency), and missing some pointers in the parent level is
> acceptable - scans will still work.
>

I looked into your XLOG stuff a little.
It seems that XLogFileOpen() isn't implemented yet.
Would/should XLogFileOpen() guarantee to open a Relation
properly at any time?

Regards.

Hiroshi Inoue


RE: strange behaviour (bug)

From
"Mikheev, Vadim"
Date:
> I looked into your XLOG stuff a little.
> It seems that XLogFileOpen() isn't implemented yet.
> Would/should XLogFileOpen() guarantee to open a Relation
> properly at any time?

If each relation has a unique file name, then there will be no
problem. If a relation was dropped, then after a crash redo will try
to open a file that probably no longer exists. XLogFileOpen will
return NULL in this case (redo will do nothing) and remember this fact
(i.e. "file deletion is expected").

Vadim
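
The redo behaviour described above could look roughly like the
following Python sketch. It is only a model under assumptions:
xlog_file_open is a hypothetical stand-in for the not yet implemented
XLogFileOpen(), the (path, offset, data) record layout is invented, and
None plays the role of NULL.

```python
import os
import tempfile

# Relation files that redo could not open ("file deletion is expected").
expected_missing = set()


def xlog_file_open(path):
    """Hypothetical stand-in for XLogFileOpen(): None if the file is gone."""
    if not os.path.exists(path):
        expected_missing.add(path)
        return None
    return open(path, "r+b")


def redo(records):
    """Re-apply (path, offset, data) records; skip files that are missing."""
    applied = 0
    for path, offset, data in records:
        f = xlog_file_open(path)
        if f is None:
            continue          # relation was dropped: redo does nothing
        with f:
            f.seek(offset)
            f.write(data)
        applied += 1
    return applied


with tempfile.TemporaryDirectory() as tmp:
    kept = os.path.join(tmp, "rel_kept")
    dropped = os.path.join(tmp, "rel_dropped")   # never created here
    with open(kept, "wb") as f:
        f.write(b"\0" * 8)

    records = [(kept, 0, b"AB"), (dropped, 0, b"CD"), (kept, 2, b"EF")]
    applied = redo(records)

    with open(kept, "rb") as f:
        contents = f.read()
    missing_names = {os.path.basename(p) for p in expected_missing}

print(applied, contents, missing_names)
```

Recording the missing file rather than failing lets redo continue; a
fuller version would presumably check later that the log really does
contain the matching drop, which this sketch does not attempt.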