Thread: strange behaviour (bug)
Hi,

I am seeing a strange error with 7.0.2: certain queries return no results. For example, a table foo is defined with a few columns, including an id_string varchar(100) column. After I fill the table, it contains, e.g., a row with 'something' in the id_string column. I issue the following query:

> select * from foo where id_string = 'something';

and get no result. With

> select * from foo where id_string like '%something';

I get the row. Strange. Then, if I try to check the result:

> select substr(id_string,1,1) from foo where id_string like '%something';

I get 's', as expected... After dumping the database and restoring it, the problem no longer appears... for a while... I cannot give an exact report, but the bug usually occurs after I stop the database and start it again.

Has anybody experienced such behaviour?

TIA,
Zoltan

Kovács, Zoltán
kovacsz@pc10.radnoti-szeged.sulinet.hu
http://www.math.u-szeged.hu/~kovzol
ftp://pc10.radnoti-szeged.sulinet.hu/home/kovacsz
Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:
> now I will get 's' as expected... Dumping the database out and bringing it
> back the problem doesn't appear anymore... for a while... I cannot give
> an exact report, but usually this bug occurs when I stop the database
> and I start it again.

Hmm. Is it possible that when you restart the postmaster, you are accidentally starting it with a different environment --- in particular, different LOCALE or LC_xxx settings --- than it had before?

If there is an index on id_string then
> select * from foo where id_string = 'something';
would try to use the index, and so could get messed up by a change in LOCALE; the index would now appear to be out of order according to the new LOCALE value.

We really ought to fix things so that all the LOCALE settings are saved by "initdb" and then re-established during postmaster start, rather than relying on the user always to start the postmaster with the same environment. People have been burnt by this before :-(

regards, tom lane
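A minimal Python sketch of the mechanism Tom describes, under stated assumptions: the two collations below are hypothetical stand-ins (plain code-point order for the "C" locale, and a made-up case-insensitive "dictionary" order), not real LC_COLLATE rules, and the "index" is just a sorted list searched by binary search. An index built under one collation and searched under another can miss a row that a sequential scan still finds. The LIKE '%something' query in the report cannot use a btree index because of the leading wildcard, so it behaves like the sequential scan here.

```python
# Toy model only: an "index" is a list of keys sorted by the collation in
# effect when it was built; lookups binary-search it with the collation in
# effect now. Both collation functions are hypothetical stand-ins.

def c_locale_key(s):
    # Code-point order, roughly like the "C" locale for ASCII text:
    # every uppercase letter sorts before every lowercase letter.
    return s


def dictionary_key(s):
    # Hypothetical "dictionary" collation: compare case-insensitively,
    # breaking ties by code point.
    return (s.lower(), s)


def build_index(keys, collate):
    return sorted(keys, key=collate)


def index_lookup(index, target, collate):
    """Binary search that trusts `index` to be sorted under `collate`."""
    t = collate(target)
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if collate(index[mid]) < t:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == target


def seq_scan(rows, predicate):
    # No index: every row is tested, so key order is irrelevant.
    return [r for r in rows if predicate(r)]


rows = ["Yak", "Zebra", "something"]
idx = build_index(rows, c_locale_key)                  # built under the old locale
print(index_lookup(idx, "something", c_locale_key))    # True: same locale
print(index_lookup(idx, "something", dictionary_key))  # False: locale changed
print(seq_scan(rows, lambda r: r.endswith("something")))  # ['something']
```

Rebuilding the index under the new collation, which is effectively what a dump and reload does, makes the lookup succeed again: index_lookup(build_index(rows, dictionary_key), "something", dictionary_key) returns True.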
> -----Original Message-----
> From: Tom Lane
>
> Kovacs Zoltan <kovacsz@pc10.radnoti-szeged.sulinet.hu> writes:
> > now I will get 's' as expected... Dumping the database out and bringing it
> > back the problem doesn't appear anymore... for a while... I cannot give
> > an exact report, but usually this bug occurs when I stop the database
> > and I start it again.
>
> Hmm. Is it possible that when you restart the postmaster, you are
> accidentally starting it with a different environment --- in particular,
> different LOCALE or LC_xxx settings --- than it had before?
>
> If there is an index on id_string then
> > select * from foo where id_string = 'something';
> would try to use the index, and so could get messed up by a change
> in LOCALE; the index would now appear to be out of order according to
> the new LOCALE value.
>

There could be another cause.

If a B-tree page A was split into a changed page A and a new page B, but the transaction was rolled back, pages A and B would not be written to disk, and the following could occur, for example:

1) The changed parent (non-leaf) page of A and B may be written to disk later.
2) An index entry may be inserted into page B and committed later.

I don't know how often these could occur.

Regards.

Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> If a B-tree page A was splitted to the page A(changed) and a page B but
> the transaction was rolled back,the pages A,B would not be written to
> disc and the followings could occur for example.

Yes. I have been thinking that it's a mistake not to write changed pages to disk at transaction abort, because that just makes for a longer window where a system crash might leave you with corrupted indexes. I don't think fsync is really essential, but leaving the pages unwritten in shared memory is bad. (For example, if we next shut down the postmaster, then the pages will NEVER get written.)

Skipping the update is a bit silly anyway; we aren't really that concerned about optimizing performance of abort, are we?

regards, tom lane
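The failure window Hiroshi and Tom describe can be modelled in a few lines of Python. This is a hypothetical toy, not PostgreSQL's bufmgr: pages are strings held in two dicts (shared buffers and disk), commit writes the pages the transaction dirtied, shutdown discards the buffers without flushing, and the write_on_abort flag switches between the behaviour described above (abort leaves changed pages unwritten) and Tom's proposal (write them at abort too).

```python
class ToyBufferPool:
    """Hypothetical model of shared buffers; not PostgreSQL's bufmgr."""

    def __init__(self, write_on_abort):
        self.disk = {}        # page id -> contents as last written to disk
        self.buffers = {}     # page id -> contents in shared memory
        self.dirtied = set()  # pages changed by the current transaction
        self.write_on_abort = write_on_abort

    def modify(self, page, contents):
        self.buffers[page] = contents
        self.dirtied.add(page)

    def _flush(self, pages):
        for page in pages:
            self.disk[page] = self.buffers[page]

    def commit(self):
        self._flush(self.dirtied)
        self.dirtied = set()

    def abort(self):
        # In this model the split is not undone in memory, so the changed
        # pages stay in the buffers; the only question is whether they are
        # written out now.
        if self.write_on_abort:
            self._flush(self.dirtied)
        self.dirtied = set()

    def shutdown(self):
        # Shared memory goes away; whatever never reached disk is lost.
        self.buffers = {}


def split_then_abort(write_on_abort):
    pool = ToyBufferPool(write_on_abort)
    # Transaction 1 splits leaf A into A and B, adds a downlink for B to
    # the parent P, and then aborts.
    pool.modify("A", "left half")
    pool.modify("B", "right half")
    pool.modify("P", "downlinks: A, B")
    pool.abort()
    # Transaction 2 changes only the parent and commits, so P reaches disk.
    pool.modify("P", "downlinks: A, B (+ change by transaction 2)")
    pool.commit()
    pool.shutdown()
    return pool.disk


print(sorted(split_then_abort(write_on_abort=False)))  # ['P']
print(sorted(split_then_abort(write_on_abort=True)))   # ['A', 'B', 'P']
```

In the first run the parent on disk points at a page B that was never written, which is case 1) in Hiroshi's list; with write_on_abort=True all three pages reach disk before the shutdown.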
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > If a B-tree page A was splitted to the page A(changed) and a page B but
> > the transaction was rolled back,the pages A,B would not be written to
> > disc and the followings could occur for example.
>
> Yes. I have been thinking that it's a mistake not to write changed
> pages to disk at transaction abort, because that just makes for a longer
> window where a system crash might leave you with corrupted indexes.
> I don't think fsync is really essential, but leaving the pages unwritten
> in shared memory is bad. (For example, if we next shut down the
> postmaster, then the pages will NEVER get written.)
>
> Skipping the update is a bit silly anyway; we aren't really that
> concerned about optimizing performance of abort, are we?
>

WAL would probably solve this problem in practice, by rolling back the contents of both the disk and the shared buffers. However, if another 7.0.x release is planned, we had better change bufmgr, IMHO.

Regards.

Hiroshi Inoue
> Probably WAL would solve this phenomenon by rolling
> back the content of disc and shared buffer in reality.
> However if 7.0.x would be released we had better change
> bufmgr IMHO.

I'm going to handle btree splits, but currently there is no way to roll one back: we unlock the split pages after the parent is locked, and a concurrent backend may update one or both siblings before we get our locks back. We have to either continue with the split, or leave the parent unchanged and handle "my bits moved..." (i.e., continue the split in another xaction if we find no parent for a page) ... or we could hold locks on all split pages until some parent is updated without a split, but I wouldn't do this.

Vadim
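Why the split cannot simply be rolled back can be illustrated with a small Python sketch. It is a hypothetical model, with leaf pages as sorted lists of integer keys and made-up helper names: once the split pages are unlocked while the parent is being locked, another backend may insert into the new right sibling, so restoring the pre-split image of the page would silently drop that committed key.

```python
import bisect


def split_leaf(page):
    """Split a sorted leaf into (left, right) halves."""
    mid = len(page) // 2
    return page[:mid], page[mid:]


def naive_rollback(before_image):
    # Undo by restoring the pre-split page and forgetting the new sibling.
    return list(before_image)


def keys_lost_by_rollback(before_image, left, right):
    restored = naive_rollback(before_image)
    return [k for k in left + right if k not in restored]


before = [10, 20, 30, 40]
left, right = split_leaf(before)     # [10, 20] and [30, 40]

# While our locks on the split pages are released (we are busy locking the
# parent), a concurrent backend inserts 35 into the right sibling and commits.
bisect.insort(right, 35)

# If our transaction now aborts and "undoes" the split, key 35 vanishes.
print(keys_lost_by_rollback(before, left, right))  # [35]
```

Holding the locks on both halves until the parent is updated would close this window; that is the third option above, which Vadim would rather avoid.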
> -----Original Message-----
> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]
>
> > Probably WAL would solve this phenomenon by rolling
> > back the content of disc and shared buffer in reality.
> > However if 7.0.x would be released we had better change
> > bufmgr IMHO.
>
> I'm going to handle btree split but currently there is no way
> to rollback it - we unlock splitted pages after parent
> is locked and concurrent backend may update one/both of
> siblings before we get our locks back.
> We have to continue with split or could leave parent unchanged
> and handle "my bits moved..." (ie continue split in another
> xaction if we found no parent for a page) ... or we could hold
> locks on all splitted pages till some parent updated without
> split, but I wouldn't do this.
>

It seems to me that btree split operations must always be rolled forward, even in case of abort/crash. Do you have other ideas?

Regards.

Hiroshi Inoue
> > I'm going to handle btree split but currently there is no way
> > to rollback it - we unlock splitted pages after parent
> > is locked and concurrent backend may update one/both of
> > siblings before we get our locks back.
> > We have to continue with split or could leave parent unchanged
> > and handle "my bits moved..." (ie continue split in another
> > xaction if we found no parent for a page) ... or we could hold
> > locks on all splitted pages till some parent updated without
> > split, but I wouldn't do this.
> >
>
> It seems to me that btree split operations must always be
> rolled forward even in case of abort/crash. DO you have
> other ideas ?

Yes, it should, but that is hard to implement, especially for the abort case. So, for the moment, I would proceed with handling "my bits moved...": there is no reason to elog(FATAL) here, since we can try to insert the missing pointers into the parent page(s). WAL will guarantee that btitems moved to the right sibling will not be lost (level consistency), and missing some pointers at the parent level is acceptable: scans will still work.

Vadim
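Vadim's point that missing parent pointers are acceptable relies on leaf pages being chained by right-links with high keys, in the spirit of the Lehman-Yao design the PostgreSQL btree code follows. Below is a minimal Python sketch under those assumptions: integer keys, a single parent level, and hypothetical names such as insert_missing_downlink for the repair step. After a split whose parent update was skipped, a search descends to the old page, sees that the key lies above that page's high key, and moves right; the repair later adds the missing pointer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Leaf:
    keys: List[int]
    high_key: Optional[int]         # keys here are <= high_key; None = no bound
    right: Optional["Leaf"] = None  # right-link to the next sibling


# Parent level: sorted (separator, child) pairs; child holds keys greater
# than separator, and None stands for minus infinity.
Parent = List[Tuple[Optional[int], Leaf]]


def descend(parent: Parent, target: int) -> Leaf:
    chosen = parent[0][1]
    for sep, child in parent[1:]:
        if target > sep:
            chosen = child
    return chosen


def search(parent: Parent, target: int) -> bool:
    page = descend(parent, target)
    # Target above this page's high key: a split moved it right, so follow
    # the right-link instead of giving up.
    while page.high_key is not None and target > page.high_key:
        page = page.right
    return target in page.keys


def split(leaf: Leaf, at: int) -> Leaf:
    """Move leaf.keys[at:] to a new right sibling; the parent is NOT touched."""
    new = Leaf(keys=leaf.keys[at:], high_key=leaf.high_key, right=leaf.right)
    leaf.keys = leaf.keys[:at]
    leaf.high_key = leaf.keys[-1]
    leaf.right = new
    return new


def insert_missing_downlink(parent: Parent, left: Leaf) -> None:
    """Hypothetical repair: add the parent pointer the split never wrote."""
    right = left.right
    if right is None or any(child is right for _, child in parent):
        return
    i = next(i for i, (_, child) in enumerate(parent) if child is left)
    parent.insert(i + 1, (left.high_key, right))


a = Leaf(keys=[10, 20, 30, 40], high_key=None)
parent: Parent = [(None, a)]
b = split(a, 2)                 # split done, parent update "lost"
print([search(parent, k) for k in (10, 20, 30, 40)])  # [True, True, True, True]
insert_missing_downlink(parent, a)
print(descend(parent, 30) is b)                       # True
```

Before the repair, every search for 30 or 40 pays one extra right-link step; after it, the parent routes those keys to the new page directly.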
> -----Original Message-----
> From: Mikheev, Vadim [mailto:vmikheev@SECTORBASE.COM]
>
> > > I'm going to handle btree split but currently there is no way
> > > to rollback it - we unlock splitted pages after parent
> > > is locked and concurrent backend may update one/both of
> > > siblings before we get our locks back.
> > > We have to continue with split or could leave parent unchanged
> > > and handle "my bits moved..." (ie continue split in another
> > > xaction if we found no parent for a page) ... or we could hold
> > > locks on all splitted pages till some parent updated without
> > > split, but I wouldn't do this.
> > >
> >
> > It seems to me that btree split operations must always be
> > rolled forward even in case of abort/crash. DO you have
> > other ideas ?
>
> Yes, it should, but hard to implement, especially for abort case.
> So, for the moment, I would proceed with handling "my bits moved...":
> no reason to elog(FATAL) here - we can try to insert missed pointers
> into parent page(s). WAL will guarantee that btitems moved to right
> sibling will not be lost (level consistency), and missing some pointers
> in parent level is acceptable - scans will work.
>

I looked into your XLOG stuff a little. It seems that XLogFileOpen() isn't implemented yet. Would/should XLogFileOpen() guarantee to open a Relation properly at any time?

Regards.

Hiroshi Inoue
> I looked into your XLOG stuff a little.
> It seems that XLogFileOpen() isn't implemented yet.
> Would/should XLogFIleOpen() guarantee to open a Relation
> properly at any time ?

If each relation has a unique file name, there will be no problem. If a relation was dropped, then after a crash, redo will try to open a file that probably no longer exists. XLogFileOpen will return NULL in this case (redo will do nothing) and remember this fact (i.e., "file deletion is expected").

Vadim
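Vadim's plan for XLogFileOpen() can be sketched in Python under loud assumptions. This is a hypothetical stand-in, not the real XLogFileOpen() (which, per the thread, was not implemented yet); each relation is assumed to live in its own uniquely named file, and redo_write, expected_deletions, and the file names are made up for illustration. A missing file yields None, redo skips the record, and the path is remembered as "file deletion is expected".

```python
import os
import tempfile

# Paths redo found missing; hypothetical bookkeeping for "file deletion
# is expected".
expected_deletions = set()


def xlog_file_open(path):
    """Hypothetical stand-in for XLogFileOpen(): return an open file, or
    None if the relation's file no longer exists (e.g. it was dropped)."""
    try:
        return open(path, "r+b")
    except FileNotFoundError:
        expected_deletions.add(path)
        return None


def redo_write(path, offset, data):
    """Replay one logged write; returns False if there was nothing to do."""
    f = xlog_file_open(path)
    if f is None:
        return False  # relation was dropped: redo does nothing
    with f:
        f.seek(offset)
        f.write(data)
    return True


workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "rel_live")        # hypothetical relation file
dropped = os.path.join(workdir, "rel_dropped")  # never created: "dropped"
with open(live, "wb") as f:
    f.write(b"\0" * 8)

print(redo_write(live, 2, b"ab"))     # True
print(redo_write(dropped, 0, b"x"))   # False
print(dropped in expected_deletions)  # True
```

Recording the missing path instead of raising an error is what lets redo continue past records for dropped relations; how the remembered paths would be checked afterwards is left open here.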