Re: 8.2beta1 (w32): server process crash (tsvector) - Mailing list pgsql-bugs
From | Magnus Hagander |
---|---|
Subject | Re: 8.2beta1 (w32): server process crash (tsvector) |
Date | |
Msg-id | 6BCB9D8A16AC4241919521715F4D8BCEA3591E@algol.sollentuna.se Whole thread Raw |
In response to | Re: 8.2beta1 (w32): server process crash (tsvector) ("Magnus Hagander" <mha@sollentuna.net>) |
List | pgsql-bugs |
Just wanted to let you know that working with Teodor I beleive we have now found the cause for this bug, and I have a workaround that passes your tests. I'm unsure if it's safe on !=3D win32, so I'm deferring to Teodor to come up with the final fix. It's also possible that I spotted another problem with taht code that might require slightly more code than a one-liner to fix. We'll know in a while when Teodor has analyzed it completely. //Magnus > -----Original Message----- > From: pgsql-bugs-owner@postgresql.org=20 > [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Magnus Hagander > Sent: den 13 november 2006 10:36 > To: Thomas H.; pgsql-bugs@postgresql.org > Cc: Tom Lane > Subject: Re: [BUGS] 8.2beta1 (w32): server process crash (tsvector) >=20 > Did yet some more stuff, mainly in line of shotgun-debugging :-) >=20 > If I add a "return NULL" right before: > res =3D (TSLexeme *) > DatumGetPointer(FunctionCall4( >=20=09 > &(dict->lexize_info), >=20=09 > PointerGetDatum(dict->dictionary), >=20=09 > PointerGetDatum(curVal->lemm), >=20=09 > Int32GetDatum(curVal->lenlemm), >=20=09 > PointerGetDatum(&ld->dictState) >=20=09 > )); > in LexizeExec(), I get no crash and NULL values for all tsvectors. > If I add it right *after* that line, I get a crash. But I=20 > don't get the warnings about overwriting. >=20 > Not entirely sure how much that helps. Don't have time to dig=20 > in more right htis minute, but I'd definitly appreciate some=20 > pointers into what to look at next. >=20 > //Magnus >=20 > > -----Original Message----- > > From: pgsql-bugs-owner@postgresql.org=20 > > [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of=20 > Magnus Hagander > > Sent: den 13 november 2006 10:01 > > To: Thomas H.; pgsql-bugs@postgresql.org > > Cc: Tom Lane > > Subject: Re: [BUGS] 8.2beta1 (w32): server process crash (tsvector) > >=20 > > Did some more tests, and it's even easier to repro this. No need to=20 > > update, just SELECT to_tsvector('default',sometext) from test; > >=20 > > It appears to be data dependent still, though, because just running=20 > > SELECT to_tsvector('default','foo bar foo bar foo bar foo bar') > > fromgenerate_series(1,1000000) works withuot a hitch. > >=20 > >=20 > > Also, the "detected write past chunk end" comes from the=20 > > pfree() in tsvector.c, function unqiueWORD() line 656. At=20 > > this point, "ptr->word" > > points at the text "temp" followed by a bunch of junk. The=20 > > callstack leading up to this is: > > postgres.exe!AllocSetFree(MemoryContextData * > > context=3D0x02c37bd8, void * pointer=3D0x02c65ed0) Line 775 C > > postgres.exe!pfree(void * pointer=3D0x02c65ed0) Line 585 + 0x17 > > bytes C > > tsearch2.dll!uniqueWORD(TSWORD * a=3D0x02c66e60, int l=3D517) Line > > 656 + 0xc bytes C > > tsearch2.dll!makevalue(PRSTEXT * prs=3D0x00d4f44c) Line 692 + > > 0x12 bytes C > > tsearch2.dll!to_tsvector(FunctionCallInfoData * > > fcinfo=3D0x00d4f46c) Line 764 + 0x9 bytes C > > postgres.exe!DirectFunctionCall3(unsigned long=20 > > (FunctionCallInfoData *)* func=3D0x02cd1131, unsigned long=20 > > arg1=3D16473, unsigned long arg2=3D49941724, unsigned long=20 > > arg3=3D0) Line 905 + 0xa bytes C > > tsearch2.dll!to_tsvector_name(FunctionCallInfoData * > > fcinfo=3D0x00d4f6b0) Line 787 + 0x20 bytes C > >=20 > >=20 > > Now, the parametsrs to to_tsvector() looks weird - the "text=20 > > *in =3D PG_GETARG_TEXT_P(1)" appears to return a struct with=20 > > vl_len set t 2139062143, which seems. Eh. Wrong? :) But this=20 > > does not happen on the > > *first* call to to_tsquery() which shows a proper text, but=20 > > subsequent calls show what lokos like garbage a couple of=20 > > times, and then the WARNING about the write past end fires. > >=20 > > //Magnus=20 > >=20 > > > -----Original Message----- > > > From: pgsql-bugs-owner@postgresql.org=20 > > > [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of=20 > > Magnus Hagander > > > Sent: den 12 november 2006 17:06 > > > To: Thomas H.; pgsql-bugs@postgresql.org > > > Cc: Tom Lane > > > Subject: Re: [BUGS] 8.2beta1 (w32): server process crash=20 > (tsvector) > > >=20 > > > Ok, I've run this test on an assert enabled build (my msvc build,=20 > > > actually, so I could get a debugger on it if needed). > > > It then outputs: > > >=20 > > > WARNING: detected write past chunk end in ExprContext 02C6D0F8 > > > WARNING: detected write past chunk end in ExprContext 02C6AEA0 > > > WARNING: detected write past chunk end in ExprContext 02C6B200 > > > WARNING: detected write past chunk end in ExprContext 02C44630 > > > WARNING: detected write past chunk end in ExprContext 02C4C118 > > > WARNING: problem in alloc set ExprContext: bogus aset link=20 > > in block=20 > > > 02C435A8, c hunk 02C44520 > > > WARNING: detected write past chunk end in ExprContext 02C66440 > > > WARNING: detected write past chunk end in ExprContext 02C3B9D0 > > > WARNING: detected write past chunk end in ExprContext 02C3BDE8 > > > WARNING: detected write past chunk end in ExprContext 02C4E7E0 > > > WARNING: detected write past chunk end in ExprContext 02C47508 > > > WARNING: problem in alloc set ExprContext: bogus aset link=20 > > in block=20 > > > 02C435A8, c hunk 02C47528 > > > WARNING: detected write past chunk end in ExprContext 02C43800 > > > WARNING: detected write past chunk end in ExprContext 02C66C90 > > > WARNING: detected write past chunk end in ExprContext 02C68270 > > > WARNING: detected write past chunk end in ExprContext 02C4F5D8 > > > WARNING: problem in alloc set ExprContext: bogus aset link=20 > > in block=20 > > > 02C4E6F8, c hunk 02C4F5F8 > > > WARNING: detected write past chunk end in ExprContext 02C7B680 > > > WARNING: detected write past chunk end in ExprContext 02C45190 > > > WARNING: detected write past chunk end in ExprContext 02C46AC8 > > > WARNING: detected write past chunk end in ExprContext 02C3C538 > > > WARNING: detected write past chunk end in ExprContext 02C67B90 > > > WARNING: detected write past chunk end in ExprContext 02C438F0 > > > WARNING: problem in alloc set ExprContext: bad single-chunk > > > 02C43FB8 in block 0 > > > 2C435A8 > > > WARNING: problem in alloc set ExprContext: bogus aset link=20 > > in block=20 > > > 02C435A8, c hunk 02C43FB8 server closed the connection=20 > unexpectedly > > > This probably means the server terminated abnormally > > > before or while processing the request.=20 > > >=20 > > >=20 > > >=20 > > >=20 > > > The actual crash happens on line 1142 of AllocSetCheck.=20 > > Full callstack > > > is: > > > > postgres.exe!AllocSetCheck(MemoryContextData * > > > context=3D0x02c455b8) Line 1142 + 0x11 bytes C > > > postgres.exe!AllocSetReset(MemoryContextData * > > > context=3D0x02c455b8) Line 409 + 0x9 bytes C > > > postgres.exe!MemoryContextReset(MemoryContextData * > > > context=3D0x02c455b8) Line 129 + 0xf bytes C > > > postgres.exe!ExecScan(ScanState * node=3D0x02c227d0,=20 > > TupleTableSlot *=20 > > > (ScanState *)* accessMtd=3D0x00530b70) Line 91 + 0xc > > > bytes C > > > postgres.exe!ExecSeqScan(ScanState * node=3D0x02c227d0) Line 130 > > > + 0xe bytes C > > > postgres.exe!ExecProcNode(PlanState * node=3D0x02c227d0) Line 349 > > > + 0x9 bytes C > > > postgres.exe!ExecutePlan(EState * estate=3D0x02c223e8,=20 > > PlanState *=20 > > > planstate=3D0x02c227d0, CmdType operation=3DCMD_UPDATE, long=20 > > > numberTuples=3D0, ScanDirection direction=3DForwardScanDirection,=20 > > > _DestReceiver * > > > dest=3D0x02bf8d30) Line 1081 + 0x9 bytes C > > > postgres.exe!ExecutorRun(QueryDesc * queryDesc=3D0x02c44078,=20 > > > ScanDirection direction=3DForwardScanDirection, long count=3D0)=20 > > Line 246=20 > > > + > > > 0x20 bytes C > > > postgres.exe!ProcessQuery(Query * parsetree=3D0x02bddab0, Plan *=20 > > > plan=3D0x02bf7e28, ParamListInfoData * params=3D0x00000000,=20 > > _DestReceiver=20 > > > * dest=3D0x02bf8d30, char * > > > completionTag=3D0x00d4fb24) Line 157 + 0xd bytes C > > > postgres.exe!PortalRunMulti(PortalData * portal=3D0x02c48138,=20 > > > _DestReceiver * dest=3D0x02bf8d30, _DestReceiver *=20 > altdest=3D0x02bf8d30, > > > char * completionTag=3D0x00d4fb24) Line 1148 + 0x1c bytes C > > > postgres.exe!PortalRun(PortalData * portal=3D0x02c48138, long=20 > > > count=3D2147483647, _DestReceiver * dest=3D0x02bf8d30,=20 > _DestReceiver *=20 > > > altdest=3D0x02bf8d30, char * > > > completionTag=3D0x00d4fb24) Line 700 + 0x15 > > > bytes C > > > postgres.exe!exec_simple_query(const char * > > > query_string=3D0x02bdd4c8) Line 943 + 0x23 bytes C > > > postgres.exe!PostgresMain(int argc=3D4, char * * argv=3D0x02ba2478, > > > const char * username=3D0x025e7af8) Line 3414 + 0xc bytes C > > > postgres.exe!BackendRun(Port * port=3D0x00d4fd2c) Line 2853 + > > > 0x17 bytes C > > > postgres.exe!SubPostmasterMain(int argc=3D3, char * * > > > argv=3D0x025e5b30) Line 3288 + 0xc bytes C > > > postgres.exe!main(int argc=3D3, char * * argv=3D0x025e5b30) Line > > > 165 + 0xd bytes C > > >=20 > > >=20 > > >=20 > > >=20 > > > Can someone point me towards what next to check? :-) > > >=20 > > > Oh, and per Thomas, this does NOT appear to happen in C=20 > locale, but=20 > > > happens in both de_CH and en_US. ANd I've tested in=20 > Swedish_Sweden,=20 > > > which also crashes. > > >=20 > > > Note that this appears not to be index related at all,=20 > > because during=20 > > > this test there is only a standard btree index on "someint", and=20 > > > nothing on the tsvector column. > > >=20 > > > //Magnus > > >=20 > > >=20 > > > > -----Original Message----- > > > > From: pgsql-bugs-owner@postgresql.org=20 > > > > [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Thomas H. > > > > Sent: den 12 november 2006 15:06 > > > > To: pgsql-bugs@postgresql.org > > > > Subject: Re: [BUGS] 8.2beta1 (w32): server process crash=20 > > (tsvector) > > > >=20 > > > > this bug (please see below) is unfortunately still present in > > > > beta3 (win32 build). test case still crashes the child=20 > > process and=20 > > > > lets postmaster kill & reload everything. > > > >=20 > > > > it is not GiST-related, i've just validated the same=20 > > problem using=20 > > > > GIN. > > > >=20 > > > > this breaks tsearch2 functionality on our win32 system as no=20 > > > > tsvector-indexing of new/existing rows is possible (crash=20 > > after ~10=20 > > > > processed rows). searching already indexed rows works fine. > > > >=20 > > > > best regards, > > > > thomas > > > >=20 > > > > ----- Original Message ----- > > > > From: <me@alternize.com> > > > > To: "Tom Lane" <tgl@sss.pgh.pa.us> > > > > Cc: <pgsql-bugs@postgresql.org> > > > > Sent: Tuesday, October 17, 2006 9:19 PM > > > > Subject: Re: [BUGS] 8.2beta1 (w32): server process crash=20 > > (tsvector) > > > >=20 > > > >=20 > > > > >>> the following query will crash the server process: > > > > >>> INSERT INTO news.news > > > > >>> SELECT * FROM news.news2; > > > > >> > > > > >> This is undoubtedly data-dependent. Can you supply some > > > > sample data > > > > >> that makes it happen? > > > > > > > > > > it's not only happening with INSERTS, but also updates. as > > > > thats easier to > > > > > test, here's how i can reproduce the error: > > > > > > > > > > 1. create new database (encoding: UTF8) with tsearch2 on > > > > 8.2b1 win32 > > > > > (system > > > > > locale: de_CH.1252) > > > > > 2. insert the data from the zip file=20 > > > > > [http://alternize.com/pgsql/tsearch2test.zip] (be sure to > > > > also update > > > > > pg_ts_cf / > > > > > pg_ts_cfgmap as we have WIN1252 locale) 3. execute UPDATE > > > test SET > > > > > idxFTI =3D to_tsvector('default', > > > > "sometext"); or > > > > > similar queries > > > > > 4. hopefully see the process crashing as i do ;-) > > > > > > > > > > > > > > > 2006-10-17 17:23:44 LOG: server process (PID 4584) > > > exited with exit > > > > > code -1073741819 > > > > > 2006-10-17 17:23:44 LOG: terminating any other active > > > > server processes > > > > > 2006-10-17 17:23:44 WARNING: terminating connection > > > > because of crash of > > > > > another server process > > > > > 2006-10-17 17:23:44 DETAIL: The postmaster has commanded > > > > this server > > > > > process to roll back the current transaction and exit, > > > > because another > > > > > server process exited abnormally and possibly corrupted > > > > shared memory. > > > > > {snipp} > > > > > 2006-10-17 17:23:44 LOG: all server processes terminated; > > > > reinitializing > > > > > 2006-10-17 17:23:44 LOG: database system was interrupted > > > > at 2006-10-17 > > > > > 17:23:41 W. Europe Daylight Time > > > > > 2006-10-17 17:23:44 LOG: Windows > > > > fopen("recovery.conf","r") failed: code > > > > > 2, > > > > > errno 2 > > > > > 2006-10-17 17:23:44 LOG: Windows > > > > fopen("pg_xlog/00000001.history","r") > > > > > failed: code 2, errno 2 > > > > > 2006-10-17 17:23:44 LOG: Windows fopen("backup_label","r") > > > > failed: code > > > > > 2, > > > > > errno 2 > > > > > 2006-10-17 17:23:44 LOG: checkpoint record is at 0/E2ECA728 > > > > > 2006-10-17 17:23:44 LOG: redo record is at 0/E2ECA728; > > > > undo record is at > > > > > 0/0; shutdown FALSE > > > > > 2006-10-17 17:23:44 LOG: next transaction ID: 0/514299; > > > > next OID: 6276957 > > > > > 2006-10-17 17:23:44 LOG: next MultiXactId: 1; next > > > > MultiXactOffset: 0 > > > > > 2006-10-17 17:23:44 LOG: database system was not properly > > > > shut down; > > > > > automatic recovery in progress > > > > > 2006-10-17 17:23:44 LOG: redo starts at 0/E2ECA778 > > > > > 2006-10-17 17:23:44 LOG: unexpected pageaddr 0/DB0CC000 in > > > > log file 0, > > > > > segment 227, offset 835584 > > > > > 2006-10-17 17:23:44 LOG: redo done at 0/E30CBE78 > > > > > 2006-10-17 17:23:45 LOG: database system is ready > > > > > 2006-10-17 17:23:45 LOG: Windows > > > fopen("global/pg_fsm.cache","rb") > > > > > failed: > > > > > code 2, errno 2 > > > > > 2006-10-17 17:23:45 LOG: transaction ID wrap limit is > > > > 2147484172, limited > > > > > by database "postgres" > > > > > 2006-10-17 17:23:45 LOG: Windows > > > > fopen("global/pgstat.stat","rb") failed: > > > > > code 2, errno 2 > > > > > > > > > > > > > > > i've also tried to update each record on its own in a > > > > for-loop. here the > > > > > crash happens as well, sometimes after 10 updates, > > > > sometimes after 100 > > > > > updates, sometimes even after 1 update. but eventually > > > > every record can be > > > > > updated. so i do not think its entierly content-related... > > > > > > > > > > for what its worth, here's the output of pg_controldata: > > > > > > > > > > pg_control version number: 822 > > > > > Catalog version number: 200609181 > > > > > Database system identifier: 4986650172201464825 > > > > > Database cluster state: in production > > > > > pg_control last modified: 17.10.2006 17:44:29 > > > > > Current log file ID: 0 > > > > > Next log file segment: 230 > > > > > Latest checkpoint location: 0/E4E0F978 > > > > > Prior checkpoint location: 0/E46BF420 > > > > > Latest checkpoint's REDO location: 0/E4E03098 > > > > > Latest checkpoint's UNDO location: 0/0 > > > > > Latest checkpoint's TimeLineID: 1 > > > > > Latest checkpoint's NextXID: 0/531333 > > > > > Latest checkpoint's NextOID: 6285149 > > > > > Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's > > > > > NextMultiOffset: 0 > > > > > Time of latest checkpoint: 17.10.2006 17:43:45 > > > > > Minimum recovery ending location: 0/0 > > > > > Maximum data alignment: 8 > > > > > Database block size: 8192 > > > > > Blocks per segment of large relation: 131072 > > > > > WAL block size: 8192 > > > > > Bytes per WAL segment: 16777216 > > > > > Maximum length of identifiers: 64 > > > > > Maximum columns in an index: 32 > > > > > Date/time type storage: floating-point numbers > > > > > Maximum length of locale name: 128 > > > > > LC_COLLATE: German_Switzerland.1252 > > > > > LC_CTYPE: German_Switzerland.1252 > > > > > > > > > > let me know if more information / data is needed. > > > > > > > > > > on a sidenote: are those fopen() errors > > > > debug-code-leftovers or something > > > > > one should worry about? i can't find those files on the > > > file system. > > > > > > > > > > - thomas > > > > > > > > > > > > > > > ---------------------------(end of > > > > broadcast)--------------------------- > > > > > TIP 4: Have you searched our list archives? > > > > > > > > > > http://archives.postgresql.org > > > > >=20 > > > >=20 > > > >=20 > > > >=20 > > > > ---------------------------(end of > > > > broadcast)--------------------------- > > > > TIP 6: explain analyze is your friend > > > >=20 > > >=20 > > > ---------------------------(end of > > > broadcast)--------------------------- > > > TIP 3: Have you checked our extensive FAQ? > > >=20 > > > http://www.postgresql.org/docs/faq > > >=20 > >=20 > > ---------------------------(end of=20 > > broadcast)--------------------------- > > TIP 2: Don't 'kill -9' the postmaster > >=20 >=20 > ---------------------------(end of=20 > broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at >=20 > http://www.postgresql.org/about/donate >=20
pgsql-bugs by date: