Re: 8.2beta1 (w32): server process crash (tsvector) - Mailing list pgsql-bugs

From Magnus Hagander
Subject Re: 8.2beta1 (w32): server process crash (tsvector)
Date
Msg-id 6BCB9D8A16AC4241919521715F4D8BCEA3591E@algol.sollentuna.se
Whole thread Raw
In response to Re: 8.2beta1 (w32): server process crash (tsvector)  ("Magnus Hagander" <mha@sollentuna.net>)
List pgsql-bugs
Just wanted to let you know that working with Teodor I beleive we have
now found the cause for this bug, and I have a workaround that passes
your tests. I'm unsure if it's safe on !=3D win32, so I'm deferring to
Teodor to come up with the final fix.

It's also possible that I spotted another problem with taht code that
might require slightly more code than a one-liner to fix. We'll know in
a while when Teodor has analyzed it completely.

//Magnus

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org=20
> [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Magnus Hagander
> Sent: den 13 november 2006 10:36
> To: Thomas H.; pgsql-bugs@postgresql.org
> Cc: Tom Lane
> Subject: Re: [BUGS] 8.2beta1 (w32): server process crash (tsvector)
>=20
> Did yet some more stuff, mainly in line of shotgun-debugging :-)
>=20
> If I add a "return NULL" right before:
>                 res =3D (TSLexeme *)
> DatumGetPointer(FunctionCall4(
>=20=09
> &(dict->lexize_info),
>=20=09
> PointerGetDatum(dict->dictionary),
>=20=09
> PointerGetDatum(curVal->lemm),
>=20=09
> Int32GetDatum(curVal->lenlemm),
>=20=09
> PointerGetDatum(&ld->dictState)
>=20=09
> ));
> in LexizeExec(), I get no crash and NULL values for all tsvectors.
> If I add it right *after* that line, I get a crash. But I=20
> don't get the warnings about overwriting.
>=20
> Not entirely sure how much that helps. Don't have time to dig=20
> in more right htis minute, but I'd definitly appreciate some=20
> pointers into what to look at next.
>=20
> //Magnus
>=20
> > -----Original Message-----
> > From: pgsql-bugs-owner@postgresql.org=20
> > [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of=20
> Magnus Hagander
> > Sent: den 13 november 2006 10:01
> > To: Thomas H.; pgsql-bugs@postgresql.org
> > Cc: Tom Lane
> > Subject: Re: [BUGS] 8.2beta1 (w32): server process crash (tsvector)
> >=20
> > Did some more tests, and it's even easier to repro this. No need to=20
> > update, just SELECT to_tsvector('default',sometext) from test;
> >=20
> > It appears to be data dependent still, though, because just running=20
> > SELECT to_tsvector('default','foo bar foo bar foo bar foo bar')
> > fromgenerate_series(1,1000000) works withuot a hitch.
> >=20
> >=20
> > Also, the "detected write past chunk end" comes from the=20
> > pfree() in tsvector.c, function unqiueWORD() line 656. At=20
> > this point, "ptr->word"
> > points at the text "temp" followed by a bunch of junk. The=20
> > callstack leading up to this is:
> >      postgres.exe!AllocSetFree(MemoryContextData *
> > context=3D0x02c37bd8, void * pointer=3D0x02c65ed0)  Line 775    C
> >      postgres.exe!pfree(void * pointer=3D0x02c65ed0)  Line 585 + 0x17
> > bytes    C
> >      tsearch2.dll!uniqueWORD(TSWORD * a=3D0x02c66e60, int l=3D517)  Line
> > 656 + 0xc bytes    C
> >      tsearch2.dll!makevalue(PRSTEXT * prs=3D0x00d4f44c)  Line 692 +
> > 0x12 bytes    C
> >      tsearch2.dll!to_tsvector(FunctionCallInfoData *
> > fcinfo=3D0x00d4f46c)  Line 764 + 0x9 bytes    C
> >      postgres.exe!DirectFunctionCall3(unsigned long=20
> > (FunctionCallInfoData *)* func=3D0x02cd1131, unsigned long=20
> > arg1=3D16473, unsigned long arg2=3D49941724, unsigned long=20
> > arg3=3D0)  Line 905 + 0xa bytes C
> >      tsearch2.dll!to_tsvector_name(FunctionCallInfoData *
> > fcinfo=3D0x00d4f6b0)  Line 787 + 0x20 bytes    C
> >=20
> >=20
> > Now, the parametsrs to to_tsvector() looks weird - the "text=20
> > *in =3D PG_GETARG_TEXT_P(1)" appears to return a struct with=20
> > vl_len set t 2139062143, which seems. Eh. Wrong? :) But this=20
> > does not happen on the
> > *first* call to to_tsquery() which shows a proper text, but=20
> > subsequent calls show what lokos like garbage a couple of=20
> > times, and then the WARNING about the write past end fires.
> >=20
> > //Magnus=20
> >=20
> > > -----Original Message-----
> > > From: pgsql-bugs-owner@postgresql.org=20
> > > [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of=20
> > Magnus Hagander
> > > Sent: den 12 november 2006 17:06
> > > To: Thomas H.; pgsql-bugs@postgresql.org
> > > Cc: Tom Lane
> > > Subject: Re: [BUGS] 8.2beta1 (w32): server process crash=20
> (tsvector)
> > >=20
> > > Ok, I've run this test on an assert enabled build (my msvc build,=20
> > > actually, so I could get a debugger on it if needed).
> > > It then outputs:
> > >=20
> > > WARNING:  detected write past chunk end in ExprContext 02C6D0F8
> > > WARNING:  detected write past chunk end in ExprContext 02C6AEA0
> > > WARNING:  detected write past chunk end in ExprContext 02C6B200
> > > WARNING:  detected write past chunk end in ExprContext 02C44630
> > > WARNING:  detected write past chunk end in ExprContext 02C4C118
> > > WARNING:  problem in alloc set ExprContext: bogus aset link=20
> > in block=20
> > > 02C435A8, c hunk 02C44520
> > > WARNING:  detected write past chunk end in ExprContext 02C66440
> > > WARNING:  detected write past chunk end in ExprContext 02C3B9D0
> > > WARNING:  detected write past chunk end in ExprContext 02C3BDE8
> > > WARNING:  detected write past chunk end in ExprContext 02C4E7E0
> > > WARNING:  detected write past chunk end in ExprContext 02C47508
> > > WARNING:  problem in alloc set ExprContext: bogus aset link=20
> > in block=20
> > > 02C435A8, c hunk 02C47528
> > > WARNING:  detected write past chunk end in ExprContext 02C43800
> > > WARNING:  detected write past chunk end in ExprContext 02C66C90
> > > WARNING:  detected write past chunk end in ExprContext 02C68270
> > > WARNING:  detected write past chunk end in ExprContext 02C4F5D8
> > > WARNING:  problem in alloc set ExprContext: bogus aset link=20
> > in block=20
> > > 02C4E6F8, c hunk 02C4F5F8
> > > WARNING:  detected write past chunk end in ExprContext 02C7B680
> > > WARNING:  detected write past chunk end in ExprContext 02C45190
> > > WARNING:  detected write past chunk end in ExprContext 02C46AC8
> > > WARNING:  detected write past chunk end in ExprContext 02C3C538
> > > WARNING:  detected write past chunk end in ExprContext 02C67B90
> > > WARNING:  detected write past chunk end in ExprContext 02C438F0
> > > WARNING:  problem in alloc set ExprContext: bad single-chunk
> > > 02C43FB8 in block 0
> > > 2C435A8
> > > WARNING:  problem in alloc set ExprContext: bogus aset link=20
> > in block=20
> > > 02C435A8, c hunk 02C43FB8 server closed the connection=20
> unexpectedly
> > >         This probably means the server terminated abnormally
> > >         before or while processing the request.=20
> > >=20
> > >=20
> > >=20
> > >=20
> > > The actual crash happens on line 1142 of AllocSetCheck.=20
> > Full callstack
> > > is:
> > > >    postgres.exe!AllocSetCheck(MemoryContextData *
> > > context=3D0x02c455b8)  Line 1142 + 0x11 bytes    C
> > >      postgres.exe!AllocSetReset(MemoryContextData *
> > > context=3D0x02c455b8)  Line 409 + 0x9 bytes    C
> > >      postgres.exe!MemoryContextReset(MemoryContextData *
> > > context=3D0x02c455b8)  Line 129 + 0xf bytes    C
> > >      postgres.exe!ExecScan(ScanState * node=3D0x02c227d0,=20
> > TupleTableSlot *=20
> > > (ScanState *)* accessMtd=3D0x00530b70)  Line 91 + 0xc
> > > bytes    C
> > >      postgres.exe!ExecSeqScan(ScanState * node=3D0x02c227d0)  Line 130
> > > + 0xe bytes    C
> > >      postgres.exe!ExecProcNode(PlanState * node=3D0x02c227d0)  Line 349
> > > + 0x9 bytes    C
> > >      postgres.exe!ExecutePlan(EState * estate=3D0x02c223e8,=20
> > PlanState *=20
> > > planstate=3D0x02c227d0, CmdType operation=3DCMD_UPDATE, long=20
> > > numberTuples=3D0, ScanDirection direction=3DForwardScanDirection,=20
> > > _DestReceiver *
> > > dest=3D0x02bf8d30)  Line 1081 + 0x9 bytes    C
> > >      postgres.exe!ExecutorRun(QueryDesc * queryDesc=3D0x02c44078,=20
> > > ScanDirection direction=3DForwardScanDirection, long count=3D0)=20
> >  Line 246=20
> > > +
> > > 0x20 bytes    C
> > >      postgres.exe!ProcessQuery(Query * parsetree=3D0x02bddab0, Plan *=20
> > > plan=3D0x02bf7e28, ParamListInfoData * params=3D0x00000000,=20
> > _DestReceiver=20
> > > * dest=3D0x02bf8d30, char *
> > > completionTag=3D0x00d4fb24)  Line 157 + 0xd bytes C
> > >      postgres.exe!PortalRunMulti(PortalData * portal=3D0x02c48138,=20
> > > _DestReceiver * dest=3D0x02bf8d30, _DestReceiver *=20
> altdest=3D0x02bf8d30,
> > > char * completionTag=3D0x00d4fb24)  Line 1148 + 0x1c bytes    C
> > >      postgres.exe!PortalRun(PortalData * portal=3D0x02c48138, long=20
> > > count=3D2147483647, _DestReceiver * dest=3D0x02bf8d30,=20
> _DestReceiver *=20
> > > altdest=3D0x02bf8d30, char *
> > > completionTag=3D0x00d4fb24)  Line 700 + 0x15
> > > bytes    C
> > >      postgres.exe!exec_simple_query(const char *
> > > query_string=3D0x02bdd4c8)  Line 943 + 0x23 bytes    C
> > >      postgres.exe!PostgresMain(int argc=3D4, char * * argv=3D0x02ba2478,
> > > const char * username=3D0x025e7af8)  Line 3414 + 0xc bytes    C
> > >      postgres.exe!BackendRun(Port * port=3D0x00d4fd2c)  Line 2853 +
> > > 0x17 bytes    C
> > >      postgres.exe!SubPostmasterMain(int argc=3D3, char * *
> > > argv=3D0x025e5b30)  Line 3288 + 0xc bytes    C
> > >      postgres.exe!main(int argc=3D3, char * * argv=3D0x025e5b30)  Line
> > > 165 + 0xd bytes    C
> > >=20
> > >=20
> > >=20
> > >=20
> > > Can someone point me towards what next to check? :-)
> > >=20
> > > Oh, and per Thomas, this does NOT appear to happen in C=20
> locale, but=20
> > > happens in both de_CH and en_US. ANd I've tested in=20
> Swedish_Sweden,=20
> > > which also crashes.
> > >=20
> > > Note that this appears not to be index related at all,=20
> > because during=20
> > > this test there is only a standard btree index on "someint", and=20
> > > nothing on the tsvector column.
> > >=20
> > > //Magnus
> > >=20
> > >=20
> > > > -----Original Message-----
> > > > From: pgsql-bugs-owner@postgresql.org=20
> > > > [mailto:pgsql-bugs-owner@postgresql.org] On Behalf Of Thomas H.
> > > > Sent: den 12 november 2006 15:06
> > > > To: pgsql-bugs@postgresql.org
> > > > Subject: Re: [BUGS] 8.2beta1 (w32): server process crash=20
> > (tsvector)
> > > >=20
> > > > this bug (please see below) is unfortunately still present in
> > > > beta3 (win32 build). test case still crashes the child=20
> > process and=20
> > > > lets postmaster kill & reload everything.
> > > >=20
> > > > it is not GiST-related, i've just validated the same=20
> > problem using=20
> > > > GIN.
> > > >=20
> > > > this breaks tsearch2 functionality on our win32 system as no=20
> > > > tsvector-indexing of new/existing rows is possible (crash=20
> > after ~10=20
> > > > processed rows). searching already indexed rows works fine.
> > > >=20
> > > > best regards,
> > > > thomas
> > > >=20
> > > > ----- Original Message -----
> > > > From: <me@alternize.com>
> > > > To: "Tom Lane" <tgl@sss.pgh.pa.us>
> > > > Cc: <pgsql-bugs@postgresql.org>
> > > > Sent: Tuesday, October 17, 2006 9:19 PM
> > > > Subject: Re: [BUGS] 8.2beta1 (w32): server process crash=20
> > (tsvector)
> > > >=20
> > > >=20
> > > > >>> the following query will crash the server process:
> > > > >>> INSERT INTO news.news
> > > > >>> SELECT * FROM news.news2;
> > > > >>
> > > > >> This is undoubtedly data-dependent.  Can you supply some
> > > > sample data
> > > > >> that makes it happen?
> > > > >
> > > > > it's not only happening with INSERTS, but also updates. as
> > > > thats easier to
> > > > > test, here's how i can reproduce the error:
> > > > >
> > > > > 1. create new database (encoding: UTF8) with tsearch2 on
> > > > 8.2b1 win32
> > > > > (system
> > > > > locale: de_CH.1252)
> > > > > 2. insert the data from the zip file=20
> > > > > [http://alternize.com/pgsql/tsearch2test.zip] (be sure to
> > > > also update
> > > > > pg_ts_cf /
> > > > > pg_ts_cfgmap as we have WIN1252 locale) 3. execute UPDATE
> > > test SET
> > > > > idxFTI =3D to_tsvector('default',
> > > > "sometext"); or
> > > > > similar queries
> > > > > 4. hopefully see the process crashing as i do ;-)
> > > > >
> > > > >
> > > > > 2006-10-17 17:23:44 LOG:  server process (PID 4584)
> > > exited with exit
> > > > > code -1073741819
> > > > > 2006-10-17 17:23:44 LOG:  terminating any other active
> > > > server processes
> > > > > 2006-10-17 17:23:44 WARNING:  terminating connection
> > > > because of crash of
> > > > > another server process
> > > > > 2006-10-17 17:23:44 DETAIL:  The postmaster has commanded
> > > > this server
> > > > > process to roll back the current transaction and exit,
> > > > because another
> > > > > server process exited abnormally and possibly corrupted
> > > > shared memory.
> > > > > {snipp}
> > > > > 2006-10-17 17:23:44 LOG:  all server processes terminated;
> > > > reinitializing
> > > > > 2006-10-17 17:23:44 LOG:  database system was interrupted
> > > > at 2006-10-17
> > > > > 17:23:41 W. Europe Daylight Time
> > > > > 2006-10-17 17:23:44 LOG:  Windows
> > > > fopen("recovery.conf","r") failed: code
> > > > > 2,
> > > > > errno 2
> > > > > 2006-10-17 17:23:44 LOG:  Windows
> > > > fopen("pg_xlog/00000001.history","r")
> > > > > failed: code 2, errno 2
> > > > > 2006-10-17 17:23:44 LOG:  Windows fopen("backup_label","r")
> > > > failed: code
> > > > > 2,
> > > > > errno 2
> > > > > 2006-10-17 17:23:44 LOG:  checkpoint record is at 0/E2ECA728
> > > > > 2006-10-17 17:23:44 LOG:  redo record is at 0/E2ECA728;
> > > > undo record is at
> > > > > 0/0; shutdown FALSE
> > > > > 2006-10-17 17:23:44 LOG:  next transaction ID: 0/514299;
> > > > next OID: 6276957
> > > > > 2006-10-17 17:23:44 LOG:  next MultiXactId: 1; next
> > > > MultiXactOffset: 0
> > > > > 2006-10-17 17:23:44 LOG:  database system was not properly
> > > > shut down;
> > > > > automatic recovery in progress
> > > > > 2006-10-17 17:23:44 LOG:  redo starts at 0/E2ECA778
> > > > > 2006-10-17 17:23:44 LOG:  unexpected pageaddr 0/DB0CC000 in
> > > > log file 0,
> > > > > segment 227, offset 835584
> > > > > 2006-10-17 17:23:44 LOG:  redo done at 0/E30CBE78
> > > > > 2006-10-17 17:23:45 LOG:  database system is ready
> > > > > 2006-10-17 17:23:45 LOG:  Windows
> > > fopen("global/pg_fsm.cache","rb")
> > > > > failed:
> > > > > code 2, errno 2
> > > > > 2006-10-17 17:23:45 LOG:  transaction ID wrap limit is
> > > > 2147484172, limited
> > > > > by database "postgres"
> > > > > 2006-10-17 17:23:45 LOG:  Windows
> > > > fopen("global/pgstat.stat","rb") failed:
> > > > > code 2, errno 2
> > > > >
> > > > >
> > > > > i've also tried to update each record on its own in a
> > > > for-loop. here the
> > > > > crash happens as well, sometimes after 10 updates,
> > > > sometimes after 100
> > > > > updates, sometimes even after 1 update. but eventually
> > > > every record can be
> > > > > updated. so i do not think its entierly content-related...
> > > > >
> > > > > for what its worth, here's the output of pg_controldata:
> > > > >
> > > > > pg_control version number:            822
> > > > > Catalog version number:               200609181
> > > > > Database system identifier:           4986650172201464825
> > > > > Database cluster state:               in production
> > > > > pg_control last modified:             17.10.2006 17:44:29
> > > > > Current log file ID:                  0
> > > > > Next log file segment:                230
> > > > > Latest checkpoint location:           0/E4E0F978
> > > > > Prior checkpoint location:            0/E46BF420
> > > > > Latest checkpoint's REDO location:    0/E4E03098
> > > > > Latest checkpoint's UNDO location:    0/0
> > > > > Latest checkpoint's TimeLineID:       1
> > > > > Latest checkpoint's NextXID:          0/531333
> > > > > Latest checkpoint's NextOID:          6285149
> > > > > Latest checkpoint's NextMultiXactId:  1 Latest checkpoint's
> > > > > NextMultiOffset:  0
> > > > > Time of latest checkpoint:            17.10.2006 17:43:45
> > > > > Minimum recovery ending location:     0/0
> > > > > Maximum data alignment:               8
> > > > > Database block size:                  8192
> > > > > Blocks per segment of large relation: 131072
> > > > > WAL block size:                       8192
> > > > > Bytes per WAL segment:                16777216
> > > > > Maximum length of identifiers:        64
> > > > > Maximum columns in an index:          32
> > > > > Date/time type storage:               floating-point numbers
> > > > > Maximum length of locale name:        128
> > > > > LC_COLLATE:                           German_Switzerland.1252
> > > > > LC_CTYPE:                             German_Switzerland.1252
> > > > >
> > > > > let me know if more information / data is needed.
> > > > >
> > > > > on a sidenote: are those fopen() errors
> > > > debug-code-leftovers or something
> > > > > one should worry about? i can't find those files on the
> > > file system.
> > > > >
> > > > > - thomas
> > > > >
> > > > >
> > > > > ---------------------------(end of
> > > > broadcast)---------------------------
> > > > > TIP 4: Have you searched our list archives?
> > > > >
> > > > >               http://archives.postgresql.org
> > > > >=20
> > > >=20
> > > >=20
> > > >=20
> > > > ---------------------------(end of
> > > > broadcast)---------------------------
> > > > TIP 6: explain analyze is your friend
> > > >=20
> > >=20
> > > ---------------------------(end of
> > > broadcast)---------------------------
> > > TIP 3: Have you checked our extensive FAQ?
> > >=20
> > >                http://www.postgresql.org/docs/faq
> > >=20
> >=20
> > ---------------------------(end of=20
> > broadcast)---------------------------
> > TIP 2: Don't 'kill -9' the postmaster
> >=20
>=20
> ---------------------------(end of=20
> broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>=20
>                 http://www.postgresql.org/about/donate
>=20

pgsql-bugs by date:

Previous
From: Joe Malicki
Date:
Subject: Postgresql out-of-memory error
Next
From: Tom Lane
Date:
Subject: Re: How to crash postgres using savepoints