Thread: 8.4b2 tsearch2 strange error

8.4b2 tsearch2 strange error

From

Tatsuo Ishii

Date:

02 June 2009, 06:48:33

Hi,

I have encountered strange errors while testing PostgreSQL 8.4 beta2.

SELECT msg_sid FROM msginfo WHERE plainto_tsquery(E'test') @@
body_index;
or
SELECT msg_sid FROM msginfo WHERE to_tsquery(E'test') @@ body_index;

produces following errors:

ERROR:  tuple offset out of range: 0
(occasionally ERROR:  tuple offset out of range: 459)

Here is the table definition:

CREATE TABLE msginfo (       msg_sid BIGSERIAL PRIMARY KEY,       file_size INTEGER,       file_mtime TIMESTAMP,
msg_dateTIMESTAMP,       flags INTEGER,       hdr_from TEXT,       hdr_to TEXT,       hdr_cc TEXT,       hdr_newsgroups
TEXT,      hdr_subject TEXT,       hdr_msgid TEXT UNIQUE NOT NULL,       hdr_inreplyto TEXT,       hdr_references TEXT,
     body_text TEXT,       body_index TSVECTOR
 
);
CREATE INDEX msginfo_msg_date_index ON msginfo (msg_date);
CREATE INDEX msginfo_body_index ON msginfo USING gin (body_index);

and other info:

Ubuntu 8.04
./configure --prefix=/usr/local/pgsql84
initdb -E UTF-8 --no-locale /path/to/database

sylph=# EXPLAIN SELECT msg_sid FROM msginfo WHERE to_tsquery('test') @@ body_index;
QUERYPLAN                                   
 
--------------------------------------------------------------------------------
-Bitmap Heap Scan on msginfo  (cost=4.59..8.61 rows=1 width=8)  Recheck Cond: (to_tsquery('test'::text) @@ body_index)
-> Bitmap Index Scan on msginfo_body_index  (cost=0.00..4.59 rows=1 width=0)        Index Cond:
(to_tsquery('test'::text)@@ body_index)
 
(4 rows)
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: 8.4b2 tsearch2 strange error

From

Tom Lane

Date:

02 June 2009, 13:40:42

Tatsuo Ishii <ishii@postgresql.org> writes:
> I have encountered strange errors while testing PostgreSQL 8.4 beta2.

> ERROR:  tuple offset out of range: 0
> (occasionally ERROR:  tuple offset out of range: 459)

This is evidently coming from tbm_add_tuples, indicating that it's being
passed bogus TID values from the GIN index.  We'll probably have to get
Teodor to look at it --- can you provide a self-contained test case?
        regards, tom lane

Re: 8.4b2 tsearch2 strange error

From

Tatsuo Ishii

Date:

03 June 2009, 09:55:51

> > I have encountered strange errors while testing PostgreSQL 8.4 beta2.
> 
> > ERROR:  tuple offset out of range: 0
> > (occasionally ERROR:  tuple offset out of range: 459)
> 
> This is evidently coming from tbm_add_tuples, indicating that it's being
> passed bogus TID values from the GIN index.  We'll probably have to get
> Teodor to look at it --- can you provide a self-contained test case?

Sorry for delay. I have put a database dump at:

http://sylpheed.sraoss.jp/tmp/dump.sql.gz

We got following errors using this database:

ERROR:  unrecognized operator: -50
STATEMENT:  SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date, folder_id, msgnum FROM msginfo LEFT JOIN
msg_folderinfoUSING (msg_sid) WHERE plainto_tsquery(E'a') @@ body_index
 
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored

ERROR:  tuple offset out of range: 609
STATEMENT:  SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date, folder_id, msgnum FROM msginfo LEFT JOIN
msg_folderinfoUSING (msg_sid) WHERE plainto_tsquery(E'tsearch') @@ body_index
 

ERROR:  tuple offset out of range: 0
STATEMENT:  SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date, folder_id, msgnum FROM msginfo LEFT JOIN
msg_folderinfoUSING (msg_sid) WHERE plainto_tsquery(E'interval') @@ body_index
 

ERROR:  tuple offset out of range: 924
STATEMENT:  SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date, folder_id, msgnum FROM msginfo LEFT JOIN
msg_folderinfoUSING (msg_sid) WHERE plainto_tsquery(E'pitr') @@ body_index
 
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: 8.4b2 tsearch2 strange error

From

Tom Lane

Date:

03 June 2009, 17:08:43

Tatsuo Ishii <ishii@postgresql.org> writes:
> Sorry for delay. I have put a database dump at:
> http://sylpheed.sraoss.jp/tmp/dump.sql.gz

> We got following errors using this database:

Hmm.  I can confirm the "unrecognized operator" failure (though I get
"8" not "-50" ... uninitialized memory?).  The "tuple out of range"
failures are not happening here though.  Perhaps it is locale dependent?
What locale are you using, and what is your default_text_search_config
setting?  Are you using any nondefault configuration settings?
        regards, tom lane

Re: 8.4b2 tsearch2 strange error

From

Tom Lane

Date:

03 June 2009, 18:44:36

Tatsuo Ishii <ishii@postgresql.org> writes:
> STATEMENT:  SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date, folder_id, msgnum FROM msginfo LEFT JOIN
msg_folderinfoUSING (msg_sid) WHERE plainto_tsquery(E'a') @@ body_index
 
> NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored
> ERROR:  unrecognized operator: -50

I found the cause of this one --- the new code for estimating the
selectivity of @@ was not guarding against the possibility that the
given TSQuery is empty, as it is in this example.

I'm still unable to reproduce the "tuple offset out of range" errors.
        regards, tom lane

Re: 8.4b2 tsearch2 strange error

From

Tatsuo Ishii

Date:

03 June 2009, 21:46:25

> Hmm.  I can confirm the "unrecognized operator" failure (though I get
> "8" not "-50" ... uninitialized memory?).  The "tuple out of range"
> failures are not happening here though.  Perhaps it is locale dependent?
> What locale are you using, and what is your default_text_search_config
> setting?  Are you using any nondefault configuration settings?

From my previous post:
> Ubuntu 8.04
> ./configure --prefix=/usr/local/pgsql84
> initdb -E UTF-8 --no-locale /path/to/database

So locale is "C". I believe we are not customizing
default_text_search_config. I'm not think of any nondefault
configuration settings.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: 8.4b2 tsearch2 strange error

From

Tom Lane

Date:

03 June 2009, 22:00:51

Tatsuo Ishii <ishii@postgresql.org> writes:
>> initdb -E UTF-8 --no-locale /path/to/database

> So locale is "C". I believe we are not customizing
> default_text_search_config. I'm not think of any nondefault
> configuration settings.

Huh.  So that should be the same case I tested here.  I wonder why
I can't duplicate it.  Can anyone else get the failure?
        regards, tom lane

Re: 8.4b2 tsearch2 strange error

From

Oleg Bartunov

Date:

03 June 2009, 22:17:17

On Wed, 3 Jun 2009, Tom Lane wrote:

> Tatsuo Ishii <ishii@postgresql.org> writes:
>>> initdb -E UTF-8 --no-locale /path/to/database
>
>> So locale is "C". I believe we are not customizing
>> default_text_search_config. I'm not think of any nondefault
>> configuration settings.
>
> Huh.  So that should be the same case I tested here.  I wonder why
> I can't duplicate it.  Can anyone else get the failure?

I also can't reproduce the problems reported, except one you just fixed.
My system is Linux, locale ru_RU.UTF-8.
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: 8.4b2 tsearch2 strange error

From

"Kevin Grittner"

Date:

03 June 2009, 23:02:24

Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> Tatsuo Ishii <ishii@postgresql.org> writes:
>>> initdb -E UTF-8 --no-locale /path/to/database
> 
>> So locale is "C". I believe we are not customizing
>> default_text_search_config. I'm not think of any nondefault
>> configuration settings.
> 
> Huh.  So that should be the same case I tested here.  I wonder why
> I can't duplicate it.  Can anyone else get the failure?
I can't duplicate it on beta2, either.
I didn't use the -E switch on initdb.
initdb --no-locale /path/to/database
kgrittn@ATHENA:~> uname -a
Linux ATHENA 2.6.16.60-0.31-smp #1 SMP Tue Oct 7 16:16:29 UTC 2008
x86_64 x86_64 x86_64 GNU/Linux
kgrittn@ATHENA:~> cat /etc/SuSE-release
SUSE Linux Enterprise Server 10 (x86_64)
VERSION = 10
PATCHLEVEL = 2
kgrittn@ATHENA:~> /usr/local/pgsql-8.4beta2/bin/pg_config
BINDIR = /usr/local/pgsql-8.4beta2/bin
DOCDIR = /usr/local/pgsql-8.4beta2/share/doc
HTMLDIR = /usr/local/pgsql-8.4beta2/share/doc
INCLUDEDIR = /usr/local/pgsql-8.4beta2/include
PKGINCLUDEDIR = /usr/local/pgsql-8.4beta2/include
INCLUDEDIR-SERVER = /usr/local/pgsql-8.4beta2/include/server
LIBDIR = /usr/local/pgsql-8.4beta2/lib
PKGLIBDIR = /usr/local/pgsql-8.4beta2/lib
LOCALEDIR = /usr/local/pgsql-8.4beta2/share/locale
MANDIR = /usr/local/pgsql-8.4beta2/share/man
SHAREDIR = /usr/local/pgsql-8.4beta2/share
SYSCONFDIR = /usr/local/pgsql-8.4beta2/etc
PGXS = /usr/local/pgsql-8.4beta2/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/pgsql-8.4beta2'
'--enable-integer-datetimes' '--enable-debug' '--disable-nls'
'-with-libxml'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2
CFLAGS = -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,--as-needed -Wl,-rpath,'/usr/local/pgsql-8.4beta2/lib'
LDFLAGS_SL =
LIBS = -lpgport -lxml2 -lz -lreadline -lcrypt -ldl -lm
VERSION = PostgreSQL 8.4beta2
kgrittn@ATHENA:~> /usr/local/pgsql-8.4beta2/bin/pg_controldata
/var/pgsql/data/kgrittn2
pg_control version number:            843
Catalog version number:               200904091
Database system identifier:           5340644220088610719
Database cluster state:               in production
pg_control last modified:             Wed 03 Jun 2009 05:42:18 PM CDT
Latest checkpoint location:           12/5F9CD778
Prior checkpoint location:            12/5F9CD700
Latest checkpoint's REDO location:    12/5F9CD778
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/4517
Latest checkpoint's NextOID:          57344
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Time of latest checkpoint:            Wed 03 Jun 2009 05:42:18 PM CDT
Minimum recovery ending location:     0/0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
kgrittn@ATHENA:~> cat /var/pgsql/data/kgrittn2/postgresql.conf
port = 5442
max_connections = 10
shared_buffers = 250MB
max_prepared_transactions = 0
work_mem = 50MB
maintenance_work_mem = 1GB
checkpoint_segments = 20
checkpoint_timeout = 30min
effective_cache_size = 100GB
geqo = off
from_collapse_limit = 20
join_collapse_limit = 20
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
escape_string_warning = off
sql_inheritance = off
standard_conforming_strings = on
-Kevin

Re: 8.4b2 tsearch2 strange error

From

Tatsuo Ishii

Date:

04 June 2009, 01:27:50

> I found the cause of this one --- the new code for estimating the
> selectivity of @@ was not guarding against the possibility that the
> given TSQuery is empty, as it is in this example.

Shouldn't be this in regression test?

> I'm still unable to reproduce the "tuple offset out of range" errors.

Maybe this is due to the difference how the index is physically
organized. I will uplaod physical database cluster.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: 8.4b2 tsearch2 strange error

From

Tatsuo Ishii

Date:

04 June 2009, 02:46:28

> > I found the cause of this one --- the new code for estimating the
> > selectivity of @@ was not guarding against the possibility that the
> > given TSQuery is empty, as it is in this example.
> 
> Shouldn't be this in regression test?
> 
> > I'm still unable to reproduce the "tuple offset out of range" errors.
> 
> Maybe this is due to the difference how the index is physically
> organized. I will uplaod physical database cluster.

Done.

http://sylpheed.sraoss.jp/tmp/searcherdb84.tar.bz2

$ psql -U hiro sylph
Pager usage is off.
psql (8.4beta2)
Type "help" for help.

sylph=# SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date, folder_id, msgnum FROM msginfo LEFT JOIN
msg_folderinfoUSING (msg_sid) WHERE plainto_tsquery(E'a') @@ body_index;
 
NOTICE:  text-search query contains only stop words or doesn't contain lexemes, ignored
ERROR:  unrecognized operator: 24
STATEMENT:  SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date, folder_id, msgnum FROM msginfo LEFT JOIN
msg_folderinfoUSING (msg_sid) WHERE plainto_tsquery(E'a') @@ body_index;
 
ERROR:  unrecognized operator: 24
sylph=# SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date, folder_id, msgnum FROM msginfo LEFT JOIN
msg_folderinfoUSING (msg_sid) WHERE plainto_tsquery(E'tsearch') @@ body_index;
 
ERROR:  tuple offset out of range: 609
STATEMENT:  SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date, folder_id, msgnum FROM msginfo LEFT JOIN
msg_folderinfoUSING (msg_sid) WHERE plainto_tsquery(E'tsearch') @@ body_index;
 
ERROR:  tuple offset out of range: 609
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: 8.4b2 tsearch2 strange error

From

"Kevin Grittner"

Date:

04 June 2009, 17:12:59

Tatsuo Ishii <ishii@postgresql.org> wrote: 
> http://sylpheed.sraoss.jp/tmp/searcherdb84.tar.bz2
Using that on my kubuntu machine with no configure options except
prefix:
hiro@kgrittn-desktop:~$ /usr/local/pgsql-8.4beta2/bin/psql -U hiro
sylph
psql (8.4beta2)
Type "help" for help.

sylph=# SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date,
folder_id, msgnum FROM msginfo LEFT JOIN msg_folderinfo USING (msg_sid)
WHERE plainto_tsquery(E'a') @@ body_index;
NOTICE:  text-search query contains only stop words or doesn't contain
lexemes, ignored
ERROR:  unrecognized operator: -26
sylph=# SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date,
folder_id, msgnum FROM msginfo LEFT JOIN msg_folderinfo USING (msg_sid)
WHERE plainto_tsquery(E'a') @@ body_index;
NOTICE:  text-search query contains only stop words or doesn't contain
lexemes, ignored
ERROR:  unrecognized operator: -26
sylph=# SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date,
folder_id, msgnum FROM msginfo LEFT JOIN msg_folderinfo USING (msg_sid)
WHERE plainto_tsquery(E'tsearch') @@ body_index;
ERROR:  tuple offset out of range: 609
sylph=# SELECT msg_sid, hdr_from, hdr_to, hdr_subject, msg_date,
folder_id, msgnum FROM msginfo LEFT JOIN msg_folderinfo USING (msg_sid)
WHERE plainto_tsquery(E'tsearch') @@ body_index;
ERROR:  tuple offset out of range: 609
Given that I can replicate this, is there some useful next step?
-Kevin

Re: 8.4b2 tsearch2 strange error

From

"Kevin Grittner"

Date:

04 June 2009, 17:40:13

I wrote:
> Tatsuo Ishii <ishii@postgresql.org> wrote: 
>  
>> http://sylpheed.sraoss.jp/tmp/searcherdb84.tar.bz2
>  
> Using that on my kubuntu machine with no configure options except
> prefix:
>  
> [problem confirmed]
>  
> Given that I can replicate this, is there some useful next step?
By the way, that's 32 bit Kubuntu 7.10.
Linux kgrittn-desktop 2.6.22-16-generic #1 SMP Thu Apr 2 01:27:50 GMT
2009 i686 GNU/Linux
I've done a new build with --enable-debug and confirmed the problem in
that environment.
-Kevin

Re: 8.4b2 tsearch2 strange error

From

Tom Lane

Date:

04 June 2009, 18:04:15

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> By the way, that's 32 bit Kubuntu 7.10.
> Linux kgrittn-desktop 2.6.22-16-generic #1 SMP Thu Apr 2 01:27:50 GMT
> 2009 i686 GNU/Linux
> I've done a new build with --enable-debug and confirmed the problem in
> that environment.

Hmm ... I'll rev up my old 32-bit-Intel machine.  I suspect that the
missing link here is some state change in the database above and beyond
just loading the data, but since we don't know exactly what that is,
we'll have to resort to working with Tatsuo's bitwise dump.
        regards, tom lane

Re: 8.4b2 tsearch2 strange error

From

Tom Lane

Date:

04 June 2009, 22:10:07

I wrote:
> Hmm ... I'll rev up my old 32-bit-Intel machine.  I suspect that the
> missing link here is some state change in the database above and beyond
> just loading the data, but since we don't know exactly what that is,
> we'll have to resort to working with Tatsuo's bitwise dump.

I poked around in the dump for awhile.  I still can't reproduce the
failure from a standing start.  It looks to me like Tatsuo's database
was possibly produced from separate schema and data load steps, followed
by some update operations.  It would be nice to have a full script for
reproducing the state of the database.

I did find out some interesting stuff though.  One thing that is
particularly striking is that the GIN index is pretty bloated.
If you load the data dump file as-is into an empty DB you get
an index size of 2564 pages (on a 32-bit machine).  The actual size
of the index in Tatsuo's filesystem dump is 5459 pages!  I tried to
see if I could duplicate that by doing separate schema and data load
(so the data is inserted into a pre-existing index).  I got an index
size of 5356 pages, close but still less than the actual.  This is
what makes me think there were some additional update operations
(well, that and the sequence counter being larger than the number of
rows in the table...)

The bogus TIDs are coming from index pages that are clearly corrupt.
For example, the "609" case comes from page 5439.  I tweaked pg_filedump
to know something about printing out GIN leaf pages, and got this:

Block 5439 ********************************************************
<Header> -----Block Offset: 0x02a7e000         Offsets: Lower      24 (0x0018)Block: Size 8192  Version    4
Upper   8184 (0x1ff8)LSN:  logid      0 recoff 0x099cc544      Special  8184 (0x1ff8)Items:    0
FreeSpace: 8160TLI: 0x0001  Prune XID: 0x00000000  Flags: 0x0000 ()Length (including item array): 24
 
 0000: 00000000 44c59c09 01000000 1800f81f  ....D........... 0010: f81f0420 00000000                    ... ....

GIN Data Section: Right Bound: 0/0 Item 1: 1/3 Item 2: 64/5 Item 3: 67/4 Item 4: 90/3 Item 5: 100/5 Item 6: 106/3 Item
7:106/5 Item 8: 114/4 Item 9: 189/5 Item 10: 204/6 Item 11: 300/7 Item 12: 302/5 Item 13: 302/6 Item 14: 309/5 Item 15:
355/2Item 16: 355/4 Item 17: 407/3 Item 18: 472/4 Item 19: 480/6 Item 20: 483/1 Item 21: 486/1 Item 22: 486/3 Item 23:
499/5Item 24: 560/6 Item 25: 584/6 Item 26: 588/3 Item 27: 589/3 Item 28: 660/6 Item 29: 667/4 Item 30: 718/2 Item 31:
719/4Item 32: 738/4 Item 33: 760/6 Item 34: 763/2 Item 35: 764/2 Item 36: 784/6 Item 37: 844/7 Item 38: 912/4 Item 39:
913/3Item 40: 913/5 Item 41: 916/5 Item 42: 930/5 Item 43: 945/1 Item 44: 945/4 Item 45: 945/5 Item 46: 973/7 Item 47:
994/5Item 48: 1036/3 Item 49: 1046/6 Item 50: 1048/4 Item 51: 1048/6 Item 52: 1069/1 Item 53: 1192/4 Item 54: 1205/1
Item55: 1205/3 Item 56: 1280/4 Item 57: 1317/7 Item 58: 1347/5 Item 59: 1363/1 Item 60: 1367/2 Item 61: 1389/5 Item 62:
1390/2Item 63: 1393/2 Item 64: 1400/4 Item 65: 1417/5 Item 66: 1418/3 Item 67: 1513/3 Item 68: 1513/4 Item 69: 1513/5
Item70: 1514/2 Item 71: 1614/4 Item 72: 1654/5 Item 73: 1666/4 Item 74: 1674/1 Item 75: 1690/6 Item 76: 1691/5 Item 77:
1697/5Item 78: 1741/5 Item 79: 393216/609 Item 80: 327680/660 Item 81: 393216/819 Item 82: 327680/820
 

<Special Section> -----GIN Index Section: Flags: 0x00000003 (DATA|LEAF)  Maxoff: 82 Blocks: RightLink (-1)
 1ff8: ffffffff 52000300                    ....R...        

so the last four entries (according to the maxoff) are wrong.
Even more interesting, a binary dump of the page shows there
is data beyond the maxoff point:


Data array begins here, ItemPointerData apiece 0020: 00000100 03000000 40000500 00004300  ........@.....C. 0030:
040000005a000300 00006400 05000000  ....Z.....d..... 0040: 6a000300 00006a00 05000000 72000400  j.....j.....r... 0050:
0000bd0005000000 cc000600 00002c01  ..............,. 0060: 07000000 2e010500 00002e01 06000000  ................ 0070:
3501050000006301 02000000 63010400  5.....c.....c... 0080: 00009701 03000000 d8010400 0000e001  ................ 0090:
06000000e3010100 0000e601 01000000  ................ 00a0: e6010300 0000f301 05000000 30020600  ............0... 00b0:
0000480206000000 4c020300 00004d02  ..H.....L.....M. 00c0: 03000000 94020600 00009b02 04000000  ................ 00d0:
ce0202000000cf02 04000000 e2020400  ................ 00e0: 0000f802 06000000 fb020200 0000fc02  ................ 00f0:
0200000010030600 00004c03 07000000  ..........L..... 0100: 90030400 00009103 03000000 91030500  ................ 0110:
0000940305000000 a2030500 0000b103  ................ 0120: 01000000 b1030400 0000b103 05000000  ................ 0130:
cd0307000000e203 05000000 0c040300  ................ 0140: 00001604 06000000 18040400 00001804  ................ 0150:
060000002d040100 0000a804 04000000  ....-........... 0160: b5040100 0000b504 03000000 00050400  ................ 0170:
0000250507000000 43050500 00005305  ..%.....C.....S. 0180: 01000000 57050200 00006d05 05000000  ....W.....m..... 0190:
6e05020000007105 02000000 78050400  n.....q.....x... 01a0: 00008905 05000000 8a050300 0000e905  ................ 01b0:
03000000e9050400 0000e905 05000000  ................ 01c0: ea050200 00004e06 04000000 76060500  ......N.....v... 01d0:
0000820604000000 8a060100 00009a06  ................ 01e0: 06000000 9b060500 0000a106 05000000  ................ 01f0:
cd060500
first problem slot (79):                06000000 61020500 00009402  ........a....... 0200: 06000000 33030500 00003403
junk here (slots 83 and up):                                  05000000  ....3.....4..... 0210: 34030600 00003303
0500000034030000  4.....3.....4... 0220: 00000000 00000000 00000000 00000000  ................ 0230: 00000000 00000000
0000000000000000  ................
 
... rest of the page is zeroes, up to the special section

What I'm guessing is that either insertion or removal of some entry
or entries on the page was done wrong.  Perhaps maxoff is bigger than
it should be.  But it's also curious that the incorrect data looks
like it might be valid data that's been shifted by two bytes from
where it should be.  Maybe some part of the code is manipulating the
entry array on the assumption that it's an array of PostingItems
instead of ItemPointers?
        regards, tom lane

Re: 8.4b2 tsearch2 strange error

From

Tatsuo Ishii

Date:

05 June 2009, 12:51:34

> I poked around in the dump for awhile.  I still can't reproduce the
> failure from a standing start.  It looks to me like Tatsuo's database
> was possibly produced from separate schema and data load steps, followed
> by some update operations.  It would be nice to have a full script for
> reproducing the state of the database.

Sorry, I don't have full SQL logs.

Here are DML used for the database:
INSERT INTO msginfo(file_size, file_mtime, msg_date, flags, hdr_from,                    hdr_to, hdr_cc,
hdr_newsgroups,hdr_subject,                    hdr_msgid, hdr_inreplyto, hdr_references, body_text,
body_index) VALUES(%u, E'%s', E'%s', %u, E'%s', E'%s', E'%s', E'%s', E'%s', E'%s',         E'%s', E'%s', E'%s',
to_tsvector(E'%s'))
DELETE FROM msginfo WHERE msg_sid = %lu
DELETE FROM msginfo WHERE msg_sid IN (SELECT msg_sid FROM msginfo EXCEPT SELECT msg_sid FROM msg_folderinfo)

So no exotic things appear...
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: 8.4b2 tsearch2 strange error

From

Tom Lane

Date:

05 June 2009, 22:49:13

I wrote:
>> I poked around in the dump for awhile.  I still can't reproduce the
>> failure from a standing start.

I've had a bit of success with reproducing the failure from scratch.
After loading the SQL dump that Tatsuo previously supplied into a UTF8,
C-locale database, generate a data-only dump of it, along the lines of

pg_dump -a ... >ishii-data.sql

Then run this psql script in the database (the function is designed to
exercise every entry in the GIN index):

create table msgstats as select * from ts_stat('select body_index from msginfo');

create language plpgsql;
create function testit () returns int as $$
declare r record;       ifail int := 0;
begin for r in select * from msgstats loop   begin   perform count(*) from msginfo where plainto_tsquery(r.word) @@
body_index;  exception when others then     raise notice 'failed for "%": %', r.word, sqlerrm;     ifail := ifail + 1;
end; end loop; return ifail;

end $$ language plpgsql;

delete from msginfo;
truncate msg_folderinfo;
\i ishii-data.sql
reset client_min_messages;

select testit();

The results are not very stable, but I fairly often get "tuple offset out
of range" errors, and my last try resulted in an Assert failure in
autovacuum.  What I think is happening is that there's some inadequately
locked page-changing logic in there, and if you can get autovacuum to
happen on msginfo concurrently with the data reload step, it'll mess up.
Given that we've not seen reports of this on 8.3, I'll bet lunch that
the bug is somewhere in the "fast insert" code.  I haven't found it
yet, though.

BTW, I also did more analysis of Tatsuo's binary dump using this
function.  Of some 38000 distinct lexemes in the body_index column,
about 1500 trigger an error in testit(); the errors are either
"tuple offset out of range" or something like "could not open segment 1
of relation base/16384/16497 (target block 955254869): No such file or
directory".  There are about 440 distinct messages that show up,
suggesting that approximately that many pages of the index contain
corrupted TIDs.  Manual examination of the index is consistent with
this, and also indicates that all of the corrupt pages are in the second
half of the table.  That's not too hard to square with the
concurrent-update theory --- autovac might have already done about half
the index before a large data load started.  However it means that
corruption is *very* probable if data insertion is happening
concurrently with a vacuum, else not such a large number of pages would
have gotten messed up.
        regards, tom lane

Re: 8.4b2 tsearch2 strange error

From

Tom Lane

Date:

06 June 2009, 02:50:49

I wrote:
> ... What I think is happening is that there's some inadequately
> locked page-changing logic in there, and if you can get autovacuum to
> happen on msginfo concurrently with the data reload step, it'll mess up.
> Given that we've not seen reports of this on 8.3, I'll bet lunch that
> the bug is somewhere in the "fast insert" code.  I haven't found it
> yet, though.

Well, I found at least part of the problem:
http://archives.postgresql.org/message-id/20090606023940.BD4B875331E@cvs.postgresql.org

This is in perfectly sequential code.  The reason it has
nondeterministic effects is that (so far as I can tell) the only
non-crash case where there would be duplicate TIDs to eliminate is if
two backends are concurrently flushing an index's pending-inserts list.
The code is designed to let that happen and suppress the duplicates at
the very end of the process, in addItemPointersToTuple(); but the
duplicate-removal logic was broken and allowed extra garbage TIDs to
creep in.  So at least in the case I'm testing, this happens when
autovacuum fires on the table concurrently with a large insertion.

Please update to CVS HEAD, reindex that index, and then see if you see
any more strange behavior.  I'm not entirely convinced that this is the
only problem ...
        regards, tom lane

Re: 8.4b2 tsearch2 strange error

From

Tatsuo Ishii

Date:

06 June 2009, 03:46:27

> Well, I found at least part of the problem:
> http://archives.postgresql.org/message-id/20090606023940.BD4B875331E@cvs.postgresql.org
> 
> This is in perfectly sequential code.  The reason it has
> nondeterministic effects is that (so far as I can tell) the only
> non-crash case where there would be duplicate TIDs to eliminate is if
> two backends are concurrently flushing an index's pending-inserts list.
> The code is designed to let that happen and suppress the duplicates at
> the very end of the process, in addItemPointersToTuple(); but the
> duplicate-removal logic was broken and allowed extra garbage TIDs to
> creep in.  So at least in the case I'm testing, this happens when
> autovacuum fires on the table concurrently with a large insertion.
> 
> Please update to CVS HEAD, reindex that index, and then see if you see
> any more strange behavior.  I'm not entirely convinced that this is the
> only problem ...

Thanks for investigating the problem. Using CVS HEAD and reindexing
has solved the problems I reported. On Monday I will ask my engineers
try the CVS HEAD and do more operations to see if any strange thing
happen...
--
Tatsuo Ishii
SRA OSS, Inc. Japan