Thread: tuplesort: unexpected end of data

tuplesort: unexpected end of data

From
NunoACHenriques
Date:
Hi all!
First of all, thanks for your fine job and the answers to a problem
(which I thought it was a PG bug) I posted some days ago. I've solved it!
Now I've got another strange issue to post...
Briefing: the program runs daily and this (please see below) error
does not always happen, actually this was the first time this week.
I hope you can help me?... If I'm doing something wrong, please let
me know...
Thanks in advance! :-)


[JDBC driver error report]
org.sourceforge.jxutil.sql.I18nSQLException: sqlError[tuplesort: unexpected end of data]at
org.sourceforge.jxdbcon.postgresql.PGExceptions.sqlError(UnknownSource)at
org.sourceforge.jxdbcon.postgresql.PGErrors.throwError(UnknownSource)at
org.sourceforge.jxdbcon.postgresql.PGResult.checkException(UnknownSource)at
org.sourceforge.jxdbcon.postgresql.PGExecResult.checkException(UnknownSource)at
org.sourceforge.jxdbcon.postgresql.PGConnection.executeSQL(UnknownSource)at
org.sourceforge.jxdbcon.postgresql.PGStatement.execute(UnknownSource)at
org.sourceforge.jxdbcon.AbstractStatement.executeQuery(UnknownSource)
 
(...)

[PostgreSQL log excerpt]
(...)
(comment: the query in question)
Jun  2 05:35:48 srv31 postgres[2986]: [57277] DEBUG:  query: SELECT  fill_warehouse()
Jun  2 05:35:48 srv31 postgres[2986]: [57278-1] DEBUG:  query: insert into warehouse (uri, expression, n, relevance,
spid_measure,size, title, sample) select d.uri,
 
Jun  2 05:35:48 srv31 postgres[2986]: [57278-2]  dn.expression, n.n, dn.relevance, d.spid_measure, d.size, d.title,
dn.samplefrom document as d inner join (document_n_gram as
 
Jun  2 05:35:48 srv31 postgres[2986]: [57278-3]  dn inner join n_gram as n on (dn.expression = n.expression)) on (d.uri
=dn.uri) order by dn.expression asc, n.n asc
 
Jun  2 06:37:08 srv31 postgres[11485]: [3317] DEBUG:  recycled transaction log file 00000018000000DA
Jun  2 06:37:08 srv31 postgres[11485]: [3318] DEBUG:  recycled transaction log file 00000018000000DB
(commment: this is a local monitor server)
Jun  2 06:39:39 srv31 postgres[11488]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 06:44:38 srv31 postgres[11498]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 06:49:39 srv31 postgres[11505]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 06:54:39 srv31 postgres[11515]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 06:59:38 srv31 postgres[11522]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 07:04:39 srv31 postgres[11541]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 07:09:39 srv31 postgres[11548]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 07:14:38 srv31 postgres[11558]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 07:19:38 srv31 postgres[11565]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 07:24:39 srv31 postgres[11575]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
(comment: the error)
Jun  2 06:29:37 srv31 postgres[2986]: [57279] ERROR:  tuplesort: unexpected end of data
Jun  2 06:29:37 srv31 postgres[2986]: [57280] NOTICE:  Error occurred while executing PL/pgSQL function fill_warehouse
Jun  2 06:29:37 srv31 postgres[2986]: [57281] NOTICE:  line 2 at SQL statement
Jun  2 07:29:39 srv31 postgres[11582]: [3317] FATAL 1:  No pg_hba.conf entry for host 193.136.120.30, user root,
databasetemplate1
 
Jun  2 06:29:40 srv31 postgres[2986]: [57282] DEBUG:  query: COMMIT
Jun  2 06:29:40 srv31 postgres[2986]: [57283] DEBUG:  ProcessUtility: COMMIT
Jun  2 06:29:41 srv31 postgres[2985]: [48998] DEBUG:  pq_recvbuf: unexpected EOF on client connection
Jun  2 06:29:41 srv31 postgres[2982]: [3335] DEBUG:  pq_recvbuf: unexpected EOF on client connection
Jun  2 06:29:41 srv31 postgres[2986]: [57284] DEBUG:  pq_recvbuf: unexpected EOF on client connection
(...)

[System info]
Red Hat Linux release 7.2 (Enigma)
Kernel 2.4.9-13
postgresql-libs-7.2.1-2PGDG
postgresql-server-7.2.1-2PGDG
postgresql-7.2.1-2PGDG

--     o__        Bem haja,   _.>/ _            NunoACHenriques  (_) \(_)            ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/



Re: tuplesort: unexpected end of data

From
Tom Lane
Date:
NunoACHenriques <nach@fct.unl.pt> writes:
> Jun  2 06:29:37 srv31 postgres[2986]: [57279] ERROR:  tuplesort: unexpected end of data

Hmm.  This is an internal consistency check in the sort code.  Perhaps
you've found a bug, but there's not enough info here to do much.  Can
you provide the EXPLAIN plan for the query that's triggering the error?
        regards, tom lane


Re: tuplesort: unexpected end of data

From
NunoACHenriques
Date:
Hi!
I should say that this error is a non-deterministic one. It happened 
once/twenty...

-------------------------------explain info---------------------------------
spid=> explain insert into warehouse_tmp (uri, expression, n, relevance, spid_measure, size, title, sample)
spid->    select d.uri, dn.expression, n.n, dn.relevance, d.spid_measure, d.size, d.title, dn.sample
spid->    from document as d
spid->         inner join (document_n_gram as dn
spid(>                     inner join n_gram as n
spid(>                     on (dn.expression = n.expression))
spid->         on (d.uri = dn.uri)
spid->    order by dn.expression asc, n.n asc ;
NOTICE:  QUERY PLAN:

Subquery Scan *SELECT*  (cost=3895109.07..3895109.07 rows=1009271 width=886) ->  Sort  (cost=3895109.07..3895109.07
rows=1009271width=886)       ->  Hash Join  (cost=1155071.81..2115045.12 rows=1009271 width=886)             ->  Merge
Join (cost=1154294.92..1170599.85 rows=1009271 width=588)                   ->  Sort  (cost=1001390.67..1001390.67
rows=1009271width=439)                         ->  Seq Scan on document_n_gram dn  (cost=0.00..49251.71 rows=1009271
width=439)                  ->  Sort  (cost=152904.25..152904.25 rows=466345 width=149)                         ->  Seq
Scanon n_gram n  (cost=0.00..12795.45 rows=466345 width=149)             ->  Hash  (cost=767.71..767.71 rows=3671
width=298)                  ->  Seq Scan on document d  (cost=0.00..767.71 rows=3671 width=298)
 

EXPLAIN
spid=> 
----------------------------------------------------------------------------
If you need more info, just ask. I'll be glad to contribute to a so 
interesting project! :-)

--     o__        Bem haja,   _.>/ _            NunoACHenriques  (_) \(_)            ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/

On Sun, 9 Jun 2002, Tom Lane wrote:

>NunoACHenriques <nach@fct.unl.pt> writes:
>> Jun  2 06:29:37 srv31 postgres[2986]: [57279] ERROR:  tuplesort: unexpected end of data
>
>Hmm.  This is an internal consistency check in the sort code.  Perhaps
>you've found a bug, but there's not enough info here to do much.  Can
>you provide the EXPLAIN plan for the query that's triggering the error?
>
>            regards, tom lane
>



Re: tuplesort: unexpected end of data

From
Tom Lane
Date:
NunoACHenriques <nach@fct.unl.pt> writes:
>     I should say that this error is a non-deterministic one. It happened 
> once/twenty...

Is the data in the tables changing constantly?  If you can repeat the
same query on the same data and get varying results, then we're
dealing with something odder than I suspected.
        regards, tom lane


Re: tuplesort: unexpected end of data

From
NunoACHenriques
Date:
Hi!

On Sun, 9 Jun 2002, Tom Lane wrote:

>Is the data in the tables changing constantly?  If you can repeat the
>same query on the same data and get varying results, then we're
>dealing with something odder than I suspected.
>
>            regards, tom lane
>
Not constantly, once a day.
I cannot repeat the same query on the same data, but the app does the 
same query over similar data (most static Web docs with little variation of 
about 50 docs in 7000) but diferent to PG because of the drop/create 
tables before the new fill...

--     o__        Bem haja,   _.>/ _            NunoACHenriques  (_) \(_)            ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/




Re: tuplesort: unexpected end of data

From
Tom Lane
Date:
NunoACHenriques <nach@fct.unl.pt> writes:
> On Sun, 9 Jun 2002, Tom Lane wrote:
>> Is the data in the tables changing constantly?

>     Not constantly, once a day.

Can't you set up a situation where the failure is reproducible, then?
On a day where you get the failure, dump the database and see if
you can load the data into a fresh database and reproduce the problem.

I spent some time looking at the tuplesort code and could not see any
reason for this failure.  All that code has been fairly stable since
it was written for 7.0, and AFAIR no one else has reported this error
message.  So either you've got a quite-unusual corner case that's
tickling a previously unseen bug, or you've got flaky hardware that for
some reason is manifesting in this way.

I don't necessarily believe the flaky-hardware theory, but I can't
make much progress on the bug theory without a test case to look at.
        regards, tom lane


Re: tuplesort: unexpected end of data

From
NunoACHenriques
Date:
On Sun, 9 Jun 2002, Tom Lane wrote:

>Can't you set up a situation where the failure is reproducible, then?
>On a day where you get the failure, dump the database and see if
>you can load the data into a fresh database and reproduce the problem.
>Ok, I will do that...

>I don't necessarily believe the flaky-hardware theory, but I can't
>make much progress on the bug theory without a test case to look at.
>Neither I believe it because the machine is well tested (including a
24h memtest). But there is something I can't get of my mind: once a day my
app "forces" PG to "play" with some 3GB of disk data in a ext2 fs. It is
known that sometimes ext2 corrupts data...
Thanks for the effort! :-)

--     o__        Bem haja,   _.>/ _            NunoACHenriques  (_) \(_)            ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/





Re: tuplesort: unexpected end of data

From
Tom Lane
Date:
NunoACHenriques <nach@fct.unl.pt> writes:
>     Neither I believe it because the machine is well tested (including a
> 24h memtest). But there is something I can't get of my mind: once a day my
> app "forces" PG to "play" with some 3GB of disk data in a ext2 fs. It is
> known that sometimes ext2 corrupts data...

Unless you've got things set up so that the temporary files created for
the sorting step are in the ext2 partition, this doesn't seem like it
could be the source of the problem.  A more plausible theory is that a
segment of main RAM is bad, but you happen not to use that part of RAM
except under heavy load (ie, while running this daily batch job).

Or it could just be a bug.  If you can get a reproducible test case I'll
be happy to dig into it.
        regards, tom lane


Re: tuplesort: unexpected end of data

From
NunoACHenriques
Date:
Hi!
A different error today:
[MemoryContextAlloc: invalid request size 4294967295]
This is a more often (twice a week) error and I don't understand 
why?...
I'm verifying the machine: fsck (with bad blocks chk), ... but no
hardware problems untill now.

--------------------------------info------------------------------------
$ pg_config --version
PostgreSQL 7.2.1

$ cat /etc/redhat-release 
Red Hat Linux release 7.2 (Enigma)

$/var/log/pgsql (excerpt)
(...)
Jun 11 03:06:04 srv31 postgres[13914]: [3403] ERROR:  cannot open segment 1 of relation n_gram (target block 528325):
Nosuch file or directory
 
(...)
Jun 11 04:26:12 srv31 postgres[14972]: [3317] DEBUG:  recycled transaction log file 00000020000000F6
Jun 11 04:26:12 srv31 postgres[14972]: [3318] DEBUG:  recycled transaction log file 00000020000000F7
Jun 11 04:26:12 srv31 postgres[14972]: [3319] DEBUG:  recycled transaction log file 00000020000000F8
Jun 11 04:28:56 srv31 postgres[14983]: [3317] DEBUG:  recycled transaction log file 00000020000000F9
Jun 11 04:28:56 srv31 postgres[14983]: [3318] DEBUG:  recycled transaction log file 00000020000000FA
Jun 11 04:28:56 srv31 postgres[14983]: [3319] DEBUG:  recycled transaction log file 00000020000000FB
Jun 11 03:29:07 srv31 postgres[13913]: [3383] ERROR:  MemoryContextAlloc: invalid request size 4294967295
Jun 11 03:29:07 srv31 postgres[13913]: [3384] NOTICE:  Error occurred while executing PL/pgSQL function set_n_gram
Jun 11 03:29:07 srv31 postgres[13913]: [3385] NOTICE:  line 9 at select into variables
(...)
------------------------------------------------------------------------

--    o__        Bem haja,   _.>/ _            NunoACHenriques  (_) \(_)            ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/



Re: tuplesort: unexpected end of data

From
Tom Lane
Date:
NunoACHenriques <nach@fct.unl.pt> writes:
>     A different error today:
> [MemoryContextAlloc: invalid request size 4294967295]

This could be a variant of the same problem: instead of getting a zero
tuple length from the sort temp file, we're reading a -1 tuple length.
Still no way to tell if it's a hardware glitch or a software bug.
(If the latter, presumably the code is getting out of step about its
read position in the temp file --- but how?)
        regards, tom lane