Thread: tuplesort: unexpected end of data
Hi all! First of all, thanks for your fine job and the answers to a problem (which I thought it was a PG bug) I posted some days ago. I've solved it! Now I've got another strange issue to post... Briefing: the program runs daily and this (please see below) error does not always happen, actually this was the first time this week. I hope you can help me?... If I'm doing something wrong, please let me know... Thanks in advance! :-) [JDBC driver error report] org.sourceforge.jxutil.sql.I18nSQLException: sqlError[tuplesort: unexpected end of data]at org.sourceforge.jxdbcon.postgresql.PGExceptions.sqlError(UnknownSource)at org.sourceforge.jxdbcon.postgresql.PGErrors.throwError(UnknownSource)at org.sourceforge.jxdbcon.postgresql.PGResult.checkException(UnknownSource)at org.sourceforge.jxdbcon.postgresql.PGExecResult.checkException(UnknownSource)at org.sourceforge.jxdbcon.postgresql.PGConnection.executeSQL(UnknownSource)at org.sourceforge.jxdbcon.postgresql.PGStatement.execute(UnknownSource)at org.sourceforge.jxdbcon.AbstractStatement.executeQuery(UnknownSource) (...) [PostgreSQL log excerpt] (...) (comment: the query in question) Jun 2 05:35:48 srv31 postgres[2986]: [57277] DEBUG: query: SELECT fill_warehouse() Jun 2 05:35:48 srv31 postgres[2986]: [57278-1] DEBUG: query: insert into warehouse (uri, expression, n, relevance, spid_measure,size, title, sample) select d.uri, Jun 2 05:35:48 srv31 postgres[2986]: [57278-2] dn.expression, n.n, dn.relevance, d.spid_measure, d.size, d.title, dn.samplefrom document as d inner join (document_n_gram as Jun 2 05:35:48 srv31 postgres[2986]: [57278-3] dn inner join n_gram as n on (dn.expression = n.expression)) on (d.uri =dn.uri) order by dn.expression asc, n.n asc Jun 2 06:37:08 srv31 postgres[11485]: [3317] DEBUG: recycled transaction log file 00000018000000DA Jun 2 06:37:08 srv31 postgres[11485]: [3318] DEBUG: recycled transaction log file 00000018000000DB (commment: this is a local monitor server) Jun 2 06:39:39 srv31 postgres[11488]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 06:44:38 srv31 postgres[11498]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 06:49:39 srv31 postgres[11505]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 06:54:39 srv31 postgres[11515]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 06:59:38 srv31 postgres[11522]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 07:04:39 srv31 postgres[11541]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 07:09:39 srv31 postgres[11548]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 07:14:38 srv31 postgres[11558]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 07:19:38 srv31 postgres[11565]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 07:24:39 srv31 postgres[11575]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 (comment: the error) Jun 2 06:29:37 srv31 postgres[2986]: [57279] ERROR: tuplesort: unexpected end of data Jun 2 06:29:37 srv31 postgres[2986]: [57280] NOTICE: Error occurred while executing PL/pgSQL function fill_warehouse Jun 2 06:29:37 srv31 postgres[2986]: [57281] NOTICE: line 2 at SQL statement Jun 2 07:29:39 srv31 postgres[11582]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, databasetemplate1 Jun 2 06:29:40 srv31 postgres[2986]: [57282] DEBUG: query: COMMIT Jun 2 06:29:40 srv31 postgres[2986]: [57283] DEBUG: ProcessUtility: COMMIT Jun 2 06:29:41 srv31 postgres[2985]: [48998] DEBUG: pq_recvbuf: unexpected EOF on client connection Jun 2 06:29:41 srv31 postgres[2982]: [3335] DEBUG: pq_recvbuf: unexpected EOF on client connection Jun 2 06:29:41 srv31 postgres[2986]: [57284] DEBUG: pq_recvbuf: unexpected EOF on client connection (...) [System info] Red Hat Linux release 7.2 (Enigma) Kernel 2.4.9-13 postgresql-libs-7.2.1-2PGDG postgresql-server-7.2.1-2PGDG postgresql-7.2.1-2PGDG -- o__ Bem haja, _.>/ _ NunoACHenriques (_) \(_) ~~~~~~~~~~~~~~~ http://students.fct.unl.pt/users/nuno/
NunoACHenriques <nach@fct.unl.pt> writes: > Jun 2 06:29:37 srv31 postgres[2986]: [57279] ERROR: tuplesort: unexpected end of data Hmm. This is an internal consistency check in the sort code. Perhaps you've found a bug, but there's not enough info here to do much. Can you provide the EXPLAIN plan for the query that's triggering the error? regards, tom lane
Hi! I should say that this error is a non-deterministic one. It happened once/twenty... -------------------------------explain info--------------------------------- spid=> explain insert into warehouse_tmp (uri, expression, n, relevance, spid_measure, size, title, sample) spid-> select d.uri, dn.expression, n.n, dn.relevance, d.spid_measure, d.size, d.title, dn.sample spid-> from document as d spid-> inner join (document_n_gram as dn spid(> inner join n_gram as n spid(> on (dn.expression = n.expression)) spid-> on (d.uri = dn.uri) spid-> order by dn.expression asc, n.n asc ; NOTICE: QUERY PLAN: Subquery Scan *SELECT* (cost=3895109.07..3895109.07 rows=1009271 width=886) -> Sort (cost=3895109.07..3895109.07 rows=1009271width=886) -> Hash Join (cost=1155071.81..2115045.12 rows=1009271 width=886) -> Merge Join (cost=1154294.92..1170599.85 rows=1009271 width=588) -> Sort (cost=1001390.67..1001390.67 rows=1009271width=439) -> Seq Scan on document_n_gram dn (cost=0.00..49251.71 rows=1009271 width=439) -> Sort (cost=152904.25..152904.25 rows=466345 width=149) -> Seq Scanon n_gram n (cost=0.00..12795.45 rows=466345 width=149) -> Hash (cost=767.71..767.71 rows=3671 width=298) -> Seq Scan on document d (cost=0.00..767.71 rows=3671 width=298) EXPLAIN spid=> ---------------------------------------------------------------------------- If you need more info, just ask. I'll be glad to contribute to a so interesting project! :-) -- o__ Bem haja, _.>/ _ NunoACHenriques (_) \(_) ~~~~~~~~~~~~~~~ http://students.fct.unl.pt/users/nuno/ On Sun, 9 Jun 2002, Tom Lane wrote: >NunoACHenriques <nach@fct.unl.pt> writes: >> Jun 2 06:29:37 srv31 postgres[2986]: [57279] ERROR: tuplesort: unexpected end of data > >Hmm. This is an internal consistency check in the sort code. Perhaps >you've found a bug, but there's not enough info here to do much. Can >you provide the EXPLAIN plan for the query that's triggering the error? > > regards, tom lane >
NunoACHenriques <nach@fct.unl.pt> writes: > I should say that this error is a non-deterministic one. It happened > once/twenty... Is the data in the tables changing constantly? If you can repeat the same query on the same data and get varying results, then we're dealing with something odder than I suspected. regards, tom lane
Hi! On Sun, 9 Jun 2002, Tom Lane wrote: >Is the data in the tables changing constantly? If you can repeat the >same query on the same data and get varying results, then we're >dealing with something odder than I suspected. > > regards, tom lane > Not constantly, once a day. I cannot repeat the same query on the same data, but the app does the same query over similar data (most static Web docs with little variation of about 50 docs in 7000) but diferent to PG because of the drop/create tables before the new fill... -- o__ Bem haja, _.>/ _ NunoACHenriques (_) \(_) ~~~~~~~~~~~~~~~ http://students.fct.unl.pt/users/nuno/
NunoACHenriques <nach@fct.unl.pt> writes: > On Sun, 9 Jun 2002, Tom Lane wrote: >> Is the data in the tables changing constantly? > Not constantly, once a day. Can't you set up a situation where the failure is reproducible, then? On a day where you get the failure, dump the database and see if you can load the data into a fresh database and reproduce the problem. I spent some time looking at the tuplesort code and could not see any reason for this failure. All that code has been fairly stable since it was written for 7.0, and AFAIR no one else has reported this error message. So either you've got a quite-unusual corner case that's tickling a previously unseen bug, or you've got flaky hardware that for some reason is manifesting in this way. I don't necessarily believe the flaky-hardware theory, but I can't make much progress on the bug theory without a test case to look at. regards, tom lane
On Sun, 9 Jun 2002, Tom Lane wrote: >Can't you set up a situation where the failure is reproducible, then? >On a day where you get the failure, dump the database and see if >you can load the data into a fresh database and reproduce the problem. >Ok, I will do that... >I don't necessarily believe the flaky-hardware theory, but I can't >make much progress on the bug theory without a test case to look at. >Neither I believe it because the machine is well tested (including a 24h memtest). But there is something I can't get of my mind: once a day my app "forces" PG to "play" with some 3GB of disk data in a ext2 fs. It is known that sometimes ext2 corrupts data... Thanks for the effort! :-) -- o__ Bem haja, _.>/ _ NunoACHenriques (_) \(_) ~~~~~~~~~~~~~~~ http://students.fct.unl.pt/users/nuno/
NunoACHenriques <nach@fct.unl.pt> writes: > Neither I believe it because the machine is well tested (including a > 24h memtest). But there is something I can't get of my mind: once a day my > app "forces" PG to "play" with some 3GB of disk data in a ext2 fs. It is > known that sometimes ext2 corrupts data... Unless you've got things set up so that the temporary files created for the sorting step are in the ext2 partition, this doesn't seem like it could be the source of the problem. A more plausible theory is that a segment of main RAM is bad, but you happen not to use that part of RAM except under heavy load (ie, while running this daily batch job). Or it could just be a bug. If you can get a reproducible test case I'll be happy to dig into it. regards, tom lane
Hi! A different error today: [MemoryContextAlloc: invalid request size 4294967295] This is a more often (twice a week) error and I don't understand why?... I'm verifying the machine: fsck (with bad blocks chk), ... but no hardware problems untill now. --------------------------------info------------------------------------ $ pg_config --version PostgreSQL 7.2.1 $ cat /etc/redhat-release Red Hat Linux release 7.2 (Enigma) $/var/log/pgsql (excerpt) (...) Jun 11 03:06:04 srv31 postgres[13914]: [3403] ERROR: cannot open segment 1 of relation n_gram (target block 528325): Nosuch file or directory (...) Jun 11 04:26:12 srv31 postgres[14972]: [3317] DEBUG: recycled transaction log file 00000020000000F6 Jun 11 04:26:12 srv31 postgres[14972]: [3318] DEBUG: recycled transaction log file 00000020000000F7 Jun 11 04:26:12 srv31 postgres[14972]: [3319] DEBUG: recycled transaction log file 00000020000000F8 Jun 11 04:28:56 srv31 postgres[14983]: [3317] DEBUG: recycled transaction log file 00000020000000F9 Jun 11 04:28:56 srv31 postgres[14983]: [3318] DEBUG: recycled transaction log file 00000020000000FA Jun 11 04:28:56 srv31 postgres[14983]: [3319] DEBUG: recycled transaction log file 00000020000000FB Jun 11 03:29:07 srv31 postgres[13913]: [3383] ERROR: MemoryContextAlloc: invalid request size 4294967295 Jun 11 03:29:07 srv31 postgres[13913]: [3384] NOTICE: Error occurred while executing PL/pgSQL function set_n_gram Jun 11 03:29:07 srv31 postgres[13913]: [3385] NOTICE: line 9 at select into variables (...) ------------------------------------------------------------------------ -- o__ Bem haja, _.>/ _ NunoACHenriques (_) \(_) ~~~~~~~~~~~~~~~ http://students.fct.unl.pt/users/nuno/
NunoACHenriques <nach@fct.unl.pt> writes: > A different error today: > [MemoryContextAlloc: invalid request size 4294967295] This could be a variant of the same problem: instead of getting a zero tuple length from the sort temp file, we're reading a -1 tuple length. Still no way to tell if it's a hardware glitch or a software bug. (If the latter, presumably the code is getting out of step about its read position in the temp file --- but how?) regards, tom lane