Thread: could not read from hash-join temporary file: SUCCESS && DB goes into recovery mode

Hi I’m looking for some guidance related to the subject line issue.

PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit

128GB RAM

pgsql_tmp is on a FS with 2+TB free

 

Information that I’ve been able to find on the web indicates:

  1. That the error message has since been updated (i.e. SUCCESS is not the proper value to report there).
  2. That the error is due to running out of temporary space, either disk space or maybe temp_buffers?

 

Would the presumption that the query is running out of temp_buffers be correct? Or is there something else that I should be looking at or for? (I’m not aware of any disk quota rules that would restrict creation of tmp files on disk.)
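
For reference, a rough way to watch temp-file activity here (a sketch only, using the stock pg_stat_database view and the log_temp_files setting, not output from this system):

    -- cumulative temp-file usage per database since the last stats reset
    SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_spilled
      FROM pg_stat_database
     ORDER BY temp_bytes DESC;

    -- log every temp file larger than ~1GB as it is created
    -- (value is in kB; a per-session SET needs superuser, or put it in postgresql.conf)
    SET log_temp_files = 1048576;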

 

 

Thanks,

reid

Reid Thompson <Reid.Thompson@omnicell.com> writes:
> Hi I'm looking for some guidance related to the subject line issue.

Is this repeatable?  If so you've found a bug, I think.

>   1.  That the error message has been updated ( i.e. SUCCESS is not the proper value)

Yeah, what this really indicates is an incomplete read (file shorter than
expected).  Since 11.8, we've improved the error reporting for that, but
that wouldn't in itself fix whatever the underlying problem is.

>   2.  That the error is due to running out of temporary space either disk space or maybe temp_buffers?

That could be the proximate cause, although then there would be a bug
that the original write failure wasn't detected.  But it seems about
as likely that there's just some inconsistency between what the temp
file writing code wrote and what the reading code expects to read.

Is this a parallelized hash join by any chance?  That's new in v11
if memory serves, so it'd be interesting to see if disabling
enable_parallel_hash changes anything.
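
Something along these lines would show whether a Parallel Hash node is involved and let you test without it (a sketch only; the query below is a placeholder for your failing one):

    -- check the plan shape first (plain EXPLAIN, no execution)
    EXPLAIN SELECT ... ;              -- look for a "Parallel Hash" node

    -- then retry with parallel hash joins disabled for this session
    SET enable_parallel_hash = off;
    EXPLAIN SELECT ... ;              -- the hash join should now be non-parallel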

Anyway, I'd counsel updating to current (11.11), and then if you can
still reproduce the problem, try to reduce it to a self-contained
test case.

            regards, tom lane



On 2021-Apr-19, Reid Thompson wrote:

> Hi I'm looking for some guidance related to the subject line issue.
> PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
> 128GB RAM
> pgsql_tmp is on a FS with 2+TB free

This bug report looks familiar.  I think it was fixed in the below commit and
that you'd benefit from running an up-to-date version (11.11).

Author: Thomas Munro <tmunro@postgresql.org>
Branch: master [7897e3bb9] 2020-06-16 16:59:07 +1200
Branch: REL_13_STABLE Release: REL_13_0 [3e0b08c40] 2020-06-16 17:00:06 +1200
Branch: REL_12_STABLE Release: REL_12_4 [28ee12669] 2020-06-16 17:00:21 +1200
Branch: REL_11_STABLE Release: REL_11_9 [9c14d6024] 2020-06-16 17:00:37 +1200
Branch: REL_10_STABLE Release: REL_10_14 [95647a1c7] 2020-06-16 17:00:53 +1200
Branch: REL9_6_STABLE Release: REL9_6_19 [02b71f06b] 2020-06-16 17:01:07 +1200
Branch: REL9_5_STABLE Release: REL9_5_23 [89020a92f] 2020-06-16 17:01:22 +1200

    Fix buffile.c error handling.
    
    Convert buffile.c error handling to use ereport.  This fixes cases where
    I/O errors were indistinguishable from EOF or not reported.  Also remove
    "%m" from error messages where errno would be bogus.  While we're
    modifying those strings, add block numbers and short read byte counts
    where appropriate.
    
    Back-patch to all supported releases.
    
    Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
    Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
    Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
    Reviewed-by: Robert Haas <robertmhaas@gmail.com>
    Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
    Reviewed-by: Michael Paquier <michael@paquier.xyz>
    Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com


-- 
Álvaro Herrera                            39°49'30"S 73°17'W
EnterpriseDB                https://www.enterprisedb.com



Thanks - I found that, which seems to fix the error handling, right? Or does it actually correct the cause of the
segfault also?
Any suggestion on how to avoid the error until we can schedule an upgrade?
Would increasing temp_buffers or some other setting for this query potentially avoid the issue until then?


-----Original Message-----
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Sent: Monday, April 19, 2021 10:09 AM
To: Reid Thompson <Reid.Thompson@omnicell.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: could not read from hash-join temporary file: SUCCESS && DB goes into recovery mode


On 2021-Apr-19, Reid Thompson wrote:

> Hi I'm looking for some guidance related to the subject line issue.
> PostgreSQL 11.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
> 128GB RAM
> pgsql_tmp is on a FS with 2+TB free

This bug report looks familiar.  I think it was fixed in the below commit and that you'd benefit from running an
up-to-date version (11.11).

Author: Thomas Munro <tmunro@postgresql.org>
Branch: master [7897e3bb9] 2020-06-16 16:59:07 +1200
Branch: REL_13_STABLE Release: REL_13_0 [3e0b08c40] 2020-06-16 17:00:06 +1200
Branch: REL_12_STABLE Release: REL_12_4 [28ee12669] 2020-06-16 17:00:21 +1200
Branch: REL_11_STABLE Release: REL_11_9 [9c14d6024] 2020-06-16 17:00:37 +1200
Branch: REL_10_STABLE Release: REL_10_14 [95647a1c7] 2020-06-16 17:00:53 +1200
Branch: REL9_6_STABLE Release: REL9_6_19 [02b71f06b] 2020-06-16 17:01:07 +1200
Branch: REL9_5_STABLE Release: REL9_5_23 [89020a92f] 2020-06-16 17:01:22 +1200

    Fix buffile.c error handling.

    Convert buffile.c error handling to use ereport.  This fixes cases where
    I/O errors were indistinguishable from EOF or not reported.  Also remove
    "%m" from error messages where errno would be bogus.  While we're
    modifying those strings, add block numbers and short read byte counts
    where appropriate.

    Back-patch to all supported releases.

    Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
    Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
    Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
    Reviewed-by: Robert Haas <robertmhaas@gmail.com>
    Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
    Reviewed-by: Michael Paquier <michael@paquier.xyz>
    Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com


--
Álvaro Herrera                            39°49'30"S 73°17'W
EnterpriseDB                https://www.enterprisedb.com



On 2021-Apr-19, Reid Thompson wrote:

> Thanks - I found that, which seems to fix the error handling right? Or
> does it actually correct the cause of the segfault also?

Uh, what segfault?  You didn't mention one.  Yes, it fixes the error
handling, so when the system runs out of disk space, that's correctly
reported instead of continuing.

... Ah, I see now that you mentioned that the DB goes in recovery mode
in the subject line.  That's exactly why I was looking at that problem
last year.  What I saw is that the hash-join spill-to-disk phase runs
out of disk, so the disk file is corrupt; later the hash-join reads that
data back in memory, but because it is incomplete, it follows a broken
pointer somewhere and causes a crash.

(In our customer case it was actually a bit more complicated: they had
*two* sessions running the same large hash-join query, and one of them
filled up disk first, then the other also did that; some time later one
of them raised an ERROR freeing up disk space, which allowed the other
to continue until it tried to read hash-join data back and crashed).

So, yes, the fix will avoid the crash by the fact that once you run out
of disk space, the hash-join will be aborted and nothing will try to
read broken data.

You'll probably have to rewrite your query to avoid eating 2TB of disk
space.
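
In the meantime, as a stopgap (just a sketch, not a fix; the limit value here is arbitrary, pick one that fits your
filesystem), temp_file_limit will make the session fail with a plain ERROR once its temp files reach the cap,
instead of filling the disk:

    -- cap per-session temporary file space cluster-wide (superuser-only setting)
    ALTER SYSTEM SET temp_file_limit = '500GB';
    SELECT pg_reload_conf();

    -- or just in the session running the big hash join (superuser)
    SET temp_file_limit = '500GB';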

-- 
Álvaro Herrera       Valdivia, Chile



Alvaro,
Thanks for the responses and the explanation. It is appreciated.  We will investigate re-writing the query to minimize
the result set.

Thanks again.
reid

-----Original Message-----
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Sent: Monday, April 19, 2021 11:59 AM
To: Reid Thompson <Reid.Thompson@omnicell.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: could not read from hash-join temporary file: SUCCESS && DB goes into recovery mode


On 2021-Apr-19, Reid Thompson wrote:

> Thanks - I found that, which seems to fix the error handling right? Or
> does it actually correct the cause of the segfault also?

Uh, what segfault?  You didn't mention one.  Yes, it fixes the error handling, so when the system runs out of disk
space, that's correctly reported instead of continuing.

... Ah, I see now that you mentioned that the DB goes in recovery mode in the subject line.  That's exactly why I was
looking at that problem last year.  What I saw is that the hash-join spill-to-disk phase runs out of disk, so the disk
file is corrupt; later the hash-join reads that data back in memory, but because it is incomplete, it follows a broken
pointer somewhere and causes a crash.

(In our customer case it was actually a bit more complicated: they had
*two* sessions running the same large hash-join query, and one of them filled up disk first, then the other also did
that; some time later one of them raised an ERROR freeing up disk space, which allowed the other to continue until it
tried to read hash-join data back and crashed).

So, yes, the fix will avoid the crash by the fact that once you run out of disk space, the hash-join will be aborted
and nothing will try to read broken data.

You'll probably have to rewrite your query to avoid eating 2TB of disk space.

--
Álvaro Herrera       Valdivia, Chile



Responses inline below
Thanks,
reid

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Monday, April 19, 2021 9:54 AM
To: Reid Thompson <Reid.Thompson@omnicell.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: could not read from hash-join temporary file: SUCCESS && DB goes into recovery mode

Reid Thompson <Reid.Thompson@omnicell.com> writes:
> Hi I'm looking for some guidance related to the subject line issue.

Is this repeatable?  If so you've found a bug, I think.

Alvaro has noted https://www.postgresql.org/message-id/20210419155928.GA3253%40alvherre.pgsql that the mentioned bug
fix on the error reporting should correct this (as the hash join will abort with the fix rather than not recognizing
that it needs to, and then attempt a later read).
Somewhat -- I think it's dependent on what else may be writing tmp files (i.e. it's happened twice in the past few
days, with successful runs in the interim, and I believe that this query has been in place for a long time prior to the
first occurrence).

>   1.  That the error message has been updated ( i.e. SUCCESS is not
> the proper value)

Yeah, what this really indicates is an incomplete read (file shorter than expected).  Since 11.8, we've improved the
error reporting for that, but that wouldn't in itself fix whatever the underlying problem is.

See message noted above from Alvaro.

>   2.  That the error is due to running out of temporary space either disk space or maybe temp_buffers?

That could be the proximate cause, although then there would be a bug that the original write failure wasn't detected.
But it seems about as likely that there's just some inconsistency between what the temp file writing code wrote and what
the reading code expects to read.

Is this a parallelized hash join by any chance?  That's new in v11 if memory serves, so it'd be interesting to see if
disabling enable_parallel_hash changes anything.

Explain on the host does not indicate parallelization anywhere - it does indicate that we can narrow the result set by
optimizing our query, which we are implementing now.  Hopefully this will get us to a scheduled upgrade timeframe.

Anyway, I'd counsel updating to current (11.11), and then if you can still reproduce the problem, try to reduce it to a
self-contained test case.

                        regards, tom lane