Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free orcorruption (!prev) - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free orcorruption (!prev)
Date
Msg-id 20190826025421.GD7201@telsasoft.com
Whole thread Raw
In response to Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free orcorruption (!prev)  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
I'm not sure but maybe this is useful ?

|(gdb) p VfdCache[2397]
|$9 = {fd = -1, fdstate = 0, resowner = 0x24f93e0, nextFree = 2393, lruMoreRecently = 0, lruLessRecently = 2360,
seekPos= 73016512, fileSize = 0, fileName = 0x0, fileFlags = 2, fileMode = 384}
 

Knowing this report, very possibly this was a tempfile, possible a parallel fileset.
I don't see parallel workers in the query plan, but I do have external sort:

                                                               ->  Sort  (cost=20801.16..20801.60 rows=175 width=628)
(actualtime=39150.619..40730.793 rows=2002440 loops=1)
 
                                                                     Sort Key: data_cell.sect_id
                                                                     Sort Method: external sort  Disk: 1613224kB
                                                                     ->  Hash Join  (cost=20649.46..20794.64 rows=175
width=628)(actual time=736.734..2423.020 rows=2002440 loops=1)
 
                                                                           Hash Cond:
(eric_enodeb_cell_201908.start_time= data_cell.period)
 

Note, we run report with enable_nestloop=off (say what you will).  And the
report runs within an transaction which we roll back.  I suspect that's maybe
relevant for cleaning up files.

Ah, it's more than 24h old but saved the logfile with crash marker, so I found:

sudo zgrep 26188 /var/log/postgresql/crash-postgresql-2019-08-24_121600.log.gz
< 2019-08-24 12:16:17.037 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.5", size 336726940
< 2019-08-24 12:16:17.038 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.54", size 6354
< 2019-08-24 12:16:20.081 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.6", size 270914376
< 2019-08-24 12:16:20.083 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.39", size 7024
< 2019-08-24 12:16:23.464 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.7", size 213332328
< 2019-08-24 12:16:23.465 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.37", size 5984
...
< 2019-08-24 12:17:42.966 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.125", size 6534
< 2019-08-24 12:17:43.035 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.97", size 72828152
< 2019-08-24 12:17:43.036 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.112", size 6474
< 2019-08-24 12:17:43.114 CDT telsasoft >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp26188.90", size 73016512
< 2019-08-24 12:17:50.109 CDT  >LOG:  server process (PID 26188) was terminated by signal 6: Aborted

sudo zgrep -cFw 'pgsql_tmp26188' /var/log/postgresql/crash-postgresql-2019-08-24_121600.log.gz 
=> 116

sudo zgrep -Fw 'pgsql_tmp26188' /var/log/postgresql/crash-postgresql-2019-08-24_121600.log.gz |awk '{s+=$NF}END{print
s/1024^3}'
9.86781

FWIW:

 log_temp_files       | 0
 stats_temp_directory | pg_stat_tmp
 temp_file_limit      | -1
 temp_tablespaces     | """"""""""""""

Apparently, the last is garbage from our upgrade script, but in any case it
went to PGDATA.

Justin



pgsql-hackers by date:

Previous
From: Vik Fearing
Date:
Subject: Re: Outputting Standard SQL
Next
From: "Moon, Insung"
Date:
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)