Home > mailing lists

Re: stress test for parallel workers - Mailing list pgsql-hackers

From	Justin Pryzby
Subject	Re: stress test for parallel workers
Date	July 24, 2019 02:04:40
Msg-id	20190723230440.GU22387@telsasoft.com Whole thread Raw
In response to	Re: stress test for parallel workers (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: stress test for parallel workers Re: stress test for parallel workers Re: stress test for parallel workers
List	pgsql-hackers

Tree view

On Wed, Jul 24, 2019 at 10:46:42AM +1200, Thomas Munro wrote:
> On Wed, Jul 24, 2019 at 10:42 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > On Wed, Jul 24, 2019 at 10:03:25AM +1200, Thomas Munro wrote:
> > > On Wed, Jul 24, 2019 at 5:42 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > #2  0x000000000085ddff in errfinish (dummy=<value optimized out>) at elog.c:555
> > > >         edata = <value optimized out>
> > >
> > > If you have that core, it might be interesting to go to frame 2 and
> > > print *edata or edata->saved_errno.
> >
> > As you saw..unless someone you know a trick, it's "optimized out".
> 
> How about something like this:
> 
> print errorData[errordata_stack_depth]

Clever.

(gdb) p errordata[errordata_stack_depth]
$2 = {elevel = 13986192, output_to_server = 254, output_to_client = 127, show_funcname = false, hide_stmt = false,
hide_ctx= false, filename = 0x27b3790 "< %m %u >", lineno = 41745456, 
 
  funcname = 0x3030313335 <Address 0x3030313335 out of bounds>, domain = 0x0, context_domain = 0x27cff90 "postgres",
sqlerrcode= 0, message = 0xe8800000001 <Address 0xe8800000001 out of bounds>, 
 
  detail = 0x297a <Address 0x297a out of bounds>, detail_log = 0x0, hint = 0xe88 <Address 0xe88 out of bounds>, context
=0x297a <Address 0x297a out of bounds>, message_id = 0x0, schema_name = 0x0, 
 
  table_name = 0x0, column_name = 0x0, datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos = 0,
internalquery= 0x0, saved_errno = 0, assoc_context = 0x0}
 
(gdb) p errordata
$3 = {{elevel = 22, output_to_server = true, output_to_client = false, show_funcname = false, hide_stmt = false,
hide_ctx= false, filename = 0x9c4030 "origin.c", lineno = 591, 
 
    funcname = 0x9c46e0 "CheckPointReplicationOrigin", domain = 0x9ac810 "postgres-11", context_domain = 0x9ac810
"postgres-11",sqlerrcode = 4293, 
 
    message = 0x27b0e40 "could not write to file \"pg_logical/replorigin_checkpoint.tmp\": No space left on device",
detail= 0x0, detail_log = 0x0, hint = 0x0, context = 0x0, 
 
    message_id = 0x8a7aa8 "could not write to file \"%s\": %m", ...

I ought to have remembered that it *was* in fact out of space this AM when this
core was dumped (due to having not touched it since scheduling transition to
this VM last week).

I want to say I'm almost certain it wasn't ENOSPC in other cases, since,
failing to find log output, I ran df right after the failure.

But that gives me an idea: is it possible there's an issue with files being
held opened by worker processes ?  Including by parallel workers?  Probably
WALs, even after they're rotated ?  If there were worker processes holding
opened lots of rotated WALs, that could cause ENOSPC, but that wouldn't be
obvious after they die, since the space would then be freed.

Justin

pgsql-hackers by date:

From: Nikita Glukhov
Date: 24 July 2019, 01:48:26
Subject: Re: Support for jsonpath .datetime() method

From: Tom Lane
Date: 24 July 2019, 02:13:51
Subject: Re: pgbench tests vs Windows

Re: stress test for parallel workers - Mailing list pgsql-hackers

Previous

Next