Re: [HACKERS] kqueue - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] kqueue
Date
Msg-id 16202.1579538660@sss.pgh.pa.us
In response to Re: [HACKERS] kqueue  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: [HACKERS] kqueue  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I took this patch for a quick spin on macOS.  The result was that the
> test suite hangs in the test src/test/recovery/t/017_shm.pl.  I didn't
> see any mentions of this anywhere in the thread, but that test is newer
> than the beginning of this thread.  Can anyone confirm or deny this
> issue?  Is it specific to macOS perhaps?

Yeah, I duplicated the problem in macOS Catalina (10.15.2), using today's
HEAD.  The core regression tests pass, as do the earlier recovery tests
(I didn't try a full check-world though).  Somewhere early in 017_shm.pl,
things freeze up with four postmaster-child processes stuck in 100%-
CPU-consuming loops.  I captured stack traces:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbb6 libsystem_kernel.dylib`kqueue + 10
    frame #1: 0x0000000105511533 postgres`CreateWaitEventSet(context=<unavailable>, nevents=<unavailable>) at latch.c:622:19 [opt]
    frame #2: 0x0000000105511305 postgres`WaitLatchOrSocket(latch=0x0000000112e02da4, wakeEvents=41, sock=-1, timeout=237000, wait_event_info=83886084) at latch.c:389:22 [opt]
    frame #3: 0x00000001054a7073 postgres`CheckpointerMain at checkpointer.c:514:10 [opt]
    frame #4: 0x00000001052da390 postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:461:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6554dbce libsystem_kernel.dylib`kevent + 10
    frame #1: 0x0000000105511ddc postgres`WaitEventAdjustKqueue(set=0x00007fc8e8805920, event=0x00007fc8e8805958, old_events=<unavailable>) at latch.c:1034:7 [opt]
    frame #2: 0x0000000105511638 postgres`AddWaitEventToSet(set=<unavailable>, events=<unavailable>, fd=<unavailable>, latch=<unavailable>, user_data=<unavailable>) at latch.c:778:2 [opt]
    frame #3: 0x0000000105511342 postgres`WaitLatchOrSocket(latch=0x0000000112e030f4, wakeEvents=41, sock=-1, timeout=200, wait_event_info=83886083) at latch.c:397:3 [opt]
    frame #4: 0x00000001054a6d69 postgres`BackgroundWriterMain at bgwriter.c:304:8 [opt]
    frame #5: 0x00000001052da38b postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:456:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff65549c66 libsystem_kernel.dylib`close + 10
    frame #1: 0x0000000105511466 postgres`WaitLatchOrSocket [inlined] FreeWaitEventSet(set=<unavailable>) at latch.c:660:2 [opt]
    frame #2: 0x000000010551145d postgres`WaitLatchOrSocket(latch=0x0000000112e03444, wakeEvents=<unavailable>, sock=-1, timeout=5000, wait_event_info=83886093) at latch.c:432 [opt]
    frame #3: 0x00000001054b8685 postgres`WalWriterMain at walwriter.c:256:10 [opt]
    frame #4: 0x00000001052da39a postgres`AuxiliaryProcessMain(argc=2, argv=0x00007ffeea9dded0) at bootstrap.c:467:4 [opt]

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff655515be libsystem_kernel.dylib`__select + 10
    frame #1: 0x00000001056a6191 postgres`pg_usleep(microsec=<unavailable>) at pgsleep.c:56:10 [opt]
    frame #2: 0x00000001054abe12 postgres`backend_read_statsfile at pgstat.c:5720:3 [opt]
    frame #3: 0x00000001054adcc0 postgres`pgstat_fetch_stat_dbentry(dbid=<unavailable>) at pgstat.c:2431:2 [opt]
    frame #4: 0x00000001054a320c postgres`do_start_worker at autovacuum.c:1248:20 [opt]
    frame #5: 0x00000001054a2639 postgres`AutoVacLauncherMain [inlined] launch_worker(now=632853327674576) at autovacuum.c:1357:9 [opt]
    frame #6: 0x00000001054a2634 postgres`AutoVacLauncherMain(argc=<unavailable>, argv=<unavailable>) at autovacuum.c:769 [opt]
    frame #7: 0x00000001054a1ea7 postgres`StartAutoVacLauncher at autovacuum.c:415:4 [opt]

I'm not sure how much faith to put in the last couple of those, as
stopping the earlier processes could perhaps have had side-effects.
But evidently 017_shm.pl is doing something that interferes with
our ability to create kqueue-based WaitEventSets.

            regards, tom lane


