Thread: create database hangs forever on WSL - autovacuum deadlock?

create database hangs forever on WSL - autovacuum deadlock?

From
Alicja Kucharczyk
Date:
Hi All,
this is a problem from slack channel; it looks like a deadlock. Create database hangs infinitely showing a lock in pg_stat_activity and being mutually locked by autovacuum workers. executing pg_reload_conf couple times releasing the lock. Why sighup helps in this situation? is it a problem with windows memory management?

this is postgres running on windows wsl:  PostgreSQL 14.1 (Ubuntu 14.1-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit

2022-01-08 07:24:06.892 CET [2347] LOG:  process 2347 still waiting for RowExclusiveLock on object 1 of class 1262 of database 0 after 1000.918 ms
2022-01-08 07:24:06.892 CET [2347] DETAIL:  Processes holding the lock: 2308, 2318. Wait queue: 2117, 2347.
pid              | 2117
wait_event_type  | Lock
wait_event       | object
state            | active
backend_xid      |
backend_xmin     | 749
query_id         |
query            | create database mytest1;
backend_type     | client backend


postgres  2347  0.0  0.0 221280  2388 ?        Ss   07:24   0:00 postgres: 14/main: autovacuum worker  waiting
postgres  2308  0.0  0.0 221284  2420 ?        Ss   07:15   0:00 postgres: 14/main: autovacuum worker  waiting
postgres  2318  0.0  0.0 221284  2420 ?        Ss   07:16   0:00 postgres: 14/main: autovacuum worker  waiting
postgres  2347  0.0  0.0 221280  2388 ?        Ss   07:24   0:00 postgres: 14/main: autovacuum worker  waiting


2022-01-08 07:36:44.359 CET [2409] postgres@postgres LOG:  duration: 0.183 ms  statement: SELECT pg_reload_conf();
2022-01-08 07:36:44.359 CET [1561] LOG:  received SIGHUP, reloading configuration files
2022-01-08 07:36:44.361 CET [2117] postgres@postgres LOG:  process 2117 acquired ShareLock on object 1 of class 1262 of database 0 after 759697.185 ms
2022-01-08 07:36:44.361 CET [2117] postgres@postgres STATEMENT:  create database mytest1;
2022-01-08 07:36:44.361 CET [2347] LOG:  process 2347 still waiting for RowExclusiveLock on object 1 of class 1262 of database 0 after 758470.143 ms
2022-01-08 07:36:44.361 CET [2347] DETAIL:  Process holding the lock: 2117. Wait queue: 2347.
2022-01-08 07:36:46.191 CET [2409] postgres@postgres LOG:  duration: 0.146 ms  statement: SELECT pg_reload_conf();
2022-01-08 07:36:46.191 CET [1561] LOG:  received SIGHUP, reloading configuration files
2022-01-08 07:36:46.193 CET [2347] LOG:  process 2347 still waiting for RowExclusiveLock on object 1 of class 1262 of database 0 after 760301.845 ms
2022-01-08 07:36:46.193 CET [2347] DETAIL:  Process holding the lock: 2117. Wait queue: 2347.
2022-01-08 07:36:47.532 CET [2409] postgres@postgres LOG:  duration: 0.113 ms  statement: SELECT pg_reload_conf();
2022-01-08 07:36:47.533 CET [1561] LOG:  received SIGHUP, reloading configuration files
2022-01-08 07:36:47.534 CET [2347] LOG:  process 2347 still waiting for RowExclusiveLock on object 1 of class 1262 of database 0 after 761643.434 ms
2022-01-08 07:36:47.534 CET [2347] DETAIL:  Process holding the lock: 2117. Wait queue: 2347.
2022-01-08 07:36:52.496 CET [2409] postgres@postgres LOG:  duration: 0.141 ms  statement: SELECT pg_reload_conf();
2022-01-08 07:36:52.496 CET [1561] LOG:  received SIGHUP, reloading configuration files
2022-01-08 07:36:52.499 CET [2347] LOG:  process 2347 still waiting for RowExclusiveLock on object 1 of class 1262 of database 0 after 766607.664 ms
2022-01-08 07:36:52.499 CET [2347] DETAIL:  Process holding the lock: 2117. Wait queue: 2347.
2022-01-08 07:36:52.542 CET [2117] postgres@postgres LOG:  duration: 767877.903 ms  statement: create database mytest1;


pozdrawiam,
best regards,
mit freundlichen Grüßen,
Alicja Kucharczyk



Re: create database hangs forever on WSL - autovacuum deadlock?

From
Thomas Munro
Date:
On Sat, Jan 8, 2022 at 8:11 PM Alicja Kucharczyk
<zaledwie10minut@gmail.com> wrote:
> this is postgres running on windows wsl:  PostgreSQL 14.1 (Ubuntu 14.1-1.pgdg18.04+1) on x86_64-pc-linux-gnu,
compiledby gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
 

Is this WSL1 (some kind of Linux system call emulator running on an NT
kernel) or WSL2 (some kind of virtual machine running a Linux kernel)?



Re: create database hangs forever on WSL - autovacuum deadlock?

From
Alicja Kucharczyk
Date:
sob., 8 sty 2022 o 22:40 Thomas Munro <thomas.munro@gmail.com> napisał(a):
On Sat, Jan 8, 2022 at 8:11 PM Alicja Kucharczyk
<zaledwie10minut@gmail.com> wrote:
> this is postgres running on windows wsl:  PostgreSQL 14.1 (Ubuntu 14.1-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit

Is this WSL1 (some kind of Linux system call emulator running on an NT
kernel) or WSL2 (some kind of virtual machine running a Linux kernel)?

 lookd like WSL1:
  NAME      STATE           VERSION
* Ubuntu    Running         1

Re: create database hangs forever on WSL - autovacuum deadlock?

From
Thomas Munro
Date:
On Sun, Jan 9, 2022 at 2:15 PM Alicja Kucharczyk
<zaledwie10minut@gmail.com> wrote:
> sob., 8 sty 2022 o 22:40 Thomas Munro <thomas.munro@gmail.com> napisał(a):
>> On Sat, Jan 8, 2022 at 8:11 PM Alicja Kucharczyk
>> <zaledwie10minut@gmail.com> wrote:
>> > this is postgres running on windows wsl:  PostgreSQL 14.1 (Ubuntu 14.1-1.pgdg18.04+1) on x86_64-pc-linux-gnu,
compiledby gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit 
>>
>> Is this WSL1 (some kind of Linux system call emulator running on an NT
>> kernel) or WSL2 (some kind of virtual machine running a Linux kernel)?
>
>
>  lookd like WSL1:
>   NAME      STATE           VERSION
> * Ubuntu    Running         1

I don't think you're going to like this answer.  In PostgreSQL 14, we
started using a signalfd in an epoll set (multiplexed with other fds
for sockets etc) to receive "latch wakeups".  This works pretty well
on a real Linux kernel, but apparently it is not emulated well enough
to work on WSL1.  I don't know the exact reason why, but if someone
can figure it out I'd be interested in seeing if we can fix it (though
I thought everyone moved to WSL2 which has none of these problems?).
There is a WSL1 machine in the build farm, but it's using
-DWAIT_USE_POLL to avoid this problem; that's useless if you're trying
to run with stock PostgreSQL packages from Ubuntu or whatever, though,
it only helps if you compile from source.  For what it's worth,
running PostgreSQL compiled for Windows using Wine on a Unix system
also fails in various ways.  It turns out that emulating a foreign
kernel is super hard...  Previous discussion:

https://www.postgresql.org/message-id/flat/CAEP4nAymAZP1VEBNoWAQca85ZtU5YxuwS95%2BVu%2BXW%2B-eMfq_vQ%40mail.gmail.com