AW: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use - Mailing list pgsql-bugs

From Hans Buschmann
Subject AW: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use
Date
Msg-id D2B9F2A20670C84685EF7D183F2949E202569F21@gigant.nidsa.net
In response to Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use (Mithun Cy <mithun.cy@gmail.com>)
List pgsql-bugs


Over the weekend, I investigated further:

It seems that huge pages are NOT the cause of this problem.

The problem is reproducible only ONCE; after a database restart it disappears.

After reinstalling the original pg_basebackup on another test VM, the problem reappeared once.

Here is the start of the error log:

CPS PRD 2019-02-24 12:11:57 CET  00000  1:> LOG:  database system was interrupted; last known up at 2019-02-17 16:14:05 CET
CPS PRD 2019-02-24 12:12:16 CET  00000  2:> LOG:  entering standby mode
CPS PRD 2019-02-24 12:12:16 CET  00000  3:> LOG:  redo starts at 0/23000028
CPS PRD 2019-02-24 12:12:16 CET  00000  4:> LOG:  consistent recovery state reached at 0/23000168
CPS PRD 2019-02-24 12:12:16 CET  00000  5:> LOG:  invalid record length at 0/24000060: wanted 24, got 0
CPS PRD 2019-02-24 12:12:16 CET  00000  9:> LOG:  database system is ready to accept read only connections
CPS PRD 2019-02-24 12:12:16 CET  3D000  1:> FATAL:  database 16384 does not exist
CPS PRD 2019-02-24 12:12:16 CET  00000 10:> LOG:  background worker "autoprewarm worker" (PID 3968) exited with exit code 1
CPS PRD 2019-02-24 12:12:16 CET  00000  1:> LOG:  autoprewarm successfully prewarmed 0 of 12402 previously-loaded blocks
CPS PRD 2019-02-24 12:12:17 CET  XX000  1:> FATAL:  could not connect to the primary server: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.27.155", user "replicator", SSL off
CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment
CPS PRD 2019-02-24 12:12:17 CET  00000 11:> LOG:  background worker "autoprewarm worker" (PID 3296) exited with exit code 1
CPS PRD 2019-02-24 12:12:17 CET  XX000  1:> FATAL:  could not connect to the primary server: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.27.155", user "replicator", SSL off
CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment
CPS PRD 2019-02-24 12:12:17 CET  00000 12:> LOG:  background worker "autoprewarm worker" (PID 2756) exited with exit code 1
CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment
...
(PS: replication was not set up correctly, which caused the replication-related errors.)

It seems that an outdated autoprewarm.blocks file causes the problem.

After a restart, the autoprewarm.blocks file seems to be rewritten, so the next start gives no error.

As a test, I copied the erroneous autoprewarm.blocks file back into the data directory, and the problem reappeared.


The autoprewarm.blocks file was not corrupted or moved around manually; it is a leftover from the preceding test installation.

On this instance I had installed a copy of the production database under 11.2.
When switching to production, I dropped the test database and restored the current one with pg_restore.

This left the previous autoprewarm.blocks file in the data directory.

On the first start, the autoprewarm.blocks file does not match the newly restored database (perhaps this is the cause of the fatal error "database 16384 does not exist").

So the problem seems to lie in how the autoprewarm.blocks file is checked on initial startup.
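A minimal Python sketch of that diagnosis: it lists database OIDs that appear in an autoprewarm.blocks file but no longer exist, which is what a dropped-and-restored database would leave behind. It assumes the file is plain text with a "<<N>>" header giving the block count, followed by one "database,tablespace,filenode,forknum,blocknum" line per block, and that database OID 0 marks shared catalogs. That layout is an assumption based on pg_prewarm's autoprewarm source and should be checked against the server version in use; the function name stale_database_oids and the sample numbers are hypothetical.

```python
from collections import Counter

# Assumed autoprewarm.blocks layout (verify against your pg_prewarm version):
#   <<N>>
#   database,tablespace,filenode,forknum,blocknum   (N lines)
SHARED_DB_OID = 0  # assumed marker for shared (global) relations


def stale_database_oids(dump_text, existing_oids):
    """Return {database_oid: block_count} for OIDs absent from existing_oids."""
    lines = dump_text.strip().splitlines()
    if not lines or not (lines[0].startswith("<<") and lines[0].endswith(">>")):
        raise ValueError("missing <<N>> header line")
    declared = int(lines[0][2:-2])
    entries = lines[1:]
    if len(entries) != declared:
        raise ValueError(f"header declares {declared} blocks, found {len(entries)}")

    counts = Counter()
    for line in entries:
        fields = [int(f) for f in line.split(",")]
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {line!r}")
        counts[fields[0]] += 1  # first field is the database OID

    # Shared-catalog blocks are not tied to a database, so never report them.
    return {
        oid: n
        for oid, n in counts.items()
        if oid != SHARED_DB_OID and oid not in existing_oids
    }


if __name__ == "__main__":
    # Made-up sample: two blocks still point at the dropped database 16384.
    sample = "<<3>>\n0,1664,1262,0,0\n16384,1663,16385,0,0\n16384,1663,16385,0,1\n"
    print(stale_database_oids(sample, existing_oids={13000, 16401}))  # prints {16384: 2}
```

The set of existing OIDs could come from SELECT oid FROM pg_database; on the instance from the log above, a file still listing blocks for database 16384 would be reported by this check.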

This seems easy to reproduce:

- Install/create a database with autoprewarm on and pg_prewarm loaded.
- Fill the autoprewarm cache with some data
- pg_dump the database
- drop the database
- create the database and pg_restore it from the dump
- restart the instance: the logs are flooded with errors
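The steps above could be scripted roughly as in the Python sketch below, which only prints the commands unless dry_run=False. It adds one step taken from the manual test described earlier: since autoprewarm.blocks appears to be rewritten on shutdown, the sketch saves the old file and copies it back before the restart. It assumes the PostgreSQL client tools (psql, pg_dump, dropdb, createdb, pg_restore, pg_ctl) are on PATH and that CREATE EXTENSION pg_prewarm has been run in the test database so autoprewarm_dump_now() is callable; the database name, paths and the helper names build_plan and execute are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_plan(dbname, dumpfile, datadir, saved_blocks):
    """Return the reproduction steps as ("run", argv) or ("copy", src, dst) tuples."""
    blocks = str(Path(datadir) / "autoprewarm.blocks")
    return [
        # Ask autoprewarm to write autoprewarm.blocks now (pg_prewarm function).
        ("run", ["psql", "-d", dbname, "-c", "SELECT autoprewarm_dump_now();"]),
        # Keep the file that still references the old database OID.
        ("copy", blocks, saved_blocks),
        ("run", ["pg_dump", "-Fc", "-f", dumpfile, dbname]),
        ("run", ["dropdb", dbname]),
        ("run", ["createdb", dbname]),
        ("run", ["pg_restore", "-d", dbname, dumpfile]),
        # Shutdown appears to rewrite autoprewarm.blocks, so put the old one back.
        ("run", ["pg_ctl", "-D", datadir, "stop"]),
        ("copy", saved_blocks, blocks),
        ("run", ["pg_ctl", "-D", datadir, "start"]),
    ]


def execute(plan, dry_run=True):
    """Print each step; only perform it when dry_run is False."""
    for step in plan:
        if step[0] == "run":
            print("$", " ".join(step[1]))
            if not dry_run:
                subprocess.run(step[1], check=True)
        else:
            _, src, dst = step
            print(f"copy {src} -> {dst}")
            if not dry_run:
                shutil.copyfile(src, dst)


if __name__ == "__main__":
    # Hypothetical names and paths; adjust them before using dry_run=False.
    execute(build_plan("testdb", "testdb.dump", "/path/to/data", "old_autoprewarm.blocks"))
```

Keeping the plan as data makes it easy to review the exact commands on a test VM before letting it touch a data directory.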

I have not investigated the source code further, as my skills there are limited so far...


Thanks

Hans Buschmann
