Thread: Re: BUG #15641: Autoprewarm worker fails to start on Windows withhuge pages in use Old PostgreSQL community/pgsql-bugs x

t tOn Mon, Feb 25, 2019 at 12:10 AM Mithun Cy <mithun.cy@gmail.com> wrote:
Thanks Hans, for a simple reproducible tests.

The  "worker.bgw_restart_time" is never set for autoprewarm workers so on error it get restarted after some period of time (default behavior). Since database itself is dropped our attempt to connect to that database failed and then worker exited. But again got restated by postmaster then we start seeing above DSM segment error.

I think every autoprewarm worker should be set with "worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there shall not be repeated prewarm attempt of a dropped database. I will try to think further and submit a patch for same.

Here is the patch for same,





--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
Attachment
On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
> autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per
databaseand wait until it exit by calling WaitForBackgroundWorkerShutdown.  The call WaitForBackgroundWorkerShutdown
cannothandle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED
fromthe call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next
restartedworker keep failing going in a unending loop. 

Ugh, that seems like a silly oversight.  Does it fix the reported problem?

If I understand correctly, the commit message would be something like this:

==
Don't auto-restart per-database autoprewarm workers.

We should try to prewarm each database only once.  Otherwise, if
prewarming fails for some reason, it will just keep retrying in an
infnite loop.  The existing code was intended to implement this
behavior, but because it neglected to set worker.bgw_restart_time, the
per-database workers keep restarting, contrary to what was intended.

Mithun Cy, per a report from Hans Buschmann
==

Does that sound right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
> autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per
databaseand wait until it exit by calling WaitForBackgroundWorkerShutdown.  The call WaitForBackgroundWorkerShutdown
cannothandle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED
fromthe call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next
restartedworker keep failing going in a unending loop. 

Ugh, that seems like a silly oversight.  Does it fix the reported problem?

If I understand correctly, the commit message would be something like this:

==
Don't auto-restart per-database autoprewarm workers.

We should try to prewarm each database only once.  Otherwise, if
prewarming fails for some reason, it will just keep retrying in an
infnite loop.  The existing code was intended to implement this
behavior, but because it neglected to set worker.bgw_restart_time, the
per-database workers keep restarting, contrary to what was intended.

Mithun Cy, per a report from Hans Buschmann
==

Does that sound right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Thanks Robert,
On Mon, Mar 18, 2019 at 9:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
> autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per database and wait until it exit by calling WaitForBackgroundWorkerShutdown.  The call WaitForBackgroundWorkerShutdown cannot handle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED from the call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next restarted worker keep failing going in a unending loop.

Ugh, that seems like a silly oversight.  Does it fix the reported problem?

-- Yes this fixes the reported issue, Hans Buschmann has given below steps to reproduce.

> This seems easy to reproduce:
>
> - Install/create a database with autoprewarm on and pg_prewarm loaded.
> - Fill the autoprewarm cache with some data
> - pg_dump the database
> - drop the database
> - create the database and pg_restore it from the dump
> - start the instance and logs are flooded

-- It is explained earlier [1] that they used older autoprewarm.blocks which was generated before drop database. So on restrart autoprewarm worker failed to connect to droped database and then lead to retry loop. This patch should fix same.

NOTE : Also, another kind of error user might see because of same bug is, restarted worker getting connected to next database in autoprewarm.blocks because autoprewarm master updated shared data "apw_state->database = current_db;" to start new worker for next database. Both restarted worker and newly created worker will connect to same database(next one) and try to load same pages. Hence end up with spurious log messages like  "LOG:  autoprewarm successfully prewarmed 13 of 11 previously-loaded blocks"

If I understand correctly, the commit message would be something like this:

==
Don't auto-restart per-database autoprewarm workers.

We should try to prewarm each database only once.  Otherwise, if
prewarming fails for some reason, it will just keep retrying in an
infnite loop.  The existing code was intended to implement this
behavior, but because it neglected to set worker.bgw_restart_time, the
per-database workers keep restarting, contrary to what was intended.

Mithun Cy, per a report from Hans Buschmann
==

Does that sound right?

-- Yes I Agree.


--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
Thanks Robert,
On Mon, Mar 18, 2019 at 9:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
> autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per database and wait until it exit by calling WaitForBackgroundWorkerShutdown.  The call WaitForBackgroundWorkerShutdown cannot handle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED from the call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next restarted worker keep failing going in a unending loop.

Ugh, that seems like a silly oversight.  Does it fix the reported problem?

-- Yes this fixes the reported issue, Hans Buschmann has given below steps to reproduce.

> This seems easy to reproduce:
>
> - Install/create a database with autoprewarm on and pg_prewarm loaded.
> - Fill the autoprewarm cache with some data
> - pg_dump the database
> - drop the database
> - create the database and pg_restore it from the dump
> - start the instance and logs are flooded

-- It is explained earlier [1] that they used older autoprewarm.blocks which was generated before drop database. So on restrart autoprewarm worker failed to connect to droped database and then lead to retry loop. This patch should fix same.

NOTE : Also, another kind of error user might see because of same bug is, restarted worker getting connected to next database in autoprewarm.blocks because autoprewarm master updated shared data "apw_state->database = current_db;" to start new worker for next database. Both restarted worker and newly created worker will connect to same database(next one) and try to load same pages. Hence end up with spurious log messages like  "LOG:  autoprewarm successfully prewarmed 13 of 11 previously-loaded blocks"

If I understand correctly, the commit message would be something like this:

==
Don't auto-restart per-database autoprewarm workers.

We should try to prewarm each database only once.  Otherwise, if
prewarming fails for some reason, it will just keep retrying in an
infnite loop.  The existing code was intended to implement this
behavior, but because it neglected to set worker.bgw_restart_time, the
per-database workers keep restarting, contrary to what was intended.

Mithun Cy, per a report from Hans Buschmann
==

Does that sound right?

-- Yes I Agree.


--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 18, 2019 at 1:43 PM Mithun Cy <mithun.cy@gmail.com> wrote:
>> Does that sound right?
>
> -- Yes I Agree.

Committed with a little more tweaking of the commit message, and
back-patched to v11.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Mon, Mar 18, 2019 at 1:43 PM Mithun Cy <mithun.cy@gmail.com> wrote:
>> Does that sound right?
>
> -- Yes I Agree.

Committed with a little more tweaking of the commit message, and
back-patched to v11.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company