Thread: Re: BUG #15641: Autoprewarm worker fails to start on Windows withhuge pages in use Old PostgreSQL community/pgsql-bugs x

Re: BUG #15641: Autoprewarm worker fails to start on Windows withhuge pages in use Old PostgreSQL community/pgsql-bugs x

From

Mithun Cy

Date:

18 March 2019, 07:04:18

t tOn Mon, Feb 25, 2019 at 12:10 AM Mithun Cy <mithun.cy@gmail.com> wrote:

Thanks Hans, for a simple reproducible tests.

The "worker.bgw_restart_time" is never set for autoprewarm workers so on error it get restarted after some period of time (default behavior). Since database itself is dropped our attempt to connect to that database failed and then worker exited. But again got restated by postmaster then we start seeing above DSM segment error.

I think every autoprewarm worker should be set with "worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there shall not be repeated prewarm attempt of a dropped database. I will try to think further and submit a patch for same.

Here is the patch for same,

autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per database and wait until it exit by calling WaitForBackgroundWorkerShutdown. The call WaitForBackgroundWorkerShutdown cannot handle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED from the call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next restarted worker keep failing going in a unending loop.

I think there is no need to restart at all. Following are the normal error we might encounter.

1. Connecting database is droped -- So we need to skip to next database which master will do by starting a new wroker. So not needed.

2. Relation is droped -- try_relation_open(reloid, AccessShareLock) is used so error due to dropped relation is handled also avoids concurrent truncation.

3. smgrexists is used before reading from a fork file. Again error is handled.

4. before reading the block we have check as below. So previously truncated pages will not be read again.

/* Check whether blocknum is valid and within fork file size. */
if (blk->blocknum >= nblocks)

I think if any other unexpected errors occurs that should be fatal so restarting will not be correcting same. Hence there is no need to restart the per database worker process.

I tried to dig why we did not set it earlier. It used to be never restart, but it changed after fixing comments [1]. At that time we did not make explicit database connection per worker and did not handle many error cases as now. So it appeared fair. But, when code changed to make database connection per worker, we should have set every worker with BGW_NEVER_RESTART. Which I think was a mistake.

NOTE : On zero exit status we will not restart the bgworker (see @CleanupBackgroundWorker@ and @maybe_start_bgworkers@)

[1] https://www.postgresql.org/message-id/CA%2BTgmoYNF_wfdwQ3z3713zKy2j0Z9C32WJdtKjvRWzeY7JOL4g%40mail.gmail.com

Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: BUG #15641: Autoprewarm worker fails to start on Windows withhuge pages in use Old PostgreSQL community/pgsql-bugs x

From

Robert Haas

Date:

18 March 2019, 15:31:26

On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
> autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per
databaseand wait until it exit by calling WaitForBackgroundWorkerShutdown.  The call WaitForBackgroundWorkerShutdown
cannothandle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED
fromthe call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next
restartedworker keep failing going in a unending loop. 

Ugh, that seems like a silly oversight.  Does it fix the reported problem?

If I understand correctly, the commit message would be something like this:

==
Don't auto-restart per-database autoprewarm workers.

We should try to prewarm each database only once.  Otherwise, if
prewarming fails for some reason, it will just keep retrying in an
infnite loop.  The existing code was intended to implement this
behavior, but because it neglected to set worker.bgw_restart_time, the
per-database workers keep restarting, contrary to what was intended.

Mithun Cy, per a report from Hans Buschmann
==

Does that sound right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #15641: Autoprewarm worker fails to start on Windows withhuge pages in use Old PostgreSQL community/pgsql-bugs x

From

Robert Haas

Date:

18 March 2019, 15:31:26

On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
> autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per
databaseand wait until it exit by calling WaitForBackgroundWorkerShutdown.  The call WaitForBackgroundWorkerShutdown
cannothandle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED
fromthe call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next
restartedworker keep failing going in a unending loop. 

Ugh, that seems like a silly oversight.  Does it fix the reported problem?

If I understand correctly, the commit message would be something like this:

==
Don't auto-restart per-database autoprewarm workers.

We should try to prewarm each database only once.  Otherwise, if
prewarming fails for some reason, it will just keep retrying in an
infnite loop.  The existing code was intended to implement this
behavior, but because it neglected to set worker.bgw_restart_time, the
per-database workers keep restarting, contrary to what was intended.

Mithun Cy, per a report from Hans Buschmann
==

Does that sound right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #15641: Autoprewarm worker fails to start on Windows withhuge pages in use Old PostgreSQL community/pgsql-bugs x

From

Mithun Cy

Date:

18 March 2019, 17:42:24

Thanks Robert,

On Mon, Mar 18, 2019 at 9:01 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
> autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per database and wait until it exit by calling WaitForBackgroundWorkerShutdown. The call WaitForBackgroundWorkerShutdown cannot handle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED from the call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next restarted worker keep failing going in a unending loop.

Ugh, that seems like a silly oversight. Does it fix the reported problem?

-- Yes this fixes the reported issue, Hans Buschmann has given below steps to reproduce.

> This seems easy to reproduce:
>
> - Install/create a database with autoprewarm on and pg_prewarm loaded.
> - Fill the autoprewarm cache with some data
> - pg_dump the database
> - drop the database
> - create the database and pg_restore it from the dump
> - start the instance and logs are flooded

-- It is explained earlier [1] that they used older autoprewarm.blocks which was generated before drop database. So on restrart autoprewarm worker failed to connect to droped database and then lead to retry loop. This patch should fix same.

NOTE : Also, another kind of error user might see because of same bug is, restarted worker getting connected to next database in autoprewarm.blocks because autoprewarm master updated shared data "apw_state->database = current_db;" to start new worker for next database. Both restarted worker and newly created worker will connect to same database(next one) and try to load same pages. Hence end up with spurious log messages like "LOG: autoprewarm successfully prewarmed 13 of 11 previously-loaded blocks"

If I understand correctly, the commit message would be something like this:

==
Don't auto-restart per-database autoprewarm workers.

We should try to prewarm each database only once. Otherwise, if
prewarming fails for some reason, it will just keep retrying in an
infnite loop. The existing code was intended to implement this
behavior, but because it neglected to set worker.bgw_restart_time, the
per-database workers keep restarting, contrary to what was intended.

Mithun Cy, per a report from Hans Buschmann
==

Does that sound right?

-- Yes I Agree.

[1] https://www.postgresql.org/message-id/D2B9F2A20670C84685EF7D183F2949E202569F21%40gigant.nidsa.net

Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com

Re: BUG #15641: Autoprewarm worker fails to start on Windows withhuge pages in use Old PostgreSQL community/pgsql-bugs x

From

Mithun Cy

Date:

18 March 2019, 17:42:24

Thanks Robert,

On Mon, Mar 18, 2019 at 9:01 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
> autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per database and wait until it exit by calling WaitForBackgroundWorkerShutdown. The call WaitForBackgroundWorkerShutdown cannot handle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED from the call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next restarted worker keep failing going in a unending loop.

Ugh, that seems like a silly oversight. Does it fix the reported problem?

-- Yes this fixes the reported issue, Hans Buschmann has given below steps to reproduce.

If I understand correctly, the commit message would be something like this:

==
Don't auto-restart per-database autoprewarm workers.

We should try to prewarm each database only once. Otherwise, if
prewarming fails for some reason, it will just keep retrying in an
infnite loop. The existing code was intended to implement this
behavior, but because it neglected to set worker.bgw_restart_time, the
per-database workers keep restarting, contrary to what was intended.

Mithun Cy, per a report from Hans Buschmann
==

Does that sound right?

-- Yes I Agree.

[1] https://www.postgresql.org/message-id/D2B9F2A20670C84685EF7D183F2949E202569F21%40gigant.nidsa.net

Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com

Re: BUG #15641: Autoprewarm worker fails to start on Windows withhuge pages in use Old PostgreSQL community/pgsql-bugs x

From

Robert Haas

Date:

18 March 2019, 19:35:39

On Mon, Mar 18, 2019 at 1:43 PM Mithun Cy <mithun.cy@gmail.com> wrote:
>> Does that sound right?
>
> -- Yes I Agree.

Committed with a little more tweaking of the commit message, and
back-patched to v11.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #15641: Autoprewarm worker fails to start on Windows withhuge pages in use Old PostgreSQL community/pgsql-bugs x

From

Robert Haas

Date:

18 March 2019, 19:35:39

On Mon, Mar 18, 2019 at 1:43 PM Mithun Cy <mithun.cy@gmail.com> wrote:
>> Does that sound right?
>
> -- Yes I Agree.

Committed with a little more tweaking of the commit message, and
back-patched to v11.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company