Thread: "FATAL: the database system is starting up" and hot_standby = on

"FATAL: the database system is starting up" and hot_standby = on

From
Ilnur Khanov
Date:
Hi,

There is a standalone server with Postgresql 9.6.3 on Windows 2016 Server.

I had been faced with "FATAL: the database system is starting up" after disk failure. This mailing list helped me to solve issue by setting "hot_standby = on" in postgresql.conf.
But I can't fully understand why it had worked.

Comments in conf says, that the setting has to be ignored in my case:
===
# - Standby Servers -
# These settings are ignored on a master server.
hot_standby = on # "on" allows queries during recovery
# (change requires restart)
===

But it is standalone installation, not slave, there is no replication. Though there is no recovery.conf and any sign of pending recovery nor in log, nor in postgres.exe output with debug level 5.

Why setting "hot_standby = on" had helped me with "FATAL: the database system is starting up" error on standalone server? Any ideas?

Re: "FATAL: the database system is starting up" and hot_standby = on

From
David Guyot
Date:
Hello.

I can't say why this tip worked, but I can say you that you shouldn't
bother about the "FATAL: the database system is starting up" message:
PgSQL routinely displays this message when starting up. "FATAL" is
called "gravity", that is to say it describes the importance of the
following message. "FATAL" is the highest importance and, indeed,
knowing if your database is up, or not, is of the highest importance,
but it is not a fatal error.

Regards.

Le jeudi 07 décembre 2017 à 10:49 +0500, Ilnur Khanov a écrit :
> Hi,
>
> There is a standalone server with Postgresql 9.6.3 on Windows 2016
> Server.
>
> I had been faced with "FATAL: the database system is starting up"
> after disk failure. This mailing list helped me to solve issue by
> setting "hot_standby = on" in postgresql.conf.
> But I can't fully understand why it had worked.
>
> Comments in conf says, that the setting has to be ignored in my case:
> ===
> # - Standby Servers -
> # These settings are ignored on a master server.
> hot_standby = on            # "on" allows queries during
> recovery
>                     # (change requires restart)
> ===
>
> But it is standalone installation, not slave, there is no
> replication. Though there is no recovery.conf and any sign of pending
> recovery nor in log, nor in postgres.exe output with debug level 5.
>
> Why setting "hot_standby = on" had helped me with "FATAL: the
> database system is starting up" error on standalone server? Any
> ideas?
>
--
David Guyot
Administrateur système / Sysadmin
Europe Camions Interactive / Stockway
Moulin Collot F-88500 Ambacourt
Tél : +33 (0)3 29 30 47 85

Re: "FATAL: the database system is starting up" and hot_standby = on

From
Ilnur Khanov
Date:
For me, when I saw "FATAL: the database system is starting up", nothing else hasn't been working: pgAdmin, psql, business apps.
Logs has been showing cycles: client connect - worker starts - "the-FATAL-error" - worker shutting down with exit code (1). No any informative errors.

I think, database engine is in some sort of recovery mode after crush, but I was waiting long enough - no CPU or HDD activity. If it's some sort of warning and waiting for me to check consistency and manually reset this mode flag, but then how to do this?!

More important: now problem is not solved, "hot_standby = on" is just workaround letting query databases. But PostgreSQL is still in some strange mode and I don't understand how serious is it and how resolve it correctly.

By "PostgreSQL is still in some strange mode" I mean that setting "hot_standby = off" returns "the-FATAL-error" and nothing works then.

Re: "FATAL: the database system is starting up" and hot_standby = on

From
Laurenz Albe
Date:
Ilnur Khanov wrote:
> For me, when I saw "FATAL: the database system is starting up", nothing else hasn't been working: pgAdmin, psql,
businessapps.
 

"FATAL" just means that a client connection is terminated.
In this case, because the database is not up.

> Logs has been showing cycles: client connect - worker starts - "the-FATAL-error" -
> worker shutting down with exit code (1). No any informative errors.
> 
> I think, database engine is in some sort of recovery mode after crush, but I was
> waiting long enough - no CPU or HDD activity. If it's some sort of warning and waiting for me to check consistency
andmanually reset this mode flag, but then how to do this?!
 
> 
> More important: now problem is not solved, "hot_standby = on" is just workaround letting query databases.
> But PostgreSQL is still in some strange mode and I don't understand how serious is it and how resolve it correctly.

You cannot modify the database, right?
And "SELECT pg_is_in_recovery();" returns TRUE, right?

Could you show us what is written to the log after a new startup?

Yours,
Laurenz Albe


Re: "FATAL: the database system is starting up" and hot_standby = on

From
Ilnur Khanov
Date:
Hi,

Thank you all for suggestions.
I've just disabled hot_standby and restarted server. Everything is OK now.

I assume there was continuous recovery mode.

You cannot modify the database, right?
No, I can. As long it was in this recovery mode with hot_standby=on I was able query and modify database, with hot_standby=off I couldn't even connect.
Here is another my wonder: relying on manual, it had to be read-only mode.

And "SELECT pg_is_in_recovery();" returns TRUE, right?
I don't know :( Can't check it anymore. May be. 
That is the question: If I can't connect with "FATAL: the database system is starting up" how I can check pg_is_in_recovery()?
What else I could check to ensure that postgres recovering databases and I just have to wait?
I was strongly confused by NO CPU and HDD activity during my wait.

So my conclusion is:
  If postgres says "FATAL: the database system is starting up" after failure without any other errors in logs and postgres output, than you can:
     - wait for some long period of time
         or
     - you can turn on hot_standby and use your database with care while it's recovering and turn hot_standby off when it finishes.

Hoping this thread helps someone sometime.

Re: "FATAL: the database system is starting up" and hot_standby = on

From
Laurenz Albe
Date:
Ilnur Khanov wrote:
> I've just disabled hot_standby and restarted server. Everything is OK now.
> 
> I assume there was continuous recovery mode.
> 
> > You cannot modify the database, right?
> 
> No, I can. As long it was in this recovery mode with hot_standby=on I was able query and modify database, with
hot_standby=offI couldn't even connect.
 
> Here is another my wonder: relying on manual, it had to be read-only mode.

Now that would be really strange if you could modify the database while in recovery mode.
Unfortunately you cannot test that now...

> That is the question: If I can't connect with "FATAL: the database system is starting up" how I can check
pg_is_in_recovery()?

If you get the fatal message you *know* the system is recovering.
That query is to ascertain you are still in recovery if you set hot_standby=on.

> So my conclusion is:
>   If postgres says "FATAL: the database system is starting up" after failure without any other errors in logs and
postgresoutput, than you can:
 
>      - wait for some long period of time
>          or
>      - you can turn on hot_standby and use your database with care while it's recovering and turn hot_standby off
whenit finishes.
 
> 
> Hoping this thread helps someone sometime.

In my opinion the main question is why crash recovery took so long.
The server log should be able to tell you more about it.

Is your checkpoint_timeout very high?
Was there a backup_label file in the data directory (now backup_label.old)?

Yours,
Laurenz Albe


Re: "FATAL: the database system is starting up" and hot_standby = on

From
Ilnur Khanov
Date:
Now that would be really strange if you could modify the database while in recovery mode.
Unfortunately you cannot test that now...
I tried that right after enabling  hot_standby, so databases was writable.

In my opinion the main question is why crash recovery took so long.
The server log should be able to tell you more about it.
server logs with debug level 5 didn't tell anything usefull.
there is no something in logs about recovery mode start, nor about recovery mode end. 
so it's hard to say how long crash recovery took. 
No more 24h, no less 2h. Databases are very small - about 60Gb. Resources are high available (SSD, Xeon E7), but not used.

Is your checkpoint_timeout very high?
It's default (5 min).
 
Was there a backup_label file in the data directory (now backup_label.old)?
No, there is no.