Re: psql: FATAL: the database system is starting up - Mailing list pgsql-general

From Tom K
Subject Re: psql: FATAL: the database system is starting up
Date
Msg-id CAE3EmBD+LN3Mx2W2MbqANt=oaZ6dBDDn-o3a=01naM6at8PZ9w@mail.gmail.com
In response to Re: psql: FATAL: the database system is starting up  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: psql: FATAL: the database system is starting up
Re: psql: FATAL: the database system is starting up
List pgsql-general


On Sat, Jun 1, 2019 at 9:55 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 5/31/19 7:53 PM, Tom K wrote:
>

>     There are two places to connect with the Patroni community: on github,
>     via Issues and PRs, and on channel #patroni in the PostgreSQL Slack. If
>     you're using Patroni, or just interested, please join us.
>
>
> Will post there as well.  Thank you.  My thinking was to post here first
> since I suspect the Patroni community will simply refer me back here
> given that the PostgreSQL errors are originating directly from PostgreSQL.
>
>
>     That being said, can you start the copied Postgres instance without
>     using the Patroni instrumentation?
>
>
> Yes, that is something I have been trying to do actually.  But I hit a
> dead end with the three errors above.
>
> So what I did was copy a single node's backed-up data files to
> */data/patroni* on the same node (this is the PostgreSQL data
> directory as defined through Patroni), then ran this
> (psql03 = 192.168.0.118):
>
> # sudo su - postgres
> $ /usr/pgsql-10/bin/postgres -D /data/patroni \
>     --config-file=/data/patroni/postgresql.conf \
>     --listen_addresses=192.168.0.118 --max_worker_processes=8 \
>     --max_locks_per_transaction=64 --wal_level=replica \
>     --track_commit_timestamp=off --max_prepared_transactions=0 --port=5432 \
>     --max_replication_slots=10 --max_connections=100 --hot_standby=on \
>     --cluster_name=postgres --wal_log_hints=on --max_wal_senders=10 -d 5

Why all the options?
That should be covered in postgresql.conf, no?
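
One way to answer that is to check which of those settings postgresql.conf already carries. The following is a minimal sketch, not a PostgreSQL or Patroni tool: `missing_settings` is a hypothetical helper name, and it assumes a flat configuration file (it does not follow `include` directives and matches names case-sensitively).

```shell
# Hypothetical helper (not part of PostgreSQL or Patroni).
# Usage: missing_settings <conf-file> <setting-name>...
# Prints each named setting that has no active (uncommented) line in the file.
# Assumption: "include" directives are not followed.
missing_settings() {
    conf="$1"
    shift
    for name in "$@"; do
        # postgresql.conf accepts both "name = value" and "name value".
        if ! grep -Eq "^[[:space:]]*${name}([[:space:]]*=|[[:space:]]+)" "$conf"; then
            echo "$name"
        fi
    done
}
```

For example, `missing_settings /data/patroni/postgresql.conf wal_level hot_standby max_wal_senders` would print only the names that are not set in the file, which are the ones that might still be needed on the command line.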

>
> This resulted in one of the 3 messages above.  Hence the post here.  If
> I can start a single instance, I should be fine since I could then 1)
> replicate over to the other two or 2) simply take a dump, reinitialize
> all the databases then restore the dump.
>

What if you move the recovery.conf file out?

Will try.
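
A minimal sketch of that step, assuming the PostgreSQL 10 behaviour where a file named exactly recovery.conf in the data directory puts the server into recovery on startup. The function name and the `.disabled` suffix are hypothetical choices for illustration; renaming instead of deleting keeps the original file available.

```shell
# Hypothetical helper (not part of PostgreSQL or Patroni).
# Usage: set_aside_recovery_conf <data-directory>
# Renames recovery.conf so the server no longer finds it at startup.
set_aside_recovery_conf() {
    datadir="$1"
    if [ -f "$datadir/recovery.conf" ]; then
        mv "$datadir/recovery.conf" "$datadir/recovery.conf.disabled"
        echo "moved"
    else
        echo "absent"
    fi
}
```

With the server stopped, this would be called as `set_aside_recovery_conf /data/patroni` before retrying the postgres command quoted above.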



The below looks like missing, corrupted, or incorrect files. Hard to tell
without knowing what Patroni did.

Storage disappeared from underneath these clusters.  The OS was of course still running from memory, making futile attempts to write to disk that would never complete.

My best guess is that Patroni or PostgreSQL was in the middle of some writes across the clusters when the failure occurred.
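
That guess can be checked against the files themselves. The following is a rough sketch under stated assumptions: both helper names are hypothetical, the layout is assumed to be PostgreSQL 10's (WAL segments and timeline history files under `pg_wal` in the data directory), and the history-file check only looks for lines that do not begin with a numeric timeline ID, which is a simplification of what the server validates.

```shell
# Hypothetical helpers (not part of PostgreSQL or Patroni).
# Assumption: PostgreSQL 10 layout, WAL in <datadir>/pg_wal.

# Usage: wal_segment_present <data-directory> <segment-name>
# Succeeds if the named WAL segment file exists.
wal_segment_present() {
    datadir="$1"
    segment="$2"
    [ -f "$datadir/pg_wal/$segment" ]
}

# Usage: suspect_history_lines <history-file>
# Prints lines that are neither blank, a "#" comment, nor starting
# with a numeric timeline ID followed by whitespace.
suspect_history_lines() {
    grep -Ev '^[[:space:]]*(#.*)?$|^[[:space:]]*[0-9]+[[:space:]]' "$1" || true
}
```

For instance, `wal_segment_present /data/patroni 000000010000000000000008` would show whether the segment the PSQL01 startup process is waiting for is on disk, and running `suspect_history_lines` over each `*.history` file under `pg_wal` may point to the line behind the PSQL03 syntax error.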



> Using the above procedure I get one of three error messages when using
> the data files of each node:
>
> [ PSQL01 ]
> postgres: postgres: startup process waiting for 000000010000000000000008
>
> [ PSQL02 ]
> PANIC:  replication checkpoint has wrong magic 0 instead of 307747550
>
> [ PSQL03 ]
> FATAL:  syntax error in history file: f2W
>
> And I can't start any one of them.
>
>
>
>      >
>      > Thx,
>      > TK
>      >
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>


--
Adrian Klaver
adrian.klaver@aklaver.com
