Re: Simplifying replication - Mailing list pgsql-hackers

From Dimitri Fontaine
Subject Re: Simplifying replication
Msg-id m28w1ushwz.fsf@2ndQuadrant.fr
In response to Simplifying replication  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Hi,

Josh Berkus <josh@agliodbs.com> writes:
> It is critical that we make replication easier to set up, administrate and
> monitor than it currently is.  In my conversations with people, this is more
> important to our users and the adoption of PostgreSQL than synchronous
> replication is.

I want to say a big big +1 here. The way replication and PITR setup are
implemented now is a very good prototype; it's time to consolidate and
get to something usable by normal people, as opposed to full-time
PostgreSQL geeks.

Well, the current setup offers lots of flexibility, which we'd better not
lose in the process, but the simple setup simply does not exist yet.

> 1. Any postgresql standalone server can become a replication master simply
> by enabling replication connections in pg_hba.conf.  No other configuration
> is required, and no server restart is required.

That sounds as simple as changing the default wal_level to hot_standby,
and the default max_wal_senders to non-zero.

> 2. Should I choose to adjust master configuration, for say performance
> reasons, most replication variables (including ones like wal_keep_segments)
> should be changeable without a server restart.

Does anybody know how difficult that is, without having to spend lots of
time studying the source code with that question in mind?

> 3. I can configure a standby by copying the same postgresql.conf on the
> master.  I only have to change a single configuration variable (the
> primary_conninfo, or maybe a replication_mode setting) in order to start the
> server in standby mode.  GUCs which apply only to masters are ignored.
>
> 4. I can start a new replica off the master by running a single command-line
> utility on the standby and giving it connection information to the master.
> Using this connection, it should be able to start a backup snapshot, copy
> the entire database and any required logs, and then come up in standby mode.
> All that should be required for this is one or two highport connections to
> the master.  No recovery.conf file is required, or exists.

There's a prototype that streams a base backup over a libpq connection; I
think someone here wanted to integrate that into the replication
protocol itself. It should be doable with a simple libpq connection,
fully automated.

The pg_basebackup Python client software is 100 lines of code. It's
mainly a recursive query to get the list of files on the master, plus
two server-side functions to fetch compressed binary file chunks. On the
client side, a loop decompresses the chunks and writes them at the right
place. That's it.
 http://github.com/dimitri/pg_basebackup/blob/master/pg_basebackup.py
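
As a rough illustration of the client loop described above (this is not
the actual pg_basebackup.py code), here is a minimal sketch. It assumes
zlib compression and fixed-size chunks; read_chunk is a hypothetical
in-memory stand-in for the server-side functions, and CHUNK_SIZE is an
arbitrary value chosen for the example.

```python
import io
import zlib

CHUNK_SIZE = 8192  # assumption: arbitrary chunk size for the sketch


def read_chunk(files, path, offset, size=CHUNK_SIZE):
    """Hypothetical stand-in for a server-side function.

    Returns one zlib-compressed chunk of the file, or a compressed
    empty string once the offset is past the end of the file.
    """
    return zlib.compress(files[path][offset:offset + size])


def fetch_file(files, path, out):
    """Request chunks in order, decompress them, write each at its offset."""
    offset = 0
    while True:
        raw = zlib.decompress(read_chunk(files, path, offset))
        if not raw:
            break  # an empty chunk means the whole file has been read
        out.seek(offset)
        out.write(raw)
        offset += len(raw)
    return offset


def fetch_backup(files):
    """Copy every file in the listing; returns {path: contents}."""
    result = {}
    for path in sorted(files):
        out = io.BytesIO()
        fetch_file(files, path, out)
        result[path] = out.getvalue()
    return result
```

In a real client the files mapping would be replaced by the file list
query and by calls over the libpq connection, and out would be a file
opened under the standby's data directory.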

I could prepare a patch given some advice on the replication protocol
integration. For one, is streaming a base backup something that
walsender should care about?

> 5. I can to cause the standby to fail over with a single command to the
> failover server.  If this is a trigger file, then it already has a default
> path to the trigger file in postgresql.conf, so that this does not require
> reconfiguration and restart of the standby at crisis time. Ideally, I use a
> "pg_failover" command or something similar.

This feature is in walmgr.py from Skytools and it's something necessary
to have in -core now that we have failover standby capacity. Much
agreed, and the pg_failover command is a good idea.

BTW, do we have a clear idea of how to implement pg_ping, and should it
report the current WAL location(s) of a standby?

> 6. Should I decide to make the standby the new master, this should also be
> possible with a single command and a one-line configuration on the other
> standbys.  To aid this, we have an easy way to tell which standby in a group
> are most "caught up".  If I try to promote the wrong standby (it's behind or
> somehow incompatible), it should fail with an appropriate message.

That needs a way to define a group of standbys. There's nothing there
that makes them know about each other. That could fall out of the
automated registration of them in a shared catalog on the master, with
this shared catalog spread over (hard-coded) asynchronous replication
(sync == disaster here). But there's no agreement on this feature yet.

Then you need a way to organise them in groups in this shared catalog,
and you need to ask your network admins to make it so that they can
communicate with each other.

Now say we have pg_ping (or another tool) returning the current recv,
applied and synced LSNs: it would then be possible for any standby to
figure out which other ones must be shot in case you fail over here. The
failover command could list those other standbys in the group that
you're behind, and with a force command allow you to still fail over to
this one. Now you have to STONITH the ones listed, but that's your
problem after all.

Then, of course, any standby that's not in the same group as the one
that you failed over to has to be checked and resynced.
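
The LSN comparison behind that failover check can be sketched as
follows. This is only an illustration under assumptions: no pg_ping
exists, so the received mapping (standby name to its last received WAL
location, in the usual "hi/lo" hexadecimal text form) is hypothetical
input, as are the function names.

```python
def parse_lsn(text):
    """Convert an 'hi/lo' hexadecimal WAL location to a comparable integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def standbys_ahead(candidate, received):
    """Return the standbys in the group that received more WAL than candidate."""
    mine = parse_lsn(received[candidate])
    return sorted(
        name
        for name, lsn in received.items()
        if name != candidate and parse_lsn(lsn) > mine
    )


def failover(candidate, received, force=False):
    """Refuse to promote a standby that is behind, unless forced.

    Returns the list of standbys that are ahead of the candidate, which
    are the ones that would have to be fenced after a forced failover.
    """
    ahead = standbys_ahead(candidate, received)
    if ahead and not force:
        raise RuntimeError(
            "%s is behind: %s" % (candidate, ", ".join(ahead))
        )
    return ahead
```

For example, with received = {"a": "0/3000000", "b": "0/2FFFFFF"},
promoting "b" without force raises an error naming "a", while promoting
"a" goes through with nothing left to fence.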

> 7. Should I choose to use archive files as well as streaming replication,
> the utilities to manage them (such as pg_archivecleanup and pg_standby) are
> built and installed with PostgreSQL by default, and do not require complex
> settings with escape codes.

Now that PITR has been in for a long enough time, we *need* to take it
to the next step integration-wise. By that I mean that we have to support
internal commands and provide reasonable default implementations of the
different scripts needed (in portable C, hence "internal").

There are too many pitfalls in this part of the setup to be serious in
documenting them all and expecting people to come up with bash or perl
implementations that avoid them all. That used to be good enough, but
Josh is right, we need to get even better!
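
The post does not enumerate the pitfalls. As an illustration only, the
sketch below handles two that are commonly cited for archive scripts:
never overwrite a segment that is already archived, and never leave a
half-written file under its final name. It is written in Python rather
than the portable C proposed above, and archive_segment is a
hypothetical helper that would be given the values PostgreSQL
substitutes for %p and %f in archive_command.

```python
import os
import shutil
import tempfile


def archive_segment(src, archive_dir, name):
    """Copy one WAL segment into archive_dir; return True on success."""
    dest = os.path.join(archive_dir, name)
    if os.path.exists(dest):
        return False  # refuse to overwrite an already archived segment
    tmp = dest + ".tmp"
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # make sure the data is on disk
    os.rename(tmp, dest)  # the final name appears only once complete
    return True


if __name__ == "__main__":
    # Tiny demonstration with a fake segment in a temporary directory.
    workdir = tempfile.mkdtemp()
    segment = os.path.join(workdir, "000000010000000000000001")
    with open(segment, "wb") as f:
        f.write(b"\0" * 1024)
    archive = os.path.join(workdir, "archive")
    os.mkdir(archive)
    print(archive_segment(segment, archive, os.path.basename(segment)))
    print(archive_segment(segment, archive, os.path.basename(segment)))
```

A wrapper script would turn the boolean into the exit status, since
PostgreSQL treats only a zero exit status from archive_command as
success.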

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

