Re: Statistics about Streaming Replication deployments in production - Mailing list pgsql-general

From Tomas Vondra
Subject Re: Statistics about Streaming Replication deployments in production
Date
Msg-id 4E347E92.2080704@fuzzy.cz
In response to Statistics about Streaming Replication deployments in production  (Samba <saasira@gmail.com>)
List pgsql-general
On 28.7.2011 13:03, Samba wrote:
> One concern that is being raised by our management team is regarding
> the relative stability and 'industrial-strength' of streaming
> replication. Considering that this feature is just one year old, doubts
> are expressed about
>
>   * data integrity -- cancelled long running transactions on Primary
>     must not be applied on the standby

I'm not quite sure what you mean by "applied on the standby." Queries
that run on the primary and modify data (e.g. an INSERT) have to apply
their changes to the standby. That's how streaming replication works:
the standby replays the primary's WAL, so it maintains a binary copy of
the datafiles. If a query on the primary modifies the datafiles, the
change has to be applied to the standby even if the query is cancelled.

But those changes won't be visible on the standby, because the
transaction was not committed (just as you can't see them on the
primary).
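To illustrate the point, here is a toy sketch in Python. It is not how
PostgreSQL is implemented; the WalRecord and Standby names are made up
for this example, and real WAL records and visibility checks are far
more involved. It only shows that every change is replayed, while only
rows from committed transactions become visible.

```python
# Toy model only: hypothetical names, not PostgreSQL internals.
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    xid: int        # transaction id that produced the record
    kind: str       # "insert", "commit" or "abort"
    row: str = ""   # payload, used only by "insert" records


@dataclass
class Standby:
    heap: list = field(default_factory=list)     # (xid, row) pairs physically applied
    committed: set = field(default_factory=set)  # xids known to have committed

    def replay(self, rec: WalRecord) -> None:
        if rec.kind == "insert":
            # The change is applied no matter how the transaction ends.
            self.heap.append((rec.xid, rec.row))
        elif rec.kind == "commit":
            self.committed.add(rec.xid)
        # "abort": nothing is undone here; the row stays but is never visible.

    def visible_rows(self) -> list:
        return [row for xid, row in self.heap if xid in self.committed]


wal = [
    WalRecord(1, "insert", "a"),
    WalRecord(2, "insert", "b"),
    WalRecord(1, "commit"),
    WalRecord(2, "abort"),   # e.g. a cancelled long-running query
]

standby = Standby()
for rec in wal:
    standby.replay(rec)

print(len(standby.heap))       # 2 rows were physically applied
print(standby.visible_rows())  # ['a']
```

Both inserts reach the standby, but only the row from the committed
transaction is returned to a reader.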

>   *  reliability -- what if the network link is broken or one of the
>     pair got crashed when log-segments for a huge committed transaction
>     are being sent from master to standby?

The standby can get the changes either from the primary or from the WAL
archive. So even if the network link to the primary goes down, the
standby can get the data from the archive (assuming WAL archiving is
set up and the archive is reachable).
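The fallback idea can be sketched as below. Again this is only a toy
in Python with hypothetical names (fetch_segment, PrimaryDown); the
real standby decides on its own in which order to try the sources, and
the segment name is just a sample. The sketch assumes the primary had
already archived the segment before the link went down.

```python
# Toy illustration only: hypothetical names, not PostgreSQL internals.
class PrimaryDown(Exception):
    """Raised when the replication connection is not available."""


def stream_from_primary(segment, primary_wal, link_up):
    if not link_up:
        raise PrimaryDown("replication connection lost")
    return primary_wal[segment]


def fetch_segment(segment, primary_wal, archive, link_up):
    """Return (source, data) for a WAL segment, falling back to the archive."""
    try:
        return "primary", stream_from_primary(segment, primary_wal, link_up)
    except PrimaryDown:
        # Works only if the segment was archived before the outage.
        return "archive", archive[segment]


SEGMENT = "000000010000000000000003"       # sample segment file name
primary_wal = {SEGMENT: b"wal-bytes"}
archive = dict(primary_wal)                # assumes archiving already copied it

print(fetch_segment(SEGMENT, primary_wal, archive, link_up=True)[0])   # primary
print(fetch_segment(SEGMENT, primary_wal, archive, link_up=False)[0])  # archive
```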

If you care about continuous backups and PITR, you should probably
enable WAL archiving anyway. See this:

http://www.postgresql.org/docs/9.0/static/continuous-archiving.html

>   *  guaranteed recovery (on failover) -- at any moment, one should be
>     able to turn the standby into active and start using it (there
>     should not be a scenario where master crashed and the slave could
>     not be turned active)

I'm not aware of any bug preventing a failover ...

> On account of these, we thought it would be reassuring to our management
> team if we can cite a few existing production deployments and their
> success stories.

I'd like to see that too, but I guess it's a bit too early for that.
Keep in mind that SR is just one year old. That's not much, especially
for large projects - it takes time to develop the system, test it,
prepare the production environment etc.

> I think one year is sufficient time for any product/feature to be
> thoroughly tested for all its strengths and weaknesses; so would it be
> too much to ask the vast postgres customer base about their experiences
> with streaming replication, the good, the bad; and perhaps the best and
> the ugly too? It would be great if customers can give their identity
> (employer info) but not necessary though.

Well, yes. I believe the companies have been testing it, bugs were
reported to pgsql-bugs and fixed. That's how it works ;-)

Tomas
