Thread: Portworx snapshots
Hello,
We are working on integrating Portworx on Kubernetes as our volume provider for PostgreSQL.
Portworx says that on a running PostgreSQL it can:
- replicate volumes for failover
- take snapshots of volumes
- backup volumes
There are some articles describing these capabilities.
Can someone comment on these features?
Does anyone use them in production?
How reliable are these features?
Are there performance impacts from snapshots?
Thank you for your answers.
Regards,
Ghislain Rouvignac
Email: ghr@sylob.com
Office: +33 (0)5 63 53 08 18
7, rue Marcel Dassault
ZA de la Mouline
81990 Cambon d’Albi
France
Confidentiality notice: The information contained in this email is strictly confidential and intended solely for the use of the person(s) identified as recipient(s). Any use, publication, copying, disclosure, or transmission of the information contained in this message or its attachments is prohibited unless expressly authorized by the sender. If you have received this message in error, please delete it and notify the sender immediately.
Greetings,

* Ghislain ROUVIGNAC (ghr@sylob.com) wrote:
> Portworx says that on a running PostgreSQL it can:
>
> - replicate volumes for failover
> - take snapshots of volumes
> - backup volumes

The downside with any snapshot-style approach is that it means that when
you have a failure, you have to go through and replay all the WAL since
the last checkpoint, which is single-threaded and can take a large
amount of time.

This is why PostgreSQL has streaming replication, where we are
constantly sending WAL to the replica and replaying it immediately; that
also allows us to have quorum-based synchronous replication within
PostgreSQL, unlike what a snapshot-level system would provide.

When doing your testing, I'd strongly recommend that you use a large
max_wal_size, run a large pgbench which writes a lot of data, and see
how long a failover takes with this system.

> Does someone use them in production?
> How reliable are these features?
> Are there performance impacts of snapshots?

To be clear, I don't know anything about actual use of this in
production, or whether this implementation is reliable. My comments are
specifically about the performance of a snapshot-based approach (which
could be this solution or various other ones).

Thanks!

Stephen
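The stress test Stephen suggests can be sketched with a configuration along these lines; the values below are illustrative assumptions chosen to space checkpoints far apart, not tuned recommendations:

```
# postgresql.conf -- illustrative settings to maximize WAL replayed at failover
max_wal_size = '16GB'         # keep checkpoints driven by the timeout, not WAL volume
checkpoint_timeout = '30min'  # a long interval means more WAL since the last checkpoint
```

With these in place, a write-heavy run such as `pgbench -i -s 1000` followed by `pgbench -c 16 -j 4 -T 1800`, with the primary killed mid-run, gives a realistic upper bound on how long crash recovery from a snapshot can take.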
Stephen Frost wrote:
> The downside with any snapshot-style approach is that it means that when
> you have a failure, you have to go through and replay all the WAL since
> the last checkpoint, which is single-threaded and can take a large
> amount of time.
>
> When doing your testing, I'd strongly recommend that you have a large
> max_wal_size, run a large pgbench which writes a lot of data, and see
> how long a failover takes with this system.

Then "checkpoint_timeout" should also be large, right?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Stephen,
Our application doesn't write a lot of data, so I don't think the time taken replaying the WAL will be an issue for us.
For reliability, as you said, I was thinking of running a large pgbench which writes a lot of data while taking snapshots.
Then my idea was to restart from the snapshots and see if everything works as expected.
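That restart-from-snapshot check can be sketched as a couple of sanity queries against the restored instance; the table and column names below are pgbench's defaults, and the comparison values would have to be recorded before the snapshot is taken:

```
-- Run on the instance restored from the snapshot
SELECT pg_is_in_recovery();            -- false once crash recovery has finished
SELECT count(*) AS rows, sum(abalance) AS balance
FROM pgbench_accounts;                 -- compare with values captured pre-snapshot
```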
I thought that, based on feedback from the community, maybe I wouldn't need to run these tests.
Thank you.
Regards,
Ghislain Rouvignac
Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> Then "checkpoint_timeout" should also be large, right?

Having a larger checkpoint_timeout would also show that this method of
failover runs the risk of there being a very long time required between
when the failure is detected and when the new primary is online.

Thanks!

Stephen
Greetings,

* Ghislain ROUVIGNAC (ghr@sylob.com) wrote:
> Our application doesn't write a lot of data, so I don't think the time
> taken replaying the WAL will be an issue for us.

That certainly makes things simpler. Then again, if you are not writing
a lot of data, you might consider using synchronous replication with
PostgreSQL if you want a durability guarantee that spans multiple
otherwise independent systems. You can then also combine that with a
proper backup solution (please, do not try to build your own) and WAL
archiving, and be able to perform PITR (point-in-time recovery), which
snapshots don't give you.

> For reliability, as you said, I was thinking of running a large pgbench
> which writes a lot of data while taking snapshots.
> Then my idea was to restart from the snapshots and see if everything
> works as expected.

Sure, testing is good and should be done regardless of what solution you
employ.

> I thought that, based on feedback from the community, maybe I wouldn't
> need to run these tests.

You should always run your own tests, and do them regularly, including
things like "am I able to restore this backup?", "am I able to fail over
to this other server?", etc.

Thanks!

Stephen
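A minimal sketch of the setup Stephen describes, assuming two standbys named standby1 and standby2 (the names, the archive path, and the tool suggestions are illustrative, not prescriptions):

```
# postgresql.conf on the primary -- quorum-based synchronous replication
synchronous_standby_names = 'ANY 1 (standby1, standby2)'  # commit waits for any one standby
synchronous_commit = on

# WAL archiving, the basis for PITR; in production, pair this with an
# established backup tool (e.g. pgBackRest or barman), not a home-grown script
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
```

The `ANY n (...)` quorum syntax requires PostgreSQL 10 or later; the `archive_command` shown is the simple example from the PostgreSQL documentation and is only a placeholder for a real archiving mechanism.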