Thread: Portworx snapshots

Portworx snapshots

From: Ghislain ROUVIGNAC

Hello,


We are working on integrating Portworx on Kubernetes as our volume provider for PostgreSQL.

Portworx says that on a running PostgreSQL it can:
  • replicate volumes for failover
  • take snapshots of volumes
  • backup volumes

There are some articles describing these capabilities:


Can someone comment on these features?
Does anyone use them in production?
How reliable are these features?
Do snapshots have a performance impact?


Thank you for your answers.


Best regards,

Ghislain Rouvignac


Email: ghr@sylob.com

Office: +33 (0)5 63 53 08 18


7, rue Marcel Dassault

ZA de la Mouline

81990 Cambon d’Albi

France


Confidentiality notice: The information contained in this email is strictly confidential and intended solely for the person(s) identified as recipient(s). Use, publication, copying, disclosure, or transmission of the information contained in this message or its attachments is prohibited unless expressly authorized by the sender. If you have received this message in error, please delete it and notify the sender immediately.

Re: Portworx snapshots

From: Stephen Frost

Greetings,

* Ghislain ROUVIGNAC (ghr@sylob.com) wrote:
> Portworx says that on a running PostgreSQL it can:
>
>    - replicate volumes for failover
>    - take snapshots of volumes
>    - backup volumes

The downside with any snapshot-style approach is that it means that when
you have a failure, you have to go through and replay all the WAL since
the last checkpoint, which is single-threaded and can take a large
amount of time.

This is why PostgreSQL has streaming replication, where we are
constantly sending WAL to the replica and replaying it immediately, and
that also allows us to have synchronous replication that is quorum based
and works with PostgreSQL, unlike what a snapshot level system would
provide.
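
For illustration, quorum-based synchronous replication is configured through the synchronous_standby_names setting, whose "ANY n (...)" form makes a commit wait for any n of the listed standbys. The following is only a minimal sketch: the helper name and the standby names replica_a and replica_b are hypothetical, and it merely builds the postgresql.conf line rather than applying it.

```python
import re


def quorum_standby_names(num_sync, standby_names):
    """Build a postgresql.conf line for quorum-based synchronous replication.

    num_sync is how many standbys must confirm each commit. The standby
    names passed in are hypothetical placeholders, not real servers.
    """
    if not standby_names:
        raise ValueError("at least one standby name is required")
    if not 1 <= num_sync <= len(standby_names):
        raise ValueError("num_sync must be between 1 and the number of standbys")
    for name in standby_names:
        # Keep the sketch simple: accept only plain lowercase identifiers,
        # so no quoting rules have to be handled here.
        if not re.fullmatch(r"[a-z_][a-z0-9_]*", name):
            raise ValueError(f"unsupported standby name: {name!r}")
    names = ", ".join(standby_names)
    return f"synchronous_standby_names = 'ANY {num_sync} ({names})'"


if __name__ == "__main__":
    print(quorum_standby_names(1, ["replica_a", "replica_b"]))
```

With two standbys and num_sync set to 1, a commit is acknowledged as soon as either standby confirms it, so losing one standby does not block writes.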

When doing your testing, I'd strongly recommend that you have a large
max_wal_size, run a large pgbench which writes a lot of data, and see
how long a failover takes with this system.
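
A minimal sketch of how the timing part of such a test could be scripted, assuming the pg_isready client tool is on the PATH and that the host and port shown are placeholders for the instance that comes up after the failover. The function names are hypothetical. The idea is to start the timer when the failure is induced and poll until the server accepts connections again.

```python
import subprocess
import time


def pg_is_ready(host="localhost", port=5432):
    """Return True if pg_isready reports that the server accepts connections.

    The host and port defaults are placeholders for the recovered instance.
    """
    result = subprocess.run(
        ["pg_isready", "-h", host, "-p", str(port)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def time_until_ready(probe=pg_is_ready, timeout=3600.0, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True.

    Returns the elapsed seconds, or None if the timeout passes first.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


if __name__ == "__main__":
    # Run this right after inducing the failure (for example, killing the pod).
    elapsed = time_until_ready()
    if elapsed is None:
        print("server did not come back within the timeout")
    else:
        print(f"server accepted connections after {elapsed:.1f} s")
```

Running it once with a small max_wal_size and once with a large one, under the same pgbench load, gives a rough picture of how much of the failover time is spent replaying WAL.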

> Does anyone use them in production?
> How reliable are these features?
> Do snapshots have a performance impact?

I don't know anything about the actual utilization of this in production
or if this implementation is reliable, just to be clear.  My comments
specifically are about the performance of using a snapshot-based
approach (which could be this solution or various other ones).

Thanks!

Stephen


Re: Portworx snapshots

From: Laurenz Albe

Stephen Frost wrote:
> The downside with any snapshot-style approach is that it means that when
> you have a failure, you have to go through and replay all the WAL since
> the last checkpoint, which is single-threaded and can take a large
> amount of time.
> 
> When doing your testing, I'd strongly recommend that you have a large
> max_wal_size, run a large pgbench which writes a lot of data, and see
> how long a failover takes with this system.

Then "checkpoint_timeout" should also be large, right?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



Re: Portworx snapshots

From: Ghislain ROUVIGNAC

Stephen,


Our application doesn't write a lot of data, so I don't think the time taken to replay the WAL will be an issue for us.


For reliability, as you said, I was thinking of running a large pgbench that writes a lot of data while taking snapshots.
My idea was then to restart from the snapshots and check that everything works as expected.
I had hoped that, based on feedback from the community, I might not need to run these tests.


Thank you.

Best regards,

Ghislain Rouvignac


Email : ghr@sylob.com

Bureau : + 33 (0)5 63 53 08 18


7, rue Marcel Dassault

ZA de la Mouline

81990 Cambon d’Albi

France



On Sun, Oct 28, 2018 at 16:35, Stephen Frost <sfrost@snowman.net> wrote:
> [...]

Re: Portworx snapshots

From: Stephen Frost

Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> Stephen Frost wrote:
> > The downside with any snapshot-style approach is that it means that when
> > you have a failure, you have to go through and replay all the WAL since
> > the last checkpoint, which is single-threaded and can take a large
> > amount of time.
> >
> > When doing your testing, I'd strongly recommend that you have a large
> > max_wal_size, run a large pgbench which writes a lot of data, and see
> > how long a failover takes with this system.
>
> Then "checkpoint_timeout" should also be large, right?

Having a larger checkpoint timeout would also show that this method of
failover runs the risk of there being a very long time required between
when the failure is detected and when the new primary is online.

Thanks!

Stephen


Re: Portworx snapshots

From: Stephen Frost

Greetings,

* Ghislain ROUVIGNAC (ghr@sylob.com) wrote:
> Our application doesn't write a lot of data, so I don't think the time taken
> to replay the WAL will be an issue for us.

That certainly makes things simpler.

Then again, if you are not writing a lot of data then you might consider
using synchronous replication with PostgreSQL if you want to have a
durability guarantee which is across multiple otherwise independent
systems.  You can then also combine that with a proper backup solution
(please, do not try and build your own) and WAL archiving and be able to
perform PITR (point-in-time recovery), which snapshots don't give you.

> For reliability, as you said, I was thinking of running a large pgbench
> that writes a lot of data while taking snapshots.
> My idea was then to restart from the snapshots and check that everything
> works as expected.

Sure, testing is good and should be done regardless of what solution you
employ.

> I had hoped that, based on feedback from the community, I might not
> need to run these tests.

You should always run your own tests, and do them regularly, including
testing things like "am I able to restore this backup?", "am I able to
fail over to this other server?", etc.

Thanks!

Stephen
