Thread: Physical replication from x86_64 to ARM64


From
Jan Mußler
Date:

Fellow Postgres Admins and Developers,

With the arrival of ARM compute nodes on AWS, and with an existing fleet of Postgres clusters running on x86_64 nodes, the question arises of how to migrate existing Postgres clusters to ARM64 nodes, ideally with zero downtime, as we are used to.

Initial experiments show no observable problems when copying PGDATA, or indeed when using physical streaming replication, between the two CPU architectures. In our case, Postgres runs in Docker containers built on Ubuntu 18.04 base images, using PGDG packages for Postgres 13. On top of that, we checked the existing indexes with the amcheck extension, which did not reveal any issues.

However, experiments cannot rule out all corner cases, so we are curious to hear other input on this matter, as we believe it is relevant to a wider audience and ARM may well become available on other, non-AWS platforms going forward.

It is our understanding that AWS RDS for Postgres 12 and Postgres 13 in fact allows switching from x86 nodes to ARM nodes on the fly, which gives us some indication that, if done right, both platforms are indeed compatible.

Looking forward to your input and discussion points!


--
Jan Mußler
Engineering Manager - Team Acid & Team Aruha | Zalando SE

Re: Physical replication from x86_64 to ARM64

From
Aleksander Alekseev
Date:
Hi Jan,

> Initial experiments show no observable problems when copying PGDATA or in fact using physical streaming replication between the two CPU architectures.

That's an interesting result. The topic of physical replication compatibility interested me a lot back in 2017, and I raised this question at PGCon [1]. As I recall, compatibility is neither guaranteed nor tested, and is not going to be, because the community doesn't have the resources for this. The consensus was that to migrate without downtime, the user has to use logical replication. Thus what you observe should be considered a hack, and if something goes wrong, you are on your own.

Of course, it is possible that something has changed in the past four years. I'm sure somebody on the mailing list will correct me if so.


--
Best regards,
Aleksander Alekseev

Re: Physical replication from x86_64 to ARM64

From
Tom Lane
Date:
Aleksander Alekseev <aleksander@timescale.com> writes:
>> Initial experiments show no observable problems when copying PGDATA or in
>> fact using physical streaming replication between the two CPU architectures.

> That's an interesting result. The topic of physical replication
> compatibility interested me much back in 2017 and I raised this question on
> PGCon [1]. As I recall the compatibility is not guaranteed, nor tested, and
> not going to be, because the community doesn't have resources for this.

Yeah.  As far as the hardware goes, if you have the same endianness,
struct alignment rules, and floating-point format [1], then physical
replication ought to work.  Where things get far stickier is if the
operating systems aren't identical, because then you have very great
risk of text sorting rules not being the same, leading to index
corruption [2].  In modern practice that tends to be a bigger issue
than the hardware, and we don't have any good way to check for it.
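To make the collation risk concrete, here is a minimal Python sketch under stated assumptions: an "index" is modeled as a plain sorted list of keys, and the two key functions are hypothetical stand-ins for an old and a new collation, not the actual glibc rules. Keys sorted under the old rules violate the ordering invariant under the new rules (the kind of invariant amcheck's bt_index_check verifies), and a binary search using the new rules then misses a key that is present.

```python
from typing import Callable, Optional


def codepoint_key(s: str):
    """Hypothetical 'old' collation: plain code-point order."""
    return s


def punctuation_blind_key(s: str):
    """Hypothetical 'new' collation: compare alphanumerics first, then the raw
    string as a tie-breaker (loosely imitating multi-level locale collation)."""
    return ("".join(ch for ch in s if ch.isalnum()), s)


def order_violations(keys: list[str], key: Callable) -> list[int]:
    """Positions i where keys[i] > keys[i + 1] under `key`."""
    return [i for i in range(len(keys) - 1) if key(keys[i]) > key(keys[i + 1])]


def index_lookup(keys: list[str], target: str, key: Callable) -> Optional[int]:
    """Lower-bound binary search that trusts `keys` to be sorted under `key`."""
    lo, hi = 0, len(keys)
    target_key = key(target)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(keys[mid]) < target_key:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(keys) and keys[lo] == target else None


# "Index" built on the primary under the old collation.
index = sorted(["ab", "a-b", "aa", "a b"], key=codepoint_key)
print(index)                                               # ['a b', 'a-b', 'aa', 'ab']
print(order_violations(index, codepoint_key))              # []
print(order_violations(index, punctuation_blind_key))      # [1]
print(index_lookup(index, "aa", codepoint_key))            # 2
print(index_lookup(index, "aa", punctuation_blind_key))    # None
```

Nothing about the stored bytes changed; only the comparison rules did, which is why such a mismatch is invisible until a lookup or an amcheck run trips over it.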

            regards, tom lane

[1] all of which are checked by pg_control fields, btw
[2] https://wiki.postgresql.org/wiki/Locale_data_changes
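Footnote [1] can be illustrated with a toy Python model of how control-file fields catch a platform mismatch when a data directory is opened by a server built differently. The three-field record, the value 1300 standing in for Postgres 13's PG_CONTROL_VERSION, and the exact form of the byte-swap heuristic are assumptions for illustration; the real ControlFileData struct in src/include/catalog/pg_control.h has many more fields.

```python
import struct

# Assumed constants for illustration: 1300 stands in for Postgres 13's
# PG_CONTROL_VERSION; 1234567.0 is the floating-point sentinel value.
PG_CONTROL_VERSION = 1300
FLOATFORMAT_VALUE = 1234567.0

# Hypothetical simplified record: version (uint32), maxAlign (uint32),
# floatFormat (float64). Not the real on-disk layout.
RECORD = "IId"


def write_control(byteorder: str, max_align: int) -> bytes:
    """Serialize the toy record as a server with this byte order would."""
    return struct.pack(byteorder + RECORD, PG_CONTROL_VERSION, max_align, FLOATFORMAT_VALUE)


def check_control(data: bytes, byteorder: str, max_align: int) -> list[str]:
    """Read the record using the reading server's byte order and list mismatches."""
    version, stored_align, float_format = struct.unpack(byteorder + RECORD, data)
    if version != PG_CONTROL_VERSION:
        # A byte-swapped small version number has an all-zero low 16 bits.
        if version % 65536 == 0 and version // 65536 != 0:
            return ["possible byte-order mismatch"]
        return ["control version mismatch"]
    problems = []
    if stored_align != max_align:
        problems.append("MAXALIGN mismatch")
    if float_format != FLOATFORMAT_VALUE:
        problems.append("floating-point format mismatch")
    return problems


print(check_control(write_control("<", 8), "<", 8))  # []
print(check_control(write_control(">", 8), "<", 8))  # ['possible byte-order mismatch']
print(check_control(write_control("<", 4), "<", 8))  # ['MAXALIGN mismatch']
```

To my knowledge, x86_64 and aarch64 Linux are both little-endian with IEEE 754 doubles and 8-byte MAXALIGN, so the first call models the case in this thread. Nothing in such a record captures collation behaviour, which is the gap described above.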



Re: Physical replication from x86_64 to ARM64

From
Andres Freund
Date:
Hi,


On September 14, 2021 7:11:25 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Aleksander Alekseev <aleksander@timescale.com> writes:
>>> Initial experiments show no observable problems when copying PGDATA or in
>>> fact using physical streaming replication between the two CPU architectures.
>
>> That's an interesting result. The topic of physical replication
>> compatibility interested me much back in 2017 and I raised this question on
>> PGCon [1]. As I recall the compatibility is not guaranteed, nor tested, and
>> not going to be, because the community doesn't have resources for this.
>
>Yeah.  As far as the hardware goes, if you have the same endianness,
>struct alignment rules, and floating-point format [1], then physical
>replication ought to work.  Where things get far stickier is if the
>operating systems aren't identical, because then you have very great
>risk of text sorting rules not being the same, leading to index
>corruption [2].  In modern practice that tends to be a bigger issue
>than the hardware, and we don't have any good way to check for it.

I'd also be worried about subtle changes in floating point math results, and that subsequently leading to index
mismatches, whether because the hardware gives differing results or because of libc differences.
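One way identical IEEE 754 hardware can still give different results is instruction selection: a compiler may fuse a*b + c into a single fused multiply-add (FMA) with one rounding, or round the product first. The minimal Python sketch below emulates the fused case with exact rational arithmetic (Python before 3.13 has no math.fma); whether a given Postgres build on either architecture actually emits FMA instructions depends on the compiler and its flags, and is an assumption not checked here.

```python
from fractions import Fraction


def mul_add_unfused(a: float, b: float, c: float) -> float:
    """a*b + c with two roundings: the product is rounded to double first."""
    return a * b + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Emulated FMA: exact a*b + c, rounded to double only once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a, b, c = 0.1, 10.0, -1.0
print(mul_add_unfused(a, b, c))  # 0.0
print(mul_add_fused(a, b, c))    # 5.551115123125783e-17
```

A value computed one way on one node and the other way on another would no longer compare equal, which is the kind of mismatch described above.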

Regards,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Physical replication from x86_64 to ARM64

From
Dmitry Dolgov
Date:
> On Tue, Sep 14, 2021 at 08:07:19AM -0700, Andres Freund wrote:
>
> >Yeah.  As far as the hardware goes, if you have the same endianness,
> >struct alignment rules, and floating-point format [1], then physical
> >replication ought to work.  Where things get far stickier is if the
> >operating systems aren't identical, because then you have very great
> >risk of text sorting rules not being the same, leading to index
> >corruption [2].  In modern practice that tends to be a bigger issue
> >than the hardware, and we don't have any good way to check for it.
>
> I'd also be worried about subtle changes in floating point math results, and that subsequently leading to index
> mismatches. Be that because the hardware gives differing results, or because libc differences.
 

The question about the hardware side I find interesting, as at least in
the Armv8 case there are claims of full IEEE 754 compliance [1]. From
what I see, some parts which are not specified by this standard are
also implemented similarly on Arm and x86 ([2], [3]). On top of that,
many compilers implement at least a partial level of IEEE 754 compliance
by default (e.g. gcc [4]). The only strange difference I found is the
x87 FPU (without SSE2, see [5]), but I'm not sure what the consequences
of its extra precision could be. All in all, it sounds like, at least
from the hardware perspective, the chances of subtle differences in
floating point math on Arm are small -- am I missing anything?
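On the x87 question, one known consequence of extra precision is double rounding: a result is first rounded to the 64-bit significand of the 80-bit extended format and then rounded again when stored as a 53-bit double, which can differ from rounding once. A minimal Python sketch, assuming round-to-nearest-even and ignoring exponent range (no overflow or subnormals); as far as I know this only matters for builds that actually use x87 arithmetic, such as 32-bit x86 without SSE2, since gcc uses SSE2 by default on x86_64:

```python
from fractions import Fraction


def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round x to `bits` significant binary digits, ties to even.
    Ignores exponent range (no overflow/subnormals): illustration only."""
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    scale = Fraction(2) ** (bits - 1 - e)
    # round() on a Fraction rounds half to even.
    return sign * Fraction(round(x * scale)) / scale


def add_double(a: float, b: float) -> float:
    """IEEE 754 double addition: exact sum rounded once to 53 bits."""
    return a + b


def add_x87_style(a: float, b: float) -> float:
    """Exact sum rounded to a 64-bit significand, then to a 53-bit double."""
    exact = Fraction(a) + Fraction(b)
    return float(round_to_bits(round_to_bits(exact, 64), 53))


a, b = 1.0, 2.0 ** -53 + 2.0 ** -64
print(add_double(a, b))     # 1.0000000000000002
print(add_x87_style(a, b))  # 1.0
```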

[1]: https://developer.arm.com/architectures/instruction-sets/floating-point
[2]: https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Single-precision_examples
[3]: https://en.wikipedia.org/wiki/Double-precision_floating-point_format
[4]: https://gcc.gnu.org/wiki/FloatingPointMath
[5]: https://gcc.gnu.org/wiki/x87note