Thread: Physical replication from x86_64 to ARM64
Fellow Postgres Admins and Developers,
With the arrival of ARM compute nodes on AWS and an existing fleet of Postgres clusters running on x86_64 nodes, the question arises of how to migrate those clusters to ARM64 nodes, ideally with zero downtime, as one is used to.
Initial experiments show no observable problems when copying PGDATA or, in fact, when using physical streaming replication between the two CPU architectures. In our case Postgres runs in Docker containers based on Ubuntu 18.04 base images with PGDG packages for Postgres 13. On top of that, we checked existing indexes with the amcheck extension, which did not reveal any issues.
However, experiments cannot rule out all corner cases, so we are curious to hear other input on the matter. We believe this is relevant to a bigger audience, and ARM is likely to become available on other, non-AWS platforms going forward.
It is our understanding that AWS RDS in fact allows changing from x86 nodes to ARM nodes on the fly for Postgres 12 and Postgres 13, which gives us some indication that, if done right, both platforms are indeed compatible.
Looking forward to your input and discussion points!
--
Aleksander Alekseev <aleksander@timescale.com> writes:
>> Initial experiments show no observable problems when copying PGDATA or in
>> fact using physical streaming replication between the two CPU architectures.

> That's an interesting result. The topic of physical replication
> compatibility interested me much back in 2017 and I raised this question on
> PGCon [1]. As I recall the compatibility is not guaranteed, nor tested, and
> not going to be, because the community doesn't have resources for this.

Yeah. As far as the hardware goes, if you have the same endianness,
struct alignment rules, and floating-point format [1], then physical
replication ought to work. Where things get far stickier is if the
operating systems aren't identical, because then you have very great
risk of text sorting rules not being the same, leading to index
corruption [2]. In modern practice that tends to be a bigger issue
than the hardware, and we don't have any good way to check for it.

regards, tom lane

[1] all of which are checked by pg_control fields, btw
[2] https://wiki.postgresql.org/wiki/Locale_data_changes
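To make the sorting-rules concern concrete, here is a minimal sketch that sorts a few probe strings under a given OS locale and prints a short digest of the resulting order. Running it on both hosts and comparing the output can expose a collation difference, but matching digests do not prove the collations are identical. The probe strings and the function name collation_fingerprint are illustrative assumptions, not part of Postgres or amcheck, and Python's locale module only sees the libc collation, not ICU.

```python
import hashlib
import locale

# Hypothetical probe strings; a real check would sample the indexed data.
# Punctuation, spaces, and mixed case are where libc versions tend to differ.
PROBE = ["a", "B", "ab", "a b", "a-b", "A", "1-1", "11", "Z", "a_b"]


def collation_fingerprint(words, locale_name):
    """Sort words under locale_name; return (ordered list, short digest)."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        ordered = sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore it
    joined = "\n".join(ordered).encode("utf-8")
    return ordered, hashlib.sha256(joined).hexdigest()[:16]


if __name__ == "__main__":
    for name in ("C", "en_US.UTF-8"):
        try:
            ordered, digest = collation_fingerprint(PROBE, name)
        except locale.Error:
            print(name, "is not installed on this host")
            continue
        print(name, digest, ordered)
```

If the digests for the locale used by the database differ between the x86_64 and ARM64 hosts, indexes on text columns built on one host should not be trusted on the other without a REINDEX.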
Hi,

On September 14, 2021 7:11:25 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Aleksander Alekseev <aleksander@timescale.com> writes:
>>> Initial experiments show no observable problems when copying PGDATA or in
>>> fact using physical streaming replication between the two CPU architectures.
>
>> That's an interesting result. The topic of physical replication
>> compatibility interested me much back in 2017 and I raised this question on
>> PGCon [1]. As I recall the compatibility is not guaranteed, nor tested, and
>> not going to be, because the community doesn't have resources for this.
>
>Yeah. As far as the hardware goes, if you have the same endianness,
>struct alignment rules, and floating-point format [1], then physical
>replication ought to work. Where things get far stickier is if the
>operating systems aren't identical, because then you have very great
>risk of text sorting rules not being the same, leading to index
>corruption [2]. In modern practice that tends to be a bigger issue
>than the hardware, and we don't have any good way to check for it.

I'd also be worried about subtle changes in floating point math results, and that subsequently leading to index mismatches. Be that because the hardware gives differing results, or because of libc differences.

Regards,

Andres
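One way to look for the floating point differences described above is to print the exact bit patterns of a few results on both hosts and compare them. The sketch below assumes Python's math module delegates these functions to the platform C library, which is typical but not guaranteed. The chosen inputs and the name float_fingerprint are illustrative only. Identical output on both machines is weak evidence, not proof, that results agree in general.

```python
import math

# Hypothetical probe computations; transcendental functions are the ones
# most likely to differ by a last bit between libm implementations.
CASES = {
    "0.1 + 0.2": lambda: 0.1 + 0.2,
    "sqrt(2)": lambda: math.sqrt(2.0),
    "exp(0.5)": lambda: math.exp(0.5),
    "log(10)": lambda: math.log(10.0),
    "pow(2, 0.5)": lambda: math.pow(2.0, 0.5),
    "sin(1e22)": lambda: math.sin(1e22),
}


def float_fingerprint():
    """Return each result as a hex string, which shows every mantissa bit."""
    return {name: compute().hex() for name, compute in CASES.items()}


if __name__ == "__main__":
    for name, bits in float_fingerprint().items():
        print(f"{name:12} {bits}")
```

Basic arithmetic and square root are required by IEEE 754 to be correctly rounded, so a mismatch there would point at the hardware or compiler settings, while a mismatch only in exp, log, pow, or sin would point at the math library.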
> On Tue, Sep 14, 2021 at 08:07:19AM -0700, Andres Freund wrote:
>
> >Yeah. As far as the hardware goes, if you have the same endianness,
> >struct alignment rules, and floating-point format [1], then physical
> >replication ought to work. Where things get far stickier is if the
> >operating systems aren't identical, because then you have very great
> >risk of text sorting rules not being the same, leading to index
> >corruption [2]. In modern practice that tends to be a bigger issue
> >than the hardware, and we don't have any good way to check for it.
>
> I'd also be worried about subtle changes in floating point math results, and that subsequently leading to index mismatches. Be that because the hardware gives differing results, or because of libc differences.

I find the question about the hardware side interesting, as at least in the Armv8 case there are claims of full IEEE 754 compliance [1]. From what I see, some parts that are not specified in the standard are also implemented similarly on Arm and x86 ([2], [3]). On top of that, many compilers implement at least a partial level of IEEE 754 compliance by default (e.g. gcc [4]). The only strange difference I found is the x87 FPU (when SSE2 is not used, see [5]), but I'm not sure what the consequences of the extra precision could be here.

All in all, it sounds like, at least from the hardware perspective, the chances of subtle differences in floating point math are small in the Arm case -- am I missing anything?

[1]: https://developer.arm.com/architectures/instruction-sets/floating-point
[2]: https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Single-precision_examples
[3]: https://en.wikipedia.org/wiki/Double-precision_floating-point_format
[4]: https://gcc.gnu.org/wiki/FloatingPointMath
[5]: https://gcc.gnu.org/wiki/x87note
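The hardware properties named earlier in the thread (endianness, struct alignment, floating point format) can also be inspected from the host side. The following is a minimal sketch, not the check Postgres itself performs through pg_control. The function name platform_properties and the selection of fields are assumptions made for illustration.

```python
import ctypes
import struct
import sys


def platform_properties():
    """Collect byte order, alignment, and double-format facts for this host."""
    return {
        "byte_order": sys.byteorder,
        "double_alignment": ctypes.alignment(ctypes.c_double),
        "int64_alignment": ctypes.alignment(ctypes.c_int64),
        "pointer_size": ctypes.sizeof(ctypes.c_void_p),
        # 53 mantissa bits and max_exp 1024 indicate an IEEE 754 binary64 double.
        "double_mantissa_bits": sys.float_info.mant_dig,
        "double_max_exp": sys.float_info.max_exp,
        # Raw in-memory bytes of 1.0 in native byte order.
        "double_one_bytes": struct.pack("=d", 1.0).hex(),
    }


if __name__ == "__main__":
    for key, value in platform_properties().items():
        print(f"{key:22} {value}")
```

Comparing this output between an x86_64 and an ARM64 host shows whether the basic data layout matches. It says nothing about whether math library results or collation rules agree.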