Both the streaming replication and logical replication areas of the code
are, approximately, utterly broken when !HAVE_INT64_TIMESTAMPS. (The fact
that "make check-world" passes anyway is an indictment of the quality of
the regression tests.)
I started poking around in this area after Thomas Munro reported that
pg_recvlogical.c no longer even compiles with --disable-integer-datetimes,
but the problems go much deeper than that.
replication/logical/worker.c's send_feedback() is sending a backend
timestamp directly as sendTime. This is wrong per the protocol spec,
which says the field is an integer timestamp. I'm not sure if it's
OK to just replace
- pq_sendint64(reply_message, now); /* sendTime */
+ pq_sendint64(reply_message, GetCurrentIntegerTimestamp()); /* sendTime */
... is it important for the sent timestamp to exactly match the
timestamp that was used earlier in the function for deciding
whether to send or not? Should we instead try to convert the
earlier logic to use integer timestamps?
As for logical replication, logicalrep_write_begin and
logicalrep_write_commit are likewise sending TimestampTz values using
pq_sendint64, and this is much harder to patch locally since the values
were taken from WAL records that contain TimestampTz. Although the sent
values are nonsensical according to the protocol spec,
logicalrep_read_begin and logicalrep_read_commit read them with
pq_getmsgint64 and assign the results back to TimestampTz, which means
that things sort of appear to work as long as you don't mind losing
all sub-second precision in the timestamps. (But a decoder that was
written to spec would think the values are garbage.)
I'm inclined to think that at least for logical replication, we're going
to have to give up the protocol spec wording that says that timestamps
are integers, and instead admit that they are dependent on
disable-integer-datetimes. Maybe we should do the same in the streaming
replication protocol, because this area is going to be a very fertile
source of bugs for the foreseeable future if we don't. It's not like
replication between enable-integer-datetimes and disable-integer-datetimes
hosts has any chance of being useful anyway.
Thoughts? Should we double down on trying to make this work according
to the "all integer timestamps" protocol specs, or cut our losses and
change the specs?
regards, tom lane