Re: Followup Timestamp to timestamp with TZ conversion - Mailing list pgsql-hackers
From | Peter Volk |
---|---|
Subject | Re: Followup Timestamp to timestamp with TZ conversion |
Date | |
Msg-id | CAFWHPeZWrdDd1zFpp+sS4zOO6c6jbD-cpsfh1KtG=PrcwnfS3w@mail.gmail.com Whole thread Raw |
In response to | Re: Followup Timestamp to timestamp with TZ conversion (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Followup Timestamp to timestamp with TZ conversion
|
List | pgsql-hackers |
Hi Tom, thanks for the reply, I do understand that if a rewrite of the table needs to be avoided the binary image needs to be the same. Since PG 12 there is an optimisation to avoid a rewrite of timestamp columns if they are converted to timestamp with tz and the target tz offset is 0 I am referring to the function ATColumnChangeRequiresRewrite(Node *expr, AttrNumber varattno) in which the following is checked: (b/src/backend/commands/tablecmds.c) else if (IsA(expr, FuncExpr)) { FuncExpr *f = (FuncExpr *) expr; switch (f->funcid) { case F_TIMESTAMPTZ_TIMESTAMP: case F_TIMESTAMP_TIMESTAMPTZ: if (TimestampTimestampTzRequiresRewrite()) return true; else expr = linitial(f->args); break; default: return true; } and TimestampTimestampTzRequiresRewrite checks if the offset is 0: (b/src/backend/utils/adt/timestamp.c) TimestampTimestampTzRequiresRewrite() * * Returns false if the TimeZone GUC setting causes timestamp_timestamptz and * timestamptz_timestamp to be no-ops, where the return value has the same * bits as the argument. Since project convention is to assume a GUC changes * no more often than STABLE functions change, the answer is valid that long. */ bool TimestampTimestampTzRequiresRewrite(void) { long offset; if (pg_get_timezone_offset(session_timezone, &offset) && offset == 0) PG_RETURN_BOOL(false); PG_RETURN_BOOL(true); } So in this case it is already proven that there is a binary equality between the data types timestamp and timestamp with tz if the offset is considered with 0. Hence this type of optimisation should / could also apply to indexes as well as the columns used in partitions Thanks, Peter On Thu, Jul 22, 2021 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Peter Volk <peterb.volk@gmx.net> writes: > > The problem is that I have a 60TB+ PG installation for which we need to > > modify all of the timestamp columns to timestamp with tz. The data in the > > columns are already in UTC so we can benefit from the patch listed above. > > Yet there are 2 cases in which we are having an issue. > > > 1) Index rebuilds: The patch is only avoiding a rewrite of the table data > > but is not avoiding a rebuild of the indexes. Following the logic in the > > patch above this should also be avoidable under the same condition > > I don't think that follows. What we are concerned about when determining > whether a heap rewrite can be skipped is whether the stored heap entries > are bit-compatible between the two data types. To decide that an index > rebuild is not needed, you'd need to further determine that their sort > orders are equivalent (for a btree index, or who-knows-what semantic > condition for other types of indexes). We don't attempt to do that, > so index rebuilds are always needed. > > As a thought experiment to prove that this is an issue, suppose that > somebody invented an unsigned integer type, and made the cast from > regular int4 follow the rules of a C cast, so that e.g. -1 becomes > 2^32-1. Given that, an ALTER TYPE from int4 to the unsigned type > could skip the heap rewrite. But we absolutely would have to rebuild > any btree index on the column, because the sort ordering of the two > types is different. OTOH, it's quite likely that a hash index would > not really need to be rebuilt. So this is a real can of worms and > we've not cared to open it. > > > 2) Partitioned tables with the timestamp as partition column: In this case > > the current version does not allow a modification of the column data type > > at all. > > PG's partitioning features are still being built out, but I would not > hold my breath for that specific thing to change. Again, the issue > is that bit-compatibility of individual values doesn't prove much > about comparison semantics, so it's not clear that a change of > data type still allows the value-to-partition assignment to be > identical. (This is clearly an issue for RANGE partitioning. Maybe > it could be finessed for HASH or LIST, but you'd still be needing > semantic assumptions that go beyond mere bit-compatibility of values.) > > regards, tom lane > >
pgsql-hackers by date: