Re: Followup Timestamp to timestamp with TZ conversion - Mailing list pgsql-hackers

From Peter Volk
Subject Re: Followup Timestamp to timestamp with TZ conversion
Date
Msg-id CAFWHPeZWrdDd1zFpp+sS4zOO6c6jbD-cpsfh1KtG=PrcwnfS3w@mail.gmail.com
Whole thread Raw
In response to Re: Followup Timestamp to timestamp with TZ conversion  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Followup Timestamp to timestamp with TZ conversion
List pgsql-hackers
Hi Tom,

thanks for the reply, I do understand that if a rewrite of the table
needs to be avoided the binary image needs to be the same. Since PG 12
there is an optimisation to avoid a rewrite of timestamp columns if
they are converted to timestamp with tz and the target tz offset is 0

I am referring to the function

ATColumnChangeRequiresRewrite(Node *expr, AttrNumber varattno)

in which the following is checked:

(b/src/backend/commands/tablecmds.c)

else if (IsA(expr, FuncExpr))
      {
          FuncExpr   *f = (FuncExpr *) expr;

           switch (f->funcid)
           {
               case F_TIMESTAMPTZ_TIMESTAMP:
               case F_TIMESTAMP_TIMESTAMPTZ:
                   if (TimestampTimestampTzRequiresRewrite())
                       return true;
                   else
                       expr = linitial(f->args);
                   break;
               default:
                 return true;
           }


and TimestampTimestampTzRequiresRewrite checks if the offset is 0:

(b/src/backend/utils/adt/timestamp.c)

 TimestampTimestampTzRequiresRewrite()
 *
 * Returns false if the TimeZone GUC setting causes timestamp_timestamptz and
 * timestamptz_timestamp to be no-ops, where the return value has the same
 * bits as the argument.  Since project convention is to assume a GUC changes
 * no more often than STABLE functions change, the answer is valid that long.
 */
bool
TimestampTimestampTzRequiresRewrite(void)
{
   long        offset;

   if (pg_get_timezone_offset(session_timezone, &offset) && offset == 0)
       PG_RETURN_BOOL(false);
   PG_RETURN_BOOL(true);
}

So in this case it is already proven that there is a binary equality
between the data types timestamp and timestamp with tz if the offset
is considered with 0. Hence this type of optimisation should / could
also apply to indexes as well as the columns used in partitions

Thanks,
Peter


On Thu, Jul 22, 2021 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Volk <peterb.volk@gmx.net> writes:
> > The problem is that I have a 60TB+ PG installation for which we need to
> > modify all of the timestamp columns to timestamp with tz. The data in the
> > columns are already in UTC so we can benefit from the patch listed above.
> > Yet there are 2 cases in which we are having an issue.
>
> > 1) Index rebuilds: The patch is only avoiding a rewrite of the table data
> > but is not avoiding a rebuild of the indexes. Following the logic in the
> > patch above this should also be avoidable under the same condition
>
> I don't think that follows.  What we are concerned about when determining
> whether a heap rewrite can be skipped is whether the stored heap entries
> are bit-compatible between the two data types.  To decide that an index
> rebuild is not needed, you'd need to further determine that their sort
> orders are equivalent (for a btree index, or who-knows-what semantic
> condition for other types of indexes).  We don't attempt to do that,
> so index rebuilds are always needed.
>
> As a thought experiment to prove that this is an issue, suppose that
> somebody invented an unsigned integer type, and made the cast from
> regular int4 follow the rules of a C cast, so that e.g. -1 becomes
> 2^32-1.  Given that, an ALTER TYPE from int4 to the unsigned type
> could skip the heap rewrite.  But we absolutely would have to rebuild
> any btree index on the column, because the sort ordering of the two
> types is different.  OTOH, it's quite likely that a hash index would
> not really need to be rebuilt.  So this is a real can of worms and
> we've not cared to open it.
>
> > 2) Partitioned tables with the timestamp as partition column: In this case
> > the current version does not allow a modification of the column data type
> > at all.
>
> PG's partitioning features are still being built out, but I would not
> hold my breath for that specific thing to change.  Again, the issue
> is that bit-compatibility of individual values doesn't prove much
> about comparison semantics, so it's not clear that a change of
> data type still allows the value-to-partition assignment to be
> identical.  (This is clearly an issue for RANGE partitioning.  Maybe
> it could be finessed for HASH or LIST, but you'd still be needing
> semantic assumptions that go beyond mere bit-compatibility of values.)
>
>                         regards, tom lane
>
>



pgsql-hackers by date:

Previous
From: Bauyrzhan Sakhariyev
Date:
Subject: Re: truncating timestamps on arbitrary intervals
Next
From: tushar
Date:
Subject: Re: refactoring basebackup.c