Thread: List of "binary-compatible" data types

List of "binary-compatible" data types

From
Josh Berkus
Date:
Folks,

From our docs:

"Adding a column with a non-null default or changing the type of an
existing column will require the entire table and indexes to be
rewritten. As an exception, if the USING clause does not change the
column contents and the old type is either binary coercible to the new
type or an unconstrained domain over the new type, a table rewrite is
not needed ..."

Which is nice, but nowhere do we present users with a set of
binary-compatible data types, even among the built-in types.  I'd
happily write this up, if I knew what the binary-compatible data types
*were*.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: List of "binary-compatible" data types

From
Thom Brown
Date:
On 4 November 2013 21:58, Josh Berkus <josh@agliodbs.com> wrote:
> Folks,
>
> From our docs:
>
> "Adding a column with a non-null default or changing the type of an
> existing column will require the entire table and indexes to be
> rewritten. As an exception, if the USING clause does not change the
> column contents and the old type is either binary coercible to the new
> type or an unconstrained domain over the new type, a table rewrite is
> not needed ..."
>
> Which is nice, but nowhere do we present users with a set of
> binary-compatible data types, even among the built-in types.  I'd
> happily write this up, if I knew what the binary-compatible data types
> *were*.

You could try this:

SELECT castsource::regtype::text, array_agg(casttarget::regtype order by casttarget::regtype::text) casttargets
FROM pg_cast
WHERE castmethod = 'b'
GROUP BY 1
ORDER BY 1;

-- 
Thom



Re: List of "binary-compatible" data types

From
Josh Berkus
Date:
Thom,


> SELECT
>   castsource::regtype::text,
>   array_agg(casttarget::regtype order by casttarget::regtype::text) casttargets
> FROM pg_cast
> WHERE castmethod = 'b'
> GROUP BY 1
> ORDER BY 1;

Are we actually covering 100% of these for ALTER COLUMN now?


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: List of "binary-compatible" data types

From
Josh Berkus
Date:
On 11/04/2013 05:21 PM, Josh Berkus wrote:
> Thom,
> 
> 
>> SELECT
>>   castsource::regtype::text,
>>   array_agg(casttarget::regtype order by casttarget::regtype::text) casttargets
>> FROM pg_cast
>> WHERE castmethod = 'b'
>> GROUP BY 1
>> ORDER BY 1;
> 
> Are we actually covering 100% of these for ALTER COLUMN now?

Also, JSON <--> Text seems to be missing from the possible binary
conversions.  That's a TODO, I suppose.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: List of "binary-compatible" data types

From
Noah Misch
Date:
On Mon, Nov 04, 2013 at 05:23:36PM -0800, Josh Berkus wrote:
> On 11/04/2013 05:21 PM, Josh Berkus wrote:
> > Thom,
> > 
> > 
> >> SELECT
> >>   castsource::regtype::text,
> >>   array_agg(casttarget::regtype order by casttarget::regtype::text) casttargets
> >> FROM pg_cast
> >> WHERE castmethod = 'b'
> >> GROUP BY 1
> >> ORDER BY 1;
> > 
> > Are we actually covering 100% of these for ALTER COLUMN now?

Yes; ALTER TABLE ALTER TYPE refers to the same metadata as Thom's query.  If
you add to the list by issuing CREATE CAST ... WITHOUT FUNCTION, ALTER TABLE
will respect that, too.

> Also, JSON <--> Text seems to be missing from the possible binary
> conversions.  That's a TODO, I suppose.

Only json --> text, not json <-- text.  Note that you can add the cast
manually if you have an immediate need.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: List of "binary-compatible" data types

From
Josh Berkus
Date:
Noah,

>> Also, JSON <--> Text seems to be missing from the possible binary
>> conversions.  That's a TODO, I suppose.
> 
> Only json --> text, not json <-- text.  Note that you can add the cast
> manually if you have an immediate need.

Huh?  Why would text --> JSON require a physical rewrite?  We have to
validate it, sure, but we don't need to rewrite it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: List of "binary-compatible" data types

From
Noah Misch
Date:
On Tue, Nov 05, 2013 at 10:00:15AM -0800, Josh Berkus wrote:
> Noah,
> 
> >> Also, JSON <--> Text seems to be missing from the possible binary
> >> conversions.  That's a TODO, I suppose.
> > 
> > Only json --> text, not json <-- text.  Note that you can add the cast
> > manually if you have an immediate need.
> 
> Huh?  Why would text --> JSON require a physical rewrite?  We have to
> validate it, sure, but we don't need to rewrite it.

That's all true, but the system has no concept like "this cast validates the
data, never changing it".  We would first need to add metadata supporting such
a concept.  On the other hand, "create cast (json as text) without function;"
leans only on concepts the system already knows.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: List of "binary-compatible" data types

From
Josh Berkus
Date:
Noah,

> That's all true, but the system has no concept like "this cast validates the
> data, never changing it".  We would first need to add metadata supporting such
> a concept.  On the other hand, "create cast (json as text) without function;"
> leans only on concepts the system already knows.
> 

Yeah, I'm thinking it might be worth coming up with a solution for that
specific case.  As users upgrade from 9.0 and 9.1 to 9.3, they're going
to want to convert their text columns containing JSON to columns of the
JSON type, and are going to be surprised how painful that is.

Of course, if we get binary JSON in 9.4 (Oleg?), then a binary
conversion will be required, so maybe it's a moot point.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: List of "binary-compatible" data types

From
Andres Freund
Date:
On 2013-11-05 11:15:29 -0800, Josh Berkus wrote:
> Noah,
> 
> > That's all true, but the system has no concept like "this cast validates the
> > data, never changing it".  We would first need to add metadata supporting such
> > a concept.  On the other hand, "create cast (json as text) without function;"
> > leans only on concepts the system already knows.
> > 
> 
> Yeah, I'm thinking it might be worth coming up with a solution for that
> specific case.  As users upgrade from 9.0 and 9.1 to 9.3, they're going
> to want to convert their text columns containing JSON to columns of the
> JSON type, and are going to be surprised how painful that is.

There's zap chance of doing anything for 9.3, this would require quite a
bit of code in tablecmds.c and that surely isn't going to happen in the
backbranches.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: List of "binary-compatible" data types

From
Josh Berkus
Date:
Andres,

> There's zap chance of doing anything for 9.3, this would require quite a
> bit of code in tablecmds.c and that surely isn't going to happen in the
> backbranches.

Oh, sure, I was thinking of a workaround.

Actually, being able to separate "need to check contents" from "need to
rewrite values" could be useful for a lot of type conversions.

I'd also love some way of doing a no-rewrite conversion between
timestamp and timestamptz, based on the assumption that the original
values are UTC time.  That's one I encounter a lot.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



No-rewrite timestamp<->timestamptz conversions

From
Noah Misch
Date:
On Tue, Nov 05, 2013 at 05:02:58PM -0800, Josh Berkus wrote:
> I'd also love some way of doing a no-rewrite conversion between
> timestamp and timestamptz, based on the assumption that the original
> values are UTC time.  That's one I encounter a lot.

It was such a conversion that motivated me to add the no-rewrite ALTER TABLE
ALTER TYPE support in the first place.  Interesting.  Support for it didn't
end up in any submitted patch due to a formal problem: a protransform function
shall only consult IMMUTABLE facts, but we posit that timezone==UTC is a
STABLE observation.  However, a protransform function can easily simplify the
immutable expression "tscol AT TIME ZONE 'UTC'", avoiding a rewrite.  See
attached patch.  Examples:

begin;
create table t (c timestamptz);
set client_min_messages = debug1;
-- rewrite: depends on timezone GUC
alter table t alter c type timestamp;
-- rewrite: depends on timezone GUC
alter table t alter c type timestamptz;
-- no rewrite: always UTC+0
alter table t alter c type timestamp using c at time zone 'UTC';
-- no rewrite: always UTC+0
alter table t alter c type timestamptz using c at time zone 'Etc/Universal';
-- rewrite: always UTC+0 in the present day, but not historically
alter table t alter c type timestamp using c at time zone 'Atlantic/Reykjavik';
-- rewrite: always UTC+0 in the present day, but not historically
alter table t alter c type timestamptz using c at time zone 'Africa/Lome';
-- no rewrite: always UTC+0
alter table t alter c type timestamp using c at time zone 'GMT';
-- rewrite: always UTC+1
alter table t alter c type timestamptz using c at time zone '1 hour'::interval;
-- no rewrite: always UTC+0
alter table t alter c type timestamp using c at time zone '0 hour'::interval;
rollback;

Attachment

Re: No-rewrite timestamp<->timestamptz conversions

From
Noah Misch
Date:
On Thu, Feb 05, 2015 at 08:36:18PM -0500, Noah Misch wrote:
> On Tue, Nov 05, 2013 at 05:02:58PM -0800, Josh Berkus wrote:
> > I'd also love some way of doing a no-rewrite conversion between
> > timestamp and timestamptz, based on the assumption that the original
> > values are UTC time.  That's one I encounter a lot.
> 
> It was such a conversion that motivated me to add the no-rewrite ALTER TABLE
> ALTER TYPE support in the first place.  Interesting.  Support for it didn't
> end up in any submitted patch due to a formal problem: a protransform function
> shall only consult IMMUTABLE facts, but we posit that timezone==UTC is a
> STABLE observation.  However, a protransform function can easily simplify the
> immutable expression "tscol AT TIME ZONE 'UTC'", avoiding a rewrite.  See
> attached patch.

This (commit b8a18ad) ended up causing wrong EXPLAIN output and wrong
indxpath.c processing.  Hence, commit c22ecc6 neutralized the optimization;
see that commit's threads for details.  I pondered ways to solve those
problems, but I didn't come up with anything satisfying for EXPLAIN.  (One
dead-end thought was to introduce an ExprShortcut node having "Node
*semantics" and "Node *shortcut" fields, where "semantics" is deparsed for
EXPLAIN and "shortcut" is actually evaluated.  That would require teaching
piles of code about the new node type, which isn't appropriate for the benefit
in question.)

Stepping back a bit, commit b8a18ad didn't provide a great UI.  I doubt folks
write queries this way spontaneously; to do so, they would have needed to
learn that such syntax enables this optimization.  If I'm going to do
something more invasive, it should optimize the idiomatic "alter table t alter
timestamptzcol type timestamp".  One could do that with a facility like
SupportRequestSimplify except permitted to consider STABLE facts.  I suppose I
could add a volatility field to SupportRequestSimplify.  So far, I can't think
of a second use case for such a facility, so instead I think
ATColumnChangeRequiresRewrite() should have a hard-wired call for
F_TIMESTAMPTZ_TIMESTAMP and F_TIMESTAMP_TIMESTAMPTZ.  Patch attached.  If we
find more applications of this concept, it shouldn't be hard to migrate this
logic into SupportRequestSimplify.  Does anyone think that's better to do from
the start?

Thanks,
nm

Attachment

Re: No-rewrite timestamp<->timestamptz conversions

From
Simon Riggs
Date:
On Tue, 26 Feb 2019 at 06:14, Noah Misch <noah@leadboat.com> wrote:
On Thu, Feb 05, 2015 at 08:36:18PM -0500, Noah Misch wrote:
> On Tue, Nov 05, 2013 at 05:02:58PM -0800, Josh Berkus wrote:
> > I'd also love some way of doing a no-rewrite conversion between
> > timestamp and timestamptz, based on the assumption that the original
> > values are UTC time.  That's one I encounter a lot.
>
> It was such a conversion that motivated me to add the no-rewrite ALTER TABLE
> ALTER TYPE support in the first place.  Interesting.  Support for it didn't
> end up in any submitted patch due to a formal problem: a protransform function
> shall only consult IMMUTABLE facts, but we posit that timezone==UTC is a
> STABLE observation.  However, a protransform function can easily simplify the
> immutable expression "tscol AT TIME ZONE 'UTC'", avoiding a rewrite.  See
> attached patch.

This (commit b8a18ad) ended up causing wrong EXPLAIN output and wrong
indxpath.c processing.  Hence, commit c22ecc6 neutralized the optimization;
see that commit's threads for details.  I pondered ways to solve those
problems, but I didn't come up with anything satisfying for EXPLAIN.  (One
dead-end thought was to introduce an ExprShortcut node having "Node
*semantics" and "Node *shortcut" fields, where "semantics" is deparsed for
EXPLAIN and "shortcut" is actually evaluated.  That would require teaching
piles of code about the new node type, which isn't appropriate for the benefit
in question.)

Stepping back a bit, commit b8a18ad didn't provide a great UI.  I doubt folks
write queries this way spontaneously; to do so, they would have needed to
learn that such syntax enables this optimization.  If I'm going to do
something more invasive, it should optimize the idiomatic "alter table t alter
timestamptzcol type timestamp".  One could do that with a facility like
SupportRequestSimplify except permitted to consider STABLE facts.  I suppose I
could add a volatility field to SupportRequestSimplify.  So far, I can't think
of a second use case for such a facility, so instead I think
ATColumnChangeRequiresRewrite() should have a hard-wired call for
F_TIMESTAMPTZ_TIMESTAMP and F_TIMESTAMP_TIMESTAMPTZ.  Patch attached.  If we
find more applications of this concept, it shouldn't be hard to migrate this
logic into SupportRequestSimplify.  Does anyone think that's better to do from
the start?

Looks good, would need docs.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: No-rewrite timestamp<->timestamptz conversions

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> Stepping back a bit, commit b8a18ad didn't provide a great UI.  I doubt folks
> write queries this way spontaneously; to do so, they would have needed to
> learn that such syntax enables this optimization.  If I'm going to do
> something more invasive, it should optimize the idiomatic "alter table t alter
> timestamptzcol type timestamp".  One could do that with a facility like
> SupportRequestSimplify except permitted to consider STABLE facts.  I suppose I
> could add a volatility field to SupportRequestSimplify.  So far, I can't think
> of a second use case for such a facility, so instead I think
> ATColumnChangeRequiresRewrite() should have a hard-wired call for
> F_TIMESTAMPTZ_TIMESTAMP and F_TIMESTAMP_TIMESTAMPTZ.  Patch attached.  If we
> find more applications of this concept, it shouldn't be hard to migrate this
> logic into SupportRequestSimplify.  Does anyone think that's better to do from
> the start?

It'd be nice to get the SupportRequestSimplify API correct from the first
release, so if there's even a slightly plausible reason for it to support
this, I'd be inclined to err in the direction of doing so.  On the other
hand, if we really can't think of another use-case then a hard-wired fix
might be the best way.

One thing that we'd have to nail down a bit harder, if we're to add
something to the SupportRequestSimplify API, is exactly what the semantics
of the weaker check should be.  The notion of "stable" has always been a
bit squishy, in that it's not totally clear what span of time stability
of the function result is being promised over.  In the case at hand, for
instance, is it really impossible for the timezone GUC to change during
the execution of the ALTER TABLE command?  You could probably break that
if you tried hard enough, though it seems unlikely that anyone would do so
accidentally.

I also kind of wonder whether this case arises often enough for us
to be expending so much effort optimizing it in the first place.
No doubt, where one lives colors one's opinion of how likely it is
that the timezone GUC is set to UTC ... but my guess is that that's
not true in very many installations.

            regards, tom lane


Re: No-rewrite timestamp<->timestamptz conversions

From
Noah Misch
Date:
On Tue, Feb 26, 2019 at 10:46:29AM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Stepping back a bit, commit b8a18ad didn't provide a great UI.  I doubt folks
> > write queries this way spontaneously; to do so, they would have needed to
> > learn that such syntax enables this optimization.  If I'm going to do
> > something more invasive, it should optimize the idiomatic "alter table t alter
> > timestamptzcol type timestamp".  One could do that with a facility like
> > SupportRequestSimplify except permitted to consider STABLE facts.  I suppose I
> > could add a volatility field to SupportRequestSimplify.  So far, I can't think
> > of a second use case for such a facility, so instead I think
> > ATColumnChangeRequiresRewrite() should have a hard-wired call for
> > F_TIMESTAMPTZ_TIMESTAMP and F_TIMESTAMP_TIMESTAMPTZ.  Patch attached.  If we
> > find more applications of this concept, it shouldn't be hard to migrate this
> > logic into SupportRequestSimplify.  Does anyone think that's better to do from
> > the start?
> 
> It'd be nice to get the SupportRequestSimplify API correct from the first
> release, so if there's even a slightly plausible reason for it to support
> this, I'd be inclined to err in the direction of doing so.  On the other
> hand, if we really can't think of another use-case then a hard-wired fix
> might be the best way.

Is it problematic to add a volatility field later?  Older support functions
will have needed to assume IMMUTABLE, and a simplification valid for IMMUTABLE
is valid for all volatility levels.  Still, yes, it would be nice to have from
the start if we will indeed need it.

> One thing that we'd have to nail down a bit harder, if we're to add
> something to the SupportRequestSimplify API, is exactly what the semantics
> of the weaker check should be.  The notion of "stable" has always been a
> bit squishy, in that it's not totally clear what span of time stability
> of the function result is being promised over.  In the case at hand, for
> instance, is it really impossible for the timezone GUC to change during
> the execution of the ALTER TABLE command?  You could probably break that
> if you tried hard enough, though it seems unlikely that anyone would do so
> accidentally.

No, one certainly can change a GUC during ALTER TABLE execution.  The STABLE
marking on current_setting is a fiction.  I interpret that fiction as a signal
that a query has undefined behavior if you change GUCs mid-query, which seems
like a prudent level of effort toward such queries.  Adding a volatility field
to SupportRequestSimplify does invite C-language extension code to participate
in deciding this undefined behavior, which would make it harder to verify that
we like the undefined behavior of the moment (e.g. doesn't crash).  Perhaps
best not to overturn that rock?


Re: No-rewrite timestamp<->timestamptz conversions

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> On Tue, Feb 26, 2019 at 10:46:29AM -0500, Tom Lane wrote:
>> It'd be nice to get the SupportRequestSimplify API correct from the first
>> release, so if there's even a slightly plausible reason for it to support
>> this, I'd be inclined to err in the direction of doing so.

> Is it problematic to add a volatility field later?

Not hugely so, no.  I'm thinking more in terms of support functions
having to pay attention to which version they're being compiled for
to know what they can do.  But I suppose that's little worse than
any other feature addition we make at the C API level.

>> ... is it really impossible for the timezone GUC to change during
>> the execution of the ALTER TABLE command?  You could probably break that
>> if you tried hard enough, though it seems unlikely that anyone would do so
>> accidentally.

> No, one certainly can change a GUC during ALTER TABLE execution.  The STABLE
> marking on current_setting is a fiction.  I interpret that fiction as a signal
> that a query has undefined behavior if you change GUCs mid-query, which seems
> like a prudent level of effort toward such queries.  Adding a volatility field
> to SupportRequestSimplify does invite C-language extension code to participate
> in deciding this undefined behavior, which would make it harder to verify that
> we like the undefined behavior of the moment (e.g. doesn't crash).  Perhaps
> best not to overturn that rock?

Probably not, unless we can come up with more convincing use-cases for
it.

For the moment, I'm content with the approach in the patch you posted.

            regards, tom lane


Re: No-rewrite timestamp<->timestamptz conversions

From
Noah Misch
Date:
On Tue, Feb 26, 2019 at 02:29:01PM +0000, Simon Riggs wrote:
> Looks good, would need docs.

The ALTER TABLE page just says "old type is either binary coercible to the new
type or an unconstrained domain over the new type."  Avoiding rewrites by way
of a prosupport function or the at-timestamp-v2.patch approach is essentially
another way to achieve binary coercion.  So far, we haven't documented the
individual data types affected.  Since we don't mention e.g. varchar(x) ->
varchar(x+k) explicitly, I plan not to mention timestamp explicitly.