RE: Re: User defined data types in Logical Replication - Mailing list pgsql-hackers

From Huong Dangminh
Subject RE: Re: User defined data types in Logical Replication
Date
Msg-id 75DB81BEEA95B445AE6D576A0A5C9E936A6C4B0A@BPXM05GP.gisp.nec.co.jp
Responses RE: Re: User defined data types in Logical Replication  (Huong Dangminh <huo-dangminh@ys.jp.nec.com>)
List pgsql-hackers
Sawada-san,

Thanks for your response.
# And sorry again that I could not reply to your gmail
# address from my environment due to security restrictions.

> >> We are getting the error below while trying to use Logical Replication
> >> with user defined data types in a C program (when calling the elog function).
> >>
> >>  ERROR:  XX000: cache lookup failed for type XXXXX
> >>
> >
> > Sorry to keep bringing up this topic, but am I missing something here?
> 
> No, but I'd suggest providing a reproduction procedure if possible,
> which would be helpful for the investigation.

Sorry, I will be careful next time.

> > I mean that if the type's OID on the PUBLICATION host does not exist
> > in the SUBSCRIPTION host's pg_type, an unintended error (the XX000
> > above) can be returned when elog or ereport is executed.
> >
> > In more detail, it happens in slot_store_error_callback when it tries
> > to call format_type_be(localtypoid) for errcontext.
> > slot_store_error_callback is set in the slot_store_cstrings and
> > slot_modify_cstrings functions and is also unset there, so the effect
> > is small, but it does happen.
> >
> 
> I think I found the cause of this issue, and it is a bug. It can be
> reproduced, for example, if the input function of the data type calls
> elog() while changes are being applied in an environment where the OIDs
> of the data type on the publisher and subscriber differ. The cause is
> that we call format_type_be() with remotetypoid. If the OIDs of the data
> type on the publisher and subscriber differ, we search the syscache
> using an OID that doesn't exist on the subscriber.

Yes, I think so too.
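
To make the failure mode concrete, here is a minimal standalone sketch in plain C. It is not PostgreSQL code: the TypeEntry struct, the subscriber_pg_type table, the helper names and the user-defined type OIDs (16385 on the publisher, 24577 on the subscriber) are all made up for illustration. It only models the idea that an OID assigned on the publisher has no row in the subscriber's catalog, so a lookup by that OID fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned int Oid;

/* Hypothetical stand-in for one row of the subscriber's pg_type. */
typedef struct TypeEntry
{
    Oid         oid;
    const char *name;
} TypeEntry;

/*
 * Made-up subscriber catalog: the user-defined type "mytype" was created
 * here with OID 24577, while the publisher (not modeled) assigned it 16385.
 */
static const TypeEntry subscriber_pg_type[] = {
    {25, "text"},
    {24577, "mytype"},
};

/* Return the type name for an OID, or NULL if the catalog has no such row. */
static const char *
lookup_type_name(Oid typoid)
{
    size_t      ntypes = sizeof(subscriber_pg_type) / sizeof(subscriber_pg_type[0]);

    for (size_t i = 0; i < ntypes; i++)
    {
        if (subscriber_pg_type[i].oid == typoid)
            return subscriber_pg_type[i].name;
    }
    return NULL;
}

/*
 * Write either the type name or a failure message into buf.
 * Returns 0 when the OID was found and -1 when the lookup failed.
 */
static int
describe_type(Oid typoid, char *buf, size_t buflen)
{
    const char *name = lookup_type_name(typoid);

    assert(buf != NULL && buflen > 0);
    if (name == NULL)
    {
        snprintf(buf, buflen, "cache lookup failed for type %u", typoid);
        return -1;
    }
    snprintf(buf, buflen, "%s", name);
    return 0;
}
```

In this sketch, calling describe_type() with the publisher-side OID 16385 takes the failure branch, while the subscriber-side OID 24577 resolves to "mytype".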

> As for the details of your patch, I don't think this direction is good.
> Since the subscriber already has a LogicalRepTyp cache entry for the type,
> we can report the error message using the data type name. So I think this
> issue can be fixed by using the remote type name obtained from the cache.

Thanks,
I did not realize that, with LogicalRepRelMapEntry, the remote type name is already available here.

> Also, I'm confused about the errcontext message; currently we store the
> local data type OID corresponding to the remote data type name in the
> cache, and then we look up the local data type name by the local data type
> OID stored in the cache. So both the local data type OID and the remote
> data type OID always refer to the same data type. We use both data type
> OIDs for the log message in slot_store_error_callback, but I think what
> the function wants to do is show different type names if the table
> definitions on the two servers differ (e.g. sending jsonb column data to
> a text column). I think we should use the type of the local relation
> attribute rather than the remote one.
> 
> Attached draft patch fixed this issue, at least on my environment.

It works well for me.
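
For readers following the thread, the direction suggested above can be sketched in standalone C as follows. This is only an illustration under my own assumptions, not the attached patch: RemoteTypeEntry, remote_type_cache, remote_type_lookup and format_context are hypothetical names, the OID 16385 is made up, and the fallback text for an uncached OID is my invention. The remote type is reported by the name the publisher sent (kept in a cache keyed by remote OID), and the local type is taken from the local relation's column, so no catalog lookup by a remote OID is needed.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned int Oid;

/* Hypothetical cache entry: what the publisher told us about one of its types. */
typedef struct RemoteTypeEntry
{
    Oid         remoteid;       /* OID of the type on the publisher */
    const char *nspname;        /* schema name on the publisher */
    const char *typname;        /* type name on the publisher */
} RemoteTypeEntry;

/* Made-up cache contents for illustration. */
static const RemoteTypeEntry remote_type_cache[] = {
    {16385, "public", "mytype"},
};

/* Find the cached entry for a remote type OID, or NULL if it is not cached. */
static const RemoteTypeEntry *
remote_type_lookup(Oid remoteid)
{
    size_t      n = sizeof(remote_type_cache) / sizeof(remote_type_cache[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (remote_type_cache[i].remoteid == remoteid)
            return &remote_type_cache[i];
    }
    return NULL;
}

/*
 * Build the context message. The remote type comes from the cache by name;
 * the local type name is passed in by the caller, standing in for the type
 * of the local relation's attribute. An uncached remote OID is reported as
 * unknown instead of raising a second error (an assumption of this sketch).
 */
static int
format_context(char *buf, size_t buflen, const char *colname,
               Oid remotetypoid, const char *local_typname)
{
    const RemoteTypeEntry *rt = remote_type_lookup(remotetypoid);

    if (rt != NULL)
        return snprintf(buf, buflen,
                        "column \"%s\", remote type %s.%s, local type %s",
                        colname, rt->nspname, rt->typname, local_typname);

    return snprintf(buf, buflen,
                    "column \"%s\", remote type unknown (OID %u), local type %s",
                    colname, remotetypoid, local_typname);
}
```

With this shape, a jsonb-to-text style mismatch would show two different names in the message, because the local name comes from the local column rather than from a mapping of the remote type.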

> Please review it.

I will review it soon.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd. 
http://www.nec-solutioninnovators.co.jp/en/

