Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Pgoutput not capturing the generated columns |
Date | |
Msg-id | CAHut+PuYxn2HacfF-t9S1UkEN7Nq_4w=FXGPiupvNj0Dp917yA@mail.gmail.com Whole thread Raw |
In response to | Pgoutput not capturing the generated columns (Rajendra Kumar Dangwal <dangwalrajendra888@gmail.com>) |
List | pgsql-hackers |
On Fri, Sep 13, 2024 at 9:34 PM Shubham Khanna <khannashubham1197@gmail.com> wrote: > > On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna > > <khannashubham1197@gmail.com> wrote: > > > > > > On Thu, Aug 29, 2024 at 11:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > As Euler mentioned earlier, I think it's a decision not to replicate > > > > > > > generated columns because we don't know the target table on the > > > > > > > subscriber has the same expression and there could be locale issues > > > > > > > even if it looks the same. I can see that a benefit of this proposal > > > > > > > would be to save cost to compute generated column values if the user > > > > > > > wants the target table on the subscriber to have exactly the same data > > > > > > > as the publisher's one. Are there other benefits or use cases? > > > > > > > > > > > > > > > > > > > The cost is one but the other is the user may not want the data to be > > > > > > different based on volatile functions like timeofday() > > > > > > > > > > Shouldn't the generation expression be immutable? > > > > > > > > > > > > > Yes, I missed that point. > > > > > > > > > > or the table on > > > > > > subscriber won't have the column marked as generated. > > > > > > > > > > Yeah, it would be another use case. > > > > > > > > > > > > > Right, apart from that I am not aware of other use cases. If they > > > > have, I would request Euler or Rajendra to share any other use case. > > > > > > > > > > Now, considering > > > > > > such use cases, is providing a subscription-level option a good idea > > > > > > as the patch is doing? I understand that this can serve the purpose > > > > > > but it could also lead to having the same behavior for all the tables > > > > > > in all the publications for a subscription which may or may not be > > > > > > what the user expects. This could lead to some performance overhead > > > > > > (due to always sending generated columns for all the tables) for cases > > > > > > where the user needs it only for a subset of tables. > > > > > > > > > > Yeah, it's a downside and I think it's less flexible. For example, if > > > > > users want to send both tables with generated columns and tables > > > > > without generated columns, they would have to create at least two > > > > > subscriptions. > > > > > > > > > > > > > Agreed and that would consume more resources. > > > > > > > > > Also, they would have to include a different set of > > > > > tables to two publications. > > > > > > > > > > > > > > > > > I think we should consider it as a table-level option while defining > > > > > > publication in some way. A few ideas could be: (a) We ask users to > > > > > > explicitly mention the generated column in the columns list while > > > > > > defining publication. This has a drawback such that users need to > > > > > > specify the column list even when all columns need to be replicated. > > > > > > (b) We can have some new syntax to indicate the same like: CREATE > > > > > > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4 > > > > > > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there > > > > > > could be some challenges but we can at least investigate it. > > > > > > > > > > I think we can create a publication for a single table, so what we can > > > > > do with this feature can be done also by the idea you described below. > > > > > > > > > > > Yet another idea is to keep this as a publication option > > > > > > (include_generated_columns or publish_generated_columns) similar to > > > > > > "publish_via_partition_root". Normally, "publish_via_partition_root" > > > > > > is used when tables on either side have different partitions > > > > > > hierarchies which is somewhat the case here. > > > > > > > > > > It sounds more useful to me. > > > > > > > > > > > > > Fair enough. Let's see if anyone else has any preference among the > > > > proposed methods or can think of a better way. > > > > > > I have fixed the current issue. I have added the option > > > 'publish_generated_columns' to the publisher side and created the new > > > test cases accordingly. > > > The attached patches contain the desired changes. > > > > > > > Thank you for updating the patches. I have some comments: > > > > Do we really need to add this option to test_decoding? I think it > > would be good if this improves the test coverage. Otherwise, I'm not > > sure we need this part. If we want to add it, I think it would be > > better to have it in a separate patch. > > > > I have removed the option from the test_decoding file. > > > --- > > + <para> > > + If the publisher-side column is also a generated column > > then this option > > + has no effect; the publisher column will be filled as normal with the > > + publisher-side computed or default data. > > + </para> > > > > I don't understand this description. Why does this option have no > > effect if the publisher-side column is a generated column? > > > > The documentation was incorrect. Currently, replicating from a > publisher table with a generated column to a subscriber table with a > generated column will result in an error. This has now been updated. > > > --- > > + <para> > > + This parameter can only be set <literal>true</literal> if > > <literal>copy_data</literal> is > > + set to <literal>false</literal>. > > + </para> > > > > If I understand this patch correctly, it doesn't disallow to set > > copy_data to true when the publish_generated_columns option is > > specified. But do we want to disallow it? I think it would be more > > useful and understandable if we allow to use both > > publish_generated_columns (publisher option) and copy_data (subscriber > > option) at the same time. > > > > Support for tablesync with generated columns was not included in the > initial patch, and this was reflected in the documentation. The > functionality for syncing generated column data has been introduced > with the 0002 patch. > Since nothing was said otherwise, I assumed my v30-0001 comments were addressed in v31, but the new code seems to have quite a few of my suggested changes missing. If you haven't addressed my review comments for patch 0001 yet, please say so. OTOH, please give reasons for any rejected comments. > The attached v31 patches contain the changes for the same. I won't be > posting the test patch for now. I will share it once this patch has > been stabilized. How can the patch become "stabilized" without associated tests to verify the behaviour is not broken? e.g. I can write a stable function that says 2+2=5. ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: