Thread: Re: pgsql: Doc: Explain about Column List feature.
On 2022-Sep-07, Amit Kapila wrote: > Doc: Explain about Column List feature. > > Add a new logical replication section for "Column Lists" (analogous to the > Row Filters page). This explains how the feature can be used and the > caveats in it. > > Author: Peter Smith > Reviewed-by: Shi yu, Vignesh C, Erik Rijkers, Amit Kapila > Backpatch-through: 15, where it was introduced > Discussion: https://postgr.es/m/CAHut+PvOuc9=_4TbASc5=VUqh16UWtFO3GzcKQK_5m1hrW3vqg@mail.gmail.com Hi I just read these docs and noticed that it mentions that column lists can be used for security. As far as I remember, this is wrong: it is the subscriber that builds the query to read column data during initial sync, and the publisher doesn't forbid to read columns not in it, so it is entirely possible for a malicious subscriber to read columns other than those published. I'm pretty sure we discussed this at some point during development of the feature. So I suggest to mention this point explicitly in its own paragraph, to avoid giving a false sense of security. While going over this text I found some additional things that could --in my opinion-- stand some improvement: * It feels better to start the section saying that a list can be specified; subscriber must have all those columns; omitting list means to publish everything. That leads to shorter text (no need to say "you need to have them all, oh wait you might only have a few"). * there's no reason to explain the syntax in vague terms and refer the reader to the reference page. * The first few <sect2> seem to give no useful structure, and instead cause the text to become disorganized. I propose to remove them, and instead mix the paragraphs in them so that we explain the rules and the behavior, and lastly the effect on specific commands. The attached patch effects those changes. One more thing. There's a sect2 about combining column list. Part of it seems pretty judgmental and I see no reason to have it in there; I propose to remove it (it's not in this patch). I think we should just say it doesn't work at present, here's how to work around it, and perhaps even say that we may lift the restriction in the future. The paragraph that starts with "Background:" is IMO out of place, and it repeats the mistake that column lists are for security. Lastly: In the create-publication reference page I think it would be better to reword the Parameters section a bit. It mentions FOR TABLE as a parameter, but the parameter is actually <replaceable>table_name</replaceable>; and the row-filter and column-list explanations are also put in there when they should be in their own <replaceable>expression</> and <replaceable>column_name</> varlistentries. I think splitting things that way would result in a clearer explanation. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Attachment
On Tue, Sep 13, 2022 at 10:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Sep-07, Amit Kapila wrote: > > > Doc: Explain about Column List feature. > > > > Add a new logical replication section for "Column Lists" (analogous to the > > Row Filters page). This explains how the feature can be used and the > > caveats in it. > > > > Author: Peter Smith > > Reviewed-by: Shi yu, Vignesh C, Erik Rijkers, Amit Kapila > > Backpatch-through: 15, where it was introduced > > Discussion: https://postgr.es/m/CAHut+PvOuc9=_4TbASc5=VUqh16UWtFO3GzcKQK_5m1hrW3vqg@mail.gmail.com > > Hi > > I just read these docs and noticed that it mentions that column lists > can be used for security. As far as I remember, this is wrong: it is > the subscriber that builds the query to read column data during initial > sync, and the publisher doesn't forbid to read columns not in it, so it > is entirely possible for a malicious subscriber to read columns other > than those published. I'm pretty sure we discussed this at some point > during development of the feature. > > So I suggest to mention this point explicitly in its own paragraph, to > avoid giving a false sense of security. > Thanks for the feedback. The mention of 'security' in the page (as previously written) just means to say that publications can prevent sensitive columns from being replicated/subscribed by default. It was not intended to imply those columns are immune from a malicious attack. Indeed, just having another publication without any column lists could expose the same sensitive columns. I am fine with your rewording of the security part. > While going over this text I found some additional things that could > --in my opinion-- stand some improvement: In general (because they have a lot of similarities), the wording and the section structure of the "Column Lists" page tried to be consistent with the "Row Filters" page. Perhaps this made everything unnecessarily complex. Anyway, your suggested re-wording and removal of those sections look OK to me - the content is the same AFAICT. > > * It feels better to start the section saying that a list can be > specified; subscriber must have all those columns; omitting list > means to publish everything. That leads to shorter text (no need to > say "you need to have them all, oh wait you might only have a few"). > > * there's no reason to explain the syntax in vague terms and refer the > reader to the reference page. > > * The first few <sect2> seem to give no useful structure, and instead > cause the text to become disorganized. I propose to remove them, and > instead mix the paragraphs in them so that we explain the rules and > the behavior, and lastly the effect on specific commands. > > The attached patch effects those changes. > For some reason I was unable to apply your supplied patch to master: [postgres@CentOS7-x64 oss_postgres_misc]$ git apply column-list-wording.patch column-list-wording.patch:16: trailing whitespace. Each publication can optionally specify which columns of each table are column-list-wording.patch:17: trailing whitespace. replicated to subscribers. The table on the subscriber side must have at column-list-wording.patch:18: trailing whitespace. least all the columns that are published. If no column list is specified, column-list-wording.patch:19: trailing whitespace. then all columns in the publisher are replicated. column-list-wording.patch:20: trailing whitespace. See <xref linkend="sql-createpublication"/> for details on the syntax. error: patch failed: doc/src/sgml/logical-replication.sgml:1093 error: doc/src/sgml/logical-replication.sgml: patch does not apply error: patch failed: doc/src/sgml/ref/create_publication.sgml:94 error: doc/src/sgml/ref/create_publication.sgml: patch does not apply > > One more thing. There's a sect2 about combining column list. Part of it > seems pretty judgmental and I see no reason to have it in there; I > propose to remove it (it's not in this patch). I think we should just > say it doesn't work at present, here's how to work around it, and > perhaps even say that we may lift the restriction in the future. The > paragraph that starts with "Background:" is IMO out of place, and it > repeats the mistake that column lists are for security. > It wasn't clear which part you felt was judgemental. I have removed the "Background" paragraph but I have otherwise left that section and Warning as-is because it still seemed useful for the user to know. You can change/remove it if you disagree. > > Lastly: In the create-publication reference page I think it would be > better to reword the Parameters section a bit. It mentions > FOR TABLE as a parameter, but the parameter is actually > <replaceable>table_name</replaceable>; and the row-filter and > column-list explanations are also put in there when they should be in > their own <replaceable>expression</> and <replaceable>column_name</> > varlistentries. I think splitting things that way would result in a > clearer explanation. > IMO this should be proposed as a separate patch. Some of those things (e.g. FOR TABLE as a parameter) seem to have been written that way since PG10. ~~~ PSA a new patch for the "Column Lists" page. AFAIK this is the same as everything that you suggested (except for the Warning section which was kept as mentioned above). ------ Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On 2022-Sep-14, Peter Smith wrote: > PSA a new patch for the "Column Lists" page. AFAIK this is the same as > everything that you suggested I don't get it. You send me my patch back, and claim it is a new patch? I kindly request that when you review a patch, you do not hijack the submitter's patch and claim it as your own. If a submitter goes mising or states that they're unavailable to complete some work, then that's okay, but otherwise it seems a bit offensive to me. I have seen that repeatedly of late, and I find it quite rude. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)
On Wed, Sep 14, 2022 at 7:40 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Sep-14, Peter Smith wrote: > > > PSA a new patch for the "Column Lists" page. AFAIK this is the same as > > everything that you suggested > > I don't get it. You send me my patch back, and claim it is a new patch? > > I kindly request that when you review a patch, you do not hijack the > submitter's patch and claim it as your own. If a submitter goes mising > or states that they're unavailable to complete some work, then that's > okay, but otherwise it seems a bit offensive to me. I have seen that > repeatedly of late, and I find it quite rude. Hi Alvaro, I'm sorry for any misunderstandings. I attached the replacement patch primarily because the original did not apply for me, so I had to re-make it at my end anyway so I could see the result. I thought posting it might save others from having to do the same. Certainly I am not trying to hijack or claim ownership. ------ Kind Regards, Peter Smith. Fujitsu Australia
On 2022-Sep-14, Peter Smith wrote: > On Tue, Sep 13, 2022 at 10:11 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Sep-07, Amit Kapila wrote: > > One more thing. There's a sect2 about combining column list. Part of it > > seems pretty judgmental and I see no reason to have it in there; I > > propose to remove it (it's not in this patch). I think we should just > > say it doesn't work at present, here's how to work around it, and > > perhaps even say that we may lift the restriction in the future. The > > paragraph that starts with "Background:" is IMO out of place, and it > > repeats the mistake that column lists are for security. > > It wasn't clear which part you felt was judgemental. I have removed > the "Background" paragraph but I have otherwise left that section and > Warning as-is because it still seemed useful for the user to know. You > can change/remove it if you disagree. I meant the Background part that you remove, yeah. Looking at the rendered docs again, I notice that section "31.4.5. Combining Multiple Column Lists" is *only* the red-tinted Warning block. That seems quite odd. I am tempted to remove the sect2 heading for that one too. I am now wondering how to split things between the normative bits "It is not supported to have a subscription comprising several publications where the same table has been published with different column lists." and the informative bits "This means changing the column lists of the tables being subscribed could cause inconsistency of column lists among publications, in which case the ALTER PUBLICATION will be successful but later the walsender on the publisher, or the subscriber may throw an error. In this scenario, the user needs to recreate the subscription after adjusting the column list or drop the problematic publication using ALTER SUBSCRIPTION ... DROP PUBLICATION and then add it back after adjusting the column list." > > Lastly: In the create-publication reference page I think it would be > > better to reword the Parameters section a bit. It mentions > > FOR TABLE as a parameter, but the parameter is actually > > <replaceable>table_name</replaceable>; and the row-filter and > > column-list explanations are also put in there when they should be in > > their own <replaceable>expression</> and <replaceable>column_name</> > > varlistentries. I think splitting things that way would result in a > > clearer explanation. > > IMO this should be proposed as a separate patch. Some of those things > (e.g. FOR TABLE as a parameter) seem to have been written that way > since PG10. Agreed. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)
On 2022-Sep-15, Alvaro Herrera wrote: > Looking at the rendered docs again, I notice that section "31.4.5. > Combining Multiple Column Lists" is *only* the red-tinted Warning block. > That seems quite odd. I am tempted to remove the sect2 heading for that > one too. Pushed. I didn't modify this part; I spent too much time looking at it trying to figure out how to do it. I think this bit really belongs in the CREATE/ALTER docs rather than this chapter. But in order to support having a separate <para> for the restriction on combination, we need a separate <varlistentry> for the column_name parameter. So I'm going to edit that one and I'll throw this change in. Thanks, Peter, for the discussion. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Pensar que el espectro que vemos es ilusorio no lo despoja de espanto, sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)
On 2022-Sep-15, Alvaro Herrera wrote: > On 2022-Sep-15, Alvaro Herrera wrote: > > > Looking at the rendered docs again, I notice that section "31.4.5. > > Combining Multiple Column Lists" is *only* the red-tinted Warning block. > > That seems quite odd. I am tempted to remove the sect2 heading for that > > one too. > > Pushed. I didn't modify this part; I spent too much time looking at it > trying to figure out how to do it. I think this bit really belongs in > the CREATE/ALTER docs rather than this chapter. But in order to support > having a separate <para> for the restriction on combination, we need a > separate <varlistentry> for the column_name parameter. So I'm going to > edit that one and I'll throw this change in. I figured out how to fix this one -- just remove the <sect2> tags, and add a <title> tag to the <warning> box. The attached yields the explanatory text in a separate box that doesn't have the silly otherwise-empty section title. We add the 'id' that was in the sect2 to the warning; with this, at the referencing end the full title is rendered, which looks quite reasonable. I have attached screenshots of both sides of this. Compare the existing https://www.postgresql.org/docs/15/logical-replication-col-lists.html#LOGICAL-REPLICATION-COL-LIST-COMBINING Unless there are objections, I'll get this pushed to 15 and master. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Attachment
On Mon, Dec 19, 2022 at 10:17 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Sep-15, Alvaro Herrera wrote: > > > On 2022-Sep-15, Alvaro Herrera wrote: > > > > > Looking at the rendered docs again, I notice that section "31.4.5. > > > Combining Multiple Column Lists" is *only* the red-tinted Warning block. > > > That seems quite odd. I am tempted to remove the sect2 heading for that > > > one too. > > > > Pushed. I didn't modify this part; I spent too much time looking at it > > trying to figure out how to do it. I think this bit really belongs in > > the CREATE/ALTER docs rather than this chapter. But in order to support > > having a separate <para> for the restriction on combination, we need a > > separate <varlistentry> for the column_name parameter. So I'm going to > > edit that one and I'll throw this change in. > > I figured out how to fix this one -- just remove the <sect2> tags, and > add a <title> tag to the <warning> box. The attached yields the > explanatory text in a separate box that doesn't have the silly > otherwise-empty section title. We add the 'id' that was in the sect2 to > the warning; with this, at the referencing end the full title is > rendered, which looks quite reasonable. I have attached screenshots of > both sides of this. > > Compare the existing > https://www.postgresql.org/docs/15/logical-replication-col-lists.html#LOGICAL-REPLICATION-COL-LIST-COMBINING > - <sect2 id="logical-replication-col-list-combining"> - <title>Combining Multiple Column Lists</title> - - <warning> + <warning id="logical-replication-col-list-combining"> + <title>Combining Column Lists from Multiple Subscriptions</title> Shouldn't the title be "Combining Column Lists from Multiple Publications"? We can define column lists while defining publications so the proposed title doesn't seem to be conveying the right thing. -- With Regards, Amit Kapila.
On Tue, Dec 20, 2022 at 3:47 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Sep-15, Alvaro Herrera wrote: > > > On 2022-Sep-15, Alvaro Herrera wrote: > > > > > Looking at the rendered docs again, I notice that section "31.4.5. > > > Combining Multiple Column Lists" is *only* the red-tinted Warning block. > > > That seems quite odd. I am tempted to remove the sect2 heading for that > > > one too. > > > > Pushed. I didn't modify this part; I spent too much time looking at it > > trying to figure out how to do it. I think this bit really belongs in > > the CREATE/ALTER docs rather than this chapter. But in order to support > > having a separate <para> for the restriction on combination, we need a > > separate <varlistentry> for the column_name parameter. So I'm going to > > edit that one and I'll throw this change in. > > I figured out how to fix this one -- just remove the <sect2> tags, and > add a <title> tag to the <warning> box. The attached yields the > explanatory text in a separate box that doesn't have the silly > otherwise-empty section title. We add the 'id' that was in the sect2 to > the warning; with this, at the referencing end the full title is > rendered, which looks quite reasonable. I have attached screenshots of > both sides of this. > > Compare the existing > https://www.postgresql.org/docs/15/logical-replication-col-lists.html#LOGICAL-REPLICATION-COL-LIST-COMBINING > > Unless there are objections, I'll get this pushed to 15 and master. > Not quite an objection, but... If you change this warning title then it becomes the odd one out - every other warning in all the pg docs just says "Warning". IMO maintaining consistency throughout is best. e.g. I can imagine maybe someone searching for "Warning" in the docs, and now they are not going to find this one. Maybe a safer way to fix this "silly otherwise-empty section title" would be to just add some explanatory text so it is no longer empty. ------ Kind Regards, Peter Smith. Fujitsu Australia
On 2022-Dec-20, Peter Smith wrote: > If you change this warning title then it becomes the odd one out - > every other warning in all the pg docs just says "Warning". IMO > maintaining consistency throughout is best. e.g. I can imagine maybe > someone searching for "Warning" in the docs, and now they are not > going to find this one. Hmm, how do you propose that people search for warnings, and fail to notice one that is not titled "Warning"? In my mind, it is much more likely that they scan a page visually until they hit a red box ("eh look, a red box that I can ignore because its title is not Warning" does not sound very credible). On the other hand, if you're going over the source .sgml files, you're going to grep for the warning tag, and that's going to be there. (Maybe you'd say somebody would grep for "<warning>" and not find this one because the > is not there anymore. I grant you that that could happen. But then they're doing it wrong already. I don't think we need to cater to that.) Now, I did notice that we don't have any other titled warning boxes, because I had a quick look at all the other warnings we have. I was surprised to find out that we have very few, which I think is good, because warnings are annoying. I was also surprised that most of them are right not to have a title, because they are very quick one-para boxes. But I did find two others that should probably have a title: https://www.postgresql.org/docs/15/app-pgrewind.html Maybe "Failures while rewinding" https://www.postgresql.org/docs/15/applevel-consistency.html Maybe "Serializable Transactions and Data Replication" (and patch it to mention logical replication) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
On 2022-Dec-20, Amit Kapila wrote: > + <warning id="logical-replication-col-list-combining"> > + <title>Combining Column Lists from Multiple Subscriptions</title> > > Shouldn't the title be "Combining Column Lists from Multiple > Publications"? We can define column lists while defining publications > so the proposed title doesn't seem to be conveying the right thing. Doh, of course. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'. After collecting 500 such letters, he mused, a university somewhere in Arizona would probably grant him a degree. (Don Knuth)
On Tue, Dec 20, 2022 at 7:21 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Dec-20, Peter Smith wrote: > > > If you change this warning title then it becomes the odd one out - > > every other warning in all the pg docs just says "Warning". IMO > > maintaining consistency throughout is best. e.g. I can imagine maybe > > someone searching for "Warning" in the docs, and now they are not > > going to find this one. > > Hmm, how do you propose that people search for warnings, and fail to > notice one that is not titled "Warning"? In my mind, it is much more > likely that they scan a page visually until they hit a red box ("eh > look, a red box that I can ignore because its title is not Warning" does > not sound very credible). On the other hand, if you're going over the > source .sgml files, you're going to grep for the warning tag, and that's > going to be there. > > (Maybe you'd say somebody would grep for "<warning>" and not find this > one because the > is not there anymore. I grant you that that could > happen. But then they're doing it wrong already. I don't think we need > to cater to that.) > By "searching" I also meant just scanning visually, although I was thinking more about scanning the PDF. Right now, the intention of any text box is obvious at a glance because of those titles like "Caution", "Tip", "Note", "Warning". Sure, the HTML rendering also uses colours to convey the purpose, but in the PDF version [1] everything is black-and-white so apart from the title all boxes look the same. That's why I felt using non-standard box titles might be throwing away some of the meaning - e.g. the reader of the PDF won't know anymore at a glance are they looking at a warning or a tip. > > Now, I did notice that we don't have any other titled warning boxes, > because I had a quick look at all the other warnings we have. I was > surprised to find out that we have very few, which I think is good, > because warnings are annoying. I was also surprised that most of them > are right not to have a title, because they are very quick one-para > boxes. But I did find two others that should probably have a title: > > https://www.postgresql.org/docs/15/app-pgrewind.html > Maybe "Failures while rewinding" > > https://www.postgresql.org/docs/15/applevel-consistency.html > Maybe "Serializable Transactions and Data Replication" > (and patch it to mention logical replication) > ------ [1] PDF docs - https://www.postgresql.org/files/documentation/pdf/15/postgresql-15-A4.pdf Kind Regards, Peter Smith. Fujitsu Australia
On 2022-Dec-21, Peter Smith wrote: > By "searching" I also meant just scanning visually, although I was > thinking more about scanning the PDF. > > Right now, the intention of any text box is obvious at a glance > because of those titles like "Caution", "Tip", "Note", "Warning". > Sure, the HTML rendering also uses colours to convey the purpose, but > in the PDF version [1] everything is black-and-white so apart from the > title all boxes look the same. That's why I felt using non-standard > box titles might be throwing away some of the meaning - e.g. the > reader of the PDF won't know anymore at a glance are they looking at a > warning or a tip. Oh, I see. It's been so long that I haven't looked at the PDFs, that I failed to realize that they don't use color. I agree that would be a problem. Maybe we can change the title to have the word: <title>Warning: Combining Column Lists from Multiple Publications</title> -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Wed, Dec 21, 2022 at 6:59 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Dec-21, Peter Smith wrote: > > > By "searching" I also meant just scanning visually, although I was > > thinking more about scanning the PDF. > > > > Right now, the intention of any text box is obvious at a glance > > because of those titles like "Caution", "Tip", "Note", "Warning". > > Sure, the HTML rendering also uses colours to convey the purpose, but > > in the PDF version [1] everything is black-and-white so apart from the > > title all boxes look the same. That's why I felt using non-standard > > box titles might be throwing away some of the meaning - e.g. the > > reader of the PDF won't know anymore at a glance are they looking at a > > warning or a tip. > > Oh, I see. It's been so long that I haven't looked at the PDFs, that I > failed to realize that they don't use color. I agree that would be a > problem. Maybe we can change the title to have the word: > > <title>Warning: Combining Column Lists from Multiple Publications</title> > That last idea LGTM. But no patch at all LGTM also. ------ Kind Regards, Peter Smith. Fujitsu Australia.
On 2022-Dec-21, Peter Smith wrote: > On Wed, Dec 21, 2022 at 6:59 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > Oh, I see. It's been so long that I haven't looked at the PDFs, that I > > failed to realize that they don't use color. I agree that would be a > > problem. Maybe we can change the title to have the word: > > > > <title>Warning: Combining Column Lists from Multiple Publications</title> > > That last idea LGTM. But no patch at all LGTM also. ... I hear you, but honestly that warning box with a section title looks completely wrong to me, so I've pushed it. Many thanks for looking. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "But static content is just dynamic content that isn't moving!" http://smylers.hates-software.com/2007/08/15/fe244d0c.html