Re: pgsql: Doc: Explain about Column List feature. - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: pgsql: Doc: Explain about Column List feature.
Date
Msg-id 20220913121138.yn7ekkfysxzhkm2u@alvherre.pgsql
Whole thread Raw
Responses Re: pgsql: Doc: Explain about Column List feature.
List pgsql-hackers
On 2022-Sep-07, Amit Kapila wrote:

> Doc: Explain about Column List feature.
> 
> Add a new logical replication section for "Column Lists" (analogous to the
> Row Filters page). This explains how the feature can be used and the
> caveats in it.
> 
> Author: Peter Smith
> Reviewed-by: Shi yu, Vignesh C, Erik Rijkers, Amit Kapila
> Backpatch-through: 15, where it was introduced
> Discussion: https://postgr.es/m/CAHut+PvOuc9=_4TbASc5=VUqh16UWtFO3GzcKQK_5m1hrW3vqg@mail.gmail.com

Hi

I just read these docs and noticed that it mentions that column lists
can be used for security.  As far as I remember, this is wrong: it is
the subscriber that builds the query to read column data during initial
sync, and the publisher doesn't forbid to read columns not in it, so it
is entirely possible for a malicious subscriber to read columns other
than those published.  I'm pretty sure we discussed this at some point
during development of the feature.

So I suggest to mention this point explicitly in its own paragraph, to
avoid giving a false sense of security.

While going over this text I found some additional things that could
--in my opinion-- stand some improvement:

* It feels better to start the section saying that a list can be
  specified; subscriber must have all those columns; omitting list
  means to publish everything. That leads to shorter text (no need to
  say "you need to have them all, oh wait you might only have a few").

* there's no reason to explain the syntax in vague terms and refer the
  reader to the reference page.

* The first few <sect2> seem to give no useful structure, and instead
  cause the text to become disorganized. I propose to remove them, and
  instead mix the paragraphs in them so that we explain the rules and
  the behavior, and lastly the effect on specific commands.

The attached patch effects those changes.


One more thing. There's a sect2 about combining column list.  Part of it
seems pretty judgmental and I see no reason to have it in there; I
propose to remove it (it's not in this patch).  I think we should just
say it doesn't work at present, here's how to work around it, and
perhaps even say that we may lift the restriction in the future.  The
paragraph that starts with "Background:" is IMO out of place, and it
repeats the mistake that column lists are for security.


Lastly: In the create-publication reference page I think it would be
better to reword the Parameters section a bit.  It mentions
FOR TABLE as a parameter, but the parameter is actually
<replaceable>table_name</replaceable>; and the row-filter and
column-list explanations are also put in there when they should be in
their own <replaceable>expression</> and <replaceable>column_name</>
varlistentries.  I think splitting things that way would result in a
clearer explanation.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Attachment

pgsql-hackers by date:

Previous
From: "kuroda.hayato@fujitsu.com"
Date:
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Next
From: Matthias van de Meent
Date:
Subject: Re: Tuples inserted and deleted by the same transaction