Michael Paquier <michael.paquier@gmail.com> writes:
> On Wed, Nov 2, 2016 at 12:09 AM, <david.turon@linuxbox.cz> wrote:
>> we tried the new RLS feature, tested on postgres 9.5.3 / CentOS 6. When we
>> turned on ENABLE RLS + FORCE RLS under a normal workload, it caused a huge
>> increase in checkpoint production (about 30x or more); our xlog disk
>> partition filled up, and log shipping to the replica may have delayed
>> removal of old xlog segments. Has anybody had the same experience after
>> turning on RLS? It looks like more buffers are being dirtied. Yes, we can
>> provide more space for xlog, but it would also take much more space for
>> xlog backups, and we do not know if it is worth it. We had log_checkpoints
>> on, and I am attaching the log (RLS was turned on at 13:26).
> Interesting, I don't recall RLS generating a burst in activity.

I have an idea about a partial explanation for that. RLS restrictions on
UPDATE, DELETE, or SELECT FOR UPDATE target tables cause extra LockRows
plan nodes to be generated, and in turn that would cause extra WAL
activity to track the row-lock-taking. But it's hard to see how you get
to a 30X bloat factor that way. The XLOG_HEAP_LOCK records would be no
larger than the subsequent updates, certainly, so if RLS isn't really
doing anything then this should be strictly less than a 2X penalty.
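
For anyone who wants to see where the extra node comes from, here is a
minimal sketch (the table, policy, and column names are all made up for
illustration); on 9.5, the UPDATE plan should show a LockRows step, which
is what produces the extra XLOG_HEAP_LOCK records:

    CREATE TABLE docs (id int PRIMARY KEY, owner text, body text);
    ALTER TABLE docs ENABLE ROW LEVEL SECURITY;
    ALTER TABLE docs FORCE ROW LEVEL SECURITY;
    CREATE POLICY docs_owner ON docs USING (owner = current_user);

    -- With RLS active, 9.5 wraps the target relation in a
    -- security-barrier subquery, and the plan should pick up a LockRows
    -- node that row-locks each fetched tuple before the update proper:
    EXPLAIN (COSTS OFF) UPDATE docs SET body = '' WHERE id < 1000;
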
The row lock occurs before the RLS filter, so if you were using RLS in
such a way that it rejected a very large fraction of rows that updates or
deletes were attempted on, maybe you could get to 30X from this. But that
would be a weird way to use RLS, IMO. (Hm ... actually, if this were
happening in otherwise-seldom-changed pages, maybe you get to that from
the full page images in the XLOG_HEAP_LOCK records? Normally if you
needed an FPI you were going to pay it anyway, either here or at the
update proper; but for a locked and then RLS-rejected row it would be
pure overhead. It's hard to credit needing so many FPIs this way,
though.)
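
One way to check how much of the extra WAL this accounts for would be to
diff the WAL insert location around a representative statement, run with
and without FORCE RLS; pg_xlogdump --stats over the affected segments
should likewise show whether heap LOCK records and their FPIs dominate.
A rough sketch, assuming an otherwise idle server and a psql session
(for \gset), reusing the made-up docs table from above:

    SELECT pg_current_xlog_location() AS before \gset
    UPDATE docs SET body = '' WHERE id < 1000;
    -- bytes of WAL generated in between (this counts all WAL written by
    -- the cluster, hence the idle-server assumption):
    SELECT pg_xlog_location_diff(pg_current_xlog_location(), :'before');
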
I'm of the opinion that the extra row locks are actually semantically
wrong, or at least pretty darn dubious: locking rows that you don't have
privilege to see is not a nice behavior, and neither is the fact that the
RLS filter functions see different data than they would in read-only cases
(due to LockRows fetching the most up-to-date version).
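
To make that concrete, here is a hypothetical sequence (same made-up docs
table as above, relying on the 9.5 plan shape described earlier) in which
a row that the policy hides from the updating user nonetheless ends up
row-locked:

    -- Session 1, as a user from whom the policy hides row 42:
    BEGIN;
    UPDATE docs SET body = '' WHERE id = 42;
    -- reports UPDATE 0, since the RLS filter rejects the row, but per
    -- the above the LockRows node has already row-locked it

    -- Session 2, as the row's owner, while session 1 is still open:
    SELECT * FROM docs WHERE id = 42 FOR UPDATE;  -- can block on session 1
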
The RLS planner rewrite that I posted earlier gets rid of the extra
LockRows nodes. It's not something we'd think of back-patching, though.

			regards, tom lane