Thread: A modest proposal: make parser/rewriter/planner inputs read-only

The parser, rewriter, and planner all have a bad habit of scribbling
on their input parsetrees.  At this point I've lost count of how many
bugs that's caused (but e33f2335a and 0f43083d1 are the two latest
examples), and of how many places there are that are brute-forcing a
solution to the problem by copying a whole parsetree before letting
one of these subsystems see it.

Years ago we fixed the executor to treat its input Plan trees as
read-only.  It seems like we really ought to do the same for these
upstream subsystems.  Surely, whatever benefit we get from changing
Node contents in-place is lost due to all these other hacks.

While I've had this thought in the back of my head for a long while,
I haven't pursued it because I couldn't think of a practical way
to find all the places that are violating such a policy, much less
prevent future violations.  However, today I thought of this sketch 
of a solution:

1. Invent a way to make a particular memory context read-only
after putting some data into it.

2. In debug builds, after we've built a tree that should be considered
read-only, copy it into such a context and make it read-only.  Or
perhaps build it there in the first place.

3. Fix the resulting crashes.

4. Profit!  (In particular, nuke a lot of no-longer-needed copyObject
calls.)

My first thought about implementing #1 was to seek Valgrind's help,
but so far as I can find out there's no VALGRIND_MAKE_MEM_READ_ONLY.
Step #3 would be pretty tedious anyway if it required running under
Valgrind.  However, all modern hardware has the ability to mark
memory read-only at the page level, and most platforms expose that
in some way or other.  So it doesn't seem unreasonable to invent
a memory context option (or whole new context type, if that seems
easier) that is careful to align its allocation blocks on page
boundaries and then can set or clear the hardware R/O flag on
demand.  It'd be enough if the R/O enforcement worked on popular
development platforms, we don't have to make it work absolutely
everywhere.

I'm not planning to pursue this idea Right Now, but it seems like
something that could happen for v19 or so.  In the meantime I wanted
to get the ideas down on electrons.

Thoughts?

            regards, tom lane



Re: A modest proposal: make parser/rewriter/planner inputs read-only

From
Andres Freund
Date:
Hi,

On 2025-04-05 12:46:37 -0400, Tom Lane wrote:
> 1. Invent a way to make a particular memory context read-only
> after putting some data into it.
> 
> 2. In debug builds, after we've built a tree that should be considered
> read-only, copy it into such a context and make it read-only.  Or
> perhaps build it there in the first place.

> 3. Fix the resulting crashes.
> 
> 4. Profit!  (In particular, nuke a lot of no-longer-needed copyObject
> calls.)
> 
> My first thought about implementing #1 was to seek Valgrind's help,
> but so far as I can find out there's no VALGRIND_MAKE_MEM_READ_ONLY.
> Step #3 would be pretty tedious anyway if it required running under
> Valgrind.  However, all modern hardware has the ability to mark
> memory read-only at the page level, and most platforms expose that
> in some way or other.  So it doesn't seem unreasonable to invent
> a memory context option (or whole new context type, if that seems
> easier) that is careful to align its allocation blocks on page
> boundaries and then can set or clear the hardware R/O flag on
> demand.  It'd be enough if the R/O enforcement worked on popular
> development platforms, we don't have to make it work absolutely
> everywhere.

FWIW, while hacking on patch to making hint bit writes not happening while IO
is going on (so we don't need to copy the page anymore and don't cause
filesystem level issues with DIO), I hacked up protection for shared buffers
using mprotect() - it worked way better than I thought it would. The overhead
ended up surprisingly low:

base:
real    1m4.613s
user    4m31.409s
sys     3m20.445s

ENFORCE_BUFFER_PROT

real    1m11.912s
user    4m27.332s
sys     3m28.063s


See https://postgr.es/m/043c8b50-d183-46e5-b054-145cc0f6f908%40iki.fi


I'm mostly sharing that to say that
a) yes, mprotect() is viable and works surprisingly well
b) it might be worth inventing some common platform abstraction for mprotect

That prototype patch already worked on most platforms, windows should be
entirely doable.


Greetings,

Andres Freund