Re: Coccinelle for PostgreSQL development [4/N]: correcting palloc() use - Mailing list pgsql-hackers

From Mats Kindahl
Subject Re: Coccinelle for PostgreSQL development [4/N]: correcting palloc() use
Date
Msg-id CA+14424CWusjBxX5Hc54DTc1XAk1DBL4JGhHeqhp0XMYo-O=-Q@mail.gmail.com
Whole thread Raw
In response to Re: Coccinelle for PostgreSQL development [4/N]: correcting palloc() use  (Peter Eisentraut <peter@eisentraut.org>)
List pgsql-hackers
On Wed, Jan 8, 2025 at 10:02 PM Peter Eisentraut <peter@eisentraut.org> wrote:
On 07.01.25 20:49, Mats Kindahl wrote:
> This is the first example semantic patch and shows how to capture and
> fix a common problem.
>
> If you use an palloc() to allocate memory for an object (or an array of
> objects) and by mistake type something like:
>
>      StringInfoData *info = palloc(sizeof(StringInfoData*));
>
> You will not allocate enough memory for storing the object. This
> semantic patch catches any cases where you are either allocating an
> array of objects or a single object that do not have corret types in
> this sense, more precisely, it captures assignments to a variable of
> type T* where palloc() uses sizeof(T) either alone or with a single
> expression (assuming this is an array count).
>
> The semantic patch is overzealous in the sense that allocation to a
> "const char **" expects a "sizeof(const char *)" and it cannot deal with
> typedefs that introduce aliases (these two can be seen in the patch).
> Although the sizes of these are the same, and Coccinelle do not have a
> good system for comparing types, it might be better to just follow the
> convention of always using the type "T*" for any "palloc(sizeof(T))"
> since it makes automated checking easier and is a small inconvenience;
> especially considering that coccicheck can easily fix this for you. It
> also simplifies other automated checking to follow this convention.

I think this kind of thing is better addressed with a static analyzer or
some heavier compiler annotations and warnings or something like that.
That kind of tool would have the right level of information to decide
whether some code is correct.  Having a text matching tool do it is just
never going to be complete, and the problems you mention show that this
tool fundamentally doesn't understand the code, and just knows how to
parse the code a little better than grep or sed.

Thanks for the comments Peter.

On one hand, I agree with you. Coccinelle understands the *structure* of the code (e.g., what is a type and what is an identifier, where functions start and end, etc.), but not the *semantics* of the code (e.g., if two types are different).

On the other hand, however, development practice is also about following common coding patterns to avoid problems and improve readability. In this sense, having strange situations in the code that just happen to work makes it more difficult for developers as well as for automation tools. For example, allocating memory for elements of sizeof(T**) and assigning it to a T** variable will not be caught by something like ASAN (because the sizes match), and cannot trigger a fault (again, because the sizes match), but correcting this would still make the life easier for both developers and static analysis tools.

That said, if your only concern is the "const int *" vs. "int *" thing (but not the T** vs T* thing), it is possible to deal with this by either adding Python code to the Coccinelle script to just remove all "const" since they do not affect the size or detect that one of the types has "const" in it and just ignore this situation and not suggest to patch it.
 
I do like the idea of exploring this tool for facilitating code
rewriting or semantic patching.  But I'm suspicious about using it for
enforcing coding styles or patterns.

The case we're discussing here is one such case where I think that enforcing a coding style might not be a good idea, but I think it was valuable to raise the issue and get opinions. (This is why I mentioned the problem with the patch being overzealous.)

The real value here is not specific patches though. It is having a tool like Coccinelle for detecting and correcting problematic code as part of the development practice since I think it would speed up development and review and also improve the quality of the code.
--
Best wishes,
Mats Kindahl, Timescale

pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: ecpg command does not warn COPY ... FROM STDIN;
Next
From: Bertrand Drouvot
Date:
Subject: Re: Reorder shutdown sequence, to flush pgstats later