Thread: What is lurking in the shadows?
Hi hackers, Recently I was involved with some patches [1][2] to fix code which was mistakenly using a global "wrconn" variable instead of a local one. That bug led me to wonder if similar problems might be going undetected elsewhere in the code. There is a gcc compiler option [3] -Wshadow which informs about the similar scenario where one variable is "shadowing" another (e.g. redeclaring a variable with the same name as one at an outer scope). PSA a log file from a PG14 build (code from last week) run using the -Wshadow flag. In this logfile I have filtered out everything except the shadow warnings. My plan initially was to just fix the few warnings found and present the patches here, but it turned out there are far more cases than I was anticipating. There seem to be basically 3 categories of shadowing exposed in this logfile: 1. where a var declaration is shadowing a previously declared local var (205 cases found) 2. where a var declaration is shadowing a function parameter (14 cases found) 3. where a var declaration is shadowing a global variable (110 cases found) ~~~ Of the dozen or so cases that I have looked at, so far I have been unable to find anything that would result in any *real* errors. But that is not to say they are harmless either - at the very least IMO they affect code readability in ways that span the full spectrum from "meh" to downright "dodgy-looking". Some examples are possibly deliberate (albeit lazy / unimaginative?) local re-declarations of variables like "i" and "buf" etc. But many other examples (particularly the global shadows) seemed clearly unintentional mistakes to me - like the code evolved and continued working OK without warnings, so any introduced shadowing just went unnoticed. And who knows... maybe there are a few *real* bugs lurking within this list too? ~~~ For now, I am not sure how to proceed with this information. Hence this post... - Perhaps a consistent convention for global variable names could have prevented lots of these cases from occurring. - Many of these shadow cases look unintentional to me; I feel the code would have been implemented differently had the developer been aware of them, so at least advertising their presence seems a useful thing to know. Perhaps the -Wshadow flag can be added to one of the build-farm machines for that purpose? - Finally, IMO the code is nearly always more confusing when there is variable shadowing, so removal of these warnings seems a worthy goal. Perhaps they can be slowly whittled away during the course of PG 15 development? Or am I just jumping at shadows? Thoughts? ---------- [1] https://github.com/postgres/postgres/commit/4e8c0f1a0d0d095a749a329a216c88a340a455b6 [2] https://github.com/postgres/postgres/commit/db16c656478b815627a03bb0a31833391a733eb0 [3] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Fri, 14 May 2021 at 12:00, Peter Smith <smithpb2250@gmail.com> wrote: > That bug led me to wonder if similar problems might be going > undetected elsewhere in the code. There is a gcc compiler option [3] > -Wshadow which informs about the similar scenario where one variable > is "shadowing" another (e.g. redeclaring a variable with the same name > as one at an outer scope). > For now, I am not sure how to proceed with this information. Hence this post... I'm inclined to think that since a bug has already been found due to a local variable shadowing a global one that it would be good to review these and then consider if it's worth doing any renaming. I think the process of looking at each warning individually will allow us to determine if; a) there are any bugs, or; b) if it's worth doing any renaming. I see GCC also has -Wshadow=compatible-local to warn when there shadowing is going on in local vars where both vars have compatible types. -Wshadow=local is any local var shadowing, then the option you used which is the same as -Wshadow=global. I'd say it might be worth aspiring to reduce the warnings from building with these flags. If we reduced these down then it might allow us to more easily identify cases where there are actual bugs. Maybe we can get to a point where we could enable either -Wshadow=compatible-local or -Wshadow=local. I doubt we could ever get to a stage where -Wshadow=global would work for us. There's also some quite old discussion in [1] that you might want to review. I don't pretend to have found the best example of ones that we might want to leave alone, but: pg_controldata.c: In function ‘wal_level_str’: pg_controldata.c:73:24: warning: declaration of ‘wal_level’ shadows a global declaration [-Wshadow] wal_level_str(WalLevel wal_level) ^ In file included from pg_controldata.c:24:0: ../../../src/include/access/xlog.h:187:24: warning: shadowed declaration is here [-Wshadow] extern PGDLLIMPORT int wal_level; I wonder if it would really clear up much if the parameter name there was renamed not to shadow the GUC variable's name. Also, doing any renaming here is not without risk that we break something, so certainly PG15 at the earliest, unless there is an actual bug. I imagine starting with a patch that fixes the ones where the name does not have much meaning. e.g, i, buf, tmp, lc We also need to take into account that renaming variables here can increase the overhead of backpatching fixes. The process of fixing those up to make the patch apply to the back branch does increase the chances that bugs could make their way into the back branches. However, it's probably more likely to end up as a bug if the patch was written for the back branch then there's a bigger opportunity for the patch author to pick the wrong variable name when converting the patch to work with master. In the reverse case, that does not seem as likely due to both variables having the same name. David [1] https://www.postgresql.org/message-id/flat/877k1psmpf.fsf%40mailbox.samurai.com
On Fri, May 14, 2021 at 01:16:37PM +1200, David Rowley wrote: > I'm inclined to think that since a bug has already been found due to a > local variable shadowing a global one that it would be good to review > these and then consider if it's worth doing any renaming. I think the > process of looking at each warning individually will allow us to > determine if; a) there are any bugs, or; b) if it's worth doing any > renaming. 70116493 is another instance of that, from a not-so-far past.. > I'd say it might be worth aspiring to reduce the warnings from > building with these flags. If we reduced these down then it might > allow us to more easily identify cases where there are actual bugs. > Maybe we can get to a point where we could enable either > -Wshadow=compatible-local or -Wshadow=local. I doubt we could ever > get to a stage where -Wshadow=global would work for us. There's also > some quite old discussion in [1] that you might want to review. Agreed, not before the 15 branch opens for business for cosmetic changes. compatible-local did not sound that much interesting to me on first sight, but the report of Peter tells the contrary: most of the conflicts come from local problems. I am not sure that you could enable that safely though as PG_TRY() would complain on that, for example in ProcessUtilitySlow(). > We also need to take into account that renaming variables here can > increase the overhead of backpatching fixes. The process of fixing > those up to make the patch apply to the back branch does increase the > chances that bugs could make their way into the back branches. > However, it's probably more likely to end up as a bug if the patch was > written for the back branch then there's a bigger opportunity for the > patch author to pick the wrong variable name when converting the patch > to work with master. In the reverse case, that does not seem as likely > due to both variables having the same name. That may be tricky, even if global or local variables are changed, but I'd like to think that there is room for improvement. Looking at the report, the global conflicts involve: - synchronous_commit - ssl_key_file - wal_segment_size - DataDir, with the control data business. These seem changeable without much holes with potential back-patches. -- Michael
Attachment
On Fri, May 14, 2021 at 11:16 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 14 May 2021 at 12:00, Peter Smith <smithpb2250@gmail.com> wrote: > > That bug led me to wonder if similar problems might be going > > undetected elsewhere in the code. There is a gcc compiler option [3] > > -Wshadow which informs about the similar scenario where one variable > > is "shadowing" another (e.g. redeclaring a variable with the same name > > as one at an outer scope). > > > For now, I am not sure how to proceed with this information. Hence this post... > > I'm inclined to think that since a bug has already been found due to a > local variable shadowing a global one that it would be good to review > these and then consider if it's worth doing any renaming. I think the > process of looking at each warning individually will allow us to > determine if; a) there are any bugs, or; b) if it's worth doing any > renaming. > Hi David, Michael - Thanks for your replies. Yeah, I would like to work my way through all of these warnings in my spare time and report back to this thread (after 1-2 months?) with a detailed analysis. After that it should become much clearer what / if any action should be taken next. ---------- Kind Regards, Peter Smith. Fujitsu Australia
On Tue, May 18, 2021 at 12:08:57PM +1000, Peter Smith wrote: > Yeah, I would like to work my way through all of these warnings in my > spare time and report back to this thread (after 1-2 months?) with a > detailed analysis. The next commit fest is at the beginning of July, so there are a couple of weeks of margin here. > After that it should become much clearer what / if any action should > be taken next. If you can dive into the details, that would be nice! My take would be to look first at the local-local conflicts and rename all the variables that conflict so as any backpatch done in the areas changed cause a compilation failure. Some of the global-local conflicts are simple enough to solve, these could go second. Each of them requires a case-by-case lookup, of course. -- Michael
Attachment
On Tue, 18 May 2021 at 14:09, Peter Smith <smithpb2250@gmail.com> wrote: > Yeah, I would like to work my way through all of these warnings in my > spare time and report back to this thread (after 1-2 months?) with a > detailed analysis. I'd recommend for any patches that they come in bite-sized chunks. A committer is going to have to re-review each change. For me personally, I'll probably run for the hills if I see a patch that renames 200 variables. I'd think about a dozen would be good. Starting with ones that are least likely to raise objection also seems like a good idea. That way you'll have an idea if you want to trouble yourself with the more questionable ones when the less questionable ones raised too many questions. Like I mentioned, start with ones like i, buf, tmp, lc. If those are accepted then move on to the more difficult ones. Unless you discover bugs, then there's not really any urgency to fix these. Doing it in bite-sized chunks is less likely going to cause frustration for you if some of the work is rejected after you've gone to all the trouble. Also, going by what's mentioned in [1], in particular [2], I'm not so certain that these changes will be received well by everyone. So I recommend just taking it slow. David [1] https://www.postgresql.org/message-id/flat/877k1psmpf.fsf%40mailbox.samurai.com [2] https://www.postgresql.org/message-id/22920.1069708226%40sss.pgh.pa.us