Thread: Re: Regression in statement locations
> It is also possible that the regression is not coming from > 499edb0 but I do not see another obvious candidate. I used pg_stat_statements to repro the issue, and a bisect resulted in 499edb0 being the source of the regression. ``` select pg_stat_statements_reset(); set pg_stat_statements.track='all'; DO $$ DECLARE BEGIN EXECUTE 'CREATE TABLE do_table (weird_name INT); DROP table do_table'; END $$; select query from pg_stat_statements where not toplevel; ``` WITHOUT a semicolon at the end of the statements, as reported by David ``` EXECUTE 'CREATE TABLE do_table (weird_name INT); DROP table do_table'; ``` I can reproduce the issue query ------------------------------------------------------------- CREATE TABLE do_table (weird_name INT); DROP table do_table CREATE TABLE do_table (weird_name INT) (2 rows) WITH a semicolon at the end of the statements ``` EXECUTE 'CREATE TABLE do_table (weird_name INT); DROP table do_table;'; ``` I cannot reproduce the issue query ---------------------------------------- DROP table do_table CREATE TABLE do_table (weird_name INT) (2 rows) I am still not sure why this is the case, but wanted to share this for now. -- Sami Imseih Amazon Web Services (AWS)
On Mon, May 19, 2025 at 05:10:14PM -0500, Sami Imseih wrote: > I am still not sure why this is the case, but wanted to share this > for now. Hmm. Something seems to not be compiling well for the final query of a stmtmulti in gram.y with updateRawStmtEnd(), as we rely on the position of the semicolon to decide what the stmt_len should be with multiple querues. We don't set the stmt_len without the semicolon, causing pgss to reuse the full query length when storing the entry of the last query. With the semicolon in place, stmt_len gets set for the last query of the string. Still digging more.. -- Michael
Attachment
On Tue, May 20, 2025 at 08:38:47AM +0900, Michael Paquier wrote: > With the semicolon in place, stmt_len gets set for the last query of > the string. Still digging more.. And got it. The problem is that we are failing to update the statement location in a couple of cases with subqueries, and that we should handle (p_stmt_len == 0) as of using the remaining bytes in the string when a location is available, but the code was too aggressive in thinking that the length = 0 case should be always discarded. Once I have fixed that, I've been a bit puzzled by the difference in output in the tests of pg_overexplain, but I think that the new output is actually the correct one: the queries whose plan outputs have changed are passed as arguments of the explain_filter() function, hence the location of the inner queries point at the start location of the inner query instead of the start of the top-level query. Note that if you add a semicolon at the end of these three queries in the pg_overexplain tests, we finish with an end location reported. I have also played with 499edb0 reverted and noted that the results of pg_overexplain were inconsistent when the module has been originally introduced, with two queries choking a bit. -- Michael
Attachment
On Tue, May 20, 2025 at 11:59 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, May 20, 2025 at 08:38:47AM +0900, Michael Paquier wrote: > > With the semicolon in place, stmt_len gets set for the last query of > > the string. Still digging more.. > > And got it. The problem is that we are failing to update the > statement location in a couple of cases with subqueries, and that we > should handle (p_stmt_len == 0) as of using the remaining bytes in > the string when a location is available, but the code was too > aggressive in thinking that the length = 0 case should be always > discarded. > I was thinking of using strlen(pstate->p_sourcetext) - qry->stmt_location; in setQueryLocationAndLength, then I saw your changes in the pg_overexplain module. maybe put some comments on top of setQueryLocationAndLength for (qry->stmt_len == 0) case, but i saw the comments on the bottom of setQueryLocationAndLength. Overall, it looks good to me.
On Tue, May 20, 2025 at 09:58:04AM +0200, Anthonin Bonnefoy wrote: > Looking at the tests, there are 2 additionals empty DO blocks: > +DO $$ > +DECLARE BEGIN > +END $$; > > What's the point of those? They won't be visible in the output since > we have 'toplevel IS FALSE' in the pg_stat_statements calls and they > don't fit the "DO block --- multiple inner queries with separators". That's a copy-pasto. Will remove. > Other than that, the patch looks good. Thanks for the review, Anthonin and Jian. -- Michael
Attachment
Tested the patch and it looks good to me. Not that I thought it would fail, but I also confirmed the pgaudit case works as expected. ``` LOG: AUDIT: SESSION,10,2,DDL,CREATE TABLE,,,"CREATE TABLE do_table (""weird name"" INT)",<not logged> LOG: AUDIT: SESSION,10,3,DDL,DROP TABLE,,,DROP table do_table,<not logged> DO ``` -- Sami
On Tue, May 20, 2025 at 02:04:20PM +0000, David Steele wrote: > I did a careful examination of the remaining diffs (there are quite a few) > and in every case I consider them to be beneficial, i.e. they make the > output more targeted and readable. > > I did not do a real code review, but I did notice that the test table column > is called weird_name as in our tests. I would argue that since it is missing > the quotes and space it is not really all that weird and should maybe get a > normal name so developers in the future don't wonder what is weird about it. I have fixed that, as it is not a weird attribute, removed the unnedeed DO blocks in the tests as pointed out by Anthonin, and moved one comment as pointed out by Jian. > Thank you for this improvement and the quick fix! Yeah, thanks all for pointing out that sometimes my analysis of things can go off tracks. The fix has been applied now on HEAD. I've also checked the state of pgaudit on branch dev-pg18, with the regression getting fixed. Things look clear now, at least from my side. -- Michael