Thread: Re: Regression in statement locations

Re: Regression in statement locations

From
Sami Imseih
Date:
> It is also possible that the regression is not coming from
> 499edb0 but I do not see another obvious candidate.

I used pg_stat_statements to repro the issue, and a bisect
resulted in 499edb0 being the source of the regression.

```
select pg_stat_statements_reset();

set pg_stat_statements.track='all';
DO $$
DECLARE
BEGIN
        EXECUTE 'CREATE TABLE do_table (weird_name INT); DROP table do_table';
END $$;
select query from pg_stat_statements where not toplevel;
```

WITHOUT a semicolon at the end of the statements, as reported by David

```
EXECUTE 'CREATE TABLE do_table (weird_name INT); DROP table do_table';
```

I can reproduce the issue


                            query
-------------------------------------------------------------
 CREATE TABLE do_table (weird_name INT); DROP table do_table
 CREATE TABLE do_table (weird_name INT)
(2 rows)


WITH a semicolon at the end of the statements
```
EXECUTE 'CREATE TABLE do_table (weird_name INT); DROP table do_table;';
```

I cannot reproduce the issue

                 query
----------------------------------------
 DROP table do_table
 CREATE TABLE do_table (weird_name INT)
(2 rows)

I am still not sure why this is the case, but wanted to share this
for now.

--
Sami Imseih
Amazon Web Services (AWS)



Re: Regression in statement locations

From
Michael Paquier
Date:
On Mon, May 19, 2025 at 05:10:14PM -0500, Sami Imseih wrote:
> I am still not sure why this is the case, but wanted to share this
> for now.

Hmm.  Something seems to not be compiling well for the final query of
a stmtmulti in gram.y with updateRawStmtEnd(), as we rely on the
position of the semicolon to decide what the stmt_len should be with
multiple querues.  We don't set the stmt_len without the semicolon,
causing pgss to reuse the full query length when storing the entry of
the last query.

With the semicolon in place, stmt_len gets set for the last query of
the string.  Still digging more..
--
Michael

Attachment

Re: Regression in statement locations

From
Michael Paquier
Date:
On Tue, May 20, 2025 at 08:38:47AM +0900, Michael Paquier wrote:
> With the semicolon in place, stmt_len gets set for the last query of
> the string.  Still digging more..

And got it.  The problem is that we are failing to update the
statement location in a couple of cases with subqueries, and that we
should handle (p_stmt_len == 0) as of using the remaining bytes in
the string when a location is available, but the code was too
aggressive in thinking that the length = 0 case should be always
discarded.

Once I have fixed that, I've been a bit puzzled by the difference in
output in the tests of pg_overexplain, but I think that the new output
is actually the correct one: the queries whose plan outputs have
changed are passed as arguments of the explain_filter() function,
hence the location of the inner queries point at the start location of
the inner query instead of the start of the top-level query.  Note
that if you add a semicolon at the end of these three queries in the
pg_overexplain tests, we finish with an end location reported.

I have also played with 499edb0 reverted and noted that the results of
pg_overexplain were inconsistent when the module has been originally
introduced, with two queries choking a bit.
--
Michael

Attachment

Re: Regression in statement locations

From
jian he
Date:
On Tue, May 20, 2025 at 11:59 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, May 20, 2025 at 08:38:47AM +0900, Michael Paquier wrote:
> > With the semicolon in place, stmt_len gets set for the last query of
> > the string.  Still digging more..
>
> And got it.  The problem is that we are failing to update the
> statement location in a couple of cases with subqueries, and that we
> should handle (p_stmt_len == 0) as of using the remaining bytes in
> the string when a location is available, but the code was too
> aggressive in thinking that the length = 0 case should be always
> discarded.
>

I was thinking of using strlen(pstate->p_sourcetext) -  qry->stmt_location;
in setQueryLocationAndLength,
then I saw your changes in the pg_overexplain module.

maybe put some comments on top of setQueryLocationAndLength
for (qry->stmt_len == 0) case, but i saw the comments on the bottom
of setQueryLocationAndLength.

Overall, it looks good to me.



Re: Regression in statement locations

From
Michael Paquier
Date:
On Tue, May 20, 2025 at 09:58:04AM +0200, Anthonin Bonnefoy wrote:
> Looking at the tests, there are 2 additionals empty DO blocks:
> +DO $$
> +DECLARE BEGIN
> +END $$;
>
> What's the point of those? They won't be visible in the output since
> we have 'toplevel IS FALSE' in the pg_stat_statements calls and they
> don't fit the "DO block --- multiple inner queries with separators".

That's a copy-pasto.  Will remove.

> Other than that, the patch looks good.

Thanks for the review, Anthonin and Jian.
--
Michael

Attachment

Re: Regression in statement locations

From
Sami Imseih
Date:
Tested the patch and it looks good to me.

Not that I thought it would fail, but I also confirmed the pgaudit case
works as expected.

```
LOG:  AUDIT: SESSION,10,2,DDL,CREATE TABLE,,,"CREATE TABLE do_table
(""weird name""
INT)",<not logged>
LOG:  AUDIT: SESSION,10,3,DDL,DROP TABLE,,,DROP table do_table,<not logged>
DO
```

--
Sami



Re: Regression in statement locations

From
Michael Paquier
Date:
On Tue, May 20, 2025 at 02:04:20PM +0000, David Steele wrote:
> I did a careful examination of the remaining diffs (there are quite a few)
> and in every case I consider them to be beneficial, i.e. they make the
> output more targeted and readable.
>
> I did not do a real code review, but I did notice that the test table column
> is called weird_name as in our tests. I would argue that since it is missing
> the quotes and space it is not really all that weird and should maybe get a
> normal name so developers in the future don't wonder what is weird about it.

I have fixed that, as it is not a weird attribute, removed the
unnedeed DO blocks in the tests as pointed out by Anthonin, and moved
one comment as pointed out by Jian.

> Thank you for this improvement and the quick fix!

Yeah, thanks all for pointing out that sometimes my analysis of things
can go off tracks.

The fix has been applied now on HEAD.  I've also checked the state of
pgaudit on branch dev-pg18, with the regression getting fixed.  Things
look clear now, at least from my side.
--
Michael

Attachment