Improve explicit cursor handling in pg_stat_statements - Mailing list pgsql-hackers

From Sami Imseih
Subject Improve explicit cursor handling in pg_stat_statements
Date
Msg-id CAA5RZ0tA6LbHCg2qSS+KuM850BZC_+ZgHV7Ug6BXw22TNyF+MA@mail.gmail.com
Whole thread Raw
Responses Re: Improve explicit cursor handling in pg_stat_statements
List pgsql-hackers
Hi hackers,

I recently looked into a workload that makes heavy use of explicit cursors,
and I found that pg_stat_statements can become a bottleneck. The
application in question declares hundreds of cursors, and for each one,
performs many FETCH and MOVE operations with varying fetch sizes.

As a result, pg_stat_statements ends up overwhelmed by the deallocation
(and garbage collection) of DECLARE CURSOR, FETCH, and MOVE entries.
Each of these is associated with a unique queryId, which leads to bloated
entries with limited diagnostic value.

Other issues:

1. FETCH/MOVE statements don't really help much in laying blame on a specific
query. the DECLARE CURSOR statement could have been evicted in
pg_stat_statements
by that point or a similar cursor name is pointing to a different query.

Also, FETCH doesn't aggregate for the same cursor — e.g., FETCH 10 c1 and
FETCH 20 c1 show up as separate entries.

2. DECLARE CURSOR doesn't provide execution stats for the underlying SQL.
Enabling pg_stat_statements.track = 'all' can expose the underlying SQL,
but adds overhead.There’s also a bug: the toplevel column for the underlying
query is still marked as "t", even though you must set track "all" to see it.

Based on this, I propose the following improvements:

1. Better normalization of cursor utility commands:

2. Normalize the cursor name in CLOSE.

3. Normalize fetch/move sizes in FETCH and MOVE. Users can use the rows and
calls columns to derive average fetch size. Ideally I would want to
normalize the
cursor name and generate the queryId f the FETCH statement based on the
underlying query, but that is not possible to do that post parsing.

(The above normalizations of these utility statements will reduce the bloat.)

4. Track the underlying query of a cursor by default, even when
pg_stat_statements.track_utility = off.

I’ve attached two patches that implement this. Here's a quick example:

```
begin;
declare cs1 cursor for select from pg_class;
declare cs2 cursor for select from pg_class;
fetch 10 cs1;
fetch 20 cs1;
fetch 10 cs1;
fetch 10 cs2;
close cs1;
close cs2;
declare cs1 cursor for select from pg_attribute;
SELECT calls, rows, query, toplevel FROM pg_stat_statements ORDER BY
query COLLATE "C";
commit;
```

current state:
```
postgres=*# SELECT calls, rows, query, toplevel FROM
pg_stat_statements ORDER BY query COLLATE "C";
 calls | rows |
query                                                   | toplevel

-------+------+-----------------------------------------------------------------------------------------------------------+----------
     1 |    1 | SELECT name    FROM pg_catalog.pg_available_extensions
  WHERE name LIKE $1 AND installed_version IS NULL+| t
       |      | LIMIT $2
                                                   |
     1 |    0 | begin
                                                   | t
     1 |    0 | close cs1
                                                   | t
     1 |    0 | close cs2
                                                   | t
     1 |    0 | create extension pg_stat_statements
                                                   | t
     1 |    0 | declare cs1 cursor for select from pg_attribute
                                                   | t
     1 |    0 | declare cs1 cursor for select from pg_class
                                                   | t
     1 |    0 | declare cs2 cursor for select from pg_class
                                                   | t
     2 |   20 | fetch 10 cs1
                                                   | t
     1 |   10 | fetch 10 cs2
                                                   | t
     1 |   20 | fetch 20 cs1
                                                   | t
(11 rows)
```

with both patches applied:
```
postgres=*# SELECT calls, rows, query, toplevel FROM
pg_stat_statements ORDER BY query COLLATE "C";
 calls | rows |                     query                      | toplevel
-------+------+------------------------------------------------+----------
     1 |    0 | begin                                          | t
     2 |    0 | close $1                                       | t
     1 |    0 | declare $1 cursor for select from pg_attribute | t
     2 |    0 | declare $1 cursor for select from pg_class     | t
     3 |   40 | fetch $1 cs1                                   | t
     1 |   10 | fetch $1 cs2                                   | t
     2 |   50 | select from pg_class                           | t
(7 rows)

postgres=*# commit;
COMMIT
```

FWIW, I raised this ~3 years ago [0], but there was not much interest. I
have seen this being a problem a few times since then that I think something
should be done about. I also was not happy with the approach I took in [0].

Looking forward to feedback!

Regards,

--
Sami Imseih
Amazon Web Services (AWS)

[0] https://www.postgresql.org/message-id/flat/203CFCF7-176E-4AFC-A48E-B2CECFECD6AA%40amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: pgsql: Add function to log the memory contexts of specified backend pro
Next
From: Sami Imseih
Date:
Subject: Re: Introduce some randomness to autovacuum