pg_stat_statements normalization: re-review - Mailing list pgsql-hackers

From Daniel Farina
Subject pg_stat_statements normalization: re-review
Date
Msg-id CAAZKuFYZ37ox+v+izHhAPhKq_xLSYD_rPnK=6-ANVW4dCE_4sA@mail.gmail.com
Whole thread Raw
Responses Re: pg_stat_statements normalization: re-review  (Peter Geoghegan <peter@2ndquadrant.com>)
List pgsql-hackers
Hello again,

I'm reviewing the revised version of pg_stat_statements again.  In
particular, this new version has done away with the mechanical but
invasive change to the lexing that preserved *both* the position and
length of constant values. (See
http://archives.postgresql.org/message-id/CAEYLb_WGeFCT7MfJ8FXf-CR6BSE6Lbn+O1VX3+OGrc4Bscn4=A@mail.gmail.com)

I did the review front-matter again (check against a recent version,
make sure it does what it says it'll do, et al..) and did trigger an
assertion failure that seems to be an oversight, already reported to
Peter.  I found that oversight by running the SQLAlchemy Python
query-generation toolkit's tests, which are voluminous.  The
functionality is seemingly mostly unchanged.

What I'm going to do here is focus on the back-end source changes that
are not part of the contrib.  The size of the changes are much
reduced, but their semantics are also somewhat more complex...so, here
goes.  Conclusion first:

* The small changes to hashing are probably not strictly required,
unless collisions are known to get terrible.

* The hook to analyze is pretty straight-forward and seem like the other hooks

* The addition of a field to scribble upon in the Query and Plan nodes
is something I'd like to understand better, as these leave me with a
bit of disquiet.

What follows is in much more detail, and broken up by functionality
and file grouping in the backend.
src/include/access/hash.h            |    4 ++-src/backend/access/hash/hashfunc.c   |   21 ++++++++++---------

This adjusts the hash_any procedure to support returning two possible
precisions (64 bit and 32 bit) by tacking on a Boolean flag to make
the precision selectable.  The hash_any operator is never used
directly, and instead via macro, and a second macro to support 64-bit
precision has been added.  It seems a bit wonky to me to use a flag to
select this rather than encapsulating the common logic in a procedure
and just break this up into three symbols, though.  I think the longer
precision is used to try to get fewer collisions with not too much
slowdown.  Although it may meaningfully improve the quality of the
hashing for pg_stat_statements or living without the extra hashing
bits might lead to unusable results (?), per the usual semantics of
hashing it is probably not *strictly* necessary.

A smidgen of avoidable formatting niggles (> 80 columns) are in this section.
src/backend/nodes/copyfuncs.c        |    5 ++++src/backend/nodes/equalfuncs.c       |    4
+++src/backend/nodes/outfuncs.c        |   10 +++++++++src/backend/optimizer/plan/planner.c |    1
+src/backend/nodes/readfuncs.c       |   12 +++++++++++src/include/nodes/parsenodes.h       |    3
++src/include/nodes/plannodes.h       |    2 +
 

These have all been changed in the usual manner to support one new
field, the queryId, on the toplevel Plan and Query nodes.  The queryId
is 64-bit field copied from queries to plans to tie together plans "to
be used by plugins" (quoth the source) as they flow through the
system.  pg_stat_statements fills them with the 64-bit hash of the
query text -- a reasonable functionality to possess, I think, but the
abstraction seems iffy: plugins cannot compose well if they all need
to use the queryId for different reasons.  Somehow I get the feeling
that this field can be avoided or its use case can be satisfied in a
more satisfying way.

Mysteriously, in the Query representation the symbol uses an
underscored-notation (query_id) and in the planner it uses a camelcase
one, queryId.  Overall, the other fields in those structs all use
camelcase, so I'd recommend normalizing it.
src/backend/parser/analyze.c         |   37 ++++++++++++++++++++++++++++++---src/include/parser/analyze.h         |
12+++++++++++
 

These changes implement hooks for the once-non-hookable symbols
parse_analyze_varparams and parse_analyze, in seemingly the same way
they are implemented in other hooks in the system.  These are neatly
symmetrical with the planner hook.  I only ask if there is a way to
have one hook and not two, but I suppose that would be a similar
question as "why is are there two ways to symbols to enter into
parsing, and not one".

-- 
fdr


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: foreign key locks, 2nd attempt
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: Speed dblink using alternate libpq tuple storage