Re: pg_stat_statements: calls under-estimation propagation - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: pg_stat_statements: calls under-estimation propagation
Date
Msg-id CACN56+Mobhp0KB0Bj=kFgrnJ24xhjPnaQvD3VE0Q7L+8yUgmSg@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_statements: calls under-estimation propagation  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Thu, Nov 14, 2013 at 7:25 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur <samthakur74@gmail.com> wrote:
>> Hello,
>> Please find attached pg_stat_statements-identification-v9.patch.
>
> I took a quick look. Observations:
>
> +       /* Making query ID dependent on PG version */
> +       query->queryId |= PG_VERSION_NUM << 16;
>
> If you want to do something like this, make the value of
> PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.
>
> Why are you doing this?

The destabilization of the query_id is to attempt to skirt the
objection that the unpredictability of instability in the query_id
would otherwise cause problems and present a false contract,
particular in regards to point release upgrades.

Earliest drafts of mine included destabilizing every session, but this
kills off a nice use case of correlating query ids between primaries
and standbys, if memory serves.  This has the semantics of
destabilizing -- for sure -- every point release.

> @@ -128,6 +146,7 @@ typedef struct pgssEntry
>         pgssHashKey key;                        /* hash key of entry - MUST BE FIRST */
>         Counters        counters;               /* the statistics for this query */
>         int                     query_len;              /* # of valid bytes in query string */
> +       uint32          query_id;               /* jumble value for this entry */
>
> query_id is already in "key".
>
> Not sure I like the idea of the new enum at all, but in any case you
> shouldn't have a PGSS_TUP_LATEST constant - should someone go update
> all usage of that constant only when your version isn't the latest?
> Like here:
>
> +               if (detected_version >= PGSS_TUP_LATEST)

It is roughly modeled after the "column" version of the same thing
that pre-existed in pg_stat_statements.  The only reason to have a
LATEST is for some of the invariants though; otherwise, most uses
should use the version-stamped symbol.  I did this wrongly a bunch of
times as spotted by Alvaro, I believe.

> I forget why Daniel originally altered the min value of
> pg_stat_statements.max to 1 (I just remember that he did), but I don't
> think it holds that you should keep it there. Have you considered the
> failure modes when it is actually set to 1?

Testing.  It was very useful to set to small numbers, like two or
three.  It's not crucial to the patch at all but having to whack it
back all the time to keep the patch submission devoid of it seemed
impractically tedious during revisions.

> This is what I call a "can't happen" error, or a defensive one:
>
> +       else
> +       {
> +               /*
> +                * Couldn't identify the tuple format.  Raise error.
> +                *
> +                * This is an exceptional case that may only happen in bizarre
> +                * situations, since it is thought that every released version
> +                * of pg_stat_statements has a matching schema.
> +                */
> +               ereport(ERROR,
> +                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                                errmsg("pg_stat_statements schema is not supported "
> +                                               "by its installed binary")));
> +       }
>
> I'll generally make these simple elogs(), which are more terse. No one
> is going to find all that dressing useful.

That seems reasonable.  Yes, it's basically a soft assert.  I hit this
one in development a few times, it was handy.

> It probably isn't useful to comment random, unaffected code that isn't
> affected by your patch - I don't find this new refactoring useful, and
> am surprised to see it in your patch:
>
> +       /* Check header existence and magic number match. */
>         if (fread(&header, sizeof(uint32), 1, file) != 1 ||
> -               header != PGSS_FILE_HEADER ||
> -               fread(&num, sizeof(int32), 1, file) != 1)
> +               header != PGSS_FILE_HEADER)
> +               goto error;
> +
> +       /* Read how many table entries there are. */
> +       if (fread(&num, sizeof(int32), 1, file) != 1)
>                 goto error;
>
> Did you mean to add all this, or is it left over from Daniel's patch?

The whitespace changes are not intentional, but I think the comments
help a reader track what's going on better, which becomes more
important if one adds more fields.  It can be broken out into a
separate patch, but it didn't seem like that bookkeeping was
necessary.

> @@ -43,6 +43,7 @@
>   */
>  #include "postgres.h"
>
> +#include <time.h>
>  #include <unistd.h>
>
>  #include "access/hash.h"
> @@ -59,15 +60,18 @@
>  #include "storage/spin.h"
>  #include "tcop/utility.h"
>  #include "utils/builtins.h"
> +#include "utils/timestamp.h"
>
> Final thought: I think the order in the pg_stat_statements view is
> wrong. It ought to be like a composite primary key - (userid, dbid,
> query_id).

I made no design decisions here...no complaints from me.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Heavily modified big table bloat even in auto vacuum is running
Next
From: Claudio Freire
Date:
Subject: Re: Optimize kernel readahead using buffer access strategy