Re: Statistics Import and Export - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Statistics Import and Export
Date
Msg-id Ze7WQBMPvzXoRCCj@tamriel.snowman.net
Whole thread Raw
In response to Re: Statistics Import and Export  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: Statistics Import and Export
List pgsql-hackers
Greetings,

* Corey Huinker (corey.huinker@gmail.com) wrote:
> > > +/*
> > > + * Set statistics for a given pg_class entry.
> > > + *
> > > + * pg_set_relation_stats(relation Oid, reltuples double, relpages int)
> > > + *
> > > + * This does an in-place (i.e. non-transactional) update of pg_class,
> > just as
> > > + * is done in ANALYZE.
> > > + *
> > > + */
> > > +Datum
> > > +pg_set_relation_stats(PG_FUNCTION_ARGS)
> > > +{
> > > +     const char *param_names[] = {
> > > +             "relation",
> > > +             "reltuples",
> > > +             "relpages",
> > > +     };
> > > +
> > > +     Oid                             relid;
> > > +     Relation                rel;
> > > +     HeapTuple               ctup;
> > > +     Form_pg_class   pgcform;
> > > +
> > > +     for (int i = 0; i <= 2; i++)
> > > +             if (PG_ARGISNULL(i))
> > > +                     ereport(ERROR,
> > > +
> >  (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> > > +                                      errmsg("%s cannot be NULL",
> > param_names[i])));
> >
> > Why not just mark this function as strict..?  Or perhaps we should allow
> > NULLs to be passed in and just not update the current value in that
> > case?
>
> Strict could definitely apply here, and I'm inclined to make it so.

Having thought about it a bit more, I generally like the idea of being
able to just update one stat instead of having to update all of them at
once (and therefore having to go look up what the other values currently
are...).  That said, per below, perhaps making it strict is the better
plan.

> > Also, in some cases we allow the function to be called with a
> > NULL but then make it a no-op rather than throwing an ERROR (eg, if the
> > OID ends up being NULL).
>
> Thoughts on it emitting a WARN or NOTICE before returning false?

Eh, I don't think so?

Where this is coming from is that we can often end up with functions
like these being called inside of larger queries, and having them spit
out WARN or NOTICE will just make them noisy.

That leads to my general feeling of just returning NULL if called with a
NULL OID, as we would get with setting the function strict.

> >   Not sure if that makes sense here or not
> > offhand but figured I'd mention it as something to consider.
> >
> > > +     pgcform = (Form_pg_class) GETSTRUCT(ctup);
> > > +     pgcform->reltuples = PG_GETARG_FLOAT4(1);
> > > +     pgcform->relpages = PG_GETARG_INT32(2);
> >
> > Shouldn't we include relallvisible?
>
> Yes. No idea why I didn't have that in there from the start.

Ok.

> > Also, perhaps we should use the approach that we have in ANALYZE, and
> > only actually do something if the values are different rather than just
> > always doing an update.
>
> That was how it worked back in v1, more for the possibility that there was
> no matching JSON to set values.
>
> Looking again at analyze.c (currently lines 1751-1780), we just check if
> there is a row in the way, and if so we replace it. I don't see where we
> compare existing values to new values.

Well, that code is for pg_statistic while I was looking at pg_class (in
vacuum.c:1428-1443, where we track if we're actually changing anything
and only make the pg_class change if there's actually something
different):

vacuum.c:1531
    /* If anything changed, write out the tuple. */
    if (dirty)
        heap_inplace_update(rd, ctup);

Not sure why we don't treat both the same way though ... although it's
probably the case that it's much less likely to have an entire
pg_statistic row be identical than the few values in pg_class.

> > > +Datum
> > > +pg_set_attribute_stats(PG_FUNCTION_ARGS)
> >
> > > +     /* names of columns that cannot be null */
> > > +     const char *required_param_names[] = {
> > > +             "relation",
> > > +             "attname",
> > > +             "stainherit",
> > > +             "stanullfrac",
> > > +             "stawidth",
> > > +             "stadistinct",
> > > +             "stakind1",
> > > +             "stakind2",
> > > +             "stakind3",
> > > +             "stakind4",
> > > +             "stakind5",
> > > +     };
> >
> > Same comment here as above wrt NULL being passed in.
>
> In this case, the last 10 params (stanumbersN and stavaluesN) can be null,
> and are NULL more often than not.

Hmm, that's a valid point, so a NULL passed in would need to set that
value actually to NULL, presumably.  Perhaps then we should have
pg_set_relation_stats() be strict and have pg_set_attribute_stats()
handles NULLs passed in appropriately, and return NULL if the relation
itself or attname, or other required (not NULL'able) argument passed in
cause the function to return NULL.

(What I'm trying to drive at here is a consistent interface for these
functions, but one which does a no-op instead of returning an ERROR on
values being passed in which aren't allowable; it can be quite
frustrating trying to get a query to work where one of the functions
decides to return ERROR instead of just ignoring things passed in which
aren't valid.)

> > > +     for (int k = 0; k < 5; k++)
> >
> > Shouldn't we use STATISTIC_NUM_SLOTS here?
>
> Yes, I had in the past. Not sure why I didn't again.

No worries.

Thanks!

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Jelte Fennema-Nio
Date:
Subject: Re: Support a wildcard in backtrace_functions
Next
From: "Li, Yong"
Date:
Subject: Re: Proposal to add page headers to SLRU pages