Re: Statistics Import and Export - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Statistics Import and Export |
Date | |
Msg-id | Ze7WQBMPvzXoRCCj@tamriel.snowman.net Whole thread Raw |
In response to | Re: Statistics Import and Export (Corey Huinker <corey.huinker@gmail.com>) |
Responses |
Re: Statistics Import and Export
|
List | pgsql-hackers |
Greetings, * Corey Huinker (corey.huinker@gmail.com) wrote: > > > +/* > > > + * Set statistics for a given pg_class entry. > > > + * > > > + * pg_set_relation_stats(relation Oid, reltuples double, relpages int) > > > + * > > > + * This does an in-place (i.e. non-transactional) update of pg_class, > > just as > > > + * is done in ANALYZE. > > > + * > > > + */ > > > +Datum > > > +pg_set_relation_stats(PG_FUNCTION_ARGS) > > > +{ > > > + const char *param_names[] = { > > > + "relation", > > > + "reltuples", > > > + "relpages", > > > + }; > > > + > > > + Oid relid; > > > + Relation rel; > > > + HeapTuple ctup; > > > + Form_pg_class pgcform; > > > + > > > + for (int i = 0; i <= 2; i++) > > > + if (PG_ARGISNULL(i)) > > > + ereport(ERROR, > > > + > > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > > + errmsg("%s cannot be NULL", > > param_names[i]))); > > > > Why not just mark this function as strict..? Or perhaps we should allow > > NULLs to be passed in and just not update the current value in that > > case? > > Strict could definitely apply here, and I'm inclined to make it so. Having thought about it a bit more, I generally like the idea of being able to just update one stat instead of having to update all of them at once (and therefore having to go look up what the other values currently are...). That said, per below, perhaps making it strict is the better plan. > > Also, in some cases we allow the function to be called with a > > NULL but then make it a no-op rather than throwing an ERROR (eg, if the > > OID ends up being NULL). > > Thoughts on it emitting a WARN or NOTICE before returning false? Eh, I don't think so? Where this is coming from is that we can often end up with functions like these being called inside of larger queries, and having them spit out WARN or NOTICE will just make them noisy. That leads to my general feeling of just returning NULL if called with a NULL OID, as we would get with setting the function strict. > > Not sure if that makes sense here or not > > offhand but figured I'd mention it as something to consider. > > > > > + pgcform = (Form_pg_class) GETSTRUCT(ctup); > > > + pgcform->reltuples = PG_GETARG_FLOAT4(1); > > > + pgcform->relpages = PG_GETARG_INT32(2); > > > > Shouldn't we include relallvisible? > > Yes. No idea why I didn't have that in there from the start. Ok. > > Also, perhaps we should use the approach that we have in ANALYZE, and > > only actually do something if the values are different rather than just > > always doing an update. > > That was how it worked back in v1, more for the possibility that there was > no matching JSON to set values. > > Looking again at analyze.c (currently lines 1751-1780), we just check if > there is a row in the way, and if so we replace it. I don't see where we > compare existing values to new values. Well, that code is for pg_statistic while I was looking at pg_class (in vacuum.c:1428-1443, where we track if we're actually changing anything and only make the pg_class change if there's actually something different): vacuum.c:1531 /* If anything changed, write out the tuple. */ if (dirty) heap_inplace_update(rd, ctup); Not sure why we don't treat both the same way though ... although it's probably the case that it's much less likely to have an entire pg_statistic row be identical than the few values in pg_class. > > > +Datum > > > +pg_set_attribute_stats(PG_FUNCTION_ARGS) > > > > > + /* names of columns that cannot be null */ > > > + const char *required_param_names[] = { > > > + "relation", > > > + "attname", > > > + "stainherit", > > > + "stanullfrac", > > > + "stawidth", > > > + "stadistinct", > > > + "stakind1", > > > + "stakind2", > > > + "stakind3", > > > + "stakind4", > > > + "stakind5", > > > + }; > > > > Same comment here as above wrt NULL being passed in. > > In this case, the last 10 params (stanumbersN and stavaluesN) can be null, > and are NULL more often than not. Hmm, that's a valid point, so a NULL passed in would need to set that value actually to NULL, presumably. Perhaps then we should have pg_set_relation_stats() be strict and have pg_set_attribute_stats() handles NULLs passed in appropriately, and return NULL if the relation itself or attname, or other required (not NULL'able) argument passed in cause the function to return NULL. (What I'm trying to drive at here is a consistent interface for these functions, but one which does a no-op instead of returning an ERROR on values being passed in which aren't allowable; it can be quite frustrating trying to get a query to work where one of the functions decides to return ERROR instead of just ignoring things passed in which aren't valid.) > > > + for (int k = 0; k < 5; k++) > > > > Shouldn't we use STATISTIC_NUM_SLOTS here? > > Yes, I had in the past. Not sure why I didn't again. No worries. Thanks! Stephen
Attachment
pgsql-hackers by date: