Re: Extended Statistics set/restore/clear functions. - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Extended Statistics set/restore/clear functions.
Date
Msg-id CADkLM=dJ4bQ3e4Cc+vaVFmnrL37Y5hPJ1u0AYXE6fSkh_=+sTg@mail.gmail.com
In response to Re: Extended Statistics set/restore/clear functions.  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Extended Statistics set/restore/clear functions.
List pgsql-hackers
The patch set has a clear deficiency in test coverage: a lot of code
where we expect failures is totally uncovered, particularly things
related to MCV lists and parameters, MCV array checks, dimensions,
incorrect number of elements, expression imports, etc.  There is a
large number of them.  I'd expect all of these cases to have proper
coverage, with WARNINGs generated as we don't want hard failures on
these calls, do we?  Some more validation for ndistinct and
dependencies would be welcome, as well.

Indeed, I've identified 29 WARNINGs that need direct coverage, and that's not counting ndistinct and dependencies. For those, I think we can limit ourselves to testing the ways in which a value can fail validation against the extended stats object, rather than covering every way a value can be invalid in a vacuum.

In doing so, I noticed that we have two ways of handling some of this warning-waterfall.

The first is to have a function call that emits some warnings and flips a boolean:

    datum = some_function(input, input, &ok);
    if (!ok)
       ...

And the second is to directly test the datum, which is fine because the expected datum output is never null:

    datum = some_function(input, input);
    if (datum == (Datum) 0)
       ...

There seems to be more precedent for the second pattern, but I thought I'd bring it up for discussion before switching to that pattern wherever possible.
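To make the comparison concrete, here is a minimal standalone sketch of the two patterns. The `Datum` typedef and the `fetch_value*` functions are stand-ins for illustration only, not the actual functions from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;            /* stand-in for PostgreSQL's Datum */

/*
 * Pattern 1: report failure through an "ok" out-parameter.
 * The caller must check the flag after every call.
 */
static Datum
fetch_value_ok(int input, bool *ok)
{
    if (input < 0)
    {
        *ok = false;
        return (Datum) 0;
    }
    *ok = true;
    return (Datum) (uintptr_t) input;
}

/*
 * Pattern 2: signal failure by returning the null datum directly.
 * This only works when a valid result is never (Datum) 0,
 * which is the situation described above.
 */
static Datum
fetch_value(int input)
{
    if (input < 0)
        return (Datum) 0;
    return (Datum) (uintptr_t) input;
}
```

Pattern 2 removes one variable and one branch per call site, which is why it tends to read better in a long waterfall of validation steps, at the cost of overloading the return value.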
 
This has to be stable, meaning that we *must* be able to handle and
reject trash input data, and also relates to the lack of coverage
overall.  I am pretty sure that if I begin to inject more buggy values
in other areas of these parameters, things could get funky.

Agreed.
 
I am wondering if we should not cut the pie into two parts here: the
dependencies and ndistinct data is actually so much more
straightforward, as they have predictable input patterns that we can
discard easily if they are incorrect.  The MCV list part with its
import logic and cross-validation of the inputs is something else
based on the attribute types.  I don't think that we are far from a
good solution, still that would be one strategy to get something done
if we cannot get the MCV part completely right.

I'm not sure what cutting the pie would mean in terms of code reorganization, so I can't comment on whether I think it's a good idea. For the time being I'm going to add the tests in stats_import.sql, and if it gets unwieldy then we can consider splitting that.
 

A lot of the code is still largely under-documented, with zero
explanation about pg_restore_extended_stats(), no explanation about
the individual fields that can be set in pg_restore_extended_stats().
It's impossible to understand for a reader what statext_mcv_import()
does based on its inputs and what a caller can give in input.

Noted.
 
least what they refer to in the catalogs.  Having this stuff implied
in the code with zero reference in the documentation is like driving a
car blind, and I doubt it's good for one's health.

If anyone could come up with a means of safely driving via echolocation, it would be you. But let's not let it come to that. Thanks for the feedback. I can post an intermediate v29 to appease the cfbot, but if there's no demand then I'll wait until I have the test cases and documentation in place.
