Thread: Re: Extended Statistics set/restore/clear functions.

Re: Extended Statistics set/restore/clear functions.

From
jian he
Date:
hi.
I reviewed 0001 only.

in src/backend/statistics/mvdistinct.c

no need #include "nodes/pg_list.h" since
src/include/statistics/statistics.h sub level include "nodes/pg_list.h"

no need #include "utils/palloc.h"
sicne #include "postgres.h"
already included it.


 select '[{"6, -32768,,": -11}]'::pg_ndistinct;
ERROR:  malformed pg_ndistinct: "[{"6, -32768,,": -11}]"
LINE 1: select '[{"6, -32768,,": -11}]'::pg_ndistinct;
               ^
DETAIL:  All ndistinct count values are scalar doubles.
imho, this errdetail message is not good.


select '{}'::pg_ndistinct ;
segfault


select '{"1,":"1"}'::pg_ndistinct ;
ERROR:  malformed pg_ndistinct: "{"1,":"1"}"
LINE 1: select '{"1,":"1"}'::pg_ndistinct ;
               ^
DETAIL:  All ndistinct attnum lists must be a comma separated list of attnums.

imho, this errdetail message is not good. would be better saying that
"length of list of attnums must be larger than 1".



select  pt1.typnamespace, pt1.typarray, pt1.typcategory, pt1.typname
from    pg_type pt1
where pt1.typname ~*'distinct';

 typnamespace | typarray | typcategory |   typname
--------------+----------+-------------+--------------
           11 |        0 | Z           | pg_ndistinct

typcategory (Z) marked as Internal-use types. and there is no
pg_ndistinct array type,
not sure this is fine.


all the errcode one pair of the parenthesis is unnecessary.
for example
                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                 errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
                 errdetail("Must begin with \"{\"")));
can change to
                errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                 errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
                 errdetail("Must begin with \"{\""));

see https://www.postgresql.org/docs/current/error-message-reporting.html



Re: Extended Statistics set/restore/clear functions.

From
jian he
Date:
hi.

select '{"1, 0B100101":"NaN"}'::pg_ndistinct;
      pg_ndistinct
------------------------
 {"1, 37": -2147483648}
(1 row)

this is not what we expected?
For the VALUE part of pg_ndistinct, float8 has 3 special values: inf, -inf, NaN.

For the key part of pg_ndistinct, see example.
select '{"1, 16\t":"1"}'::pg_ndistinct;
here \t is not tab character, ascii 9. it's two characters: backslash
and character "t".
so here it should error out?
(apply this to \n, \r, \b)


pg_ndistinct_in(PG_FUNCTION_ARGS)
ending part should be:

    freeJsonLexContext(lex);
    if (result == JSON_SUCCESS)
    {
        ......
    }
    else
    {
       ereturn(parse_state.escontext, (Datum) 0,
                    errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                    errmsg("malformed pg_ndistinct: \"%s\"", str),
                    errdetail("Must be valid JSON."));
       PG_RETURN_NULL();
    }
result should be either JSON_SUCCESS or anything else.



all these functions:
ndistinct_object_start, ndistinct_array_start,
ndistinct_object_field_start, ndistinct_array_element_start
have
ndistinctParseState *parse = state;

do we need to change it to
ndistinctParseState *parse = (ndistinctParseState *)state;
?



ndistinctParseState need to add to src/tools/pgindent/typedefs.list
probably change it to "typedef struct ndistinctParseState".
also struct ndistinctParseState need placed in the top of
src/backend/statistics/mvdistinct.c
not in line 340?



Re: Extended Statistics set/restore/clear functions.

From
Corey Huinker
Date:
On Tue, Jan 28, 2025 at 11:25 AM jian he <jian.universality@gmail.com> wrote:
hi.
I reviewed 0001 only.

in src/backend/statistics/mvdistinct.c

no need #include "nodes/pg_list.h" since
src/include/statistics/statistics.h sub level include "nodes/pg_list.h"

no need #include "utils/palloc.h"
sicne #include "postgres.h"
already included it.


Noted.
 
 select '[{"6, -32768,,": -11}]'::pg_ndistinct;
ERROR:  malformed pg_ndistinct: "[{"6, -32768,,": -11}]"
LINE 1: select '[{"6, -32768,,": -11}]'::pg_ndistinct;
               ^
DETAIL:  All ndistinct count values are scalar doubles.
imho, this errdetail message is not good.

What error message do you think is appropriate in that situation?
 
select '{}'::pg_ndistinct ;
segfault

Mmm, gotta look into that!

 


select '{"1,":"1"}'::pg_ndistinct ;
ERROR:  malformed pg_ndistinct: "{"1,":"1"}"
LINE 1: select '{"1,":"1"}'::pg_ndistinct ;
               ^
DETAIL:  All ndistinct attnum lists must be a comma separated list of attnums.

imho, this errdetail message is not good. would be better saying that
"length of list of attnums must be larger than 1".

That sounds better.

 
typcategory (Z) marked as Internal-use types. and there is no
pg_ndistinct array type,
not sure this is fine.

I think it's probably ok for now. The datatype currently has no utility other than extended statistics, and I'm doubtful that it ever will.

Re: Extended Statistics set/restore/clear functions.

From
Corey Huinker
Date:
On Wed, Jan 29, 2025 at 2:50 AM jian he <jian.universality@gmail.com> wrote:
hi.

select '{"1, 0B100101":"NaN"}'::pg_ndistinct;
      pg_ndistinct
------------------------
 {"1, 37": -2147483648}
(1 row)

I think my initial reaction is to just refuse those special values, but I'll look into the parsing code to see what can be done.



this is not what we expected?
For the VALUE part of pg_ndistinct, float8 has 3 special values: inf, -inf, NaN.

For the key part of pg_ndistinct, see example.
select '{"1, 16\t":"1"}'::pg_ndistinct;
here \t is not tab character, ascii 9. it's two characters: backslash
and character "t".
so here it should error out?
(apply this to \n, \r, \b)

I don't have a good answer as to what should happen here. Special cases like this make Tomas' suggestion to change the in/out format more attractive.


 


pg_ndistinct_in(PG_FUNCTION_ARGS)
ending part should be:

    freeJsonLexContext(lex);
    if (result == JSON_SUCCESS)
    {
        ......
    }
    else
    {
       ereturn(parse_state.escontext, (Datum) 0,
                    errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                    errmsg("malformed pg_ndistinct: \"%s\"", str),
                    errdetail("Must be valid JSON."));
       PG_RETURN_NULL();
    }
result should be either JSON_SUCCESS or anything else.



all these functions:
ndistinct_object_start, ndistinct_array_start,
ndistinct_object_field_start, ndistinct_array_element_start
have
ndistinctParseState *parse = state;

do we need to change it to
ndistinctParseState *parse = (ndistinctParseState *)state;
?

The compiler isn't complaining so far, but I see no harm in it.