Thread: Re: Extended Statistics set/restore/clear functions.
hi. I reviewed 0001 only. in src/backend/statistics/mvdistinct.c no need #include "nodes/pg_list.h" since src/include/statistics/statistics.h sub level include "nodes/pg_list.h" no need #include "utils/palloc.h" sicne #include "postgres.h" already included it. select '[{"6, -32768,,": -11}]'::pg_ndistinct; ERROR: malformed pg_ndistinct: "[{"6, -32768,,": -11}]" LINE 1: select '[{"6, -32768,,": -11}]'::pg_ndistinct; ^ DETAIL: All ndistinct count values are scalar doubles. imho, this errdetail message is not good. select '{}'::pg_ndistinct ; segfault select '{"1,":"1"}'::pg_ndistinct ; ERROR: malformed pg_ndistinct: "{"1,":"1"}" LINE 1: select '{"1,":"1"}'::pg_ndistinct ; ^ DETAIL: All ndistinct attnum lists must be a comma separated list of attnums. imho, this errdetail message is not good. would be better saying that "length of list of attnums must be larger than 1". select pt1.typnamespace, pt1.typarray, pt1.typcategory, pt1.typname from pg_type pt1 where pt1.typname ~*'distinct'; typnamespace | typarray | typcategory | typname --------------+----------+-------------+-------------- 11 | 0 | Z | pg_ndistinct typcategory (Z) marked as Internal-use types. and there is no pg_ndistinct array type, not sure this is fine. all the errcode one pair of the parenthesis is unnecessary. for example (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("malformed pg_ndistinct: \"%s\"", parse->str), errdetail("Must begin with \"{\""))); can change to errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("malformed pg_ndistinct: \"%s\"", parse->str), errdetail("Must begin with \"{\"")); see https://www.postgresql.org/docs/current/error-message-reporting.html
hi. select '{"1, 0B100101":"NaN"}'::pg_ndistinct; pg_ndistinct ------------------------ {"1, 37": -2147483648} (1 row) this is not what we expected? For the VALUE part of pg_ndistinct, float8 has 3 special values: inf, -inf, NaN. For the key part of pg_ndistinct, see example. select '{"1, 16\t":"1"}'::pg_ndistinct; here \t is not tab character, ascii 9. it's two characters: backslash and character "t". so here it should error out? (apply this to \n, \r, \b) pg_ndistinct_in(PG_FUNCTION_ARGS) ending part should be: freeJsonLexContext(lex); if (result == JSON_SUCCESS) { ...... } else { ereturn(parse_state.escontext, (Datum) 0, errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), errmsg("malformed pg_ndistinct: \"%s\"", str), errdetail("Must be valid JSON.")); PG_RETURN_NULL(); } result should be either JSON_SUCCESS or anything else. all these functions: ndistinct_object_start, ndistinct_array_start, ndistinct_object_field_start, ndistinct_array_element_start have ndistinctParseState *parse = state; do we need to change it to ndistinctParseState *parse = (ndistinctParseState *)state; ? ndistinctParseState need to add to src/tools/pgindent/typedefs.list probably change it to "typedef struct ndistinctParseState". also struct ndistinctParseState need placed in the top of src/backend/statistics/mvdistinct.c not in line 340?
On Tue, Jan 28, 2025 at 11:25 AM jian he <jian.universality@gmail.com> wrote:
hi.
I reviewed 0001 only.
in src/backend/statistics/mvdistinct.c
no need #include "nodes/pg_list.h" since
src/include/statistics/statistics.h sub level include "nodes/pg_list.h"
no need #include "utils/palloc.h"
sicne #include "postgres.h"
already included it.
Noted.
select '[{"6, -32768,,": -11}]'::pg_ndistinct;
ERROR: malformed pg_ndistinct: "[{"6, -32768,,": -11}]"
LINE 1: select '[{"6, -32768,,": -11}]'::pg_ndistinct;
^
DETAIL: All ndistinct count values are scalar doubles.
imho, this errdetail message is not good.
What error message do you think is appropriate in that situation?
select '{}'::pg_ndistinct ;
segfault
Mmm, gotta look into that!
select '{"1,":"1"}'::pg_ndistinct ;
ERROR: malformed pg_ndistinct: "{"1,":"1"}"
LINE 1: select '{"1,":"1"}'::pg_ndistinct ;
^
DETAIL: All ndistinct attnum lists must be a comma separated list of attnums.
imho, this errdetail message is not good. would be better saying that
"length of list of attnums must be larger than 1".
That sounds better.
typcategory (Z) marked as Internal-use types. and there is no
pg_ndistinct array type,
not sure this is fine.
I think it's probably ok for now. The datatype currently has no utility other than extended statistics, and I'm doubtful that it ever will.
On Wed, Jan 29, 2025 at 2:50 AM jian he <jian.universality@gmail.com> wrote:
hi.
select '{"1, 0B100101":"NaN"}'::pg_ndistinct;
pg_ndistinct
------------------------
{"1, 37": -2147483648}
(1 row)
I think my initial reaction is to just refuse those special values, but I'll look into the parsing code to see what can be done.
this is not what we expected?
For the VALUE part of pg_ndistinct, float8 has 3 special values: inf, -inf, NaN.
For the key part of pg_ndistinct, see example.
select '{"1, 16\t":"1"}'::pg_ndistinct;
here \t is not tab character, ascii 9. it's two characters: backslash
and character "t".
so here it should error out?
(apply this to \n, \r, \b)
I don't have a good answer as to what should happen here. Special cases like this make Tomas' suggestion to change the in/out format more attractive.
pg_ndistinct_in(PG_FUNCTION_ARGS)
ending part should be:
freeJsonLexContext(lex);
if (result == JSON_SUCCESS)
{
......
}
else
{
ereturn(parse_state.escontext, (Datum) 0,
errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed pg_ndistinct: \"%s\"", str),
errdetail("Must be valid JSON."));
PG_RETURN_NULL();
}
result should be either JSON_SUCCESS or anything else.
all these functions:
ndistinct_object_start, ndistinct_array_start,
ndistinct_object_field_start, ndistinct_array_element_start
have
ndistinctParseState *parse = state;
do we need to change it to
ndistinctParseState *parse = (ndistinctParseState *)state;
?
The compiler isn't complaining so far, but I see no harm in it.