Home > mailing lists

Re: list of extended statistics on psql - Mailing list pgsql-hackers

From	Tatsuro Yamada
Subject	Re: list of extended statistics on psql
Date	September 2, 2020 23:45:17
Msg-id	d63766ca-19fa-25a6-cb60-e61b6bbc7700@nttcom.co.jp_1 Whole thread Raw
In response to	Re: list of extended statistics on psql (Pavel Stehule <pavel.stehule@gmail.com>)
Responses	Re: list of extended statistics on psql
List	pgsql-hackers

Tree view

Hi,

>      >> I wonder if trying to list info about all stats from the statistics
>      >> object in a single line is necessary. Maybe we should split the info
>      >> into one line per statistics, so for example
>      >>
>      >>     CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ...
>      >>
>      >> would result in three lines in the \dX output. The statistics name would
>      >> identify which lines belong together, but other than that the pieces are
>      >> mostly independent.
>      >
>      >Yeah, that's what I'm suggesting.  I don't think we need to repeat the
>      >name/definition for each line though.
>      >
>      >It might be useful to know how does pspg show a single entry that's
>      >split in three lines, though.
>      >
> 
>     Ah, I didn't realize you're proposing that - I assumed it's broken
>     simply to make it readable, or something like that. I think the lines
>     are mostly independent, so I'd suggest to include the name of the object
>     on each line. The question is whether this independence will remain true
>     in the future - for example histograms would be built only on data not
>     represented by the MCV list, so there's a close dependency there.
> 
>     Not sure about pspg, and I'm not sure it matters too much.
> 
> 
> pspg almost ignores multiline rows - the horizontal cursor is one row every time. There is only one use case where
pspgdetects multiline rows - sorts, and pspg ensures correct content for multiline rows displayed in different (than
input)order.
 



I try to summarize the discussion so far.
Is my understanding right? Could you revise it if it has something wrong?


* Summary

   1. "\dX[+]" doesn't display the Size of extended stats since the size is
       useful only for the development process of the stats.

   2. each row should have stats name, definition, type, and status.
      For example:

      statname |   definition     |         type              |
     ----------+------------------+---------------------------+
      someobj  | (a, b) FROM tab  | n-distinct: built         |
      someobj  | (a, b) FROM tab  | func-dependencies: built  |
      someobj  | (a, b) FROM tab  | mcv: built                |
      sttshoge | (a, b) FROM hoge | n-distinct: required      |
      sttshoge | (a, b) FROM hoge | func-dependencies:required|
      sttscross| (a, b) FROM t1,t2| n-distinct: required      |


My opinion is below:

   For 1., Agreed. I will remove it on the next patch.
   For 2., I feel the design is not beautiful so I'd like to change it.
     The reasons are:

     - I think that even if we expected the number of types increasing two times,
        each type would be better to put as columns, not lines.
       Repeating items (the stats name and definition) should be removed.
       It's okay there are many columns in the future like "\df+" because we can
       use "\x" mode to display if we need it.

     - The type column has two kinds of data, the one is stats type and another
       is status. We know the word "One fact in One place" for data modeling in
       the RDBMS world so it would be better to divide it.
       I'd like to suggest the bellow design of the view.

      statname |   definition     | n-distinct | func-dependencies | mcv   |
     ----------+------------------+------------+-------------------+-------|
      someobj  | (a, b) FROM tab  | built      | built             | built |
      sttshoge | (a, b) FROM hoge | required   | required          |       |
      sttscross| (a, b) FROM t1,t2| required   |                   |       |


Any thoughts?


Thanks,
Tatsuro Yamada

pgsql-hackers by date:

From: Thomas Munro
Date: 02 September 2020, 23:30:50
Subject: Re: Two fsync related performance issues?

From: Thomas Munro
Date: 03 September 2020, 00:09:29
Subject: Re: Two fsync related performance issues?

Re: list of extended statistics on psql - Mailing list pgsql-hackers

Previous

Next