Thread: list of extended statistics on psql

list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi!

I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-D

I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.

Please find the attached patch.
Any comments are welcome!

For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;

CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;

postgres=# \dz
                     List of extended statistics
  Schema | Table | Name  | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
  public | t1    | stts1 | a, b    | f         | t            | f
  public | t1    | stts2 | a, b    | t         | t            | f
  public | t1    | stts3 | a, b    | t         | t            | t
  public | t2    | stts4 | b, c    | t         | t            | t
(4 rows)

postgres=# \?
...
   \dy     [PATTERN]      list event triggers
   \dz     [PATTERN]      list extended statistics
   \l[+]   [PATTERN]      list databases
...
=======================

For now, I haven't written a document and regression test for that.
I'll create it later.

Thanks,
Tatsuro Yamada



Attachment

Re: list of extended statistics on psql

From
Pavel Stehule
Date:


po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> napsal:
Hi!

I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-D

I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.

Please find the attached patch.
Any comments are welcome!

For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;

CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;

postgres=# \dz
                     List of extended statistics
  Schema | Table | Name  | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
  public | t1    | stts1 | a, b    | f         | t            | f
  public | t1    | stts2 | a, b    | t         | t            | f
  public | t1    | stts3 | a, b    | t         | t            | t
  public | t2    | stts4 | b, c    | t         | t            | t
(4 rows)

postgres=# \?
...
   \dy     [PATTERN]      list event triggers
   \dz     [PATTERN]      list extended statistics
   \l[+]   [PATTERN]      list databases
...
=======================

For now, I haven't written a document and regression test for that.
I'll create it later.

+1 good idea

Pavel


Thanks,
Tatsuro Yamada


Re: list of extended statistics on psql

From
Julien Rouhaud
Date:
On Mon, Aug 24, 2020 at 6:13 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> napsal:
>>
>> Hi!
>>
>> I created a POC patch that allows showing a list of extended statistics by
>> "\dz" command on psql. I believe this feature helps DBA and users who
>> would like to know all extended statistics easily. :-D
>>
>> I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
>> than "\dz" but they were already assigned. Therefore I used "\dz"
>> instead of them.
>>
>> Please find the attached patch.
>> Any comments are welcome!
>>
>> For Example:
>> =======================
>> CREATE TABLE t1 (a INT, b INT);
>> CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
>> CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
>> CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
>> ANALYZE t1;
>>
>> CREATE TABLE t2 (a INT, b INT, c INT);
>> CREATE STATISTICS stts4 ON b, c FROM t2;
>> ANALYZE t2;
>>
>> postgres=# \dz
>>                      List of extended statistics
>>   Schema | Table | Name  | Columns | Ndistinct | Dependencies | MCV
>> --------+-------+-------+---------+-----------+--------------+-----
>>   public | t1    | stts1 | a, b    | f         | t            | f
>>   public | t1    | stts2 | a, b    | t         | t            | f
>>   public | t1    | stts3 | a, b    | t         | t            | t
>>   public | t2    | stts4 | b, c    | t         | t            | t
>> (4 rows)
>>
>> postgres=# \?
>> ...
>>    \dy     [PATTERN]      list event triggers
>>    \dz     [PATTERN]      list extended statistics
>>    \l[+]   [PATTERN]      list databases
>> ...
>> =======================
>>
>> For now, I haven't written a document and regression test for that.
>> I'll create it later.
>
>
> +1 good idea

+1 that's a good idea.  Please add it to the next commitfest!

You have a typo:

+    if (pset.sversion < 10000)
+    {
+        char        sverbuf[32];
+
+        pg_log_error("The server (version %s) does not support
extended statistics.",
+                     formatPGVersionNumber(pset.sversion, false,
+                                           sverbuf, sizeof(sverbuf)));
+        return true;
+    }

the version test is missing a 0, the feature looks otherwise ok.

How about using \dX rather than \dz?



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi!

>> +1 good idea
> 
> +1 that's a good idea.  Please add it to the next commitfest!

Thanks!


> You have a typo:
> 
> +    if (pset.sversion < 10000)
> +    {
> +        char        sverbuf[32];
> +
> +        pg_log_error("The server (version %s) does not support
> extended statistics.",
> +                     formatPGVersionNumber(pset.sversion, false,
> +                                           sverbuf, sizeof(sverbuf)));
> +        return true;
> +    }
> 
> the version test is missing a 0, the feature looks otherwise ok.

Ouch, I fixed on the attached patch.

The new patch includes:

  - Fix the version number check (10000 -> 100000)
  - Fix query to get extended stats info for sort order
  - Add handling [Pattern] e.g \dz stts*
  - Add document and regression test for \dz
  
> How about using \dX rather than \dz?

Thanks for your suggestion!
I'll replace it if I got consensus. :-D

Thanks,
Tatsuro Yamada

Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Julien and Pavel!

>> How about using \dX rather than \dz?
> 
> Thanks for your suggestion!
> I'll replace it if I got consensus. :-D

>> How about using \dX rather than \dz?
>
>Thanks for your suggestion!
>I'll replace it if I got consensus. :-D


I re-read a help message of \d* commands and realized it's better to
use "\dX".
There are already cases where the commands differ due to differences
in case, so I did the same way. Please find attached patch. :-D
  
For example:
==========
   \da[S]  [PATTERN]      list aggregates
   \dA[+]  [PATTERN]      list access methods
==========

Attached patch uses "\dX" instead of "\dz":
==========
   \dx[+]  [PATTERN]      list extensions
   \dX     [PATTERN]      list extended statistics
==========

Results of regress test of the feature are the following:
==========
-- check printing info about extended statistics
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;

\dX
                           List of extended statistics
  Schema | Table |   Name    |     Columns      | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
  public | hoge  | stts_hoge | col1, col2, col3 | t         | t            | t
  public | t1    | stts_1    | a, b             | f         | t            | f
  public | t1    | stts_2    | a, b             | t         | t            | f
  public | t1    | stts_3    | a, b             | t         | t            | t
  public | t2    | stts_4    | b, c             | t         | t            | t
(5 rows)

\dX stts_?
                     List of extended statistics
  Schema | Table |  Name  | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
  public | t1    | stts_1 | a, b    | f         | t            | f
  public | t1    | stts_2 | a, b    | t         | t            | f
  public | t1    | stts_3 | a, b    | t         | t            | t
  public | t2    | stts_4 | b, c    | t         | t            | t
(4 rows)

\dX *hoge
                           List of extended statistics
  Schema | Table |   Name    |     Columns      | Ndistinct | Dependencies | MCV
--------+-------+-----------+------------------+-----------+--------------+-----
  public | hoge  | stts_hoge | col1, col2, col3 | t         | t            | t
(1 row)
==========


Thanks,
Tatsuro Yamada



Attachment

Re: list of extended statistics on psql

From
Julien Rouhaud
Date:
Hi Yamada-san,

On Thu, Aug 27, 2020 at 03:13:09PM +0900, Tatsuro Yamada wrote:
> 
> I re-read a help message of \d* commands and realized it's better to
> use "\dX".
> There are already cases where the commands differ due to differences
> in case, so I did the same way. Please find attached patch. :-D
> For example:
> ==========
>   \da[S]  [PATTERN]      list aggregates
>   \dA[+]  [PATTERN]      list access methods
> ==========
> 
> Attached patch uses "\dX" instead of "\dz":
> ==========
>   \dx[+]  [PATTERN]      list extensions
>   \dX     [PATTERN]      list extended statistics
> ==========


Thanks for updating the patch!  This alias will probably be easier to remember.


> 
> Results of regress test of the feature are the following:
> ==========
> -- check printing info about extended statistics
> create table t1 (a int, b int);
> create statistics stts_1 (dependencies) on a, b from t1;
> create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
> create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
> create table t2 (a int, b int, c int);
> create statistics stts_4 on b, c from t2;
> create table hoge (col1 int, col2 int, col3 int);
> create statistics stts_hoge on col1, col2, col3 from hoge;
> 
> \dX
>                           List of extended statistics
>  Schema | Table |   Name    |     Columns      | Ndistinct | Dependencies | MCV
> --------+-------+-----------+------------------+-----------+--------------+-----
>  public | hoge  | stts_hoge | col1, col2, col3 | t         | t            | t
>  public | t1    | stts_1    | a, b             | f         | t            | f
>  public | t1    | stts_2    | a, b             | t         | t            | f
>  public | t1    | stts_3    | a, b             | t         | t            | t
>  public | t2    | stts_4    | b, c             | t         | t            | t
> (5 rows)
> 
> \dX stts_?
>                     List of extended statistics
>  Schema | Table |  Name  | Columns | Ndistinct | Dependencies | MCV
> --------+-------+--------+---------+-----------+--------------+-----
>  public | t1    | stts_1 | a, b    | f         | t            | f
>  public | t1    | stts_2 | a, b    | t         | t            | f
>  public | t1    | stts_3 | a, b    | t         | t            | t
>  public | t2    | stts_4 | b, c    | t         | t            | t
> (4 rows)
> 
> \dX *hoge
>                           List of extended statistics
>  Schema | Table |   Name    |     Columns      | Ndistinct | Dependencies | MCV
> --------+-------+-----------+------------------+-----------+--------------+-----
>  public | hoge  | stts_hoge | col1, col2, col3 | t         | t            | t
> (1 row)
> ==========


Thanks also for the documentation and regression tests.  This overall looks
good, I just have a two comments:

- there's a whitespace issue in the documentation part:

add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent.
      <varlistentry>
warning: 1 line adds whitespace errors.

- You're sorting the output on schema, table, extended statistics and columns
  but I think the last one isn't required since extended statistics names are
  unique.



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Julien!
  

> Thanks also for the documentation and regression tests.  This overall looks
> good, I just have a two comments:


Thank you for reviewing the patch! :-D


> - there's a whitespace issue in the documentation part:
> 
> add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent.
>       <varlistentry>
> warning: 1 line adds whitespace errors.


Oops, I forgot to use "git diff --check". I fixed it.

  
> - You're sorting the output on schema, table, extended statistics and columns
>    but I think the last one isn't required since extended statistics names are
>    unique.


You are right.
The sort key "columns" was not necessary so I removed it.

Attached new patch includes the above two fixes:

   - Fix whitespace issue in the documentation part
   - Remove unnecessary sort key from the query
      (ORDER BY 1, 2, 3, 4 -> ORDER BY 1, 2, 3)


Thanks,
Tatsuro Yamada


Attachment

Re: list of extended statistics on psql

From
Alvaro Herrera
Date:
+1 for the general idea, and +1 for \dX being the syntax to use

IMO the per-type columns should show both the type being enabled as
well as it being built.

(How many more stat types do we expect -- Tomas?  I wonder if having one
column per type is going to scale in the long run.)

Also, the stat obj name column should be first, followed by a single
column listing both table and columns that it applies to.  Keep in mind
that in the future we might want to add stats that cross multiple tables
-- that's why the CREATE syntax is the way it is.  So we should give
room for that in psql's display too.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Alvaro!

It's been ages since we created a progress reporting feature together. :-D

>>> +1 good idea
>>
>> +1 that's a good idea.  Please add it to the next commitfest!
>
>+1 for the general idea, and +1 for \dX being the syntax to use

Thank you for voting!


> IMO the per-type columns should show both the type being enabled as
well as it being built.

Hmm. I'm not sure how to get the status (enabled or disabled) of
extended stats. :(
Could you explain it more?


> Also, the stat obj name column should be first, followed by a single
> column listing both table and columns that it applies to.  Keep in mind
> that in the future we might want to add stats that cross multiple tables
> -- that's why the CREATE syntax is the way it is.  So we should give
> room for that in psql's display too.

I understand your suggestions are the following, right?

* The Current column order:
===================
   Schema | Table |  Name  | Columns | Ndistinct | Dependencies | MCV
--------+-------+--------+---------+-----------+--------------+-----
   public | t1    | stts_1 | a, b    | f         | t            | f
   public | t1    | stts_2 | a, b    | t         | t            | f
   public | t1    | stts_3 | a, b    | t         | t            | t
   public | t2    | stts_4 | b, c    | t         | t            | t
===================

* The suggested column order is like this:
===================
    Name    | Schema | Table |     Columns      | Ndistinct | Dependencies | MCV
-----------+--------+-------+------------------+-----------+--------------+-----
  stts_1    | public | t1    | a, b             | f         | t            | f
  stts_2    | public | t1    | a, b             | t         | t            | f
  stts_3    | public | t1    | a, b             | t         | t            | t
  stts_4    | public | t2    | b, c             | t         | t            | t
===================

*  In the future, Extended stats that cross multiple tables will be
    shown maybe... (t1, t2):
===================
    Name    | Schema | Table  |     Columns      | Ndistinct | Dependencies | MCV
-----------+--------+--------+------------------+-----------+--------------+-----
  stts_5    | public | t1, t2 | a, b             | f         | t            | f
===================

If so, I can revise the column order as you suggested easily.
However, I have no idea how to show extended stats that cross
multiple tables and the status now.

I suppose that the current column order is sufficient if there is
no improvement of extended stats on PG14. Do you know any plan to
improve extended stats such as to allow it to cross multiple tables on PG14?


In addition,
Currently, I use this query to get Extended stats info from pg_statistic_ext.

         SELECT
         stxnamespace::pg_catalog.regnamespace AS "Schema",
         c.relname AS "Table",
         stxname AS "Name",
         (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
          FROM pg_catalog.unnest(stxkeys) s(attnum)
          JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
          a.attnum = s.attnum AND NOT attisdropped)) AS "Columns",
         'd' = any(stxkind) AS "Ndistinct",
         'f' = any(stxkind) AS "Dependencies",
         'm' = any(stxkind) AS "MCV"
         FROM pg_catalog.pg_statistic_ext
         INNER JOIN pg_catalog.pg_class c
         ON stxrelid = c.oid
         ORDER BY 1, 2, 3;

Thanks,
Tatsuro Yamada






Re: list of extended statistics on psql

From
Alvaro Herrera
Date:
On 2020-Aug-28, Tatsuro Yamada wrote:

> > IMO the per-type columns should show both the type being enabled as
> > well as it being built.
> 
> Hmm. I'm not sure how to get the status (enabled or disabled) of
> extended stats. :(
> Could you explain it more?

pg_statistic_ext_data.stxdndistinct is not null if the stats have been
built.  (I'm not sure whether there's an easier way to determine this.)


> * The suggested column order is like this:
> ===================
>    Name    | Schema | Table |     Columns      | Ndistinct | Dependencies | MCV
> -----------+--------+-------+------------------+-----------+--------------+-----
>  stts_1    | public | t1    | a, b             | f         | t            | f
>  stts_2    | public | t1    | a, b             | t         | t            | f
>  stts_3    | public | t1    | a, b             | t         | t            | t
>  stts_4    | public | t2    | b, c             | t         | t            | t
> ===================

I suggest to do this

    Name    | Schema | Definition               | Ndistinct | Dependencies | MCV
 -----------+--------+--------------------------+-----------+--------------+-----
  stts_1    | public | (a, b) FROM t1           | f         | t            | f

> I suppose that the current column order is sufficient if there is
> no improvement of extended stats on PG14. Do you know any plan to
> improve extended stats such as to allow it to cross multiple tables on PG14?

I suggest that changing it in the future is going to be an uphill
battle, so better get it right from the get go, without requiring a
future restructure.

> In addition,
> Currently, I use this query to get Extended stats info from pg_statistic_ext.

Maybe something like this would do

SELECT
 stxnamespace::pg_catalog.regnamespace AS "Schema",
 stxname AS "Name",
 format('%s FROM %s',
 (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
  FROM pg_catalog.unnest(stxkeys) s(attnum)
  JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
  a.attnum = s.attnum AND NOT attisdropped)),
  stxrelid::regclass) AS "Definition",
  CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS
"n-distinct",
  CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS
"functionaldependencies",
 
  CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv
 FROM pg_catalog.pg_statistic_ext es
 INNER JOIN pg_catalog.pg_class c
 ON stxrelid = c.oid
 LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid
 ORDER BY 1, 2, 3;

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On Thu, Aug 27, 2020 at 07:53:23PM -0400, Alvaro Herrera wrote:
>+1 for the general idea, and +1 for \dX being the syntax to use
>
>IMO the per-type columns should show both the type being enabled as
>well as it being built.
>
>(How many more stat types do we expect -- Tomas?  I wonder if having one
>column per type is going to scale in the long run.)
>

I wouldn't expect a huge number of types. I can imagine maybe twice the
current number of types, but not much more. But I'm not sure the output
is easy to read even now ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On Thu, Aug 27, 2020 at 11:26:17PM -0400, Alvaro Herrera wrote:
>On 2020-Aug-28, Tatsuro Yamada wrote:
>
>> > IMO the per-type columns should show both the type being enabled as
>> > well as it being built.
>>
>> Hmm. I'm not sure how to get the status (enabled or disabled) of
>> extended stats. :(
>> Could you explain it more?
>
>pg_statistic_ext_data.stxdndistinct is not null if the stats have been
>built.  (I'm not sure whether there's an easier way to determine this.)
>

It's the only way, I think. Which types were requested is stored in

    pg_statistic_ext.stxkind

and what was built is in pg_statistic_ext_data. But if we want the
output to show both what was requested and which types were actually
built, that'll effectively double the number of columns needed :-(

Also, it might be useful to show the size of the statistics built, just
like we show for \d+ etc.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Alvaro Herrera
Date:
On 2020-Aug-29, Tomas Vondra wrote:

> But if we want the
> output to show both what was requested and which types were actually
> built, that'll effectively double the number of columns needed :-(

I was thinking it would be one column per type showing either disabled or enabled
or built.  But another idea is to show one type per line that's at least
enabled.

> Also, it might be useful to show the size of the statistics built, just
> like we show for \d+ etc.

\dX+  I  suppose?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:
>On 2020-Aug-29, Tomas Vondra wrote:
>
>> But if we want the
>> output to show both what was requested and which types were actually
>> built, that'll effectively double the number of columns needed :-(
>
>I was thinking it would be one column per type showing either disabled or enabled
>or built.  But another idea is to show one type per line that's at least
>enabled.
>
>> Also, it might be useful to show the size of the statistics built, just
>> like we show for \d+ etc.
>
>\dX+  I  suppose?
>

Right. I've only used \d+ as an example of an existing command showing
sizes of the objects.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Alvaro Herrera
Date:
On 2020-Aug-30, Tomas Vondra wrote:

> On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:
> > On 2020-Aug-29, Tomas Vondra wrote:

> > > Also, it might be useful to show the size of the statistics built, just
> > > like we show for \d+ etc.
> > 
> > \dX+  I  suppose?
> 
> Right. I've only used \d+ as an example of an existing command showing
> sizes of the objects.

Yeah, I understood it that way too.

How can you measure the size of the stat objects in a query?  Are you
thinking in pg_column_size()?

I wonder how to report that.  Knowing that psql \-commands are not meant
for anything other than human consumption, maybe we can use a format()
string that says "built: %d bytes" when \dX+ is used (for each stat type),
and just "built" when \dX is used.  What do people think about this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:
>On 2020-Aug-30, Tomas Vondra wrote:
>
>> On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote:
>> > On 2020-Aug-29, Tomas Vondra wrote:
>
>> > > Also, it might be useful to show the size of the statistics built, just
>> > > like we show for \d+ etc.
>> >
>> > \dX+  I  suppose?
>>
>> Right. I've only used \d+ as an example of an existing command showing
>> sizes of the objects.
>
>Yeah, I understood it that way too.
>
>How can you measure the size of the stat objects in a query?  Are you
>thinking in pg_column_size()?
>

Either that or simply length() on the bytea value.

>I wonder how to report that.  Knowing that psql \-commands are not meant
>for anything other than human consumption, maybe we can use a format()
>string that says "built: %d bytes" when \dX+ is used (for each stat type),
>and just "built" when \dX is used.  What do people think about this?
>

I'd use the same approach as \d+, i.e. a separate column with the size.
Maybe that'd mean too many columns, though.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:
>> I wonder how to report that.  Knowing that psql \-commands are not meant
>> for anything other than human consumption, maybe we can use a format()
>> string that says "built: %d bytes" when \dX+ is used (for each stat type),
>> and just "built" when \dX is used.  What do people think about this?

Seems a little too cute to me.

> I'd use the same approach as \d+, i.e. a separate column with the size.
> Maybe that'd mean too many columns, though.

psql already has \d commands with so many columns that you pretty much
have to use \x mode to make them legible; \df+ for instance.  I don't
mind if \dX+ is also in that territory.  It'd be good though if plain
\dX can fit in a normal terminal window.

            regards, tom lane



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Alvaro,

>>> IMO the per-type columns should show both the type being enabled as
>>> well as it being built.
>>
>> Hmm. I'm not sure how to get the status (enabled or disabled) of
>> extended stats. :(
>> Could you explain it more?
>
> pg_statistic_ext_data.stxdndistinct is not null if the stats have been
> built. (I'm not sure whether there's an easier way to determine this.)


Ah.. I see! Thank you.


> I suggest to do this
>
>    Name    | Schema | Definition               | Ndistinct | Dependencies | MCV
> -----------+--------+--------------------------+-----------+--------------+-----
>  stts_1    | public | (a, b) FROM t1           | f         | t            | f
>
>> I suppose that the current column order is sufficient if there is
>> no improvement of extended stats on PG14. Do you know any plan to
>> improve extended stats such as to allow it to cross multiple tables on PG14?
>
> I suggest that changing it in the future is going to be an uphill
> battle, so better get it right from the get go, without requiring a
> future restructure.


I understand your suggestions. I'll replace "Columns" and "Table" columns with "Definition" column.


>> Currently, I use this query to get Extended stats info from pg_statistic_ext.
>
> Maybe something like this would do
>
> SELECT
> stxnamespace::pg_catalog.regnamespace AS "Schema",
> stxname AS "Name",
> format('%s FROM %s',
>  (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
>   FROM pg_catalog.unnest(stxkeys) s(attnum)
>   JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
>   a.attnum = s.attnum AND NOT attisdropped)),
>   stxrelid::regclass) AS "Definition",
>   CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS
"n-distinct",
>   CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS
"functionaldependencies",
 
>   CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv
>  FROM pg_catalog.pg_statistic_ext es
>  INNER JOIN pg_catalog.pg_class c
>  ON stxrelid = c.oid
>  LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid
>  ORDER BY 1, 2, 3;

Great! It helped me a lot to understand your suggestions correctly. Thanks. :-D
I got the below results by your query.

========
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;
create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;
create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;

insert into t1 select i,i from generate_series(1,100) i;
analyze t1;


Your query gave this result:

  Schema |   Name    |         Definition         |     n-distinct     | functional dependencies |        mcv
--------+-----------+----------------------------+--------------------+-------------------------+--------------------
  public | stts_1    | a, b FROM t1               |                    | built                   |
  public | stts_2    | a, b FROM t1               | built              | built                   |
  public | stts_3    | a, b FROM t1               | built              | built                   | built
  public | stts_4    | b, c FROM t2               | enabled, not built | enabled, not built      | enabled, not built
  public | stts_hoge | col1, col2, col3 FROM hoge | enabled, not built | enabled, not built      | enabled, not built
(5 rows)
========

I guess "enabled, not built" is a little redundant. The status would better to
have three patterns: "built", "not built" or nothing (NULL) like these:

   - "built":  extended stats is defined and built (collected by analyze cmd)
   - "not built": extended stats is defined but have not built yet
   - nothing (NULL): extended stats is not defined

What do you think about it?


I will send a new patch including :

   - Replace "Columns" and "Table" column with "Definition"
   - Show the status (built/not built/null) of extended stats by using
     pg_statistic_ext_data

Thanks,
Tatsuro Yamada






Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
On 2020/08/31 1:59, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:
>>> I wonder how to report that.  Knowing that psql \-commands are not meant
>>> for anything other than human consumption, maybe we can use a format()
>>> string that says "built: %d bytes" when \dX+ is used (for each stat type),
>>> and just "built" when \dX is used.  What do people think about this?
> 
> Seems a little too cute to me.
> 
>> I'd use the same approach as \d+, i.e. a separate column with the size.
>> Maybe that'd mean too many columns, though.
> 
> psql already has \d commands with so many columns that you pretty much
> have to use \x mode to make them legible; \df+ for instance.  I don't
> mind if \dX+ is also in that territory.  It'd be good though if plain
> \dX can fit in a normal terminal window.


Hmm. How about these instead of "built: %d bytes"?
I added three columns (N_size, D_size, M_size) to show size. See below:

===================
  postgres=# \dX
                                List of extended statistics
  Schema |   Name    |         Definition         | N_distinct | Dependencies |    Mcv
--------+-----------+----------------------------+------------+--------------+-----------
  public | stts_1    | a, b FROM t1               |            | built        |
  public | stts_2    | a, b FROM t1               | built      | built        |
  public | stts_3    | a, b FROM t1               | built      | built        | built
  public | stts_4    | b, c FROM t2               | not built  | not built    | not built
  public | stts_hoge | col1, col2, col3 FROM hoge | not built  | not built    | not built
(5 rows)

postgres=# \dX+
                                             List of extended statistics
  Schema |   Name    |         Definition         | N_distinct | Dependencies |    Mcv    | N_size | D_size | M_size
--------+-----------+----------------------------+------------+--------------+-----------+--------+--------+--------
  public | stts_1    | a, b FROM t1               |            | built        |           |        |     40 |
  public | stts_2    | a, b FROM t1               | built      | built        |           |     13 |     40 |
  public | stts_3    | a, b FROM t1               | built      | built        | built     |     13 |     40 |   6126
  public | stts_4    | b, c FROM t2               | not built  | not built    | not built |        |        |
  public | stts_hoge | col1, col2, col3 FROM hoge | not built  | not built    | not built |        |        |
===================

I used this query to get results of "\dX+".
===================
         SELECT
          stxnamespace::pg_catalog.regnamespace AS "Schema",
          stxname AS "Name",
          format('%s FROM %s',
            (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
             FROM pg_catalog.unnest(stxkeys) s(attnum)
             JOIN pg_catalog.pg_attribute a
             ON (stxrelid = a.attrelid
             AND a.attnum = s.attnum
             AND NOT attisdropped)),
          stxrelid::regclass) AS "Definition",
          CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built'
               WHEN 'd' = any(stxkind) THEN 'not built'
          END AS "N_distinct",
          CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built'
               WHEN 'f' = any(stxkind) THEN 'not built'
          END AS "Dependencies",
          CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built'
               WHEN 'm' = any(stxkind) THEN 'not built'
          END AS "Mcv",
        pg_catalog.length(stxdndistinct) AS "N_size",
        pg_catalog.length(stxddependencies) AS "D_size",
        pg_catalog.length(stxdmcv) AS "M_size"
        FROM pg_catalog.pg_statistic_ext es
        INNER JOIN pg_catalog.pg_class c
        ON stxrelid = c.oid
        LEFT JOIN pg_catalog.pg_statistic_ext_data esd
        ON es.oid = esd.stxoid
        ORDER BY 1, 2;
===================
  

Attached patch includes:

    - Replace "Columns" and "Table" column with "Definition"
    - Show the status (built/not built/null) of extended stats by
      using pg_statistic_ext_data
    - Add "\dX+" command to show size of extended stats

Please find the attached file! :-D


Thanks,
Tatsuro Yamada


Attachment

Re: list of extended statistics on psql

From
Alvaro Herrera
Date:
On 2020-Aug-30, Tomas Vondra wrote:

> On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:

> > I wonder how to report that.  Knowing that psql \-commands are not meant
> > for anything other than human consumption, maybe we can use a format()
> > string that says "built: %d bytes" when \dX+ is used (for each stat type),
> > and just "built" when \dX is used.  What do people think about this?
> 
> I'd use the same approach as \d+, i.e. a separate column with the size.
> Maybe that'd mean too many columns, though.

Are you thinking in one size for all stats, or a combined size?  If the
former, then yes it'd be too many columns.

I'm trying to figure out what can the user *do* with that data.  Can
they make the sample size smaller/bigger if the stats data is too large?
Can they do that for each individual stats type?  If so, it'd make sense
to list each type's size separately.

If we do put each type in its own row -- at least "logical" row, say
string_agg(unnest(array_of_types), '\n') -- then we can put the size of each type
in a separate column with string_agg(unnest(array_of_sizes), '\n') 

 statname |   definition    |         type             |  size
----------+-----------------+--------------------------+-----------
 someobj  | (a, b) FROM tab | n-distinct: built        | 2000 bytes
                            | func-dependencies: built | 4000 bytes
 another  | (a, c) FROM tab | n-distint: enabled       | <null>


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> If we do put each type in its own row -- at least "logical" row, say
> string_agg(unnest(array_of_types), '\n') -- then we can put the size of each type
> in a separate column with string_agg(unnest(array_of_sizes), '\n')

>  statname |   definition    |         type             |  size
> ----------+-----------------+--------------------------+-----------
>  someobj  | (a, b) FROM tab | n-distinct: built        | 2000 bytes
>                             | func-dependencies: built | 4000 bytes
>  another  | (a, c) FROM tab | n-distint: enabled       | <null>

I guess I'm wondering why the size is of such interest that we
need it at all here.

            regards, tom lane



Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On Mon, Aug 31, 2020 at 10:58:11AM -0400, Tom Lane wrote:
>Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> If we do put each type in its own row -- at least "logical" row, say
>> string_agg(unnest(array_of_types), '\n') -- then we can put the size of each type
>> in a separate column with string_agg(unnest(array_of_sizes), '\n')
>
>>  statname |   definition    |         type             |  size
>> ----------+-----------------+--------------------------+-----------
>>  someobj  | (a, b) FROM tab | n-distinct: built        | 2000 bytes
>>                             | func-dependencies: built | 4000 bytes
>>  another  | (a, c) FROM tab | n-distint: enabled       | <null>
>
>I guess I'm wondering why the size is of such interest that we
>need it at all here.
>

I agree it may not be important enough. I did use it during development
etc. but maybe it's not something we need to include in this list (even
if it's just in the \dX+ variant).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On Mon, Aug 31, 2020 at 10:28:38AM -0400, Alvaro Herrera wrote:
>On 2020-Aug-30, Tomas Vondra wrote:
>
>> On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote:
>
>> > I wonder how to report that.  Knowing that psql \-commands are not meant
>> > for anything other than human consumption, maybe we can use a format()
>> > string that says "built: %d bytes" when \dX+ is used (for each stat type),
>> > and just "built" when \dX is used.  What do people think about this?
>>
>> I'd use the same approach as \d+, i.e. a separate column with the size.
>> Maybe that'd mean too many columns, though.
>
>Are you thinking in one size for all stats, or a combined size?  If the
>former, then yes it'd be too many columns.
>

I wonder if trying to list info about all stats from the statistics
object in a single line is necessary. Maybe we should split the info
into one line per statistics, so for example

     CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ...

would result in three lines in the \dX output. The statistics name would
identify which lines belong together, but other than that the pieces are
mostly independent.

This would make it somewhat future-proof in case we add more statistics
types, because the number of columns would not increase. OTOH maybe it's
pointless and/or against the purpose of listing statistics objects.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Alvaro Herrera
Date:
On 2020-Aug-31, Tomas Vondra wrote:

> I wonder if trying to list info about all stats from the statistics
> object in a single line is necessary. Maybe we should split the info
> into one line per statistics, so for example
> 
>     CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ...
> 
> would result in three lines in the \dX output. The statistics name would
> identify which lines belong together, but other than that the pieces are
> mostly independent.

Yeah, that's what I'm suggesting.  I don't think we need to repeat the
name/definition for each line though.

It might be useful to know how does pspg show a single entry that's
split in three lines, though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On Mon, Aug 31, 2020 at 12:18:09PM -0400, Alvaro Herrera wrote:
>On 2020-Aug-31, Tomas Vondra wrote:
>
>> I wonder if trying to list info about all stats from the statistics
>> object in a single line is necessary. Maybe we should split the info
>> into one line per statistics, so for example
>>
>>     CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ...
>>
>> would result in three lines in the \dX output. The statistics name would
>> identify which lines belong together, but other than that the pieces are
>> mostly independent.
>
>Yeah, that's what I'm suggesting.  I don't think we need to repeat the
>name/definition for each line though.
>
>It might be useful to know how does pspg show a single entry that's
>split in three lines, though.
>

Ah, I didn't realize you're proposing that - I assumed it's broken
simply to make it readable, or something like that. I think the lines
are mostly independent, so I'd suggest to include the name of the object
on each line. The question is whether this independence will remain true
in the future - for example histograms would be built only on data not
represented by the MCV list, so there's a close dependency there.

Not sure about pspg, and I'm not sure it matters too much.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: list of extended statistics on psql

From
Pavel Stehule
Date:


po 31. 8. 2020 v 18:32 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal:
On Mon, Aug 31, 2020 at 12:18:09PM -0400, Alvaro Herrera wrote:
>On 2020-Aug-31, Tomas Vondra wrote:
>
>> I wonder if trying to list info about all stats from the statistics
>> object in a single line is necessary. Maybe we should split the info
>> into one line per statistics, so for example
>>
>>     CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ...
>>
>> would result in three lines in the \dX output. The statistics name would
>> identify which lines belong together, but other than that the pieces are
>> mostly independent.
>
>Yeah, that's what I'm suggesting.  I don't think we need to repeat the
>name/definition for each line though.
>
>It might be useful to know how does pspg show a single entry that's
>split in three lines, though.
>

Ah, I didn't realize you're proposing that - I assumed it's broken
simply to make it readable, or something like that. I think the lines
are mostly independent, so I'd suggest to include the name of the object
on each line. The question is whether this independence will remain true
in the future - for example histograms would be built only on data not
represented by the MCV list, so there's a close dependency there.

Not sure about pspg, and I'm not sure it matters too much.

pspg almost ignores multiline rows - the horizontal cursor is one row every time. There is only one use case where pspg detects multiline rows - sorts, and pspg ensures correct content for multiline rows displayed in different (than input) order.

Regards

Pavel



regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi,

>      >> I wonder if trying to list info about all stats from the statistics
>      >> object in a single line is necessary. Maybe we should split the info
>      >> into one line per statistics, so for example
>      >>
>      >>     CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ...
>      >>
>      >> would result in three lines in the \dX output. The statistics name would
>      >> identify which lines belong together, but other than that the pieces are
>      >> mostly independent.
>      >
>      >Yeah, that's what I'm suggesting.  I don't think we need to repeat the
>      >name/definition for each line though.
>      >
>      >It might be useful to know how does pspg show a single entry that's
>      >split in three lines, though.
>      >
> 
>     Ah, I didn't realize you're proposing that - I assumed it's broken
>     simply to make it readable, or something like that. I think the lines
>     are mostly independent, so I'd suggest to include the name of the object
>     on each line. The question is whether this independence will remain true
>     in the future - for example histograms would be built only on data not
>     represented by the MCV list, so there's a close dependency there.
> 
>     Not sure about pspg, and I'm not sure it matters too much.
> 
> 
> pspg almost ignores multiline rows - the horizontal cursor is one row every time. There is only one use case where
pspgdetects multiline rows - sorts, and pspg ensures correct content for multiline rows displayed in different (than
input)order.
 



I try to summarize the discussion so far.
Is my understanding right? Could you revise it if it has something wrong?


* Summary

   1. "\dX[+]" doesn't display the Size of extended stats since the size is
       useful only for the development process of the stats.

   2. each row should have stats name, definition, type, and status.
      For example:

      statname |   definition     |         type              |
     ----------+------------------+---------------------------+
      someobj  | (a, b) FROM tab  | n-distinct: built         |
      someobj  | (a, b) FROM tab  | func-dependencies: built  |
      someobj  | (a, b) FROM tab  | mcv: built                |
      sttshoge | (a, b) FROM hoge | n-distinct: required      |
      sttshoge | (a, b) FROM hoge | func-dependencies:required|
      sttscross| (a, b) FROM t1,t2| n-distinct: required      |


My opinion is below:

   For 1., Agreed. I will remove it on the next patch.
   For 2., I feel the design is not beautiful so I'd like to change it.
     The reasons are:

     - I think that even if we expected the number of types increasing two times,
        each type would be better to put as columns, not lines.
       Repeating items (the stats name and definition) should be removed.
       It's okay there are many columns in the future like "\df+" because we can
       use "\x" mode to display if we need it.

     - The type column has two kinds of data, the one is stats type and another
       is status. We know the word "One fact in One place" for data modeling in
       the RDBMS world so it would be better to divide it.
       I'd like to suggest the bellow design of the view.

      statname |   definition     | n-distinct | func-dependencies | mcv   |
     ----------+------------------+------------+-------------------+-------|
      someobj  | (a, b) FROM tab  | built      | built             | built |
      sttshoge | (a, b) FROM hoge | required   | required          |       |
      sttscross| (a, b) FROM t1,t2| required   |                   |       |


Any thoughts?


Thanks,
Tatsuro Yamada






Re: list of extended statistics on psql

From
Michael Paquier
Date:
On Thu, Sep 03, 2020 at 08:45:17AM +0900, Tatsuro Yamada wrote:
> I try to summarize the discussion so far.

Could you provide at least a rebased version of the patch?  The CF bot
is complaning here.
--
Michael

Attachment

Re: list of extended statistics on psql

From
Michael Paquier
Date:
On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote:
> Could you provide at least a rebased version of the patch?  The CF bot
> is complaning here.

Not seeing this answered after two weeks, I have marked the patch as
RwF for now.
--
Michael

Attachment

Re: [spam] Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Michael-san and Hackers,

On 2020/09/30 15:19, Michael Paquier wrote:
> On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote:
>> Could you provide at least a rebased version of the patch?  The CF bot
>> is complaning here.
> 
> Not seeing this answered after two weeks, I have marked the patch as
> RwF for now.
> --
> Michael


Sorry for the delayed reply.

I re-based the patch on the current head and did some
refactoring.
I think the size of extended stats are not useful for DBA.
Should I remove it?

Changes:
========
   - Use a keyword "defined" instead of "not built"
   - Use COALESCE function for size for extended stats

Results of \dX and \dX+:
========================
postgres=# \dX
                            List of extended statistics
    Schema    |   Name    |   Definition    | N_distinct | Dependencies |   Mcv
-------------+-----------+-----------------+------------+--------------+---------
  public      | hoge1_ext | a, b FROM hoge1 | defined    | defined      | defined
  hoge1schema | hoge1_ext | a, b FROM hoge1 | built      | built        | built
(2 rows)

postgres=# \dX+
                                         List of extended statistics
    Schema    |   Name    |   Definition    | N_distinct | Dependencies |   Mcv   | N_size | D_size | M_size
-------------+-----------+-----------------+------------+--------------+---------+--------+--------+--------
  public      | hoge1_ext | a, b FROM hoge1 | defined    | defined      | defined |      0 |      0 |      0
  hoge1schema | hoge1_ext | a, b FROM hoge1 | built      | built        | built   |     13 |     40 |   6126
(2 rows)

Query of \dX+:
==============
         SELECT
         stxnamespace::pg_catalog.regnamespace AS "Schema",
         stxname AS "Name",
         pg_catalog.format('%s FROM %s',
           (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ')
            FROM pg_catalog.unnest(es.stxkeys) s(attnum)
            JOIN pg_catalog.pg_attribute a
            ON (es.stxrelid = a.attrelid
            AND a.attnum = s.attnum
            AND NOT a.attisdropped)),
         es.stxrelid::regclass) AS "Definition",
         CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built'
              WHEN 'd' = any(stxkind) THEN 'defined'
         END AS "N_distinct",
         CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built'
              WHEN 'f' = any(stxkind) THEN 'defined'
         END AS "Dependencies",
         CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built'
              WHEN 'm' = any(stxkind) THEN 'defined'
         END AS "Mcv",
         COALESCE(pg_catalog.length(stxdndistinct), 0) AS "N_size",
         COALESCE(pg_catalog.length(stxddependencies), 0) AS "D_size",
         COALESCE(pg_catalog.length(stxdmcv), 0) AS "M_size"
         FROM pg_catalog.pg_statistic_ext es
         LEFT JOIN pg_catalog.pg_statistic_ext_data esd
         ON es.oid = esd.stxoid
         INNER JOIN pg_catalog.pg_class c
         ON es.stxrelid = c.oid
         ORDER BY 1, 2;


Regards,
Tatsuro Yamada

Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Michael-san and Hackers,

On 2020/09/30 15:19, Michael Paquier wrote:
> On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote:
>> Could you provide at least a rebased version of the patch?  The CF bot
>> is complaning here.
> 
> Not seeing this answered after two weeks, I have marked the patch as
> RwF for now.
> --
> Michael


Sorry for the delayed reply.

I re-based the patch on the current head and did some
refactoring.
I think the size of extended stats are not useful for DBA.
Should I remove it?

Changes:
========
    - Use a keyword "defined" instead of "not built"
    - Use COALESCE function for size for extended stats

Results of \dX and \dX+:
========================
postgres=# \dX
                             List of extended statistics
     Schema    |   Name    |   Definition    | N_distinct | Dependencies |   Mcv
-------------+-----------+-----------------+------------+--------------+---------
   public      | hoge1_ext | a, b FROM hoge1 | defined    | defined      | defined
   hoge1schema | hoge1_ext | a, b FROM hoge1 | built      | built        | built
(2 rows)

postgres=# \dX+
                                          List of extended statistics
     Schema    |   Name    |   Definition    | N_distinct | Dependencies |   Mcv   | N_size | D_size | M_size
-------------+-----------+-----------------+------------+--------------+---------+--------+--------+--------
   public      | hoge1_ext | a, b FROM hoge1 | defined    | defined      | defined |      0 |      0 |      0
   hoge1schema | hoge1_ext | a, b FROM hoge1 | built      | built        | built   |     13 |     40 |   6126
(2 rows)

Query of \dX+:
==============
          SELECT
          stxnamespace::pg_catalog.regnamespace AS "Schema",
          stxname AS "Name",
          pg_catalog.format('%s FROM %s',
            (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ')
             FROM pg_catalog.unnest(es.stxkeys) s(attnum)
             JOIN pg_catalog.pg_attribute a
             ON (es.stxrelid = a.attrelid
             AND a.attnum = s.attnum
             AND NOT a.attisdropped)),
          es.stxrelid::regclass) AS "Definition",
          CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built'
               WHEN 'd' = any(stxkind) THEN 'defined'
          END AS "N_distinct",
          CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built'
               WHEN 'f' = any(stxkind) THEN 'defined'
          END AS "Dependencies",
          CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built'
               WHEN 'm' = any(stxkind) THEN 'defined'
          END AS "Mcv",
          COALESCE(pg_catalog.length(stxdndistinct), 0) AS "N_size",
          COALESCE(pg_catalog.length(stxddependencies), 0) AS "D_size",
          COALESCE(pg_catalog.length(stxdmcv), 0) AS "M_size"
          FROM pg_catalog.pg_statistic_ext es
          LEFT JOIN pg_catalog.pg_statistic_ext_data esd
          ON es.oid = esd.stxoid
          INNER JOIN pg_catalog.pg_class c
          ON es.stxrelid = c.oid
          ORDER BY 1, 2;


Regards,
Tatsuro Yamada


Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi,

> Results of \dX and \dX+:
> ========================
> postgres=# \dX
>                              List of extended statistics
>      Schema    |   Name    |   Definition    | N_distinct | Dependencies |   Mcv
> -------------+-----------+-----------------+------------+--------------+---------
>    public      | hoge1_ext | a, b FROM hoge1 | defined    | defined      | defined
>    hoge1schema | hoge1_ext | a, b FROM hoge1 | built      | built        | built
> (2 rows)


I used "Order by 1, 2" on the query but I realized the ordering of
result was wrong so I fixed on the attached patch.
Please fined the patch file. :-D

Regards,
Tatsuro Yamada



Attachment

Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On Wed, Oct 28, 2020 at 03:07:56PM +0900, Tatsuro Yamada wrote:
>Hi Michael-san and Hackers,
>
>On 2020/09/30 15:19, Michael Paquier wrote:
>>On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote:
>>>Could you provide at least a rebased version of the patch?  The CF bot
>>>is complaning here.
>>
>>Not seeing this answered after two weeks, I have marked the patch as
>>RwF for now.
>>--
>>Michael
>
>
>Sorry for the delayed reply.
>
>I re-based the patch on the current head and did some
>refactoring.
>I think the size of extended stats are not useful for DBA.
>Should I remove it?
>

I think it's an interesting / useful information, I'd keep it (in the
\dX+ output only, of course). But I think it needs to print the size
similarly to \d+, i.e. using pg_size_pretty - that'll include the unit
and make it more readable for large stats.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On Wed, Oct 28, 2020 at 04:20:25PM +0900, Tatsuro Yamada wrote:
>Hi,
>
>>Results of \dX and \dX+:
>>========================
>>postgres=# \dX
>>                             List of extended statistics
>>     Schema    |   Name    |   Definition    | N_distinct | Dependencies |   Mcv
>>-------------+-----------+-----------------+------------+--------------+---------
>>   public      | hoge1_ext | a, b FROM hoge1 | defined    | defined      | defined
>>   hoge1schema | hoge1_ext | a, b FROM hoge1 | built      | built        | built
>>(2 rows)
>
>
>I used "Order by 1, 2" on the query but I realized the ordering of
>result was wrong so I fixed on the attached patch.
>Please fined the patch file. :-D
>

Thanks. I'll take a look at the beginning of the 2020-11 commitfest, and
I hope to get this committed.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

On 2020/10/29 4:07, Tomas Vondra wrote:
> On Wed, Oct 28, 2020 at 04:20:25PM +0900, Tatsuro Yamada wrote:
>> Hi,
>>
>>> Results of \dX and \dX+:
>>> ========================
>>> postgres=# \dX
>>>                             List of extended statistics
>>>     Schema    |   Name    |   Definition    | N_distinct | Dependencies |   Mcv
>>> -------------+-----------+-----------------+------------+--------------+---------
>>>   public      | hoge1_ext | a, b FROM hoge1 | defined    | defined      | defined
>>>   hoge1schema | hoge1_ext | a, b FROM hoge1 | built      | built        | built
>>> (2 rows)
>>
>>
>> I used "Order by 1, 2" on the query but I realized the ordering of
>> result was wrong so I fixed on the attached patch.
>> Please find the patch file. :-D
>>
> 
> Thanks. I'll take a look at the beginning of the 2020-11 commitfest, and
> I hope to get this committed.


Thanks for your reply and I'm glad to hear that.

I'm going to revise the patch as possible to get this committed on
the next commitfest.

Regards,
Tatsuro Yamada






Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

On 2020/10/29 4:06, Tomas Vondra wrote:
> On Wed, Oct 28, 2020 at 03:07:56PM +0900, Tatsuro Yamada wrote:
>> Hi Michael-san and Hackers,
>>
>> On 2020/09/30 15:19, Michael Paquier wrote:
>>> On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote:
>>>> Could you provide at least a rebased version of the patch?  The CF bot
>>>> is complaning here.
>>>
>>> Not seeing this answered after two weeks, I have marked the patch as
>>> RwF for now.
>>> -- 
>>> Michael
>>
>>
>> Sorry for the delayed reply.
>>
>> I re-based the patch on the current head and did some
>> refactoring.
>> I think the size of extended stats are not useful for DBA.
>> Should I remove it?
>>
> 
> I think it's an interesting / useful information, I'd keep it (in the
> \dX+ output only, of course). But I think it needs to print the size
> similarly to \d+, i.e. using pg_size_pretty - that'll include the unit
> and make it more readable for large stats.


Thanks for your comment.
I addressed it, so I keep the size of extended stats with the unit.

Changes:
========
   - Use pg_size_pretty to show the size of extended stats by \dX+

Result of \dX+:
===============
    Schema    |    Name    |   Definition    | N_distinct | Dependencies |   Mcv   |  N_Size  |  D_Size  |   M_Size
-------------+------------+-----------------+------------+--------------+---------+----------+----------+------------
  hoge1schema | hoge1_ext  | a, b FROM hoge1 | built      | built        | built   | 13 bytes | 40 bytes | 6126 bytes
  public      | hoge1_ext1 | a, b FROM hoge1 | defined    | defined      | defined | 0 bytes  | 0 bytes  | 0 bytes
  public      | hoge1_ext2 | a, b FROM hoge1 | defined    |              |         | 0 bytes  |          |
(3 rows)

Please find the attached patch.

Regards,
Tatsuro Yamada


Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi,

> I addressed it, so I keep the size of extended stats with the unit.
> 
> Changes:
> ========
>    - Use pg_size_pretty to show the size of extended stats by \dX+


I rebased the patch on the head and also added tab-completion.
Any feedback is welcome.


Preparing for tests:
===========
create table t1 (a int, b int);
create statistics stts_1 (dependencies) on a, b from t1;
create statistics stts_2 (dependencies, ndistinct) on a, b from t1;
create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1;

create table t2 (a int, b int, c int);
create statistics stts_4 on b, c from t2;

create table hoge (col1 int, col2 int, col3 int);
create statistics stts_hoge on col1, col2, col3 from hoge;

create schema foo;
create schema yama;
create statistics foo.stts_foo on col1, col2 from hoge;
create statistics yama.stts_yama (ndistinct, mcv) on col1, col3 from hoge;

insert into t1 select i,i from generate_series(1,100) i;
analyze t1;

Result of \dX:
==============
postgres=# \dX
                               List of extended statistics
  Schema |   Name    |         Definition         | N_distinct | Dependencies |   Mcv
--------+-----------+----------------------------+------------+--------------+---------
  foo    | stts_foo  | col1, col2 FROM hoge       | defined    | defined      | defined
  public | stts_1    | a, b FROM t1               |            | built        |
  public | stts_2    | a, b FROM t1               | built      | built        |
  public | stts_3    | a, b FROM t1               | built      | built        | built
  public | stts_4    | b, c FROM t2               | defined    | defined      | defined
  public | stts_hoge | col1, col2, col3 FROM hoge | defined    | defined      | defined
  yama   | stts_yama | col1, col3 FROM hoge       | defined    |              | defined
(7 rows)

Result of \dX+:
===============
postgres=# \dX+
                                                List of extended statistics
  Schema |   Name    |         Definition         | N_distinct | Dependencies |   Mcv   |  N_size  |  D_size  |
M_size

--------+-----------+----------------------------+------------+--------------+---------+----------+----------+------------
  foo    | stts_foo  | col1, col2 FROM hoge       | defined    | defined      | defined | 0 bytes  | 0 bytes  | 0
bytes
  public | stts_1    | a, b FROM t1               |            | built        |         |          | 40 bytes |
  public | stts_2    | a, b FROM t1               | built      | built        |         | 13 bytes | 40 bytes |
  public | stts_3    | a, b FROM t1               | built      | built        | built   | 13 bytes | 40 bytes | 6126
bytes
  public | stts_4    | b, c FROM t2               | defined    | defined      | defined | 0 bytes  | 0 bytes  | 0
bytes
  public | stts_hoge | col1, col2, col3 FROM hoge | defined    | defined      | defined | 0 bytes  | 0 bytes  | 0
bytes
  yama   | stts_yama | col1, col3 FROM hoge       | defined    |              | defined | 0 bytes  |          | 0
bytes
(7 rows)

Results of Tab-completion:
===============
postgres=# \dX <Tab>
foo.                 pg_toast.            stts_2               stts_hoge
information_schema.  public.              stts_3               yama.
pg_catalog.          stts_1               stts_4

postgres=# \dX+ <Tab>
foo.                 pg_toast.            stts_2               stts_hoge
information_schema.  public.              stts_3               yama.
pg_catalog.          stts_1               stts_4


Regards,
Tatsuro Yamada

Attachment

Re: list of extended statistics on psql

From
Tomas Vondra
Date:
Hi,

I took a look at this today, and I think the code is ready, but the
regression test needs a bit more work:

1) It's probably better to use somewhat more specific names for the
objects, especially when created in public schema. It decreases the
chance of a collision with other tests (which may be hard to notice
because of timing). I suggest we use "stts_" prefix or something like
that, per the attached 0002 patch. (0001 is just the v7 patch)

2) The test is failing intermittently because it's executed in parallel
with stats_ext test, which is also creating extended statistics. So
depending on the timing the \dX may list some of the stats_ext stuff.
I'm not sure what to do about this. Either this part needs to be moved
to a separate test executed in a different group, or maybe we should
simply move it to stats_ext.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

> I took a look at this today, and I think the code is ready, but the
> regression test needs a bit more work:

Thanks for taking your time. :-D


> 1) It's probably better to use somewhat more specific names for the
> objects, especially when created in public schema. It decreases the
> chance of a collision with other tests (which may be hard to notice
> because of timing). I suggest we use "stts_" prefix or something like
> that, per the attached 0002 patch. (0001 is just the v7 patch)

I agree with your comment. Thanks.



> 2) The test is failing intermittently because it's executed in parallel
> with stats_ext test, which is also creating extended statistics. So
> depending on the timing the \dX may list some of the stats_ext stuff.
> I'm not sure what to do about this. Either this part needs to be moved
> to a separate test executed in a different group, or maybe we should
> simply move it to stats_ext.

I thought all tests related to meta-commands exist in psql.sql, but I
realize it's not true. For example, the test of \dRp does not exist in
psql.sql. Therefore, I moved the regression test of \dX to stats_ext.sql
to avoid the test failed in parallel.

Attached patches is following:
  - 0001-v8-Add-dX-command-on-psql.patch
  - 0002-Add-regression-test-of-dX-to-stats_ext.sql.patch

However, I feel the test of \dX is not elegant, so I'm going to try
creating another one since it would be better to be aware of the context
of existing extended stats tests.

Regards,
Tatsuro Yamada



Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi,

  
>> 2) The test is failing intermittently because it's executed in parallel
>> with stats_ext test, which is also creating extended statistics. So
>> depending on the timing the \dX may list some of the stats_ext stuff.
>> I'm not sure what to do about this. Either this part needs to be moved
>> to a separate test executed in a different group, or maybe we should
>> simply move it to stats_ext.
> 
> I thought all tests related to meta-commands exist in psql.sql, but I
> realize it's not true. For example, the test of \dRp does not exist in
> psql.sql. Therefore, I moved the regression test of \dX to stats_ext.sql
> to avoid the test failed in parallel.
> 
> Attached patches is following:
>   - 0001-v8-Add-dX-command-on-psql.patch
>   - 0002-Add-regression-test-of-dX-to-stats_ext.sql.patch
> 
> However, I feel the test of \dX is not elegant, so I'm going to try
> creating another one since it would be better to be aware of the context
> of existing extended stats tests.

I tried to create another version of the regression test (0003).
"\dX" was added after ANALYZE command or SELECT... from pg_statistic_ext.

Please find the attached file:
   - 0003-Add-regression-test-of-dX-to-stats_ext.sql-another-ver

Both regression tests 0002 and 0003 are okay for me, I think.
Could you choose one?

Regards,
Tatsuro Yamada


Attachment

Re: list of extended statistics on psql

From
Tomas Vondra
Date:
Thanks,

It's better to always post the whole patch series, so that cfbot can
test it properly. Sending just 0003 separately kind breaks that.

Also, 0003 seems to only tweak the .sql file, not the expected output,
and there actually seems to be two places that mistakenly use \dx (so
listing extensions) instead of \dX. I've fixed both issues in the
attached patches.

However, I think the 0002 tests are better/sufficient - I prefer to keep
it compact, not interleaving with the tests testing various other stuff.
So I don't intend to commit 0003, unless there's something that I don't
see for some reason.

The one remaining thing I'm not sure about is naming of the columns with
size of statistics - N_size, D_size and M_size does not seem very clear.
Any clearer naming will however make the tables wider, though :-/


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

Thanks for your comments and also revising patches.

On 2020/11/16 3:22, Tomas Vondra wrote:
> It's better to always post the whole patch series, so that cfbot can
> test it properly. Sending just 0003 separately kind breaks that.

I now understand how "cfbot" works so that I'll take care of that
when I send patches. Thanks.


> Also, 0003 seems to only tweak the .sql file, not the expected output,
> and there actually seems to be two places that mistakenly use \dx (so
> listing extensions) instead of \dX. I've fixed both issues in the
> attached patches.

Oops, sorry about that.

  
> However, I think the 0002 tests are better/sufficient - I prefer to keep
> it compact, not interleaving with the tests testing various other stuff.
> So I don't intend to commit 0003, unless there's something that I don't
> see for some reason.

I Agreed. 0002 is easy to modify test cases and check results than 0003.
Therefore, I'll go with 0002.

  
> The one remaining thing I'm not sure about is naming of the columns with
> size of statistics - N_size, D_size and M_size does not seem very clear.
> Any clearer naming will however make the tables wider, though :-/

Yeah, I think so too, but I couldn't get an idea of a suitable name for
the columns when I created the patch.
I don't prefer a long name but I'll replace the name with it to be clearer.
For example, s/N_size/Ndistinct_size/.

Please find attached patcheds:
   - 0001: Replace column names
   - 0002: Recreate regression test based on 0001


Regards,
Tatsuro Yamada


Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas and hackers,

> I don't prefer a long name but I'll replace the name with it to be clearer.
> For example, s/N_size/Ndistinct_size/.
> 
> Please find attached patcheds:
>    - 0001: Replace column names
>    - 0002: Recreate regression test based on 0001


I rebased the patch set on the master (7e5e1bba03), and the regression
test is good. Therefore, I changed the status of the patch: "needs review".

I know that you proposed the new extended statistics[1], and it probably
conflicts with the patch. I hope my patch will get commit before your
patch committed to avoid the time of recreating. :-)


[1] https://www.postgresql.org/message-id/flat/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com

Thanks,
Tatsuro Yamada




Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi,

>I rebased the patch set on the master (7e5e1bba03), and the regression
>test is good. Therefore, I changed the status of the patch: "needs review". 

Happy New Year!

I rebased my patches on HEAD.
Please find attached files. :-D

Thanks,
Tatsuro Yamada


Attachment

Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On 1/5/21 5:26 AM, Tatsuro Yamada wrote:
> Hi,
> 
>> I rebased the patch set on the master (7e5e1bba03), and the regression
>> test is good. Therefore, I changed the status of the patch: "needs 
>> review". 
> 
> Happy New Year!
> 
> I rebased my patches on HEAD.
> Please find attached files. :-D
> 

Thanks, and Happy new year to you too.

I almost pushed this, but I have one more question. listExtendedStats 
first checks the server version, and errors out for pre-10 servers. 
Shouldn't the logic building query check the server version too, to 
decide whether to check the MCV stats? Otherwise it won't work on 10 and 
11, which does not support the "mcv" stats.

I don't recall what exactly is our policy regarding new psql with older 
server versions, but it seems strange to check for 10.0 and then fail 
anyway for "supported" versions.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

> Thanks, and Happy new year to you too.
> 
> I almost pushed this, but I have one more question. listExtendedStats first checks the server version, and errors out
forpre-10 servers. Shouldn't the logic building query check the server version too, to decide whether to check the MCV
stats?Otherwise it won't work on 10 and 11, which does not support the "mcv" stats.
 
>> I don't recall what exactly is our policy regarding new psql with older server versions, but it seems strange to
checkfor 10.0 and then fail anyway for "supported" versions.
 

Thanks for your comments.

I overlooked the check for MCV in the logic building query
because I created the patch as a new feature on PG14.
I'm not sure whether we should do back patch or not. However, I'll
add the check on the next patch because it is useful if you decide to
do the back patch on PG10, 11, 12, and 13.

I wonder the column names added by \dX+ is fine? For example,
"Ndistinct_size" and "Dependencies_size". It looks like long names,
but acceptable?

Regards,
Tatsuro Yamada





Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/7/21 1:46 AM, Tatsuro Yamada wrote:
> Hi Tomas,
> 
>> Thanks, and Happy new year to you too.
>>
>> I almost pushed this, but I have one more question. listExtendedStats 
>> first checks the server version, and errors out for pre-10 servers. 
>> Shouldn't the logic building query check the server version too, to 
>> decide whether to check the MCV stats? Otherwise it won't work on 10 
>> and 11, which does not support the "mcv" stats.
>>> I don't recall what exactly is our policy regarding new psql with 
>>> older server versions, but it seems strange to check for 10.0 and 
>>> then fail anyway for "supported" versions.
> 
> Thanks for your comments.
> 
> I overlooked the check for MCV in the logic building query
> because I created the patch as a new feature on PG14.
> I'm not sure whether we should do back patch or not. However, I'll
> add the check on the next patch because it is useful if you decide to
> do the back patch on PG10, 11, 12, and 13.
> 

+1

BTW perhaps a quick look at the other \d commands would show if there 
are precedents. I didn't have time for that.

> I wonder the column names added by \dX+ is fine? For example,
> "Ndistinct_size" and "Dependencies_size". It looks like long names,
> but acceptable?
> 

Seems acceptable - I don't have a better idea.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Alvaro Herrera
Date:
On 2021-Jan-07, Tomas Vondra wrote:

> On 1/7/21 1:46 AM, Tatsuro Yamada wrote:

> > I overlooked the check for MCV in the logic building query
> > because I created the patch as a new feature on PG14.
> > I'm not sure whether we should do back patch or not. However, I'll
> > add the check on the next patch because it is useful if you decide to
> > do the back patch on PG10, 11, 12, and 13.
> 
> BTW perhaps a quick look at the other \d commands would show if there are
> precedents. I didn't have time for that.

Yes, we do promise that new psql works with older servers.

I think we would not backpatch any of this, though.

-- 
Álvaro Herrera



Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/7/21 3:47 PM, Alvaro Herrera wrote:
> On 2021-Jan-07, Tomas Vondra wrote:
> 
>> On 1/7/21 1:46 AM, Tatsuro Yamada wrote:
> 
>>> I overlooked the check for MCV in the logic building query
>>> because I created the patch as a new feature on PG14.
>>> I'm not sure whether we should do back patch or not. However, I'll
>>> add the check on the next patch because it is useful if you decide to
>>> do the back patch on PG10, 11, 12, and 13.
>>
>> BTW perhaps a quick look at the other \d commands would show if there are
>> precedents. I didn't have time for that.
> 
> Yes, we do promise that new psql works with older servers.
> 

Yeah, makes sense. That means we need add the check for 12 / MCV.

> I think we would not backpatch any of this, though.

I wasn't really planning to backpatch any of this, of course.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi,

On 2021/01/08 0:56, Tomas Vondra wrote:
> On 1/7/21 3:47 PM, Alvaro Herrera wrote:
>> On 2021-Jan-07, Tomas Vondra wrote:
>>
>>> On 1/7/21 1:46 AM, Tatsuro Yamada wrote:
>>
>>>> I overlooked the check for MCV in the logic building query
>>>> because I created the patch as a new feature on PG14.
>>>> I'm not sure whether we should do back patch or not. However, I'll
>>>> add the check on the next patch because it is useful if you decide to
>>>> do the back patch on PG10, 11, 12, and 13.
>>>
>>> BTW perhaps a quick look at the other \d commands would show if there are
>>> precedents. I didn't have time for that.
>>
>> Yes, we do promise that new psql works with older servers.
>>
> 
> Yeah, makes sense. That means we need add the check for 12 / MCV.


Ah, I got it.
I fixed the patch to work with older servers to add the checking versions. And I tested \dX command on older servers
(PG10- 13).
 
These results look fine.

0001:
      Added the check code to handle pre-PG12. It has not MCV and
       pg_statistic_ext_data.
0002:
      This patch is the same as the previous patch (not changed).

Please find the attached files.


>> I wonder the column names added by \dX+ is fine? For example,
>> "Ndistinct_size" and "Dependencies_size". It looks like long names,
>> but acceptable?
>>
> 
> Seems acceptable - I don't have a better idea. 

I see, thanks!


Thanks,
Tatsuro Yamada

Attachment

Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/8/21 12:52 AM, Tatsuro Yamada wrote:
> Hi,
> 
> On 2021/01/08 0:56, Tomas Vondra wrote:
>> On 1/7/21 3:47 PM, Alvaro Herrera wrote:
>>> On 2021-Jan-07, Tomas Vondra wrote:
>>>
>>>> On 1/7/21 1:46 AM, Tatsuro Yamada wrote:
>>>
>>>>> I overlooked the check for MCV in the logic building query
>>>>> because I created the patch as a new feature on PG14.
>>>>> I'm not sure whether we should do back patch or not. However, I'll
>>>>> add the check on the next patch because it is useful if you decide to
>>>>> do the back patch on PG10, 11, 12, and 13.
>>>>
>>>> BTW perhaps a quick look at the other \d commands would show if 
>>>> there are
>>>> precedents. I didn't have time for that.
>>>
>>> Yes, we do promise that new psql works with older servers.
>>>
>>
>> Yeah, makes sense. That means we need add the check for 12 / MCV.
> 
> 
> Ah, I got it.
> I fixed the patch to work with older servers to add the checking 
> versions. And I tested \dX command on older servers (PG10 - 13).
> These results look fine.
> 
> 0001:
>       Added the check code to handle pre-PG12. It has not MCV and
>        pg_statistic_ext_data.
> 0002:
>       This patch is the same as the previous patch (not changed).
> 
> Please find the attached files.
> 

OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to 
squash the patches into a single commit.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/8/21 1:14 AM, Tomas Vondra wrote:
> 
> 
> On 1/8/21 12:52 AM, Tatsuro Yamada wrote:
>> Hi,
>>
>> On 2021/01/08 0:56, Tomas Vondra wrote:
>>> On 1/7/21 3:47 PM, Alvaro Herrera wrote:
>>>> On 2021-Jan-07, Tomas Vondra wrote:
>>>>
>>>>> On 1/7/21 1:46 AM, Tatsuro Yamada wrote:
>>>>
>>>>>> I overlooked the check for MCV in the logic building query
>>>>>> because I created the patch as a new feature on PG14.
>>>>>> I'm not sure whether we should do back patch or not. However, I'll
>>>>>> add the check on the next patch because it is useful if you decide to
>>>>>> do the back patch on PG10, 11, 12, and 13.
>>>>>
>>>>> BTW perhaps a quick look at the other \d commands would show if
>>>>> there are
>>>>> precedents. I didn't have time for that.
>>>>
>>>> Yes, we do promise that new psql works with older servers.
>>>>
>>>
>>> Yeah, makes sense. That means we need add the check for 12 / MCV.
>>
>>
>> Ah, I got it.
>> I fixed the patch to work with older servers to add the checking
>> versions. And I tested \dX command on older servers (PG10 - 13).
>> These results look fine.
>>
>> 0001:
>>       Added the check code to handle pre-PG12. It has not MCV and
>>        pg_statistic_ext_data.
>> 0002:
>>       This patch is the same as the previous patch (not changed).
>>
>> Please find the attached files.
>>
> 
> OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to
> squash the patches into a single commit.
> 

Attached is a patch I plan to commit - 0001 is the last submitted
version with a couple minor tweaks, mostly in docs/comments, and small
rework of branching to be more like the other functions in describe.c.

While working on that, I realized that 'defined' might be a bit
ambiguous, I initially thought it means 'NOT NULL' (which it does not).
I propose to change it to 'requested' instead. Tatsuro, do you agree, or
do you think 'defined' is better?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

On 2021/01/09 9:01, Tomas Vondra wrote:
> On 1/8/21 1:14 AM, Tomas Vondra wrote:
>> On 1/8/21 12:52 AM, Tatsuro Yamada wrote:
>>> On 2021/01/08 0:56, Tomas Vondra wrote:
>>>> On 1/7/21 3:47 PM, Alvaro Herrera wrote:
>>>>> On 2021-Jan-07, Tomas Vondra wrote:
>>>>>> On 1/7/21 1:46 AM, Tatsuro Yamada wrote:
>>>>>
>>>>>>> I overlooked the check for MCV in the logic building query
>>>>>>> because I created the patch as a new feature on PG14.
>>>>>>> I'm not sure whether we should do back patch or not. However, I'll
>>>>>>> add the check on the next patch because it is useful if you decide to
>>>>>>> do the back patch on PG10, 11, 12, and 13.
>>>>>>
>>>>>> BTW perhaps a quick look at the other \d commands would show if
>>>>>> there are
>>>>>> precedents. I didn't have time for that.
>>>>>
>>>>> Yes, we do promise that new psql works with older servers.
>>>>>
>>>>
>>>> Yeah, makes sense. That means we need add the check for 12 / MCV.
>>>
>>>
>>> Ah, I got it.
>>> I fixed the patch to work with older servers to add the checking
>>> versions. And I tested \dX command on older servers (PG10 - 13).
>>> These results look fine.
>>>
>>> 0001:
>>>        Added the check code to handle pre-PG12. It has not MCV and
>>>         pg_statistic_ext_data.
>>> 0002:
>>>        This patch is the same as the previous patch (not changed).
>>>
>>> Please find the attached files.
>>>
>>
>> OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to
>> squash the patches into a single commit.
>>
> 
> Attached is a patch I plan to commit - 0001 is the last submitted
> version with a couple minor tweaks, mostly in docs/comments, and small
> rework of branching to be more like the other functions in describe.c.

Thanks for revising the patch.
I reviewed the 0001, and the branching and comments look good to me.
However, I added an alias name in processSQLNamePattern() on the patch:
s/"stxname"/"es.stxname"/


> While working on that, I realized that 'defined' might be a bit
> ambiguous, I initially thought it means 'NOT NULL' (which it does not).
> I propose to change it to 'requested' instead. Tatsuro, do you agree, or
> do you think 'defined' is better?

Regarding the status of extended stats, I think the followings:

  - "defined": it shows the extended stats defined only. We can't know
               whether it needs to analyze or not. I agree this name was
                ambiguous. Therefore we should replace it with a more suitable
               name.
  - "requested": it shows the extended stats needs something. Of course,
               we know it needs to ANALYZE because we can create the patch.
               However, I feel there is a little ambiguity for DBA.
               To solve this, it would be better to write an explanation of
               the status in the document. For example,

======
The column of the kind of extended stats (e. g. Ndistinct) shows some statuses.
"requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE
  was finished, and the planner can use it. NULL means that it doesn't exists.
======

What do you think? :-D


Thanks,
Tatsuro Yamada

Attachment

Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On 1/12/21 2:57 AM, Tatsuro Yamada wrote:
> Hi Tomas,
> 
> On 2021/01/09 9:01, Tomas Vondra wrote:
...>
>> While working on that, I realized that 'defined' might be a bit
>> ambiguous, I initially thought it means 'NOT NULL' (which it does not).
>> I propose to change it to 'requested' instead. Tatsuro, do you agree, or
>> do you think 'defined' is better?
> 
> Regarding the status of extended stats, I think the followings:
> 
>   - "defined": it shows the extended stats defined only. We can't know
>                whether it needs to analyze or not. I agree this name was
>                 ambiguous. Therefore we should replace it with a more 
> suitable
>                name.
>   - "requested": it shows the extended stats needs something. Of course,
>                we know it needs to ANALYZE because we can create the patch.
>                However, I feel there is a little ambiguity for DBA.
>                To solve this, it would be better to write an explanation of
>                the status in the document. For example,
> 
> ======
> The column of the kind of extended stats (e. g. Ndistinct) shows some 
> statuses.
> "requested" means that it needs to gather data by ANALYZE. "built" means 
> ANALYZE
>   was finished, and the planner can use it. NULL means that it doesn't 
> exists.
> ======
> 
> What do you think? :-D
> 

Yes, that seems reasonable to me. Will you provide an updated patch?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

On 2021/01/12 20:08, Tomas Vondra wrote:
> 
> On 1/12/21 2:57 AM, Tatsuro Yamada wrote:
>> Hi Tomas,
>>
>> On 2021/01/09 9:01, Tomas Vondra wrote:
> ...>
>>> While working on that, I realized that 'defined' might be a bit
>>> ambiguous, I initially thought it means 'NOT NULL' (which it does not).
>>> I propose to change it to 'requested' instead. Tatsuro, do you agree, or
>>> do you think 'defined' is better?
>>
>> Regarding the status of extended stats, I think the followings:
>>
>>   - "defined": it shows the extended stats defined only. We can't know
>>                whether it needs to analyze or not. I agree this name was
>>                 ambiguous. Therefore we should replace it with a more suitable
>>                name.
>>   - "requested": it shows the extended stats needs something. Of course,
>>                we know it needs to ANALYZE because we can create the patch.
>>                However, I feel there is a little ambiguity for DBA.
>>                To solve this, it would be better to write an explanation of
>>                the status in the document. For example,
>>
>> ======
>> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses.
>> "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE
>>   was finished, and the planner can use it. NULL means that it doesn't exists.
>> ======
>>
>> What do you think? :-D
>>
> 
> Yes, that seems reasonable to me. Will you provide an updated patch?


Sounds good. I'll send the updated patch today.


Thanks,
Tatsuro Yamada






Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

On 2021/01/13 7:48, Tatsuro Yamada wrote:
> On 2021/01/12 20:08, Tomas Vondra wrote:
>> On 1/12/21 2:57 AM, Tatsuro Yamada wrote:
>>> On 2021/01/09 9:01, Tomas Vondra wrote:
>> ...>
>>>> While working on that, I realized that 'defined' might be a bit
>>>> ambiguous, I initially thought it means 'NOT NULL' (which it does not).
>>>> I propose to change it to 'requested' instead. Tatsuro, do you agree, or
>>>> do you think 'defined' is better?
>>>
>>> Regarding the status of extended stats, I think the followings:
>>>
>>>   - "defined": it shows the extended stats defined only. We can't know
>>>                whether it needs to analyze or not. I agree this name was
>>>                 ambiguous. Therefore we should replace it with a more suitable
>>>                name.
>>>   - "requested": it shows the extended stats needs something. Of course,
>>>                we know it needs to ANALYZE because we can create the patch.
>>>                However, I feel there is a little ambiguity for DBA.
>>>                To solve this, it would be better to write an explanation of
>>>                the status in the document. For example,
>>>
>>> ======
>>> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses.
>>> "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE
>>>   was finished, and the planner can use it. NULL means that it doesn't exists.
>>> ======
>>>
>>> What do you think? :-D
>>>
>>
>> Yes, that seems reasonable to me. Will you provide an updated patch?
> 
> 
> Sounds good. I'll send the updated patch today.



I updated the patch to add the explanation of the extended stats' statuses.
Please feel free to modify the patch to improve it more clearly.

The attached files are:
    0001: Add psql \dx and the fixed document
    0002: Regression test for psql \dX
    app-psql.html: Created by "make html" command (You can check the
                   explanation of the statuses easily, probably)

Thanks,
Tatsuro Yamada

Attachment

Re: list of extended statistics on psql

From
Julien Rouhaud
Date:
On Wed, Jan 13, 2021 at 10:22:05AM +0900, Tatsuro Yamada wrote:
> Hi Tomas,
> 
> On 2021/01/13 7:48, Tatsuro Yamada wrote:
> > On 2021/01/12 20:08, Tomas Vondra wrote:
> > > On 1/12/21 2:57 AM, Tatsuro Yamada wrote:
> > > > On 2021/01/09 9:01, Tomas Vondra wrote:
> > > ...>
> > > > > While working on that, I realized that 'defined' might be a bit
> > > > > ambiguous, I initially thought it means 'NOT NULL' (which it does not).
> > > > > I propose to change it to 'requested' instead. Tatsuro, do you agree, or
> > > > > do you think 'defined' is better?
> > > > 
> > > > Regarding the status of extended stats, I think the followings:
> > > > 
> > > >   - "defined": it shows the extended stats defined only. We can't know
> > > >                whether it needs to analyze or not. I agree this name was
> > > >                 ambiguous. Therefore we should replace it with a more suitable
> > > >                name.
> > > >   - "requested": it shows the extended stats needs something. Of course,
> > > >                we know it needs to ANALYZE because we can create the patch.
> > > >                However, I feel there is a little ambiguity for DBA.
> > > >                To solve this, it would be better to write an explanation of
> > > >                the status in the document. For example,
> > > > 
> > > > ======
> > > > The column of the kind of extended stats (e. g. Ndistinct) shows some statuses.
> > > > "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE
> > > >   was finished, and the planner can use it. NULL means that it doesn't exists.
> > > > ======
> > > > 
> > > > What do you think? :-D
> > > > 
> > > 
> > > Yes, that seems reasonable to me. Will you provide an updated patch?
> > 
> > 
> > Sounds good. I'll send the updated patch today.
> 
> 
> 
> I updated the patch to add the explanation of the extended stats' statuses.
> Please feel free to modify the patch to improve it more clearly.
> 
> The attached files are:
>    0001: Add psql \dx and the fixed document
>    0002: Regression test for psql \dX
>    app-psql.html: Created by "make html" command (You can check the
>                   explanation of the statuses easily, probably)

Hello Yamada-san,

I reviewed the patch and don't have specific complaints, it all looks good!

I'm however thinking about the "requested" status.  I'm wondering if it could
lead to people think that an ANALYZE is scheduled and will happen soon.
Maybe "defined" or "declared" might be less misleading, or even "waiting for
analyze"?



Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/15/21 9:47 AM, Julien Rouhaud wrote:
> On Wed, Jan 13, 2021 at 10:22:05AM +0900, Tatsuro Yamada wrote:
>> Hi Tomas,
>>
>> On 2021/01/13 7:48, Tatsuro Yamada wrote:
>>> On 2021/01/12 20:08, Tomas Vondra wrote:
>>>> On 1/12/21 2:57 AM, Tatsuro Yamada wrote:
>>>>> On 2021/01/09 9:01, Tomas Vondra wrote:
>>>> ...>
>>>>>> While working on that, I realized that 'defined' might be a bit
>>>>>> ambiguous, I initially thought it means 'NOT NULL' (which it does not).
>>>>>> I propose to change it to 'requested' instead. Tatsuro, do you agree, or
>>>>>> do you think 'defined' is better?
>>>>>
>>>>> Regarding the status of extended stats, I think the followings:
>>>>>
>>>>>   - "defined": it shows the extended stats defined only. We can't know
>>>>>                whether it needs to analyze or not. I agree this name was
>>>>>                 ambiguous. Therefore we should replace it with a more suitable
>>>>>                name.
>>>>>   - "requested": it shows the extended stats needs something. Of course,
>>>>>                we know it needs to ANALYZE because we can create the patch.
>>>>>                However, I feel there is a little ambiguity for DBA.
>>>>>                To solve this, it would be better to write an explanation of
>>>>>                the status in the document. For example,
>>>>>
>>>>> ======
>>>>> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses.
>>>>> "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE
>>>>>   was finished, and the planner can use it. NULL means that it doesn't exists.
>>>>> ======
>>>>>
>>>>> What do you think? :-D
>>>>>
>>>>
>>>> Yes, that seems reasonable to me. Will you provide an updated patch?
>>>
>>>
>>> Sounds good. I'll send the updated patch today.
>>
>>
>>
>> I updated the patch to add the explanation of the extended stats' statuses.
>> Please feel free to modify the patch to improve it more clearly.
>>
>> The attached files are:
>>    0001: Add psql \dx and the fixed document
>>    0002: Regression test for psql \dX
>>    app-psql.html: Created by "make html" command (You can check the
>>                   explanation of the statuses easily, probably)
> 
> Hello Yamada-san,
> 
> I reviewed the patch and don't have specific complaints, it all looks good!
> 
> I'm however thinking about the "requested" status.  I'm wondering if it could
> lead to people think that an ANALYZE is scheduled and will happen soon.
> Maybe "defined" or "declared" might be less misleading, or even "waiting for
> analyze"?
> 

Well, the "defined" option is not great either, because it can be
interpreted as "NOT NULL" - that's why I proposed "requested". Not sure
about "declared" - I wouldn't use it in this context, but I'm not a
native speaker so maybe it's OK.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/15/21 5:19 PM, Tomas Vondra wrote:
> 
> 
> On 1/15/21 9:47 AM, Julien Rouhaud wrote:
>> On Wed, Jan 13, 2021 at 10:22:05AM +0900, Tatsuro Yamada wrote:
>>> Hi Tomas,
>>>
>>> On 2021/01/13 7:48, Tatsuro Yamada wrote:
>>>> On 2021/01/12 20:08, Tomas Vondra wrote:
>>>>> On 1/12/21 2:57 AM, Tatsuro Yamada wrote:
>>>>>> On 2021/01/09 9:01, Tomas Vondra wrote:
>>>>> ...>
>>>>>>> While working on that, I realized that 'defined' might be a bit
>>>>>>> ambiguous, I initially thought it means 'NOT NULL' (which it does not).
>>>>>>> I propose to change it to 'requested' instead. Tatsuro, do you agree, or
>>>>>>> do you think 'defined' is better?
>>>>>>
>>>>>> Regarding the status of extended stats, I think the followings:
>>>>>>
>>>>>>    - "defined": it shows the extended stats defined only. We can't know
>>>>>>                 whether it needs to analyze or not. I agree this name was
>>>>>>                  ambiguous. Therefore we should replace it with a more suitable
>>>>>>                 name.
>>>>>>    - "requested": it shows the extended stats needs something. Of course,
>>>>>>                 we know it needs to ANALYZE because we can create the patch.
>>>>>>                 However, I feel there is a little ambiguity for DBA.
>>>>>>                 To solve this, it would be better to write an explanation of
>>>>>>                 the status in the document. For example,
>>>>>>
>>>>>> ======
>>>>>> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses.
>>>>>> "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE
>>>>>>    was finished, and the planner can use it. NULL means that it doesn't exists.
>>>>>> ======
>>>>>>
>>>>>> What do you think? :-D
>>>>>>
>>>>>
>>>>> Yes, that seems reasonable to me. Will you provide an updated patch?
>>>>
>>>>
>>>> Sounds good. I'll send the updated patch today.
>>>
>>>
>>>
>>> I updated the patch to add the explanation of the extended stats' statuses.
>>> Please feel free to modify the patch to improve it more clearly.
>>>
>>> The attached files are:
>>>     0001: Add psql \dx and the fixed document
>>>     0002: Regression test for psql \dX
>>>     app-psql.html: Created by "make html" command (You can check the
>>>                    explanation of the statuses easily, probably)
>>
>> Hello Yamada-san,
>>
>> I reviewed the patch and don't have specific complaints, it all looks good!
>>
>> I'm however thinking about the "requested" status.  I'm wondering if it could
>> lead to people think that an ANALYZE is scheduled and will happen soon.
>> Maybe "defined" or "declared" might be less misleading, or even "waiting for
>> analyze"?
>>
> 
> Well, the "defined" option is not great either, because it can be
> interpreted as "NOT NULL" - that's why I proposed "requested". Not sure
> about "declared" - I wouldn't use it in this context, but I'm not a
> native speaker so maybe it's OK.
> 

I've pushed this, keeping the "requested". If we decide that some other 
term is a better choice, we can tweak that later of course.

Thanks Tatsuro-san for the patience!


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



RE: list of extended statistics on psql

From
"Shinoda, Noriyoshi (PN Japan FSIP)"
Date:
Hi, hackers.

I tested this committed feature. 
It doesn't seem to be available to non-superusers due to the inability to access pg_statistics_ext_data. 
Is this the expected behavior?

--- operation ---
postgres=> CREATE STATISTICS stat1_data1 ON c1, c2 FROM data1;
CREATE STATISTICS
postgres=> ANALYZE data1;
ANALYZE
postgres=> SELECT * FROM pg_statistic_ext;
  oid  | stxrelid |   stxname   | stxnamespace | stxowner | stxstattarget | stxkeys | stxkind
-------+----------+-------------+--------------+----------+---------------+---------+---------
 16393 |    16385 | stat1_data1 |         2200 |    16384 |            -1 | 1 2     | {d,f,m}
(1 row)

postgres=> \dX
ERROR:  permission denied for table pg_statistic_ext_data
postgres=>
postgres=> \connect postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=#
postgres=# \dX
                           List of extended statistics
 Schema |    Name     |    Definition     | Ndistinct | Dependencies |    MCV
--------+-------------+-------------------+-----------+--------------+-----------
 public | stat1_data1 | c1, c2 FROM data1 | built     | built        | requested
(1 row)

--- operation ---

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Tomas Vondra [mailto:tomas.vondra@enterprisedb.com] 
Sent: Sunday, January 17, 2021 8:32 AM
To: Julien Rouhaud <rjuju123@gmail.com>; Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>
Cc: Alvaro Herrera <alvherre@2ndquadrant.com>; Tomas Vondra <tomas.vondra@2ndquadrant.com>; Michael Paquier
<michael@paquier.xyz>;Pavel Stehule <pavel.stehule@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
 
Subject: Re: list of extended statistics on psql



On 1/15/21 5:19 PM, Tomas Vondra wrote:
> 
> 
> On 1/15/21 9:47 AM, Julien Rouhaud wrote:
>> On Wed, Jan 13, 2021 at 10:22:05AM +0900, Tatsuro Yamada wrote:
>>> Hi Tomas,
>>>
>>> On 2021/01/13 7:48, Tatsuro Yamada wrote:
>>>> On 2021/01/12 20:08, Tomas Vondra wrote:
>>>>> On 1/12/21 2:57 AM, Tatsuro Yamada wrote:
>>>>>> On 2021/01/09 9:01, Tomas Vondra wrote:
>>>>> ...>
>>>>>>> While working on that, I realized that 'defined' might be a bit 
>>>>>>> ambiguous, I initially thought it means 'NOT NULL' (which it does not).
>>>>>>> I propose to change it to 'requested' instead. Tatsuro, do you 
>>>>>>> agree, or do you think 'defined' is better?
>>>>>>
>>>>>> Regarding the status of extended stats, I think the followings:
>>>>>>
>>>>>>    - "defined": it shows the extended stats defined only. We 
>>>>>> can't know
>>>>>>                 whether it needs to analyze or not. I agree this 
>>>>>> name was
>>>>>>                  ambiguous. Therefore we should replace it with a 
>>>>>> more suitable
>>>>>>                 name.
>>>>>>    - "requested": it shows the extended stats needs something. Of 
>>>>>> course,
>>>>>>                 we know it needs to ANALYZE because we can create the patch.
>>>>>>                 However, I feel there is a little ambiguity for DBA.
>>>>>>                 To solve this, it would be better to write an 
>>>>>> explanation of
>>>>>>                 the status in the document. For example,
>>>>>>
>>>>>> ======
>>>>>> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses.
>>>>>> "requested" means that it needs to gather data by ANALYZE. 
>>>>>> "built" means ANALYZE
>>>>>>    was finished, and the planner can use it. NULL means that it doesn't exists.
>>>>>> ======
>>>>>>
>>>>>> What do you think? :-D
>>>>>>
>>>>>
>>>>> Yes, that seems reasonable to me. Will you provide an updated patch?
>>>>
>>>>
>>>> Sounds good. I'll send the updated patch today.
>>>
>>>
>>>
>>> I updated the patch to add the explanation of the extended stats' statuses.
>>> Please feel free to modify the patch to improve it more clearly.
>>>
>>> The attached files are:
>>>     0001: Add psql \dx and the fixed document
>>>     0002: Regression test for psql \dX
>>>     app-psql.html: Created by "make html" command (You can check the
>>>                    explanation of the statuses easily, probably)
>>
>> Hello Yamada-san,
>>
>> I reviewed the patch and don't have specific complaints, it all looks good!
>>
>> I'm however thinking about the "requested" status.  I'm wondering if 
>> it could lead to people think that an ANALYZE is scheduled and will happen soon.
>> Maybe "defined" or "declared" might be less misleading, or even 
>> "waiting for analyze"?
>>
> 
> Well, the "defined" option is not great either, because it can be 
> interpreted as "NOT NULL" - that's why I proposed "requested". Not 
> sure about "declared" - I wouldn't use it in this context, but I'm not 
> a native speaker so maybe it's OK.
> 

I've pushed this, keeping the "requested". If we decide that some other term is a better choice, we can tweak that
laterof course.
 

Thanks Tatsuro-san for the patience!


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tomas Vondra
Date:
On 1/17/21 2:41 AM, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
> Hi, hackers.
> 
> I tested this committed feature.
> It doesn't seem to be available to non-superusers due to the inability to access pg_statistics_ext_data.
> Is this the expected behavior?
> 

Hmmm, that's a good point. Bummer we haven't noticed that earlier.

I wonder what the right fix should be - presumably we could do something 
like pg_stats_ext (we can't use that view directly, because it formats 
the data, so the sizes are different).

But should it list just the stats the user has access to, or should it 
list everything and leave the inaccessible fields NULL?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/17/21 3:01 AM, Tomas Vondra wrote:
> On 1/17/21 2:41 AM, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
>> Hi, hackers.
>>
>> I tested this committed feature.
>> It doesn't seem to be available to non-superusers due to the inability 
>> to access pg_statistics_ext_data.
>> Is this the expected behavior?
>>
> 
> Hmmm, that's a good point. Bummer we haven't noticed that earlier.
> 
> I wonder what the right fix should be - presumably we could do something 
> like pg_stats_ext (we can't use that view directly, because it formats 
> the data, so the sizes are different).
> 
> But should it list just the stats the user has access to, or should it 
> list everything and leave the inaccessible fields NULL?
> 

I've reverted the commit - once we find the right way to handle this, 
I'll get it committed again.

As for how to deal with this, I can think of about three ways:

1) simplify the command to only print information from pg_statistic_ext 
(so on information about which stats are built or sizes)

2) extend pg_stats_ext with necessary information (e.g. sizes)

3) create a new system view, with necessary information (so that 
pg_stats_ext does not need to be modified)

4) add functions returning the necessary information, possibly only for 
statistics the user can access (similarly to what pg_stats_ext does)

Options 2-4 have the obvious disadvantage that this won't work on older 
releases (we can't add views or functions there). So I'm leaning towards 
#1 even if that means we have to remove some of the details. We can 
consider adding that for new releases, though.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Justin Pryzby
Date:
On Sun, Jan 17, 2021 at 03:31:57PM +0100, Tomas Vondra wrote:
> I've reverted the commit - once we find the right way to handle this, I'll
> get it committed again.

Please consider these doc changes for the next iteration.

commit 1a69f648ce6c63ebb37b6d8ec7c6539b3cb70787
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date:   Sat Jan 16 17:47:35 2021 -0600

    doc review: psql \dX 891a1d0bca262ca78564e0fea1eaa5ae544ea5ee

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index aaf55df921..a678a69dfb 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1928,15 +1928,15 @@ testdb=>
         is specified, only those extended statistics whose names match the
         pattern are listed.
         If <literal>+</literal> is appended to the command name, each extended
-        statistics is listed with its size.
+        statistic is listed with its size.
         </para>
 
         <para>
-        The column of the kind of extended stats (e.g. Ndistinct) shows some statuses.
+        The column of the kind of extended stats (e.g. Ndistinct) shows its status.
         "requested" means that it needs to collect statistics by <link
         linkend="sql-analyze"><command>ANALYZE</command></link>. 
         "built" means <link linkend="sql-analyze"><command>ANALYZE</command></link> was 
-        finished, and the planner can use it. NULL means that it doesn't exists. 
+        run, and statistics are available to the planner. NULL means that it doesn't exist. 
         </para>
         </listitem>
       </varlistentry>



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Julien,

On 2021/01/15 17:47, Julien Rouhaud wrote:
> Hello Yamada-san,
> 
> I reviewed the patch and don't have specific complaints, it all looks good!
> 
> I'm however thinking about the "requested" status.  I'm wondering if it could
> lead to people think that an ANALYZE is scheduled and will happen soon.
> Maybe "defined" or "declared" might be less misleading, or even "waiting for
> analyze"?


Thanks for reviewing the patch.
Yeah, "waiting for analyze" was preferable but it was a little long to use on the columns. Unfortunately, I couldn't
imaginea suitable term. Therefore,
 
I'm keeping it as is.

Regards,
Tatsuro Yamada




Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

On 2021/01/17 8:32, Tomas Vondra wrote:
> I've pushed this, keeping the "requested". If we decide that some other term is a better choice, we can tweak that
laterof course.
 
> 
> Thanks Tatsuro-san for the patience!

Thanks for taking the time to review the patches.
I believe this feature is useful for DBA when they use Extended stats.
For example, the execution plan tuning phase and so on.

Then, I know the patch was reverted. So, I keep going to fix the patch
on the Second iteration. :-D

Regards,
Tatsuro Yamada




Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas and Shinoda-san,

On 2021/01/17 23:31, Tomas Vondra wrote:
> 
> 
> On 1/17/21 3:01 AM, Tomas Vondra wrote:
>> On 1/17/21 2:41 AM, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
>>> Hi, hackers.
>>>
>>> I tested this committed feature.
>>> It doesn't seem to be available to non-superusers due to the inability to access pg_statistics_ext_data.
>>> Is this the expected behavior?


Ugh. I overlooked the test to check the case of the user hasn't Superuser privilege. The user without the privilege was
ableto access pg_statistics_ext. Therefore I supposed that it's also able to access pg_statics_ext_data. Oops.
 


>> Hmmm, that's a good point. Bummer we haven't noticed that earlier.
>>
>> I wonder what the right fix should be - presumably we could do something like pg_stats_ext (we can't use that view
directly,because it formats the data, so the sizes are different).
 
>>
>> But should it list just the stats the user has access to, or should it list everything and leave the inaccessible
fieldsNULL?
 
>>
> 
> I've reverted the commit - once we find the right way to handle this, I'll get it committed again.
> 
> As for how to deal with this, I can think of about three ways:
> 
> 1) simplify the command to only print information from pg_statistic_ext (so on information about which stats are
builtor sizes)
 
> 
> 2) extend pg_stats_ext with necessary information (e.g. sizes)
> 
> 3) create a new system view, with necessary information (so that pg_stats_ext does not need to be modified)
> 
> 4) add functions returning the necessary information, possibly only for statistics the user can access (similarly to
whatpg_stats_ext does)
 
> 
> Options 2-4 have the obvious disadvantage that this won't work on older releases (we can't add views or functions
there).So I'm leaning towards #1 even if that means we have to remove some of the details. We can consider adding that
fornew releases, though.
 


Thanks for the useful advice. I go with option 1).
The following query is created by using pg_stats_ext instead of pg_statistic_ext and pg_statistic_ext_data. However, I
wasconfused
 
about writing a part of the query for calculating MCV size because
there are four columns related to MCV. For example, most_common_vals, most_common_val_nulls, most_common_freqs, and
most_common_base_freqs.
Currently, I don't know how to calculate the size of MCV by using the
four columns. Thoughts? :-)

===================================================
\connect postgres hoge
create table hoge_t(a int, b int);
insert into hoge_t select i,i from generate_series(1,100) i;
create statistics hoge_t_ext on a, b from hoge_t;


SELECT
         es.statistics_schemaname AS "Schema",
         es.statistics_name AS "Name",
         pg_catalog.format('%s FROM %s',
           (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(s.attname),', ')
            FROM pg_catalog.unnest(es.attnames) s(attname)),
         es.tablename) AS "Definition",
         CASE WHEN es.n_distinct IS NOT NULL THEN 'built'
              WHEN 'd' = any(es.kinds) THEN 'requested'
         END AS "Ndistinct",
         CASE WHEN es.dependencies IS NOT NULL THEN 'built'
              WHEN 'f' = any(es.kinds) THEN 'requested'
         END AS "Dependencies",
         CASE WHEN es.most_common_vals IS NOT NULL THEN 'built'
              WHEN 'm' = any(es.kinds) THEN 'requested'
         END AS "MCV",
         CASE WHEN es.n_distinct IS NOT NULL THEN
                   pg_catalog.pg_size_pretty(pg_catalog.length(es.n_distinct)::bigint)
              WHEN 'd' = any(es.kinds) THEN '0 bytes'
         END AS "Ndistinct_size",
         CASE WHEN es.dependencies IS NOT NULL THEN
                   pg_catalog.pg_size_pretty(pg_catalog.length(es.dependencies)::bigint)
              WHEN 'f' = any(es.kinds) THEN '0 bytes'
         END AS "Dependencies_size"
         FROM pg_catalog.pg_stats_ext es
         ORDER BY 1, 2;

-[ RECORD 1 ]-----+-----------------
Schema            | public
Name              | hoge_t_ext
Definition        | a, b FROM hoge_t
Ndistinct         | requested
Dependencies      | requested
MCV               | requested
Ndistinct_size    | 0 bytes
Dependencies_size | 0 bytes

analyze hoge_t;

-[ RECORD 1 ]-----+-----------------
Schema            | public
Name              | hoge_t_ext
Definition        | a, b FROM hoge_t
Ndistinct         | built
Dependencies      | built
MCV               | built
Ndistinct_size    | 13 bytes
Dependencies_size | 40 bytes
===================================================

Thanks,
Tatsuro Yamada






Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Justin,

On 2021/01/18 1:52, Justin Pryzby wrote:
> On Sun, Jan 17, 2021 at 03:31:57PM +0100, Tomas Vondra wrote:
>> I've reverted the commit - once we find the right way to handle this, I'll
>> get it committed again.
> 
> Please consider these doc changes for the next iteration.
> 
> commit 1a69f648ce6c63ebb37b6d8ec7c6539b3cb70787
> Author: Justin Pryzby <pryzbyj@telsasoft.com>
> Date:   Sat Jan 16 17:47:35 2021 -0600
> 
>      doc review: psql \dX 891a1d0bca262ca78564e0fea1eaa5ae544ea5ee

Thanks for your comments!
It helps a lot since I'm not a native speaker.
I'll fix the document based on your suggestion on the next patch.

  
> diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
> index aaf55df921..a678a69dfb 100644
> --- a/doc/src/sgml/ref/psql-ref.sgml
> +++ b/doc/src/sgml/ref/psql-ref.sgml
> @@ -1928,15 +1928,15 @@ testdb=>
>           is specified, only those extended statistics whose names match the
>           pattern are listed.
>           If <literal>+</literal> is appended to the command name, each extended
> -        statistics is listed with its size.
> +        statistic is listed with its size.

Agreed.

    
>           <para>
> -        The column of the kind of extended stats (e.g. Ndistinct) shows some statuses.
> +        The column of the kind of extended stats (e.g. Ndistinct) shows its status.
>           "requested" means that it needs to collect statistics by <link
>           linkend="sql-analyze"><command>ANALYZE</command></link>.
>           "built" means <link linkend="sql-analyze"><command>ANALYZE</command></link> was

Agreed.


> -        finished, and the planner can use it. NULL means that it doesn't exists.
> +        run, and statistics are available to the planner. NULL means that it doesn't exist.


Agreed.


Thanks,
Tatsuro Yamada




Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/18/21 8:31 AM, Tatsuro Yamada wrote:
> Hi Tomas and Shinoda-san,
> 
> On 2021/01/17 23:31, Tomas Vondra wrote:
>>
>>
>> On 1/17/21 3:01 AM, Tomas Vondra wrote:
>>> On 1/17/21 2:41 AM, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
>>>> Hi, hackers.
>>>>
>>>> I tested this committed feature.
>>>> It doesn't seem to be available to non-superusers due to the 
>>>> inability to access pg_statistics_ext_data.
>>>> Is this the expected behavior?
> 
> 
> Ugh. I overlooked the test to check the case of the user hasn't 
> Superuser privilege. The user without the privilege was able to access 
> pg_statistics_ext. Therefore I supposed that it's also able to access 
> pg_statics_ext_data. Oops.
> 
> 
>>> Hmmm, that's a good point. Bummer we haven't noticed that earlier.
>>>
>>> I wonder what the right fix should be - presumably we could do 
>>> something like pg_stats_ext (we can't use that view directly, because 
>>> it formats the data, so the sizes are different).
>>>
>>> But should it list just the stats the user has access to, or should 
>>> it list everything and leave the inaccessible fields NULL?
>>>
>>
>> I've reverted the commit - once we find the right way to handle this, 
>> I'll get it committed again.
>>
>> As for how to deal with this, I can think of about three ways:
>>
>> 1) simplify the command to only print information from 
>> pg_statistic_ext (so on information about which stats are built or sizes)
>>
>> 2) extend pg_stats_ext with necessary information (e.g. sizes)
>>
>> 3) create a new system view, with necessary information (so that 
>> pg_stats_ext does not need to be modified)
>>
>> 4) add functions returning the necessary information, possibly only 
>> for statistics the user can access (similarly to what pg_stats_ext does)
>>
>> Options 2-4 have the obvious disadvantage that this won't work on 
>> older releases (we can't add views or functions there). So I'm leaning 
>> towards #1 even if that means we have to remove some of the details. 
>> We can consider adding that for new releases, though.
> 
> 
> Thanks for the useful advice. I go with option 1).
> The following query is created by using pg_stats_ext instead of 
> pg_statistic_ext and pg_statistic_ext_data. However, I was confused
> about writing a part of the query for calculating MCV size because
> there are four columns related to MCV. For example, most_common_vals, 
> most_common_val_nulls, most_common_freqs, and most_common_base_freqs.
> Currently, I don't know how to calculate the size of MCV by using the
> four columns. Thoughts? :-)

Well, my suggestion was to use pg_statistic_ext, because that lists all 
statistics, while pg_stats_ext is filtering statistics depending on 
access privileges. I think that's more appropriate for \dX, the contents 
should not change depending on the user.

Also, let me clarify - with option (1) we'd not show the sizes at all. 
The size of the formatted statistics may be very different from the 
on-disk representation, so I see no point in showing it in \dX.

We might show other stats (e.g. number of MCV items, or the fraction of 
data represented by the MCV list), but the user can inspect pg_stats_ext 
if needed.

What we might do is to show those stats when a superuser is running this 
command, but I'm not sure that's a good idea (or how difficult would it 
be to implement).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

>>> As for how to deal with this, I can think of about three ways:
>>>
>>> 1) simplify the command to only print information from pg_statistic_ext (so on information about which stats are
builtor sizes)
 
>>>
>>> 2) extend pg_stats_ext with necessary information (e.g. sizes)
>>>
>>> 3) create a new system view, with necessary information (so that pg_stats_ext does not need to be modified)
>>>
>>> 4) add functions returning the necessary information, possibly only for statistics the user can access (similarly
towhat pg_stats_ext does)
 
>>>
>>> Options 2-4 have the obvious disadvantage that this won't work on older releases (we can't add views or functions
there).So I'm leaning towards #1 even if that means we have to remove some of the details. We can consider adding that
fornew releases, though.
 
>>
>>
>> Thanks for the useful advice. I go with option 1).
>> The following query is created by using pg_stats_ext instead of pg_statistic_ext and pg_statistic_ext_data. However,
Iwas confused
 
>> about writing a part of the query for calculating MCV size because
>> there are four columns related to MCV. For example, most_common_vals, most_common_val_nulls, most_common_freqs, and
most_common_base_freqs.
>> Currently, I don't know how to calculate the size of MCV by using the
>> four columns. Thoughts? :-)
> 
> Well, my suggestion was to use pg_statistic_ext, because that lists all statistics, while pg_stats_ext is filtering
statisticsdepending on access privileges. I think that's more appropriate for \dX, the contents should not change
dependingon the user.
 
> 
> Also, let me clarify - with option (1) we'd not show the sizes at all. The size of the formatted statistics may be
verydifferent from the on-disk representation, so I see no point in showing it in \dX.
 
> 
> We might show other stats (e.g. number of MCV items, or the fraction of data represented by the MCV list), but the
usercan inspect pg_stats_ext if needed.
 
> 
> What we might do is to show those stats when a superuser is running this command, but I'm not sure that's a good idea
(orhow difficult would it be to implement).
 


Thanks for clarifying.
I see that your suggestion was to use pg_statistic_ext, not pg_stats_ext.
And we don't need the size of stats.

If that's the case, we also can't get the status of stats since PG12 or later
because we can't use pg_statistic_ext_data, as you know. Therefore, it would be
better to replace the query with the old query that I sent five months ago like this:

# the old query
SELECT
     stxnamespace::pg_catalog.regnamespace AS "Schema",
     stxrelid::pg_catalog.regclass AS "Table",
     stxname AS "Name",
     (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
      FROM pg_catalog.unnest(stxkeys) s(attnum)
      JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
      a.attnum = s.attnum AND NOT attisdropped)) AS "Columns",
     'd' = any(stxkind) AS "Ndistinct",
     'f' = any(stxkind) AS "Dependencies",
     'm' = any(stxkind) AS "MCV"
FROM pg_catalog.pg_statistic_ext stat
ORDER BY 1,2;

  Schema | Table  |    Name    | Columns | Ndistinct | Dependencies | MCV
--------+--------+------------+---------+-----------+--------------+-----
  public | hoge1  | hoge1_ext  | a, b    | t         | t            | t
  public | hoge_t | hoge_t_ext | a, b    | t         | t            | t
(2 rows)


The above query is so simple so that we would better to use the following query:

# This query works on PG10 or later
SELECT
     es.stxnamespace::pg_catalog.regnamespace::text AS "Schema",
     es.stxname AS "Name",
     pg_catalog.format('%s FROM %s',
         (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ')
          FROM pg_catalog.unnest(es.stxkeys) s(attnum)
          JOIN pg_catalog.pg_attribute a
          ON (es.stxrelid = a.attrelid
          AND a.attnum = s.attnum
          AND NOT a.attisdropped)),
     es.stxrelid::regclass) AS "Definition",
     CASE WHEN 'd' = any(es.stxkind) THEN 'defined'
     END AS "Ndistinct",
     CASE WHEN 'f' = any(es.stxkind) THEN 'defined'
     END AS "Dependencies",
     CASE WHEN 'm' = any(es.stxkind) THEN 'defined'
     END AS "MCV"
FROM pg_catalog.pg_statistic_ext es
ORDER BY 1, 2;

  Schema |    Name    |    Definition    | Ndistinct | Dependencies | Dependencies
--------+------------+------------------+-----------+--------------+--------------
  public | hoge1_ext  | a, b FROM hoge1  | defined   | defined      | defined
  public | hoge_t_ext | a, b FROM hoge_t | defined   | defined      | defined
(2 rows)


I'm going to create the WIP patch to use the above queriy.
Any comments welcome. :-D

Thanks,
Tatsuro Yamada




Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/19/21 1:44 AM, Tatsuro Yamada wrote:
> Hi Tomas,
> 
>>>> As for how to deal with this, I can think of about three ways:
>>>>
>>>> 1) simplify the command to only print information from 
>>>> pg_statistic_ext (so on information about which stats are built or 
>>>> sizes)
>>>>
>>>> 2) extend pg_stats_ext with necessary information (e.g. sizes)
>>>>
>>>> 3) create a new system view, with necessary information (so that 
>>>> pg_stats_ext does not need to be modified)
>>>>
>>>> 4) add functions returning the necessary information, possibly only 
>>>> for statistics the user can access (similarly to what pg_stats_ext 
>>>> does)
>>>>
>>>> Options 2-4 have the obvious disadvantage that this won't work on 
>>>> older releases (we can't add views or functions there). So I'm 
>>>> leaning towards #1 even if that means we have to remove some of the 
>>>> details. We can consider adding that for new releases, though.
>>>
>>>
>>> Thanks for the useful advice. I go with option 1).
>>> The following query is created by using pg_stats_ext instead of 
>>> pg_statistic_ext and pg_statistic_ext_data. However, I was confused
>>> about writing a part of the query for calculating MCV size because
>>> there are four columns related to MCV. For example, most_common_vals, 
>>> most_common_val_nulls, most_common_freqs, and most_common_base_freqs.
>>> Currently, I don't know how to calculate the size of MCV by using the
>>> four columns. Thoughts? :-)
>>
>> Well, my suggestion was to use pg_statistic_ext, because that lists 
>> all statistics, while pg_stats_ext is filtering statistics depending 
>> on access privileges. I think that's more appropriate for \dX, the 
>> contents should not change depending on the user.
>>
>> Also, let me clarify - with option (1) we'd not show the sizes at all. 
>> The size of the formatted statistics may be very different from the 
>> on-disk representation, so I see no point in showing it in \dX.
>>
>> We might show other stats (e.g. number of MCV items, or the fraction 
>> of data represented by the MCV list), but the user can inspect 
>> pg_stats_ext if needed.
>>
>> What we might do is to show those stats when a superuser is running 
>> this command, but I'm not sure that's a good idea (or how difficult 
>> would it be to implement).
> 
> 
> Thanks for clarifying.
> I see that your suggestion was to use pg_statistic_ext, not pg_stats_ext.
> And we don't need the size of stats.
> 
> If that's the case, we also can't get the status of stats since PG12 or 
> later
> because we can't use pg_statistic_ext_data, as you know. Therefore, it 
> would be
> better to replace the query with the old query that I sent five months 
> ago like this:
> 
> # the old query
> SELECT
>      stxnamespace::pg_catalog.regnamespace AS "Schema",
>      stxrelid::pg_catalog.regclass AS "Table",
>      stxname AS "Name",
>      (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ')
>       FROM pg_catalog.unnest(stxkeys) s(attnum)
>       JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND
>       a.attnum = s.attnum AND NOT attisdropped)) AS "Columns",
>      'd' = any(stxkind) AS "Ndistinct",
>      'f' = any(stxkind) AS "Dependencies",
>      'm' = any(stxkind) AS "MCV"
> FROM pg_catalog.pg_statistic_ext stat
> ORDER BY 1,2;
> 
>   Schema | Table  |    Name    | Columns | Ndistinct | Dependencies | MCV
> --------+--------+------------+---------+-----------+--------------+-----
>   public | hoge1  | hoge1_ext  | a, b    | t         | t            | t
>   public | hoge_t | hoge_t_ext | a, b    | t         | t            | t
> (2 rows)
> 
> 
> The above query is so simple so that we would better to use the 
> following query:
> 
> # This query works on PG10 or later
> SELECT
>      es.stxnamespace::pg_catalog.regnamespace::text AS "Schema",
>      es.stxname AS "Name",
>      pg_catalog.format('%s FROM %s',
>          (SELECT 
> pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ')
>           FROM pg_catalog.unnest(es.stxkeys) s(attnum)
>           JOIN pg_catalog.pg_attribute a
>           ON (es.stxrelid = a.attrelid
>           AND a.attnum = s.attnum
>           AND NOT a.attisdropped)),
>      es.stxrelid::regclass) AS "Definition",
>      CASE WHEN 'd' = any(es.stxkind) THEN 'defined'
>      END AS "Ndistinct",
>      CASE WHEN 'f' = any(es.stxkind) THEN 'defined'
>      END AS "Dependencies",
>      CASE WHEN 'm' = any(es.stxkind) THEN 'defined'
>      END AS "MCV"
> FROM pg_catalog.pg_statistic_ext es
> ORDER BY 1, 2;
> 
>   Schema |    Name    |    Definition    | Ndistinct | Dependencies | 
> Dependencies
> --------+------------+------------------+-----------+--------------+-------------- 
> 
>   public | hoge1_ext  | a, b FROM hoge1  | defined   | defined      | 
> defined
>   public | hoge_t_ext | a, b FROM hoge_t | defined   | defined      | 
> defined
> (2 rows)
> 
> 
> I'm going to create the WIP patch to use the above queriy.
> Any comments welcome. :-D
> 

Yes, I think using this simpler query makes sense. If we decide we need 
something more elaborate, we can improve that by in future PostgreSQL 
versions (after adding view/function to core), but I'd leave that as a 
work for the future.

Apologies for all the extra work - I haven't realized this flaw when 
pushing for showing more stuff :-(


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi,

> The above query is so simple so that we would better to use the following query:
> 
> # This query works on PG10 or later
> SELECT
>      es.stxnamespace::pg_catalog.regnamespace::text AS "Schema",
>      es.stxname AS "Name",
>      pg_catalog.format('%s FROM %s',
>          (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ')
>           FROM pg_catalog.unnest(es.stxkeys) s(attnum)
>           JOIN pg_catalog.pg_attribute a
>           ON (es.stxrelid = a.attrelid
>           AND a.attnum = s.attnum
>           AND NOT a.attisdropped)),
>      es.stxrelid::regclass) AS "Definition",
>      CASE WHEN 'd' = any(es.stxkind) THEN 'defined'
>      END AS "Ndistinct",
>      CASE WHEN 'f' = any(es.stxkind) THEN 'defined'
>      END AS "Dependencies",
>      CASE WHEN 'm' = any(es.stxkind) THEN 'defined'
>      END AS "MCV"
> FROM pg_catalog.pg_statistic_ext es
> ORDER BY 1, 2;
> 
>   Schema |    Name    |    Definition    | Ndistinct | Dependencies | Dependencies
> --------+------------+------------------+-----------+--------------+--------------
>   public | hoge1_ext  | a, b FROM hoge1  | defined   | defined      | defined
>   public | hoge_t_ext | a, b FROM hoge_t | defined   | defined      | defined
> (2 rows)
> 
> 
> I'm going to create the WIP patch to use the above query.
> Any comments welcome. :-D


Attached patch is WIP patch.

The changes are:
   - Use pg_statistic_ext only
   - Remove these statuses: "required" and "built"
   - Add new status: "defined"
   - Remove the size columns
   - Fix document

I'll create and send the regression test on the next patch if there is
no objection. Is it Okay?

Regards,
Tatsuro Yamada


Attachment

Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

On 2021/01/19 11:52, Tomas Vondra wrote:
> 
>> I'm going to create the WIP patch to use the above queriy.
>> Any comments welcome. :-D
> 
> Yes, I think using this simpler query makes sense. If we decide we need something more elaborate, we can improve that
byin future PostgreSQL versions (after adding view/function to core), but I'd leave that as a work for the future.
 


I see, thanks!


> Apologies for all the extra work - I haven't realized this flaw when pushing for showing more stuff :-(


Don't worry about it. We didn't notice the problem even when viewed by multiple
people on -hackers. Let's keep moving forward. :-D

I'll send a patch including a regression test on the next patch.

Regards,
Tatsuro Yamada




Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas,

On 2021/01/20 11:35, Tatsuro Yamada wrote:
>> Apologies for all the extra work - I haven't realized this flaw when pushing for showing more stuff :-(
> 
> Don't worry about it. We didn't notice the problem even when viewed by multiple
> people on -hackers. Let's keep moving forward. :-D
> 
> I'll send a patch including a regression test on the next patch.


I created patches and my test results on PG10, 11, 12, and 14 are fine.

   0001:
     - Fix query to use pg_statistic_ext only
     - Replace statuses "required" and "built" with "defined"
     - Remove the size columns
     - Fix document
     - Add schema name as a filter condition on the query

   0002:
     - Fix all results of \dX
     - Add new testcase by non-superuser

Please find attached files. :-D


Regards,
Tatsuro Yamada

Attachment

Re: list of extended statistics on psql

From
Tomas Vondra
Date:

On 1/20/21 7:41 AM, Tatsuro Yamada wrote:
> Hi Tomas,
> 
> On 2021/01/20 11:35, Tatsuro Yamada wrote:
>>> Apologies for all the extra work - I haven't realized this flaw when 
>>> pushing for showing more stuff :-(
>>
>> Don't worry about it. We didn't notice the problem even when viewed by 
>> multiple
>> people on -hackers. Let's keep moving forward. :-D
>>
>> I'll send a patch including a regression test on the next patch.
> 
> 
> I created patches and my test results on PG10, 11, 12, and 14 are fine.
> 
>    0001:
>      - Fix query to use pg_statistic_ext only
>      - Replace statuses "required" and "built" with "defined"
>      - Remove the size columns
>      - Fix document
>      - Add schema name as a filter condition on the query
> 
>    0002:
>      - Fix all results of \dX
>      - Add new testcase by non-superuser
> 
> Please find attached files. :-D

Thanks, I've pushed this. I had to tweak the regression tests a bit, for 
two reasons:

1) to change user in regression tests, don't use \connect, but SET ROLE 
and RESET ROLE

2) roles in regression tests should use names with regress_ prefix



regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql

From
Tatsuro Yamada
Date:
Hi Tomas and hackers,

On 2021/01/21 7:00, Tomas Vondra wrote:
>> I created patches and my test results on PG10, 11, 12, and 14 are fine.
>>
>>    0001:
>>      - Fix query to use pg_statistic_ext only
>>      - Replace statuses "required" and "built" with "defined"
>>      - Remove the size columns
>>      - Fix document
>>      - Add schema name as a filter condition on the query
>>
>>    0002:
>>      - Fix all results of \dX
>>      - Add new testcase by non-superuser
>>
>> Please find attached files. :-D
> 
> Thanks, I've pushed this. I had to tweak the regression tests a bit, for two reasons:
> 
> 1) to change user in regression tests, don't use \connect, but SET ROLE and RESET ROLE
> 
> 2) roles in regression tests should use names with regress_ prefix


Thanks for reviewing many times and committing the feature!

I understood 1) and 2). I'll keep that in mind for the next developing patch.
Then, If possible, could you add Justin to the commit message as a reviewer?
Because I revised the document partly based on his comments.

Finally, As extended stats were more used, this feature becomes more useful.
I hope it helps DBA. :-D


Thanks,
Tatsuro Yamada





Re: list of extended statistics on psql (\dX)

From
Tomas Vondra
Date:

On 5/30/21 7:24 PM, Justin Pryzby wrote:
> On Wed, Jan 20, 2021 at 11:00:50PM +0100, Tomas Vondra wrote:
>> Thanks, I've pushed this. I had to tweak the regression tests a bit, for two
>> reasons:
> 
> \dX isn't checking schema visibility rules, so accidentally shows stats objects
> outside of the search path.  I noticed after installing the PG14b1 client,
> since we create stats objects in a separate schema to allow excluding them with
> pg_dump -N.
> 
> diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
> index 195f8d8cd2..e29f13c65e 100644
> --- a/src/bin/psql/describe.c
> +++ b/src/bin/psql/describe.c
> @@ -4774,7 +4774,7 @@ listExtendedStats(const char *pattern)
>      processSQLNamePattern(pset.db, &buf, pattern,
>                            false, false,
>                            "es.stxnamespace::pg_catalog.regnamespace::text", "es.stxname",
> -                          NULL, NULL);
> +                          NULL, "pg_catalog.pg_statistics_obj_is_visible(es.oid)");
>  
>      appendPQExpBufferStr(&buf, "ORDER BY 1, 2;");
>  

Thanks for noticing this! Will push.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql (\dX)

From
Tomas Vondra
Date:
Hi,

Here's a slightly more complete patch, tweaking the regression tests a
bit to detect this.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment

Re: list of extended statistics on psql (\dX)

From
Tatsuro Yamada
Date:
Hi Tomas and Justin,

On 2021/06/07 4:47, Tomas Vondra wrote:
> Here's a slightly more complete patch, tweaking the regression tests a
> bit to detect this.


I tested your patch on PG14beta2 and PG15devel.
And they work fine.
=======================
  All 209 tests passed.
=======================

Next time I create a feature on psql, I will be careful to add
a check for schema visibility rules. :-D

Thanks,
Tatsuro Yamada






Re: list of extended statistics on psql (\dX)

From
Tomas Vondra
Date:
Hi,

I've pushed the last version of the fix, including the regression tests 
etc. Backpatch to 14, where \dX was introduced.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: list of extended statistics on psql (\dX)

From
Tatsuro Yamada
Date:
Hi Tomas and Justin,

On 2021/07/27 4:26, Tomas Vondra wrote:
> Hi,
> 
> I've pushed the last version of the fix, including the regression tests etc. Backpatch to 14, where \dX was
introduced.


Thank you!


Regards,
Tatsuro Yamada