Thread: list of extended statistics on psql
Hi! I created a POC patch that allows showing a list of extended statistics by "\dz" command on psql. I believe this feature helps DBA and users who would like to know all extended statistics easily. :-D I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*" than "\dz" but they were already assigned. Therefore I used "\dz" instead of them. Please find the attached patch. Any comments are welcome! For Example: ======================= CREATE TABLE t1 (a INT, b INT); CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1; CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1; CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1; ANALYZE t1; CREATE TABLE t2 (a INT, b INT, c INT); CREATE STATISTICS stts4 ON b, c FROM t2; ANALYZE t2; postgres=# \dz List of extended statistics Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV --------+-------+-------+---------+-----------+--------------+----- public | t1 | stts1 | a, b | f | t | f public | t1 | stts2 | a, b | t | t | f public | t1 | stts3 | a, b | t | t | t public | t2 | stts4 | b, c | t | t | t (4 rows) postgres=# \? ... \dy [PATTERN] list event triggers \dz [PATTERN] list extended statistics \l[+] [PATTERN] list databases ... ======================= For now, I haven't written a document and regression test for that. I'll create it later. Thanks, Tatsuro Yamada
Attachment
po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> napsal:
Hi!
I created a POC patch that allows showing a list of extended statistics by
"\dz" command on psql. I believe this feature helps DBA and users who
would like to know all extended statistics easily. :-D
I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*"
than "\dz" but they were already assigned. Therefore I used "\dz"
instead of them.
Please find the attached patch.
Any comments are welcome!
For Example:
=======================
CREATE TABLE t1 (a INT, b INT);
CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1;
CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1;
CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1;
ANALYZE t1;
CREATE TABLE t2 (a INT, b INT, c INT);
CREATE STATISTICS stts4 ON b, c FROM t2;
ANALYZE t2;
postgres=# \dz
List of extended statistics
Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV
--------+-------+-------+---------+-----------+--------------+-----
public | t1 | stts1 | a, b | f | t | f
public | t1 | stts2 | a, b | t | t | f
public | t1 | stts3 | a, b | t | t | t
public | t2 | stts4 | b, c | t | t | t
(4 rows)
postgres=# \?
...
\dy [PATTERN] list event triggers
\dz [PATTERN] list extended statistics
\l[+] [PATTERN] list databases
...
=======================
For now, I haven't written a document and regression test for that.
I'll create it later.
+1 good idea
Pavel
Thanks,
Tatsuro Yamada
On Mon, Aug 24, 2020 at 6:13 AM Pavel Stehule <pavel.stehule@gmail.com> wrote: > > po 24. 8. 2020 v 5:23 odesílatel Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> napsal: >> >> Hi! >> >> I created a POC patch that allows showing a list of extended statistics by >> "\dz" command on psql. I believe this feature helps DBA and users who >> would like to know all extended statistics easily. :-D >> >> I have not a strong opinion to assign "\dz". I prefer "\dx" or "\de*" >> than "\dz" but they were already assigned. Therefore I used "\dz" >> instead of them. >> >> Please find the attached patch. >> Any comments are welcome! >> >> For Example: >> ======================= >> CREATE TABLE t1 (a INT, b INT); >> CREATE STATISTICS stts1 (dependencies) ON a, b FROM t1; >> CREATE STATISTICS stts2 (dependencies, ndistinct) ON a, b FROM t1; >> CREATE STATISTICS stts3 (dependencies, ndistinct, mcv) ON a, b FROM t1; >> ANALYZE t1; >> >> CREATE TABLE t2 (a INT, b INT, c INT); >> CREATE STATISTICS stts4 ON b, c FROM t2; >> ANALYZE t2; >> >> postgres=# \dz >> List of extended statistics >> Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV >> --------+-------+-------+---------+-----------+--------------+----- >> public | t1 | stts1 | a, b | f | t | f >> public | t1 | stts2 | a, b | t | t | f >> public | t1 | stts3 | a, b | t | t | t >> public | t2 | stts4 | b, c | t | t | t >> (4 rows) >> >> postgres=# \? >> ... >> \dy [PATTERN] list event triggers >> \dz [PATTERN] list extended statistics >> \l[+] [PATTERN] list databases >> ... >> ======================= >> >> For now, I haven't written a document and regression test for that. >> I'll create it later. > > > +1 good idea +1 that's a good idea. Please add it to the next commitfest! You have a typo: + if (pset.sversion < 10000) + { + char sverbuf[32]; + + pg_log_error("The server (version %s) does not support extended statistics.", + formatPGVersionNumber(pset.sversion, false, + sverbuf, sizeof(sverbuf))); + return true; + } the version test is missing a 0, the feature looks otherwise ok. How about using \dX rather than \dz?
Hi! >> +1 good idea > > +1 that's a good idea. Please add it to the next commitfest! Thanks! > You have a typo: > > + if (pset.sversion < 10000) > + { > + char sverbuf[32]; > + > + pg_log_error("The server (version %s) does not support > extended statistics.", > + formatPGVersionNumber(pset.sversion, false, > + sverbuf, sizeof(sverbuf))); > + return true; > + } > > the version test is missing a 0, the feature looks otherwise ok. Ouch, I fixed on the attached patch. The new patch includes: - Fix the version number check (10000 -> 100000) - Fix query to get extended stats info for sort order - Add handling [Pattern] e.g \dz stts* - Add document and regression test for \dz > How about using \dX rather than \dz? Thanks for your suggestion! I'll replace it if I got consensus. :-D Thanks, Tatsuro Yamada
Attachment
Hi Julien and Pavel! >> How about using \dX rather than \dz? > > Thanks for your suggestion! > I'll replace it if I got consensus. :-D >> How about using \dX rather than \dz? > >Thanks for your suggestion! >I'll replace it if I got consensus. :-D I re-read a help message of \d* commands and realized it's better to use "\dX". There are already cases where the commands differ due to differences in case, so I did the same way. Please find attached patch. :-D For example: ========== \da[S] [PATTERN] list aggregates \dA[+] [PATTERN] list access methods ========== Attached patch uses "\dX" instead of "\dz": ========== \dx[+] [PATTERN] list extensions \dX [PATTERN] list extended statistics ========== Results of regress test of the feature are the following: ========== -- check printing info about extended statistics create table t1 (a int, b int); create statistics stts_1 (dependencies) on a, b from t1; create statistics stts_2 (dependencies, ndistinct) on a, b from t1; create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1; create table t2 (a int, b int, c int); create statistics stts_4 on b, c from t2; create table hoge (col1 int, col2 int, col3 int); create statistics stts_hoge on col1, col2, col3 from hoge; \dX List of extended statistics Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV --------+-------+-----------+------------------+-----------+--------------+----- public | hoge | stts_hoge | col1, col2, col3 | t | t | t public | t1 | stts_1 | a, b | f | t | f public | t1 | stts_2 | a, b | t | t | f public | t1 | stts_3 | a, b | t | t | t public | t2 | stts_4 | b, c | t | t | t (5 rows) \dX stts_? List of extended statistics Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV --------+-------+--------+---------+-----------+--------------+----- public | t1 | stts_1 | a, b | f | t | f public | t1 | stts_2 | a, b | t | t | f public | t1 | stts_3 | a, b | t | t | t public | t2 | stts_4 | b, c | t | t | t (4 rows) \dX *hoge List of extended statistics Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV --------+-------+-----------+------------------+-----------+--------------+----- public | hoge | stts_hoge | col1, col2, col3 | t | t | t (1 row) ========== Thanks, Tatsuro Yamada
Attachment
Hi Yamada-san, On Thu, Aug 27, 2020 at 03:13:09PM +0900, Tatsuro Yamada wrote: > > I re-read a help message of \d* commands and realized it's better to > use "\dX". > There are already cases where the commands differ due to differences > in case, so I did the same way. Please find attached patch. :-D > For example: > ========== > \da[S] [PATTERN] list aggregates > \dA[+] [PATTERN] list access methods > ========== > > Attached patch uses "\dX" instead of "\dz": > ========== > \dx[+] [PATTERN] list extensions > \dX [PATTERN] list extended statistics > ========== Thanks for updating the patch! This alias will probably be easier to remember. > > Results of regress test of the feature are the following: > ========== > -- check printing info about extended statistics > create table t1 (a int, b int); > create statistics stts_1 (dependencies) on a, b from t1; > create statistics stts_2 (dependencies, ndistinct) on a, b from t1; > create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1; > create table t2 (a int, b int, c int); > create statistics stts_4 on b, c from t2; > create table hoge (col1 int, col2 int, col3 int); > create statistics stts_hoge on col1, col2, col3 from hoge; > > \dX > List of extended statistics > Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV > --------+-------+-----------+------------------+-----------+--------------+----- > public | hoge | stts_hoge | col1, col2, col3 | t | t | t > public | t1 | stts_1 | a, b | f | t | f > public | t1 | stts_2 | a, b | t | t | f > public | t1 | stts_3 | a, b | t | t | t > public | t2 | stts_4 | b, c | t | t | t > (5 rows) > > \dX stts_? > List of extended statistics > Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV > --------+-------+--------+---------+-----------+--------------+----- > public | t1 | stts_1 | a, b | f | t | f > public | t1 | stts_2 | a, b | t | t | f > public | t1 | stts_3 | a, b | t | t | t > public | t2 | stts_4 | b, c | t | t | t > (4 rows) > > \dX *hoge > List of extended statistics > Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV > --------+-------+-----------+------------------+-----------+--------------+----- > public | hoge | stts_hoge | col1, col2, col3 | t | t | t > (1 row) > ========== Thanks also for the documentation and regression tests. This overall looks good, I just have a two comments: - there's a whitespace issue in the documentation part: add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent. <varlistentry> warning: 1 line adds whitespace errors. - You're sorting the output on schema, table, extended statistics and columns but I think the last one isn't required since extended statistics names are unique.
Hi Julien! > Thanks also for the documentation and regression tests. This overall looks > good, I just have a two comments: Thank you for reviewing the patch! :-D > - there's a whitespace issue in the documentation part: > > add_list_extended_stats_for_psql_by_dX_command.patch:10: tab in indent. > <varlistentry> > warning: 1 line adds whitespace errors. Oops, I forgot to use "git diff --check". I fixed it. > - You're sorting the output on schema, table, extended statistics and columns > but I think the last one isn't required since extended statistics names are > unique. You are right. The sort key "columns" was not necessary so I removed it. Attached new patch includes the above two fixes: - Fix whitespace issue in the documentation part - Remove unnecessary sort key from the query (ORDER BY 1, 2, 3, 4 -> ORDER BY 1, 2, 3) Thanks, Tatsuro Yamada
Attachment
+1 for the general idea, and +1 for \dX being the syntax to use IMO the per-type columns should show both the type being enabled as well as it being built. (How many more stat types do we expect -- Tomas? I wonder if having one column per type is going to scale in the long run.) Also, the stat obj name column should be first, followed by a single column listing both table and columns that it applies to. Keep in mind that in the future we might want to add stats that cross multiple tables -- that's why the CREATE syntax is the way it is. So we should give room for that in psql's display too. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Alvaro! It's been ages since we created a progress reporting feature together. :-D >>> +1 good idea >> >> +1 that's a good idea. Please add it to the next commitfest! > >+1 for the general idea, and +1 for \dX being the syntax to use Thank you for voting! > IMO the per-type columns should show both the type being enabled as well as it being built. Hmm. I'm not sure how to get the status (enabled or disabled) of extended stats. :( Could you explain it more? > Also, the stat obj name column should be first, followed by a single > column listing both table and columns that it applies to. Keep in mind > that in the future we might want to add stats that cross multiple tables > -- that's why the CREATE syntax is the way it is. So we should give > room for that in psql's display too. I understand your suggestions are the following, right? * The Current column order: =================== Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV --------+-------+--------+---------+-----------+--------------+----- public | t1 | stts_1 | a, b | f | t | f public | t1 | stts_2 | a, b | t | t | f public | t1 | stts_3 | a, b | t | t | t public | t2 | stts_4 | b, c | t | t | t =================== * The suggested column order is like this: =================== Name | Schema | Table | Columns | Ndistinct | Dependencies | MCV -----------+--------+-------+------------------+-----------+--------------+----- stts_1 | public | t1 | a, b | f | t | f stts_2 | public | t1 | a, b | t | t | f stts_3 | public | t1 | a, b | t | t | t stts_4 | public | t2 | b, c | t | t | t =================== * In the future, Extended stats that cross multiple tables will be shown maybe... (t1, t2): =================== Name | Schema | Table | Columns | Ndistinct | Dependencies | MCV -----------+--------+--------+------------------+-----------+--------------+----- stts_5 | public | t1, t2 | a, b | f | t | f =================== If so, I can revise the column order as you suggested easily. However, I have no idea how to show extended stats that cross multiple tables and the status now. I suppose that the current column order is sufficient if there is no improvement of extended stats on PG14. Do you know any plan to improve extended stats such as to allow it to cross multiple tables on PG14? In addition, Currently, I use this query to get Extended stats info from pg_statistic_ext. SELECT stxnamespace::pg_catalog.regnamespace AS "Schema", c.relname AS "Table", stxname AS "Name", (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ') FROM pg_catalog.unnest(stxkeys) s(attnum) JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND a.attnum = s.attnum AND NOT attisdropped)) AS "Columns", 'd' = any(stxkind) AS "Ndistinct", 'f' = any(stxkind) AS "Dependencies", 'm' = any(stxkind) AS "MCV" FROM pg_catalog.pg_statistic_ext INNER JOIN pg_catalog.pg_class c ON stxrelid = c.oid ORDER BY 1, 2, 3; Thanks, Tatsuro Yamada
On 2020-Aug-28, Tatsuro Yamada wrote: > > IMO the per-type columns should show both the type being enabled as > > well as it being built. > > Hmm. I'm not sure how to get the status (enabled or disabled) of > extended stats. :( > Could you explain it more? pg_statistic_ext_data.stxdndistinct is not null if the stats have been built. (I'm not sure whether there's an easier way to determine this.) > * The suggested column order is like this: > =================== > Name | Schema | Table | Columns | Ndistinct | Dependencies | MCV > -----------+--------+-------+------------------+-----------+--------------+----- > stts_1 | public | t1 | a, b | f | t | f > stts_2 | public | t1 | a, b | t | t | f > stts_3 | public | t1 | a, b | t | t | t > stts_4 | public | t2 | b, c | t | t | t > =================== I suggest to do this Name | Schema | Definition | Ndistinct | Dependencies | MCV -----------+--------+--------------------------+-----------+--------------+----- stts_1 | public | (a, b) FROM t1 | f | t | f > I suppose that the current column order is sufficient if there is > no improvement of extended stats on PG14. Do you know any plan to > improve extended stats such as to allow it to cross multiple tables on PG14? I suggest that changing it in the future is going to be an uphill battle, so better get it right from the get go, without requiring a future restructure. > In addition, > Currently, I use this query to get Extended stats info from pg_statistic_ext. Maybe something like this would do SELECT stxnamespace::pg_catalog.regnamespace AS "Schema", stxname AS "Name", format('%s FROM %s', (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ') FROM pg_catalog.unnest(stxkeys) s(attnum) JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND a.attnum = s.attnum AND NOT attisdropped)), stxrelid::regclass) AS "Definition", CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct", CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functionaldependencies", CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv FROM pg_catalog.pg_statistic_ext es INNER JOIN pg_catalog.pg_class c ON stxrelid = c.oid LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid ORDER BY 1, 2, 3; -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 27, 2020 at 07:53:23PM -0400, Alvaro Herrera wrote: >+1 for the general idea, and +1 for \dX being the syntax to use > >IMO the per-type columns should show both the type being enabled as >well as it being built. > >(How many more stat types do we expect -- Tomas? I wonder if having one >column per type is going to scale in the long run.) > I wouldn't expect a huge number of types. I can imagine maybe twice the current number of types, but not much more. But I'm not sure the output is easy to read even now ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Aug 27, 2020 at 11:26:17PM -0400, Alvaro Herrera wrote: >On 2020-Aug-28, Tatsuro Yamada wrote: > >> > IMO the per-type columns should show both the type being enabled as >> > well as it being built. >> >> Hmm. I'm not sure how to get the status (enabled or disabled) of >> extended stats. :( >> Could you explain it more? > >pg_statistic_ext_data.stxdndistinct is not null if the stats have been >built. (I'm not sure whether there's an easier way to determine this.) > It's the only way, I think. Which types were requested is stored in pg_statistic_ext.stxkind and what was built is in pg_statistic_ext_data. But if we want the output to show both what was requested and which types were actually built, that'll effectively double the number of columns needed :-( Also, it might be useful to show the size of the statistics built, just like we show for \d+ etc. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Aug-29, Tomas Vondra wrote: > But if we want the > output to show both what was requested and which types were actually > built, that'll effectively double the number of columns needed :-( I was thinking it would be one column per type showing either disabled or enabled or built. But another idea is to show one type per line that's at least enabled. > Also, it might be useful to show the size of the statistics built, just > like we show for \d+ etc. \dX+ I suppose? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote: >On 2020-Aug-29, Tomas Vondra wrote: > >> But if we want the >> output to show both what was requested and which types were actually >> built, that'll effectively double the number of columns needed :-( > >I was thinking it would be one column per type showing either disabled or enabled >or built. But another idea is to show one type per line that's at least >enabled. > >> Also, it might be useful to show the size of the statistics built, just >> like we show for \d+ etc. > >\dX+ I suppose? > Right. I've only used \d+ as an example of an existing command showing sizes of the objects. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Aug-30, Tomas Vondra wrote: > On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote: > > On 2020-Aug-29, Tomas Vondra wrote: > > > Also, it might be useful to show the size of the statistics built, just > > > like we show for \d+ etc. > > > > \dX+ I suppose? > > Right. I've only used \d+ as an example of an existing command showing > sizes of the objects. Yeah, I understood it that way too. How can you measure the size of the stat objects in a query? Are you thinking in pg_column_size()? I wonder how to report that. Knowing that psql \-commands are not meant for anything other than human consumption, maybe we can use a format() string that says "built: %d bytes" when \dX+ is used (for each stat type), and just "built" when \dX is used. What do people think about this? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote: >On 2020-Aug-30, Tomas Vondra wrote: > >> On Sat, Aug 29, 2020 at 06:43:47PM -0400, Alvaro Herrera wrote: >> > On 2020-Aug-29, Tomas Vondra wrote: > >> > > Also, it might be useful to show the size of the statistics built, just >> > > like we show for \d+ etc. >> > >> > \dX+ I suppose? >> >> Right. I've only used \d+ as an example of an existing command showing >> sizes of the objects. > >Yeah, I understood it that way too. > >How can you measure the size of the stat objects in a query? Are you >thinking in pg_column_size()? > Either that or simply length() on the bytea value. >I wonder how to report that. Knowing that psql \-commands are not meant >for anything other than human consumption, maybe we can use a format() >string that says "built: %d bytes" when \dX+ is used (for each stat type), >and just "built" when \dX is used. What do people think about this? > I'd use the same approach as \d+, i.e. a separate column with the size. Maybe that'd mean too many columns, though. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote: >> I wonder how to report that. Knowing that psql \-commands are not meant >> for anything other than human consumption, maybe we can use a format() >> string that says "built: %d bytes" when \dX+ is used (for each stat type), >> and just "built" when \dX is used. What do people think about this? Seems a little too cute to me. > I'd use the same approach as \d+, i.e. a separate column with the size. > Maybe that'd mean too many columns, though. psql already has \d commands with so many columns that you pretty much have to use \x mode to make them legible; \df+ for instance. I don't mind if \dX+ is also in that territory. It'd be good though if plain \dX can fit in a normal terminal window. regards, tom lane
Hi Alvaro, >>> IMO the per-type columns should show both the type being enabled as >>> well as it being built. >> >> Hmm. I'm not sure how to get the status (enabled or disabled) of >> extended stats. :( >> Could you explain it more? > > pg_statistic_ext_data.stxdndistinct is not null if the stats have been > built. (I'm not sure whether there's an easier way to determine this.) Ah.. I see! Thank you. > I suggest to do this > > Name | Schema | Definition | Ndistinct | Dependencies | MCV > -----------+--------+--------------------------+-----------+--------------+----- > stts_1 | public | (a, b) FROM t1 | f | t | f > >> I suppose that the current column order is sufficient if there is >> no improvement of extended stats on PG14. Do you know any plan to >> improve extended stats such as to allow it to cross multiple tables on PG14? > > I suggest that changing it in the future is going to be an uphill > battle, so better get it right from the get go, without requiring a > future restructure. I understand your suggestions. I'll replace "Columns" and "Table" columns with "Definition" column. >> Currently, I use this query to get Extended stats info from pg_statistic_ext. > > Maybe something like this would do > > SELECT > stxnamespace::pg_catalog.regnamespace AS "Schema", > stxname AS "Name", > format('%s FROM %s', > (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ') > FROM pg_catalog.unnest(stxkeys) s(attnum) > JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND > a.attnum = s.attnum AND NOT attisdropped)), > stxrelid::regclass) AS "Definition", > CASE WHEN stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'enabled, not built' END AS "n-distinct", > CASE WHEN stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'enabled, not built' END AS "functionaldependencies", > CASE WHEN stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'enabled, not built' END AS mcv > FROM pg_catalog.pg_statistic_ext es > INNER JOIN pg_catalog.pg_class c > ON stxrelid = c.oid > LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid > ORDER BY 1, 2, 3; Great! It helped me a lot to understand your suggestions correctly. Thanks. :-D I got the below results by your query. ======== create table t1 (a int, b int); create statistics stts_1 (dependencies) on a, b from t1; create statistics stts_2 (dependencies, ndistinct) on a, b from t1; create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1; create table t2 (a int, b int, c int); create statistics stts_4 on b, c from t2; create table hoge (col1 int, col2 int, col3 int); create statistics stts_hoge on col1, col2, col3 from hoge; insert into t1 select i,i from generate_series(1,100) i; analyze t1; Your query gave this result: Schema | Name | Definition | n-distinct | functional dependencies | mcv --------+-----------+----------------------------+--------------------+-------------------------+-------------------- public | stts_1 | a, b FROM t1 | | built | public | stts_2 | a, b FROM t1 | built | built | public | stts_3 | a, b FROM t1 | built | built | built public | stts_4 | b, c FROM t2 | enabled, not built | enabled, not built | enabled, not built public | stts_hoge | col1, col2, col3 FROM hoge | enabled, not built | enabled, not built | enabled, not built (5 rows) ======== I guess "enabled, not built" is a little redundant. The status would better to have three patterns: "built", "not built" or nothing (NULL) like these: - "built": extended stats is defined and built (collected by analyze cmd) - "not built": extended stats is defined but have not built yet - nothing (NULL): extended stats is not defined What do you think about it? I will send a new patch including : - Replace "Columns" and "Table" column with "Definition" - Show the status (built/not built/null) of extended stats by using pg_statistic_ext_data Thanks, Tatsuro Yamada
On 2020/08/31 1:59, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote: >>> I wonder how to report that. Knowing that psql \-commands are not meant >>> for anything other than human consumption, maybe we can use a format() >>> string that says "built: %d bytes" when \dX+ is used (for each stat type), >>> and just "built" when \dX is used. What do people think about this? > > Seems a little too cute to me. > >> I'd use the same approach as \d+, i.e. a separate column with the size. >> Maybe that'd mean too many columns, though. > > psql already has \d commands with so many columns that you pretty much > have to use \x mode to make them legible; \df+ for instance. I don't > mind if \dX+ is also in that territory. It'd be good though if plain > \dX can fit in a normal terminal window. Hmm. How about these instead of "built: %d bytes"? I added three columns (N_size, D_size, M_size) to show size. See below: =================== postgres=# \dX List of extended statistics Schema | Name | Definition | N_distinct | Dependencies | Mcv --------+-----------+----------------------------+------------+--------------+----------- public | stts_1 | a, b FROM t1 | | built | public | stts_2 | a, b FROM t1 | built | built | public | stts_3 | a, b FROM t1 | built | built | built public | stts_4 | b, c FROM t2 | not built | not built | not built public | stts_hoge | col1, col2, col3 FROM hoge | not built | not built | not built (5 rows) postgres=# \dX+ List of extended statistics Schema | Name | Definition | N_distinct | Dependencies | Mcv | N_size | D_size | M_size --------+-----------+----------------------------+------------+--------------+-----------+--------+--------+-------- public | stts_1 | a, b FROM t1 | | built | | | 40 | public | stts_2 | a, b FROM t1 | built | built | | 13 | 40 | public | stts_3 | a, b FROM t1 | built | built | built | 13 | 40 | 6126 public | stts_4 | b, c FROM t2 | not built | not built | not built | | | public | stts_hoge | col1, col2, col3 FROM hoge | not built | not built | not built | | | =================== I used this query to get results of "\dX+". =================== SELECT stxnamespace::pg_catalog.regnamespace AS "Schema", stxname AS "Name", format('%s FROM %s', (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ') FROM pg_catalog.unnest(stxkeys) s(attnum) JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND a.attnum = s.attnum AND NOT attisdropped)), stxrelid::regclass) AS "Definition", CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'not built' END AS "N_distinct", CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'not built' END AS "Dependencies", CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'not built' END AS "Mcv", pg_catalog.length(stxdndistinct) AS "N_size", pg_catalog.length(stxddependencies) AS "D_size", pg_catalog.length(stxdmcv) AS "M_size" FROM pg_catalog.pg_statistic_ext es INNER JOIN pg_catalog.pg_class c ON stxrelid = c.oid LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid ORDER BY 1, 2; =================== Attached patch includes: - Replace "Columns" and "Table" column with "Definition" - Show the status (built/not built/null) of extended stats by using pg_statistic_ext_data - Add "\dX+" command to show size of extended stats Please find the attached file! :-D Thanks, Tatsuro Yamada
Attachment
On 2020-Aug-30, Tomas Vondra wrote: > On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote: > > I wonder how to report that. Knowing that psql \-commands are not meant > > for anything other than human consumption, maybe we can use a format() > > string that says "built: %d bytes" when \dX+ is used (for each stat type), > > and just "built" when \dX is used. What do people think about this? > > I'd use the same approach as \d+, i.e. a separate column with the size. > Maybe that'd mean too many columns, though. Are you thinking in one size for all stats, or a combined size? If the former, then yes it'd be too many columns. I'm trying to figure out what can the user *do* with that data. Can they make the sample size smaller/bigger if the stats data is too large? Can they do that for each individual stats type? If so, it'd make sense to list each type's size separately. If we do put each type in its own row -- at least "logical" row, say string_agg(unnest(array_of_types), '\n') -- then we can put the size of each type in a separate column with string_agg(unnest(array_of_sizes), '\n') statname | definition | type | size ----------+-----------------+--------------------------+----------- someobj | (a, b) FROM tab | n-distinct: built | 2000 bytes | func-dependencies: built | 4000 bytes another | (a, c) FROM tab | n-distint: enabled | <null> -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > If we do put each type in its own row -- at least "logical" row, say > string_agg(unnest(array_of_types), '\n') -- then we can put the size of each type > in a separate column with string_agg(unnest(array_of_sizes), '\n') > statname | definition | type | size > ----------+-----------------+--------------------------+----------- > someobj | (a, b) FROM tab | n-distinct: built | 2000 bytes > | func-dependencies: built | 4000 bytes > another | (a, c) FROM tab | n-distint: enabled | <null> I guess I'm wondering why the size is of such interest that we need it at all here. regards, tom lane
On Mon, Aug 31, 2020 at 10:58:11AM -0400, Tom Lane wrote: >Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> If we do put each type in its own row -- at least "logical" row, say >> string_agg(unnest(array_of_types), '\n') -- then we can put the size of each type >> in a separate column with string_agg(unnest(array_of_sizes), '\n') > >> statname | definition | type | size >> ----------+-----------------+--------------------------+----------- >> someobj | (a, b) FROM tab | n-distinct: built | 2000 bytes >> | func-dependencies: built | 4000 bytes >> another | (a, c) FROM tab | n-distint: enabled | <null> > >I guess I'm wondering why the size is of such interest that we >need it at all here. > I agree it may not be important enough. I did use it during development etc. but maybe it's not something we need to include in this list (even if it's just in the \dX+ variant). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Aug 31, 2020 at 10:28:38AM -0400, Alvaro Herrera wrote: >On 2020-Aug-30, Tomas Vondra wrote: > >> On Sun, Aug 30, 2020 at 12:33:29PM -0400, Alvaro Herrera wrote: > >> > I wonder how to report that. Knowing that psql \-commands are not meant >> > for anything other than human consumption, maybe we can use a format() >> > string that says "built: %d bytes" when \dX+ is used (for each stat type), >> > and just "built" when \dX is used. What do people think about this? >> >> I'd use the same approach as \d+, i.e. a separate column with the size. >> Maybe that'd mean too many columns, though. > >Are you thinking in one size for all stats, or a combined size? If the >former, then yes it'd be too many columns. > I wonder if trying to list info about all stats from the statistics object in a single line is necessary. Maybe we should split the info into one line per statistics, so for example CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ... would result in three lines in the \dX output. The statistics name would identify which lines belong together, but other than that the pieces are mostly independent. This would make it somewhat future-proof in case we add more statistics types, because the number of columns would not increase. OTOH maybe it's pointless and/or against the purpose of listing statistics objects. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Aug-31, Tomas Vondra wrote: > I wonder if trying to list info about all stats from the statistics > object in a single line is necessary. Maybe we should split the info > into one line per statistics, so for example > > CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ... > > would result in three lines in the \dX output. The statistics name would > identify which lines belong together, but other than that the pieces are > mostly independent. Yeah, that's what I'm suggesting. I don't think we need to repeat the name/definition for each line though. It might be useful to know how does pspg show a single entry that's split in three lines, though. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Aug 31, 2020 at 12:18:09PM -0400, Alvaro Herrera wrote: >On 2020-Aug-31, Tomas Vondra wrote: > >> I wonder if trying to list info about all stats from the statistics >> object in a single line is necessary. Maybe we should split the info >> into one line per statistics, so for example >> >> CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ... >> >> would result in three lines in the \dX output. The statistics name would >> identify which lines belong together, but other than that the pieces are >> mostly independent. > >Yeah, that's what I'm suggesting. I don't think we need to repeat the >name/definition for each line though. > >It might be useful to know how does pspg show a single entry that's >split in three lines, though. > Ah, I didn't realize you're proposing that - I assumed it's broken simply to make it readable, or something like that. I think the lines are mostly independent, so I'd suggest to include the name of the object on each line. The question is whether this independence will remain true in the future - for example histograms would be built only on data not represented by the MCV list, so there's a close dependency there. Not sure about pspg, and I'm not sure it matters too much. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
po 31. 8. 2020 v 18:32 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal:
On Mon, Aug 31, 2020 at 12:18:09PM -0400, Alvaro Herrera wrote:
>On 2020-Aug-31, Tomas Vondra wrote:
>
>> I wonder if trying to list info about all stats from the statistics
>> object in a single line is necessary. Maybe we should split the info
>> into one line per statistics, so for example
>>
>> CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ...
>>
>> would result in three lines in the \dX output. The statistics name would
>> identify which lines belong together, but other than that the pieces are
>> mostly independent.
>
>Yeah, that's what I'm suggesting. I don't think we need to repeat the
>name/definition for each line though.
>
>It might be useful to know how does pspg show a single entry that's
>split in three lines, though.
>
Ah, I didn't realize you're proposing that - I assumed it's broken
simply to make it readable, or something like that. I think the lines
are mostly independent, so I'd suggest to include the name of the object
on each line. The question is whether this independence will remain true
in the future - for example histograms would be built only on data not
represented by the MCV list, so there's a close dependency there.
Not sure about pspg, and I'm not sure it matters too much.
pspg almost ignores multiline rows - the horizontal cursor is one row every time. There is only one use case where pspg detects multiline rows - sorts, and pspg ensures correct content for multiline rows displayed in different (than input) order.
Regards
Pavel
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, > >> I wonder if trying to list info about all stats from the statistics > >> object in a single line is necessary. Maybe we should split the info > >> into one line per statistics, so for example > >> > >> CREATE STATISTICS s (mcv, ndistinct, dependencies) ON ... > >> > >> would result in three lines in the \dX output. The statistics name would > >> identify which lines belong together, but other than that the pieces are > >> mostly independent. > > > >Yeah, that's what I'm suggesting. I don't think we need to repeat the > >name/definition for each line though. > > > >It might be useful to know how does pspg show a single entry that's > >split in three lines, though. > > > > Ah, I didn't realize you're proposing that - I assumed it's broken > simply to make it readable, or something like that. I think the lines > are mostly independent, so I'd suggest to include the name of the object > on each line. The question is whether this independence will remain true > in the future - for example histograms would be built only on data not > represented by the MCV list, so there's a close dependency there. > > Not sure about pspg, and I'm not sure it matters too much. > > > pspg almost ignores multiline rows - the horizontal cursor is one row every time. There is only one use case where pspgdetects multiline rows - sorts, and pspg ensures correct content for multiline rows displayed in different (than input)order. I try to summarize the discussion so far. Is my understanding right? Could you revise it if it has something wrong? * Summary 1. "\dX[+]" doesn't display the Size of extended stats since the size is useful only for the development process of the stats. 2. each row should have stats name, definition, type, and status. For example: statname | definition | type | ----------+------------------+---------------------------+ someobj | (a, b) FROM tab | n-distinct: built | someobj | (a, b) FROM tab | func-dependencies: built | someobj | (a, b) FROM tab | mcv: built | sttshoge | (a, b) FROM hoge | n-distinct: required | sttshoge | (a, b) FROM hoge | func-dependencies:required| sttscross| (a, b) FROM t1,t2| n-distinct: required | My opinion is below: For 1., Agreed. I will remove it on the next patch. For 2., I feel the design is not beautiful so I'd like to change it. The reasons are: - I think that even if we expected the number of types increasing two times, each type would be better to put as columns, not lines. Repeating items (the stats name and definition) should be removed. It's okay there are many columns in the future like "\df+" because we can use "\x" mode to display if we need it. - The type column has two kinds of data, the one is stats type and another is status. We know the word "One fact in One place" for data modeling in the RDBMS world so it would be better to divide it. I'd like to suggest the bellow design of the view. statname | definition | n-distinct | func-dependencies | mcv | ----------+------------------+------------+-------------------+-------| someobj | (a, b) FROM tab | built | built | built | sttshoge | (a, b) FROM hoge | required | required | | sttscross| (a, b) FROM t1,t2| required | | | Any thoughts? Thanks, Tatsuro Yamada
On Thu, Sep 03, 2020 at 08:45:17AM +0900, Tatsuro Yamada wrote: > I try to summarize the discussion so far. Could you provide at least a rebased version of the patch? The CF bot is complaning here. -- Michael
Attachment
On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote: > Could you provide at least a rebased version of the patch? The CF bot > is complaning here. Not seeing this answered after two weeks, I have marked the patch as RwF for now. -- Michael
Attachment
Hi Michael-san and Hackers, On 2020/09/30 15:19, Michael Paquier wrote: > On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote: >> Could you provide at least a rebased version of the patch? The CF bot >> is complaning here. > > Not seeing this answered after two weeks, I have marked the patch as > RwF for now. > -- > Michael Sorry for the delayed reply. I re-based the patch on the current head and did some refactoring. I think the size of extended stats are not useful for DBA. Should I remove it? Changes: ======== - Use a keyword "defined" instead of "not built" - Use COALESCE function for size for extended stats Results of \dX and \dX+: ======================== postgres=# \dX List of extended statistics Schema | Name | Definition | N_distinct | Dependencies | Mcv -------------+-----------+-----------------+------------+--------------+--------- public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined hoge1schema | hoge1_ext | a, b FROM hoge1 | built | built | built (2 rows) postgres=# \dX+ List of extended statistics Schema | Name | Definition | N_distinct | Dependencies | Mcv | N_size | D_size | M_size -------------+-----------+-----------------+------------+--------------+---------+--------+--------+-------- public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined | 0 | 0 | 0 hoge1schema | hoge1_ext | a, b FROM hoge1 | built | built | built | 13 | 40 | 6126 (2 rows) Query of \dX+: ============== SELECT stxnamespace::pg_catalog.regnamespace AS "Schema", stxname AS "Name", pg_catalog.format('%s FROM %s', (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ') FROM pg_catalog.unnest(es.stxkeys) s(attnum) JOIN pg_catalog.pg_attribute a ON (es.stxrelid = a.attrelid AND a.attnum = s.attnum AND NOT a.attisdropped)), es.stxrelid::regclass) AS "Definition", CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'defined' END AS "N_distinct", CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'defined' END AS "Dependencies", CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'defined' END AS "Mcv", COALESCE(pg_catalog.length(stxdndistinct), 0) AS "N_size", COALESCE(pg_catalog.length(stxddependencies), 0) AS "D_size", COALESCE(pg_catalog.length(stxdmcv), 0) AS "M_size" FROM pg_catalog.pg_statistic_ext es LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid INNER JOIN pg_catalog.pg_class c ON es.stxrelid = c.oid ORDER BY 1, 2; Regards, Tatsuro Yamada
Attachment
Hi Michael-san and Hackers, On 2020/09/30 15:19, Michael Paquier wrote: > On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote: >> Could you provide at least a rebased version of the patch? The CF bot >> is complaning here. > > Not seeing this answered after two weeks, I have marked the patch as > RwF for now. > -- > Michael Sorry for the delayed reply. I re-based the patch on the current head and did some refactoring. I think the size of extended stats are not useful for DBA. Should I remove it? Changes: ======== - Use a keyword "defined" instead of "not built" - Use COALESCE function for size for extended stats Results of \dX and \dX+: ======================== postgres=# \dX List of extended statistics Schema | Name | Definition | N_distinct | Dependencies | Mcv -------------+-----------+-----------------+------------+--------------+--------- public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined hoge1schema | hoge1_ext | a, b FROM hoge1 | built | built | built (2 rows) postgres=# \dX+ List of extended statistics Schema | Name | Definition | N_distinct | Dependencies | Mcv | N_size | D_size | M_size -------------+-----------+-----------------+------------+--------------+---------+--------+--------+-------- public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined | 0 | 0 | 0 hoge1schema | hoge1_ext | a, b FROM hoge1 | built | built | built | 13 | 40 | 6126 (2 rows) Query of \dX+: ============== SELECT stxnamespace::pg_catalog.regnamespace AS "Schema", stxname AS "Name", pg_catalog.format('%s FROM %s', (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ') FROM pg_catalog.unnest(es.stxkeys) s(attnum) JOIN pg_catalog.pg_attribute a ON (es.stxrelid = a.attrelid AND a.attnum = s.attnum AND NOT a.attisdropped)), es.stxrelid::regclass) AS "Definition", CASE WHEN esd.stxdndistinct IS NOT NULL THEN 'built' WHEN 'd' = any(stxkind) THEN 'defined' END AS "N_distinct", CASE WHEN esd.stxddependencies IS NOT NULL THEN 'built' WHEN 'f' = any(stxkind) THEN 'defined' END AS "Dependencies", CASE WHEN esd.stxdmcv IS NOT NULL THEN 'built' WHEN 'm' = any(stxkind) THEN 'defined' END AS "Mcv", COALESCE(pg_catalog.length(stxdndistinct), 0) AS "N_size", COALESCE(pg_catalog.length(stxddependencies), 0) AS "D_size", COALESCE(pg_catalog.length(stxdmcv), 0) AS "M_size" FROM pg_catalog.pg_statistic_ext es LEFT JOIN pg_catalog.pg_statistic_ext_data esd ON es.oid = esd.stxoid INNER JOIN pg_catalog.pg_class c ON es.stxrelid = c.oid ORDER BY 1, 2; Regards, Tatsuro Yamada
Attachment
Hi, > Results of \dX and \dX+: > ======================== > postgres=# \dX > List of extended statistics > Schema | Name | Definition | N_distinct | Dependencies | Mcv > -------------+-----------+-----------------+------------+--------------+--------- > public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined > hoge1schema | hoge1_ext | a, b FROM hoge1 | built | built | built > (2 rows) I used "Order by 1, 2" on the query but I realized the ordering of result was wrong so I fixed on the attached patch. Please fined the patch file. :-D Regards, Tatsuro Yamada
Attachment
On Wed, Oct 28, 2020 at 03:07:56PM +0900, Tatsuro Yamada wrote: >Hi Michael-san and Hackers, > >On 2020/09/30 15:19, Michael Paquier wrote: >>On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote: >>>Could you provide at least a rebased version of the patch? The CF bot >>>is complaning here. >> >>Not seeing this answered after two weeks, I have marked the patch as >>RwF for now. >>-- >>Michael > > >Sorry for the delayed reply. > >I re-based the patch on the current head and did some >refactoring. >I think the size of extended stats are not useful for DBA. >Should I remove it? > I think it's an interesting / useful information, I'd keep it (in the \dX+ output only, of course). But I think it needs to print the size similarly to \d+, i.e. using pg_size_pretty - that'll include the unit and make it more readable for large stats. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Oct 28, 2020 at 04:20:25PM +0900, Tatsuro Yamada wrote: >Hi, > >>Results of \dX and \dX+: >>======================== >>postgres=# \dX >> List of extended statistics >> Schema | Name | Definition | N_distinct | Dependencies | Mcv >>-------------+-----------+-----------------+------------+--------------+--------- >> public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined >> hoge1schema | hoge1_ext | a, b FROM hoge1 | built | built | built >>(2 rows) > > >I used "Order by 1, 2" on the query but I realized the ordering of >result was wrong so I fixed on the attached patch. >Please fined the patch file. :-D > Thanks. I'll take a look at the beginning of the 2020-11 commitfest, and I hope to get this committed. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Tomas, On 2020/10/29 4:07, Tomas Vondra wrote: > On Wed, Oct 28, 2020 at 04:20:25PM +0900, Tatsuro Yamada wrote: >> Hi, >> >>> Results of \dX and \dX+: >>> ======================== >>> postgres=# \dX >>> List of extended statistics >>> Schema | Name | Definition | N_distinct | Dependencies | Mcv >>> -------------+-----------+-----------------+------------+--------------+--------- >>> public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined >>> hoge1schema | hoge1_ext | a, b FROM hoge1 | built | built | built >>> (2 rows) >> >> >> I used "Order by 1, 2" on the query but I realized the ordering of >> result was wrong so I fixed on the attached patch. >> Please find the patch file. :-D >> > > Thanks. I'll take a look at the beginning of the 2020-11 commitfest, and > I hope to get this committed. Thanks for your reply and I'm glad to hear that. I'm going to revise the patch as possible to get this committed on the next commitfest. Regards, Tatsuro Yamada
Hi Tomas, On 2020/10/29 4:06, Tomas Vondra wrote: > On Wed, Oct 28, 2020 at 03:07:56PM +0900, Tatsuro Yamada wrote: >> Hi Michael-san and Hackers, >> >> On 2020/09/30 15:19, Michael Paquier wrote: >>> On Thu, Sep 17, 2020 at 02:55:31PM +0900, Michael Paquier wrote: >>>> Could you provide at least a rebased version of the patch? The CF bot >>>> is complaning here. >>> >>> Not seeing this answered after two weeks, I have marked the patch as >>> RwF for now. >>> -- >>> Michael >> >> >> Sorry for the delayed reply. >> >> I re-based the patch on the current head and did some >> refactoring. >> I think the size of extended stats are not useful for DBA. >> Should I remove it? >> > > I think it's an interesting / useful information, I'd keep it (in the > \dX+ output only, of course). But I think it needs to print the size > similarly to \d+, i.e. using pg_size_pretty - that'll include the unit > and make it more readable for large stats. Thanks for your comment. I addressed it, so I keep the size of extended stats with the unit. Changes: ======== - Use pg_size_pretty to show the size of extended stats by \dX+ Result of \dX+: =============== Schema | Name | Definition | N_distinct | Dependencies | Mcv | N_Size | D_Size | M_Size -------------+------------+-----------------+------------+--------------+---------+----------+----------+------------ hoge1schema | hoge1_ext | a, b FROM hoge1 | built | built | built | 13 bytes | 40 bytes | 6126 bytes public | hoge1_ext1 | a, b FROM hoge1 | defined | defined | defined | 0 bytes | 0 bytes | 0 bytes public | hoge1_ext2 | a, b FROM hoge1 | defined | | | 0 bytes | | (3 rows) Please find the attached patch. Regards, Tatsuro Yamada
Attachment
Hi, > I addressed it, so I keep the size of extended stats with the unit. > > Changes: > ======== > - Use pg_size_pretty to show the size of extended stats by \dX+ I rebased the patch on the head and also added tab-completion. Any feedback is welcome. Preparing for tests: =========== create table t1 (a int, b int); create statistics stts_1 (dependencies) on a, b from t1; create statistics stts_2 (dependencies, ndistinct) on a, b from t1; create statistics stts_3 (dependencies, ndistinct, mcv) on a, b from t1; create table t2 (a int, b int, c int); create statistics stts_4 on b, c from t2; create table hoge (col1 int, col2 int, col3 int); create statistics stts_hoge on col1, col2, col3 from hoge; create schema foo; create schema yama; create statistics foo.stts_foo on col1, col2 from hoge; create statistics yama.stts_yama (ndistinct, mcv) on col1, col3 from hoge; insert into t1 select i,i from generate_series(1,100) i; analyze t1; Result of \dX: ============== postgres=# \dX List of extended statistics Schema | Name | Definition | N_distinct | Dependencies | Mcv --------+-----------+----------------------------+------------+--------------+--------- foo | stts_foo | col1, col2 FROM hoge | defined | defined | defined public | stts_1 | a, b FROM t1 | | built | public | stts_2 | a, b FROM t1 | built | built | public | stts_3 | a, b FROM t1 | built | built | built public | stts_4 | b, c FROM t2 | defined | defined | defined public | stts_hoge | col1, col2, col3 FROM hoge | defined | defined | defined yama | stts_yama | col1, col3 FROM hoge | defined | | defined (7 rows) Result of \dX+: =============== postgres=# \dX+ List of extended statistics Schema | Name | Definition | N_distinct | Dependencies | Mcv | N_size | D_size | M_size --------+-----------+----------------------------+------------+--------------+---------+----------+----------+------------ foo | stts_foo | col1, col2 FROM hoge | defined | defined | defined | 0 bytes | 0 bytes | 0 bytes public | stts_1 | a, b FROM t1 | | built | | | 40 bytes | public | stts_2 | a, b FROM t1 | built | built | | 13 bytes | 40 bytes | public | stts_3 | a, b FROM t1 | built | built | built | 13 bytes | 40 bytes | 6126 bytes public | stts_4 | b, c FROM t2 | defined | defined | defined | 0 bytes | 0 bytes | 0 bytes public | stts_hoge | col1, col2, col3 FROM hoge | defined | defined | defined | 0 bytes | 0 bytes | 0 bytes yama | stts_yama | col1, col3 FROM hoge | defined | | defined | 0 bytes | | 0 bytes (7 rows) Results of Tab-completion: =============== postgres=# \dX <Tab> foo. pg_toast. stts_2 stts_hoge information_schema. public. stts_3 yama. pg_catalog. stts_1 stts_4 postgres=# \dX+ <Tab> foo. pg_toast. stts_2 stts_hoge information_schema. public. stts_3 yama. pg_catalog. stts_1 stts_4 Regards, Tatsuro Yamada
Attachment
Hi, I took a look at this today, and I think the code is ready, but the regression test needs a bit more work: 1) It's probably better to use somewhat more specific names for the objects, especially when created in public schema. It decreases the chance of a collision with other tests (which may be hard to notice because of timing). I suggest we use "stts_" prefix or something like that, per the attached 0002 patch. (0001 is just the v7 patch) 2) The test is failing intermittently because it's executed in parallel with stats_ext test, which is also creating extended statistics. So depending on the timing the \dX may list some of the stats_ext stuff. I'm not sure what to do about this. Either this part needs to be moved to a separate test executed in a different group, or maybe we should simply move it to stats_ext. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hi Tomas, > I took a look at this today, and I think the code is ready, but the > regression test needs a bit more work: Thanks for taking your time. :-D > 1) It's probably better to use somewhat more specific names for the > objects, especially when created in public schema. It decreases the > chance of a collision with other tests (which may be hard to notice > because of timing). I suggest we use "stts_" prefix or something like > that, per the attached 0002 patch. (0001 is just the v7 patch) I agree with your comment. Thanks. > 2) The test is failing intermittently because it's executed in parallel > with stats_ext test, which is also creating extended statistics. So > depending on the timing the \dX may list some of the stats_ext stuff. > I'm not sure what to do about this. Either this part needs to be moved > to a separate test executed in a different group, or maybe we should > simply move it to stats_ext. I thought all tests related to meta-commands exist in psql.sql, but I realize it's not true. For example, the test of \dRp does not exist in psql.sql. Therefore, I moved the regression test of \dX to stats_ext.sql to avoid the test failed in parallel. Attached patches is following: - 0001-v8-Add-dX-command-on-psql.patch - 0002-Add-regression-test-of-dX-to-stats_ext.sql.patch However, I feel the test of \dX is not elegant, so I'm going to try creating another one since it would be better to be aware of the context of existing extended stats tests. Regards, Tatsuro Yamada
Attachment
Hi, >> 2) The test is failing intermittently because it's executed in parallel >> with stats_ext test, which is also creating extended statistics. So >> depending on the timing the \dX may list some of the stats_ext stuff. >> I'm not sure what to do about this. Either this part needs to be moved >> to a separate test executed in a different group, or maybe we should >> simply move it to stats_ext. > > I thought all tests related to meta-commands exist in psql.sql, but I > realize it's not true. For example, the test of \dRp does not exist in > psql.sql. Therefore, I moved the regression test of \dX to stats_ext.sql > to avoid the test failed in parallel. > > Attached patches is following: > - 0001-v8-Add-dX-command-on-psql.patch > - 0002-Add-regression-test-of-dX-to-stats_ext.sql.patch > > However, I feel the test of \dX is not elegant, so I'm going to try > creating another one since it would be better to be aware of the context > of existing extended stats tests. I tried to create another version of the regression test (0003). "\dX" was added after ANALYZE command or SELECT... from pg_statistic_ext. Please find the attached file: - 0003-Add-regression-test-of-dX-to-stats_ext.sql-another-ver Both regression tests 0002 and 0003 are okay for me, I think. Could you choose one? Regards, Tatsuro Yamada
Attachment
Thanks, It's better to always post the whole patch series, so that cfbot can test it properly. Sending just 0003 separately kind breaks that. Also, 0003 seems to only tweak the .sql file, not the expected output, and there actually seems to be two places that mistakenly use \dx (so listing extensions) instead of \dX. I've fixed both issues in the attached patches. However, I think the 0002 tests are better/sufficient - I prefer to keep it compact, not interleaving with the tests testing various other stuff. So I don't intend to commit 0003, unless there's something that I don't see for some reason. The one remaining thing I'm not sure about is naming of the columns with size of statistics - N_size, D_size and M_size does not seem very clear. Any clearer naming will however make the tables wider, though :-/ regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hi Tomas, Thanks for your comments and also revising patches. On 2020/11/16 3:22, Tomas Vondra wrote: > It's better to always post the whole patch series, so that cfbot can > test it properly. Sending just 0003 separately kind breaks that. I now understand how "cfbot" works so that I'll take care of that when I send patches. Thanks. > Also, 0003 seems to only tweak the .sql file, not the expected output, > and there actually seems to be two places that mistakenly use \dx (so > listing extensions) instead of \dX. I've fixed both issues in the > attached patches. Oops, sorry about that. > However, I think the 0002 tests are better/sufficient - I prefer to keep > it compact, not interleaving with the tests testing various other stuff. > So I don't intend to commit 0003, unless there's something that I don't > see for some reason. I Agreed. 0002 is easy to modify test cases and check results than 0003. Therefore, I'll go with 0002. > The one remaining thing I'm not sure about is naming of the columns with > size of statistics - N_size, D_size and M_size does not seem very clear. > Any clearer naming will however make the tables wider, though :-/ Yeah, I think so too, but I couldn't get an idea of a suitable name for the columns when I created the patch. I don't prefer a long name but I'll replace the name with it to be clearer. For example, s/N_size/Ndistinct_size/. Please find attached patcheds: - 0001: Replace column names - 0002: Recreate regression test based on 0001 Regards, Tatsuro Yamada
Attachment
Hi Tomas and hackers, > I don't prefer a long name but I'll replace the name with it to be clearer. > For example, s/N_size/Ndistinct_size/. > > Please find attached patcheds: > - 0001: Replace column names > - 0002: Recreate regression test based on 0001 I rebased the patch set on the master (7e5e1bba03), and the regression test is good. Therefore, I changed the status of the patch: "needs review". I know that you proposed the new extended statistics[1], and it probably conflicts with the patch. I hope my patch will get commit before your patch committed to avoid the time of recreating. :-) [1] https://www.postgresql.org/message-id/flat/ad7891d2-e90c-b446-9fe2-7419143847d7%40enterprisedb.com Thanks, Tatsuro Yamada
Attachment
Hi, >I rebased the patch set on the master (7e5e1bba03), and the regression >test is good. Therefore, I changed the status of the patch: "needs review". Happy New Year! I rebased my patches on HEAD. Please find attached files. :-D Thanks, Tatsuro Yamada
Attachment
On 1/5/21 5:26 AM, Tatsuro Yamada wrote: > Hi, > >> I rebased the patch set on the master (7e5e1bba03), and the regression >> test is good. Therefore, I changed the status of the patch: "needs >> review". > > Happy New Year! > > I rebased my patches on HEAD. > Please find attached files. :-D > Thanks, and Happy new year to you too. I almost pushed this, but I have one more question. listExtendedStats first checks the server version, and errors out for pre-10 servers. Shouldn't the logic building query check the server version too, to decide whether to check the MCV stats? Otherwise it won't work on 10 and 11, which does not support the "mcv" stats. I don't recall what exactly is our policy regarding new psql with older server versions, but it seems strange to check for 10.0 and then fail anyway for "supported" versions. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Tomas, > Thanks, and Happy new year to you too. > > I almost pushed this, but I have one more question. listExtendedStats first checks the server version, and errors out forpre-10 servers. Shouldn't the logic building query check the server version too, to decide whether to check the MCV stats?Otherwise it won't work on 10 and 11, which does not support the "mcv" stats. >> I don't recall what exactly is our policy regarding new psql with older server versions, but it seems strange to checkfor 10.0 and then fail anyway for "supported" versions. Thanks for your comments. I overlooked the check for MCV in the logic building query because I created the patch as a new feature on PG14. I'm not sure whether we should do back patch or not. However, I'll add the check on the next patch because it is useful if you decide to do the back patch on PG10, 11, 12, and 13. I wonder the column names added by \dX+ is fine? For example, "Ndistinct_size" and "Dependencies_size". It looks like long names, but acceptable? Regards, Tatsuro Yamada
On 1/7/21 1:46 AM, Tatsuro Yamada wrote: > Hi Tomas, > >> Thanks, and Happy new year to you too. >> >> I almost pushed this, but I have one more question. listExtendedStats >> first checks the server version, and errors out for pre-10 servers. >> Shouldn't the logic building query check the server version too, to >> decide whether to check the MCV stats? Otherwise it won't work on 10 >> and 11, which does not support the "mcv" stats. >>> I don't recall what exactly is our policy regarding new psql with >>> older server versions, but it seems strange to check for 10.0 and >>> then fail anyway for "supported" versions. > > Thanks for your comments. > > I overlooked the check for MCV in the logic building query > because I created the patch as a new feature on PG14. > I'm not sure whether we should do back patch or not. However, I'll > add the check on the next patch because it is useful if you decide to > do the back patch on PG10, 11, 12, and 13. > +1 BTW perhaps a quick look at the other \d commands would show if there are precedents. I didn't have time for that. > I wonder the column names added by \dX+ is fine? For example, > "Ndistinct_size" and "Dependencies_size". It looks like long names, > but acceptable? > Seems acceptable - I don't have a better idea. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2021-Jan-07, Tomas Vondra wrote: > On 1/7/21 1:46 AM, Tatsuro Yamada wrote: > > I overlooked the check for MCV in the logic building query > > because I created the patch as a new feature on PG14. > > I'm not sure whether we should do back patch or not. However, I'll > > add the check on the next patch because it is useful if you decide to > > do the back patch on PG10, 11, 12, and 13. > > BTW perhaps a quick look at the other \d commands would show if there are > precedents. I didn't have time for that. Yes, we do promise that new psql works with older servers. I think we would not backpatch any of this, though. -- Álvaro Herrera
On 1/7/21 3:47 PM, Alvaro Herrera wrote: > On 2021-Jan-07, Tomas Vondra wrote: > >> On 1/7/21 1:46 AM, Tatsuro Yamada wrote: > >>> I overlooked the check for MCV in the logic building query >>> because I created the patch as a new feature on PG14. >>> I'm not sure whether we should do back patch or not. However, I'll >>> add the check on the next patch because it is useful if you decide to >>> do the back patch on PG10, 11, 12, and 13. >> >> BTW perhaps a quick look at the other \d commands would show if there are >> precedents. I didn't have time for that. > > Yes, we do promise that new psql works with older servers. > Yeah, makes sense. That means we need add the check for 12 / MCV. > I think we would not backpatch any of this, though. I wasn't really planning to backpatch any of this, of course. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2021/01/08 0:56, Tomas Vondra wrote: > On 1/7/21 3:47 PM, Alvaro Herrera wrote: >> On 2021-Jan-07, Tomas Vondra wrote: >> >>> On 1/7/21 1:46 AM, Tatsuro Yamada wrote: >> >>>> I overlooked the check for MCV in the logic building query >>>> because I created the patch as a new feature on PG14. >>>> I'm not sure whether we should do back patch or not. However, I'll >>>> add the check on the next patch because it is useful if you decide to >>>> do the back patch on PG10, 11, 12, and 13. >>> >>> BTW perhaps a quick look at the other \d commands would show if there are >>> precedents. I didn't have time for that. >> >> Yes, we do promise that new psql works with older servers. >> > > Yeah, makes sense. That means we need add the check for 12 / MCV. Ah, I got it. I fixed the patch to work with older servers to add the checking versions. And I tested \dX command on older servers (PG10- 13). These results look fine. 0001: Added the check code to handle pre-PG12. It has not MCV and pg_statistic_ext_data. 0002: This patch is the same as the previous patch (not changed). Please find the attached files. >> I wonder the column names added by \dX+ is fine? For example, >> "Ndistinct_size" and "Dependencies_size". It looks like long names, >> but acceptable? >> > > Seems acceptable - I don't have a better idea. I see, thanks! Thanks, Tatsuro Yamada
Attachment
On 1/8/21 12:52 AM, Tatsuro Yamada wrote: > Hi, > > On 2021/01/08 0:56, Tomas Vondra wrote: >> On 1/7/21 3:47 PM, Alvaro Herrera wrote: >>> On 2021-Jan-07, Tomas Vondra wrote: >>> >>>> On 1/7/21 1:46 AM, Tatsuro Yamada wrote: >>> >>>>> I overlooked the check for MCV in the logic building query >>>>> because I created the patch as a new feature on PG14. >>>>> I'm not sure whether we should do back patch or not. However, I'll >>>>> add the check on the next patch because it is useful if you decide to >>>>> do the back patch on PG10, 11, 12, and 13. >>>> >>>> BTW perhaps a quick look at the other \d commands would show if >>>> there are >>>> precedents. I didn't have time for that. >>> >>> Yes, we do promise that new psql works with older servers. >>> >> >> Yeah, makes sense. That means we need add the check for 12 / MCV. > > > Ah, I got it. > I fixed the patch to work with older servers to add the checking > versions. And I tested \dX command on older servers (PG10 - 13). > These results look fine. > > 0001: > Added the check code to handle pre-PG12. It has not MCV and > pg_statistic_ext_data. > 0002: > This patch is the same as the previous patch (not changed). > > Please find the attached files. > OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to squash the patches into a single commit. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/8/21 1:14 AM, Tomas Vondra wrote: > > > On 1/8/21 12:52 AM, Tatsuro Yamada wrote: >> Hi, >> >> On 2021/01/08 0:56, Tomas Vondra wrote: >>> On 1/7/21 3:47 PM, Alvaro Herrera wrote: >>>> On 2021-Jan-07, Tomas Vondra wrote: >>>> >>>>> On 1/7/21 1:46 AM, Tatsuro Yamada wrote: >>>> >>>>>> I overlooked the check for MCV in the logic building query >>>>>> because I created the patch as a new feature on PG14. >>>>>> I'm not sure whether we should do back patch or not. However, I'll >>>>>> add the check on the next patch because it is useful if you decide to >>>>>> do the back patch on PG10, 11, 12, and 13. >>>>> >>>>> BTW perhaps a quick look at the other \d commands would show if >>>>> there are >>>>> precedents. I didn't have time for that. >>>> >>>> Yes, we do promise that new psql works with older servers. >>>> >>> >>> Yeah, makes sense. That means we need add the check for 12 / MCV. >> >> >> Ah, I got it. >> I fixed the patch to work with older servers to add the checking >> versions. And I tested \dX command on older servers (PG10 - 13). >> These results look fine. >> >> 0001: >> Added the check code to handle pre-PG12. It has not MCV and >> pg_statistic_ext_data. >> 0002: >> This patch is the same as the previous patch (not changed). >> >> Please find the attached files. >> > > OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to > squash the patches into a single commit. > Attached is a patch I plan to commit - 0001 is the last submitted version with a couple minor tweaks, mostly in docs/comments, and small rework of branching to be more like the other functions in describe.c. While working on that, I realized that 'defined' might be a bit ambiguous, I initially thought it means 'NOT NULL' (which it does not). I propose to change it to 'requested' instead. Tatsuro, do you agree, or do you think 'defined' is better? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hi Tomas, On 2021/01/09 9:01, Tomas Vondra wrote: > On 1/8/21 1:14 AM, Tomas Vondra wrote: >> On 1/8/21 12:52 AM, Tatsuro Yamada wrote: >>> On 2021/01/08 0:56, Tomas Vondra wrote: >>>> On 1/7/21 3:47 PM, Alvaro Herrera wrote: >>>>> On 2021-Jan-07, Tomas Vondra wrote: >>>>>> On 1/7/21 1:46 AM, Tatsuro Yamada wrote: >>>>> >>>>>>> I overlooked the check for MCV in the logic building query >>>>>>> because I created the patch as a new feature on PG14. >>>>>>> I'm not sure whether we should do back patch or not. However, I'll >>>>>>> add the check on the next patch because it is useful if you decide to >>>>>>> do the back patch on PG10, 11, 12, and 13. >>>>>> >>>>>> BTW perhaps a quick look at the other \d commands would show if >>>>>> there are >>>>>> precedents. I didn't have time for that. >>>>> >>>>> Yes, we do promise that new psql works with older servers. >>>>> >>>> >>>> Yeah, makes sense. That means we need add the check for 12 / MCV. >>> >>> >>> Ah, I got it. >>> I fixed the patch to work with older servers to add the checking >>> versions. And I tested \dX command on older servers (PG10 - 13). >>> These results look fine. >>> >>> 0001: >>> Added the check code to handle pre-PG12. It has not MCV and >>> pg_statistic_ext_data. >>> 0002: >>> This patch is the same as the previous patch (not changed). >>> >>> Please find the attached files. >>> >> >> OK, thanks. I'll take a look and probably push tomorrow. FWIW I plan to >> squash the patches into a single commit. >> > > Attached is a patch I plan to commit - 0001 is the last submitted > version with a couple minor tweaks, mostly in docs/comments, and small > rework of branching to be more like the other functions in describe.c. Thanks for revising the patch. I reviewed the 0001, and the branching and comments look good to me. However, I added an alias name in processSQLNamePattern() on the patch: s/"stxname"/"es.stxname"/ > While working on that, I realized that 'defined' might be a bit > ambiguous, I initially thought it means 'NOT NULL' (which it does not). > I propose to change it to 'requested' instead. Tatsuro, do you agree, or > do you think 'defined' is better? Regarding the status of extended stats, I think the followings: - "defined": it shows the extended stats defined only. We can't know whether it needs to analyze or not. I agree this name was ambiguous. Therefore we should replace it with a more suitable name. - "requested": it shows the extended stats needs something. Of course, we know it needs to ANALYZE because we can create the patch. However, I feel there is a little ambiguity for DBA. To solve this, it would be better to write an explanation of the status in the document. For example, ====== The column of the kind of extended stats (e. g. Ndistinct) shows some statuses. "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE was finished, and the planner can use it. NULL means that it doesn't exists. ====== What do you think? :-D Thanks, Tatsuro Yamada
Attachment
On 1/12/21 2:57 AM, Tatsuro Yamada wrote: > Hi Tomas, > > On 2021/01/09 9:01, Tomas Vondra wrote: ...> >> While working on that, I realized that 'defined' might be a bit >> ambiguous, I initially thought it means 'NOT NULL' (which it does not). >> I propose to change it to 'requested' instead. Tatsuro, do you agree, or >> do you think 'defined' is better? > > Regarding the status of extended stats, I think the followings: > > - "defined": it shows the extended stats defined only. We can't know > whether it needs to analyze or not. I agree this name was > ambiguous. Therefore we should replace it with a more > suitable > name. > - "requested": it shows the extended stats needs something. Of course, > we know it needs to ANALYZE because we can create the patch. > However, I feel there is a little ambiguity for DBA. > To solve this, it would be better to write an explanation of > the status in the document. For example, > > ====== > The column of the kind of extended stats (e. g. Ndistinct) shows some > statuses. > "requested" means that it needs to gather data by ANALYZE. "built" means > ANALYZE > was finished, and the planner can use it. NULL means that it doesn't > exists. > ====== > > What do you think? :-D > Yes, that seems reasonable to me. Will you provide an updated patch? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Tomas, On 2021/01/12 20:08, Tomas Vondra wrote: > > On 1/12/21 2:57 AM, Tatsuro Yamada wrote: >> Hi Tomas, >> >> On 2021/01/09 9:01, Tomas Vondra wrote: > ...> >>> While working on that, I realized that 'defined' might be a bit >>> ambiguous, I initially thought it means 'NOT NULL' (which it does not). >>> I propose to change it to 'requested' instead. Tatsuro, do you agree, or >>> do you think 'defined' is better? >> >> Regarding the status of extended stats, I think the followings: >> >> - "defined": it shows the extended stats defined only. We can't know >> whether it needs to analyze or not. I agree this name was >> ambiguous. Therefore we should replace it with a more suitable >> name. >> - "requested": it shows the extended stats needs something. Of course, >> we know it needs to ANALYZE because we can create the patch. >> However, I feel there is a little ambiguity for DBA. >> To solve this, it would be better to write an explanation of >> the status in the document. For example, >> >> ====== >> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses. >> "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE >> was finished, and the planner can use it. NULL means that it doesn't exists. >> ====== >> >> What do you think? :-D >> > > Yes, that seems reasonable to me. Will you provide an updated patch? Sounds good. I'll send the updated patch today. Thanks, Tatsuro Yamada
Hi Tomas, On 2021/01/13 7:48, Tatsuro Yamada wrote: > On 2021/01/12 20:08, Tomas Vondra wrote: >> On 1/12/21 2:57 AM, Tatsuro Yamada wrote: >>> On 2021/01/09 9:01, Tomas Vondra wrote: >> ...> >>>> While working on that, I realized that 'defined' might be a bit >>>> ambiguous, I initially thought it means 'NOT NULL' (which it does not). >>>> I propose to change it to 'requested' instead. Tatsuro, do you agree, or >>>> do you think 'defined' is better? >>> >>> Regarding the status of extended stats, I think the followings: >>> >>> - "defined": it shows the extended stats defined only. We can't know >>> whether it needs to analyze or not. I agree this name was >>> ambiguous. Therefore we should replace it with a more suitable >>> name. >>> - "requested": it shows the extended stats needs something. Of course, >>> we know it needs to ANALYZE because we can create the patch. >>> However, I feel there is a little ambiguity for DBA. >>> To solve this, it would be better to write an explanation of >>> the status in the document. For example, >>> >>> ====== >>> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses. >>> "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE >>> was finished, and the planner can use it. NULL means that it doesn't exists. >>> ====== >>> >>> What do you think? :-D >>> >> >> Yes, that seems reasonable to me. Will you provide an updated patch? > > > Sounds good. I'll send the updated patch today. I updated the patch to add the explanation of the extended stats' statuses. Please feel free to modify the patch to improve it more clearly. The attached files are: 0001: Add psql \dx and the fixed document 0002: Regression test for psql \dX app-psql.html: Created by "make html" command (You can check the explanation of the statuses easily, probably) Thanks, Tatsuro Yamada
Attachment
On Wed, Jan 13, 2021 at 10:22:05AM +0900, Tatsuro Yamada wrote: > Hi Tomas, > > On 2021/01/13 7:48, Tatsuro Yamada wrote: > > On 2021/01/12 20:08, Tomas Vondra wrote: > > > On 1/12/21 2:57 AM, Tatsuro Yamada wrote: > > > > On 2021/01/09 9:01, Tomas Vondra wrote: > > > ...> > > > > > While working on that, I realized that 'defined' might be a bit > > > > > ambiguous, I initially thought it means 'NOT NULL' (which it does not). > > > > > I propose to change it to 'requested' instead. Tatsuro, do you agree, or > > > > > do you think 'defined' is better? > > > > > > > > Regarding the status of extended stats, I think the followings: > > > > > > > > - "defined": it shows the extended stats defined only. We can't know > > > > whether it needs to analyze or not. I agree this name was > > > > ambiguous. Therefore we should replace it with a more suitable > > > > name. > > > > - "requested": it shows the extended stats needs something. Of course, > > > > we know it needs to ANALYZE because we can create the patch. > > > > However, I feel there is a little ambiguity for DBA. > > > > To solve this, it would be better to write an explanation of > > > > the status in the document. For example, > > > > > > > > ====== > > > > The column of the kind of extended stats (e. g. Ndistinct) shows some statuses. > > > > "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE > > > > was finished, and the planner can use it. NULL means that it doesn't exists. > > > > ====== > > > > > > > > What do you think? :-D > > > > > > > > > > Yes, that seems reasonable to me. Will you provide an updated patch? > > > > > > Sounds good. I'll send the updated patch today. > > > > I updated the patch to add the explanation of the extended stats' statuses. > Please feel free to modify the patch to improve it more clearly. > > The attached files are: > 0001: Add psql \dx and the fixed document > 0002: Regression test for psql \dX > app-psql.html: Created by "make html" command (You can check the > explanation of the statuses easily, probably) Hello Yamada-san, I reviewed the patch and don't have specific complaints, it all looks good! I'm however thinking about the "requested" status. I'm wondering if it could lead to people think that an ANALYZE is scheduled and will happen soon. Maybe "defined" or "declared" might be less misleading, or even "waiting for analyze"?
On 1/15/21 9:47 AM, Julien Rouhaud wrote: > On Wed, Jan 13, 2021 at 10:22:05AM +0900, Tatsuro Yamada wrote: >> Hi Tomas, >> >> On 2021/01/13 7:48, Tatsuro Yamada wrote: >>> On 2021/01/12 20:08, Tomas Vondra wrote: >>>> On 1/12/21 2:57 AM, Tatsuro Yamada wrote: >>>>> On 2021/01/09 9:01, Tomas Vondra wrote: >>>> ...> >>>>>> While working on that, I realized that 'defined' might be a bit >>>>>> ambiguous, I initially thought it means 'NOT NULL' (which it does not). >>>>>> I propose to change it to 'requested' instead. Tatsuro, do you agree, or >>>>>> do you think 'defined' is better? >>>>> >>>>> Regarding the status of extended stats, I think the followings: >>>>> >>>>> - "defined": it shows the extended stats defined only. We can't know >>>>> whether it needs to analyze or not. I agree this name was >>>>> ambiguous. Therefore we should replace it with a more suitable >>>>> name. >>>>> - "requested": it shows the extended stats needs something. Of course, >>>>> we know it needs to ANALYZE because we can create the patch. >>>>> However, I feel there is a little ambiguity for DBA. >>>>> To solve this, it would be better to write an explanation of >>>>> the status in the document. For example, >>>>> >>>>> ====== >>>>> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses. >>>>> "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE >>>>> was finished, and the planner can use it. NULL means that it doesn't exists. >>>>> ====== >>>>> >>>>> What do you think? :-D >>>>> >>>> >>>> Yes, that seems reasonable to me. Will you provide an updated patch? >>> >>> >>> Sounds good. I'll send the updated patch today. >> >> >> >> I updated the patch to add the explanation of the extended stats' statuses. >> Please feel free to modify the patch to improve it more clearly. >> >> The attached files are: >> 0001: Add psql \dx and the fixed document >> 0002: Regression test for psql \dX >> app-psql.html: Created by "make html" command (You can check the >> explanation of the statuses easily, probably) > > Hello Yamada-san, > > I reviewed the patch and don't have specific complaints, it all looks good! > > I'm however thinking about the "requested" status. I'm wondering if it could > lead to people think that an ANALYZE is scheduled and will happen soon. > Maybe "defined" or "declared" might be less misleading, or even "waiting for > analyze"? > Well, the "defined" option is not great either, because it can be interpreted as "NOT NULL" - that's why I proposed "requested". Not sure about "declared" - I wouldn't use it in this context, but I'm not a native speaker so maybe it's OK. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/15/21 5:19 PM, Tomas Vondra wrote: > > > On 1/15/21 9:47 AM, Julien Rouhaud wrote: >> On Wed, Jan 13, 2021 at 10:22:05AM +0900, Tatsuro Yamada wrote: >>> Hi Tomas, >>> >>> On 2021/01/13 7:48, Tatsuro Yamada wrote: >>>> On 2021/01/12 20:08, Tomas Vondra wrote: >>>>> On 1/12/21 2:57 AM, Tatsuro Yamada wrote: >>>>>> On 2021/01/09 9:01, Tomas Vondra wrote: >>>>> ...> >>>>>>> While working on that, I realized that 'defined' might be a bit >>>>>>> ambiguous, I initially thought it means 'NOT NULL' (which it does not). >>>>>>> I propose to change it to 'requested' instead. Tatsuro, do you agree, or >>>>>>> do you think 'defined' is better? >>>>>> >>>>>> Regarding the status of extended stats, I think the followings: >>>>>> >>>>>> - "defined": it shows the extended stats defined only. We can't know >>>>>> whether it needs to analyze or not. I agree this name was >>>>>> ambiguous. Therefore we should replace it with a more suitable >>>>>> name. >>>>>> - "requested": it shows the extended stats needs something. Of course, >>>>>> we know it needs to ANALYZE because we can create the patch. >>>>>> However, I feel there is a little ambiguity for DBA. >>>>>> To solve this, it would be better to write an explanation of >>>>>> the status in the document. For example, >>>>>> >>>>>> ====== >>>>>> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses. >>>>>> "requested" means that it needs to gather data by ANALYZE. "built" means ANALYZE >>>>>> was finished, and the planner can use it. NULL means that it doesn't exists. >>>>>> ====== >>>>>> >>>>>> What do you think? :-D >>>>>> >>>>> >>>>> Yes, that seems reasonable to me. Will you provide an updated patch? >>>> >>>> >>>> Sounds good. I'll send the updated patch today. >>> >>> >>> >>> I updated the patch to add the explanation of the extended stats' statuses. >>> Please feel free to modify the patch to improve it more clearly. >>> >>> The attached files are: >>> 0001: Add psql \dx and the fixed document >>> 0002: Regression test for psql \dX >>> app-psql.html: Created by "make html" command (You can check the >>> explanation of the statuses easily, probably) >> >> Hello Yamada-san, >> >> I reviewed the patch and don't have specific complaints, it all looks good! >> >> I'm however thinking about the "requested" status. I'm wondering if it could >> lead to people think that an ANALYZE is scheduled and will happen soon. >> Maybe "defined" or "declared" might be less misleading, or even "waiting for >> analyze"? >> > > Well, the "defined" option is not great either, because it can be > interpreted as "NOT NULL" - that's why I proposed "requested". Not sure > about "declared" - I wouldn't use it in this context, but I'm not a > native speaker so maybe it's OK. > I've pushed this, keeping the "requested". If we decide that some other term is a better choice, we can tweak that later of course. Thanks Tatsuro-san for the patience! regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, hackers. I tested this committed feature. It doesn't seem to be available to non-superusers due to the inability to access pg_statistics_ext_data. Is this the expected behavior? --- operation --- postgres=> CREATE STATISTICS stat1_data1 ON c1, c2 FROM data1; CREATE STATISTICS postgres=> ANALYZE data1; ANALYZE postgres=> SELECT * FROM pg_statistic_ext; oid | stxrelid | stxname | stxnamespace | stxowner | stxstattarget | stxkeys | stxkind -------+----------+-------------+--------------+----------+---------------+---------+--------- 16393 | 16385 | stat1_data1 | 2200 | 16384 | -1 | 1 2 | {d,f,m} (1 row) postgres=> \dX ERROR: permission denied for table pg_statistic_ext_data postgres=> postgres=> \connect postgres postgres You are now connected to database "postgres" as user "postgres". postgres=# postgres=# \dX List of extended statistics Schema | Name | Definition | Ndistinct | Dependencies | MCV --------+-------------+-------------------+-----------+--------------+----------- public | stat1_data1 | c1, c2 FROM data1 | built | built | requested (1 row) --- operation --- Regards, Noriyoshi Shinoda -----Original Message----- From: Tomas Vondra [mailto:tomas.vondra@enterprisedb.com] Sent: Sunday, January 17, 2021 8:32 AM To: Julien Rouhaud <rjuju123@gmail.com>; Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> Cc: Alvaro Herrera <alvherre@2ndquadrant.com>; Tomas Vondra <tomas.vondra@2ndquadrant.com>; Michael Paquier <michael@paquier.xyz>;Pavel Stehule <pavel.stehule@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org> Subject: Re: list of extended statistics on psql On 1/15/21 5:19 PM, Tomas Vondra wrote: > > > On 1/15/21 9:47 AM, Julien Rouhaud wrote: >> On Wed, Jan 13, 2021 at 10:22:05AM +0900, Tatsuro Yamada wrote: >>> Hi Tomas, >>> >>> On 2021/01/13 7:48, Tatsuro Yamada wrote: >>>> On 2021/01/12 20:08, Tomas Vondra wrote: >>>>> On 1/12/21 2:57 AM, Tatsuro Yamada wrote: >>>>>> On 2021/01/09 9:01, Tomas Vondra wrote: >>>>> ...> >>>>>>> While working on that, I realized that 'defined' might be a bit >>>>>>> ambiguous, I initially thought it means 'NOT NULL' (which it does not). >>>>>>> I propose to change it to 'requested' instead. Tatsuro, do you >>>>>>> agree, or do you think 'defined' is better? >>>>>> >>>>>> Regarding the status of extended stats, I think the followings: >>>>>> >>>>>> - "defined": it shows the extended stats defined only. We >>>>>> can't know >>>>>> whether it needs to analyze or not. I agree this >>>>>> name was >>>>>> ambiguous. Therefore we should replace it with a >>>>>> more suitable >>>>>> name. >>>>>> - "requested": it shows the extended stats needs something. Of >>>>>> course, >>>>>> we know it needs to ANALYZE because we can create the patch. >>>>>> However, I feel there is a little ambiguity for DBA. >>>>>> To solve this, it would be better to write an >>>>>> explanation of >>>>>> the status in the document. For example, >>>>>> >>>>>> ====== >>>>>> The column of the kind of extended stats (e. g. Ndistinct) shows some statuses. >>>>>> "requested" means that it needs to gather data by ANALYZE. >>>>>> "built" means ANALYZE >>>>>> was finished, and the planner can use it. NULL means that it doesn't exists. >>>>>> ====== >>>>>> >>>>>> What do you think? :-D >>>>>> >>>>> >>>>> Yes, that seems reasonable to me. Will you provide an updated patch? >>>> >>>> >>>> Sounds good. I'll send the updated patch today. >>> >>> >>> >>> I updated the patch to add the explanation of the extended stats' statuses. >>> Please feel free to modify the patch to improve it more clearly. >>> >>> The attached files are: >>> 0001: Add psql \dx and the fixed document >>> 0002: Regression test for psql \dX >>> app-psql.html: Created by "make html" command (You can check the >>> explanation of the statuses easily, probably) >> >> Hello Yamada-san, >> >> I reviewed the patch and don't have specific complaints, it all looks good! >> >> I'm however thinking about the "requested" status. I'm wondering if >> it could lead to people think that an ANALYZE is scheduled and will happen soon. >> Maybe "defined" or "declared" might be less misleading, or even >> "waiting for analyze"? >> > > Well, the "defined" option is not great either, because it can be > interpreted as "NOT NULL" - that's why I proposed "requested". Not > sure about "declared" - I wouldn't use it in this context, but I'm not > a native speaker so maybe it's OK. > I've pushed this, keeping the "requested". If we decide that some other term is a better choice, we can tweak that laterof course. Thanks Tatsuro-san for the patience! regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/17/21 2:41 AM, Shinoda, Noriyoshi (PN Japan FSIP) wrote: > Hi, hackers. > > I tested this committed feature. > It doesn't seem to be available to non-superusers due to the inability to access pg_statistics_ext_data. > Is this the expected behavior? > Hmmm, that's a good point. Bummer we haven't noticed that earlier. I wonder what the right fix should be - presumably we could do something like pg_stats_ext (we can't use that view directly, because it formats the data, so the sizes are different). But should it list just the stats the user has access to, or should it list everything and leave the inaccessible fields NULL? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/17/21 3:01 AM, Tomas Vondra wrote: > On 1/17/21 2:41 AM, Shinoda, Noriyoshi (PN Japan FSIP) wrote: >> Hi, hackers. >> >> I tested this committed feature. >> It doesn't seem to be available to non-superusers due to the inability >> to access pg_statistics_ext_data. >> Is this the expected behavior? >> > > Hmmm, that's a good point. Bummer we haven't noticed that earlier. > > I wonder what the right fix should be - presumably we could do something > like pg_stats_ext (we can't use that view directly, because it formats > the data, so the sizes are different). > > But should it list just the stats the user has access to, or should it > list everything and leave the inaccessible fields NULL? > I've reverted the commit - once we find the right way to handle this, I'll get it committed again. As for how to deal with this, I can think of about three ways: 1) simplify the command to only print information from pg_statistic_ext (so on information about which stats are built or sizes) 2) extend pg_stats_ext with necessary information (e.g. sizes) 3) create a new system view, with necessary information (so that pg_stats_ext does not need to be modified) 4) add functions returning the necessary information, possibly only for statistics the user can access (similarly to what pg_stats_ext does) Options 2-4 have the obvious disadvantage that this won't work on older releases (we can't add views or functions there). So I'm leaning towards #1 even if that means we have to remove some of the details. We can consider adding that for new releases, though. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Jan 17, 2021 at 03:31:57PM +0100, Tomas Vondra wrote: > I've reverted the commit - once we find the right way to handle this, I'll > get it committed again. Please consider these doc changes for the next iteration. commit 1a69f648ce6c63ebb37b6d8ec7c6539b3cb70787 Author: Justin Pryzby <pryzbyj@telsasoft.com> Date: Sat Jan 16 17:47:35 2021 -0600 doc review: psql \dX 891a1d0bca262ca78564e0fea1eaa5ae544ea5ee diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index aaf55df921..a678a69dfb 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1928,15 +1928,15 @@ testdb=> is specified, only those extended statistics whose names match the pattern are listed. If <literal>+</literal> is appended to the command name, each extended - statistics is listed with its size. + statistic is listed with its size. </para> <para> - The column of the kind of extended stats (e.g. Ndistinct) shows some statuses. + The column of the kind of extended stats (e.g. Ndistinct) shows its status. "requested" means that it needs to collect statistics by <link linkend="sql-analyze"><command>ANALYZE</command></link>. "built" means <link linkend="sql-analyze"><command>ANALYZE</command></link> was - finished, and the planner can use it. NULL means that it doesn't exists. + run, and statistics are available to the planner. NULL means that it doesn't exist. </para> </listitem> </varlistentry>
Hi Julien, On 2021/01/15 17:47, Julien Rouhaud wrote: > Hello Yamada-san, > > I reviewed the patch and don't have specific complaints, it all looks good! > > I'm however thinking about the "requested" status. I'm wondering if it could > lead to people think that an ANALYZE is scheduled and will happen soon. > Maybe "defined" or "declared" might be less misleading, or even "waiting for > analyze"? Thanks for reviewing the patch. Yeah, "waiting for analyze" was preferable but it was a little long to use on the columns. Unfortunately, I couldn't imaginea suitable term. Therefore, I'm keeping it as is. Regards, Tatsuro Yamada
Hi Tomas, On 2021/01/17 8:32, Tomas Vondra wrote: > I've pushed this, keeping the "requested". If we decide that some other term is a better choice, we can tweak that laterof course. > > Thanks Tatsuro-san for the patience! Thanks for taking the time to review the patches. I believe this feature is useful for DBA when they use Extended stats. For example, the execution plan tuning phase and so on. Then, I know the patch was reverted. So, I keep going to fix the patch on the Second iteration. :-D Regards, Tatsuro Yamada
Hi Tomas and Shinoda-san, On 2021/01/17 23:31, Tomas Vondra wrote: > > > On 1/17/21 3:01 AM, Tomas Vondra wrote: >> On 1/17/21 2:41 AM, Shinoda, Noriyoshi (PN Japan FSIP) wrote: >>> Hi, hackers. >>> >>> I tested this committed feature. >>> It doesn't seem to be available to non-superusers due to the inability to access pg_statistics_ext_data. >>> Is this the expected behavior? Ugh. I overlooked the test to check the case of the user hasn't Superuser privilege. The user without the privilege was ableto access pg_statistics_ext. Therefore I supposed that it's also able to access pg_statics_ext_data. Oops. >> Hmmm, that's a good point. Bummer we haven't noticed that earlier. >> >> I wonder what the right fix should be - presumably we could do something like pg_stats_ext (we can't use that view directly,because it formats the data, so the sizes are different). >> >> But should it list just the stats the user has access to, or should it list everything and leave the inaccessible fieldsNULL? >> > > I've reverted the commit - once we find the right way to handle this, I'll get it committed again. > > As for how to deal with this, I can think of about three ways: > > 1) simplify the command to only print information from pg_statistic_ext (so on information about which stats are builtor sizes) > > 2) extend pg_stats_ext with necessary information (e.g. sizes) > > 3) create a new system view, with necessary information (so that pg_stats_ext does not need to be modified) > > 4) add functions returning the necessary information, possibly only for statistics the user can access (similarly to whatpg_stats_ext does) > > Options 2-4 have the obvious disadvantage that this won't work on older releases (we can't add views or functions there).So I'm leaning towards #1 even if that means we have to remove some of the details. We can consider adding that fornew releases, though. Thanks for the useful advice. I go with option 1). The following query is created by using pg_stats_ext instead of pg_statistic_ext and pg_statistic_ext_data. However, I wasconfused about writing a part of the query for calculating MCV size because there are four columns related to MCV. For example, most_common_vals, most_common_val_nulls, most_common_freqs, and most_common_base_freqs. Currently, I don't know how to calculate the size of MCV by using the four columns. Thoughts? :-) =================================================== \connect postgres hoge create table hoge_t(a int, b int); insert into hoge_t select i,i from generate_series(1,100) i; create statistics hoge_t_ext on a, b from hoge_t; SELECT es.statistics_schemaname AS "Schema", es.statistics_name AS "Name", pg_catalog.format('%s FROM %s', (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(s.attname),', ') FROM pg_catalog.unnest(es.attnames) s(attname)), es.tablename) AS "Definition", CASE WHEN es.n_distinct IS NOT NULL THEN 'built' WHEN 'd' = any(es.kinds) THEN 'requested' END AS "Ndistinct", CASE WHEN es.dependencies IS NOT NULL THEN 'built' WHEN 'f' = any(es.kinds) THEN 'requested' END AS "Dependencies", CASE WHEN es.most_common_vals IS NOT NULL THEN 'built' WHEN 'm' = any(es.kinds) THEN 'requested' END AS "MCV", CASE WHEN es.n_distinct IS NOT NULL THEN pg_catalog.pg_size_pretty(pg_catalog.length(es.n_distinct)::bigint) WHEN 'd' = any(es.kinds) THEN '0 bytes' END AS "Ndistinct_size", CASE WHEN es.dependencies IS NOT NULL THEN pg_catalog.pg_size_pretty(pg_catalog.length(es.dependencies)::bigint) WHEN 'f' = any(es.kinds) THEN '0 bytes' END AS "Dependencies_size" FROM pg_catalog.pg_stats_ext es ORDER BY 1, 2; -[ RECORD 1 ]-----+----------------- Schema | public Name | hoge_t_ext Definition | a, b FROM hoge_t Ndistinct | requested Dependencies | requested MCV | requested Ndistinct_size | 0 bytes Dependencies_size | 0 bytes analyze hoge_t; -[ RECORD 1 ]-----+----------------- Schema | public Name | hoge_t_ext Definition | a, b FROM hoge_t Ndistinct | built Dependencies | built MCV | built Ndistinct_size | 13 bytes Dependencies_size | 40 bytes =================================================== Thanks, Tatsuro Yamada
Hi Justin, On 2021/01/18 1:52, Justin Pryzby wrote: > On Sun, Jan 17, 2021 at 03:31:57PM +0100, Tomas Vondra wrote: >> I've reverted the commit - once we find the right way to handle this, I'll >> get it committed again. > > Please consider these doc changes for the next iteration. > > commit 1a69f648ce6c63ebb37b6d8ec7c6539b3cb70787 > Author: Justin Pryzby <pryzbyj@telsasoft.com> > Date: Sat Jan 16 17:47:35 2021 -0600 > > doc review: psql \dX 891a1d0bca262ca78564e0fea1eaa5ae544ea5ee Thanks for your comments! It helps a lot since I'm not a native speaker. I'll fix the document based on your suggestion on the next patch. > diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml > index aaf55df921..a678a69dfb 100644 > --- a/doc/src/sgml/ref/psql-ref.sgml > +++ b/doc/src/sgml/ref/psql-ref.sgml > @@ -1928,15 +1928,15 @@ testdb=> > is specified, only those extended statistics whose names match the > pattern are listed. > If <literal>+</literal> is appended to the command name, each extended > - statistics is listed with its size. > + statistic is listed with its size. Agreed. > <para> > - The column of the kind of extended stats (e.g. Ndistinct) shows some statuses. > + The column of the kind of extended stats (e.g. Ndistinct) shows its status. > "requested" means that it needs to collect statistics by <link > linkend="sql-analyze"><command>ANALYZE</command></link>. > "built" means <link linkend="sql-analyze"><command>ANALYZE</command></link> was Agreed. > - finished, and the planner can use it. NULL means that it doesn't exists. > + run, and statistics are available to the planner. NULL means that it doesn't exist. Agreed. Thanks, Tatsuro Yamada
On 1/18/21 8:31 AM, Tatsuro Yamada wrote: > Hi Tomas and Shinoda-san, > > On 2021/01/17 23:31, Tomas Vondra wrote: >> >> >> On 1/17/21 3:01 AM, Tomas Vondra wrote: >>> On 1/17/21 2:41 AM, Shinoda, Noriyoshi (PN Japan FSIP) wrote: >>>> Hi, hackers. >>>> >>>> I tested this committed feature. >>>> It doesn't seem to be available to non-superusers due to the >>>> inability to access pg_statistics_ext_data. >>>> Is this the expected behavior? > > > Ugh. I overlooked the test to check the case of the user hasn't > Superuser privilege. The user without the privilege was able to access > pg_statistics_ext. Therefore I supposed that it's also able to access > pg_statics_ext_data. Oops. > > >>> Hmmm, that's a good point. Bummer we haven't noticed that earlier. >>> >>> I wonder what the right fix should be - presumably we could do >>> something like pg_stats_ext (we can't use that view directly, because >>> it formats the data, so the sizes are different). >>> >>> But should it list just the stats the user has access to, or should >>> it list everything and leave the inaccessible fields NULL? >>> >> >> I've reverted the commit - once we find the right way to handle this, >> I'll get it committed again. >> >> As for how to deal with this, I can think of about three ways: >> >> 1) simplify the command to only print information from >> pg_statistic_ext (so on information about which stats are built or sizes) >> >> 2) extend pg_stats_ext with necessary information (e.g. sizes) >> >> 3) create a new system view, with necessary information (so that >> pg_stats_ext does not need to be modified) >> >> 4) add functions returning the necessary information, possibly only >> for statistics the user can access (similarly to what pg_stats_ext does) >> >> Options 2-4 have the obvious disadvantage that this won't work on >> older releases (we can't add views or functions there). So I'm leaning >> towards #1 even if that means we have to remove some of the details. >> We can consider adding that for new releases, though. > > > Thanks for the useful advice. I go with option 1). > The following query is created by using pg_stats_ext instead of > pg_statistic_ext and pg_statistic_ext_data. However, I was confused > about writing a part of the query for calculating MCV size because > there are four columns related to MCV. For example, most_common_vals, > most_common_val_nulls, most_common_freqs, and most_common_base_freqs. > Currently, I don't know how to calculate the size of MCV by using the > four columns. Thoughts? :-) Well, my suggestion was to use pg_statistic_ext, because that lists all statistics, while pg_stats_ext is filtering statistics depending on access privileges. I think that's more appropriate for \dX, the contents should not change depending on the user. Also, let me clarify - with option (1) we'd not show the sizes at all. The size of the formatted statistics may be very different from the on-disk representation, so I see no point in showing it in \dX. We might show other stats (e.g. number of MCV items, or the fraction of data represented by the MCV list), but the user can inspect pg_stats_ext if needed. What we might do is to show those stats when a superuser is running this command, but I'm not sure that's a good idea (or how difficult would it be to implement). regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Tomas, >>> As for how to deal with this, I can think of about three ways: >>> >>> 1) simplify the command to only print information from pg_statistic_ext (so on information about which stats are builtor sizes) >>> >>> 2) extend pg_stats_ext with necessary information (e.g. sizes) >>> >>> 3) create a new system view, with necessary information (so that pg_stats_ext does not need to be modified) >>> >>> 4) add functions returning the necessary information, possibly only for statistics the user can access (similarly towhat pg_stats_ext does) >>> >>> Options 2-4 have the obvious disadvantage that this won't work on older releases (we can't add views or functions there).So I'm leaning towards #1 even if that means we have to remove some of the details. We can consider adding that fornew releases, though. >> >> >> Thanks for the useful advice. I go with option 1). >> The following query is created by using pg_stats_ext instead of pg_statistic_ext and pg_statistic_ext_data. However, Iwas confused >> about writing a part of the query for calculating MCV size because >> there are four columns related to MCV. For example, most_common_vals, most_common_val_nulls, most_common_freqs, and most_common_base_freqs. >> Currently, I don't know how to calculate the size of MCV by using the >> four columns. Thoughts? :-) > > Well, my suggestion was to use pg_statistic_ext, because that lists all statistics, while pg_stats_ext is filtering statisticsdepending on access privileges. I think that's more appropriate for \dX, the contents should not change dependingon the user. > > Also, let me clarify - with option (1) we'd not show the sizes at all. The size of the formatted statistics may be verydifferent from the on-disk representation, so I see no point in showing it in \dX. > > We might show other stats (e.g. number of MCV items, or the fraction of data represented by the MCV list), but the usercan inspect pg_stats_ext if needed. > > What we might do is to show those stats when a superuser is running this command, but I'm not sure that's a good idea (orhow difficult would it be to implement). Thanks for clarifying. I see that your suggestion was to use pg_statistic_ext, not pg_stats_ext. And we don't need the size of stats. If that's the case, we also can't get the status of stats since PG12 or later because we can't use pg_statistic_ext_data, as you know. Therefore, it would be better to replace the query with the old query that I sent five months ago like this: # the old query SELECT stxnamespace::pg_catalog.regnamespace AS "Schema", stxrelid::pg_catalog.regclass AS "Table", stxname AS "Name", (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ') FROM pg_catalog.unnest(stxkeys) s(attnum) JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND a.attnum = s.attnum AND NOT attisdropped)) AS "Columns", 'd' = any(stxkind) AS "Ndistinct", 'f' = any(stxkind) AS "Dependencies", 'm' = any(stxkind) AS "MCV" FROM pg_catalog.pg_statistic_ext stat ORDER BY 1,2; Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV --------+--------+------------+---------+-----------+--------------+----- public | hoge1 | hoge1_ext | a, b | t | t | t public | hoge_t | hoge_t_ext | a, b | t | t | t (2 rows) The above query is so simple so that we would better to use the following query: # This query works on PG10 or later SELECT es.stxnamespace::pg_catalog.regnamespace::text AS "Schema", es.stxname AS "Name", pg_catalog.format('%s FROM %s', (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ') FROM pg_catalog.unnest(es.stxkeys) s(attnum) JOIN pg_catalog.pg_attribute a ON (es.stxrelid = a.attrelid AND a.attnum = s.attnum AND NOT a.attisdropped)), es.stxrelid::regclass) AS "Definition", CASE WHEN 'd' = any(es.stxkind) THEN 'defined' END AS "Ndistinct", CASE WHEN 'f' = any(es.stxkind) THEN 'defined' END AS "Dependencies", CASE WHEN 'm' = any(es.stxkind) THEN 'defined' END AS "MCV" FROM pg_catalog.pg_statistic_ext es ORDER BY 1, 2; Schema | Name | Definition | Ndistinct | Dependencies | Dependencies --------+------------+------------------+-----------+--------------+-------------- public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined public | hoge_t_ext | a, b FROM hoge_t | defined | defined | defined (2 rows) I'm going to create the WIP patch to use the above queriy. Any comments welcome. :-D Thanks, Tatsuro Yamada
On 1/19/21 1:44 AM, Tatsuro Yamada wrote: > Hi Tomas, > >>>> As for how to deal with this, I can think of about three ways: >>>> >>>> 1) simplify the command to only print information from >>>> pg_statistic_ext (so on information about which stats are built or >>>> sizes) >>>> >>>> 2) extend pg_stats_ext with necessary information (e.g. sizes) >>>> >>>> 3) create a new system view, with necessary information (so that >>>> pg_stats_ext does not need to be modified) >>>> >>>> 4) add functions returning the necessary information, possibly only >>>> for statistics the user can access (similarly to what pg_stats_ext >>>> does) >>>> >>>> Options 2-4 have the obvious disadvantage that this won't work on >>>> older releases (we can't add views or functions there). So I'm >>>> leaning towards #1 even if that means we have to remove some of the >>>> details. We can consider adding that for new releases, though. >>> >>> >>> Thanks for the useful advice. I go with option 1). >>> The following query is created by using pg_stats_ext instead of >>> pg_statistic_ext and pg_statistic_ext_data. However, I was confused >>> about writing a part of the query for calculating MCV size because >>> there are four columns related to MCV. For example, most_common_vals, >>> most_common_val_nulls, most_common_freqs, and most_common_base_freqs. >>> Currently, I don't know how to calculate the size of MCV by using the >>> four columns. Thoughts? :-) >> >> Well, my suggestion was to use pg_statistic_ext, because that lists >> all statistics, while pg_stats_ext is filtering statistics depending >> on access privileges. I think that's more appropriate for \dX, the >> contents should not change depending on the user. >> >> Also, let me clarify - with option (1) we'd not show the sizes at all. >> The size of the formatted statistics may be very different from the >> on-disk representation, so I see no point in showing it in \dX. >> >> We might show other stats (e.g. number of MCV items, or the fraction >> of data represented by the MCV list), but the user can inspect >> pg_stats_ext if needed. >> >> What we might do is to show those stats when a superuser is running >> this command, but I'm not sure that's a good idea (or how difficult >> would it be to implement). > > > Thanks for clarifying. > I see that your suggestion was to use pg_statistic_ext, not pg_stats_ext. > And we don't need the size of stats. > > If that's the case, we also can't get the status of stats since PG12 or > later > because we can't use pg_statistic_ext_data, as you know. Therefore, it > would be > better to replace the query with the old query that I sent five months > ago like this: > > # the old query > SELECT > stxnamespace::pg_catalog.regnamespace AS "Schema", > stxrelid::pg_catalog.regclass AS "Table", > stxname AS "Name", > (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ') > FROM pg_catalog.unnest(stxkeys) s(attnum) > JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND > a.attnum = s.attnum AND NOT attisdropped)) AS "Columns", > 'd' = any(stxkind) AS "Ndistinct", > 'f' = any(stxkind) AS "Dependencies", > 'm' = any(stxkind) AS "MCV" > FROM pg_catalog.pg_statistic_ext stat > ORDER BY 1,2; > > Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV > --------+--------+------------+---------+-----------+--------------+----- > public | hoge1 | hoge1_ext | a, b | t | t | t > public | hoge_t | hoge_t_ext | a, b | t | t | t > (2 rows) > > > The above query is so simple so that we would better to use the > following query: > > # This query works on PG10 or later > SELECT > es.stxnamespace::pg_catalog.regnamespace::text AS "Schema", > es.stxname AS "Name", > pg_catalog.format('%s FROM %s', > (SELECT > pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ') > FROM pg_catalog.unnest(es.stxkeys) s(attnum) > JOIN pg_catalog.pg_attribute a > ON (es.stxrelid = a.attrelid > AND a.attnum = s.attnum > AND NOT a.attisdropped)), > es.stxrelid::regclass) AS "Definition", > CASE WHEN 'd' = any(es.stxkind) THEN 'defined' > END AS "Ndistinct", > CASE WHEN 'f' = any(es.stxkind) THEN 'defined' > END AS "Dependencies", > CASE WHEN 'm' = any(es.stxkind) THEN 'defined' > END AS "MCV" > FROM pg_catalog.pg_statistic_ext es > ORDER BY 1, 2; > > Schema | Name | Definition | Ndistinct | Dependencies | > Dependencies > --------+------------+------------------+-----------+--------------+-------------- > > public | hoge1_ext | a, b FROM hoge1 | defined | defined | > defined > public | hoge_t_ext | a, b FROM hoge_t | defined | defined | > defined > (2 rows) > > > I'm going to create the WIP patch to use the above queriy. > Any comments welcome. :-D > Yes, I think using this simpler query makes sense. If we decide we need something more elaborate, we can improve that by in future PostgreSQL versions (after adding view/function to core), but I'd leave that as a work for the future. Apologies for all the extra work - I haven't realized this flaw when pushing for showing more stuff :-( regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, > The above query is so simple so that we would better to use the following query: > > # This query works on PG10 or later > SELECT > es.stxnamespace::pg_catalog.regnamespace::text AS "Schema", > es.stxname AS "Name", > pg_catalog.format('%s FROM %s', > (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ') > FROM pg_catalog.unnest(es.stxkeys) s(attnum) > JOIN pg_catalog.pg_attribute a > ON (es.stxrelid = a.attrelid > AND a.attnum = s.attnum > AND NOT a.attisdropped)), > es.stxrelid::regclass) AS "Definition", > CASE WHEN 'd' = any(es.stxkind) THEN 'defined' > END AS "Ndistinct", > CASE WHEN 'f' = any(es.stxkind) THEN 'defined' > END AS "Dependencies", > CASE WHEN 'm' = any(es.stxkind) THEN 'defined' > END AS "MCV" > FROM pg_catalog.pg_statistic_ext es > ORDER BY 1, 2; > > Schema | Name | Definition | Ndistinct | Dependencies | Dependencies > --------+------------+------------------+-----------+--------------+-------------- > public | hoge1_ext | a, b FROM hoge1 | defined | defined | defined > public | hoge_t_ext | a, b FROM hoge_t | defined | defined | defined > (2 rows) > > > I'm going to create the WIP patch to use the above query. > Any comments welcome. :-D Attached patch is WIP patch. The changes are: - Use pg_statistic_ext only - Remove these statuses: "required" and "built" - Add new status: "defined" - Remove the size columns - Fix document I'll create and send the regression test on the next patch if there is no objection. Is it Okay? Regards, Tatsuro Yamada
Attachment
Hi Tomas, On 2021/01/19 11:52, Tomas Vondra wrote: > >> I'm going to create the WIP patch to use the above queriy. >> Any comments welcome. :-D > > Yes, I think using this simpler query makes sense. If we decide we need something more elaborate, we can improve that byin future PostgreSQL versions (after adding view/function to core), but I'd leave that as a work for the future. I see, thanks! > Apologies for all the extra work - I haven't realized this flaw when pushing for showing more stuff :-( Don't worry about it. We didn't notice the problem even when viewed by multiple people on -hackers. Let's keep moving forward. :-D I'll send a patch including a regression test on the next patch. Regards, Tatsuro Yamada
Hi Tomas, On 2021/01/20 11:35, Tatsuro Yamada wrote: >> Apologies for all the extra work - I haven't realized this flaw when pushing for showing more stuff :-( > > Don't worry about it. We didn't notice the problem even when viewed by multiple > people on -hackers. Let's keep moving forward. :-D > > I'll send a patch including a regression test on the next patch. I created patches and my test results on PG10, 11, 12, and 14 are fine. 0001: - Fix query to use pg_statistic_ext only - Replace statuses "required" and "built" with "defined" - Remove the size columns - Fix document - Add schema name as a filter condition on the query 0002: - Fix all results of \dX - Add new testcase by non-superuser Please find attached files. :-D Regards, Tatsuro Yamada
Attachment
On 1/20/21 7:41 AM, Tatsuro Yamada wrote: > Hi Tomas, > > On 2021/01/20 11:35, Tatsuro Yamada wrote: >>> Apologies for all the extra work - I haven't realized this flaw when >>> pushing for showing more stuff :-( >> >> Don't worry about it. We didn't notice the problem even when viewed by >> multiple >> people on -hackers. Let's keep moving forward. :-D >> >> I'll send a patch including a regression test on the next patch. > > > I created patches and my test results on PG10, 11, 12, and 14 are fine. > > 0001: > - Fix query to use pg_statistic_ext only > - Replace statuses "required" and "built" with "defined" > - Remove the size columns > - Fix document > - Add schema name as a filter condition on the query > > 0002: > - Fix all results of \dX > - Add new testcase by non-superuser > > Please find attached files. :-D Thanks, I've pushed this. I had to tweak the regression tests a bit, for two reasons: 1) to change user in regression tests, don't use \connect, but SET ROLE and RESET ROLE 2) roles in regression tests should use names with regress_ prefix regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Tomas and hackers, On 2021/01/21 7:00, Tomas Vondra wrote: >> I created patches and my test results on PG10, 11, 12, and 14 are fine. >> >> 0001: >> - Fix query to use pg_statistic_ext only >> - Replace statuses "required" and "built" with "defined" >> - Remove the size columns >> - Fix document >> - Add schema name as a filter condition on the query >> >> 0002: >> - Fix all results of \dX >> - Add new testcase by non-superuser >> >> Please find attached files. :-D > > Thanks, I've pushed this. I had to tweak the regression tests a bit, for two reasons: > > 1) to change user in regression tests, don't use \connect, but SET ROLE and RESET ROLE > > 2) roles in regression tests should use names with regress_ prefix Thanks for reviewing many times and committing the feature! I understood 1) and 2). I'll keep that in mind for the next developing patch. Then, If possible, could you add Justin to the commit message as a reviewer? Because I revised the document partly based on his comments. Finally, As extended stats were more used, this feature becomes more useful. I hope it helps DBA. :-D Thanks, Tatsuro Yamada
On 5/30/21 7:24 PM, Justin Pryzby wrote: > On Wed, Jan 20, 2021 at 11:00:50PM +0100, Tomas Vondra wrote: >> Thanks, I've pushed this. I had to tweak the regression tests a bit, for two >> reasons: > > \dX isn't checking schema visibility rules, so accidentally shows stats objects > outside of the search path. I noticed after installing the PG14b1 client, > since we create stats objects in a separate schema to allow excluding them with > pg_dump -N. > > diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c > index 195f8d8cd2..e29f13c65e 100644 > --- a/src/bin/psql/describe.c > +++ b/src/bin/psql/describe.c > @@ -4774,7 +4774,7 @@ listExtendedStats(const char *pattern) > processSQLNamePattern(pset.db, &buf, pattern, > false, false, > "es.stxnamespace::pg_catalog.regnamespace::text", "es.stxname", > - NULL, NULL); > + NULL, "pg_catalog.pg_statistics_obj_is_visible(es.oid)"); > > appendPQExpBufferStr(&buf, "ORDER BY 1, 2;"); > Thanks for noticing this! Will push. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, Here's a slightly more complete patch, tweaking the regression tests a bit to detect this. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Hi Tomas and Justin, On 2021/06/07 4:47, Tomas Vondra wrote: > Here's a slightly more complete patch, tweaking the regression tests a > bit to detect this. I tested your patch on PG14beta2 and PG15devel. And they work fine. ======================= All 209 tests passed. ======================= Next time I create a feature on psql, I will be careful to add a check for schema visibility rules. :-D Thanks, Tatsuro Yamada
Hi, I've pushed the last version of the fix, including the regression tests etc. Backpatch to 14, where \dX was introduced. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Tomas and Justin, On 2021/07/27 4:26, Tomas Vondra wrote: > Hi, > > I've pushed the last version of the fix, including the regression tests etc. Backpatch to 14, where \dX was introduced. Thank you! Regards, Tatsuro Yamada