Thread: Most-common value docs in PG 12

Most-common value docs in PG 12

From
Bruce Momjian
Date:
Our docs for most-common values in PG 12 has:

-->    CREATE STATISTICS stts3 (mcv) ON state, city FROM zipcodes;
    
    ANALYZE zipcodes;
    
    SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
                    pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts3';
    
     index |         values         | nulls | frequency | base_frequency
    -------+------------------------+-------+-----------+----------------
-->         0 | {Washington, DC}       | {f,f} |  0.003467 |        2.7e-05
         1 | {Apo, AE}              | {f,f} |  0.003067 |        1.9e-05
         2 | {Houston, TX}          | {f,f} |  0.002167 |       0.000133
         3 | {El Paso, TX}          | {f,f} |     0.002 |       0.000113

It seems pg_mcv_list_items() reports the column names in the order they
appear in the table, not in the order they appear in the CREATE
STATISTICS statement.  Same for psql \d:

    \d zipcodes
                 Table "public.zipcodes"
     Column  | Type | Collation | Nullable | Default
    ---------+------+-----------+----------+---------
     city    | text |           |          |
     state   | text |           |          |
     zipcode | text |           |          |
    Statistics objects:
-->        "public"."stts3" (mcv) ON city, state FROM zipcodes


If this is so, why don't we show the CREATE STATISTICS example as
city/state, and not state/city?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Most-common value docs in PG 12

From
Tomas Vondra
Date:
On Wed, Aug 28, 2019 at 12:22:38PM -0400, Bruce Momjian wrote:
>Our docs for most-common values in PG 12 has:
>
>-->    CREATE STATISTICS stts3 (mcv) ON state, city FROM zipcodes;
>    
>    ANALYZE zipcodes;
>    
>    SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
>                    pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts3';
>    
>     index |         values         | nulls | frequency | base_frequency
>    -------+------------------------+-------+-----------+----------------
>-->         0 | {Washington, DC}       | {f,f} |  0.003467 |        2.7e-05
>         1 | {Apo, AE}              | {f,f} |  0.003067 |        1.9e-05
>         2 | {Houston, TX}          | {f,f} |  0.002167 |       0.000133
>         3 | {El Paso, TX}          | {f,f} |     0.002 |       0.000113
>
>It seems pg_mcv_list_items() reports the column names in the order they
>appear in the table, not in the order they appear in the CREATE
>STATISTICS statement.  Same for psql \d:
>
>    \d zipcodes
>                 Table "public.zipcodes"
>     Column  | Type | Collation | Nullable | Default
>    ---------+------+-----------+----------+---------
>     city    | text |           |          |
>     state   | text |           |          |
>     zipcode | text |           |          |
>    Statistics objects:
>-->        "public"."stts3" (mcv) ON city, state FROM zipcodes
>
>
>If this is so, why don't we show the CREATE STATISTICS example as
>city/state, and not state/city?
>

Yes, we deduplicate the attributes and store them sorted by attnum. I'm
not sure it makes sense to change the example to match this order, which
is mostly an implementation detail, though. It might be better to point
out the order may not exactly match CREATE STATISTICS, and point users to
what e.g. "\d" shows (because that will show the order as stored in the
system catalog).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Most-common value docs in PG 12

From
Bruce Momjian
Date:
On Wed, Aug 28, 2019 at 08:25:41PM +0200, Tomas Vondra wrote:
> On Wed, Aug 28, 2019 at 12:22:38PM -0400, Bruce Momjian wrote:
> > If this is so, why don't we show the CREATE STATISTICS example as
> > city/state, and not state/city?
> 
> Yes, we deduplicate the attributes and store them sorted by attnum. I'm
> not sure it makes sense to change the example to match this order, which
> is mostly an implementation detail, though. It might be better to point
> out the order may not exactly match CREATE STATISTICS, and point users to
> what e.g. "\d" shows (because that will show the order as stored in the
> system catalog).

OK, how is this patch?  I didn't mention psql since I think everyone
expects psql to show all information about tables and indexes.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachment

Re: Most-common value docs in PG 12

From
Alvaro Herrera
Date:
On 2019-Aug-30, Bruce Momjian wrote:

> OK, how is this patch?  I didn't mention psql since I think everyone
> expects psql to show all information about tables and indexes.

Why would you change perform.sgml?  It seems unnecessary; the commands
shown work fine.

This part seems okay:

> diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
> index ec23a154d6..5b583aacb4 100644
> --- a/doc/src/sgml/ref/create_statistics.sgml
> +++ b/doc/src/sgml/ref/create_statistics.sgml
> @@ -98,7 +98,8 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="parameter">statistics_na
>      <listitem>
>       <para>
>        The name of a table column to be covered by the computed statistics.
> -      At least two column names must be given.
> +      At least two column names must be given;  the order of the column names
> +      is insignificant.
>       </para>
>      </listitem>
>     </varlistentry>

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Most-common value docs in PG 12

From
Bruce Momjian
Date:
On Wed, Sep 11, 2019 at 06:15:22PM -0300, Alvaro Herrera wrote:
> On 2019-Aug-30, Bruce Momjian wrote:
> 
> > OK, how is this patch?  I didn't mention psql since I think everyone
> > expects psql to show all information about tables and indexes.
> 
> Why would you change perform.sgml?  It seems unnecessary; the commands
> shown work fine.

I realize they work fine, but the ordering in the examples not matching
the defined order suggests that ordering matters, but it does not.

---------------------------------------------------------------------------


> 
> This part seems okay:
> 
> > diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
> > index ec23a154d6..5b583aacb4 100644
> > --- a/doc/src/sgml/ref/create_statistics.sgml
> > +++ b/doc/src/sgml/ref/create_statistics.sgml
> > @@ -98,7 +98,8 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="parameter">statistics_na
> >      <listitem>
> >       <para>
> >        The name of a table column to be covered by the computed statistics.
> > -      At least two column names must be given.
> > +      At least two column names must be given;  the order of the column names
> > +      is insignificant.
> >       </para>
> >      </listitem>
> >     </varlistentry>
> 
> -- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Most-common value docs in PG 12

From
Alvaro Herrera
Date:
On 2019-Sep-26, Bruce Momjian wrote:

> On Wed, Sep 11, 2019 at 06:15:22PM -0300, Alvaro Herrera wrote:
> > On 2019-Aug-30, Bruce Momjian wrote:
> > 
> > > OK, how is this patch?  I didn't mention psql since I think everyone
> > > expects psql to show all information about tables and indexes.
> > 
> > Why would you change perform.sgml?  It seems unnecessary; the commands
> > shown work fine.
> 
> I realize they work fine, but the ordering in the examples not matching
> the defined order suggests that ordering matters, but it does not.

Well, I mean exactly the other way around: the fact that the orders
don't match illustrates that the order is not important.  And that is
reinforced by the explanation indicating explicitly that it does not
matter:

> > >       <para>
> > >        The name of a table column to be covered by the computed statistics.
> > > -      At least two column names must be given.
> > > +      At least two column names must be given;  the order of the column names
> > > +      is insignificant.
> > >       </para>

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Most-common value docs in PG 12

From
Bruce Momjian
Date:
On Thu, Sep 26, 2019 at 05:17:55PM -0300, Alvaro Herrera wrote:
> On 2019-Sep-26, Bruce Momjian wrote:
> 
> > On Wed, Sep 11, 2019 at 06:15:22PM -0300, Alvaro Herrera wrote:
> > > On 2019-Aug-30, Bruce Momjian wrote:
> > > 
> > > > OK, how is this patch?  I didn't mention psql since I think everyone
> > > > expects psql to show all information about tables and indexes.
> > > 
> > > Why would you change perform.sgml?  It seems unnecessary; the commands
> > > shown work fine.
> > 
> > I realize they work fine, but the ordering in the examples not matching
> > the defined order suggests that ordering matters, but it does not.
> 
> Well, I mean exactly the other way around: the fact that the orders
> don't match illustrates that the order is not important.  And that is
> reinforced by the explanation indicating explicitly that it does not
> matter:

Uh, people normally list things in defined order, so you would usually
not list them in non-defined order unless there is a purpose.  Doing
that just to illustrate the order doesn't matter seems odd.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Most-common value docs in PG 12

From
Tomas Vondra
Date:
On Thu, Sep 26, 2019 at 04:20:59PM -0400, Bruce Momjian wrote:
>On Thu, Sep 26, 2019 at 05:17:55PM -0300, Alvaro Herrera wrote:
>> On 2019-Sep-26, Bruce Momjian wrote:
>>
>> > On Wed, Sep 11, 2019 at 06:15:22PM -0300, Alvaro Herrera wrote:
>> > > On 2019-Aug-30, Bruce Momjian wrote:
>> > >
>> > > > OK, how is this patch?  I didn't mention psql since I think everyone
>> > > > expects psql to show all information about tables and indexes.
>> > >
>> > > Why would you change perform.sgml?  It seems unnecessary; the commands
>> > > shown work fine.
>> >
>> > I realize they work fine, but the ordering in the examples not matching
>> > the defined order suggests that ordering matters, but it does not.
>>
>> Well, I mean exactly the other way around: the fact that the orders
>> don't match illustrates that the order is not important.  And that is
>> reinforced by the explanation indicating explicitly that it does not
>> matter:
>
>Uh, people normally list things in defined order, so you would usually
>not list them in non-defined order unless there is a purpose.  Doing
>that just to illustrate the order doesn't matter seems odd.
>

Well, that assumes there is a definition, and I don't think the zipcodes
table is defined anywhere. So how do you know in what order are those
columns defined?

Now, maybe the table should be defined somewhere in perform.sgml - I
don't recall why exactly I chose not to do that, maybe because there is
no universal definition (one country uses text, another number, ...).

I do however agree that had there been such definition, it's probably
natural to list columns in the same order. We know the order is not
important, the proposed patch states that explicitly, but this just
feels natural.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Most-common value docs in PG 12

From
Bruce Momjian
Date:
On Thu, Sep 26, 2019 at 11:03:54PM +0200, Tomas Vondra wrote:
> On Thu, Sep 26, 2019 at 04:20:59PM -0400, Bruce Momjian wrote:
> > Uh, people normally list things in defined order, so you would usually
> > not list them in non-defined order unless there is a purpose.  Doing
> > that just to illustrate the order doesn't matter seems odd.
> > 
> 
> Well, that assumes there is a definition, and I don't think the zipcodes
> table is defined anywhere. So how do you know in what order are those
> columns defined?

In the USA, it is usually specific to general, i.e., city, state.

> Now, maybe the table should be defined somewhere in perform.sgml - I
> don't recall why exactly I chose not to do that, maybe because there is
> no universal definition (one country uses text, another number, ...)

Yeah, doesn't seem worth adding.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Most-common value docs in PG 12

From
Tomas Vondra
Date:
On Thu, Sep 26, 2019 at 05:31:07PM -0400, Bruce Momjian wrote:
>On Thu, Sep 26, 2019 at 11:03:54PM +0200, Tomas Vondra wrote:
>> On Thu, Sep 26, 2019 at 04:20:59PM -0400, Bruce Momjian wrote:
>> > Uh, people normally list things in defined order, so you would usually
>> > not list them in non-defined order unless there is a purpose.  Doing
>> > that just to illustrate the order doesn't matter seems odd.
>> >
>>
>> Well, that assumes there is a definition, and I don't think the zipcodes
>> table is defined anywhere. So how do you know in what order are those
>> columns defined?
>
>In the USA, it is usually specific to general, i.e., city, state.
>

I'd probably define it the same way, but for example the zipcode data
sets I usually use for my talks [1] defines it like this:

  postal code       : varchar(20)
  place name        : varchar(180)
  admin name1       : 1. order subdivision (state) varchar(100)
  admin code1       : 1. order subdivision (state) varchar(20)
  admin name2       : 2. order subdivision (county/province) varchar(100)
  admin code2       : 2. order subdivision (county/province) varchar(20)
  admin name3       : 3. order subdivision (community) varchar(100)
  admin code3       : 3. order subdivision (community) varchar(20)
  latitude          : estimated latitude (wgs84)
  longitude         : estimated longitude (wgs84)
  accuracy          : accuracy of lat/lng

so in this case it's a bit of a mix of specific vs. general first. 

[1] http://download.geonames.org/export/zip/

>> Now, maybe the table should be defined somewhere in perform.sgml - I
>> don't recall why exactly I chose not to do that, maybe because there is
>> no universal definition (one country uses text, another number, ...)
>
>Yeah, doesn't seem worth adding.
>

OK.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Most-common value docs in PG 12

From
Bruce Momjian
Date:
On Fri, Sep 27, 2019 at 01:30:49PM +0200, Tomas Vondra wrote:
> > > Now, maybe the table should be defined somewhere in perform.sgml - I
> > > don't recall why exactly I chose not to do that, maybe because there is
> > > no universal definition (one country uses text, another number, ...)
> > 
> > Yeah, doesn't seem worth adding.
> > 
> 
> OK.

Patch applied through PG 12.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +