Thread: Correlation in pg_stats

Correlation in pg_stats

From
classical_89
Date:
Hi,First sorry for my bad English :D.I'm new in PostgreSQL and database .Now
i'm researching about statistic correlation .I want to make a example with
the table -assume that I have tbl_test table and two column - id & name -
and do something to get correlation of a column is near 0 (zero).What step
can i do  ?
Thanks for your support :)



--
View this message in context: http://postgresql.1045698.n5.nabble.com/Correlation-in-pg-stats-tp5733524.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Correlation in pg_stats

From
Christian Hammers
Date:
Hello

Do you want to search for tables with many unordered rows?

  SELECT
    schemaname, tablename, correlation
  FROM
    pg_stats
  WHERE
    abs(correlation) < 0.2 and
    schemaname not in ('pg_catalog', 'information_schema')
  ;

Or do you want to put the table in order?

  BEGIN;
  CREATE TABLE tbl_test_new (LIKE tbl_test INCLUDING ALL);
  INSERT INTO tbl_test_new SELECT * FROM tbl_test ORDER BY id;
  DROP TABLE tbl_test;
  TABLE tbl_test_new RENAME TO tbl_test;
  COMMIT;

If you really want to get a correlation of near 0 try

  INSERT INTO tbl_test (name) SELECT random() FROM generate_series(1, 100);

bye,

-christian-

Am Mon, 26 Nov 2012 01:52:21 -0800 (PST)
schrieb classical_89 <luongnx512@gmail.com>:

> Hi,First sorry for my bad English :D.I'm new in PostgreSQL and
> database .Now i'm researching about statistic correlation .I want to
> make a example with the table -assume that I have tbl_test table and
> two column - id & name - and do something to get correlation of a
> column is near 0 (zero).What step can i do  ?
> Thanks for your support :)
>
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/Correlation-in-pg-stats-tp5733524.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>


Re: Correlation in pg_stats

From
classical_89
Date:
Thanks , i just want to get a correlation of near 0 to understand exactly
what correlation , i quite ambiguity about this concept . /
Statistical correlation between physical row ordering and logical ordering
of the column values. This ranges from -1 to +1. When the value is near -1
or +1, an index scan on the column will be estimated to be cheaper than when
it is near zero, due to reduction of random access to the disk. (This column
is null if the column data type does not have a < operator.)
/
I can not explain exactly what is/ physical row ordering and logical
ordering/ of the column values , can you explain to me with an simple
example ??




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Correlation-in-pg-stats-tp5733524p5733655.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Correlation in pg_stats

From
Gavin Flower
Date:
On 27/11/12 14:23, classical_89 wrote:
Thanks , i just want to get a correlation of near 0 to understand exactly
what correlation , i quite ambiguity about this concept . /
Statistical correlation between physical row ordering and logical ordering
of the column values. This ranges from -1 to +1. When the value is near -1
or +1, an index scan on the column will be estimated to be cheaper than when
it is near zero, due to reduction of random access to the disk. (This column
is null if the column data type does not have a < operator.)
/
I can not explain exactly what is/ physical row ordering and logical
ordering/ of the column values , can you explain to me with an simple
example ?? 




--
View this message in context: http://postgresql.1045698.n5.nabble.com/Correlation-in-pg-stats-tp5733524p5733655.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Logical ordering depends on what you want it to be.  Typically the order of records according to some index, often the order defined by the Primary Key.  Some people might think of the order in which records are inserted - this is mucky: inserts, deletes, and updates may change the actual ordering on disk.

The physical order is the way records are stored on disk, and within each disk block.  The disk blocks will not necessarily be arranged in any particular order, as Postgres will have its own notion as to what is efficient and practical (or simply expedient!).  Even if you knew how it did it in one version of Postgres, there is no requirement for a new version of Postgres to do it in the same way.

Note that the order records are returned by Postgres, may be neither the logical order nor a logical order!  As Postgres will return them in any order it deems fit, unless you specify an ORDER BY statement. 

One of the reasons is that it may have some pages in memory of that table due to some other query, so these records might be returned before records that need to be fetched from the hard disk. Another reason is that your query is reading in the whole table, and started half way through someone else's query is reading in the same table - so Postgres is likely to return records that are in memory from the other query continuing as they are found.

Note that Postgres will endeavour to return your records in the most efficient manner it can, due to transaction isolation and other factors, the order is likely to be unpredictable and inconsistent from run to run in a heavily used database.

Why should Postgres bother to order records in any particular way, unless the user has explicitly requested an order?


Cheers,
Gavin


Re: Correlation in pg_stats

From
Andrew Sullivan
Date:
On Wed, Nov 28, 2012 at 09:21:02AM +1300, Gavin Flower wrote:
> Why should Postgres bother to order records in any particular way,
> unless the user has explicitly requested an order?

More importantly, since SQL results are sets (not ordered sets),
it is plain out required to request an order if you want one.

A

--
Andrew Sullivan
ajs@crankycanuck.ca