Home > mailing lists

Re: Query using SeqScan instead of IndexScan - Mailing list pgsql-performance

From	Mark Kirkwood
Subject	Re: Query using SeqScan instead of IndexScan
Date	April 2, 2006 00:31:40
Msg-id	442F53A6.9090707@paradise.net.nz Whole thread Raw
In response to	Re: Query using SeqScan instead of IndexScan ("chris smith" <dmagick@gmail.com>)
Responses	Re: Query using SeqScan instead of IndexScan
List	pgsql-performance

Tree view

chris smith wrote:

> <rant>
> It'd be nice if the database developers agreed on what terms meant.
>
> http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html
>
> The SHOW INDEX statement displays a cardinality value based on N/S,
> where N is the number of rows in the table and S is the average value
> group size. That ratio yields an approximate number of value groups in
> the table.
> </rant>
>
> A work colleague found that information a few weeks ago so that's
> where my misunderstanding came from - if I'm reading that right they
> use n_distinct as their "cardinality" basis.. then again I could be
> reading that completely wrong too.
>

Yeah that's right - e.g using the same table in postgres and mysql:

pgsql> SELECT attname,n_distinct,correlation
        FROM pg_stats
        WHERE tablename='fact0'
        AND attname LIKE 'd%key';
  attname | n_distinct | correlation
---------+------------+-------------
  d0key   |      10000 |  -0.0211169
  d1key   |        100 |    0.124012
  d2key   |         10 |    0.998393
(3 rows)


mysql> SHOW INDEX FROM fact0
     -> ;

+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name    | Seq_in_index | Column_name |
Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |

+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| fact0 |          1 | fact0_d0key |            1 | d0key       | A
     |       10000 |     NULL | NULL   |      | BTREE      |         |
| fact0 |          1 | fact0_d1key |            1 | d1key       | A
     |         100 |     NULL | NULL   |      | BTREE      |         |
| fact0 |          1 | fact0_d2key |            1 | d2key       | A
     |          10 |     NULL | NULL   |      | BTREE      |         |

+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)


It is a bit confusing - '(distinct) cardinality' might be a better
heading for their 'cardinality' column!

On the correlation business - I don't think Mysql calculates it (or if
it does, its not displayed).


> I believe postgres (because it's a lot more standards compliant).. but
> sheesh - what a difference!
>

Well yes - however, to be fair to the Mysql guys, AFAICS the capture and
display of index stats (and any other optimizer related data) is not
part of any standard.


Cheers

Mark

pgsql-performance by date:

From: Alvaro Herrera
Date: 02 April 2006, 00:26:50
Subject: Re: Query using SeqScan instead of IndexScan

From: Tom Lane
Date: 02 April 2006, 00:46:19
Subject: Re: Query using SeqScan instead of IndexScan

Re: Query using SeqScan instead of IndexScan - Mailing list pgsql-performance

Previous

Next