Space wasted by denormalized data - Mailing list pgsql-general
From | Decibel! |
---|---|
Subject | Space wasted by denormalized data |
Date | |
Msg-id | 20080228215717.GN1212@decibel.org Whole thread Raw |
Responses |
Re: Space wasted by denormalized data
|
List | pgsql-general |
I wrote this query to identify how much space is being wasted in a database by denormalized data. For each field in each table, it calculates how much space the field is taking in it's table as well as indexes, then it calculates how much space would be needed if that field was normalized into a separate table. It places some (somewhat arbitrary) minimums on how much space would have to be saved to include that field in the output. If you want to get rid of the limit you should still keep savings > 0 in the query, otherwise you'll start seeing normalization suggestions that make no sense (like normalizing an int). As it stands, the query reports a total for the entire database, but you can use just sub-sections of the query to see savings per-table, or per-field. -- Total SELECT pg_size_pretty(sum(sum)::bigint) -- Summarize by table FROM (SELECT schemaname, tablename, sum(savings), pg_size_pretty(sum(savings)::bigint) -- Get pretty size. Start here if you want per-table info FROM (SELECT *, pg_size_pretty(savings::bigint) FROM ( -- Here's where the real work starts SELECT *, table_space_delta + coalesce(index_space_delta, 0) - side_table_space AS savings FROM ( -- Figure out how much space we'd save in indexes by converting to an int SELECT *, index_tuple_count*(avg_width-4) AS index_space_delta FROM (SELECT s.*, index_count, index_tuple_count FROM (SELECT schemaname, tablename, attname, null_frac, avg_width, n_distinct -- How much space would we gain by changing this field to an int? , reltuples*(1-null_frac)*(avg_width-4) AS table_space_delta -- Estimate how big our "side table" will be , CASE WHEN n_distinct >= 0 THEN n_distinct ELSE -n_distinct * reltuples END * (24+4+avg_width+6+4+6+avg_width) AS side_table_space FROM pg_stats s JOIN pg_class c ON c.relname=tablename AND c.relkind='r' JOIN pg_namespace n ON n.oid = c.relnamespace AND n.nspname=s.schemaname WHERE schemaname NOT IN ('pg_catalog','information_schema') ) s NATURAL LEFT JOIN ( SELECT n.nspname AS schemaname, c.relname AS tablename, attname , count(*) AS index_count, sum(i.reltuples) AS index_tuple_count FROM pg_index x JOIN pg_class c ON c.oid = x.indrelid JOIN pg_class i ON i.oid = x.indexrelid JOIN pg_namespace n ON n.oid = c.relnamespace JOIN pg_attribute a ON a.attrelid = i.oid GROUP BY n.nspname, c.relname, attname ) i ) a ) a ) a -- Minimum savings to consider per-field WHERE savings > 1e6) a GROUP BY schemaname, tablename -- Minimum savings to consider per-table HAVING sum(savings) > 1e7 ORDER BY sum(savings) DESC ) a ; -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Attachment
pgsql-general by date: