Space wasted by denormalized data - Mailing list pgsql-general

From Decibel!
Subject Space wasted by denormalized data
Date
Msg-id 20080228215717.GN1212@decibel.org
Whole thread Raw
Responses Re: Space wasted by denormalized data
List pgsql-general
I wrote this query to identify how much space is being wasted in a
database by denormalized data. For each field in each table, it
calculates how much space the field is taking in it's table as well as
indexes, then it calculates how much space would be needed if that field
was normalized into a separate table. It places some (somewhat
arbitrary) minimums on how much space would have to be saved to include
that field in the output. If you want to get rid of the limit you should
still keep savings > 0 in the query, otherwise you'll start seeing
normalization suggestions that make no sense (like normalizing an int).

As it stands, the query reports a total for the entire database, but you
can use just sub-sections of the query to see savings per-table, or
per-field.

-- Total
SELECT pg_size_pretty(sum(sum)::bigint)
    -- Summarize by table
    FROM (SELECT schemaname, tablename, sum(savings), pg_size_pretty(sum(savings)::bigint)
            -- Get pretty size. Start here if you want per-table info
            FROM (SELECT *, pg_size_pretty(savings::bigint)
                    FROM (
-- Here's where the real work starts
SELECT *, table_space_delta + coalesce(index_space_delta, 0) - side_table_space AS savings
    FROM (
        -- Figure out how much space we'd save in indexes by converting to an int
        SELECT *, index_tuple_count*(avg_width-4) AS index_space_delta
            FROM (SELECT s.*, index_count, index_tuple_count
                        FROM (SELECT schemaname, tablename, attname, null_frac, avg_width, n_distinct
                                            -- How much space would we gain by changing this field to an int?
                                            , reltuples*(1-null_frac)*(avg_width-4) AS table_space_delta
                                            -- Estimate how big our "side table" will be
                                            , CASE WHEN n_distinct >= 0 THEN n_distinct ELSE -n_distinct * reltuples
END
                                                * (24+4+avg_width+6+4+6+avg_width) AS side_table_space
                                FROM pg_stats s
                                    JOIN pg_class c ON c.relname=tablename AND c.relkind='r'
                                    JOIN pg_namespace n ON n.oid = c.relnamespace AND n.nspname=s.schemaname
                                WHERE schemaname NOT IN ('pg_catalog','information_schema')
                            ) s
                            NATURAL LEFT JOIN (
                                    SELECT n.nspname AS schemaname, c.relname AS tablename, attname
                                                , count(*) AS index_count, sum(i.reltuples) AS index_tuple_count
                                        FROM pg_index x
                                            JOIN pg_class c ON c.oid = x.indrelid
                                            JOIN pg_class i ON i.oid = x.indexrelid
                                            JOIN pg_namespace n ON n.oid = c.relnamespace
                                            JOIN pg_attribute a ON a.attrelid = i.oid
                                        GROUP BY n.nspname, c.relname, attname
                                                ) i
                    ) a
            ) a
                            ) a
                        -- Minimum savings to consider per-field
                        WHERE savings > 1e6) a
                GROUP BY schemaname, tablename
                -- Minimum savings to consider per-table
                HAVING sum(savings) > 1e7
                ORDER BY sum(savings) DESC
            ) a
;
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Attachment

pgsql-general by date:

Previous
From: Scara Maccai
Date:
Subject: Re: partitioning using dblink
Next
From: Thomas Kellerer
Date:
Subject: Re: syntax errors at "line 1" when executing every command