Re: Questions about indexes? - Mailing list pgsql-performance

From Curt Sampson
Subject Re: Questions about indexes?
Date
Msg-id Pine.NEB.4.51.0302202020210.474@angelic-vtfw.cvpn.cynic.net
Whole thread Raw
In response to Re: Questions about indexes?  (Ryan Bradetich <rbradetich@uswest.net>)
List pgsql-performance
On Wed, 19 Feb 2003, Ryan Bradetich wrote:

> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user x has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has an invalid shell.
> 1 | Mon Feb 17 00:34:24 MST 2003 | p101 | user y has expired password.
> 2 | Mon Feb 17 00:34:24 MST 2003 | f101 | file /foo has improper owner.

If you're going to normalize this a bit, you should start looking at
the data that are repeated and trying to get rid of the repititions.
First of all, the timestamp is repeated a lot, you might move that to a
separate table and just use a key into that table. But you might even
do better with multiple columns: combine the timestamp and host ID into
one table to get a "host report instance" and replace both those columns
with just that. If host-id/timestamp/category triplets are frequently
repeated, you might even consider combining the three into another
table, and just using an ID from that table with each anomaly.

But the biggest space and time savings would come from normalizing your
anomalys themselves, because there's a huge amount repeated there. If you're
able to change the format to something like:

    invalid shell for user: x
    invalid shell for user: y
    expired password for user: y
    improper owner for file: /foo

You can split those error messages off into another table:

    anomaly_id | anomaly
    -----------+------------------------------------------------
             1 | invalid shell for user
         2 | expired password for user
         3 | improper owner for file

And now your main table looks like this:

    host_id | timestamp                    | ctgr | anomaly_id | datum
    --------+------------------------------+------+------------+------
          1 | Mon Feb 17 00:34:24 MST 2003 | p101 |          1 | x
          1 | Mon Feb 17 00:34:24 MST 2003 | p101 |          1 | y
          1 | Mon Feb 17 00:34:24 MST 2003 | p101 |          2 | y
          2 | Mon Feb 17 00:34:24 MST 2003 | f101 |          3 | /foo

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

pgsql-performance by date:

Previous
From: Mike Nielsen
Date:
Subject: Re: Peluang Usaha yang Luar Biasa
Next
From: Josh Berkus
Date:
Subject: Re: Tuning scenarios (was Changing the default configuration)