Re: Compare rows - Mailing list pgsql-performance

From Jason Hihn
Subject Re: Compare rows
Date
Msg-id NGBBLHANMLKMHPDGJGAPAEBACOAA.jhihn@paytimepayroll.com
Whole thread Raw
In response to Re: Compare rows  (Greg Spiegelberg <gspiegelberg@cranel.com>)
List pgsql-performance
Comment interjected below.

> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of Greg
> Spiegelberg
> Sent: Wednesday, October 08, 2003 12:28 PM
> To: PgSQL Performance ML
> Subject: Re: [PERFORM] Compare rows
>
>
> Josh Berkus wrote:
> > Greg,
> >
> >
> >>Anyone have any suggestions on how to efficiently compare
> >>rows in the same table?  This table has 637 columns to be
> >>compared and 642 total columns.
> >
> >
> > 637 columns?   Are you sure that's normalized?   It's hard for
> me to conceive
> > of a circumstance where that many columns would be necessary.
> >
> > If this isn't a catastrophic normalization problem (which it
> sounds like),
> > then you will probably still need to work through procedureal
> normalization
> > code, as SQL simply doesn't offer any way around naming all the
> columns by
> > hand.   Perhaps you could describe the problem in more detail?
> >
>
> The data represents metrics at a point in time on a system for
> network, disk, memory, bus, controller, and so-on.  Rx, Tx, errors,
> speed, and whatever else can be gathered.
>
> We arrived at this one 642 column table after testing the whole
> process from data gathering, methods of temporarily storing then
> loading to the database.  Initially, 37+ tables were in use but
> the one big-un has saved us over 3.4 minutes.
>
> The reason for my initial question was this.  We save changes only.
> In other words, if system S has row T1 for day D1 and if on day D2
> we have another row T1 (excluding our time column) we don't want
> to save it.

Um, isn't this a purpose of a key? And I am confused. Do you want to UPDATE
the changed columns? or skip it all together?
You have: (System, Day, T1 | T2 |...Tn )
But should use:
Master: (System, Day, Table={T1, T2, .. Tn)) [Keys: sytem, day, table]
T1 { System, Day, {other fields}}  [foreign keys [system, day]

This should allow you to find your dupes very fast (indexes!) and save a lot
of space (few/no null columns), and now you don't have to worry about
comparing fields, and moving huge result sets around.


> That said, if the 3.4 minutes gets burned during our comparison which
> saves changes only we may look at reverting to separate tables.  There
> are only 1,700 to 3,000 rows on average per load.
>
> Oh, PostgreSQL 7.3.3, PHP 4.3.1, RedHat 7.3, kernel 2.4.20-18.7smp,
> 2x1.4GHz PIII, 2GB memory, and 1Gbs SAN w/ Hitachi 9910 LUN's.
>
> Greg
>
> --
> Greg Spiegelberg
>   Sr. Product Development Engineer
>   Cranel, Incorporated.
>   Phone: 614.318.4314
>   Fax:   614.431.8388
>   Email: gspiegelberg@Cranel.com
> Cranel. Technology. Integrity. Focus.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>


pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Compare rows
Next
From: Bruce Momjian
Date:
Subject: Re: PostgreSQL vs. MySQL