Re: Compare rows - Mailing list pgsql-performance
From | Jason Hihn |
---|---|
Subject | Re: Compare rows |
Date | |
Msg-id | NGBBLHANMLKMHPDGJGAPAEBACOAA.jhihn@paytimepayroll.com Whole thread Raw |
In response to | Re: Compare rows (Greg Spiegelberg <gspiegelberg@cranel.com>) |
List | pgsql-performance |
Comment interjected below. > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of Greg > Spiegelberg > Sent: Wednesday, October 08, 2003 12:28 PM > To: PgSQL Performance ML > Subject: Re: [PERFORM] Compare rows > > > Josh Berkus wrote: > > Greg, > > > > > >>Anyone have any suggestions on how to efficiently compare > >>rows in the same table? This table has 637 columns to be > >>compared and 642 total columns. > > > > > > 637 columns? Are you sure that's normalized? It's hard for > me to conceive > > of a circumstance where that many columns would be necessary. > > > > If this isn't a catastrophic normalization problem (which it > sounds like), > > then you will probably still need to work through procedureal > normalization > > code, as SQL simply doesn't offer any way around naming all the > columns by > > hand. Perhaps you could describe the problem in more detail? > > > > The data represents metrics at a point in time on a system for > network, disk, memory, bus, controller, and so-on. Rx, Tx, errors, > speed, and whatever else can be gathered. > > We arrived at this one 642 column table after testing the whole > process from data gathering, methods of temporarily storing then > loading to the database. Initially, 37+ tables were in use but > the one big-un has saved us over 3.4 minutes. > > The reason for my initial question was this. We save changes only. > In other words, if system S has row T1 for day D1 and if on day D2 > we have another row T1 (excluding our time column) we don't want > to save it. Um, isn't this a purpose of a key? And I am confused. Do you want to UPDATE the changed columns? or skip it all together? You have: (System, Day, T1 | T2 |...Tn ) But should use: Master: (System, Day, Table={T1, T2, .. Tn)) [Keys: sytem, day, table] T1 { System, Day, {other fields}} [foreign keys [system, day] This should allow you to find your dupes very fast (indexes!) and save a lot of space (few/no null columns), and now you don't have to worry about comparing fields, and moving huge result sets around. > That said, if the 3.4 minutes gets burned during our comparison which > saves changes only we may look at reverting to separate tables. There > are only 1,700 to 3,000 rows on average per load. > > Oh, PostgreSQL 7.3.3, PHP 4.3.1, RedHat 7.3, kernel 2.4.20-18.7smp, > 2x1.4GHz PIII, 2GB memory, and 1Gbs SAN w/ Hitachi 9910 LUN's. > > Greg > > -- > Greg Spiegelberg > Sr. Product Development Engineer > Cranel, Incorporated. > Phone: 614.318.4314 > Fax: 614.431.8388 > Email: gspiegelberg@Cranel.com > Cranel. Technology. Integrity. Focus. > > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org >
pgsql-performance by date: