Re: Compare rows - Mailing list pgsql-performance
From | Jason Hihn |
---|---|
Subject | Re: Compare rows |
Date | |
Msg-id | NGBBLHANMLKMHPDGJGAPIEBECOAA.jhihn@paytimepayroll.com Whole thread Raw |
In response to | Re: Compare rows (Greg Spiegelberg <gspiegelberg@cranel.com>) |
Responses |
Re: Compare rows
|
List | pgsql-performance |
> -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org]On Behalf Of Greg > Spiegelberg > Sent: Wednesday, October 08, 2003 3:11 PM > To: PgSQL Performance ML > Subject: Re: [PERFORM] Compare rows > > > Josh Berkus wrote: > > Greg, > > > > > >>The data represents metrics at a point in time on a system for > >>network, disk, memory, bus, controller, and so-on. Rx, Tx, errors, > >>speed, and whatever else can be gathered. > >> > >>We arrived at this one 642 column table after testing the whole > >>process from data gathering, methods of temporarily storing then > >>loading to the database. Initially, 37+ tables were in use but > >>the one big-un has saved us over 3.4 minutes. > > > > > > Hmmm ... if few of those columns are NULL, then you are > probably right ... > > this is probably the most normalized design. If, however, > many of columns > > are NULL the majority of the time, then the design you should > be using is a > > vertial child table, of the form ( value_type | value ). > > > > Such a vertical child table would also make your comparison > between instances > > *much* easier, as it could be executed via a simple > 4-table-outer-join and 3 > > where clauses. So even if you don't have a lot of NULLs, you > probably want > > to consider this. > > You lost me on that one. What's a "vertical child table"? Parent table Fkey | Option | Value ------------------+--------+------- | OS | Solaris | DISK1 | 30g ^^^^^^^^ ^^^-- values fields are values in a column rather than 'fields' > Statistically, about 6% of the rows use more than 200 of the columns, > 27% of the rows use 80-199 or more columns, 45% of the rows use 40-79 > columns and the remaining 22% of the rows use 39 or less of the columns. > That is a lot of NULLS. Never gave that much thought. > > To ensure query efficiency, hide the NULLs and simulate the multiple > tables I have a boatload of indexes, ensure that every query makees use > of an index, and have created 37 views. It's worked pretty well so > far > > > >>The reason for my initial question was this. We save changes only. > >>In other words, if system S has row T1 for day D1 and if on day D2 > >>we have another row T1 (excluding our time column) we don't want > >>to save it. > > > > > > If re-designing the table per the above is not a possibility, > then I'd suggest > > that you locate 3-5 columns that: > > 1) are not NULL for any row; > > 2) combined, serve to identify a tiny subset of rows, i.e. 3% > or less of the > > table. > > There are always, always, always 7 columns that contain data. > > > > Then put a multi-column index on those columns, and do your > comparison. > > Hopefully the planner should pick up on the availablity of the > index and scan > > only the rows retrieved by the index. However, there is the distinct > > possibility that the presence of 637 WHERE criteria will > confuse the planner, > > causing it to resort to a full table seq scan; in that case, > you will want to > > use a subselect to force the issue. > > That's what I'm trying to avoid is a big WHERE (c1,c2,...,c637) <> > (d1,d2,...,d637) clause. Ugly. > > > > Or, as Joe Conway suggested, you could figure out some kind of > value hash that > > uniquely identifies your rows. > > I've given that some though and though appealing I don't think I'd care > to spend the CPU cycles to do it. Best way I can figure to accomplish > it would be to generate an MD5 on each row without the timestamp and > store it in another column, create an index on the MD5 column, generate > MD5 on each line I want to insert. Makes for a simple WHERE... > > Okay. I'll give it a whirl. What's one more column, right? > > Greg > > -- > Greg Spiegelberg > Sr. Product Development Engineer > Cranel, Incorporated. > Phone: 614.318.4314 > Fax: 614.431.8388 > Email: gspiegelberg@Cranel.com > Cranel. Technology. Integrity. Focus. > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
pgsql-performance by date: