Re: Optimization, etc - Mailing list pgsql-sql
From | Masaru Sugawara |
---|---|
Subject | Re: Optimization, etc |
Date | |
Msg-id | 20011117165704.36EF.RK73@echna.ne.jp Whole thread Raw |
In response to | Re: Optimization, etc (Stephan Szabo <sszabo@megazone23.bigpanda.com>) |
List | pgsql-sql |
On Fri, 9 Nov 2001 07:57:41 -0800 (PST) Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote: > > On Fri, 9 Nov 2001, Jeff Sack wrote: > > > > One separate issue (the reason why the above examples are all about > > batting statistics) I'm having is representing the innings pitched > > statistic. The way it is often represented (and the way it is done in > > this schema) is something like this 123.0 means exactly 123 innings > > pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings > > pitched. I'm contemplating the best way to write a function that knows > > how to sum these values accurately. Is this something that can be done > > with PL/PGSQL or should I go straight to something like PLPERL? > > Alternatively, I could research a way to represent fractions in the DB > > and write a script to convert all values in this column. Any advice > > here?? > > You'd probably be best off doing the conversion at insert time into an > additional field assuming that inserts are much less likely that > selects on your data. > > (col-floor(col)*(10/3::numeric) seems to get back an appropriate value > but is probably reasonably expensive. Hi,I tried to put your nice idea into the GROUP BY clause, since I wantedto know how expensive it is. There are severalplayers in the playerstable and about 60k rows in the pitching_stats, which have only columnsconcerned with the SUM()aggregate function. In case of my PC with 333MHz clock, 256M SDRAM, and ATA33 HDD, the costs of executing the followingqueries are about 4 sec and 6 sec, respectively. It seems to me that they are reasonable. However the ratio ofthe sort time appears to occupyquite a bit of time.CREATE TABLE players (id serial CONSTRAINT pkey_players PRIMARY KEY, first_name varchar(30));CREATE TABLE pitching_stats(id integer NOT NULL CONSTRAINT key_pitching_stats REFERENCES players(id), ip numeric(5,1)); ----- 1st version. It costs about 4 sec. SELECT pl.id, pl.first_name, t.ret FROM (SELECT ps.id, sum(ps.ip-floor(ps.ip)) % 0.3::numeric + floor((sum(ps.ip-floor(ps.ip)))/ 0.3::numeric) + sum(floor(ps.ip)) AS ret FROM pitching_stats ASps GROUP BY ps.id ) AS t INNER JOIN players AS pl ON (t.id = pl.id) ----- 2nd version. It costs about 6 sec. SELECT pl.id, pl.first_name, sum(ps.ip-floor(ps.ip)) % 0.3::numeric + floor((sum(ps.ip-floor(ps.ip))) / 0.3::numeric) + sum(floor(ps.ip)) AS ret FROM players AS pl INNER JOIN pitching_stats AS ps ON (pl.id = ps.id) GROUP BY pl.id, pl.first_name QUERY PLAN: ----- 1st version. Hash Join (cost=6357.01..7197.06 rows=6000 width=32) -> Subquery Scan t (cost=6355.96..6955.96 rows=6000 width=16) -> Aggregate (cost=6355.96..6955.96 rows=6000 width=16) -> Group (cost=6355.96..6505.96 rows=60000 width=16) -> Sort (cost=6355.96..6355.96 rows=60000 width=16) -> Seq Scan onpitching_stats ps (cost=0.00..983.00 rows=60000 width=16) -> Hash (cost=1.04..1.04 rows=4width=16) -> Seq Scan on players pl (cost=0.00..1.04 rows=4 width=16) ----- 2nd version. Aggregate (cost=9037.33..9787.33 rows=6000 width=32) -> Group (cost=9037.33..9337.33 rows=60000 width=32) -> Sort (cost=9037.33..9037.33 rows=60000 width=32) -> Hash Join (cost=1.05..3384.10 rows=60000 width=32) -> Seq Scan on pitching_stats ps (cost=0.00..983.00 rows=6 0000 width=16) -> Hash (cost=1.04..1.04 rows=4 width=16) -> Seq Scan on players pl (cost=0.00..1.04 rows=4 width=16) regards, Masaru Sugawara