Re: Optimization, etc - Mailing list pgsql-sql
| From | Masaru Sugawara |
|---|---|
| Subject | Re: Optimization, etc |
| Date | |
| Msg-id | 20011117165704.36EF.RK73@echna.ne.jp |
| In response to | Re: Optimization, etc (Stephan Szabo <sszabo@megazone23.bigpanda.com>) |
| List | pgsql-sql |
On Fri, 9 Nov 2001 07:57:41 -0800 (PST)
Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:
> > On Fri, 9 Nov 2001, Jeff Sack wrote:
> >
> > One separate issue (the reason why the above examples are all about
> > batting statistics) I'm having is representing the innings pitched
> > statistic. The way it is often represented (and the way it is done in
> > this schema) is something like this 123.0 means exactly 123 innings
> > pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings
> > pitched. I'm contemplating the best way to write a function that knows
> > how to sum these values accurately. Is this something that can be done
> > with PL/PGSQL or should I go straight to something like PLPERL?
> > Alternatively, I could research a way to represent fractions in the DB
> > and write a script to convert all values in this column. Any advice
> > here??
>
> You'd probably be best off doing the conversion at insert time into an
> additional field, assuming that inserts are much less likely than
> selects on your data.
>
> (col-floor(col))*(10/3::numeric) seems to get back an appropriate value
> but is probably reasonably expensive.
Hi,

I tried to put your nice idea into the GROUP BY clause, since I wanted to
know how expensive it is. There are several players in the players table
and about 60k rows in pitching_stats, which has only the columns involved
in the SUM() aggregate. On my PC (333 MHz CPU, 256 MB SDRAM, ATA33 HDD),
the two queries below take about 4 sec and 6 sec, respectively, which
seems reasonable to me. However, the sort appears to account for quite a
bit of that time.

CREATE TABLE players (
    id          serial CONSTRAINT pkey_players PRIMARY KEY,
    first_name  varchar(30)
);

CREATE TABLE pitching_stats (
    id  integer NOT NULL
        CONSTRAINT key_pitching_stats REFERENCES players(id),
    ip  numeric(5,1)
);
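Just for illustration (these are made-up rows, not the data I actually
timed), the tables can be populated like this; with these values player
1's total should come out as 130.0, because 123 1/3 + 6 2/3 = 130:

INSERT INTO players (first_name) VALUES ('Alice');      -- gets id 1
INSERT INTO players (first_name) VALUES ('Bob');        -- gets id 2
INSERT INTO pitching_stats (id, ip) VALUES (1, 123.1);  -- 123 1/3 innings
INSERT INTO pitching_stats (id, ip) VALUES (1, 6.2);    --   6 2/3 innings
INSERT INTO pitching_stats (id, ip) VALUES (2, 7.0);    --   7     innings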
----- 1st version. It costs about 4 sec.
SELECT pl.id, pl.first_name, t.ret
  FROM (SELECT ps.id,
               sum(ps.ip - floor(ps.ip)) % 0.3::numeric
                 + floor(sum(ps.ip - floor(ps.ip)) / 0.3::numeric)
                 + sum(floor(ps.ip)) AS ret
          FROM pitching_stats AS ps
         GROUP BY ps.id
       ) AS t
 INNER JOIN players AS pl ON (t.id = pl.id);
----- 2nd version. It costs about 6 sec.
SELECT pl.id, pl.first_name,
       sum(ps.ip - floor(ps.ip)) % 0.3::numeric
         + floor(sum(ps.ip - floor(ps.ip)) / 0.3::numeric)
         + sum(floor(ps.ip)) AS ret
  FROM players AS pl INNER JOIN pitching_stats AS ps ON (pl.id = ps.id)
 GROUP BY pl.id, pl.first_name;
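As for the PL/pgSQL route Jeff asked about, one possible sketch (the
function names here are just mine, nothing standard, and the plpgsql
language must already be installed in the database) is to convert each
value to whole thirds of an inning, sum those integers exactly, and
convert back at the end:

CREATE FUNCTION ip_to_thirds(numeric) RETURNS integer AS '
BEGIN
    -- 123.0 -> 369, 123.1 -> 370, 123.2 -> 371 thirds of an inning
    RETURN (floor($1) * 3 + ($1 - floor($1)) * 10)::integer;
END;
' LANGUAGE 'plpgsql';

CREATE FUNCTION thirds_to_ip(integer) RETURNS numeric AS '
BEGIN
    -- back to the usual X.1 / X.2 notation
    RETURN ($1 / 3) + ($1 % 3) * 0.1;
END;
' LANGUAGE 'plpgsql';

SELECT pl.id, pl.first_name,
       thirds_to_ip(sum(ip_to_thirds(ps.ip))::integer) AS ret
  FROM players AS pl INNER JOIN pitching_stats AS ps ON (pl.id = ps.id)
 GROUP BY pl.id, pl.first_name;

Whether this is faster than doing the arithmetic inline as above I haven't
measured; the per-row function-call overhead may well make it slower, but
it keeps the summation exact and the queries easier to read.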
QUERY PLAN:
----- 1st version.
Hash Join  (cost=6357.01..7197.06 rows=6000 width=32)
  ->  Subquery Scan t  (cost=6355.96..6955.96 rows=6000 width=16)
        ->  Aggregate  (cost=6355.96..6955.96 rows=6000 width=16)
              ->  Group  (cost=6355.96..6505.96 rows=60000 width=16)
                    ->  Sort  (cost=6355.96..6355.96 rows=60000 width=16)
                          ->  Seq Scan on pitching_stats ps  (cost=0.00..983.00 rows=60000 width=16)
  ->  Hash  (cost=1.04..1.04 rows=4 width=16)
        ->  Seq Scan on players pl  (cost=0.00..1.04 rows=4 width=16)
----- 2nd version.
Aggregate  (cost=9037.33..9787.33 rows=6000 width=32)
  ->  Group  (cost=9037.33..9337.33 rows=60000 width=32)
        ->  Sort  (cost=9037.33..9037.33 rows=60000 width=32)
              ->  Hash Join  (cost=1.05..3384.10 rows=60000 width=32)
                    ->  Seq Scan on pitching_stats ps  (cost=0.00..983.00 rows=60000 width=16)
                    ->  Hash  (cost=1.04..1.04 rows=4 width=16)
                          ->  Seq Scan on players pl  (cost=0.00..1.04 rows=4 width=16)
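Since the Sort node seems to be where much of the time goes, it might also
be worth checking whether an index on pitching_stats(id) changes the plan.
I haven't tried this on the 60k-row table, so the planner may simply keep
the seqscan + sort, but the experiment is cheap (the index name below is
arbitrary):

CREATE INDEX idx_pitching_stats_id ON pitching_stats (id);
VACUUM ANALYZE pitching_stats;
EXPLAIN
SELECT ps.id, sum(ps.ip - floor(ps.ip)) AS frac
  FROM pitching_stats AS ps
 GROUP BY ps.id;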
regards,
Masaru Sugawara