Re: Optimization, etc - Mailing list pgsql-sql

From Masaru Sugawara
Subject Re: Optimization, etc
Date
Msg-id 20011117165704.36EF.RK73@echna.ne.jp
Whole thread Raw
In response to Re: Optimization, etc  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-sql
On Fri, 9 Nov 2001 07:57:41 -0800 (PST)
Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:


> > On Fri, 9 Nov 2001, Jeff Sack wrote:
> >
> > One separate issue (the reason why the above examples are all about
> > batting statistics) I'm having is representing the innings pitched
> > statistic.  The way it is often represented (and the way it is done in
> > this schema) is something like this 123.0 means exactly 123 innings
> > pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings
> > pitched.  I'm contemplating the best way to write a function that knows
> > how to sum these values accurately.  Is this something that can be done
> > with PL/PGSQL or should I go straight to something like PLPERL?
> > Alternatively, I could research a way to represent fractions in the DB
> > and write a script to convert all values in this column.  Any advice
> > here??
> 
> You'd probably be best off doing the conversion at insert time into an
> additional field assuming that inserts are much less likely that
> selects on your data.
> 
> (col-floor(col)*(10/3::numeric) seems to get back an appropriate value
> but is probably reasonably expensive.
Hi,I tried to put your nice idea into the GROUP BY clause, since I wantedto know how expensive it is.  There are
severalplayers in the playerstable and about 60k rows in the pitching_stats, which have only columnsconcerned with the
SUM()aggregate function. In case of my PC with 333MHz clock, 256M SDRAM, and ATA33 HDD, the costs of executing the
followingqueries are about 4 sec and 6 sec, respectively.  It seems to me that they are reasonable.  However the ratio
ofthe sort time appears to occupyquite a bit of time.CREATE TABLE players       (id serial
CONSTRAINT pkey_players                               PRIMARY KEY,                               first_name
varchar(30));CREATE TABLE pitching_stats(id integer                               NOT NULL
CONSTRAINT key_pitching_stats                               REFERENCES players(id),                            ip
numeric(5,1));


----- 1st version.  It costs about 4 sec.
SELECT pl.id, pl.first_name, t.ret FROM (SELECT ps.id, sum(ps.ip-floor(ps.ip)) % 0.3::numeric                +
floor((sum(ps.ip-floor(ps.ip)))/ 0.3::numeric)                + sum(floor(ps.ip)) AS ret           FROM pitching_stats
ASps           GROUP BY ps.id       ) AS t INNER JOIN       players AS pl ON (t.id = pl.id)
 


----- 2nd version.  It costs about 6 sec.
SELECT pl.id, pl.first_name,       sum(ps.ip-floor(ps.ip)) % 0.3::numeric       + floor((sum(ps.ip-floor(ps.ip))) /
0.3::numeric)      + sum(floor(ps.ip)) AS ret  FROM players AS pl INNER JOIN       pitching_stats AS ps ON (pl.id =
ps.id) GROUP BY pl.id, pl.first_name
 



QUERY PLAN:
----- 1st version.
Hash Join  (cost=6357.01..7197.06 rows=6000 width=32) ->  Subquery Scan t  (cost=6355.96..6955.96 rows=6000 width=16)
   ->  Aggregate  (cost=6355.96..6955.96 rows=6000 width=16)             ->  Group  (cost=6355.96..6505.96 rows=60000
width=16)                  ->  Sort  (cost=6355.96..6355.96 rows=60000 width=16)                         ->  Seq Scan
onpitching_stats ps                                  (cost=0.00..983.00 rows=60000 width=16) ->  Hash  (cost=1.04..1.04
rows=4width=16)       ->  Seq Scan on players pl  (cost=0.00..1.04 rows=4 width=16)
 

----- 2nd version.
Aggregate  (cost=9037.33..9787.33 rows=6000 width=32) ->  Group  (cost=9037.33..9337.33 rows=60000 width=32)       ->
Sort (cost=9037.33..9037.33 rows=60000 width=32)             ->  Hash Join  (cost=1.05..3384.10 rows=60000 width=32)
              ->  Seq Scan on pitching_stats ps                                 (cost=0.00..983.00 rows=6 0000
width=16)                  ->  Hash  (cost=1.04..1.04 rows=4 width=16)                         ->  Seq Scan on players
pl                                    (cost=0.00..1.04 rows=4 width=16)
 


regards,
Masaru Sugawara


pgsql-sql by date:

Previous
From: Horst Herb
Date:
Subject: Re: constraint via selection
Next
From: "Aasmund Midttun Godal"
Date:
Subject: Re: constraint via selection