Optimization, etc - Mailing list pgsql-sql

From Jeff Sack
Subject Optimization, etc
Date
Msg-id 005101c168e9$a7aac5a0$019ca8c0@jeff
Whole thread Raw
In response to Re: transposing data for a view  ("Josh Berkus" <josh@agliodbs.com>)
Responses Re: Optimization, etc
List pgsql-sql

 

(apologies if this was sent twice)

 

Hello all.  I have an academic project that I’m working on and, as I’m relatively new to optimization techniques and database design, I’m wondering if some of you can give me some pointers.  Below is the schema (to model baseball statistics), and I’m pretty much stuck with it at this point.  If anyone has any suggestions involving changing the schema, I’d appreciate hearing them just for future reference…

 

There are several interesting queries that I’ve been constructing just to get a feel for this schema, and given that some of these tables contain 100,000-200,000 tuples, some queries are taking a good 5-10 seconds to execute.  I’m just wondering if this is simply the fault of my schema or are these queries poorly constructed?  Here are some queries I’ve been trying:

 

Here’s a query for the top ten all time home run leaders:

select P.first_name, P.last_name, S.player_id, sum(B.hr) as hr, (sum(B.h)::float / sum(B.ab)::float) ab

from statistics S, batting_stats B, players P

where S.id=B.id and S.player_id=P.id and B.ab>0

group by P.first_name, P.last_name, S.player_id

order by hr desc limit 10;

 

Select the 10 highest batting averages of all time where the batter had 600 or more at bats.  Also gather the name, year, team, hits, at bats...

 

select (B.h::float / B.ab) as avg, B.h, B.ab, S.year, P.first_name, P.last_name, T.city, T.name

from batting_stats B, statistics S, players P, Teams T

where B.ab > 600 and S.id=B.id and S.player_id=P.id and S.team_id=T.id

order by avg desc limit 10;

 

To find the names of the single season home run leaders, along with the total number of home runs, the team name/city and the year:

 

select P.first_name, P.last_name, B.hr, T.name, T.city, S.year from statistics S, batting_stats B, players P, teams T

where (S.id=B.id) and

(S.player_id=P.id) and

(B.hr>30) and

(T.id=S.team_id) limit 10;

 

You get the idea.  These queries take a while.  Is this just the way it is or there things that can be done to optimize this? 

 

One separate issue (the reason why the above examples are all about batting statistics) I’m having is representing the innings pitched statistic.  The way it is often represented (and the way it is done in this schema) is something like this 123.0 means exactly 123 innings pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings pitched.  I’m contemplating the best way to write a function that knows how to sum these values accurately.  Is this something that can be done with PL/PGSQL or should I go straight to something like PLPERL?  Alternatively, I could research a way to represent fractions in the DB and write a script to convert all values in this column.  Any advice here??

 

Thanks in advance for any thoughts, comments, suggestions…

 

-Jeff

 

 

create table leagues

(

            -- create an integer id field for easier and efficient FK referencing

            id                     serial,

            name               varchar(50),

            first_year         integer not null

                        check (first_year > 1850 and first_year < 2010),

            last_year         integer

                        check (first_year > 1850 and first_year < 2010),

            primary key(id),

            constraint chronological_order

                        check (last_year >= first_year)

);

 

 

create table teams

(

            id                     serial,

            name               varchar(50) not null,

            city                  varchar(50),

            first_year         integer not null

                        check (first_year > 1850 and first_year < 2010),

            last_year         integer

                        check (first_year > 1850 and first_year < 2010),

            alt_id               integer,

            league_id        integer not null,

            primary key(id),

            foreign key(alt_id) references teams(id),

            foreign key(league_id) references leagues(id)

                        on delete cascade,

            constraint chronological_order

                        check (last_year >= first_year)

);

 

create table players

(

            id                     serial,

            first_name       varchar(30),

            last_name       varchar(30) not null,

            bats                 char(1) check (bats in ('L','R','S')),

            throws             char(1) check (throws in ('L','R')),

            dob                  date,

            primary key(id)

);

 

create table statistics

(

            id                                 serial,

            year                             integer not null,

            g                                  integer,

            player_id                     integer not null,

            team_id                                   integer not null,

            foreign key(player_id) references players(id)

                        on delete cascade,

            foreign key(team_id) references teams(id)

                        on delete cascade,

            primary key(id)

);

 

create table managing_stats

(

            id                                 integer not null,

            w                                  integer,

            l                                   integer,

            primary key(id),

            foreign key(id) references statistics(id)

                        on delete cascade

);

 

create table fielding_stats

(

            id                                 integer not null,

            pos                              char(5),

            po                                integer,

            a                                  integer,

            e                                  integer,

            dp                                integer,

            primary key(id),

            foreign key(id) references statistics(id)

                        on delete cascade

);

 

create table batting_stats

(

            id                                 integer not null,

            ab                                integer,

            r                                   integer,

            h                                  integer,

            doubles                                   integer,

            triples                          integer,

            hr                                 integer,

            rbi                                integer,

            sb                                integer,

            cs                                integer,

            bb                                integer,

            so                                integer,

            sh                                integer,

            sf                                 integer,

            ibb                               integer,

            hbp                              integer,

            primary key(id),

            foreign key(id) references statistics(id)

                        on delete cascade

);

 

create table pitching_stats

(

            id                                 integer not null,

            w                                  integer,

            l                                   integer,

            gs                                integer,

            cg                                integer,

            sh                                integer,

            sv                                integer,

            ip                                 numeric(5,1),

            h                                  integer,

            er                                 integer,

            hr                                 integer,

            bb                                integer,

            so                                integer,

            primary key(id),

            foreign key(id) references statistics(id)

                        on delete cascade

);

 

 

 

 

 

pgsql-sql by date:

Previous
From: "fstelpstra@yahoo.com"
Date:
Subject: Re: Can this query be optimized?
Next
From: "Jeff Sack"
Date:
Subject: Optimizing