Re: pgsql and large tables - Mailing list pgsql-general

From: Gurupartap Davis
Subject: Re: pgsql and large tables
Date:
Msg-id: 014001c16e28$d8451dc0$0f00a8c0@marlows
In response to: Re: pgsql and large tables  (Francisco Reyes <lists@natserv.com>)
List: pgsql-general
err..forgot to cc the list...

--------------------------------------
Thanks for trying to help me out...I've got some more info for you:

> > I've got a large database, currently about 40 million rows in the
> > biggest table.
> ...
>
> > I initially created the table with a primary key (5 fields: char(4),
> > varchar(32), date, int, int)
>
> Out of curiosity why are you using this as your primary key and not a
> serial? Will your tables be normalized?
>
> > and a couple additional indexes, one that just changes the order of the
> > primary key, and one that sorts on the date.
>
> From what I have read so far your primary key doesn't sound like what you
> would want to have as a primary key... but I guess the design of your
> table is a totally different issue.

I'm migrating this table from an existing mysql installation...
This is what it looks like right now:
              Table "forecast"
  Attribute   |         Type          | Modifier
--------------+-----------------------+----------
 zhr          | smallint              |
 zday         | smallint              |
 model        | character varying(32) |
 temp         | numeric(6,2)          |
 modelhr      | smallint              | not null
 modelruntime | smallint              | not null
 modelrundate | smallint              |
 stn          | character(4)          | not null
 rh           | numeric(6,2)          |
 wdsp         | numeric(6,2)          |
 wddir        | character varying(2)  |
 dwpt         | numeric(6,2)          |
 lpre         | numeric(6,2)          |
 yearmoda     | date                  | not null

It's a table for weather forecasts; a record is identified uniquely by
(model, stn, yearmoda, modelruntime, modelhr), although I will rarely have
a query that specifies all of those fields.  The reason for the permuted
indexes mentioned above is that mysql will use a prefix of a multi-column
key to narrow down a search.  I guess pgsql doesn't use indexes that way?
(I noticed in the to-do list something like "reenable partial indexes")

The cardinality on these, by the way, is approximately: model: 15-30,
stn: 1800, yearmoda: ~450 so far..., modelruntime: 4, and modelhr: 10-40.
Does this make any difference to what order they should be listed in the
index?
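
Just so we're talking about the same thing, here's roughly what the existing
mysql indexes would look like if I recreated them as-is in pgsql (the index
names and the exact column permutation are just placeholders on my part):

-- the unique key columns...
CREATE UNIQUE INDEX forecast_key_idx
    ON forecast (model, stn, yearmoda, modelruntime, modelhr);
-- ...the same columns in a different order...
CREATE INDEX forecast_stn_first_idx
    ON forecast (stn, model, yearmoda, modelruntime, modelhr);
-- ...and the index that sorts on the date
CREATE INDEX forecast_date_idx
    ON forecast (yearmoda);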

So, I should add a serial id column, I guess, and make that the primary key
(why is this?  I'm wondering, since I will most likely never refer to that
column).  Now I need some useful indexes.  Most of my queries will be for a
complete model run at a particular station, so I'm thinking of an index on
(model, stn, yearmoda, modelruntime) to return all modelhr's at a stn.
Another common set of queries deals with a *complete* model run (all stn's),
so I would need another index on (model, yearmoda, modelruntime).
Yet another useful query type contains all model runs within a certain date
range, aggregated by modelhr... it looks to me like the above two indexes
might be sufficient for that, though.
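
In concrete terms, I was picturing something like this (the index names are
made up):

CREATE INDEX forecast_stn_run_idx ON forecast (model, stn, yearmoda, modelruntime);
CREATE INDEX forecast_run_idx     ON forecast (model, yearmoda, modelruntime);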

eg, for that date-range query:

  SELECT avg(temp) FROM forecast
   WHERE model = 'avn' AND stn = 'KDRO'
     AND yearmoda >= $date1 AND yearmoda <= $date2
     AND modelruntime = 0
   GROUP BY modelhr;

...or would I need an index that includes modelhr for that one?
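
I suppose the honest answer is to load some data and look at the EXPLAIN
output for that query, something along these lines (the dates here are just
made-up examples):

EXPLAIN
SELECT avg(temp)
  FROM forecast
 WHERE model = 'avn'
   AND stn = 'KDRO'
   AND yearmoda >= '2001-10-01' AND yearmoda <= '2001-10-31'
   AND modelruntime = 0
 GROUP BY modelhr;

If that shows an index scan on the (model, stn, yearmoda, modelruntime)
index, I'll assume modelhr doesn't need to be in an index just for the
GROUP BY.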

Oh, and I guess I still need a UNIQUE index as well.  Hmm.  Now I'm
wondering again if that serial id column is going to mess with me.
Sometimes I will need to re-import a forecast because something went wrong
and all the values are bogus... can I do the equivalent of a mysql REPLACE
into the table, specifying all of the columns of a UNIQUE index, without
specifying the primary key?  How exactly is having a serial primary key
going to help me here?  (Sorry if this is a dumb question, I'm kinda new
to this)
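
The way I'd been thinking of handling the re-import, since pgsql doesn't
have mysql's REPLACE, is a delete-then-insert inside a transaction, keyed
on those unique columns; something like this (all of the values here are
made up):

BEGIN;
DELETE FROM forecast
 WHERE model = 'avn' AND stn = 'KDRO' AND yearmoda = '2001-11-15'
   AND modelruntime = 0 AND modelhr = 12;
INSERT INTO forecast
    (model, stn, yearmoda, modelruntime, modelhr,
     temp, rh, wdsp, wddir, dwpt, lpre, zhr, zday, modelrundate)
VALUES
    ('avn', 'KDRO', '2001-11-15', 0, 12,
     5.25, 40.00, 10.00, 'SW', -2.50, 0.00, 12, 15, 15);
COMMIT;

Is that the sane way to do it in pgsql, or is there something smarter?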

> > Should I make an artificial primary key with a serial type to simplify
> > things?
>
> I recommend you do this. Not only because it is more efficient, but
> because the type of key you selected has "issues" for lack of a better
> term. You can of course have a "unique" index so you don't have dups.
>
> I think that when you are dealing with a 40 million row table you need to
> consider your hardware. You also didn't tell us how big the rows are.
> The more info you give us the more others will be able to help.


The 40 million rows, by the way, are for about 2 months of data ;-) ... we've
got about another year of data to import (and it will be growing by about
1.5 million rows a day), so I want to make sure I've got the table and
indexes set up optimally first, since it will take about a week to import
all the data.
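
My rough plan for the import (and please tell me if this is backwards) is
to load everything with COPY into a bare table and only build the indexes
at the end, something like this (the file path is hypothetical):

COPY forecast FROM '/tmp/forecast.tab';  -- tab-delimited dump from mysql, server-side file
-- ...then the CREATE INDEX / CREATE UNIQUE INDEX statements from above
VACUUM ANALYZE forecast;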

It is currently on a P3 700MHz machine running RedHat 7.2 and PostgreSQL
7.1.3, with 256MB of RAM.  We have a dual P3 700 with 512MB that we could
move it to for production (the mysql db is currently running on that box).



