Thread: pgsql and large tables
I've got a large database, currently about 40 million rows in the biggest table. I'm trying to set it up with PostgreSQL 7.1.3, but I've got some questions...
I initially created the table with a primary key (5 fields: char(4), varchar(32), date, int, int) and a couple additional indexes, one that just changes the order of the primary key, and one that sorts on the date. Then I imported 40 million rows and tried some queries, but discovered that none of my queries were using indexes, causing them to take forever.
So I read somewhere in the archives that you need to VACUUM a table regularly for indexes to work properly. I tried that, but aborted after about 5 hours. I can't use pgsql if I have to take the db down for more than 10-15 minutes a day.
Then I read somewhere else that you should drop your indexes before VACUUMing and re-create them afterwards. I tried that, and VACUUM finished in about 10 minutes. Kewl... but now I've been trying to recreate my primary key for the last 18 hours...not so good.
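In case it matters, what I ran was roughly this (the index names here aren't the real ones, just an illustration):

  -- drop the secondary indexes first
  DROP INDEX forecast_date_idx;
  DROP INDEX forecast_permuted_idx;

  -- this part finished in about 10 minutes
  VACUUM forecast;

  -- recreating the unique index behind the primary key is the part
  -- that has now been running for 18 hours
  CREATE UNIQUE INDEX forecast_pkey
      ON forecast (stn, model, yearmoda, modelruntime, modelhr);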
Should I have dropped all indexes *except* for the primary? or would VACUUM still take forever that way? Should I make an artificial primary key with a serial type to simplify things? Anyone have any hints at all for me?
thanks,
Partap Davis
On Tue, 13 Nov 2001, Gurupartap Davis wrote:

I am somewhat new to PostgreSQL, but I didn't see a reply to your message, so I'll tell you what I know about your questions.

> I've got a large database, currently about 40 million rows in the biggest table.
...
> I initially created the table with a primary key (5 fields: char(4), varchar(32), date, int, int)

Out of curiosity, why are you using this as your primary key and not a serial? Will your tables be normalized?

> and a couple additional indexes, one that just changes the order of the
> primary key, and one that sorts on the date.

From what I have read so far, your primary key doesn't sound like what you would want to have as a primary key... but I guess the design of your table is a totally different issue.

> Then I imported 40 million rows and tried some queries, but discovered
> that none of my queries were using indexes, causing them to take forever.

It is difficult for anyone to reply to generic questions. How about showing us the query, an explain plan, and the index attributes? Are any of the indexes based on a function?

> So I read somewhere in the archives that you need to VACUUM a table regularly
> for indexes to work properly.

Sort of. Let's say they work better. It has to do with what the optimizer believes your data distribution is, which determines when it will use an index.

> I tried that, but aborted after about 5 hours.

You have told us nothing about your hardware, your OS, or which version of PostgreSQL you are running (unless I missed it).

> I can't use pgsql if I have to take the db down for more than 10-15 minutes a day.

Then maybe you may not be able to use it right now. I believe that the current "vacuum analyze", and maybe plain vacuum too, may need to lock the table while they run. Hopefully others will explain this better. This will also change in the soon to be released 7.2.

> Then I read somewhere else that you should drop your indexes before VACUUMing
> and re-create them afterwards.

That makes little sense to me, and I don't recall reading it. What I do recall is that before a bulk load you want to drop your indexes. My understanding was that you want to have your indexes in place when you do vacuum analyze.

> I tried that, and VACUUM finished in about 10 minutes.
> I've been trying to recreate my primary key for the last
> 18 hours...not so good.

Again, we need more info about your hardware, OS, version of pgsql, etc.

> Should I have dropped all indexes *except* for the primary?

My understanding is that you may want to drop your indexes when doing a big load, not when doing a vacuum.

> Should I make an artificial primary key with a serial type to simplify things?

I recommend you do this. Not only because it is more efficient, but because the type of key you selected has "issues", for lack of a better term. You can of course have a "unique" index so you don't have dups.

I think that when you are dealing with a 40 million row table you need to consider your hardware. You also didn't tell us how big the rows are. The more info you give us, the more others will be able to help.
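By the way, to see whether a query uses an index you can prefix it with EXPLAIN. Something like this (the table and column names here are only an example, not yours):

  EXPLAIN SELECT * FROM bigtable WHERE somedate = '2001-11-13';
  -- "Index Scan using ... on bigtable" means an index is being used;
  -- "Seq Scan on bigtable" means the planner is reading the whole table.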
On 2001 November 13 03:49 pm, Gurupartap Davis wrote:

> I initially created the table with a primary key (5 fields: char(4),
> varchar(32), date, int, int) and a couple additional indexes, one that just
> changes the order of the primary key, and one that sorts on the date. Then
> I imported 40 million rows and tried some queries, but discovered that none
> of my queries were using indexes, causing them to take forever.

If your primary key is spread over 5 fields, then its index will also be spread over those 5 fields. This means that in order to use the index, your queries must specify those fields in the WHERE clause. Order is also important. From what it looks like above, your index is sorted first on the char(4), then the varchar(32) ... down to the last int. So if you specify everything except the first column (the char(4)), then your primary key index will be useless to you.

> So I read somewhere in the archives that you need to VACUUM a table
> regularly for indexes to work properly.

As I understand it, VACUUM simply cleans up obsolete versions of the data (an artifact of the transactions?), but VACUUM ANALYZE generates statistics about various useful things that allow the optimizer/planner to do its job more effectively.

> Should I make an artificial primary key with a serial type to simplify
> things? Anyone have any hints at all for me?

Integer primary keys (such as the SERIAL type) are probably going to be a heck of a lot more efficient than a 5-field mixed key. I ALWAYS start EVERY table I make with (id SERIAL PRIMARY KEY, ... I believe that this is sound practice. You can still add extra constraints (such as declaring a UNIQUE INDEX over several fields) to ensure data consistency, but using an integer id as the primary key has always paid off for me.

As an aside, I extend my nomenclature to use key_foo, where foo is the name of the table being referenced, when I use foreign keys. Makes life a lot easier in any number of situations.

--
Andrew G. Hammond  mailto:drew@xyzzy.dhs.org  http://xyzzy.dhs.org/~drew/
"To blow recursion you must first blow recur" -- me
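P.S. A minimal sketch of what I mean, with made-up table and column names matching your 5-field key:

  CREATE TABLE foo (
      id  SERIAL PRIMARY KEY,
      f1  char(4)      NOT NULL,
      f2  varchar(32)  NOT NULL,
      f3  date         NOT NULL,
      f4  int          NOT NULL,
      f5  int          NOT NULL
      -- ... the rest of the columns ...
  );

  -- the old 5-field primary key becomes a plain UNIQUE index,
  -- so duplicate rows are still rejected
  CREATE UNIQUE INDEX foo_f1_f2_f3_f4_f5_key ON foo (f1, f2, f3, f4, f5);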
err..forgot to cc the list...
--------------------------------------
Thanks for trying to help me out...I've got some more info for you:

> > I've got a large database, currently about 40 million rows in the biggest table.
> ...
> > I initially created the table with a primary key (5 fields: char(4), varchar(32), date, int, int)
>
> Out of curiosity why are you using this as your primary key and not a
> serial? Will your tables be normalized?
>
> > and a couple additional indexes, one that just changes the order of the
> > primary key, and one that sorts on the date.
>
> From what I have read so far your primary key doesn't sound like what you
> would want to have as a primary key... but I guess the design of your
> table is a totally different issue.

I'm migrating this table from an existing mysql installation...
This is what it looks like right now:

          Table "forecast"
  Attribute    | Type                  | Modifier
 --------------+-----------------------+----------
  zhr          | smallint              |
  zday         | smallint              |
  model        | character varying(32) |
  temp         | numeric(6,2)          |
  modelhr      | smallint              | not null
  modelruntime | smallint              | not null
  modelrundate | smallint              |
  stn          | character(4)          | not null
  rh           | numeric(6,2)          |
  wdsp         | numeric(6,2)          |
  wddir        | character varying(2)  |
  dwpt         | numeric(6,2)          |
  lpre         | numeric(6,2)          |
  yearmoda     | date                  | not null

It's a table for weather forecasts; a record is identified uniquely by (model, stn, yearmoda, modelruntime, modelhr), although I will rarely have a query that specifies all of those fields. The reason for the permuted indexes mentioned above is because mysql will use a prefix of a multi-column key to narrow down a search. I guess pgsql doesn't use indexes that way? (I noticed in the to-do list something like "re-enable partial indexes")

The cardinality on these, by the way, is approximately: model: 15-30, stn: 1800, yearmoda: ~450 so far..., modelruntime: 4, and modelhr: 10-40... Does this make any difference on what order they should be listed in the index?

So, I should add a serial id column, I guess, and make that the primary key (why is this? I'm wondering, since I will most likely never refer to that column). Now I need some useful indexes. Most of my queries will be for a complete model run at a particular station, so I'm thinking of an index on (model, stn, yearmoda, modelruntime) to return all modelhr's at a stn... Another common set of queries deals with a *complete* model run (all stn's), so I would need another index on (model, yearmoda, modelruntime). Yet another useful query type contains all model runs within a certain date range, aggregated by modelhr...it looks to me like the above 2 indexes might be sufficient for that, though.... eg:

  SELECT avg(temp) FROM forecast
   WHERE model='avn' AND stn='KDRO'
     AND yearmoda >= $date1 AND yearmoda <= $date2
     AND modelruntime=0
   GROUP BY modelhr;

...or would I need an index that references modelhr for that one???

Oh, and I guess I still need a UNIQUE index, as well. Hmm. Now I'm wondering again if that serial id column is going to mess with me. Sometimes I will need to re-import a forecast because something went wrong, and all the values are bogus...can I do a REPLACE into a table, specifying all of the columns of a UNIQUE index, without specifying the primary key? How exactly is having a primary key as serial going to help me here? (Sorry if this is a dumb question, I'm kinda new to this)

> > Should I make an artificial primary key with a serial type to simplify things?
>
> I recommend you do this.
> Not only because it is more efficient, but
> because the type of key you selected has "issues" for lack of a better
> term. You can of course have a "unique" index so you don't have dups.
>
> I think that when you are dealing with a 40 million table you need to
> consider your hardware. You also didn't tell us how big are the rows.
> The more info you give us the more others will be able to help.

The 40 million rows, by the way, is for about 2 months of data ;-)...we've got about another year of data to import (and will be growing by about 1.5 million rows a day), so I want to make sure I've got the table and indexes set up optimally first, since it will take about a week to import all the data.

It is currently on a P3 700MHz machine, running RedHat 7.2 and PostgreSQL 7.1.3, with 256MB of RAM. We have a dual P3 700 with 512MB that we could move it to for production (the mysql db is currently running on it).
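Oh, to spell out the indexes I was describing above (the index names are just placeholders):

  -- complete model run at one station
  CREATE INDEX forecast_stn_run_idx
      ON forecast (model, stn, yearmoda, modelruntime);

  -- complete model run across all stations
  CREATE INDEX forecast_run_idx
      ON forecast (model, yearmoda, modelruntime);

  -- uniqueness of a forecast record
  CREATE UNIQUE INDEX forecast_unique_idx
      ON forecast (model, stn, yearmoda, modelruntime, modelhr);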
On Thu, 15 Nov 2001, Gurupartap Davis wrote:

> Thanks for trying to help me out...I've got some more info for you:

Not a problem, but remember to include the list. I am but a relatively new user, and you would deprive yourself of the help of others more experienced if you don't cc the list.

> I'm migrating this table from an existing mysql installation...
> This is what it looks like right now:
>
>           Table "forecast"
>   Attribute    | Type                  | Modifier
>  --------------+-----------------------+----------
>   zhr          | smallint              |
>   zday         | smallint              |
>   model        | character varying(32) |
>   temp         | numeric(6,2)          |
>   modelhr      | smallint              | not null
>   modelruntime | smallint              | not null
>   modelrundate | smallint              |
>   stn          | character(4)          | not null
>   rh           | numeric(6,2)          |
>   wdsp         | numeric(6,2)          |
>   wddir        | character varying(2)  |
>   dwpt         | numeric(6,2)          |
>   lpre         | numeric(6,2)          |
>   yearmoda     | date                  | not null
>
> It's a table for weather forecasts, a record is identified uniquely by
> (model, stn, yearmoda, modelruntime, modelhr) although I will rarely have a
> query that specifies all of those fields.

What are your most heavily looked-upon columns, and do you have indexes on them?

> The reason for the permuted indexes mentioned above

Are those 5 columns what make a record unique? I find it strange that you didn't put "not null" on all of the fields you described as your primary key, model in particular. What are zhr and zday?

> is because mysql will use a prefix of a multi-column
> key to narrow down a search. I guess pgsql doesn't use indexes that way?

Don't know. I usually use indexes according to my needs. If I need to search a lot on 3 columns then I index those 3 columns. I don't see the point of adding columns to an index if they will rarely be used.

> (I noticed in the to-do list something like "re-enable partial indexes")

So far I have not tried to do searches on partial index keys.

> The cardinality on these, by the way, is approximately: model: 15-30,
> stn: 1800, yearmoda: ~450 so far..., modelruntime: 4, and
> modelhr: 10-40... Does this make any difference on what order they should be
> listed in the index?

On a compound index, probably not. If you have something which is a limiting factor then you may want to have that by itself on an index. In general, not only in pgsql, anything which limits your searches is usually worth having on its own index. I don't know much about your data, but as an example, if you work a lot with ranges of dates then an index on the date would be a good one to have.

> So, I should add a serial id column, I guess, and make that the primary key

That is one approach, and I still don't understand your data enough to say whether it is actually good. As a theoretical example, let's say that the columns you suggest below (model, stn, yearmoda and modelruntime) go into their own "model" table with a serial key, and the other table with the rest of the info only carries that key instead of those 4 fields. This is basically normalization of your data. Its advantage is limiting how much I/O you need to do. Are you familiar with normalization? I don't want to bore you with stuff you may already know.

> Now I need some useful indexes. Most of my queries will be for a
> complete model run at a particular station, so I'm thinking of an index on
> (model, stn, yearmoda, modelruntime) to return all modelhr's at a stn ...

When you do your DB design you not only want to look at what makes it easy to use, but at what makes it efficient. This is where normalization usually helps.
Although I rarely ever normalize formally, I usually apply some of its concepts depending on what I want to achieve.

> Another common set of queries deals with a *complete* model run (all stn's),
> so I would need another index on (model, yearmoda, modelruntime).
> Yet another useful query type contains all model runs within a certain date
> range, aggregated by modelhr...it looks to me like the above 2 indexes might
> be sufficient for that, though....

I advise you to think about whether there is anything which can limit your queries, some factor in common across all of them. If such a limiting factor exists then you could have it in a separate, smaller table.

> ...or would I need an index that references modelhr for that one???

It is difficult for me to try to understand your data, since I know almost nothing about the meaning of the fields and how you will access these columns.

> Sometimes I will need to re-import a forecast because something went wrong,
> and all the values are bogus...can I do a REPLACE into a table, specifying

PostgreSQL doesn't have a REPLACE, but what you do is "begin" a transaction, delete all the old data, re-insert the model run and then "end" the transaction.

> How exactly is having a primary key as serial going to help me here?

If you can have some sort of "parent" table with some data which is a "key", you can use it as your starting point. Having a smaller table with the right indexes can greatly help your queries, due to the smaller amount of I/O that needs to be done.

> The 40 million rows, by the way, is for about 2 months of data ;-)...we've
> got about another year of data to import (and will be growing by about 1.5
> million rows a day), so I want to make sure I've got the table and indexes
> set up optimally first, since it will take about a week to import all the
> data.

When you talk about such sizes you need to look at your hardware and at the optimizations you have done to PostgreSQL, or any database for that matter.

> It is currently on a P3 700MHz machine, running RedHat 7.2 and PostgreSQL
> 7.1.3, with 256MB of RAM. We have a dual P3 700 with 512MB that we could
> move it to for production (the mysql db is currently running on it)

Memory is super cheap nowadays, about $30 for 256MB if memory serves me right. Increase the memory on these machines. Are you using IDE or SCSI? How many HDs? Using any type of RAID? Have you increased your buffers on PostgreSQL? Have you looked at your shared memory settings? Are you using explain with all your queries to see if they are using your indexes?

You also mentioned issues with downtime. Is this DB going to be used 24x7? You need to do vacuum analyze at least after every big update.

How often will your data be updated? Once data is loaded, will it be changed at all?
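By the way, the delete-and-reinsert I described above would look roughly like this (all the values here are made up):

  BEGIN;
  -- throw away the corrupt model run
  DELETE FROM forecast
   WHERE model = 'avn' AND stn = 'KDRO'
     AND yearmoda = '2001-11-13' AND modelruntime = 0;
  -- re-insert the corrected rows (one INSERT per row, or a COPY)
  INSERT INTO forecast (model, stn, yearmoda, modelruntime, modelhr, temp)
       VALUES ('avn', 'KDRO', '2001-11-13', 0, 12, 23.5);
  COMMIT;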
Sheesh...I think I see what you're saying about normalization. I've never had a formal db class, but it looks like you're talking about removing all the redundancy from the database...and there's a lot in this one(!)

I could have a forecast table like so:

  id            serial primary key,
  model_id      references model (id),
  yearmoda      date,
  modelruntime  smallint

Then a temperature table:

  forecast_id   references forecast (id),
  stn_id        references station (id),
  modelhr       smallint,
  value         numeric(6,2)

repeat for relative humidity, dewpoint, etc...

zhr, zday, and modelrundate in the current table are redundant, as they are derivative of the date and modelhr....

It looks like it would save a hella lot of disk space...probably over 50%, seeing as how certain variables (dwpt, rh, lpre) only show up in a fraction of the records...(ie, most records have NULL for these fields)

Dang...I don't know if my employer will let me make a change this radical, though. We would have to redo every single query in about 50 scripts. Hrmmm. I'm a programmer, not a DB admin...but my employer is too cheap (errr, sorry..."thrifty") to hire a real DB admin :-/

> You also mentioned issues with downtime. Is this DB going to be used
> 24x7? You need to do vacuum analyze at least after every big update.
>
> How often will your data be updated? Once data is loaded will it be
> changed at all?

We need to run the db 24x7. Data comes in all day long, one model run at a time. A model run is anywhere from 10-40 hours interpolated over about 1800 weather stations. All we do is add data (and, occasionally, re-insert data that was found corrupt)...no deleting, though. Would you recommend doing a vacuum analyze after every model run, or would once a day be sufficient?
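To put the layout I sketched above into actual SQL, I'm picturing something roughly like this (just my first guess at names and types):

  CREATE TABLE model (
      id    serial PRIMARY KEY,
      name  varchar(32) NOT NULL
  );

  CREATE TABLE station (
      id    serial PRIMARY KEY,
      stn   char(4) NOT NULL
  );

  CREATE TABLE forecast (
      id            serial PRIMARY KEY,
      model_id      int NOT NULL REFERENCES model (id),
      yearmoda      date NOT NULL,
      modelruntime  smallint NOT NULL
  );

  CREATE TABLE temperature (
      forecast_id   int NOT NULL REFERENCES forecast (id),
      stn_id        int NOT NULL REFERENCES station (id),
      modelhr       smallint NOT NULL,
      value         numeric(6,2)
  );
  -- ...and similar tables for rh, dwpt, lpre, etc.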
"Gurupartap Davis" <partap@yahoo.com> writes: > Sheesh...I think I see what you're saying about normalization. I've never > had a formal db class, but it looks like you're talking about removing all > the redundancy from the database...and there's a lot in this one(!) [...] > Dang...I don't know if my employer will let me make a change this radical, > though. We would have to redo every single query in about 50 scripts. You might be able to define views and rules in such a way that old queries still work, or work with minor changes. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
On Thu, 15 Nov 2001, Gurupartap Davis wrote:

> Sheesh...I think I see what you're saying about normalization. I've never
> had a formal db class, but it looks like you're talking about removing all
> the redundancy from the database...and there's a lot in this one(!)

The basics of normalization are not all that complex. Most of the time the biggest issue is truly understanding your data and how you will need to access it.

> I could have a forecast table like so:

I could try to help you, but I would need the original tables with every field explained, and then your suggested new design.

> zhr, zday, and modelrundate in the current table are redundant, as they are
> derivative of the date and modelhr....

If they can be derived AND you will never need to search on them, then don't store them at all.

> It looks like it would save a hella lot of disk space.

It is not only the space saving; your queries will also run faster.

> seeing as how certain variables (dwpt, rh, lpre) only show up in a fraction
> of the records...(ie, most records have NULL for these fields)

Those could possibly be in a separate table then.

> Dang...I don't know if my employer will let me make a change this radical,
> though.

You could do something in between an optimal design and something which would be relatively easy to do. Just explain to your employer that these changes can reduce space consumption and speed up your queries. The alternative is getting faster hardware and more memory... and even then queries may not be as fast as they could be just by re-designing your database.

> We would have to redo every single query in about 50 scripts.

Again, talk to your employer. This change takes place once, while the savings in time and space go on for the long term.

> Hrmmm. I'm a programmer, not a DB admin...but my employer is too cheap
> (errr, sorry..."thrifty") to hire a real DB admin :-/

If you are not paid by the hour then it may just be an issue of you getting some help over the net and doing it yourself. You may want to look into general DB forums, newsgroups, etc... where people with more experience in design can help you.

Doing a non-formal design is fairly simple; it is just a matter of knowing your data. If you take the time to do what I suggested and explain your existing tables field by field, taking into consideration that we may not know the terms, then people on this forum may be able to help you. You don't need to go into too many details. Something like:

  zhr    hour of ???. Usually blank/populated, etc.
  zday   day of ???. Usually blank/populated, etc.
  model  a description/name??
  temp   temperature at time/date???

You could also throw in 20 records so we see how things are distributed, in other words which values are often repeated and may be better off in a separate table, and which fields are usually blank so perhaps you may consider putting them in a separate table for the space savings.

> We need to run the db 24x7. Data comes in all day long,

How about letting the data go into ASCII files and then importing them at set intervals? In particular you want to have time to do your vacuum analyze at least once per day.

> one model run at a time.
> A model run is anywhere from 10-40 hours
> interpolated over about 1800 weather stations.

Is the model run on the database?

> All we do is add data (and, occasionally, re-insert data
> that was found corrupt)...no deleting, though. Would you recommend doing a
> vacuum analyze after every model run, or would once a day be sufficient?
Actually, in your case I would recommend doing a vacuum analyze after each large insert, if possible. If not, then once a day.

Does the data come in batches or as a continuous stream?

What are the chances of getting more memory, as I mentioned in the previous mail?
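As an example of the batch-load-then-vacuum approach (the file name and delimiter here are made up):

  -- load the accumulated ASCII file in one shot, then refresh the statistics
  COPY forecast FROM '/tmp/forecast_batch.txt' USING DELIMITERS '|';
  VACUUM ANALYZE forecast;
  -- note: COPY ... FROM a server-side file must be run as the database
  -- superuser; from psql, \copy does the same thing from the client side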
The way I would do this is to spend some time designing the data model which you find optimal. When this is done, you create it in a database and test it for a little while. Then you create views which look like your old database; this way the interface with your application will not be broken. Finally you test the new interface (make sure it actually works as it is supposed to), import the old data, and continue running it. Then you make your own little plan as to how, which and when you are going to update the other scripts.

This will give you many advantages:
1. You will learn a lot more about postgres (views, rules etc.)
2. You will be able to get many of the advantages much quicker
3. You will get all the advantages over time and you can focus on the most important ones first.

Regards,

Aasmund.
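P.S. Such a compatibility view might look roughly like this. It is purely a sketch; it assumes new tables along the lines discussed earlier in the thread, and it assumes the run-level table is named forecast_run so that the old name "forecast" is free for the view:

  CREATE VIEW forecast AS
      SELECT m.name  AS model,
             s.stn   AS stn,
             r.yearmoda,
             r.modelruntime,
             t.modelhr,
             t.value AS temp
        FROM forecast_run r, model m, station s, temperature t
       WHERE r.model_id = m.id
         AND t.forecast_id = r.id
         AND t.stn_id = s.id;
  -- old SELECTs keep working; INSERTs through the view would additionally
  -- need rewrite rules on top of it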
Aasmund Midttun Godal
aasmund@godal.com - http://www.godal.com/
+47 40 45 20 46