On Wed, Feb 18, 2009 at 12:39:50PM -0800, Mike Christensen wrote:
> I'm doing some perf testing and need huge amounts of data.  So I have a
> program that is adding data to a few tables ranging from 500,000 to 15M
> rows.
I assume you're repeatedly inserting data and then deleting it?  If so,
PG won't get much of a chance to clean up after you.  Because of the way
it handles transactions all of the old data will be left in the table
until the table is vacuumed and the appropriate tuples/rows are marked as
deleted.
> The program is just a simply C# program that blasts data into the
> DB,
Just out of interest, do you know about the COPY command? things will go
much faster than a large number of INSERT statements.
> but after about 3M rows or so I get an errror:
>
> ERROR:  could not extend relation 1663/41130/41177: No space left on device
> HINT:  Check free disk space.
>
> If I do a full VACUUM on the table being inserted into, the error goes
> away but it comes back very quickly.  Obviously, I wouldn't want this
> happening in a production environment.
VACUUM FULL's should very rarely be done, routine maintenance would be
to do plain VACUUMs or let the auto-vacuum daemon handle things.  This
will mark the space as available and subsequent operations will reuse
the space.
> What's the recommended setup in a production environment for tables
> where tons of data will be inserted?
If you're repeatedly inserting and deleting data then you'll probably want
to intersperse some VACUUMs in there.
> It seems to me there's some sort of "max table size" before you have to
> allocate more space on the disk, however I can't seem to find where
> these settings are and how to allow millions of rows to be inserted into
> a table without having to vacuum every few million rows..
There's no maximum table size you get control over; 15million rows on
its own isn't considered particularly big but you need to start being
careful at that stage.  If you've got a particularly "wide" table (i.e.
lots of attributes/columns) this is obviously going to take more space
and you may consider normalizing the data out into separate tables.
Once your row count gets to 10 or 100 times what your dealing with you'd
probably need to start thinking about partitioning the tables and how to
do that would depend on your usage patterns.
--
  Sam  http://samason.me.uk/