Thread: optimizing advice

optimizing advice

From
Rüdiger Sörensen
Date:
dear all,

I am building a database that will be really huge and grow rapidly. It
holds data from satellite observations. Data is imported via a java
application. The import is organized via files that are parsed by the
application; each file holds the data of one orbit of the satellite.
One of the tables will grow by about 40,000 rows per orbit, and there
are roughly 13 orbits a day. The import of one day (13 orbits) into the
database takes 10 minutes at the moment. I will have to import data back
to the year 2000 or even older.
I think that there will be a performance issue when the table in
question grows, so I partitioned it using a timestamp column and one
child table per quarter. Unfortunately, the import of 13 orbits now
takes 1 hour instead of 10 minutes as before.  I can live with that, if
the import time will not grow significantly as the table grows further.



anybody with comments/advice?


tia,
Ruediger.




Re: optimizing advice

From
Steve Atkins
Date:
On Dec 1, 2009, at 1:34 AM, Rüdiger Sörensen wrote:

> dear all,
>
> I am building a database that will be really huge and grow rapidly. It holds data from satellite observations. Data
> is imported via a java application. The import is organized via files that are parsed by the application; each file
> holds the data of one orbit of the satellite.
> One of the tables will grow by about 40,000 rows per orbit, there are roughly 13 orbits a day. The import of one day
> (13 orbits) into the database takes 10 minutes at the moment. I will have to import data back to the year 2000 or even
> older.
> I think that there will be a performance issue when the table in question grows, so I partitioned it using a
> timestamp column and one child table per quarter. Unfortunately, the import of 13 orbits now takes 1 hour instead of 10
> minutes as before.  I can live with that, if the import time will not grow significantly as the table grows further.
>
>
>
> anybody with comments/advice?

Make sure you're running the latest release of postgresql.

Import directly into the appropriate child table, rather than relying on redirecting inserts into the parent table.
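
For example, with placeholder table and column names (one child table per quarter):

  -- write each row straight to the quarter's child table instead of the parent
  INSERT INTO observations_2009q4 (obs_time, orbit, value)
      VALUES ('2009-12-01 01:34:00', 4711, 0.42);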

Do the import using copy protocol, not insert. I don't know whether, or how well, java supports that but it's something
you really want to be using. It's quite a lot faster than multiple inserts - especially when there's any network latency
between you and the database - and somewhat faster than inserts with multiple sets of values.
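
Something along these lines, again with placeholder table and column names (from Java this would be driven through the
driver's copy support, e.g. the JDBC CopyManager API), one COPY per orbit file:

  -- the client streams the parsed orbit file over the copy protocol,
  -- aimed straight at the relevant child table
  COPY observations_2009q4 (obs_time, orbit, value) FROM STDIN WITH CSV;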

For the bulk imports of old data, import the data first, then build the indexes and add constraints later. Disable
autovacuum during the initial bulk import. Also, consider turning fsync off for those imports (less of a win on modern
versions, but every little helps).
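
A rough outline for the historical back-load (table, index, and constraint names are made up; fsync and autovacuum are
server-level settings, so only change them for a dedicated load you can redo after a crash):

  -- in postgresql.conf for the duration of the back-load:
  --   autovacuum = off
  --   fsync = off        -- only if re-running the load after a crash is acceptable
  -- load first, then index, constrain and analyze:
  COPY observations_2000q1 (obs_time, orbit, value) FROM STDIN WITH CSV;
  CREATE INDEX observations_2000q1_obs_time_idx ON observations_2000q1 (obs_time);
  ALTER TABLE observations_2000q1
      ADD CONSTRAINT observations_2000q1_obs_time_check
      CHECK (obs_time >= '2000-01-01' AND obs_time < '2000-04-01');
  ANALYZE observations_2000q1;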

Take a look at http://www.postgresql.org/docs/current/interactive/populate.html

Do all the usual tuning stuff - http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server ,
http://wiki.postgresql.org/wiki/Performance_Optimization

Cheers,
  Steve


Re: optimizing advice

From
Vick Khera
Date:
On Tue, Dec 1, 2009 at 2:58 PM, Steve Atkins <steve@blighty.com> wrote:
> For the bulk imports of old data, import the data first, then build the indexes and add constraints later. Disable
> autovacuum during the initial bulk import. Also, consider turning fsync off for those imports (less of a win on modern
> versions, but every little helps).
>

for 8.3 (possibly earlier, I don't know) and up, you can

 SET LOCAL synchronous_commit TO OFF

in your transactions and get the bulk of the benefit of turning off
fsync without having to mess with server level settings.

You can also use

 SET SESSION synchronous_commit TO OFF

for your whole session to disable the fsync on commit (or implicit
commit per statement outside of transaction).

The beauty of this over fsync off is that if your DB crashes, you lose
things at transaction boundaries, not randomly, so it is easier to
re-do your work.
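
For the orbit loads, that might look something like this (table and column names are placeholders):

  BEGIN;
  SET LOCAL synchronous_commit TO OFF;   -- this transaction only, reverts at COMMIT
  COPY observations_2009q4 (obs_time, orbit, value) FROM STDIN WITH CSV;
  COMMIT;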

Re: optimizing advice

From
Edgardo Portal
Date:
On 2009-12-01, Rüdiger Sörensen <r.soerensen@mpic.de> wrote:
> dear all,
>
> I am building a database that will be really huge and grow rapidly. It
> holds data from satellite observations. Data is imported via a java
> application. The import is organized via files, that are parsed by the
> application; each file holds the data of one orbit of the satellite.
> One of the tables will grow by about 40,000 rows per orbit, there are
> roughly 13 orbits a day. The import of one day (13 orbits) into the
> database takes 10 minutes at the moment. I will have to import data back
> to the year 2000 or even older.
> I think that there will be a performance issue when the table in
> question grows, so I partitioned it using a timestamp column and one
> child table per quarter. Unfortunately, the import of 13 orbits now
> takes 1 hour instead of 10 minutes as before.  I can live with that, if
> the import time will not grow significantly as the table grows further.
>
> anybody with comments/advice?
>
> tia,
> Ruediger.

Re the apparent performance penalty for your imports...

If you implemented partitions with ON INSERT rules, maybe you could
"turn on & off" inserts into particular partitions as necessary, i.e. create
a RULE while importing a particular range of timestamps, then remove that
rule when you won't be inserting more data for that partition?
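
Roughly like this, with made-up table and column names:

  -- while loading the Q4/2009 range:
  CREATE RULE observations_insert_2009q4 AS
      ON INSERT TO observations
      WHERE NEW.obs_time >= DATE '2009-10-01' AND NEW.obs_time < DATE '2010-01-01'
      DO INSTEAD
      INSERT INTO observations_2009q4 VALUES (NEW.*);

  -- once that range is fully imported:
  DROP RULE observations_insert_2009q4 ON observations;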

Re: optimizing advice

From
Scott Marlowe
Date:
2009/12/1 Rüdiger Sörensen <r.soerensen@mpic.de>:
> dear all,
>
> I am building a database that will be really huge and grow rapidly. It holds
> data from satellite observations. Data is imported via a java application.
> The import is organized via files that are parsed by the application; each
> file holds the data of one orbit of the satellite.
> One of the tables will grow by about 40,000 rows per orbit, there are
> roughly 13 orbits a day. The import of one day (13 orbits) into the database
> takes 10 minutes at the moment. I will have to import data back to the year
> 2000 or even older.
> I think that there will be a performance issue when the table in question
> grows, so I partitioned it using a timestamp column and one child table per
> quarter. Unfortunately, the import of 13 orbits now takes 1 hour instead of
> 10 minutes as before.  I can live with that, if the import time will not
> grow significantly as the table grows further.

I'm gonna guess you're using rules instead of triggers for
partitioning?  Switching to triggers is a big help if you've got a
large amount of data to import / store.  If you need some help on
writing the triggers shout back, I had to do this to our stats db this
summer and it's been much faster with triggers.
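
A minimal sketch of what such a trigger can look like (table, column, and partition names are placeholders):

  CREATE OR REPLACE FUNCTION observations_insert_trigger()
  RETURNS trigger AS $$
  BEGIN
      IF NEW.obs_time >= DATE '2009-10-01' AND NEW.obs_time < DATE '2010-01-01' THEN
          INSERT INTO observations_2009q4 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for obs_time %', NEW.obs_time;
      END IF;
      RETURN NULL;   -- row has gone to the child, don't insert into the parent
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER insert_observations_trigger
      BEFORE INSERT ON observations
      FOR EACH ROW EXECUTE PROCEDURE observations_insert_trigger();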

Re: optimizing advice

From
"Bret"
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Scott Marlowe
> Sent: Tuesday, December 01, 2009 2:10 PM
> To: r.soerensen@mpic.de
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] optimizing advice
>
> 2009/12/1 Rüdiger Sörensen <r.soerensen@mpic.de>:
> > dear all,
> >
> > I am building a database that will be really huge and grow rapidly. It
> > holds data from satellite observations. Data is imported via a java
> > application. The import is organized via files that are parsed by the
> > application; each file holds the data of one orbit of the satellite.
> > One of the tables will grow by about 40,000 rows per orbit, there are
> > roughly 13 orbits a day. The import of one day (13 orbits) into the
> > database takes 10 minutes at the moment. I will have to import data
> > back to the year 2000 or even older.
> > I think that there will be a performance issue when the table in
> > question grows, so I partitioned it using a timestamp column and one
> > child table per quarter. Unfortunately, the import of 13 orbits now
> > takes 1 hour instead of 10 minutes as before.  I can live with that,
> > if the import time will not grow significantly as the table grows further.
>
> I'm gonna guess you're using rules instead of triggers for
> partitioning?  Switching to triggers is a big help if you've
> got a large amount of data to import / store.  If you need
> some help on writing the triggers shout back, I had to do
> this to our stats db this summer and it's been much faster
> with triggers.
>

That's about 189,800,000 records per year (40,000 rows x 13 orbits x 365 days)..
Hope they are short records.
Not knowing what the reporting target is, perhaps break the orbits out
into separate servers (or at least db's) by month or year, then
query them to build your research data on another server..

Scott..how does this compare to the stats db??