Re: High inserting by syslog - Mailing list pgsql-general

From Richard Huxton
Subject Re: High inserting by syslog
Date
Msg-id 486CF97A.8060800@archonet.com
In response to High inserting by syslog  ("Valter Douglas Lisbôa Jr." <douglas@trenix.com.br>)
Responses Re: High inserting by syslog
List pgsql-general
Valter Douglas Lisbôa Jr. wrote:
> Hello all, I have a Perl script that loads an entire day's squid log into
> a postgres table. I run it from a cronjob at midnight and turn the
> indexes off before the load (turning them back on afterwards). The script
> works fine, but I want to change to a different approach.
>
> I'd like to insert the log lines on the fly, as they are generated, so
> the data is available on-line. But the table has some indexes and the
> load is about 300,000 lines/day, an average of 3.48 inserts/sec. I think
> this could overload the database server (I have not tested it yet), so my
> idea is to create an unindexed table to receive the on-line inserts and
> run a job at midnight that moves all the lines into the main, indexed
> table.

There are two things to bear in mind.

1. What you need to worry about is the peak rate of inserts, not the
average. Even at 30 rows/sec that's not too bad.
2. What will your system do if the database is taken offline for a
period? How will it catch up?

The limiting factor will be the speed of your disks. Assuming a single
disk (no battery-backed RAID cache), you'll be limited to roughly one
commit per rotation (e.g. a 10,000 RPM drive gives about 10,000 commits /
minute). That will fall off rapidly if that one disk is also busy with
other reads and writes. But if you batch many log lines into each commit,
you need far fewer commits (e.g. batches of 1,000 rows turn 300,000
single-row commits per day into about 300).

So - to address both points above, I'd use a script with a flexible
batch size (a sketch in Perl follows below):
1. Estimate how many log lines are waiting to be saved to the database.
2. Batch together a suitable number of lines (1-1000) and commit them to
the database in a single transaction.
3. Sleep for 1-10 seconds.
4. Go back to #1, disconnecting and reconnecting every once in a while.

If the database is unavailable for any reason, this script will
automatically feed rows in faster (larger batches) once it comes back.
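
Something like this, for example - just a minimal sketch in Perl/DBI,
where the DSN, the squid_log table and its columns, and the
read_pending_lines() helper are all placeholders for whatever your setup
actually looks like:

  #!/usr/bin/perl
  # Minimal sketch only: DSN, credentials, table, columns and the
  # read_pending_lines() helper are placeholders, not a real schema.
  use strict;
  use warnings;
  use DBI;

  my $dbh = DBI->connect('dbi:Pg:dbname=squid', 'loguser', 'secret',
                         { AutoCommit => 0, RaiseError => 1 });

  my $sth = $dbh->prepare(
      'INSERT INTO squid_log (ts, client_ip, url, bytes)
       VALUES (?, ?, ?, ?)');

  while (1) {
      # 1. See how much is waiting (grab up to 1000 parsed lines).
      my @rows = read_pending_lines(1000);

      # 2. One transaction per batch: a single commit covers up to
      #    1000 rows instead of one commit per row.
      if (@rows) {
          $sth->execute(@$_) for @rows;
          $dbh->commit;
      }

      # 3. Sleep less when we appear to be behind, more when idle.
      sleep(@rows >= 1000 ? 1 : 10);

      # 4. A real script would also disconnect/reconnect now and then
      #    and retry on connection errors, so it catches up after an
      #    outage instead of dying.
  }

  # Hypothetical helper: return up to $max parsed log lines, each as an
  # array ref matching the INSERT's columns. Fill in however you tail
  # and parse the squid access log.
  sub read_pending_lines {
      my ($max) = @_;
      return ();    # placeholder
  }

The important part is step 2: one commit per batch rather than one per
row, which keeps you well below the commits/second a single disk can
manage.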

> My question is: is there a better solution, or is this tactic a good
> way to do it?

You might want to partition the table monthly. That will make it easier
to manage a few years from now.
http://www.postgresql.org/docs/current/static/ddl-partitioning.html
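
For example, with the inheritance scheme from that page, a hypothetical
set of monthly children for 2008 could be created like this (this reuses
the $dbh handle from the sketch above; the parent table squid_log and
its ts timestamp column are assumptions):

  # Hypothetical monthly partitions for 2008, one child table per month,
  # using the CHECK-constraint inheritance scheme from the docs above.
  for my $m (1 .. 12) {
      my $from = sprintf '2008-%02d-01', $m;
      my $to   = $m == 12 ? '2009-01-01'
                          : sprintf('2008-%02d-01', $m + 1);
      my $name = sprintf 'squid_log_2008_%02d', $m;
      $dbh->do(qq{
          CREATE TABLE $name (
              CHECK (ts >= DATE '$from' AND ts < DATE '$to')
          ) INHERITS (squid_log)
      });
  }
  $dbh->commit;

The loader can then insert straight into the current month's child (or
you can add the insert trigger/rule described on that page), and old
months can eventually be dropped or archived as whole tables.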

Also, consider increasing checkpoint_segments if you find the system
gets backed up.
Perhaps also consider setting synchronous_commit to off (but only for
the connection saving the log lines to the database):
http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html
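
In a loader like the sketch above that is just one statement right after
connecting, and it only affects that one session:

  # 8.3+: this connection alone stops waiting for the WAL flush at
  # commit. A crash can lose the last few batches of log lines, but
  # nothing is corrupted and other connections keep full durability.
  $dbh->do('SET synchronous_commit TO off');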

--
   Richard Huxton
   Archonet Ltd
