Efficient processing of staging data - Mailing list pgsql-general

From Joris Dobbelsteen
Subject Efficient processing of staging data
Date
Msg-id 48C2A94C.6020108@familiedobbelsteen.nl
Responses Re: Efficient processing of staging data
List pgsql-general
Dear,

I'm looking for an efficient way to process data that I have stored in a
staging table. This is syslog output, so it consists of text that I need
regexes to filter. The data will be split across several tables.

I currently have a set of queries that
* read from staging and insert into the destination table,
* delete from staging (using the destination table).
This requires two scans over the tables plus the index operations (which
pay off here). I do the processing incrementally to keep the run time
under control, but even with only 4 filters this takes around 6 to 30
minutes.
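For concreteness, the two-query approach above might look like the following sketch. All names (staging, dest_ssh, the id/ts/message columns, and the regex) are hypothetical placeholders for one filter:

```sql
-- Step 1: copy rows matching one filter into its destination table.
INSERT INTO dest_ssh (id, ts, message)
SELECT id, ts, message
FROM staging
WHERE message ~ '^sshd\[';

-- Step 2: delete the copied rows from staging, joining against the
-- destination table. This is the second scan referred to above.
DELETE FROM staging s
USING dest_ssh d
WHERE s.id = d.id;
```

Each filter repeats this pair, so the staging table is scanned (or index-probed) twice per filter.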

My next idea was to do an (incremental) table scan on the staging table
and have a function process each row. The function will then either:
* decide to do nothing, or
* decide to insert the (transformed) row into the destination table and
delete it from the staging table.
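The per-row processing could be sketched as a PL/pgSQL function taking the staging row as an argument. Again, table names, columns, and the regex are assumptions for illustration; the boolean return indicates whether the row was consumed:

```sql
CREATE OR REPLACE FUNCTION process_syslog_row(r staging)
RETURNS boolean AS $$
BEGIN
    -- One filter: route sshd messages to their own table.
    IF r.message ~ '^sshd\[' THEN
        INSERT INTO dest_ssh (id, ts, message)
            VALUES (r.id, r.ts, r.message);
        DELETE FROM staging WHERE id = r.id;
        RETURN true;
    END IF;
    -- No filter matched: leave the row in staging.
    RETURN false;
END;
$$ LANGUAGE plpgsql;

-- Drive it with an incremental scan, e.g. a batch at a time:
SELECT process_syslog_row(s) FROM staging s LIMIT 10000;
```

This way each row is read once and all filters are tried against it in a single pass, instead of one scan pair per filter.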

An obvious extension would be to put this processing in an INSERT
trigger on the staging table, saving the I/O required for the insert
into and delete from the staging table. I'd like to defer that for the
moment.
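If the filtering logic operates on a row value rather than on the table, the trigger variant becomes a thin wrapper around it. A sketch, with the same hypothetical names as above:

```sql
CREATE OR REPLACE FUNCTION staging_insert_filter()
RETURNS trigger AS $$
BEGIN
    IF NEW.message ~ '^sshd\[' THEN
        INSERT INTO dest_ssh (id, ts, message)
            VALUES (NEW.id, NEW.ts, NEW.message);
        RETURN NULL;  -- suppress the insert into staging entirely
    END IF;
    RETURN NEW;       -- unmatched rows still land in staging
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER staging_filter
    BEFORE INSERT ON staging
    FOR EACH ROW EXECUTE PROCEDURE staging_insert_filter();
```

Matched rows then never touch the staging table at all, so the insert/delete I/O disappears; only unmatched rows accumulate in staging.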

What are the recommended methods and helpful implementation hints to get
this working optimally? I.e., would my method work, or are there better
solutions?

How can I write the filtering functions in such a manner that I can
later transform the solution into a trigger-based one?

Regards,

- Joris Dobbelsteen

