Thread: Using PgSQL in high volume and throughput problem

Using PgSQL in high volume and throughput problem

From
Nikola Milutinovic
Hi all.

This might be OT, especially since I do not have the actual figures for
volume and throughput and can only give a "bystander's" impression.

I might get on board a project that deals with a high-volume,
high-throughput data-crunching task. It will involve data pattern
recognition and similar work. The project is meant to run in Java and use
Berkeley DB, and probably Apache Lucene.

Now, this is the juicy part. Because of the high volume and throughput,
the data is actually stored in ordinary files, while Berkeley DB is used
only for the indexes to that data!
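To make the "data in flat files, indexes elsewhere" layout concrete, here is a minimal Java sketch under stated assumptions: records are appended to one plain data file, and a sorted in-memory `TreeMap` stands in for the Berkeley DB index, holding only each key's byte offset and length. The class `FlatFileStore` and its methods are hypothetical illustrations, not Berkeley DB's API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: records live in a plain append-only file; a separate
// sorted map (a stand-in for Berkeley DB) holds only key -> {offset, length}.
class FlatFileStore implements AutoCloseable {
    private final RandomAccessFile data;
    // Stand-in for the Berkeley DB index: key -> {byte offset, byte length}.
    private final TreeMap<String, long[]> index = new TreeMap<>();

    FlatFileStore(Path file) throws IOException {
        this.data = new RandomAccessFile(file.toFile(), "rw");
    }

    // Append the record bytes at the end of the data file and index their position.
    void put(String key, String record) throws IOException {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        long offset = data.length();
        data.seek(offset);
        data.write(bytes);
        index.put(key, new long[] {offset, bytes.length});
    }

    // Look up the position in the index, then read exactly those bytes from the file.
    String get(String key) throws IOException {
        long[] pos = index.get(key);
        if (pos == null) return null;
        byte[] buf = new byte[(int) pos[1]];
        data.seek(pos[0]);
        data.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    // Ordered range scan over the index only: keys in [from, to).
    Map<String, long[]> range(String from, String to) {
        return index.subMap(from, to);
    }

    @Override
    public void close() throws IOException {
        data.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".dat");
        try (FlatFileStore store = new FlatFileStore(tmp)) {
            store.put("b", "second record");
            store.put("a", "first record");
            System.out.println(store.get("a"));                 // first record
            System.out.println(store.range("a", "c").keySet()); // [a, b]
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Each read costs one index lookup plus one seek into the data file. Note that re-putting an existing key leaves the old bytes in the file as dead space, which a real system would have to compact.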

Like I've said, I don't have the figures, but I was told that this was
the only way to make it work; everything else failed to perform. My
question: in your opinion, can PgSQL perform in such a scenario? Using
JDBC, of course.

I do realize that PgSQL gives a lot of good stuff, but here speed is of
the essence. The previous project stripped its Java data structures down
to the bare bones just to make it faster.

Nix.

Re: Using PgSQL in high volume and throughput problem

From
Josh Berkus
Nikola,

> Like I've said, I don't have the figures, but I was told that that was
> the only way to make it work, everything else failed to perform. My
> question, in your opinion, can PgSQL perform in such a scenario? Using
> JDBC, of course.

Your problem statement is so vague that nobody can answer that.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: [JDBC] Using PgSQL in high volume and throughput problem

From
Scott Marlowe
On Thu, 2005-05-12 at 00:34, Nikola Milutinovic wrote:
> Hi all.
>
> This might be OT, especially since I do not have the actual data for
> volume and throughput, but can give only "bystander" impression.
>
> I might get on-board a project that deals with a high volume and high
> throughput data crunching task. It will involve data pattern recognition
> and similar stuff. The project is meant to run in Java and use Berkeley
> DB, probably Apache Lucene.
>
> Now, this is the juicy part. Due to high volume and high throughput,
> data is actually stored in ordinary files, while Berkeley DB is used
> only for indexes to that data!
>
> Like I've said, I don't have the figures, but I was told that that was
> the only way to make it work, everything else failed to perform. My
> question, in your opinion, can PgSQL perform in such a scenario? Using
> JDBC, of course.
>
> I do realize that PgSQL gives a lot of good stuff, but here the speed is
> of the essence. The previous project has stripped Java code to the bare
> bones, regarding data structures, just to make it faster.

This sounds like a batch processing job, and those are often handled
much more quickly by hand-written Perl / Java / PHP /
your-favorite-language-here programs.
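A hand-written batch job of this kind often boils down to one sequential pass over the input with in-memory aggregation. The sketch below assumes a hypothetical input format of `key<TAB>value` lines and computes a count and sum per key; `BatchAggregate` and `Stats` are illustrative names only, not anything from the original project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical batch job: one sequential pass over "key<TAB>value" lines,
// aggregating a count and a sum per key in memory. No database involved.
class BatchAggregate {
    // Per-key running totals; mutable long fields avoid boxing a new Long on every update.
    static final class Stats {
        long count;
        long sum;
    }

    static Map<String, Stats> aggregate(Reader input) throws IOException {
        Map<String, Stats> totals = new TreeMap<>(); // sorted keys for deterministic output
        try (BufferedReader in = new BufferedReader(input)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) continue; // skip lines without a key/value separator
                String key = line.substring(0, tab);
                long value = Long.parseLong(line.substring(tab + 1).trim());
                Stats s = totals.computeIfAbsent(key, k -> new Stats());
                s.count++;
                s.sum += value;
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        String sample = "x\t3\ny\t5\nx\t4\n";
        aggregate(new StringReader(sample)).forEach((k, s) ->
            System.out.println(k + " count=" + s.count + " sum=" + s.sum));
    }
}
```

Because nothing here is shared with concurrent readers or writers, there is no locking, logging, or transaction overhead, which is a large part of why such a job can outrun a general-purpose database for one-off crunching.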

Is there a need for concurrent updates / selects on these data?  If not,
then batch processing and tossing the results into a database may well
be the best way to do it.
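If the batch results are then tossed into PostgreSQL, one common approach is a single bulk `COPY ... FROM STDIN` instead of one `INSERT` per row. This sketch only writes rows in COPY's text format (tab-separated columns, one row per line, `\N` for NULL, and backslash escapes for backslash, tab, newline and carriage return); the class name `CopyTextWriter` and whatever target table you load into are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

// Sketch: serialize batch results in PostgreSQL's COPY text format so they
// can be bulk-loaded in one statement rather than row-by-row INSERTs.
class CopyTextWriter {
    // Escape the characters that are special in COPY text format; null becomes \N.
    static String escape(String value) {
        if (value == null) return "\\N";
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Write each row as escaped columns joined by tabs, terminated by a newline.
    static void writeRows(List<String[]> rows, Writer out) throws IOException {
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) out.write('\t');
                out.write(escape(row[i]));
            }
            out.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        writeRows(List.of(new String[] {"x", "7"}, new String[] {"tab\there", null}), out);
        System.out.print(out);
    }
}
```

The resulting text could then be fed to psql's `\copy` command or, with a PostgreSQL JDBC driver version that supports it, to `CopyManager.copyIn(String, Reader)`; check your driver version before relying on the latter.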