Thread: Using PgSQL for a high-volume, high-throughput problem
Hi all.

This might be OT, especially since I do not have the actual data for volume and throughput and can give only a "bystander" impression.

I might get on board a project that deals with a high-volume, high-throughput data-crunching task. It will involve data pattern recognition and similar work. The project is meant to run in Java and use Berkeley DB, probably with Apache Lucene.

Now, this is the juicy part. Due to the high volume and throughput, the data is actually stored in ordinary files, while Berkeley DB is used only for indexes into that data!

Like I've said, I don't have the figures, but I was told that this was the only way to make it work -- everything else failed to perform. My question: in your opinion, can PgSQL perform in such a scenario? Using JDBC, of course.

I do realize that PgSQL gives a lot of good stuff, but here speed is of the essence. The previous project stripped the Java code to the bare bones, data-structure-wise, just to make it faster.

Nix.
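For readers unfamiliar with the design being described, here is a minimal sketch of the files-plus-index pattern in plain Java: records live in a flat file, and a lightweight in-memory index maps each key to its byte offset, so a lookup is one seek instead of a scan. In the real project, Berkeley DB would hold this key-to-offset mapping persistently; the class and method names below are illustrative, not from the actual project.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Flat-file record store with an in-memory key -> offset index. */
public class FileIndexSketch {
    private final RandomAccessFile file;
    private final Map<String, Long> index = new HashMap<>();

    public FileIndexSketch(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Appends a record and remembers its offset (Berkeley DB would persist this mapping). */
    public void put(String key, String value) throws IOException {
        long offset = file.length();
        file.seek(offset);
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        file.writeInt(data.length);   // length prefix so reads know how many bytes to fetch
        file.write(data);
        index.put(key, offset);
    }

    /** One seek plus one read -- no scan of the data file, which is the point of the design. */
    public String get(String key) throws IOException {
        Long offset = index.get(key);
        if (offset == null) return null;
        file.seek(offset);
        byte[] data = new byte[file.readInt()];
        file.readFully(data);
        return new String(data, StandardCharsets.UTF_8);
    }
}
```

The index stays small (a key and a long per record) even when the records themselves are large, which is presumably why the original team found it faster than pushing the full payloads through a database.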
Nikola,

> Like I've said, I don't have the figures, but I was told that this was
> the only way to make it work -- everything else failed to perform. My
> question: in your opinion, can PgSQL perform in such a scenario? Using
> JDBC, of course.

Your problem statement is so vague, nobody can answer that.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
On Thu, 2005-05-12 at 00:34, Nikola Milutinovic wrote:
> Hi all.
>
> This might be OT, especially since I do not have the actual data for
> volume and throughput and can give only a "bystander" impression.
>
> I might get on board a project that deals with a high-volume,
> high-throughput data-crunching task. It will involve data pattern
> recognition and similar work. The project is meant to run in Java and
> use Berkeley DB, probably with Apache Lucene.
>
> Now, this is the juicy part. Due to the high volume and throughput,
> the data is actually stored in ordinary files, while Berkeley DB is
> used only for indexes into that data!
>
> Like I've said, I don't have the figures, but I was told that this was
> the only way to make it work -- everything else failed to perform. My
> question: in your opinion, can PgSQL perform in such a scenario? Using
> JDBC, of course.
>
> I do realize that PgSQL gives a lot of good stuff, but here speed is
> of the essence. The previous project stripped the Java code to the
> bare bones, data-structure-wise, just to make it faster.

This sounds like a batch-processing job, and those are often handled much more quickly by hand-written Perl / Java / PHP / yourfavlanghere code. Is there a need for concurrent updates / selects on these data? If not, then batch processing and tossing the results into a database may well be the best way to do it.
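The batch approach suggested above might look like the sketch below in Java: crunch the raw records in a single in-memory pass, and only afterwards load the finished results into the database (via COPY or JDBC batch inserts) in one go. The token-counting step is a stand-in for the project's actual pattern-recognition logic, and the names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Single-pass batch aggregation: crunch everything first, load results afterwards. */
public class BatchCrunchSketch {
    /** Counts occurrences of each token; stands in for the real pattern-recognition step. */
    public static Map<String, Integer> crunch(List<String> records) {
        Map<String, Integer> counts = new HashMap<>();
        for (String record : records) {
            for (String token : record.split("\\s+")) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The point of the design is that the database never sees the raw high-throughput stream, only the much smaller aggregated output, so there is no per-record transaction or index-maintenance overhead during the crunch.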