Thread: Re: [GENERAL] Large databases, performance

Re: [GENERAL] Large databases, performance

From
"Shridhar Daithankar"
Date:
On 3 Oct 2002 at 8:54, Charles H. Woloszynski wrote:

> Can you comment on the tools you are using to do the insertions (Perl,
> Java?) and the distribution of data (all random, all static), and the
> transaction scope (all inserts in one transaction, each insert as a
> single transaction, some group of inserts as a transaction).

Most proably it's all inserts in one transaction spread almost uniformly over
around 15-20 tables. Of course there will be bunch of transactions..

> I'd be curious what happens when you submit more queries than you have
> processors (you had four concurrent queries and four CPUs), if you care
> to run any additional tests.  Also, I'd report the query time in
> absolute (like you did) and also in 'Time/number of concurrent queries".
>  This will give you a sense of how the system is scaling as the workload
> increases.  Personally I am more concerned about this aspect than the
> load time, since I am going to guess that this is where all the time is
> spent.

I don't think so. Because we plan to put enough shared buffers that would
almost contain the indexes in RAM if not data. Besides number of tuples
expected per query are not many. So more concurrent queries are not going to
hog anything other than CPU power at most.

Our major concern remains load time as data is generated in real time and is
expecetd in database with in specified time period. We need indexes for query
and inserting into indexed table is on hell of a job. We did attempt inserting
8GB of data in indexed table. It took almost 20 hours at 1K tuples per second
on average.. Though impressive it's not acceptable for that load..
>
> Was the original posting on GENERAL or HACKERS.  Is this moving the
> PERFORMANCE for follow-up?  I'd like to follow this discussion and want
> to know if I should join another group?

Shall I subscribe to performance?  What's the exat list name? Benchmarks? I
don't see anything as performance mailing list on  this page..
http://developer.postgresql.org/mailsub.php?devlp

> P.S.  Anyone want to comment on their expectation for 'commercial'
> databases handling this load?  I know that we cannot speak about
> specific performance metrics on some products (licensing restrictions)
> but I'd be curious if folks have seen some of the databases out there
> handle these dataset sizes and respond resonably.

Well, if something handles such kind of data with single machine and costs
under USD20K for entire setup, I would be willing to recommend that to client..

BTW we are trying same test on HP-UX. I hope we get some better figures on 64
bit machines..

Bye
 Shridhar

--
Clarke's Conclusion:    Never let your sense of morals interfere with doing the
right thing.


Re: [GENERAL] Large databases, performance

From
Justin Clift
Date:
Shridhar Daithankar wrote:
<snip>
> > Was the original posting on GENERAL or HACKERS.  Is this moving the
> > PERFORMANCE for follow-up?  I'd like to follow this discussion and want
> > to know if I should join another group?
>
> Shall I subscribe to performance?  What's the exat list name? Benchmarks? I
> don't see anything as performance mailing list on  this page..
> http://developer.postgresql.org/mailsub.php?devlp

It's a fairly new mailing list.  :)

pgsql-performance@postgresql.org

Easiest way to subscribe is by emailing majordomo@postgresql.org with:

subscribe pgsql-performance

as the message body.

:-)

Regards and best wishes,

Justin Clift

<snip>
> Bye
>  Shridhar
>
> --
> Clarke's Conclusion:    Never let your sense of morals interfere with doing the
> right thing.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
   - Indira Gandhi

Re: [GENERAL] Large databases, performance

From
Curt Sampson
Date:
On Thu, 3 Oct 2002, Shridhar Daithankar wrote:

> Our major concern remains load time as data is generated in real time and is
> expecetd in database with in specified time period.

If your time period is long enough, you can do what I do, which is
to use partial indexes so that the portion of the data being loaded
is not indexed. That will speed your loads quite a lot. Aftewards
you can either generate another partial index for the range you
loaded, or generate a new index over both old and new data, and
then drop the old index.

The one trick is that the optimizer is not very smart about combining
multiple indexes, so you often need to split your queries across
the two "partitions" of the table that have separate indexes.

> Shall I subscribe to performance?

Yes, you really ought to. The list is pgsql-performance@postgresql.org.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC