Thread: Re: [GENERAL] Performance while loading data and indexing

Re: [GENERAL] Performance while loading data and indexing

From
Justin Clift
Date:
Shridhar Daithankar wrote:
<snip>
> My friend argues for ext2 to eliminate journalling overhead but I favour
> reiserfs personally having used it in pgbench with 10M rows on paltry 20GB IDE
> disk for 25 tps..

If it's any help, the setup I mentioned before with different disks for
the data and the WAL files was getting an average of about 72 tps with
200 concurrent users on pgbench.  Haven't tuned it in a hard core way at
all, and it only has 256MB DDR RAM in it at the moment (single CPU
AthlonXP 1600).  These are figures made during the 2.5k+ test runs of
pgbench done when developing pg_autotune recently.

As a curiosity point, how predictable are the queries you're going to be
running on your database?  They sound very simple and very predictable.

The pg_autotune tool might be your friend here.  It can deal with
arbitrary SQL instead of using Tatsuo's pgbench stuff, and it can
also deal with an already loaded database.  You'd just have to tweak the
names of the tables that it vacuums and the names of the indexes that it
reindexes between each run, to get some idea of your overall server
performance at different load points.
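
As a rough illustration of that kind of run loop (not pg_autotune's actual
code), here is a minimal Python sketch. The names `measure_tps`,
`sweep_load_points`, `run_transactions` and `reset_database` are all
hypothetical; the two callables stand in for "run your SQL with N clients"
and "vacuum the tables and reindex the indexes", which you would have to
supply yourself.

```python
import time
from typing import Callable, Dict, Iterable


def measure_tps(run_transactions: Callable[[int], int],
                clients: int,
                clock: Callable[[], float] = time.perf_counter) -> float:
    """Run one benchmark pass and return transactions per second.

    run_transactions(clients) is a hypothetical callable that runs the
    workload with the given number of concurrent clients and returns the
    number of transactions that completed.
    """
    start = clock()
    completed = run_transactions(clients)
    elapsed = clock() - start
    return completed / elapsed if elapsed > 0 else 0.0


def sweep_load_points(run_transactions: Callable[[int], int],
                      reset_database: Callable[[], None],
                      load_points: Iterable[int],
                      clock: Callable[[], float] = time.perf_counter
                      ) -> Dict[int, float]:
    """Measure tps at each load point, resetting the database before each run.

    reset_database() is a hypothetical hook where the vacuum and reindex
    steps would go, so every run starts from a comparable state.
    """
    results = {}
    for clients in load_points:
        reset_database()
        results[clients] = measure_tps(run_transactions, clients, clock)
    return results
```

Calling `sweep_load_points(run, reset, [10, 50, 200])` would give a
dictionary mapping each client count to its measured tps.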

Probably worth taking a good look at if you're not afraid of editing
variables in C code.  :)

> We will be attempting reiserfs and/or XFS if required. I know how much speed
> difference exists between reiserfs and ext2. Would not be surprised if
> everything just starts screaming in one go..

We'd all probably be interested to hear this.  Added the PostgreSQL
"Performance" mailing list to this thread too, Just In Case. (wow that's
a lot of cross posting now).

Regards and best wishes,

Justin Clift

> Bye
>  Shridhar
>
> --
> Cropp's Law:    The amount of work done varies inversely with the time spent in the
> office.

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
   - Indira Gandhi

Re: [GENERAL] Performance while loading data and indexing

From
"Shridhar Daithankar"
Date:
On 27 Sep 2002 at 1:12, Justin Clift wrote:

> Shridhar Daithankar wrote:
> As a curiosity point, how predictable are the queries you're going to be
> running on your database?  They sound very simple and very predictable.

Mostly predictable selects. I'm not a domain expert on telecom, so I'm not very
sure. But my guess is that prepared statements in 7.3 should come in pretty
handy. By the time we finish evaluation and test deployment, 7.3 should be out,
in the next couple of months or so. So I would recommend doing it the 7.3 way only.
>
> The pg_autotune tool might be your friend here.  It can deal with
> arbitrary SQL instead of using Tatsuo's pgbench stuff, and it can
> also deal with an already loaded database.  You'd just have to tweak the
> names of the tables that it vacuums and the names of the indexes that it
> reindexes between each run, to get some idea of your overall server
> performance at different load points.
>
> Probably worth taking a good look at if you're not afraid of editing
> variables in C code.  :)

Gladly. We started by altering pgbench here for testing and quickly settled
on Perl-generated random queries. Once PostgreSQL wins the evaluation match and
things come to implementation, pg_autotune will be a handy tool. It's just that
we can't do it right now. We have to fight MySQL and SAP DB before that..
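
For anyone curious what "generated random queries" means in practice, here
is a minimal sketch of the idea in Python rather than the Perl we actually
use. The table `calls`, the column `subscriber_id` and the function name are
made up for illustration and are not our real schema.

```python
import random
from typing import List, Optional


def random_select_queries(count: int,
                          max_id: int,
                          seed: Optional[int] = None) -> List[str]:
    """Return `count` simple SELECT statements with random lookup keys.

    The table and column names are hypothetical placeholders. Passing a
    seed makes the generated workload repeatable between test runs.
    """
    rng = random.Random(seed)
    queries = []
    for _ in range(count):
        # Pick a random key in the range 1..max_id (inclusive).
        subscriber_id = rng.randint(1, max_id)
        queries.append(
            f"SELECT * FROM calls WHERE subscriber_id = {subscriber_id};"
        )
    return queries
```

Using a fixed seed means the same query mix can be replayed against each
database being compared.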

BTW, any performance figures on SAP DB? People here are already frustrated by
the difficulties in setting it up. But still..
>

> > We will be attempting reiserfs and/or XFS if required. I know how much speed
> > difference exists between reiserfs and ext2. Would not be surprised if
> > everything just starts screaming in one go..
>
> We'd all probably be interested to hear this.  Added the PostgreSQL
> "Performance" mailing list to this thread too, Just In Case. (wow that's
> a lot of cross posting now).

I know..;-) Glad that the PG lists do not have strict policies like no
non-subscriber posting or no attachments, etc..

IMO reiserfs, though a journalling filesystem, is faster than ext2 etc. because
of the way it handles metadata. Personally, I haven't come across ext2 being
faster than reiserfs on the few machines here in day-to-day use.

I guess I should have a FreeBSD CD handy too.. Just to give it a try, if it
comes down to a better VM.. though we're using 2.4.19 here.. so it shouldn't
matter much..

I will keep you guys posted on the file system stuff... Glad that we have so
much flexibility with PostgreSQL..

Bye
 Shridhar

--
Bilbo's First Law:    You cannot count friends that are all packed up in barrels.