Thread: Re: Need for speed 2

Re: Need for speed 2

From: "Merlin Moncure"
> Putting pg_xlog on the IDE drives gave about 10% performance
> improvement. Would faster disks give more performance?
>
> What my application does:
>
> Every five minutes a new logfile will be imported. Depending on the
> source of the request it will be imported in one of three "raw click"
> tables. (data from two months back, to be able to verify customer
> complaints)
> For reporting I have a set of tables. These contain data from the last
> two years. My app deletes all entries from today and reinserts updated
> data calculated from the raw data tables.
>
> The queries contain no joins, only aggregates. I have several indexes
> to speed up different kinds of queries.
>
> My problems occur when one user runs a report that covers too much old
> data. In that case all cache mechanisms fail and disk I/O is the
> limiting factor.

It seems like you are pushing the limit of what your server can handle.
This means either:
1. an expensive server upgrade, or
2. making the software more efficient.

Since you sound I/O bound, you can tackle 1. by a. adding more memory or
b. increasing I/O throughput.

Unfortunately, you already have a pretty decent server (for x86), so a.
means a 64-bit platform and b. means more expensive hard drives.  The
archives are full of information about this...

Is your data well normalized?  You can do tricks like:
if a table has fields a,b,c,d,e,f with a as the primary key, and d,e,f
are not frequently queried or are often missing, move d,e,f to a
separate table.
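Roughly something like this (table and column names are made up purely
for illustration):

    -- hypothetical wide table: a is the key, b and c are the hot columns
    CREATE TABLE clicks (
        a  integer PRIMARY KEY,
        b  integer NOT NULL,
        c  text    NOT NULL
    );

    -- rarely-queried / often-missing columns live in a side table and
    -- are joined in only when a report actually needs them
    CREATE TABLE clicks_extra (
        a  integer PRIMARY KEY REFERENCES clicks (a),
        d  numeric,
        e  text,
        f  timestamp
    );

The hot table stays narrow, so many more of its rows fit in cache.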

Well-normalized structures are always more cache efficient.  Do you have
lots of repeating and/or empty data values in your tables?

Make your indexes and data as small as possible to reduce pressure on
the cache; here are just a few tricks:
1. use int2/int4 instead of numeric
2. know when to use char and varchar
3. use functional indexes to reduce index expression complexity.  This
can give extreme benefits if you can, for example, reduce an index on a
double precision field to a boolean (see the sketch below).
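
A rough sketch of trick 3 (the table and column names are invented; the
point is that the index stores a one-byte boolean instead of an 8-byte
double):

    -- instead of indexing the raw double precision column:
    --   CREATE INDEX clicks_cost_idx ON clicks (cost);
    -- index only the boolean condition the reports actually test:
    CREATE INDEX clicks_billable_idx ON clicks ((cost > 0));

    -- queries must repeat the indexed expression to make use of it
    SELECT count(*) FROM clicks WHERE cost > 0;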

Merlin

What *_mem to increase when running CLUSTER

From: Andrew Lazarus
Should I temporarily increase sort_mem, vacuum_mem, neither, or both
when doing a CLUSTER on a large (100 million row) table where as many as
half of the tuples are deadwood from UPDATEs or DELETEs? I have large
batch (10 million row) inserts, updates, and deletes so I'm not sure
frequent vacuuming would help.



Re: What *_mem to increase when running CLUSTER

From: Steve Poe
Andrew,

On Thu, 2005-08-25 at 12:24 -0700, Andrew Lazarus wrote:
> Should I temporarily increase sort_mem, vacuum_mem, neither, or both
> when doing a CLUSTER on a large (100 million row) table where as many as
> half of the tuples are deadwood from UPDATEs or DELETEs? I have large
> batch (10 million row) inserts, updates, and deletes so I'm not sure
> frequent vacuuming would help.

You may need to experiment with both. What version of Postgres? What is
the size of your database? How many concurrent users? If you're seeing
that half of the tuples are dead, I would look at checking your
max_fsm_pages and max_fsm_relations after a full vacuum analyze before
doing too much with sort_mem.
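
A rough way to do that check (the settings shown are placeholders; the
INFO lines at the end of a database-wide VACUUM VERBOSE report what your
free space map actually needs):

    -- database-wide vacuum; the trailing INFO lines show how many FSM
    -- page slots and relations are needed vs. currently allocated
    VACUUM VERBOSE ANALYZE;

    -- if the reported need exceeds the current limits, raise them in
    -- postgresql.conf (placeholder values) and restart the server:
    --   max_fsm_pages     = 2000000
    --   max_fsm_relations = 2000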

Your mileage may vary.

Best of luck.

Steve Poe




Re: What *_mem to increase when running CLUSTER

From: Tom Lane
Andrew Lazarus <andrew@pillette.com> writes:
> Should I temporarily increase sort_mem, vacuum_mem, neither, or both
> when doing a CLUSTER on a large (100 million row) table

The only part of that job that can use lots of memory is the index
rebuilds.  In recent PG versions maintenance_work_mem is the thing
to increase for an index build; previously sort_mem controlled it.
I forget when the changeover was; maybe 8.0.
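
For example, something along these lines (the table/index names are made
up, and the value, an integer in kB here, is arbitrary):

    -- raise the per-session limit used for the index rebuilds (in kB)
    SET maintenance_work_mem = 524288;   -- on pre-8.0, SET sort_mem instead
    CLUSTER big_table_pkey ON big_table; -- 8.0-era syntax, hypothetical names
    RESET maintenance_work_mem;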

            regards, tom lane