Re: Massive performance issues

From: Matthew Sackman
Subject: Re: Massive performance issues
Date: 2005-09-01 21:13:59
Msg-id: 20050901211359.GD7131@pongo.lshift.net
In response to: Re: Massive performance issues (Arjen van der Meijden <acmmailing@tweakers.net>)
List: pgsql-performance
On Thu, Sep 01, 2005 at 10:54:45PM +0200, Arjen van der Meijden wrote:
> On 1-9-2005 19:42, Matthew Sackman wrote:
> >Obviously, to me, this is a problem: I need these queries to complete
> >in under a second. Is this unreasonable? What can I do to make this "go
> >faster"? I've considered normalising the table but I can't work out
> >whether the slowness is in dereferencing the pointers from the index
> >into the table or in scanning the index in the first place. And
> >normalising the table is going to cause much pain when inserting
> >values, and I'm not entirely sure I see why normalising it should
> >yield a massive performance improvement.
>
> In this case, I think normalising will give a major decrease in the
> on-disk size of this large table and its indexes. If so, that in
> itself will speed up all I/O-bound queries quite a bit.

Well, that's the thing: on the queries where it decides to use the index
it only reads at around 3MB/s and the CPU is maxed out, whereas when it
doesn't use the index, the disk is read at 60MB/s. So when it decides to
use an index, I don't seem to be I/O-bound at all. Or at least that's
the way it seems to me.

> locality_1, _2, city and county can probably be normalised away without
> much problem, but going from varchars to integers will probably save
> you quite a bit of (disk) space.

Sure, that's what I've been considering today.

> But it won't change the selectivity of the indexes, so you won't get
> more index scans instead of sequential scans, I suppose.
> I think it's not that hard to create a normalized set of tables from
> this data set (using insert into tablename select distinct ... from
> address and such, then insert into address_new (..., city) select ...
> (select cityid from cities where city = address.city) from address).
> So it's at least relatively easy to figure out the performance
> improvement from normalizing the dataset a bit.
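
Spelling that out, I suppose the one-off migration is something like
this (just a sketch: the cities table, the cityid column and the
varchar widths are names and types I'm making up for illustration):

    -- lookup table; names and widths are illustrative
    CREATE TABLE cities (
        cityid serial PRIMARY KEY,
        city   varchar(60) NOT NULL UNIQUE
    );

    -- populate it from the existing data
    INSERT INTO cities (city) SELECT DISTINCT city FROM address;

    CREATE TABLE address_new (
        locality_1 varchar(60),
        locality_2 varchar(60),
        cityid     integer REFERENCES cities,
        county     varchar(60)
    );

    -- rewrite each row, swapping the city text for its id
    INSERT INTO address_new (locality_1, locality_2, cityid, county)
        SELECT locality_1, locality_2,
               (SELECT cityid FROM cities WHERE city = address.city),
               county
        FROM address;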

Yeah, the initial creation isn't too painful, but adding rows to the
address table afterwards gets more painful. However, as I've said
elsewhere, the import isn't the critical path so I can cope with that
pain, possibly coding around it in a stored procedure and triggers as
suggested.
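
For the inserts, I imagine the function-based approach looks roughly
like this (an untested sketch: city_id and the column names are
illustrative, and plpgsql has to be installed with createlang first).
It's not safe against two sessions inserting the same new city
concurrently, but for a single-threaded import that shouldn't matter:

    -- look up a city, creating it on first sight; returns its id
    CREATE FUNCTION city_id(text) RETURNS integer AS '
    DECLARE
        id integer;
    BEGIN
        SELECT INTO id cityid FROM cities WHERE city = $1;
        IF NOT FOUND THEN
            INSERT INTO cities (city) VALUES ($1);
            SELECT INTO id cityid FROM cities WHERE city = $1;
        END IF;
        RETURN id;
    END;
    ' LANGUAGE plpgsql;

    -- inserts then become, e.g.:
    INSERT INTO address_new (locality_1, locality_2, cityid, county)
        VALUES ('Newnham', NULL, city_id('Cambridge'), 'Cambridgeshire');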

> If you want to improve your hardware, have a look at the Western Digital
> Raptor-series SATA disks; they are fast, SCSI-like SATA drives. You
> might also look at the amount of memory available, to allow caching
> this (entire) table.

Well, I've got 1GB of RAM, but from watching its use, a fair amount
isn't being used: about 50% is in use by applications, about half of
the rest is cache, and the remainder is idle. Does this have anything
to do with the max_fsm_pages and max_fsm_relations settings? I've
pretty much not touched the configuration; it's the standard Debian
package.
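
For reference, the knobs that seem relevant in postgresql.conf are
something like the following. The values here are purely illustrative
for a 1GB box, not recommendations; and as far as I understand it, the
fsm settings size the free-space map that vacuum maintains, so they're
about space reuse rather than caching:

    # illustrative values only; the stock Debian defaults are much lower
    shared_buffers = 16384          # 8kB pages, so ~128MB of shared cache
    effective_cache_size = 65536    # ~512MB; what the planner assumes the OS caches
    max_fsm_pages = 200000          # sizes the free-space map, not a cache
    max_fsm_relations = 1000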

Matthew
