Re: Massive performance issues - Mailing list pgsql-performance

From Matthew Sackman
Subject Re: Massive performance issues
Date
Msg-id 20050901210626.GC7131@pongo.lshift.net
Whole thread Raw
In response to Re: Massive performance issues  ("Steinar H. Gunderson" <sgunderson@bigfoot.com>)
Responses Re: Massive performance issues
List pgsql-performance
On Thu, Sep 01, 2005 at 10:09:30PM +0200, Steinar H. Gunderson wrote:
> >     "address_city_index" btree (city)
> >     "address_county_index" btree (county)
> >     "address_locality_1_index" btree (locality_1)
> >     "address_locality_2_index" btree (locality_2)
> >     "address_pc_bottom_index" btree (postcode_bottom)
> >     "address_pc_middle_index" btree (postcode_middle)
> >     "address_pc_top_index" btree (postcode_top)
> >     "address_pc_top_middle_bottom_index" btree (postcode_top,
> >                              postcode_middle, postcode_bottom)
> >     "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
> >     "address_postcode_index" btree (postcode)
> >     "address_property_type_index" btree (property_type)
> >     "address_street_index" btree (street)
> >     "street_prefix" btree (lower("substring"((street)::text, 1, 1)))
>
> Wow, that's quite a lof of indexes... but your problem isn't reported as
> being in insert/update/delete.

Hah, well now that you mention it. Basically, 100,000 rows come in in a
bulk import every month and the only way I can get it to complete in any
sane time frame at all is to drop the indexes, do the import and then
recreate the indexes. But that's something that I'm OK with - the
imports don't have to be that fast and whilst important, it's not *the*
critical path. Selection from the database is, hence the indexes.

> > This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4 and a
> > SATA harddrive.
>
> 8.0 or 8.1 might help you some -- better (and more!) disks will probably help
> a _lot_.

Ok, I did try 8.0 when I started this and found that the server bind
parameters (both via DBD::Pg (with pg_prepare_server => 1) and via JDBC
(various versions I tried)) failed - the parameters were clearly not
being substituted. This was Postgresql 8.0 from Debian unstable. That
was a couple of weeks ago and I've not been back to check whether its
been fixed. Anyway, because of these problems I dropped back to 7.4.

> > Queries such as:
> >
> > select locality_2 from address where locality_2 = 'Manchester';
> >
> > are taking 14 seconds to complete, and this is only 2 years worth of
> > data - we will have up to 15 years (so over 15 million rows).
>
> As Tom pointed out; you're effectively doing random searches here, and using
> CLUSTER might help. Normalizing your data to get smaller rows (and avoid
> possibly costly string comparisons if your strcoll() is slow) will probably
> also help.

Ok, so you're saying that joining the address table into an address_city
table (the obvious normalization) will help here?

The locale settings in postgresql.conf all have en_GB and a \l shows
encoding of LATIN1. So I don't think I've set anything to UTF8 or such
like.

> > I need to get to the stage where I can run queries such as:
> > select street, locality_1, locality_2, city from address
> > where (city = 'Nottingham' or locality_2 = 'Nottingham'
> >        or locality_1 = 'Nottingham')
> >   and upper(substring(street from 1 for 1)) = 'A'
> > group by street, locality_1, locality_2, city
> > order by street
> > limit 20 offset 0
>
> This might be a lot quicker than pulling all the records like in your example
> queries...

Yes, that certainly does seem to be the case - around 4 seconds. But I
need it to be 10 times faster (or thereabouts) otherwise I have big
problems!

Many thanks for all the advice so far.

Matthew


pgsql-performance by date:

Previous
From: Ron
Date:
Subject: Re: Massive performance issues
Next
From: Matthew Sackman
Date:
Subject: Re: Massive performance issues