Re: Massive performance issues - Mailing list pgsql-performance

From Mark Kirkwood
Subject Re: Massive performance issues
Date
Msg-id 4318FAB9.4060800@paradise.net.nz
Whole thread Raw
In response to Re: Massive performance issues  (Mark Kirkwood <markir@paradise.net.nz>)
List pgsql-performance
Mark Kirkwood wrote:
> Matthew Sackman wrote:
>
>> I need to get to the stage where I can run queries such as:
>
>  >
>
>> select street, locality_1, locality_2, city from address where (city =
>> 'Nottingham' or locality_2 = 'Nottingham'
>>        or locality_1 = 'Nottingham')
>>   and upper(substring(street from 1 for 1)) = 'A' group by street,
>> locality_1, locality_2, city
>> order by street
>> limit 20 offset 0
>>
>> and have the results very quickly.
>>
>
> This sort of query will be handled nicely in 8.1 - it has bitmap and/or
> processing to make use of multiple indexes. Note that 8.1 is in beta now.
>

As others have commented, you will probably need better hardware to
achieve a factor of 1000 improvement, However I think using 8.1 could by
itself give you a factor or 10->100 improvement.

e.g. Using your schema and generating synthetic data:


EXPLAIN
SELECT street, locality_1, locality_2, city
FROM address
WHERE (city = '500TH CITY'
        OR locality_2 = '50TH LOCALITY'
        OR locality_1 = '500TH LOCALITY')
   AND upper(substring(street from 1 for 1)) = 'A'
GROUP BY street, locality_1, locality_2, city
ORDER BY street
LIMIT 20 OFFSET 0

          QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=59559.04..59559.09 rows=20 width=125)
    ->  Sort  (cost=59559.04..59559.09 rows=21 width=125)
          Sort Key: street
          ->  HashAggregate  (cost=59558.37..59558.58 rows=21 width=125)
                ->  Bitmap Heap Scan on address  (cost=323.19..59556.35
rows=202 width=125)
                      Recheck Cond: (((city)::text = '500TH CITY'::text)
OR ((locality_2)::text = '50TH LOCALITY'::text) OR ((locality_1)::text =
'500TH LOCALITY'::text))
                      Filter: (upper("substring"((street)::text, 1, 1))
= 'A'::text)
                      ->  BitmapOr  (cost=323.19..323.19 rows=40625 width=0)
                            ->  Bitmap Index Scan on address_city_index
  (cost=0.00..15.85 rows=1958 width=0)
                                  Index Cond: ((city)::text = '500TH
CITY'::text)
                            ->  Bitmap Index Scan on
address_locality_2_index  (cost=0.00..143.00 rows=18000 width=0)
                                  Index Cond: ((locality_2)::text =
'50TH LOCALITY'::text)
                            ->  Bitmap Index Scan on
address_locality_1_index  (cost=0.00..164.33 rows=20667 width=0)
                                  Index Cond: ((locality_1)::text =
'500TH LOCALITY'::text)
(14 rows)


This takes 0.5s -> 2s to execute (depending on the frequencies generated
for the two localities).

So we are a factor of 10 better already, on modest HW (2xPIII 1Ghz 2G
running FreeBSD 5.4).

To go better than this you could try a specific summary table:


CREATE TABLE address_summary AS
SELECT street,
        locality_1,
        locality_2,
        city,
        upper(substring(street from 1 for 1)) AS cut_street
FROM address
GROUP BY street, locality_1, locality_2, city
;

CREATE INDEX address_summary_city_index ON address_summary(city);
CREATE INDEX address_summary_locality_1_index ON
address_summary(locality_1);
CREATE INDEX address_summary_locality_2_index ON
address_summary(locality_2);
CREATE INDEX address_summary_street_index ON address_summary(street);
CREATE INDEX street_summary_prefix ON address_summary(cut_street);


And the query can be rewritten as:

EXPLAIN
SELECT street, locality_1, locality_2, city
FROM address_summary
WHERE (city = '500TH CITY'
        OR locality_2 = '50TH LOCALITY'
        OR locality_1 = '500TH LOCALITY')
   AND cut_street = 'A'
ORDER BY street
LIMIT 20 OFFSET 0

                QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..2006.05 rows=20 width=125)
    ->  Index Scan using address_summary_street_index on address_summary
  (cost=0.00..109028.81 rows=1087 width=125)
          Filter: ((((city)::text = '500TH CITY'::text) OR
((locality_2)::text = '50TH LOCALITY'::text) OR ((locality_1)::text =
'500TH LOCALITY'::text)) AND (cut_street = 'A'::text))
(3 rows)


This takes 0.02s - so getting close to the factor of 1000 (a modern
machine with 3-5 times the memory access speed will get you there easily).

The effectiveness of the summary table will depend on the how much the
GROUP BY reduces the cardinality (not much in this synthetic case), so
you will probably get better improvement with the real data!

Cheers

Mark

pgsql-performance by date:

Previous
From: Ramesh kumar
Date:
Subject: Re: ORDER BY and LIMIT not propagated on inherited
Next
From: Carlos Benkendorf
Date:
Subject: Improving performance of a query