Thread: How to query and index for customer with lastname and city

How to query and index for customer with lastname and city

From

Joost Kraaijeveld

Date:

04 March 2006, 05:58:09

Hi,

I have two tables:

Customer: objectid, lastname, fk_address
Address: objectid, city

I want to select all customers with a name >= some_name and living in a city >= some_city, all comparisons case insensitive

Below is what I actually have. Given the fact that it takes forever to get a result (> 6 seconds) , there must be something wrong with my solution or my expectation. Can anyone tell what I should do to make this query go faster ( or convince me to wait for the result ;-()?

SELECT customers.objectid FROM prototype.customers,prototype.addresses
WHERE
customers.contactAddress = addresses.objectId
AND
(
TRIM(UPPER(lastName)) >= TRIM(UPPER('some_name'))
AND
TRIM(UPPER(city)) >= TRIM(UPPER('some_city'))
)
order by TRIM(UPPER(lastname)), TRIM(UPPER(city))

Explain analyze after a full alayse vacuum:

Sort (cost=54710.68..54954.39 rows=97484 width=111) (actual time=7398.971..7680.405 rows=96041 loops=1)
Sort Key: btrim(upper(customers.lastname)), btrim(upper(addresses.city))
-> Hash Join (cost=14341.12..46632.73 rows=97484 width=111) (actual time=1068.862..5472.788 rows=96041 loops=1)
        Hash Cond: ("outer".contactaddress = "inner".objectid)
        -> Seq Scan on customers (cost=0.00..24094.01 rows=227197 width=116) (actual time=0.018..1902.646 rows=223990 loops=1)
              Filter: (btrim(upper(lastname)) >= 'JANSEN'::text)
        -> Hash (cost=13944.94..13944.94 rows=158473 width=75) (actual time=1068.467..1068.467 rows=158003 loops=1)
              -> Bitmap Heap Scan on addresses (cost=1189.66..13944.94 rows=158473 width=75) (actual time=71.259..530.986 rows=158003 loops=1)
                    Recheck Cond: (btrim(upper(city)) >= 'NIJMEGEN'::text)
                    -> Bitmap Index Scan on prototype_addresses_trim_upper_city (cost=0.00..1189.66 rows=158473 width=0) (actual time=68.290..68.290 rows=158003 loops=1)
                          Index Cond: (btrim(upper(city)) >= 'NIJMEGEN'::text)
Total runtime: 7941.095 ms

I have indices on :
fki_customers_addresses
customer.lastname (both lastname and trim(uppercase(lastname))
addresses.city (both city and trim(uppercase(city))

I

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@Askesis.nl
web: www.askesis.nl

Re: How to query and index for customer with lastname and city

From

"hubert depesz lubaczewski"

Date:

04 March 2006, 09:49:43

On 3/4/06, Joost Kraaijeveld <J.Kraaijeveld@askesis.nl> wrote:
>  Below is what I actually have. Given the fact that it takes forever to get
> a result (> 6 seconds) , there must be something wrong with my solution or
> my expectation. Can anyone tell what I should do to make this query go
> faster ( or convince me to wait for the result ;-()?
>  Explain analyze after a full alayse vacuum:
>  Sort  (cost=54710.68..54954.39 rows=97484 width=111) (actual
> time=7398.971..7680.405 rows=96041 loops=1)
>    Sort Key: btrim(upper(customers.lastname)), btrim(upper(addresses.city))
>    ->  Hash Join  (cost=14341.12..46632.73 rows=97484 width=111) (actual
> time=1068.862..5472.788 rows=96041 loops=1)
>          Hash Cond: ("outer".contactaddress = "inner".objectid)
>          ->  Seq Scan on customers  (cost=0.00..24094.01 rows=227197
> width=116) (actual time=0.018..1902.646 rows=223990 loops=1)
>                Filter: (btrim(upper(lastname)) >= 'JANSEN'::text)
>          ->  Hash  (cost=13944.94..13944.94 rows=158473 width=75) (actual
> time=1068.467..1068.467 rows=158003 loops=1)
>                ->  Bitmap Heap Scan on addresses  (cost=1189.66..13944.94
> rows=158473 width=75) (actual time=71.259..530.986 rows=158003 loops=1)
>                      Recheck Cond: (btrim(upper(city)) >= 'NIJMEGEN'::text)
>                      ->  Bitmap Index Scan on
> prototype_addresses_trim_upper_city  (cost=0.00..1189.66
> rows=158473 width=0) (actual time=68.290..68.290 rows=158003 loops=1)
>                            Index Cond: (btrim(upper(city)) >=
> 'NIJMEGEN'::text)
>  Total runtime: 7941.095 ms

explain clearly shows, that index is used for addresses scan, but it
is not so for users.
explain estimates that 227197 customers match the lastname criteria -
which looks awfuly high.
how many record do you have in the customers table?

i would try to create index test on customers(contactAddress,
trim(uppercase(lastname)));
or with other ordring of fields.

try this - create the index, make analyze of customers table, and
recheck explain.
then try the second index in the same manner.

maybe this could of some help...

depesz

Re: How to query and index for customer with lastname

From

Joost Kraaijeveld

Date:

04 March 2006, 10:18:23

Hi Hubert,

On Sat, 2006-03-04 at 14:49 +0100, hubert depesz lubaczewski wrote:
> >  Sort  (cost=54710.68..54954.39 rows=97484 width=111) (actual
> > time=7398.971..7680.405 rows=96041 loops=1)
> >    Sort Key: btrim(upper(customers.lastname)), btrim(upper(addresses.city))
> >    ->  Hash Join  (cost=14341.12..46632.73 rows=97484 width=111) (actual time=1068.862..5472.788 rows=96041
loops=1)
> >          Hash Cond: ("outer".contactaddress = "inner".objectid)
> >          ->  Seq Scan on customers  (cost=0.00..24094.01 rows=227197 width=116) (actual time=0.018..1902.646
rows=223990loops=1) 
> >                Filter: (btrim(upper(lastname)) >= 'JANSEN'::text)
> >          ->  Hash  (cost=13944.94..13944.94 rows=158473 width=75) (actual time=1068.467..1068.467 rows=158003
loops=1)
> >                ->  Bitmap Heap Scan on addresses  (cost=1189.66..13944.94 rows=158473 width=75) (actual
time=71.259..530.986rows=158003 loops=1) 
> >                      Recheck Cond: (btrim(upper(city)) >= 'NIJMEGEN'::text)
> >                      ->  Bitmap Index Scan on prototype_addresses_trim_upper_city  (cost=0.00..1189.66 rows=158473
width=0)(actual time=68.290..68.290 rows=158003 loops=1) 
> >                            Index Cond: (btrim(upper(city)) >=> 'NIJMEGEN'::text)
> >  Total runtime: 7941.095 ms
>
> explain clearly shows, that index is used for addresses scan, but it
Yes, but I do not understand why I have both a "Bitmap Index Scan"  and
a "Bitmap Heap Scan" on (btrim(upper(city)) >=> 'NIJMEGEN'::text)?

> is not so for users.
> explain estimates that 227197 customers match the lastname criteria -
> which looks awfuly high.
> how many record do you have in the customers table?
368915 of which 222465 actually meet the condition.

From what I understand from the mailing list, PostgreSQL prefers a table
scan whenever it expects that the number of records in the resultset
will be ~ > 10 % of the total number of records in the table. Which
explains the table scan for customers, but than again, it does not
explain why it uses the index on addresses: it has 369337 addresses of
which 158003 meet the condition

> i would try to create index test on customers(contactAddress,
> trim(uppercase(lastname)));
> or with other ordring of fields.
>
> try this - create the index, make analyze of customers table, and
> recheck explain.
> then try the second index in the same manner.
Makes no difference.


--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: J.Kraaijeveld@Askesis.nl
web: www.askesis.nl

Re: How to query and index for customer with lastname and city

From

"hubert depesz lubaczewski"

Date:

04 March 2006, 10:23:06

On 3/4/06, Joost Kraaijeveld <J.Kraaijeveld@askesis.nl> wrote:
> > how many record do you have in the customers table?
> 368915 of which 222465 actually meet the condition.
> >From what I understand from the mailing list, PostgreSQL prefers a table
> scan whenever it expects that the number of records in the resultset
> will be ~ > 10 % of the total number of records in the table. Which
> explains the table scan for customers, but than again, it does not
> explain why it uses the index on addresses: it has 369337 addresses of
> which 158003 meet the condition

bitmap index scan is faster than sequential table scan. that's all. it
was introduced in 8.1 as far as i remember.
basically - i doubt if you can get better performace from query when
the result row-count is that high.

out of curiosity though - why do you need so many rows? it's not
possible to view them, nor do anything meaningful with 200 thousand
rows!

depesz

Re: How to query and index for customer with lastname and city

From

Kevin Brown

Date:

04 March 2006, 17:27:40

On Saturday 04 March 2006 08:23, hubert depesz lubaczewski wrote:
> On 3/4/06, Joost Kraaijeveld <J.Kraaijeveld@askesis.nl> wrote:
> > > how many record do you have in the customers table?
> >
> > 368915 of which 222465 actually meet the condition.
> >
> > >From what I understand from the mailing list, PostgreSQL prefers a table
> >
> > scan whenever it expects that the number of records in the resultset
> > will be ~ > 10 % of the total number of records in the table. Which
> > explains the table scan for customers, but than again, it does not
> > explain why it uses the index on addresses: it has 369337 addresses of
> > which 158003 meet the condition
>
> bitmap index scan is faster than sequential table scan. that's all. it
> was introduced in 8.1 as far as i remember.
> basically - i doubt if you can get better performace from query when
> the result row-count is that high.
>
> out of curiosity though - why do you need so many rows? it's not
> possible to view them, nor do anything meaningful with 200 thousand
> rows!
>
> depesz

If you're just displaying, use limit and offset to grab one page at a time.
If you're manipulating it would be a good idea to do something in a stored
procedure.