Re: optimization join on random value - Mailing list pgsql-performance

From Heikki Linnakangas
Subject Re: optimization join on random value
Date
Msg-id 5546930E.5080201@iki.fi
In response to optimization join on random value  (Anton Bushmelev <djeday84@gmail.com>)
List pgsql-performance
On 05/04/2015 12:23 AM, Anton Bushmelev wrote:
>   Hello gurus of Postgres, is it possible to tune a query with a join
> on a random value?
> I know that it is not a real-life example, but I need it for tests.
>
> soe=# explain
> soe-#  SELECT   ADDRESS_ID,
> soe-#           CUSTOMER_ID,
> soe-#           DATE_CREATED,
> soe-#           HOUSE_NO_OR_NAME,
> soe-#           STREET_NAME,
> soe-#           TOWN,
> soe-#           COUNTY,
> soe-#           COUNTRY,
> soe-#           POST_CODE,
> soe-#           ZIP_CODE
> soe-#         FROM ADDRESSES
> soe-#         WHERE customer_id = trunc( random()*45000) ;
>                                          QUERY PLAN
> -------------------------------------------------------------------------------------------
>   Seq Scan on addresses  (cost=0.00..165714.00 rows=22500 width=84)
>     Filter: ((customer_id)::double precision = trunc((random() * 45000::double precision)))
> (2 rows)
>

There are two problems here that prevent the index from being used:

1. random() is volatile, so it's recalculated for each row.
2. For the comparison, customer_id is cast to double precision, but the
index is on the bigint column.
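
For example (a quick sanity check, not strictly necessary): random() is
marked volatile in pg_proc, which is why the planner cannot fold it into
a single constant:

-- random() is VOLATILE ('v'), so it is re-evaluated for every row
SELECT proname, provolatile FROM pg_proc WHERE proname = 'random';

And the Filter line in the plan above shows the second point directly:
the comparison is on (customer_id)::double precision, which a plain
btree index on the bigint customer_id column cannot match.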

To work around the first problem, put the random() call inside a
subquery so that it is evaluated only once. For the second problem, cast
the result to bigint so that it matches the column's type.

SELECT ... FROM addresses
WHERE customer_id = (SELECT random()*45000)::bigint
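
For example, with the column list from your query (just a sketch; note
that casting double precision to bigint rounds, while trunc() truncates,
so keeping trunc() inside the subquery preserves the original semantics):

SELECT address_id, customer_id, date_created, house_no_or_name,
       street_name, town, county, country, post_code, zip_code
FROM addresses
WHERE customer_id = (SELECT trunc(random() * 45000))::bigint;

The uncorrelated subquery is evaluated just once (as an InitPlan), so the
comparison is against a single bigint constant and an index on
customer_id can be used.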

- Heikki


