Thread: Combination of partial and full indexes

Combination of partial and full indexes

From
Rafał Gutkowski
Date:
Hi.

I had a fight with a query planner because it doesn’t listen.

There are two indexes:

 - with expression in descending order:
    "offers_offer_next_update_idx" btree (offer_next_update(update_ts, update_freq) DESC) WHERE o_archived = false
 - unique with two columns:
    "offers_source_id_o_key_idx" UNIQUE, btree (source_id, o_key)

Here's the query with filter for offers.source_id columns which
is pretty slow because "offers_source_id_o_key_idx" is not used:

    EXPLAIN ANALYZE
    SELECT offers.o_url AS offers_o_url
    FROM offers
    WHERE offers.source_id = 1 AND offers.o_archived = false AND now() > offer_next_update(offers.update_ts, offers.update_freq)
    ORDER BY offer_next_update(offers.update_ts, offers.update_freq) DESC
    LIMIT 1000;

    Limit  (cost=0.68..23403.77 rows=1000 width=116) (actual time=143.544..147.870 rows=1000 loops=1)
      ->  Index Scan using offers_offer_next_update_idx on offers  (cost=0.68..1017824.69 rows=43491 width=116) (actual time=143.542..147.615 rows=1000 loops=1)
            Index Cond: (now() > offer_next_update(update_ts, update_freq))
            Filter: (source_id = 1)
            Rows Removed by Filter: 121376
    Total runtime: 148.023 ms


When I remove filter on offers.source_id, query plan looks like this:

    EXPLAIN ANALYZE
    SELECT offers.o_url AS offers_o_url
    FROM offers
    WHERE offers.o_archived = false AND now() > offer_next_update(offers.update_ts, offers.update_freq)
    ORDER BY offer_next_update(offers.update_ts, offers.update_freq) DESC
    LIMIT 1000;

    Limit  (cost=0.68..4238.27 rows=1000 width=116) (actual time=0.060..3.877 rows=1000 loops=1)
      ->  Index Scan using offers_offer_next_update_idx on offers  (cost=0.68..1069411.78 rows=252363 width=116) (actual time=0.058..3.577 rows=1000 loops=1)
            Index Cond: (now() > offer_next_update(update_ts, update_freq))
    Total runtime: 4.031 ms


I even tried to change orders of conditions in second query but it doesn't seem
to make a difference for a planner.

Shouldn't query planner use offers_source_id_o_key_idx to speed up query above?


PostgreSQL version: PostgreSQL 9.3.12 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, 64-bit

Configuration:
             name             |            current_setting             |        source
------------------------------+----------------------------------------+----------------------
 application_name             | psql                                   | client
 checkpoint_completion_target | 0.9                                    | configuration file
 checkpoint_segments          | 3                                      | configuration file
 client_encoding              | UTF8                                   | client
 DateStyle                    | ISO, MDY                               | configuration file
 default_text_search_config   | pg_catalog.english                     | configuration file
 effective_cache_size         | 128MB                                  | configuration file
 external_pid_file            | /var/run/postgresql/9.3-main.pid       | configuration file
 lc_messages                  | en_US.UTF-8                            | configuration file
 lc_monetary                  | en_US.UTF-8                            | configuration file
 lc_numeric                   | en_US.UTF-8                            | configuration file
 lc_time                      | en_US.UTF-8                            | configuration file
 max_connections              | 100                                    | configuration file
 max_locks_per_transaction    | 168                                    | configuration file
 max_stack_depth              | 2MB                                    | environment variable
 port                         | 5432                                   | configuration file
 shared_buffers               | 4GB                                    | configuration file
 temp_buffers                 | 12MB                                   | configuration file
 unix_socket_directories      | /var/run/postgresql                    | configuration file
 work_mem                     | 16MB                                   | configuration file


Definitions:

CREATE OR REPLACE FUNCTION public.offer_next_update(last timestamp without time zone, minutes smallint)
 RETURNS timestamp without time zone
 LANGUAGE plpgsql
 IMMUTABLE
AS $function$
BEGIN
        RETURN last + (minutes || ' min')::interval;
END
$function$


Re: Combination of partial and full indexes

From
Gerardo Herzig
Date:
I dont think offers_source_id_o_key_idx will be used at all. It is a UNIQUE index on (source_id, o_key), but your query
doesnot filter for any "o_key", so reading that index does not provide the pointers needed to fetch the actual data in
thetable. 

I will try an index on source_id, offer_next_update(offers.update_ts, offers.update_freq) and see what happens

HTH
Gerardo

----- Mensaje original -----
> De: "Rafał Gutkowski" <goodkowski@gmail.com>
> Para: pgsql-performance@postgresql.org
> Enviados: Martes, 7 de Junio 2016 10:39:14
> Asunto: [PERFORM] Combination of partial and full indexes
>
>
> Hi.
>
>
> I had a fight with a query planner because it doesn’t listen.
>
>
> There are two indexes:
>
>
> - with expression in descending order:
> "offers_offer_next_update_idx" btree (offer_next_update(update_ts,
> update_freq) DESC) WHERE o_archived = false
> - unique with two columns:
> "offers_source_id_o_key_idx" UNIQUE, btree (source_id, o_key)
>
>
> Here's the query with filter for offers.source_id columns which
> is pretty slow because "offers_source_id_o_key_idx" is not used:
>
>
> EXPLAIN ANALYZE
> SELECT offers.o_url AS offers_o_url
> FROM offers
> WHERE offers.source_id = 1 AND offers.o_archived = false AND now() >
> offer_next_update(offers.update_ts, offers.update_freq)
> ORDER BY offer_next_update(offers.update_ts, offers.update_freq) DESC
> LIMIT 1000;
>
>
> Limit (cost=0.68..23403.77 rows=1000 width=116) (actual
> time=143.544..147.870 rows=1000 loops=1)
> -> Index Scan using offers_offer_next_update_idx on offers
> (cost=0.68..1017824.69 rows=43491 width=116) (actual
> time=143.542..147.615 rows=1000 loops=1)
> Index Cond: (now() > offer_next_update(update_ts, update_freq))
> Filter: (source_id = 1)
> Rows Removed by Filter: 121376
> Total runtime: 148.023 ms
>
>
>
>
> When I remove filter on offers.source_id, query plan looks like this:
>
>
> EXPLAIN ANALYZE
> SELECT offers.o_url AS offers_o_url
> FROM offers
> WHERE offers.o_archived = false AND now() >
> offer_next_update(offers.update_ts, offers.update_freq)
> ORDER BY offer_next_update(offers.update_ts, offers.update_freq) DESC
> LIMIT 1000;
>
>
> Limit (cost=0.68..4238.27 rows=1000 width=116) (actual
> time=0.060..3.877 rows=1000 loops=1)
> -> Index Scan using offers_offer_next_update_idx on offers
> (cost=0.68..1069411.78 rows=252363 width=116) (actual
> time=0.058..3.577 rows=1000 loops=1)
> Index Cond: (now() > offer_next_update(update_ts, update_freq))
> Total runtime: 4.031 ms
>
>
>
>
> I even tried to change orders of conditions in second query but it
> doesn't seem
> to make a difference for a planner.
>
>
> Shouldn't query planner use offers_source_id_o_key_idx to speed up
> query above?
>
>
>
>
> PostgreSQL version: PostgreSQL 9.3.12 on x86_64-unknown-linux-gnu,
> compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, 64-bit
>
>
> Configuration:
> name | current_setting | source
> ------------------------------+----------------------------------------+----------------------
> application_name | psql | client
> checkpoint_completion_target | 0.9 | configuration file
> checkpoint_segments | 3 | configuration file
> client_encoding | UTF8 | client
> DateStyle | ISO, MDY | configuration file
> default_text_search_config | pg_catalog.english | configuration file
> effective_cache_size | 128MB | configuration file
> external_pid_file | /var/run/postgresql/9.3-main.pid | configuration
> file
> lc_messages | en_US.UTF-8 | configuration file
> lc_monetary | en_US.UTF-8 | configuration file
> lc_numeric | en_US.UTF-8 | configuration file
> lc_time | en_US.UTF-8 | configuration file
> max_connections | 100 | configuration file
> max_locks_per_transaction | 168 | configuration file
> max_stack_depth | 2MB | environment variable
> port | 5432 | configuration file
> shared_buffers | 4GB | configuration file
> temp_buffers | 12MB | configuration file
> unix_socket_directories | /var/run/postgresql | configuration file
> work_mem | 16MB | configuration file
>
>
>
>
> Definitions:
>
>
>
> CREATE OR REPLACE FUNCTION public.offer_next_update(last timestamp
> without time zone, minutes smallint)
> RETURNS timestamp without time zone
> LANGUAGE plpgsql
> IMMUTABLE
> AS $function$
> BEGIN
> RETURN last + (minutes || ' min')::interval;
> END
> $function$
>
>
>
>


Re: Combination of partial and full indexes

From
Rafał Gutkowski
Date:
I thought that first column from left in multi-column index can and will be used just as it would be a single column
index.

It doesn’t seem to work with unqiue indexes, which ultimetly makes sense.

Thank you Gerardo.

> On 07 Jun 2016, at 19:36, Gerardo Herzig <gherzig@fmed.uba.ar> wrote:
>
> I dont think offers_source_id_o_key_idx will be used at all. It is a UNIQUE index on (source_id, o_key), but your
querydoes not filter for any "o_key", so reading that index does not provide the pointers needed to fetch the actual
datain the table. 
>
> I will try an index on source_id, offer_next_update(offers.update_ts, offers.update_freq) and see what happens
>
> HTH
> Gerardo
>
> ----- Mensaje original -----
>> De: "Rafał Gutkowski" <goodkowski@gmail.com>
>> Para: pgsql-performance@postgresql.org
>> Enviados: Martes, 7 de Junio 2016 10:39:14
>> Asunto: [PERFORM] Combination of partial and full indexes
>>
>>
>> Hi.
>>
>>
>> I had a fight with a query planner because it doesn’t listen.
>>
>>
>> There are two indexes:
>>
>>
>> - with expression in descending order:
>> "offers_offer_next_update_idx" btree (offer_next_update(update_ts,
>> update_freq) DESC) WHERE o_archived = false
>> - unique with two columns:
>> "offers_source_id_o_key_idx" UNIQUE, btree (source_id, o_key)
>>
>>
>> Here's the query with filter for offers.source_id columns which
>> is pretty slow because "offers_source_id_o_key_idx" is not used:
>>
>>
>> EXPLAIN ANALYZE
>> SELECT offers.o_url AS offers_o_url
>> FROM offers
>> WHERE offers.source_id = 1 AND offers.o_archived = false AND now() >
>> offer_next_update(offers.update_ts, offers.update_freq)
>> ORDER BY offer_next_update(offers.update_ts, offers.update_freq) DESC
>> LIMIT 1000;
>>
>>
>> Limit (cost=0.68..23403.77 rows=1000 width=116) (actual
>> time=143.544..147.870 rows=1000 loops=1)
>> -> Index Scan using offers_offer_next_update_idx on offers
>> (cost=0.68..1017824.69 rows=43491 width=116) (actual
>> time=143.542..147.615 rows=1000 loops=1)
>> Index Cond: (now() > offer_next_update(update_ts, update_freq))
>> Filter: (source_id = 1)
>> Rows Removed by Filter: 121376
>> Total runtime: 148.023 ms
>>
>>
>>
>>
>> When I remove filter on offers.source_id, query plan looks like this:
>>
>>
>> EXPLAIN ANALYZE
>> SELECT offers.o_url AS offers_o_url
>> FROM offers
>> WHERE offers.o_archived = false AND now() >
>> offer_next_update(offers.update_ts, offers.update_freq)
>> ORDER BY offer_next_update(offers.update_ts, offers.update_freq) DESC
>> LIMIT 1000;
>>
>>
>> Limit (cost=0.68..4238.27 rows=1000 width=116) (actual
>> time=0.060..3.877 rows=1000 loops=1)
>> -> Index Scan using offers_offer_next_update_idx on offers
>> (cost=0.68..1069411.78 rows=252363 width=116) (actual
>> time=0.058..3.577 rows=1000 loops=1)
>> Index Cond: (now() > offer_next_update(update_ts, update_freq))
>> Total runtime: 4.031 ms
>>
>>
>>
>>
>> I even tried to change orders of conditions in second query but it
>> doesn't seem
>> to make a difference for a planner.
>>
>>
>> Shouldn't query planner use offers_source_id_o_key_idx to speed up
>> query above?
>>
>>
>>
>>
>> PostgreSQL version: PostgreSQL 9.3.12 on x86_64-unknown-linux-gnu,
>> compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, 64-bit
>>
>>
>> Configuration:
>> name | current_setting | source
>> ------------------------------+----------------------------------------+----------------------
>> application_name | psql | client
>> checkpoint_completion_target | 0.9 | configuration file
>> checkpoint_segments | 3 | configuration file
>> client_encoding | UTF8 | client
>> DateStyle | ISO, MDY | configuration file
>> default_text_search_config | pg_catalog.english | configuration file
>> effective_cache_size | 128MB | configuration file
>> external_pid_file | /var/run/postgresql/9.3-main.pid | configuration
>> file
>> lc_messages | en_US.UTF-8 | configuration file
>> lc_monetary | en_US.UTF-8 | configuration file
>> lc_numeric | en_US.UTF-8 | configuration file
>> lc_time | en_US.UTF-8 | configuration file
>> max_connections | 100 | configuration file
>> max_locks_per_transaction | 168 | configuration file
>> max_stack_depth | 2MB | environment variable
>> port | 5432 | configuration file
>> shared_buffers | 4GB | configuration file
>> temp_buffers | 12MB | configuration file
>> unix_socket_directories | /var/run/postgresql | configuration file
>> work_mem | 16MB | configuration file
>>
>>
>>
>>
>> Definitions:
>>
>>
>>
>> CREATE OR REPLACE FUNCTION public.offer_next_update(last timestamp
>> without time zone, minutes smallint)
>> RETURNS timestamp without time zone
>> LANGUAGE plpgsql
>> IMMUTABLE
>> AS $function$
>> BEGIN
>> RETURN last + (minutes || ' min')::interval;
>> END
>> $function$
>>
>>
>>
>>



Re: Combination of partial and full indexes

From
Rafał Gutkowski
Date:
Altough, creating index `btree (source_id)` still changes nothing. So is `btree (source_id) WHERE o_archived = false`.

It looks like partial indexes and full indexes cannot mix togheter even if when they have same condition.

> On 08 Jun 2016, at 10:52, Rafał Gutkowski <goodkowski@gmail.com> wrote:
>
> I thought that first column from left in multi-column index can and will be used just as it would be a single column
index.
>
> It doesn’t seem to work with unqiue indexes, which ultimetly makes sense.
>
> Thank you Gerardo.
>
>> On 07 Jun 2016, at 19:36, Gerardo Herzig <gherzig@fmed.uba.ar> wrote:
>>
>> I dont think offers_source_id_o_key_idx will be used at all. It is a UNIQUE index on (source_id, o_key), but your
querydoes not filter for any "o_key", so reading that index does not provide the pointers needed to fetch the actual
datain the table. 
>>
>> I will try an index on source_id, offer_next_update(offers.update_ts, offers.update_freq) and see what happens
>>
>> HTH
>> Gerardo
>>
>> ----- Mensaje original -----
>>> De: "Rafał Gutkowski" <goodkowski@gmail.com>
>>> Para: pgsql-performance@postgresql.org
>>> Enviados: Martes, 7 de Junio 2016 10:39:14
>>> Asunto: [PERFORM] Combination of partial and full indexes
>>>
>>>
>>> Hi.
>>>
>>>
>>> I had a fight with a query planner because it doesn’t listen.
>>>
>>>
>>> There are two indexes:
>>>
>>>
>>> - with expression in descending order:
>>> "offers_offer_next_update_idx" btree (offer_next_update(update_ts,
>>> update_freq) DESC) WHERE o_archived = false
>>> - unique with two columns:
>>> "offers_source_id_o_key_idx" UNIQUE, btree (source_id, o_key)
>>>
>>>
>>> Here's the query with filter for offers.source_id columns which
>>> is pretty slow because "offers_source_id_o_key_idx" is not used:
>>>
>>>
>>> EXPLAIN ANALYZE
>>> SELECT offers.o_url AS offers_o_url
>>> FROM offers
>>> WHERE offers.source_id = 1 AND offers.o_archived = false AND now() >
>>> offer_next_update(offers.update_ts, offers.update_freq)
>>> ORDER BY offer_next_update(offers.update_ts, offers.update_freq) DESC
>>> LIMIT 1000;
>>>
>>>
>>> Limit (cost=0.68..23403.77 rows=1000 width=116) (actual
>>> time=143.544..147.870 rows=1000 loops=1)
>>> -> Index Scan using offers_offer_next_update_idx on offers
>>> (cost=0.68..1017824.69 rows=43491 width=116) (actual
>>> time=143.542..147.615 rows=1000 loops=1)
>>> Index Cond: (now() > offer_next_update(update_ts, update_freq))
>>> Filter: (source_id = 1)
>>> Rows Removed by Filter: 121376
>>> Total runtime: 148.023 ms
>>>
>>>
>>>
>>>
>>> When I remove filter on offers.source_id, query plan looks like this:
>>>
>>>
>>> EXPLAIN ANALYZE
>>> SELECT offers.o_url AS offers_o_url
>>> FROM offers
>>> WHERE offers.o_archived = false AND now() >
>>> offer_next_update(offers.update_ts, offers.update_freq)
>>> ORDER BY offer_next_update(offers.update_ts, offers.update_freq) DESC
>>> LIMIT 1000;
>>>
>>>
>>> Limit (cost=0.68..4238.27 rows=1000 width=116) (actual
>>> time=0.060..3.877 rows=1000 loops=1)
>>> -> Index Scan using offers_offer_next_update_idx on offers
>>> (cost=0.68..1069411.78 rows=252363 width=116) (actual
>>> time=0.058..3.577 rows=1000 loops=1)
>>> Index Cond: (now() > offer_next_update(update_ts, update_freq))
>>> Total runtime: 4.031 ms
>>>
>>>
>>>
>>>
>>> I even tried to change orders of conditions in second query but it
>>> doesn't seem
>>> to make a difference for a planner.
>>>
>>>
>>> Shouldn't query planner use offers_source_id_o_key_idx to speed up
>>> query above?
>>>
>>>
>>>
>>>
>>> PostgreSQL version: PostgreSQL 9.3.12 on x86_64-unknown-linux-gnu,
>>> compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, 64-bit
>>>
>>>
>>> Configuration:
>>> name | current_setting | source
>>> ------------------------------+----------------------------------------+----------------------
>>> application_name | psql | client
>>> checkpoint_completion_target | 0.9 | configuration file
>>> checkpoint_segments | 3 | configuration file
>>> client_encoding | UTF8 | client
>>> DateStyle | ISO, MDY | configuration file
>>> default_text_search_config | pg_catalog.english | configuration file
>>> effective_cache_size | 128MB | configuration file
>>> external_pid_file | /var/run/postgresql/9.3-main.pid | configuration
>>> file
>>> lc_messages | en_US.UTF-8 | configuration file
>>> lc_monetary | en_US.UTF-8 | configuration file
>>> lc_numeric | en_US.UTF-8 | configuration file
>>> lc_time | en_US.UTF-8 | configuration file
>>> max_connections | 100 | configuration file
>>> max_locks_per_transaction | 168 | configuration file
>>> max_stack_depth | 2MB | environment variable
>>> port | 5432 | configuration file
>>> shared_buffers | 4GB | configuration file
>>> temp_buffers | 12MB | configuration file
>>> unix_socket_directories | /var/run/postgresql | configuration file
>>> work_mem | 16MB | configuration file
>>>
>>>
>>>
>>>
>>> Definitions:
>>>
>>>
>>>
>>> CREATE OR REPLACE FUNCTION public.offer_next_update(last timestamp
>>> without time zone, minutes smallint)
>>> RETURNS timestamp without time zone
>>> LANGUAGE plpgsql
>>> IMMUTABLE
>>> AS $function$
>>> BEGIN
>>> RETURN last + (minutes || ' min')::interval;
>>> END
>>> $function$
>>>
>>>
>>>
>>>
>