NULLS LAST performance - Mailing list pgsql-performance

From Mathieu De Zutter
Subject NULLS LAST performance
Date
Msg-id AANLkTimo4nrZ3_5wgMRniMoosv8eT4dZA+Buz_HCxEr_@mail.gmail.com
Whole thread Raw
Responses Re: NULLS LAST performance  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance
Hi all,

Running PostgreSQL 8.4.7 (backport package from Debian Lenny).

I have some queries that are based on views, and an engine adds a few clauses (like NULLS LAST). One of these queries has a performance problem.

The simplified form is this:

shs=# explain analyze select * from performance e JOIN part v ON v.performance_id = e.id order by e.creation_date desc limit 10;
                                                                              QUERY PLAN                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4.25 rows=10 width=312) (actual time=0.078..0.147 rows=10 loops=1)
   ->  Nested Loop  (cost=0.00..62180.28 rows=146294 width=312) (actual time=0.078..0.145 rows=10 loops=1)
         ->  Index Scan Backward using performance_create_idx on performance e  (cost=0.00..12049.21 rows=145379 width=247) (actual time=0.051..0.087 rows=10 loops=1)
         ->  Index Scan using part_performance_idx on part v  (cost=0.00..0.33 rows=1 width=65) (actual time=0.005..0.005 rows=1 loops=10)
               Index Cond: (v.performance_id = e.id)
 Total runtime: 0.205 ms

creation_date is declared as NOT NULL, and since it's a inner join, creation_date can never be null in this query. I'd think that if I add NULLS LAST, it wouldn't have any effect.

However, the query with NULLS LAST (as generated by the engine):

shs=# explain analyze select * from performance e JOIN part v ON v.performance_id = e.id order by e.creation_date desc nulls last limit 10;
                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=25773.76..25773.79 rows=10 width=312) (actual time=492.959..492.963 rows=10 loops=1)
   ->  Sort  (cost=25773.76..26139.50 rows=146294 width=312) (actual time=492.958..492.962 rows=10 loops=1)
         Sort Key: e.creation_date
         Sort Method:  top-N heapsort  Memory: 27kB
         ->  Merge Join  (cost=1.27..22612.40 rows=146294 width=312) (actual time=0.064..367.160 rows=146294 loops=1)
               Merge Cond: (e.id = v.performance_id)
               ->  Index Scan using performance_pkey on performance e  (cost=0.00..11989.20 rows=145379 width=247) (actual time=0.035..160.838 rows=145379 loops=1)
               ->  Index Scan using part_performance_idx on part v  (cost=0.00..8432.35 rows=146294 width=65) (actual time=0.025..91.084 rows=146294 loops=1)
 Total runtime: 493.062 ms

Both tables have around 150k rows as you can read from the last plan.

Table performance:

                                      Table "public.performance"
     Column      |           Type           |                        Modifiers                         
-----------------+--------------------------+----------------------------------------------------------
 created_by      | integer                  | not null
 creation_date   | timestamp with time zone | not null
 comments        | text                     | 
 owned_by        | integer                  | not null
 id              | integer                  | not null default nextval('performance_id_seq'::regclass)
 title           | text                     | 
 title_          | text                     | 
 performer_id    | integer                  | 
 first_medium_id | integer                  | 
 vperf_id        | integer                  | 
 perf_date       | partial_date             | 
 bonustrack      | boolean                  | not null default false
 type_id         | integer                  | not null
 instrumental    | boolean                  | not null default false
 init_rev_level  | smallint                 | not null default 1
 curr_rev_level  | smallint                 | not null default 1
 revision_date   | timestamp with time zone | 
 revised_by      | integer                  | 
 object_type     | text                     | not null default 'performance'::text
 editor_note     | text                     | 
 active          | boolean                  | not null default true
Indexes:
    "performance_pkey" PRIMARY KEY, btree (id)
    "performance_create_idx" btree (creation_date)
    "performance_medium_idx" btree (first_medium_id)
    "performance_own_idx" btree (owned_by)
    "performance_performer_idx" btree (performer_id)

Table part:

                                      Table "public.part"
     Column     |           Type           |                     Modifiers                     
----------------+--------------------------+---------------------------------------------------
 created_by     | integer                  | not null
 creation_date  | timestamp with time zone | 
 comments       | text                     | 
 owned_by       | integer                  | not null
 id             | integer                  | not null default nextval('part_id_seq'::regclass)
 work_id        | integer                  | not null
 performance_id | integer                  | not null
Indexes:
    "part_pkey" PRIMARY KEY, btree (id)
    "part_own_idx" btree (owned_by)
    "part_performance_idx" btree (performance_id)
    "part_work_idx" btree (work_id)

Please advise!

Thanks.

Kind regards,

Mathieu

pgsql-performance by date:

Previous
From: "Benjamin Krajmalnik"
Date:
Subject: Unused indices
Next
From: Merlin Moncure
Date:
Subject: Re: NULLS LAST performance