Re: updating statistics on slow running query - Mailing list pgsql-performance

From Jim Nasby
Subject Re: updating statistics on slow running query
Msg-id 5466B203.2000706@BlueTreble.com
In response to Re: updating statistics on slow running query  (Eric Ramirez <eric.ramirez.sv@gmail.com>)
List pgsql-performance
On 11/10/14, 4:52 PM, Eric Ramirez wrote:
> Hi Matteo,
> Thanks for your suggestions. I just ran some tests with ILIKE and LIKE, and ILIKE is consistently slower, so I think I
> will keep the LOWER functions. As per your suggestion, I have switched the indexes to the GIN type; they seem to
> build/read a bit faster, though the Recheck step still shows up in the query plan. I have removed the Gender column
> from the query since it is not relevant in my tests. With all this playing around, it looks like the stats are now a
> bit more accurate.
> The query went down to 9 seconds; ideally I would like to get it to execute in 2 seconds... Any thoughts on what else I
> could try?
> Thanks again,
> Eric

Please don't top-post.

You might try the trigram contrib module: http://www.postgresql.org/docs/9.1/static/pgtrgm.html
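For reference, a minimal sketch (the index name is made up; since your existing indexes use the extensions.gist_trgm_ops opclass, pg_trgm appears to be installed in a schema called "extensions", so the GIN opclass may need the same qualification):

CREATE INDEX tar_mvw_last_name_trgm_idx
    ON tar_mvw_targeting_record
    USING gin (lower(household_member_last_name::text) extensions.gin_trgm_ops)
    WHERE status::text <> 'ANULLED'::text;

A trigram GIN index can evaluate the ~~ '%tu%' condition inside the index itself, so the bitmap heap scan no longer has to fetch and discard the 9 million rows your first plan shows under "Rows Removed by Filter".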

BTW, converting status and gender to enums will likely save you a non-trivial amount of space. It won't help in this
query, but if there's other stuff the server will be doing, it's probably worthwhile.
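Something like this (a sketch: gender_t is a made-up name, the label list is inferred from the two values mentioned in this thread, and status would need its full set of values, of which only 'ANULLED' is visible here):

CREATE TYPE gender_t AS ENUM ('FEMALE', 'MALE');

ALTER TABLE tar_mvw_targeting_record
    ALTER COLUMN gender TYPE gender_t
    USING gender::text::gender_t;

An enum value is stored in 4 bytes, versus the full string a varchar(64) stores. Note the ALTER rewrites the whole table and takes an exclusive lock, so plan for a maintenance window.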

> =# explain (analyse on,buffers on) select T.form_id from TAR_MVW_targeting_record AS T where T.status NOT IN
> ('ANULLED') AND LOWER(T.household_member_last_name) LIKE LOWER('%tu%') group by T.form_id;
>                                                                QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>   HashAggregate  (cost=557677.27..561360.83 rows=368356 width=8) (actual time=10172.672..10410.068 rows=786669 loops=1)
>     Buffers: shared hit=304998
>     ->  Bitmap Heap Scan on tar_mvw_targeting_record t  (cost=80048.06..552677.27 rows=2000002 width=8) (actual time=2481.418..9564.280 rows=999933 loops=1)
>           Recheck Cond: ((status)::text <> 'ANULLED'::text)
>           Filter: (lower((household_member_last_name)::text) ~~ '%tu%'::text)
>           Rows Removed by Filter: 9000079
>           Buffers: shared hit=304998
>           ->  Bitmap Index Scan on tar_mvw_targeting_record_lower_idx4  (cost=0.00..79548.06 rows=10000012 width=0) (actual time=2375.399..2375.399 rows=10000012 loops=1)
>                 Buffers: shared hit=7369
>   Total runtime: 10475.240 ms
>
>
>
>
> On Mon, Nov 10, 2014 at 1:57 PM, desmodemone <desmodemone@gmail.com <mailto:desmodemone@gmail.com>> wrote:
>
>
>
>     2014-11-10 18:43 GMT+01:00 Eric Ramirez <eric.ramirez.sv@gmail.com <mailto:eric.ramirez.sv@gmail.com>>:
>
>
>         Hi,
>         I have created a sample database with test data to help benchmark our application. The database has ten
>         million records and is running on a dedicated server (postgres 9.3) with 8GB of RAM. Our queries are pretty
>         slow with this amount of data, and it is my job to get them to run at an acceptable speed. The first thing I
>         noticed was that the planner's row estimates are off by a large number of records (millions). I have updated
>         the statistics target, but it didn't seem to make a difference. The relevant output follows.
>         Am I looking in the wrong place? Is there something else I should be trying?
>         Thanks in advance for your comments/suggestions,
>         Eric.
>
>
>         =# show work_mem;
>           work_mem
>         ----------
>           1GB
>         (1 row)
>         =# show effective_cache_size;
>           effective_cache_size
>         ----------------------
>           5GB
>         (1 row)
>
>         =# ALTER TABLE TAR_MVW_TARGETING_RECORD ALTER COLUMN household_member_first_name SET STATISTICS 5000;
>         =# vacuum analyse TAR_MVW_TARGETING_RECORD;
>
>         =# \d tar_mvw_targeting_record;
>                       Table "public.tar_mvw_targeting_record"
>                     Column            |         Type          | Modifiers
>         -----------------------------+-----------------------+-----------
>           household_member_id         | bigint                |
>           form_id                     | bigint                |
>           status                      | character varying(64) |
>           gender                      | character varying(64) |
>           household_member_first_name | character varying(64) |
>           household_member_last_name  | character varying(64) |
>
>         Indexes:
>             "tar_mvw_targeting_record_form_id_household_member_id_idx" UNIQUE, btree (form_id, household_member_id)
>             "tar_mvw_targeting_record_lower_idx" gist (lower(household_member_first_name::text) extensions.gist_trgm_ops) WHERE status::text <> 'ANULLED'::text
>             "tar_mvw_targeting_record_lower_idx1" gist (lower(household_member_last_name::text) extensions.gist_trgm_ops) WHERE status::text <> 'ANULLED'::text
>
>
>         =# explain (analyse on,buffers on) select T.form_id from TAR_MVW_targeting_record AS T where T.status NOT IN
>         ('ANULLED') AND LOWER(T.household_member_last_name) LIKE LOWER('%tu%') AND T.gender='FEMALE' group by T.form_id;
>                                                                QUERY PLAN
>         ------------------------------------------------------------------------------------------------------------------
>           HashAggregate  (cost=450994.35..452834.96 rows=184061 width=8) (actual time=11932.959..12061.206 rows=442453 loops=1)
>             Buffers: shared hit=307404 read=109743
>             ->  Bitmap Heap Scan on tar_mvw_targeting_record t  (cost=110866.33..448495.37 rows=999592 width=8) (actual time=3577.301..11629.132 rows=500373 loops=1)
>                   Recheck Cond: ((lower((household_member_last_name)::text) ~~ '%tu%'::text) AND ((status)::text <> 'ANULLED'::text))
>                   Rows Removed by Index Recheck: 9000079
>                   Filter: ((gender)::text = 'FEMALE'::text)
>                   Rows Removed by Filter: 499560
>                   Buffers: shared hit=307404 read=109743
>                   ->  Bitmap Index Scan on tar_mvw_targeting_record_lower_idx1  (cost=0.00..110616.43 rows=2000002 width=0) (actual time=3471.142..3471.142 rows=10000012 loops=1)
>                         Index Cond: (lower((household_member_last_name)::text) ~~ '%tu%'::text)
>                         Buffers: shared hit=36583 read=82935
>           Total runtime: 12092.059 ms
>         (12 rows)
>
>         Time: 12093.107 ms
>
>         p.s. This query was run three times; the first run took 74 seconds.
>
>
>
>     Hello Eric,
>     did you try a GIN index instead? That way you could avoid, if possible, the recheck condition (a GIN trigram
>     index is generally not lossy). Further, if you always use a predicate like "gender=", you could consider
>     partitioning the indexes on that predicate (one with WHERE status NOT IN ('ANULLED') AND gender='FEMALE', and in
>     the other case WHERE status NOT IN ('ANULLED') AND gender='MALE'). Moreover, you could also drop the "lower"
>     operator and use ILIKE directly instead of LIKE. For example:
>
>     CREATE INDEX tar_mvw_targeting_record_idx02 ON tar_mvw_targeting_record USING gin (household_member_last_name gin_trgm_ops) WHERE status NOT IN ('ANULLED') AND gender='FEMALE';
>     CREATE INDEX tar_mvw_targeting_record_idx03 ON tar_mvw_targeting_record USING gin (household_member_last_name gin_trgm_ops) WHERE status NOT IN ('ANULLED') AND gender='MALE';
>
>
>     explain (analyse on,buffers on) select T.form_id from TAR_MVW_targeting_record AS T where T.status NOT IN
>     ('ANULLED') AND T.household_member_last_name ILIKE '%tu%' AND T.gender='FEMALE' group by T.form_id;
>
>
>       I hope it works
>
>     have a nice day
>
>
>     --
>     Matteo Durighetto
>
>     - - - - - - - - - - - - - - - - - - - - - - -
>
>     Italian PostgreSQL User Group <http://www.itpug.org/index.it.html>
>     Italian Community for Geographic Free/Open-Source Software <http://www.gfoss.it>
>
>


--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

