Re: Query taking long time - Mailing list pgsql-performance

From acanada
Subject Re: Query taking long time
Date
Msg-id A1F27E10-B02F-4F3C-A09D-C4E512D3534D@cnio.es
Whole thread Raw
In response to Re: Query taking long time  (Evgeny Shishkin <itparanoia@gmail.com>)
Responses Re: Query taking long time
List pgsql-performance
Hello,

First of all I'd like to thank all of you for taking your time and help me with this. Thank you very much.

I did migrate the database to the new server with 32 processors Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz  and 60GB of RAM. 
Evegeny pointed that the disks I am using are not fast enough (For data: 00:1f.2 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 05); and for logging a SAS disk but with only 240GB available, database is 365GB...). I cannot change the locations of data and log since there's not enough space for the data in the SAS disk.  Sadly this is a problem that I cannot solve any time soon...

The migration had really improved the performance
I paste the before and after (the migration) explain analyze, buffers(if aplicable due to server versions)

BEFORE:
explain analyze select * from (select * from entity2document2  where name='Acetaminophen' ) as a  order by a.hepval;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=18015.66..18027.15 rows=4595 width=139) (actual time=39755.942..39756.246 rows=2845 loops=1)
   Sort Key: entity2document2.hepval
   Sort Method:  quicksort  Memory: 578kB
   ->  Bitmap Heap Scan on entity2document2  (cost=116.92..17736.15 rows=4595 width=139) (actual time=45.682..39751.255 rows=2845 loops=1)
         Recheck Cond: ((name)::text = 'Acetaminophen'::text)
         ->  Bitmap Index Scan on entity2document2_name  (cost=0.00..115.77 rows=4595 width=0) (actual time=45.124..45.124 rows=2845 loops=1)
               Index Cond: ((name)::text = 'Acetaminophen'::text)
 Total runtime: 39756.507 ms

 AFTER:
 explain (analyze,buffers) select * from (select * from entity2document2  where name='Acetaminophen' ) as a  order by a.hepval;
                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=18434.76..18446.51 rows=4701 width=131) (actual time=9196.634..9196.909 rows=2845 loops=1)
   Sort Key: entity2document2.hepval
   Sort Method: quicksort  Memory: 604kB
   Buffers: shared hit=4 read=1725
   ->  Bitmap Heap Scan on entity2document2  (cost=105.00..18148.03 rows=4701 width=131) (actual time=38.668..9190.318 rows=2845 loops=1)
         Recheck Cond: ((name)::text = 'Acetaminophen'::text)
         Buffers: shared hit=4 read=1725
         ->  Bitmap Index Scan on entity2documentnew_name  (cost=0.00..103.82 rows=4701 width=0) (actual time=30.905..30.905 rows=2845 loops=1)
               Index Cond: ((name)::text = 'Acetaminophen'::text)
               Buffers: shared hit=1 read=14
 Total runtime: 9197.186 ms

The improve is definitely good!!.
This is the table that I'm using: 
\d+ entity2document2;
                                    Table "public.entity2document2"
      Column      |              Type              | Modifiers | Storage  | Stats target | Description 
------------------+--------------------------------+-----------+----------+--------------+-------------
 id               | integer                        | not null  | plain    |              | 
 document_id      | integer                        |           | plain    |              | 
 name             | character varying(255)         | not null  | extended |              | 
 qualifier        | character varying(255)         | not null  | extended |              | 
 tagMethod        | character varying(255)         |           | extended |              | 
 created          | timestamp(0) without time zone | not null  | plain    |              | 
 updated          | timestamp(0) without time zone |           | plain    |              | 
 curation         | integer                        |           | plain    |              | 
 hepval           | double precision               |           | plain    |              | 
 cardval          | double precision               |           | plain    |              | 
 nephval          | double precision               |           | plain    |              | 
 phosval          | double precision               |           | plain    |              | 
 patternCount     | double precision               |           | plain    |              | 
 ruleScore        | double precision               |           | plain    |              | 
 hepTermNormScore | double precision               |           | plain    |              | 
 hepTermVarScore  | double precision               |           | plain    |              | 
 svmConfidence    | double precision               |           | plain    |              | 
Indexes:
"ent_pkey" PRIMARY KEY, btree (id)
    "ent_cardval" btree (cardval)
    "ent_document_id" btree (document_id)
    "ent_heptermnormscore" btree ("hepTermNormScore")
    "ent_heptermvarscore" btree ("hepTermVarScore")
    "ent_hepval" btree (hepval)
    "ent_name" btree (name)
    "ent_nephval" btree (nephval)
    "ent_patterncount" btree ("patternCount")
    "ent_phosval" btree (phosval)
    "ent_qualifier" btree (qualifier)
    "ent_qualifier_name" btree (qualifier, name)
    "ent_rulescore" btree ("ruleScore")
    "ent_svm_confidence_index" btree ("svmConfidence")

And this are my current_settings

            name            |  current_setting   |        source        
----------------------------+--------------------+----------------------
 application_name           | psql               | client
 client_encoding            | UTF8               | client
 DateStyle                  | ISO, MDY           | configuration file
 default_text_search_config | pg_catalog.english | configuration file
 effective_cache_size       | 45000MB            | configuration file
 lc_messages                | en_US.UTF-8        | configuration file
 lc_monetary                | en_US.UTF-8        | configuration file
 lc_numeric                 | en_US.UTF-8        | configuration file
 lc_time                    | en_US.UTF-8        | configuration file
 listen_addresses           | *                  | configuration file
 log_timezone               | Europe/Madrid      | configuration file
 logging_collector          | on                 | configuration file
 maintenance_work_mem       | 4000MB             | configuration file
 max_connections            | 100                | configuration file
 max_stack_depth            | 2MB                | environment variable
 shared_buffers             | 10000MB            | configuration file
 TimeZone                   | Europe/Madrid      | configuration file
 work_mem                   | 32MB               | configuration file

The size  of the table is 41 GB and some statistics:
 relname             | rows_in_bytes |  num_rows   | number_of_indexes | unique | single_column | multi_column 
entity2document2               | 89 MB         | 9.33479e+07 |                14 | Y      |            13 |            1


I'm doing right now the CLUSTER on the table using the name+hepval multiple index as Venkata told me and will post you if it works. 
Anyway, even though the improvement is important, I'd like an increase of the performance. When the number of rows returned is high, the performance decreases too much.. 

If anyone have any idea...

Best regards,
Andrés




El Mar 12, 2014, a las 12:12 AM, Evgeny Shishkin escribió:

Hello,

new server with more ram will definitely help to keep your working set in memory.
But if you want your queries be fast on cold (on disk) data, then you need more/faster disks.

And work_mem = 1000MB is too much, better set to 32MB so you don’t get OOM Killer.
And may be slightly lower shared_buffers. 

On 11 Mar 2014, at 18:56, acanada <acanada@cnio.es> wrote:

Hello,

I cannot do explain (analyze, buffers) since I am on 8.3 postgres version.
I am migrating to the new server and upgrading it.
Once it is ready again I will post the explain query here.
The new disk is SATA disk with 5TB, raid 0 or 1...
lspci | grep -i raid
00:1f.2 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller (rev 05)

All database is 200GB and the table entity2document2 is 

x=> select pg_size_pretty(pg_relation_size('entity2document2'));
 pg_size_pretty 
----------------
 11 GB
(1 row)

x=> select pg_size_pretty(pg_total_relation_size('entity2document2'));
 pg_size_pretty 
----------------
 29 GB
(1 row)

The index of the name column:
x=> select pg_size_pretty(pg_relation_size('entity2document2_name'));
 pg_size_pretty 
----------------
 2550 MB
(1 row)


I am tunning the new server with this parameters...
shared_buffers = 15000MB
work_mem = 1000MB
maintenance_work_mem = 2000MB

Any other parameter that should be modified?

Thank you for your help!
Andrés


El Mar 10, 2014, a las 9:22 PM, desmodemone escribió:

Hello Andres,
                       with enable_bitmapscan=off;   could you do :

explain ( analyze , buffers ) select * from entity2document2  where name='ranitidine' ;

I think it's interesting to understand how much it's clustered the table  entity2document2.
infact the query extract 13512 rows in 79945.362 ms around 4 ms for row, and I suspect the table is not well clustered on that column, so every time the
process is asking for a different page of the table or the i/o system have some problem.

Moreover, another point it's : how much it's big ? the rows are arounf 94M , but how much it's big ?  it's important the average row length


Have a nice day

2014-03-06 15:45 GMT+01:00 acanada <acanada@cnio.es>:
Hello Mat,

Setting enable_bitmapscan to off doesn't really helps. It gets worse...

x=> SET enable_bitmapscan=off; 
SET
x=> explain analyze select * from (select * from entity2document2  where name='ranitidine' ) as a  order by a.hepval;
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=18789.21..18800.70 rows=4595 width=131) (actual time=79965.282..79966.657 rows=13512 loops=1)
   Sort Key: entity2document2.hepval
   Sort Method:  quicksort  Memory: 2301kB
   ->  Index Scan using entity2document2_name on entity2document2  (cost=0.00..18509.70 rows=4595 width=131) (actual time=67.507..79945.362 rows=13512 loops=1)
         Index Cond: ((name)::text = 'ranitidine'::text)
 Total runtime: 79967.705 ms
(6 rows)

Any other idea? 

Thank you very much for your help. Regards,
Andrés

El Mar 6, 2014, a las 2:11 PM, desmodemone escribió:


Il 05/mar/2014 00:36 "Venkata Balaji Nagothi" <vbnpgc@gmail.com> ha scritto:
>
> After looking at the distinct values, yes the composite Index on "name" and "hepval" is not recommended. That would worsen - its expected.
>
> We need to look for other possible work around. Please drop off the above Index. Let me see if i can drill further into this.
>
> Meanwhile - can you help us know the memory parameters (work_mem, temp_buffers etc) set ?
>
> Do you have any other processes effecting this query's performance ?
>
> Any info about your Disk, RAM, CPU would also help.
>
> Regards,
> Venkata Balaji N
>
> Fujitsu Australia
>
>
>
>
> Venkata Balaji N
>
> Sr. Database Administrator
> Fujitsu Australia
>
>
> On Tue, Mar 4, 2014 at 10:23 PM, acanada <acanada@cnio.es> wrote:
>>
>> Hello,
>>
>> I don't know if this helps to figure out what is the problem but after adding the multicolumn index on name and hepval, the performance is even worse (¿?).  Ten times worse...
>>
>> explain analyze select * from (select * from entity_compounddict2document  where name='progesterone') as a order by a.hepval;
>>                                                                          QUERY PLAN                                                                          
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Sort  (cost=422746.18..423143.94 rows=159104 width=133) (actual time=95769.674..95797.943 rows=138165 loops=1)
>>    Sort Key: entity_compounddict2document.hepval
>>    Sort Method:  quicksort  Memory: 25622kB
>>    ->  Bitmap Heap Scan on entity_compounddict2document  (cost=3501.01..408999.90 rows=159104 width=133) (actual time=70.789..95519.258 rows=138165 loops=1)
>>          Recheck Cond: ((name)::text = 'progesterone'::text)
>>          ->  Bitmap Index Scan on entity_compound2document_name  (cost=0.00..3461.23 rows=159104 width=0) (actual time=35.174..35.174 rows=138165 loops=1)
>>                Index Cond: ((name)::text = 'progesterone'::text)
>>  Total runtime: 95811.838 ms
>> (8 rows)
>>
>> Any ideas please?
>>
>> Thank you 
>> Andrés.
>>
>>
>>
>> El Mar 4, 2014, a las 12:28 AM, Venkata Balaji Nagothi escribió:
>>
>>> On Mon, Mar 3, 2014 at 9:17 PM, acanada <acanada@cnio.es> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Thankyou for your answer.
>>>> I have made more changes than a simple re-indexing recently. I have moved the sorting field to the table in order to avoid the join clause. Now the schema is very simple. The query only implies one table:
>>>>
>>>> x=> \d+ entity_compounddict2document;
>>>>                       Table "public.entity_compounddict2document"
>>>>       Column      |              Type              | Modifiers | Storage  | Description 
>>>> ------------------+--------------------------------+-----------+----------+-------------
>>>>  id               | integer                        | not null  | plain    | 
>>>>  document_id      | integer                        |           | plain    | 
>>>>  name             | character varying(255)         |           | extended | 
>>>>  qualifier        | character varying(255)         |           | extended | 
>>>>  tagMethod        | character varying(255)         |           | extended | 
>>>>  created          | timestamp(0) without time zone |           | plain    | 
>>>>  updated          | timestamp(0) without time zone |           | plain    | 
>>>>  curation         | integer                        |           | plain    | 
>>>>  hepval           | double precision               |           | plain    | 
>>>>  cardval          | double precision               |           | plain    | 
>>>>  nephval          | double precision               |           | plain    | 
>>>>  phosval          | double precision               |           | plain    | 
>>>>  patternCount     | double precision               |           | plain    | 
>>>>  ruleScore        | double precision               |           | plain    | 
>>>>  hepTermNormScore | double precision               |           | plain    | 
>>>>  hepTermVarScore  | double precision               |           | plain    | 
>>>> Indexes:
>>>>     "entity_compounddict2document_pkey" PRIMARY KEY, btree (id)
>>>>     "entity_compound2document_cardval" btree (cardval)
>>>>     "entity_compound2document_heptermnormscore" btree ("hepTermNormScore")
>>>>     "entity_compound2document_heptermvarscore" btree ("hepTermVarScore")
>>>>     "entity_compound2document_hepval" btree (hepval)
>>>>     "entity_compound2document_name" btree (name)
>>>>     "entity_compound2document_nephval" btree (nephval)
>>>>     "entity_compound2document_patterncount" btree ("patternCount")
>>>>     "entity_compound2document_phosval" btree (phosval)
>>>>     "entity_compound2document_rulescore" btree ("ruleScore")
>>>> Has OIDs: no
>>>>
>>>>            tablename            |                   indexname                                              |  num_rows    | table_size  | index_size | unique | number_of_scans | tuples_read | tuples_fetched 
>>>>  entity_compounddict2document   | entity_compound2document_cardval               | 5.42452e+07 | 6763 MB    | 1162 MB    | Y      |               0 |           0 |              0
>>>>  entity_compounddict2document   | entity_compound2document_heptermnormscore      | 5.42452e+07 | 6763 MB    | 1162 MB    | Y      |               0 |           0 |              0
>>>>  entity_compounddict2document   | entity_compound2document_heptermvarscore       | 5.42452e+07 | 6763 MB    | 1162 MB    | Y      |               0 |           0 |              0
>>>>  entity_compounddict2document   | entity_compound2document_hepval                | 5.42452e+07 | 6763 MB    | 1162 MB    | Y      |               0 |           0 |              0
>>>>  entity_compounddict2document   | entity_compound2document_name                  | 5.42452e+07 | 6763 MB    | 1505 MB    | Y      |              24 |      178680 |              0
>>>>  entity_compounddict2document   | entity_compound2document_nephval               | 5.42452e+07 | 6763 MB    | 1162 MB    | Y      |               0 |           0 |              0
>>>>  entity_compounddict2document   | entity_compound2document_patterncount          | 5.42452e+07 | 6763 MB    | 1162 MB    | Y      |               0 |           0 |              0
>>>>  entity_compounddict2document   | entity_compound2document_phosval               | 5.42452e+07 | 6763 MB    | 1162 MB    | Y      |               0 |           0 |              0
>>>>  entity_compounddict2document   | entity_compound2document_rulescore             | 5.42452e+07 | 6763 MB    | 1162 MB    | Y      |               0 |           0 |              0
>>>>  entity_compounddict2document   | entity_compounddict2document_pkey              | 5.42452e+07 | 6763 MB    | 1162 MB    | Y      |               0 |           0 |              0
>>>>
>>>> The table has aprox. 54,000,000 rows
>>>> There are no NULLs in hepval field and pg_settings haven't changed. I also have done "analyze" to this table.
>>>>
>>>> I have simplified the query and added the last advise that you told me:
>>>>
>>>> Query: 
>>>>
>>>>  explain analyze select * from (select * from entity_compounddict2document  where name='ranitidine') as a order by a.hepval;
>>>>                                                                       QUERY PLAN                                                                      
>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>  Sort  (cost=11060.50..11067.55 rows=2822 width=133) (actual time=32715.097..32716.488 rows=13512 loops=1)
>>>>    Sort Key: entity_compounddict2document.hepval
>>>>    Sort Method:  quicksort  Memory: 2301kB
>>>>    ->  Bitmap Heap Scan on entity_compounddict2document  (cost=73.82..10898.76 rows=2822 width=133) (actual time=6.034..32695.483 rows=13512 loops=1)
>>>>          Recheck Cond: ((name)::text = 'ranitidine'::text)
>>>>          ->  Bitmap Index Scan on entity_compound2document_name  (cost=0.00..73.12 rows=2822 width=0) (actual time=3.221..3.221 rows=13512 loops=1)
>>>>                Index Cond: ((name)::text = 'ranitidine'::text)
>>>>  Total runtime: 32717.548 ms
>>>>
>>>> Another query:
>>>> explain analyze select * from (select * from entity_compounddict2document  where name='progesterone' ) as a  order by a.hepval;
>>>>
>>>> QUERY PLAN
>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>  Sort  (cost=367879.25..368209.24 rows=131997 width=133) (actual time=9262.887..9287.046 rows=138165 loops=1)
>>>>    Sort Key: entity_compounddict2document.hepval
>>>>    Sort Method:  quicksort  Memory: 25622kB
>>>>    ->  Bitmap Heap Scan on entity_compounddict2document  (cost=2906.93..356652.81 rows=131997 width=133) (actual time=76.316..9038.485 rows=138165 loops=1)
>>>>          Recheck Cond: ((name)::text = 'progesterone'::text)
>>>>          ->  Bitmap Index Scan on entity_compound2document_name  (cost=0.00..2873.93 rows=131997 width=0) (actual time=40.913..40.913 rows=138165 loops=1)
>>>>                Index Cond: ((name)::text = 'progesterone'::text)
>>>>  Total runtime: 9296.815 ms
>>>>
>>>>
>>>> It has improved (I supose because of the lack of the join table) but still taking a lot of time... Anything I can do??
>>>>
>>>> Any help would be very appreciated. Thank you very much.
>>>
>>>
>>>
>>> Good to know performance has increased.
>>>
>>> "entity_compounddict2document" table goes through high INSERTS ?
>>>
>>> Can you help us know if the "helpval" column and "name" column have high duplicate values ? "n_distinct" value from pg_stats table would have that info. 
>>>
>>> Below could be a possible workaround -
>>>
>>> As mentioned earlier in this email, a composite Index on name and hepval column might help. If the table does not go through lot of INSERTS, then consider performing a CLUSTER on the table using the same INDEX.
>>>
>>> Other recommendations -
>>>
>>> Please drop off all the Non-primary key Indexes which have 0 scans / hits. This would harm the DB and the DB server whilst maintenance and DML operations.
>>>
>>> Regards,
>>> Venkata Balaji N
>>>
>>> Fujitsu Australia
>>
>>
>>
>> **NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido.
>>
>> **CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.
>>
>


Hi I think the problem is th heap scan of the table , that the backend have to do because the btree to bitmap conversion becomes lossy. Try to disable the enable_bitmapscan for the current session and rerun the query.

Mat Dba




**NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido.

**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.





**NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido.

**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.




**NOTA DE CONFIDENCIALIDAD** Este correo electrónico, y en su caso los ficheros adjuntos, pueden contener información protegida para el uso exclusivo de su destinatario. Se prohíbe la distribución, reproducción o cualquier otro tipo de transmisión por parte de otra persona que no sea el destinatario. Si usted recibe por error este correo, se ruega comunicarlo al remitente y borrar el mensaje recibido.

**CONFIDENTIALITY NOTICE** This email communication and any attachments may contain confidential and privileged information for the sole use of the designated recipient named above. Distribution, reproduction or any other use of this transmission by any party other than the intended recipient is prohibited. If you are not the intended recipient please contact the sender and delete all copies.

pgsql-performance by date:

Previous
From: Szymon Guz
Date:
Subject: Re: question about partial index
Next
From: Evgeniy Shishkin
Date:
Subject: slave wal is ahead of master