Re: Performance issues when the number of records are around 10 Million - Mailing list pgsql-performance

From venu madhav
Subject Re: Performance issues when the number of records are around 10 Million
Msg-id AANLkTimH6EzAourWQThioZWMcGN3dA123kumX23QDNe-@mail.gmail.com
In response to Re: Performance issues when the number of records are around 10 Million  (Shrirang Chitnis <Shrirang.Chitnis@hovservices.com>)
List pgsql-performance


On Wed, May 12, 2010 at 3:22 AM, Shrirang Chitnis <Shrirang.Chitnis@hovservices.com> wrote:
Venu,

For starters,

1) You have used the e.cid twice in ORDER BY clause.
[Venu] Actually the second cid acts as a secondary sort key when some other column is used for the primary sort. Since the primary sort key in this query is also cid, it appears twice. I can remove it.
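For reference, once the duplicate is dropped the tail of the query reduces to:

    ... ORDER BY e.cid DESC LIMIT 21 OFFSET 10539780;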
2) If you want last twenty records in the table matching the criteria of timestamp, why do you need the offset?
[Venu] It is part of a UI application where a user can ask for data between any two dates, with options to browse through the records retrieved in that interval.
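A minimal sketch of keyset (cursor) pagination for that kind of browsing, assuming cid is unique and increasing; :last_seen_cid is a hypothetical placeholder for the lowest cid on the page currently displayed:

    SELECT e.cid, e.timestamp, s.sig_class, s.sig_priority, s.sig_name,
           e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
           e.wifi_addr_2, e.view_status, bssid
      FROM event e
      JOIN signature s ON s.sig_id = e.signature
     WHERE e.timestamp >= 1270449180
       AND e.timestamp < 1273473180
       AND e.cid < :last_seen_cid    -- resume just below the previous page
     ORDER BY e.cid DESC
     LIMIT 21;

Unlike a large OFFSET, which forces the server to produce and discard every skipped row, this form can stop after 21 rows once cid is indexed.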
3) Do you have indexes on sig_id, signature and timestamp fields?
[Venu] Yes, I do have indexes on those three.
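Note the sort key itself, e.cid, is not in that list; if it is not indexed elsewhere (an assumption about the schema, not confirmed here), the planner cannot read rows in cid order without sorting them first:

    -- hypothetical index name; lets the planner walk rows in cid order
    CREATE INDEX event_cid_idx ON event (cid);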
 
If you do not get a good response after that, please post the EXPLAIN ANALYZE for the query.
snort=# EXPLAIN ANALYZE select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name, e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,  e.wifi_addr_2, e.view_status, bssid  FROM event e, signature s WHERE s.sig_id = e.signature   AND e.timestamp >= '1270449180' AND e.timestamp < '1273473180'  ORDER BY e.cid DESC,  e.cid DESC limit 21 offset 10539780;
                                                                 QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7885743.98..7885743.98 rows=1 width=287) (actual time=1462193.060..1462193.083 rows=14 loops=1)
   ->  Sort  (cost=7859399.66..7885743.98 rows=10537727 width=287) (actual time=1349648.207..1456496.334 rows=10539794 loops=1)
         Sort Key: e.cid
         ->  Hash Join  (cost=2.44..645448.31 rows=10537727 width=287) (actual time=0.182..139745.001 rows=10539794 loops=1)
               Hash Cond: ("outer".signature = "inner".sig_id)
               ->  Seq Scan on event e  (cost=0.00..487379.97 rows=10537727 width=104) (actual time=0.012..121595.257 rows=10539794 loops=1)
                     Filter: (("timestamp" >= 1270449180::bigint) AND ("timestamp" < 1273473180::bigint))
               ->  Hash  (cost=2.35..2.35 rows=35 width=191) (actual time=0.097..0.097 rows=36 loops=1)
                     ->  Seq Scan on signature s  (cost=0.00..2.35 rows=35 width=191) (actual time=0.005..0.045 rows=36 loops=1)
 Total runtime: 1463829.145 ms
(10 rows)
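The plan shows where the time goes: all 10,539,794 joined rows are sorted only so the Limit node can skip past offset 10,539,780 and keep 14 of them. For a page this close to the end of the set, one workaround (a sketch, again assuming cid is unique and increasing) is to flip the sort and read from the other end, recomputing the limit and offset relative to the end of the set and reversing the rows in the application before display:

    -- hypothetical: the last page in DESC order is the first page in ASC order
    SELECT e.cid, e.timestamp, s.sig_name
      FROM event e
      JOIN signature s ON s.sig_id = e.signature
     WHERE e.timestamp >= 1270449180
       AND e.timestamp < 1273473180
     ORDER BY e.cid ASC
     LIMIT 21;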
Thank you,
Venu Madhav.

Thanks,

Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services
Office: (866) 808-0935 Ext: 39210
shrirang.chitnis@hovservices.com
www.hovservices.com



From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of venu madhav
Sent: Tuesday, May 11, 2010 2:18 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Performance issues when the number of records are around 10 Million

Hi all,
     In my database application, I have a table that can grow to around 10 million records, with insertions arriving at up to 100 per second at peak times. I have configured postgres to run autovacuum hourly. A frontend GUI application written in CGI displays the data from the database. When I try to get the last twenty records from the database, the operation takes around 10-15 minutes to complete. This is the query which is used:

select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name, e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
e.wifi_addr_2, e.view_status, bssid  FROM event e, signature s WHERE s.sig_id = e.signature   AND e.timestamp >= '1270449180' AND e.timestamp < '1273473180'  ORDER BY e.cid DESC,  e.cid DESC limit 21 offset 10539780;

Can anyone suggest a better approach to improve the performance?

Please let me know if you have any further questions.


Thank you,
Venu
