Thread: Performance issues when the number of records is around 10 million

Performance issues when the number of records is around 10 million

From
venu madhav
Date:
Hi all,
       In my database application, I have a table whose record count
can reach 10 million, and insertions can arrive at up to 100 per
second at peak times. I have configured Postgres to run autovacuum
hourly. A frontend CGI GUI application displays the data from the
database.
        When I try to fetch the last twenty records from the
database, the operation takes around 10-15 minutes. This is the query
used:

select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
e.wifi_addr_2, e.view_status, bssid  FROM event e, signature s WHERE
s.sig_id = e.signature   AND e.timestamp >= '1270449180' AND
e.timestamp < '1273473180'  ORDER BY e.cid DESC,  e.cid DESC limit 21
offset 10539780;

Can anyone suggest a better approach to improve the performance?
Please let me know if you have any further questions.


Thank you,
Venu

Re: Performance issues when the number of records is around 10 million

From
Sergey Konoplev
Date:
On 11 May 2010 10:18, venu madhav <venutaurus539@gmail.com> wrote:
> select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
> e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
> e.wifi_addr_2, e.view_status, bssid  FROM event e, signature s WHERE
> s.sig_id = e.signature   AND e.timestamp >= '1270449180' AND
> e.timestamp < '1273473180'  ORDER BY e.cid DESC,  e.cid DESC limit 21
> offset 10539780;
>
> Can any one suggest me a better solution to improve the performance.
> Please let me know if you've any further queries.

1. Avoid using large OFFSETs. Instead use "... ORDER BY e.cid DESC LIMIT 21;"
and page by remembering the last cid seen.
2. What does "EXPLAIN SELECT ..." show?
3. What does "\d event" print?
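To illustrate point 1, a keyset (cursor-based) rewrite of the query might look like this. This is a sketch, not the poster's code: the :last_seen_cid parameter is hypothetical and would have to be supplied by the application from the previous page.

```sql
-- Keyset pagination: pass the smallest cid of the previous page
-- instead of computing a huge OFFSET. :last_seen_cid is a
-- hypothetical application-supplied parameter.
SELECT e.cid, e.timestamp, s.sig_class, s.sig_priority, s.sig_name,
       e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
       e.wifi_addr_2, e.view_status, e.bssid
FROM event e
JOIN signature s ON s.sig_id = e.signature
WHERE e.timestamp >= 1270449180
  AND e.timestamp <  1273473180
  AND e.cid < :last_seen_cid
ORDER BY e.cid DESC
LIMIT 21;
```

With a btree index on cid, each page then costs roughly the same regardless of how deep into the result set it is, because no rows before the cursor position need to be generated and discarded.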

--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com /
Linkedin: http://ru.linkedin.com/in/grayhemp /
JID/GTalk: gray.ru@gmail.com / Skype: gray-hemp / ICQ: 29353802

Re: Performance issues when the number of records is around 10 million

From
"A. Kretschmer"
Date:
In response to venu madhav :
> Hi all,
>        In my database application, I've a table whose records can
> reach 10M and insertions can happen at a faster rate like 100
> insertions per second in the peak times. I configured postgres to do
> auto vacuum on hourly basis. I have frontend GUI application in CGI
> which displays the data from the database.
>         When I try to get the last twenty records from the database,
> it takes around 10-15 mins to complete the operation.This is the query
> which is used:
>
> select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
> e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
> e.wifi_addr_2, e.view_status, bssid  FROM event e, signature s WHERE
> s.sig_id = e.signature   AND e.timestamp >= '1270449180' AND
> e.timestamp < '1273473180'  ORDER BY e.cid DESC,  e.cid DESC limit 21
> offset 10539780;

First, show us the table definitions for both tables.
Second, the output generated by EXPLAIN ANALYSE <your query>.

I'm surprised by the "e.timestamp >= '1270449180'" -- is this a
TIMESTAMP column?

And to retrieve the last twenty records you should write:

ORDER BY ts DESC LIMIT 20

With a proper index on this column, this should produce an index scan.
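Concretely, that advice might look like this against the poster's event table (a sketch; it assumes the epoch timestamps live in an indexed bigint column, and that the planner chooses a backward index scan):

```sql
-- With a btree index on "timestamp", the newest rows in the window
-- can come straight off the index, no full sort required:
SELECT *
FROM event
WHERE "timestamp" >= 1270449180
  AND "timestamp" <  1273473180
ORDER BY "timestamp" DESC
LIMIT 20;
```

The key difference from the original query is that LIMIT without a multi-million-row OFFSET lets the scan stop after twenty matching rows.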


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

Re: Performance issues when the number of records is around 10 million

From
Brian Modra
Date:
On 11/05/2010, Sergey Konoplev <gray.ru@gmail.com> wrote:
> On 11 May 2010 10:18, venu madhav <venutaurus539@gmail.com> wrote:
>> select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
>> e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
>> e.wifi_addr_2, e.view_status, bssid  FROM event e, signature s WHERE
>> s.sig_id = e.signature   AND e.timestamp >= '1270449180' AND
>> e.timestamp < '1273473180'  ORDER BY e.cid DESC,  e.cid DESC limit 21
>> offset 10539780;
>>
>> Can any one suggest me a better solution to improve the performance.
>> Please let me know if you've any further queries.
>
> 1. Avoid using large OFFSETs. Do instead "... ORDER BY e.cid, e.cid LIMIT
> 21;"
> 2. What "EXPLAIN SELECT ..." shows?
> 3. What "\d event" prints?
>
> --
> Sergey Konoplev
>
> Blog: http://gray-hemp.blogspot.com /
> Linkedin: http://ru.linkedin.com/in/grayhemp /
> JID/GTalk: gray.ru@gmail.com / Skype: gray-hemp / ICQ: 29353802
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Have you checked whether the indexes have accumulated lots of dead
entries? Try creating new indexes and then dropping the old ones (or
just use REINDEX if this is not an online production database).
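A sketch of both approaches, using the cid_idx index from the schema posted later in the thread (the cid_idx_new name is made up for the swap):

```sql
-- Online rebuild (PostgreSQL 8.2+): build a fresh index without
-- blocking reads or writes, then swap it in for the bloated one.
CREATE INDEX CONCURRENTLY cid_idx_new ON event (cid);
DROP INDEX cid_idx;
ALTER INDEX cid_idx_new RENAME TO cid_idx;

-- Or, if downtime is acceptable, rebuild everything in one shot
-- (takes exclusive locks):
REINDEX TABLE event;
```

Note that cid_idx duplicates event_pkey (both are btrees on cid), so dropping cid_idx outright may be worth considering as well.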

--
Brian Modra   Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/

Re: Performance issues when the number of records is around 10 million

From
venu madhav
Date:
On May 11, 12:03 pm, andreas.kretsch...@schollglas.com ("A.
Kretschmer") wrote:
> In response to venu madhav :
>
>
>
> > Hi all,
> >        In my database application, I've a table whose records can
> > reach 10M and insertions can happen at a faster rate like 100
> > insertions per second in the peak times. I configured postgres to do
> > auto vacuum on hourly basis. I have frontend GUI application in CGI
> > which displays the data from the database.
> >         When I try to get the last twenty records from the database,
> > it takes around 10-15 mins to complete the operation.This is the query
> > which is used:
>
> > select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
> > e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
> > e.wifi_addr_2, e.view_status, bssid  FROM event e, signature s WHERE
> > s.sig_id = e.signature   AND e.timestamp >= '1270449180' AND
> > e.timestamp < '1273473180'  ORDER BY e.cid DESC,  e.cid DESC limit 21
> > offset 10539780;
>
> First, show us the table-definition for both tables.
> Secondly the output generated from EXPLAIN ANALYSE <your query>
[Venu Madhav]
-------------------------------- event schema
-------------------------------------------------------------
snort=# \d event;
                  Table "public.event"
       Column       |         Type          | Modifiers
--------------------+-----------------------+-----------
 sid                | integer               | not null
 cid                | bigint                | not null
 sig_name           | character varying(80) | not null
 signature          | integer               | not null
 sig_class          | character varying(80) | not null
 sig_priority       | bigint                |
 timestamp          | bigint                | not null
 sniff_ip           | character varying(16) |
 sniff_channel      | smallint              |
 bssid              | character varying(18) |
 view_status        | smallint              |
 wifi_ver           | smallint              | not null
 wifi_type          | smallint              | not null
 wifi_stype         | smallint              | not null
 wifi_other_fc_bits | smallint              | not null
 wifi_dur_id        | integer               | not null
 wifi_addr_1        | character varying(18) |
 wifi_addr_2        | character varying(18) |
 wifi_addr_3        | character varying(18) |
 wifi_addr_4        | character varying(18) |
 wifi_seq_ctrl      | integer               |
Indexes:
    "event_pkey" PRIMARY KEY, btree (cid)
    "cid_idx" btree (cid)
    "signature_idx" btree (signature)
    "timestamp_idx" btree ("timestamp")
    "wifi_addr_1_idx" btree (wifi_addr_1)
    "wifi_addr_2_idx" btree (wifi_addr_2)
Foreign-key constraints:
    "event_fkey_sid" FOREIGN KEY (sid) REFERENCES sensor(sid) ON UPDATE CASCADE
    "event_fkey_signature" FOREIGN KEY (signature) REFERENCES signature(sig_id) ON UPDATE CASCADE

----------------------------------------------- schema of signature
----------------------------------
snort=# \d signature;
                                    Table "public.signature"
    Column    |         Type          |                         Modifiers
--------------+-----------------------+------------------------------------------------------------
 sig_id       | integer               | not null default nextval('signature_sig_id_seq'::regclass)
 sig_name     | character varying(80) | not null
 sig_class    | character varying(80) | not null
 sig_priority | bigint                |
 sig_rev      | bigint                |
 sig_sid      | bigint                |
 sig_config   | text                  |
Indexes:
    "signature_pkey" PRIMARY KEY, btree (sig_id)
    "sig_class_idx" btree (sig_class)
    "sig_name_idx" btree (sig_name)

-------------------------- Explain Analyse of the query
-------------------------------------------
snort=# EXPLAIN ANALYZE select e.cid, timestamp, s.sig_class,
s.sig_priority, s.sig_name, e.sniff_ip, e.sniff_channel, s.sig_config,
e.wifi_addr_1,  e.wifi_addr_2, e.view_status, bssid  FROM event e,
signature s WHERE s.sig_id = e.signature   AND e.timestamp >=
'1270449180' AND e.timestamp < '1273473180'  ORDER BY e.cid DESC,
e.cid DESC limit 21 offset 10539780;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=7885743.98..7885743.98 rows=1 width=287) (actual time=1462193.060..1462193.083 rows=14 loops=1)
   ->  Sort  (cost=7859399.66..7885743.98 rows=10537727 width=287) (actual time=1349648.207..1456496.334 rows=10539794 loops=1)
         Sort Key: e.cid
         ->  Hash Join  (cost=2.44..645448.31 rows=10537727 width=287) (actual time=0.182..139745.001 rows=10539794 loops=1)
               Hash Cond: ("outer".signature = "inner".sig_id)
               ->  Seq Scan on event e  (cost=0.00..487379.97 rows=10537727 width=104) (actual time=0.012..121595.257 rows=10539794 loops=1)
                     Filter: (("timestamp" >= 1270449180::bigint) AND ("timestamp" < 1273473180::bigint))
               ->  Hash  (cost=2.35..2.35 rows=35 width=191) (actual time=0.097..0.097 rows=36 loops=1)
                     ->  Seq Scan on signature s  (cost=0.00..2.35 rows=35 width=191) (actual time=0.005..0.045 rows=36 loops=1)
 Total runtime: 1463829.145 ms
(10 rows)

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> I'm surprised about the "e.timestamp >= '1270449180'", is this a
> TIMESTAMP-column?
[Venu Madhav] Yes, it is timestamp in epoch time.
>
> And, to retrieve the last twenty records you should write:
>
> ORDER BY ts DESC LIMIT 20
>
> With a proper index on this column this should force an index-scan.
>
> Andreas
> --
> Andreas Kretschmer
> Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
> GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99
>