Re: Query seem to slow if table have more than 200 million rows - Mailing list pgsql-performance

From Ahmad Fajar
Subject Re: Query seem to slow if table have more than 200 million rows
Date
Msg-id SVONERLVbNDN4CxpAOA00000061@ki-communication.com
Whole thread Raw
In response to Re: Query seem to slow if table have more than 200 million rows  ("Qingqing Zhou" <zhouqq@cs.toronto.edu>)
List pgsql-performance
Hi Qingqing,

I don't know whether the statistic got is bad or good, this is the
statistic:
scooby=# select a.relid, a.relname, b.indexrelid, b.indexrelname,
c.idx_scan, c.idx_tup_read, c.idx_tup_fetch,
scooby-# a.heap_blks_read, a.heap_blks_hit, a.idx_blks_read, a.idx_blks_hit,
scooby-# a.toast_blks_read, a.toast_blks_hit, a.tidx_blks_read,
a.tidx_blks_hit, b.idx_blks_read, b.idx_blks_hit
scooby-# from pg_statio_user_tables a, pg_statio_user_indexes b,
pg_stat_all_indexes c
scooby-# where a.relid=b.relid and a.relid=c.relid and
b.indexrelid=c.indexrelid and a.relname=b.relname and
scooby-# a.relname=c.relname and a.relname='fti_dict1';
  relid   |  relname  | indexrelid | indexrelname | idx_scan | idx_tup_read
| idx_tup_fetch | heap_blks_read | heap_blks_hit | idx
_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit |
tidx_blks_read | tidx_blks_hit | idx_blks_read | idx_blks_hit
----------+-----------+------------+--------------+----------+--------------
+---------------+----------------+---------------+----
-----------+--------------+-----------------+----------------+--------------
--+---------------+---------------+--------------
 22880226 | fti_dict1 |   22880231 | idx_dict3    |        0 |            0
|             0 |              0 |             0 |
         0 |            0 |                 |                |
|               |             0 |            0
 22880226 | fti_dict1 |   22880230 | idx_dict2    |        7 |       592799
|        592799 |              0 |             0 |
         0 |            0 |                 |                |
|               |             0 |            0
 22880226 | fti_dict1 |   22880229 | idx_dict1    |        0 |            0
|             0 |              0 |             0 |
         0 |            0 |                 |                |
|               |             0 |            0
(3 rows)

I have try several time the query below with different keyword, but I just
got idx_tup_read and idx_tup_fetch changed, others keep zero.
The Index are:
Ids (Idx_dict1),
keywords (idx_dict2 varchar_ops),
keywords (idx_dict3 varchar_pattern_ops) ==> I use this index for query ...
keywords like 'blabla%', just for testing purpose

Regards,
ahmad fajar

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Qingqing Zhou
Sent: Selasa, 27 September 2005 8:43
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query seem to slow if table have more than 200
million rows


""Ahmad Fajar"" <gendowo@konphalindo.or.id> wrote
>
> Select ids, keywords from dict where keywords='blabla' ('blabla' is a
> single
> word);
>
> The table have 200 million rows, I have index the keywords field. On the
> first time my query seem to slow to get the result, about 15-60 sec to get
> the result. But if I repeat the query I will get fast result. My question
> is
> why on the first time the query seem very slow.
>
> Table structure is quite simple:
>
> Ids bigint, keywords varchar(150), weight varchar(1), dpos int.
>

The first slowness is obviously caused by disk IOs. The second time is
faster because all data pages it requires are already in buffer pool. 200
million rows is not a problem for btree index, even if your client tool
appends some spaces to your keywords at your insertion time, the ideal btree

is 5 to 6 layers high at most. Can you show the iostats of index from your
statistics view?
http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-S
TATS-VIEWS

Regards,
Qingqing



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


pgsql-performance by date:

Previous
From: Andrey Repko
Date:
Subject: Index not used on group by
Next
From: Gnanavel S
Date:
Subject: PostgreSQL overall design