Home > mailing lists

Re: SELECT is slow on smaller table? - Mailing list pgsql-performance

From	Ao Jianwang
Subject	Re: SELECT is slow on smaller table?
Date	March 1, 2013 00:30:15
Msg-id	CAAb+5fXO+n6N8Heg4ip97HOb_ZDG=tHgQKcSFSiXEoYNJL9ScA@mail.gmail.com Whole thread
In response to	Re: SELECT is slow on smaller table? (Julien Cigar <jcigar@ulb.ac.be>)
List	pgsql-performance

Tree view

Thanks Julien very much.

Two strange behaviors I found:

1) Even I restart the machine and restart the PostgreSQL, then I execute the query, i still see the shared_hit. It seems when start PG, i will automatically load the data in the cache of the last time?

2) After I rerun the query, the time for the smaller data is about 19ms, while the time for the bigger data is about 17ms. And the trend is the time for bigger data is always faster than the smaller data for about 1 to 2 ms

Any suggestions? thanks very much.

On Thu, Feb 28, 2013 at 11:19 PM, Julien Cigar <jcigar@ulb.ac.be> wrote:

On 02/28/2013 16:11, Ao Jianwang wrote:
Hi,

Does any one can tell me why the same query runs against on smaller data is slower than bigger table. thanks very much.

I am using PostgreSQL9.1.8.

t_apps_1 and t_estimate_1 are about 300M respectively, while *_list_1 about 10M more or less. According to the result, it need to read a lot of blocks(112) from disk.
explain (ANALYZE ON, BUFFERS ON, verbose on
) SELECT e.t_id, SUM(e.estimate) as est
FROM
t_estimate_list_1 l,
t_apps_list_1 rl,
t_apps_1 r,
t_estimate_1 e
WHERE
l.id = rl.dsf_id and
l.date = '2012-07-01' and
l.fed_id = 202 and
l.st_id = 143464 and
rl.cat_id = 12201 and
l.id = e.list_id and
rl.id = r.list_id and
r.t_id = e.t_id
GROUP BY e.t_id;
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2529.91..2530.06 rows=15 width=8) (actual time=1041.391..1041.409 rows=97 loops=1)
Buffers: shared hit=304 read=112
-> Nested Loop (cost=0.00..2529.84 rows=15 width=8) (actual time=96.752..1041.145 rows=97 loops=1)
Buffers: shared hit=304 read=112
-> Nested Loop (cost=0.00..312.60 rows=242 width=12) (actual time=62.035..70.239 rows=97 loops=1)
Buffers: shared hit=18 read=10
-> Nested Loop (cost=0.00..16.56 rows=1 width=12) (actual time=19.520..19.521 rows=1 loops=1)
Buffers: shared hit=3 read=6
-> Index Scan using t_estimate_list_1_unique on t_estimate_list_1 l (cost=0.00..8.27 rows=1 width=4) (actual time=11.175..11.176 rows=1 loops=1)
Index Cond: ((date = '2012-07-01'::date) AND (st_id = 143464) AND (fed_id = 202))
Buffers: shared hit=2 read=4
-> Index Scan using t_apps_list_1_unique on t_apps_list_1 rl (cost=0.00..8.28 rows=1 width=8) (actual time=8.339..8.339 rows=1 loops=1)
Index Cond: ((dsf_id = l.id) AND (cat_id = 12201))
Buffers: shared hit=1 read=2
-> Index Scan using t_apps_1_pkey on t_apps_1 r (cost=0.00..288.56 rows=598 width=8) (actual time=42.513..50.676 rows=97 loops=1)
Index Cond: (list_id = rl.id)
Buffers: shared hit=15 read=4
-> Index Scan using t_estimate_1_pkey on t_estimate_1 e (cost=0.00..9.15 rows=1 width=12) (actual time=10.006..10.007 rows=1 loops=97)
Index Cond: ((list_id = l.id) AND (t_id = r.t_id))
Buffers: shared hit=286 read=102
Total runtime: 1041.511 ms
(21 rows)

The table *_30 are about 30 times larger than *_1 in the above SQL. According to the result, it need to read a lot of blocks(22) from disk.
explain (ANALYZE ON, BUFFERS ON
) SELECT e.t_id, SUM(e.estimate) as est
FROM
t_estimate_list_30 l,
t_apps_list_30 rl,
t_apps_30 r,
t_estimate_30 e
WHERE
l.id = rl.dsf_id and
l.date = '2012-07-01' and
l.fed_id = 202 and
l.st_id = 143464 and
rl.cat_id = 12201 and
l.id = e.list_id and
rl.id = r.list_id and
r.t_id = e.t_id
GROUP BY e.t_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3494.89..3495.04 rows=15 width=8) (actual time=160.612..160.632 rows=97 loops=1)
Buffers: shared hit=493 read=22
-> Nested Loop (cost=0.00..3494.81 rows=15 width=8) (actual time=151.183..160.533 rows=97 loops=1)
Buffers: shared hit=493 read=22
-> Nested Loop (cost=0.00..431.42 rows=240 width=12) (actual time=105.810..106.597 rows=97 loops=1)
Buffers: shared hit=20 read=10
-> Nested Loop (cost=0.00..16.58 rows=1 width=12) (actual time=52.804..52.805 rows=1 loops=1)
Buffers: shared hit=4 read=6
-> Index Scan using t_estimate_list_5_unique on t_estimate_list_5 l (cost=0.00..8.27 rows=1 width=4) (actual time=19.846..19.846 rows=1 loops=1)
Index Cond: ((date = '2012-07-01'::date) AND (st_id = 143464) AND (fed_id = 202))
Buffers: shared hit=2 read=4
-> Index Scan using t_apps_list_5_unique on t_apps_list_5 rl (cost=0.00..8.30 rows=1 width=8) (actual time=32.951..32.952 rows=1 loops=1)
Index Cond: ((dsf_id = l.id) AND (cat_id = 12201))
Buffers: shared hit=2 read=2
-> Index Scan using t_apps_5_pkey on t_apps_5 r (cost=0.00..393.68 rows=1693 width=8) (actual time=53.004..53.755 rows=97 loops=1)
Index Cond: (list_id = rl.id)
Buffers: shared hit=16 read=4
-> Index Scan using t_estimate_5_pkey on t_estimate_5 e (cost=0.00..12.75 rows=1 width=12) (actual time=0.555..0.555 rows=1 loops=97)
Index Cond: ((list_id = l.id) AND (t_id = r.t_id))
Buffers: shared hit=473 read=12
Total runtime: 160.729 ms
(21 rows)

Probably that somes pages have to be loaded in memory ...
It should be faster if you re-run the same query just after

-- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced.

pgsql-performance by date:

From: "Carlo Stonebanks"
Date: 28 February 2013, 20:14:10
Subject: Re: Are bitmap index scans slow to start?

From: Markus Hervén
Date: 01 March 2013, 06:32:12
Subject: Processing of subqueries in union

Re: SELECT is slow on smaller table? - Mailing list pgsql-performance

Previous

Next