Thread: How to evaluate "explain analyze" correctly soon after "explain" for the same statement ?

Hi all:

I have one question about the cache clearing.

 

If I use the following soon after database startup(or first time I use it):

 

postgres=#  explain analyze select id,deptno from gaotab where id=200;

                                                        QUERY PLAN             

                                          

--------------------------------------------------------------------------------

-------------------------------------------

 Index Only Scan using idx_id_dept on gaotab  (cost=0.00..8.27 rows=1 width=8) (

actual time=30.912..30.915 rows=1 loops=1)

   Index Cond: (id = 200)

   Heap Fetches: 1

 Total runtime: 47.390 ms

(4 rows)

 

postgres=#

 

The result is: the above explain analyze got a total runtime of 47 ms.

 

But If I  restart the database again, and then execute the following:

 

postgres=# explain select id,deptno from gaotab where id=200;

                                  QUERY PLAN                                  

-------------------------------------------------------------------------------

 Index Only Scan using idx_id_dept on gaotab  (cost=0.00..8.27 rows=1 width=8)

   Index Cond: (id = 200)

(2 rows)

postgres=# explain analyze select id,deptno from gaotab where id=200;

                                                       QUERY PLAN              

                                        

--------------------------------------------------------------------------------

-----------------------------------------

 Index Only Scan using idx_id_dept on gaotab  (cost=0.00..8.27 rows=1 width=8) (

actual time=0.052..0.053 rows=1 loops=1)

   Index Cond: (id = 200)

   Heap Fetches: 1

 Total runtime: 0.074 ms

(4 rows)

 

This time I got the total runtime of 0.074ms, obviously the  explain analyze benefit from the explain statement.

 

It might not be a big problem in a small system.

But when in a production environment,  When I  want to use explain  and  then , soon use explain analyze for the same statement,

How can I avoid the influence of cache and get the right answer for evaluating purpose? 

 

It is not a good idea to restart the database again and again I think.

I wonder is there any method of clearing cache or even clear only a particular part of the cache?

 

In my test environment, I can get the following:

 

postgres=# show seq_page_cost;           

 seq_page_cost            

---------------           

1           

(1 row)           

           

postgres=# show cpu_tuple_cost;           

 cpu_tuple_cost            

----------------           

0.01           

(1 row)           

 

postgres=# show cpu_operator_cost;           

 cpu_operator_cost            

-------------------           

0.0025           

(1 row) 

 

And my table is like that

 
postgres=# analyze;    
ANALYZE   
postgres=# select a.relpages, a.reltuples, a.relfilenode,a.reltype,b.typname from pg_class a, pg_type b where a.relname like 'gaotab%' and a.reltype=b.oid;   relpages | reltuples | relfilenode | reltype | typname    
----------+-----------+-------------+---------+---------          7 |      1000 |       16396 |   16386 | gaotab    
(1 row)   
 

Thanks in advance.

 

高健 wrote:
> I have one question about the cache clearing.
> 
> If I use the following soon after database startup(or first time I use it):
> 
> postgres=#  explain analyze select id,deptno from gaotab where id=200;

> The result is: the above explain analyze got a total runtime of 47 ms.
>
> But If I  restart the database again, and then execute the following:
> 
> postgres=# explain select id,deptno from gaotab where id=200;

> postgres=# explain analyze select id,deptno from gaotab where id=200;

> This time I got the total runtime of 0.074ms, obviously the  explain analyze benefit from the explain
> statement.

The EXPLAIN will not have a noticable effect on the performance
of the EXPLAIN ANALYZE.

If you actually restarted the PostgreSQL server like you said,
then the difference must be that the file is cached in the file
system cache.

You can verify that be omitting the EXPLAIN in the second run.

> It might not be a big problem in a small system.
> 
> But when in a production environment,  When I  want to use explain  and  then , soon use explain
> analyze for the same statement,
> 
> How can I avoid the influence of cache and get the right answer for evaluating purpose?
> 
> 
> 
> It is not a good idea to restart the database again and again I think.
> 
> I wonder is there any method of clearing cache or even clear only a particular part of the cache?

The only way of clearing the cache in database shared memory is to
restart the server.

That's just a simple "pg_ctl restart -m fast -D <datadir>".
I think that's simple enough for tests.

You should also empty the file system cache.
On recent Linux systems that would be
  sync; echo 3 > /proc/sys/vm/drop_caches
You'd have to consult the documentation for other OSs.

Yours,
Laurenz Albe

=?UTF-8?B?6auY5YGl?= <luckyjackgao@gmail.com> writes:
> It might not be a big problem in a small system.
> But when in a production environment,  When I  want to use explain  and
> then , soon use explain analyze for the same statement,
> How can I avoid the influence of cache and get the right answer for
> evaluating purpose?

I think this question is based on a false premise.  Why do you feel that
the behavior with cold caches is "the right answer", and not the behavior
with warm caches?  A short-duration query like this one is not going to
be interesting at all for performance unless it's executed quite a lot,
and if it's executed quite a lot then the warm-cache result ought to be
the more representative one.

In general, trying to tune for cold-cache cases seems backwards to me.
It's much more productive to try to ensure that the caches are warm.

            regards, tom lane


Hi tom

At frist I have thought that the database parsed my explain statement, 
so the pre-compiled execution plan will be re-used , which made the statement's second run quick.

I think that what you said is right. 

Thank you

2012/11/7 Tom Lane <tgl@sss.pgh.pa.us>
高健 <luckyjackgao@gmail.com> writes:
> It might not be a big problem in a small system.
> But when in a production environment,  When I  want to use explain  and
> then , soon use explain analyze for the same statement,
> How can I avoid the influence of cache and get the right answer for
> evaluating purpose?

I think this question is based on a false premise.  Why do you feel that
the behavior with cold caches is "the right answer", and not the behavior
with warm caches?  A short-duration query like this one is not going to
be interesting at all for performance unless it's executed quite a lot,
and if it's executed quite a lot then the warm-cache result ought to be
the more representative one.

In general, trying to tune for cold-cache cases seems backwards to me.
It's much more productive to try to ensure that the caches are warm.

                        regards, tom lane