Thread: Interesting incosistent query timing

Interesting incosistent query timing

From
Ernest E Vogelsinger
Date:
Hi,

I am researching some interesting inconsistent query timing and hope some
of the gurus hanging out here might help me shed a light on this...

The table:
 Column |           Type           |                     Modifiers

--------+--------------------------+----------------------------------------
------------
 rid    | integer                  | not null default
nextval('rv2_mdata_id_seq'::text)
 pid    | integer                  |
 owid   | integer                  |
 ioid   | integer                  |
 dcid   | character varying        |
 dsid   | character varying        |
 drid   | integer                  |
 usg    | integer                  |
 idx    | character varying        |
 env    | integer                  |
 nxid   | integer                  |
 ci     | integer                  |
 cd     | numeric(21,6)            |
 cr     | real                     |
 cts    | timestamp with time zone |
 cst    | character varying        |
 ctx    | text                     |
 cbl    | oid                      |
 acl    | text                     |
Indexes: id_mdata_dictid,
         id_mdata_dictid_dec,
         id_mdata_dictid_int,
         id_mdata_dictid_real,
         id_mdata_dictid_string,
         id_mdata_dictid_text,
         id_mdata_dictid_timestamp,
         id_mdata_dowid,
         id_mdata_ioid,
         id_mdata_owid
Primary key: rv2_mdata_pkey

Index "id_mdata_dictid_string"
 Column |       Type
--------+-------------------
 dcid   | character varying
 dsid   | character varying
 drid   | integer
 nxid   | integer
 cst    | character varying
btree
Index predicate: ((usg & 16) = 16)



The query:
explain analyze verbose
select distinct t1.owid
        from rv2_mdata t1
        where t1.dcid='ADDR' and t1.dsid='AUXDICT' and t1.drid=110 and
t1.usg & 16 = 16
        and t1.nxid = 0
        and t1.cst ilike '%redist%'
        and t1.owid > 10
;

For the first time run it executes in 1.5 - 2 seconds. From the second
time, only 10 msec are needed for the same result:

Unique  (cost=3.84..3.84 rows=1 width=4) (actual time=1569.36..1569.39
rows=11 loops=1)
  ->  Sort  (cost=3.84..3.84 rows=1 width=4) (actual time=1569.36..1569.37
rows=11 loops=1)
        ->  Index Scan using id_mdata_dictid_string on rv2_mdata t1
(cost=0.00..3.83 rows=1 width=4) (actual time=17.02..1569.22 rows=11 loops=1)
Total runtime: 1569.50 msec


Unique  (cost=3.84..3.84 rows=1 width=4) (actual time=10.51..10.53 rows=11
loops=1)
  ->  Sort  (cost=3.84..3.84 rows=1 width=4) (actual time=10.51..10.51
rows=11 loops=1)
        ->  Index Scan using id_mdata_dictid_string on rv2_mdata t1
(cost=0.00..3.83 rows=1 width=4) (actual time=0.60..10.43 rows=11 loops=1)
Total runtime: 10.64 msec

If any of the "dcid", "dsid", or "drid" constraint values are altered, the
query starts again at 1.5 - 2 secs, then drops to 10.5 msec again.

Even after restarting PostgreSQL, the number is lower (~50 msec) than when
running for the first time.

I really would like to get a consistent timing here (the lower the better
of course) since these queries will happen quite often within our
application, and I need a consistent and predictable timing (this being a
core component).

This is postgresql 7.2.1 on RH72.

Any clues? Thanks for insights,


--
   >O     Ernest E. Vogelsinger
   (\)    ICQ #13394035
    ^     http://www.vogelsinger.at/



Re: [GENERAL] Interesting incosistent query timing

From
Manfred Koizar
Date:
On Tue, 17 Jun 2003 00:46:54 +0200, Ernest E Vogelsinger
<ernest@vogelsinger.at> wrote:
>For the first time run it executes in 1.5 - 2 seconds. From the second
>time, only 10 msec are needed for the same result

I'd call it inconsistent, if it were the other way round :-)  I guess
you are seeing the effects of disk caching.  Watch the drive LED
during the first run ...

Servus
 Manfred

Re: Interesting incosistent query timing

From
"Nikolaus Dilger"
Date:
Ernest,

My guess is that the second execution of the query is
shorter since the data blocks are cached in memory.
When you modify the data then it needs to be read again
from disk which is much slower than from memory.  The
short execution after restarting PostgreSQL seems to
indicate that your data is cached in the Linux buffer
cache.

The only strange thing seems to be that you have so few
rows.  Are you getting the data from a remote machine?
How many bytes does a single row have?  Are they really
large???

Regards,
Nikolaus

On Tue, 17 Jun 2003 00:46:54 +0200, Ernest E
Vogelsinger wrote:

>
> Hi,
>
> I am researching some interesting inconsistent query
> timing and hope some
> of the gurus hanging out here might help me shed a
> light on this...
>
> The table:
>  Column |           Type           |

>  Modifiers
>
>
--------+--------------------------+----------------------------------------
> ------------
>  rid    | integer                  | not null default
> nextval('rv2_mdata_id_seq'::text)
>  pid    | integer                  |
>  owid   | integer                  |
>  ioid   | integer                  |
>  dcid   | character varying        |
>  dsid   | character varying        |
>  drid   | integer                  |
>  usg    | integer                  |
>  idx    | character varying        |
>  env    | integer                  |
>  nxid   | integer                  |
>  ci     | integer                  |
>  cd     | numeric(21,6)            |
>  cr     | real                     |
>  cts    | timestamp with time zone |
>  cst    | character varying        |
>  ctx    | text                     |
>  cbl    | oid                      |
>  acl    | text                     |
> Indexes: id_mdata_dictid,
>          id_mdata_dictid_dec,
>          id_mdata_dictid_int,
>          id_mdata_dictid_real,
>          id_mdata_dictid_string,
>          id_mdata_dictid_text,
>          id_mdata_dictid_timestamp,
>          id_mdata_dowid,
>          id_mdata_ioid,
>          id_mdata_owid
> Primary key: rv2_mdata_pkey
>
> Index "id_mdata_dictid_string"
>  Column |       Type
> --------+-------------------
>  dcid   | character varying
>  dsid   | character varying
>  drid   | integer
>  nxid   | integer
>  cst    | character varying
> btree
> Index predicate: ((usg & 16) = 16)
>
>
>
> The query:
> explain analyze verbose
> select distinct t1.owid
>         from rv2_mdata t1
>         where t1.dcid='ADDR' and t1.dsid='AUXDICT' and
> t1.drid=110 and
> t1.usg & 16 = 16
>         and t1.nxid = 0
>         and t1.cst ilike '%redist%'
>         and t1.owid > 10
> ;
>
> For the first time run it executes in 1.5 - 2 seconds.
> From the second
> time, only 10 msec are needed for the same result:
>
> Unique  (cost=3.84..3.84 rows=1 width=4) (actual
> time=1569.36..1569.39
> rows=11 loops=1)
>   ->  Sort  (cost=3.84..3.84 rows=1 width=4) (actual
> time=1569.36..1569.37
> rows=11 loops=1)
>         ->  Index Scan using id_mdata_dictid_string on
> rv2_mdata t1
> (cost=0.00..3.83 rows=1 width=4) (actual
> time=17.02..1569.22 rows=11 loops=1)
> Total runtime: 1569.50 msec
>
>
> Unique  (cost=3.84..3.84 rows=1 width=4) (actual
> time=10.51..10.53 rows=11
> loops=1)
>   ->  Sort  (cost=3.84..3.84 rows=1 width=4) (actual
> time=10.51..10.51
> rows=11 loops=1)
>         ->  Index Scan using id_mdata_dictid_string on
> rv2_mdata t1
> (cost=0.00..3.83 rows=1 width=4) (actual
> time=0.60..10.43 rows=11 loops=1)
> Total runtime: 10.64 msec
>
> If any of the "dcid", "dsid", or "drid" constraint
> values are altered, the
> query starts again at 1.5 - 2 secs, then drops to 10.5
> msec again.
>
> Even after restarting PostgreSQL, the number is lower
> (~50 msec) than when
> running for the first time.
>
> I really would like to get a consistent timing here
> (the lower the better
> of course) since these queries will happen quite often
> within our
> application, and I need a consistent and predictable
> timing (this being a
> core component).
>
> This is postgresql 7.2.1 on RH72.
>
> Any clues? Thanks for insights,
>
>
> --
>    >O     Ernest E. Vogelsinger
>    (\)    ICQ #13394035
>     ^     http://www.vogelsinger.at/
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 7: don't forget to increase your free space map
> settings

Re: Interesting incosistent query timing

From
Ernest E Vogelsinger
Date:
At 04:20 17.06.2003, Nikolaus Dilger said:
--------------------[snip]--------------------
>My guess is that the second execution of the query is
>shorter since the data blocks are cached in memory.
>When you modify the data then it needs to be read again
>from disk which is much slower than from memory.  The
>short execution after restarting PostgreSQL seems to
>indicate that your data is cached in the Linux buffer
>cache.
>
>The only strange thing seems to be that you have so few
>rows.  Are you getting the data from a remote machine?
>How many bytes does a single row have?  Are they really
>large???
--------------------[snip]--------------------

What exactly do you mean? This table is quite filled (2.3 million rows),
but the query results are correct.


--
   >O     Ernest E. Vogelsinger
   (\)    ICQ #13394035
    ^     http://www.vogelsinger.at/



Re: Interesting incosistent query timing

From
nikolaus@dilger.cc
Date:
Ernest,

Thanks for providing the additional information that
the table has 2.3 million rows.

See during the first execution you spend most of the
time scanning the index id_mdata_dictid_string.  And
since that one is quite large it takes 1500 msec to
read the index from disk into memory.

For the second execution you read the large index from
memory.  Therfore it takes only 10 msec.

Once you change the data you need to read from disk
again and the query takes a long time.

Regards,
Nikolaus

> For the first time run it executes in 1.5 - 2 seconds.
> From the second
> time, only 10 msec are needed for the same result:
>
> Unique  (cost=3.84..3.84 rows=1 width=4) (actual
> time=1569.36..1569.39
> rows=11 loops=1)
>   ->  Sort  (cost=3.84..3.84 rows=1 width=4) (actual
> time=1569.36..1569.37
> rows=11 loops=1)
>         ->  Index Scan using id_mdata_dictid_string on
> rv2_mdata t1
> (cost=0.00..3.83 rows=1 width=4) (actual
> time=17.02..1569.22 rows=11 loops=1)
> Total runtime: 1569.50 msec
>
>
> Unique  (cost=3.84..3.84 rows=1 width=4) (actual
> time=10.51..10.53 rows=11
> loops=1)
>   ->  Sort  (cost=3.84..3.84 rows=1 width=4) (actual
> time=10.51..10.51
> rows=11 loops=1)
>         ->  Index Scan using id_mdata_dictid_string on
> rv2_mdata t1
> (cost=0.00..3.83 rows=1 width=4) (actual
> time=0.60..10.43 rows=11 loops=1)
> Total runtime: 10.64 msec


On Tue, 17 Jun 2003 04:54:56 +0200, Ernest E
Vogelsinger wrote:

>
> At 04:20 17.06.2003, Nikolaus Dilger said:
> --------------------[snip]--------------------
> >My guess is that the second execution of the query is
> >shorter since the data blocks are cached in memory.
> >When you modify the data then it needs to be read
again
> >from disk which is much slower than from memory.  The
> >short execution after restarting PostgreSQL seems to
> >indicate that your data is cached in the Linux buffer
> >cache.
> >
> >The only strange thing seems to be that you have so
few
> >rows.  Are you getting the data from a remote
machine?
> >How many bytes does a single row have?  Are they
really
> >large???
> --------------------[snip]--------------------
>
> What exactly do you mean? This table is quite filled
> (2.3 million rows),
> but the query results are correct.
>
>
> --
>    >O     Ernest E. Vogelsinger
>    (\)    ICQ #13394035
>     ^     http://www.vogelsinger.at/
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>
> http://www.postgresql.org/docs/faqs/FAQ.html

Re: Interesting incosistent query timing

From
Ernest E Vogelsinger
Date:
At 00:45 18.06.2003, nikolaus@dilger.cc said:
--------------------[snip]--------------------
>Thanks for providing the additional information that
>the table has 2.3 million rows.
>
>See during the first execution you spend most of the
>time scanning the index id_mdata_dictid_string.  And
>since that one is quite large it takes 1500 msec to
>read the index from disk into memory.
>
>For the second execution you read the large index from
>memory.  Therfore it takes only 10 msec.
>
>Once you change the data you need to read from disk
>again and the query takes a long time.
--------------------[snip]--------------------

I came to the same conclusion - I installed a cron script that performs a
select against that index on a regular basis (3 minutes). After that even
the most complex queries against this huge table go like whoosssshhh ;-)

Would be interesting what one could do to _not_ have to take this basically
clumsy approach...


--
   >O     Ernest E. Vogelsinger
   (\)    ICQ #13394035
    ^     http://www.vogelsinger.at/



Re: Interesting incosistent query timing

From
Tom Lane
Date:
Ernest E Vogelsinger <ernest@vogelsinger.at> writes:
> I came to the same conclusion - I installed a cron script that performs a
> select against that index on a regular basis (3 minutes). After that even
> the most complex queries against this huge table go like whoosssshhh ;-)
> Would be interesting what one could do to _not_ have to take this basically
> clumsy approach...

Seems like your kernel is falling down on the job: if those files are
the most heavily used ones on the machine, it should be keeping them in
disk cache without such prompting.

If they are not all that heavily used, then you are basically slowing
everything else down in order to speed up these queries (because you're
starving everything else for disk cache).  Which may be a reasonable
tradeoff in your situation, but be aware of what you're doing.

The best compromise may be to buy more RAM ;-)

            regards, tom lane