Re: [SQL] sql performance and cache - Mailing list pgsql-performance

From Richard Huxton
Subject Re: [SQL] sql performance and cache
Date
Msg-id 200310111255.08777.dev@archonet.com
Whole thread Raw
In response to Re: [SQL] sql performance and cache  ("Chris Faulkner" <chrisf@oramap.com>)
List pgsql-performance
On Saturday 11 October 2003 12:12, Chris Faulkner wrote:
> Hello
>
> Thanks for the reply.
>
> > The short answer is that PG doesn't cache query results. The only
> > way it could
> > do so safely is to lock all tables you access to make sure that no other
> > process changes them. That would effectively turn PG into a
> > single-user DB in
> > short notice.
>
> I am not sure I agree with you. I have done similar things with Oracle and
> found that the second query will execute much more quickly than the first.
> It could be made to work in at least two scenarios

I'm guessing because the underlying rows and perhaps the plan are cached,
rather than the results. If you cached the results of the first query you'd
only have the max length, not your other data anyway.

[snip]

> > I assume these two queries are linked? If you rely on the max size being
> > unchanged and have more than one process using the database, you
> > should make
> > sure you lock the rows in question.
>
> I can rely on the max size remaining the same. As I mentioned above, the
> tables are entirely read only. The data will not be updated or deleted by
> anyone - I don't need to worry about that. The data will be updated en
> masse once every 3 months.

Hmm - might be worth adding a column for your array length and pre-calculating
if your data is basically static.

> > There is a discussion of the postgresql.conf file and how to tune it at:
> >   http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
>
> Thanks for that.
>
> > Given the explain attached, 16 secs seems slow. Could you post an EXPLAIN
> > ANALYSE of either/both queries to the performance list. I'd drop
> > the sql list
> > when we're just talking about performance.
>
> To be honest, my main concern was about the cache. If the second one could
> use a cache amd execute in 2 seconds, that would be better that reducing
> the execution of each individual query by 30% or so.

I'm puzzled as to why they aren't both below 2 seconds to start with - you're
not dealing with that many rows.

> Thanks for the offer of help on this one. explain analyze gives me the same
> as the last message - did you want verbose ?

Nope, this is what I need. Verbose prints pages of stuff that only the
developers would be interested in. This one actually runs the query and gives
you a second set of figures showing times.

>  Nested Loop  (cost=0.00..147.11 rows=1 width=148) (actual
> time=84.00..12323.00 rows=67 loops=1)
>    Join Filter: ("inner"."GEOM_ID_OF_POINT" = "outer"."POINT_ID")
>    ->  Index Scan using gidx_oscar_point on oscar_point p
> (cost=0.00..61.34 rows=1 width=57) (actual time=0.00..9.00 rows=67 loops=1)
>          Index Cond: (wkb_geometry && 'SRID=-1;BOX3D(529540 179658.88
> 0,530540 1
> 80307.12 0)'::geometry)
>          Filter: ((("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" =
> 'TQ28SE'::bp
> char) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" =
> 'TQ38SW'::bpchar)) AND
>  ("FEAT_CODE" = 3500))

This next bit is the issue. It's joining on TILE_REF and then filtering by
your three static values. That's taking 67 * 150ms = 10.05secs

>    ->  Index Scan using idx_on_tile_ref on oscar_node n  (cost=0.00..85.74
> rows=2 width=91) (actual time=0.06..150.07 rows=4797 loops=67)
>          Index Cond: (n."TILE_REF" = "outer"."TILE_REF")
>          Filter: (("TILE_REF" = 'TQ27NE'::bpchar) OR ("TILE_REF" =
> 'TQ28SE'::bpchar) OR ("TILE_REF" = 'TQ37NW'::bpchar) OR ("TILE_REF" =
> 'TQ38SW'::bpchar))

Now if you look at the first set of figures, it's estimating 2 rows rather
than the 4797 you're actually getting. That's probably why it's chosen to
join then filter rather than the other way around.

I'd suggest the following:
1. VACUUM FULL on the table in question if you haven't done so since the last
update/reload. If you aren't doing this after every bulk upload, you probably
should be.
2. VACUUM ANALYSE/ANALYSE the table.
3. Check the tuning document I mentioned and make sure your settings are at
least reasonable. They don't have to be perfect - that last 10% takes
forever, but if they are badly wrong it can cripple you.
4. PG should now have up-to-date stats and a reasonable set of config
settings. If it's still getting its row estimates wrong, we'll have to look
at the statistics its got.

If we reach the statistics tinkering stage, it might be better to wait til
Monday if you can - more people on the list then.
--
  Richard Huxton
  Archonet Ltd

pgsql-performance by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Re: [SQL] sql performance and cache
Next
From: Nick Barr
Date:
Subject: Re: go for a script! / ex: PostgreSQL vs. MySQL