Re: Sanity checking big select performance - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Sanity checking big select performance
Date
Msg-id 54500DD7.9040900@fuzzy.cz
Whole thread Raw
In response to Sanity checking big select performance  (Jeff Chen <jeff@pixleeteam.com>)
List pgsql-performance
On 28.10.2014 22:15, Jeff Chen wrote:
> Hi friends!
>
> I'd love to get a sanity check on whether a fat select query I'm doing
> makes sense given the infrastructure that we have.
>
> We have 3 big tables that we typically join together for certain
> queries: a ~40 million row photos table, a ~20 million row users table,
> and a ~50 million row photo_to_album table that maps photos to albums.

So how much data is it? Does it fit within RAM (after loading into DB,
with all the indexes)?
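
For example, something like this would show the on-disk footprint (a sketch; the table names are taken from your description, and pg_total_relation_size includes indexes and TOAST):

```sql
-- approximate on-disk size of each table, including indexes and TOAST
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
 WHERE relname IN ('photos', 'users', 'photo_to_album');
```

If the combined sizes comfortably exceed the 30 GB of RAM, cold-cache queries will inevitably hit disk.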

> We like to display real time analytics, which often results in a query like:
>
> select (random aggregations )
> from
> photo_to_album join photos on photos.id = photo_to_album.photo_id
> join users on users.id = photos.user_id
> where
> photo_to_album.album_id = <something>
> and
> photos.created_at between <some dates>
> and <other junk>
>
> We have indexes on all of the joins, and the where clauses.

Can we get EXPLAIN (and ideally EXPLAIN ANALYZE) for such queries?
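
Something along these lines, run on your actual query (BUFFERS shows how much data came from cache vs. disk; the placeholders are yours):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT (random aggregations)
  FROM photo_to_album
  JOIN photos ON photos.id = photo_to_album.photo_id
  JOIN users  ON users.id = photos.user_id
 WHERE photo_to_album.album_id = <something>
   AND photos.created_at BETWEEN <some dates>;
```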

> One of these queries that should be targeting something like 300K
> photos takes 38 seconds to run (with an aggregate/nested loop taking
> effectively all of that time), and then upon second execution with a
> warm cache, 4 seconds.

Well, if you're hitting disk, it's going to be slow. As you observed,
after loading it into page cache, it's much faster.

> Also worryingly, it spikes read IOPS to almost 1500/sec during the time
> and write IOPS 200/sec.  When not running the query, steady level read
> iops basically nil, write hovers around 50-100.
>
> This also increases the queue depth from basically 0 up to 6.  Keeping
> the queue depth high seems to cause timeouts in other queries.  The CPU
> is barely if at all affected, hovering around 20%.  Memory also barely
> affected.

20% is ~2 CPU cores (as you have 8 of them).

> We have a RDS Postgres database, m3.2xlarge with 2000 Provisioned IOPS
> and 400GB storage.  This translates to 8 virtual CPUs, 30GiB memory, and
> all SSD drives.

AFAIK there are two PostgreSQL major versions supported on RDS - 9.1 and
9.3. Which one are you using?

Also, can you list the values of some basic parameters (shared_buffers,
work_mem)? We don't know what the default values on RDS are, nor whether
you have modified them.
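
For instance (a sketch; the pg_settings query lists everything that differs from the built-in defaults, which is handy when you don't know what RDS changed):

```sql
SHOW shared_buffers;
SHOW work_mem;
SHOW effective_cache_size;

-- or list every setting that differs from the built-in default:
SELECT name, setting, unit, source
  FROM pg_settings
 WHERE source <> 'default';
```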

> Several questions here:
>
> 1) Is that level of IOPS normal?

Ummmmm, why wouldn't it be? Each IO request works with 16 KB (on EBS),
and you're reading/writing a certain amount of data.

> 2) Is it bad that the level of iops can queue requests that screw up the
> whole database even if it's just select queries?  Especially when the
> CPU and Memory are still plentiful?

You're saturating a particular resource. If you hit the I/O wall, you
can't make use of the spare CPU / memory. The fact that it slows down
your other queries is somewhat expected.

Is it bad? Well, if you need to minimize impact on other queries, then
probably yes.

> 3) What is up with the huge difference between cold and warm cache?

I don't understand why you're surprised by this. With EBS-Optimized
networking on m3.2xlarge (i.e. 1 Gbit/s dedicated to EBS) you get up to
~120 MB/s, except that you provisioned 2000 IOPS, which caps you at ~32
MB/s. Memory is orders of magnitude faster, hence the difference.
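
The arithmetic behind that, assuming the 16 kB per I/O request mentioned above:

```sql
-- 2000 provisioned IOPS * 16 kB per request, in MB/s
SELECT 2000 * 16 / 1024.0 AS max_mb_per_sec;   -- 31.25, i.e. the ~32 MB/s
```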

regards
Tomas

