Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours - Mailing list pgsql-performance

From Andres Freund
Subject Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Date
Msg-id 20180913194347.sjnui3hajk7amcx6@alap3.anarazel.de
Whole thread Raw
In response to Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Select count(*) on a 2B Rows Tables Takes ~20 Hours  (Fred Habash <fmhabash@gmail.com>)
List pgsql-performance
Hi,

On 2018-09-13 14:12:02 -0400, Tom Lane wrote:
> > This is an Aurora cluster running on r4.2xlarge (8 vCPU, 61g).
> 
> Don't know much about Aurora, but I wonder whether you paid for
> guaranteed (provisioned) IOPS, and if so what service level.

Given that aurora uses direct-io and has the storage layer largely
completely replaced, I'm not sure how much we can help here.  My
understanding is that access to blocks can require page-level "log
reconciliation", which can cause adverse IO patterns.  The direct-IO
means that cache configuration / prefetching is much more crucial.  If a
lot of those tuples aren't frozen (don't quite know how that works
there), the clog accesses will also kill you if the table was filled
over many transactions, since clog's access characteristics to a lot of
xids is pretty bad with DIO.

Greetings,

Andres Freund


pgsql-performance by date:

Previous
From: Fd Habash
Date:
Subject: RE: Select count(*) on a 2B Rows Tables Takes ~20 Hours
Next
From: Fd Habash
Date:
Subject: How Do You Associate a Query With its Invoking Procedure?