Re: Many DataFileRead - IO waits - Mailing list pgsql-performance

From Ben Snaidero
Subject Re: Many DataFileRead - IO waits
Date
Msg-id CAEPE5bOtH=8dwFyu8MSOiksK56jAH7j_-i6c-_JatFx=6UofYg@mail.gmail.com
Whole thread Raw
In response to Re: Many DataFileRead - IO waits  (Michael Lewis <mlewis@entrata.com>)
Responses Re: Many DataFileRead - IO waits  (Michael Lewis <mlewis@entrata.com>)
List pgsql-performance
On Thu, Feb 27, 2020 at 11:54 AM Michael Lewis <mlewis@entrata.com> wrote:
How big is ix_tabledata_intid_timestampdate_intotherid3_intotherid2 on disk? If you create another index with same fields, how much space does it take? Real question- are you vacuuming aggressively enough for your workload? Your index name seems to indicate that intotherid3 would be the third key, and yet the planner chose not to scan that deep and instead filtered after it found the relevant tuples based on intid and timestampdate. That seems peculiar to me.

The documentation discourages multi-column indexes because they have limited application unless the same fields are always used. Personally, I don't love reviewing the stats of indexscans or how many tuples were fetched and having to guess how deeply the index was scanned for the various queries involved.

I'd wonder if an index on only intid_timestampdate would be both much smaller and also have a more right-leaning pattern of information being added and accessed in terms of keeping frequently needing blocks in shared buffers.

As a side note, that planning time seems high to me for such a simple query. Have you increased default_statistics_target significantly perhaps?

In this case the index is quite large ~400GB but as you can see from the explain plan it's doing a backward scan and not accessing that many buffers.  Other servers with this issue are much smaller.  We have autovacuum set to the default setting but this table does not get any deletes so I don't think that is the problem.  I think the reason it does not go deeper into the index keys is because it's just looking for the first occurence based on date (limit 1) not all of them although even if looking for all of them I think it would still scan in the same way since there would be other intotherid3 values between the ones in this search key

In regards to default_statistics_target I have not increased this value at all.

All this said regarding statistics and vacuum/bloat we restored a two day old copy of the database (on one of the servers experiencing the issue) and the issue was still present.  These systems are all on cloud infrastructure so I am leaning towards it being something hardware related (especially as it's only happening on a few servers) but our cloud provider says nothing has changed in this respect.

pgsql-performance by date:

Previous
From: Michael Lewis
Date:
Subject: Re: Many DataFileRead - IO waits
Next
From: Michael Lewis
Date:
Subject: Re: Many DataFileRead - IO waits