Re: Disk wait problem... - Mailing list pgsql-general

From: Adrian Klaver
Subject: Re: Disk wait problem...
Msg-id: 9938ff1a-8837-454b-a50c-a748e84c172a@aklaver.com
In response to: Disk wait problem...  (pf@pfortin.com)
List: pgsql-general
On 10/23/23 14:55, pf@pfortin.com wrote:
Please reply to the list also.
CCing the list for this post.

> On Mon, 23 Oct 2023 12:44:56 -0700 Adrian Klaver wrote:
> 
>> On 10/23/23 11:54, pf@pfortin.com wrote:
>>> Hi,
>>>
>>> I have a 1.6TB database with over 330 tables on a 4TB NVMe SSD. All
>>> tables are static (no updates); most in 8M and 33M row sizes. Queries have
>>> been great, until recently.
>>
>>> Also attached is the relevant system journal entries for one query that
>>> took 2 seconds and two more that took 62 and 52 seconds... Unlike the
>>> above, these had 3 processes in disk-wait -- see attached screenshot...
>>
>> What happens if you select the tables in the command line client psql?
> 
> Hi Adrian,
> 
> Sorry about everything inline; I think it saves time vs dealing with
> attachments...
> 
> Summary:
> * single large partition dedicated to PG
> * only PG seems affected
> * buffered reads are 6x slower when PG queries are in progress
> * psql queries of 30 tables (~33M rows each) show random timings up to 4m51s
> * successful psql queries below 500ms
> * ls -l loop on another directory had no delays.
> 
> HTH...
> 
> The WB query was in disk-wait, so psql gave:
> ncsbe=# select count(*) from ncvhis_2022_10_29;
> FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
>          This probably means the server terminated abnormally
>          before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
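
That FATAL means the backend was told to exit (pg_terminate_backend(), a fast
shutdown or the like), so the server log for that moment is worth a look. A
rough sketch, assuming the server logs to the systemd journal under a unit
named postgresql (the unit name varies by distro/packaging):

  # show today's server log entries for the PostgreSQL unit
  journalctl -u postgresql --since today | less
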
> 
> The above query finished a second or two before the WB results
> appeared; then this next query succeeded right after:
> ncsbe=# select count(*) from
> ncvhis_2022_10_29; count
> ----------
>   33861176
> (1 row)
> 
> Another table stayed in disk-wait long enough that I was able to access
> the tmp directory on the same disk just fine.
> 
> Querying the same table; no error, canceled it:
> ncsbe=# select count(*) from ncvhis_2022_07_23;
> ^CCancel request sent
> ERROR:  canceling statement due to user request
> 
> While this table was still in disk-wait, tried another table which
> returned quickly:
> ncsbe=# select count(*) from ncvhis_2023_10_21;
>    count
> ----------
>   32983343
> (1 row)
> 
> Eventually, the WB query did complete...
> 
> Next, I made a query list for 30 tables, turned on timing and pasted the
> queries.  Pasting results inline:
> ncsbe=# select count(*) from ncvhis_2012_12_06;
> select count(*) from ncvhis_2016_12_03;
> select count(*) from ncvhis_2020_08_01;
> select count(*) from ncvhis_2020_10_31;
> select count(*) from ncvhis_2020_12_05;
> select count(*) from ncvhis_2020_12_26;
> select count(*) from ncvhis_2021_03_06;
> select count(*) from ncvhis_2021_06_12;
> select count(*) from ncvhis_2022_07_23;
> select count(*) from ncvhis_2022_10_15;
> select count(*) from ncvhis_2022_10_22;
> select count(*) from ncvhis_2022_10_29;
> select count(*) from ncvhis_2022_11_05;
> select count(*) from ncvhis_2022_11_12;
> select count(*) from ncvhis_2022_11_19;
> select count(*) from ncvhis_2022_11_26;
> select count(*) from ncvhis_2022_12_03;
> select count(*) from ncvhis_2022_12_10;
> select count(*) from ncvhis_2022_12_17;
> select count(*) from ncvhis_2022_12_24;
> select count(*) from ncvhis_2022_12_31;
> select count(*) from ncvhis_2023_01_08;
> select count(*) from ncvhis_2023_01_14;
> select count(*) from ncvhis_2023_01_21;
> select count(*) from ncvhis_2023_01_28;
> select count(*) from ncvhis_2023_02_04;
> select count(*) from ncvhis_2023_02_11;
> select count(*) from ncvhis_2023_02_18;
> select count(*) from ncvhis_2023_02_27;
> select count(*) from ncvhis_2023_03_04;
>
>  table                  count     time
>  ncvhis_2012_12_06   31923950    72404.786 ms (01:12.405)
>  ncvhis_2016_12_03   29701168      301.246 ms
>  ncvhis_2020_08_01   32172845      409.974 ms
>  ncvhis_2020_10_31   32162748      363.836 ms
>  ncvhis_2020_12_05   34679651      351.167 ms
>  ncvhis_2020_12_26   35006711      348.378 ms
>  ncvhis_2021_03_06   35003995      348.712 ms
>  ncvhis_2021_06_12   34994502      351.901 ms
>  ncvhis_2022_07_23   33686292      487.837 ms
>  ncvhis_2022_10_15   33861658    40987.826 ms (00:40.988)
>  ncvhis_2022_10_22   33861381    76964.281 ms (01:16.964)
>  ncvhis_2022_10_29   33861176      483.329 ms
>  ncvhis_2022_11_05   33861071    18919.267 ms (00:18.919)
>  ncvhis_2022_11_12   29344354    50896.978 ms (00:50.897)
>  ncvhis_2022_11_19   30709227    25784.000 ms (00:25.784)
>  ncvhis_2022_11_26   32368001    45407.599 ms (00:45.408)
>  ncvhis_2022_12_03   33126826    75171.609 ms (01:15.172)
>  ncvhis_2022_12_10   33126808    88871.004 ms (01:28.871)
>  ncvhis_2022_12_17   33127317   128565.127 ms (02:08.565)
>  ncvhis_2022_12_24   33127192   210248.222 ms (03:30.248)
>  ncvhis_2022_12_31   33127117   246609.561 ms (04:06.610)  4 minutes!  ;p
>  ncvhis_2023_01_08   33124457   166406.283 ms (02:46.406)
>  ncvhis_2023_01_14   33123714   291483.538 ms (04:51.484)  nearly 5 minutes ;p ;p
>  ncvhis_2023_01_21   33123313     2269.961 ms (00:02.270)
>  ncvhis_2023_01_28   33123197      413.219 ms
>  ncvhis_2023_02_04   33123091      433.817 ms
>  ncvhis_2023_02_11   33122953      395.844 ms
>  ncvhis_2023_02_18   33122752     7251.637 ms (00:07.252)
>  ncvhis_2023_02_27   33122665      384.328 ms
>  ncvhis_2023_03_04   33122573      384.887 ms
> ncsbe=#
> 
> I don't see a pattern in the above; matches the randomness I saw using
> WB...
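
Not a fix, but for future sweeps the query list doesn't have to be built by
hand: psql can generate and run the counts itself. A rough sketch, assuming
the tables all share the ncvhis prefix and live in the public schema (with
\timing on, each generated statement is timed separately):

  -- generate one count(*) per matching table, then run each with \gexec
  SELECT format('select count(*) from %I', tablename)
    FROM pg_tables
   WHERE schemaname = 'public'          -- assumption: default schema
     AND tablename LIKE 'ncvhis%'
   ORDER BY tablename \gexec
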
> 
> Before the above finished, I issued this command on another konsole...
> 
> $ while true; do ls -l > /tmp/ll; date; done
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> Mon Oct 23 05:12:54 PM EDT 2023
> ...
> Lots of high-speed output with no pauses, at 64 loops per second.
> 
> So that means only PG is getting disk waits.
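
It can help to watch the device itself while one of the slow counts runs, to
see whether the waits show up as device latency and queue depth or only inside
PG. A sketch, assuming the sysstat package is installed (nvme1n1 taken from
your fdisk output below):

  # extended per-device stats (utilization, queue size, await), once a second
  iostat -x /dev/nvme1n1 1

  # per-process disk I/O, to confirm the waiting readers are postgres backends
  pidstat -d 1
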
> 
> This disk is dedicated to PG plus a rarely used, semi-permanent tmp
> directory, all on a single partition.
> 
> This is from fdisk -l:
> Disk /dev/nvme1n1: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
> Disk model: Seagate FireCuda 530 ZP4000GM30023
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disklabel type: gpt
> Disk identifier: 9B3C5D73-6FFD-475D-9E1E-26F43B41C690
> 
> Device         Start        End    Sectors  Size Type
> /dev/nvme1n1p1  2048 7814035455 7814033408  3.6T Linux filesystem
> 
> $ hdparm -Tt /dev/nvme1n1
> /dev/nvme1n1:
>   Timing cached reads:   36820 MB in  2.00 seconds = 18434.72 MB/sec
>   Timing buffered disk reads: 774 MB in  3.00 seconds = 257.74 MB/sec
> 
> The 30 queries still hadn't finished at this point...
> 
> Interesting: the hdparm buffered read is very slow; once the queries
> finished, disk performance came back:
> $ hdparm -Tt /dev/nvme1n1
> 
> /dev/nvme1n1:
>   Timing cached reads:   37382 MB in  2.00 seconds = 18715.85 MB/sec
>   Timing buffered disk reads: 5002 MB in  3.17 seconds = 1575.89 MB/sec
> 
> Over 6x faster...
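
Since raw buffered reads from the device also collapse while the queries run,
the drive itself may be worth a look. If smartmontools is installed, something
like this shows the NVMe temperature and error counters (sustained heavy reads
can trigger thermal throttling on some drives):

  # full SMART/health report for the NVMe device
  smartctl -a /dev/nvme1n1
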
> 
> So far, the only disk waits I see are PG-related.  Do you have any opinion
> on Ken's comment about parallel queries?
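
On the parallel-query angle, it is easy to see from another psql session what
the backends are actually waiting on while a slow count runs, and to take
parallelism out of the picture for one session. A sketch (pg_stat_activity has
had the wait_event columns since 9.6):

  -- what each active backend is waiting on right now
  SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
    FROM pg_stat_activity
   WHERE state <> 'idle';

  -- current parallel setting, and how to turn it off for this session only
  SHOW max_parallel_workers_per_gather;
  SET max_parallel_workers_per_gather = 0;
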
> 
> Thanks,
> Pierre
> 
> 
>>>
>>> I just stopped PG, updated the full backup which was done overnight (so
>>> the update took about a second), restarted PG and the problem persists...
>>>
>>> Any suggestions where to look next?
>>>
>>> Thanks,
>>> Pierre
>>

-- 
Adrian Klaver
adrian.klaver@aklaver.com



