Re: Disk wait problem... - Mailing list pgsql-general
From: pf@pfortin.com
Subject: Re: Disk wait problem...
Msg-id: 20231026155016.6b739a7e.pfortin@pfortin.com
In response to: Re: Disk wait problem... ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses: Re: Disk wait problem... may not be hardware...
List: pgsql-general

Hi Peter,

All of the following is based on using SQL_workbench/J (WB) (Mageia Linux has not provided a viable pgAdmin4 package); WB is set up to autoload the table row count and 10 rows. I'm sticking to one set of files where the tables are all roughly 33.6M rows.

I've been doing a lot of digging and found more interesting info... At first, I thought the disk was in disk wait the entire time of the delay; this _appeared_ to be the case since the system journal logged everything about each query, with every query looking the same; no errors. I was going to provide some logs; but suffice it to say that the logs, combined with running iostat every second, have been enlightening...

Every time I switch from one table to another, the results display in:
* less than half a second
* one or more seconds
* up to 5+ minutes

So far, none has failed to eventually return; just l..o..n..g delays...

This morning, I've been using iostat:

$ while true; do echo -n "$(date +%H:%M:%S) "; iostat /dev/nvme1n1 | grep nvme1n1; sleep 1; done

When the response is fast, I see iostat report ~225000 bytes read. On the other hand, when the query takes a long time to respond, I see the read bytes spread over time... a little every second... So while htop appears to report a steady D (disk wait) status, I now suspect the individual transfers are short and fast, and most of the time htop simply catches the process waiting in the D state.

It definitely appears to be a problem with the 4TB NVMe SSD drive, which utilities say still has 95% of its life left [1]. The drive is reporting:

Error Information Log Entries:      1,177

which is growing, so I'll be digging into that. It also reports:

Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc    LBA  NSID    VS  Message
  0       1177     0  0xc002  0x4004      -      0     1     -  Invalid Field in Command
  1       1176     0  0xb001  0x4004      -      0     1     -  Invalid Field in Command
  2       1175     0  0x801a  0x4004      -      0     1     -  Invalid Field in Command
...

This list is not growing, so I'll look into it later rather than sooner.

A few moments ago, I decided to switch to the backup, which resides on a 17TB mechanical platter drive; that one is running great.

[1] SSD drives wear out with usage; think of it as a vehicle tire which slowly wears out.

Also found that there is a firmware update available from:
https://www.seagate.com/support/downloads/
No idea if this will solve the problem; but...

Thanks for your suggestions. Thought I saw there may be instrumentation within PG which I can access; suggestions welcome if so...

Cheers,
Pierre
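PS: on the in-PG instrumentation question, one low-effort check (a sketch only; it assumes the standard pg_stat_activity view and its wait_event columns, present in all supported releases) is to watch the slow backend's wait events from a second session while one of the long count(*) queries is in flight:

-- run from a second psql session while a slow query is running;
-- a backend genuinely stuck on storage typically shows
-- wait_event_type = 'IO' (e.g. wait_event = 'DataFileRead')
SELECT pid,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS elapsed,
       left(query, 60)     AS query
FROM   pg_stat_activity
WHERE  state = 'active'
  AND  pid <> pg_backend_pid();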
On Thu, 26 Oct 2023 11:50:00 +0200 Peter J. Holzer wrote:

>On 2023-10-23 16:31:30 -0700, Adrian Klaver wrote:
>> On 10/23/23 14:55, pf@pfortin.com wrote:
>> > Next, I made a query list for 30 tables, turned on timing and pasted the
>> > queries. Pasting results inline:
>> > ncsbe=# select count(*) from ncvhis_2012_12_06;
>> > select count(*) from ncvhis_2016_12_03;
>> > select count(*) from ncvhis_2020_08_01;
>> > select count(*) from ncvhis_2020_10_31;
>[...]
>> > Time: 72404.786 ms (01:12.405)
>> > Time: 301.246 ms
>> > Time: 409.974 ms
>> > Time: 363.836 ms
>> > Time: 351.167 ms
>> > Time: 348.378 ms
>> > Time: 348.712 ms
>> > Time: 351.901 ms
>> > Time: 487.837 ms
>> > Time: 40987.826 ms (00:40.988)
>> > Time: 76964.281 ms (01:16.964)
>> > Time: 483.329 ms
>> > Time: 18919.267 ms (00:18.919)
>> > Time: 50896.978 ms (00:50.897)
>> > Time: 25784.000 ms (00:25.784)
>> > Time: 45407.599 ms (00:45.408)
>> > Time: 75171.609 ms (01:15.172)
>> > Time: 88871.004 ms (01:28.871)
>> > Time: 128565.127 ms (02:08.565)
>> > Time: 210248.222 ms (03:30.248)
>> > Time: 246609.561 ms (04:06.610)  4 minutes! ;p
>> > Time: 166406.283 ms (02:46.406)
>> > Time: 291483.538 ms (04:51.484)  nearly 5 minutes ;p ;p
>> > Time: 2269.961 ms (00:02.270)
>> > Time: 413.219 ms
>> > Time: 433.817 ms
>> > Time: 395.844 ms
>> > Time: 7251.637 ms (00:07.252)
>> > Time: 384.328 ms
>> > Time: 384.887 ms
>> >
>> > I don't see a pattern in the above; matches the randomness I saw using
>> > WB...
>
>It would be interesting to see whether there's a difference in plan
>between the slow and the fast queries.
>
>If your experiment above is repeatable, you can just prefix every
>query with
>    explain (analyze, buffers)
>
>explain (analyze, buffers) select count(*) from ncvhis_2016_12_03;
>explain (analyze, buffers) select count(*) from ncvhis_2020_08_01;
>explain (analyze, buffers) select count(*) from ncvhis_2020_10_31;
>...
>
>If it is hard to reproduce (happens only sometimes), the best way to
>catch it is probably to enable auto_explain
><https://www.postgresql.org/docs/current/auto-explain.html>, possibly
>with the auto_explain.log_analyze option and check the logs after it
>happened. (auto_explain is generally a good idea, IMHO.)
>
>> > Before the above finished, I issued this command on another konsole...
>> >
>> > $ while true; do ls -l > /tmp/ll; date; done
>
>This is unlikely to generate noticeable disk waits. The current
>directory will be in the cache after the first ls and the writes happen
>asynchronously.
>
>        hp
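Re the auto_explain suggestion above: one way to try it without editing postgresql.conf is per session (a sketch only; LOAD needs superuser, and the 1s threshold is just a guess for these queries, not a recommendation):

-- load the module into this session only
LOAD 'auto_explain';
-- log the plan, timing and buffer stats for anything slower than 1 second
SET auto_explain.log_min_duration = '1s';
SET auto_explain.log_analyze = on;
SET auto_explain.log_buffers = on;
-- rerun the counts; slow ones leave their plans in the server log
SELECT count(*) FROM ncvhis_2016_12_03;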