Re: Disk wait problem... may not be hardware... - Mailing list pgsql-general

From pf@pfortin.com
Subject Re: Disk wait problem... may not be hardware...
Date
Msg-id 20231027194609.1bf96324.pfortin@pfortin.com
In response to Re: Disk wait problem... may not be hardware...  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
List pgsql-general
Peter,

Thanks for your patience; I've been feeling pressure to get this
resolved, so I've been lax in providing info here.  Hope the following
helps...

On Fri, 27 Oct 2023 19:07:11 +0200 Peter J. Holzer wrote:

>On 2023-10-26 22:03:25 -0400, pf@pfortin.com wrote:
>> Are there any extra PG low level logs that can be turned on?

$ uname -a
Linux pf.pfortin.com 6.5.3-server-1.mga10 #1 SMP PREEMPT_DYNAMIC
   Sat Sep 16 00:04:28 UTC 2023 x86_64 GNU/Linux

PostgreSQL 15.4 on x86_64-mageia-linux-gnu, compiled by gcc (Mageia
12.3.0-3.mga9) 12.3.0, 64-bit

Operating System: Mageia 10 KDE Plasma
Version: 5.27.8 KDE Frameworks Version: 5.111.0
Qt Version: 5.15.7
Kernel Version: 6.5.3-server-1.mga10 (64-bit)
Graphics Platform: X11
Processors: 20 × 12th Gen Intel® Core™ i7-12700K
Memory: 125.5 GiB of RAM
Graphics Processor: AMD Radeon RX 6600 XT
Manufacturer: Dell Inc.
Product Name: XPS 8950

>Have you looked at the query plans as I recommended? (You might also
>want to enable track_io_timing to get extra information, but comparing
>just the query plans of fast and slow queries would be a first step)

I didn't see how that would help, since the queries are identical except
for the table name.  Curious: are you implying PG stores tables
differently?  The tables are probably more than 90-95% identical rows.
We're still investigating how best to store this type of data, which
changes a small number of rows each week...
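
For when I do try the track_io_timing suggestion, a minimal sketch
(per-session SET requires superuser in PG 15; otherwise it can go in
postgresql.conf):

SET track_io_timing = on;
-- BUFFERS output then gains "I/O Timings: ..." lines showing time spent
-- actually reading blocks, separating disk wait from CPU time:
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM ncvhis_2016_12_03;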

The query is the built-in row-count query in SQL-workbench/J, which is
visible in the attached files.

>explain (analyze, buffers) select count(*) from ncvhis_2016_12_03;
Finalize Aggregate  (cost=404669.65..404669.66 rows=1 width=8) (actual time=844.158..847.309 rows=1 loops=1)
  Buffers: shared hit=248 read=25022
  ->  Gather  (cost=404669.43..404669.65 rows=2 width=8) (actual time=844.133..847.301 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=248 read=25022
        ->  Partial Aggregate  (cost=403669.43..403669.45 rows=1 width=8) (actual time=838.772..838.772 rows=1 loops=3)
              Buffers: shared hit=248 read=25022
              ->  Parallel Index Only Scan using ncvhis_2016_12_03_voted_party_cd_idx on ncvhis_2016_12_03  (cost=0.44..372735.05 rows=12373755 width=0) (actual time=18.277..592.473 rows=9900389 loops=3)
                    Heap Fetches: 0
                    Buffers: shared hit=248 read=25022
Planning Time: 0.069 ms
JIT:
  Functions: 8
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.284 ms, Inlining 0.000 ms, Optimization 0.268 ms, Emission 3.590 ms, Total 4.143 ms
Execution Time: 847.498 ms

>explain (analyze, buffers) select count(*) from ncvhis_2020_08_01;
Finalize Aggregate  (cost=438377.94..438377.95 rows=1 width=8) (actual time=624.700..628.024 rows=1 loops=1)
  Buffers: shared hit=286 read=27084
  ->  Gather  (cost=438377.73..438377.94 rows=2 width=8) (actual time=624.669..628.016 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=286 read=27084
        ->  Partial Aggregate  (cost=437377.73..437377.74 rows=1 width=8) (actual time=619.297..619.297 rows=1 loops=3)
              Buffers: shared hit=286 read=27084
              ->  Parallel Index Only Scan using ncvhis_2020_08_01_voted_party_cd_idx on ncvhis_2020_08_01  (cost=0.44..403856.78 rows=13408379 width=0) (actual time=0.029..357.697 rows=10724282 loops=3)
                    Heap Fetches: 0
                    Buffers: shared hit=286 read=27084
Planning Time: 0.072 ms
JIT:
  Functions: 8
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.285 ms, Inlining 0.000 ms, Optimization 0.218 ms, Emission 3.933 ms, Total 4.435 ms
Execution Time: 628.216 ms

>explain (analyze, buffers) select count(*) from ncvhis_2020_10_31;
Finalize Aggregate  (cost=438179.73..438179.74 rows=1 width=8) (actual time=1090.209..1092.976 rows=1 loops=1)
  Buffers: shared hit=276 read=27095
  ->  Gather  (cost=438179.52..438179.73 rows=2 width=8) (actual time=1090.141..1092.967 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=276 read=27095
        ->  Partial Aggregate  (cost=437179.52..437179.53 rows=1 width=8) (actual time=1084.766..1084.767 rows=1 loops=3)
              Buffers: shared hit=276 read=27095
              ->  Parallel Index Only Scan using ncvhis_2020_10_31_voted_party_cd_idx on ncvhis_2020_10_31  (cost=0.44..403675.53 rows=13401595 width=0) (actual time=222.686..816.447 rows=10720916 loops=3)
                    Heap Fetches: 0
                    Buffers: shared hit=276 read=27095
Planning Time: 0.131 ms
JIT:
  Functions: 8
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 0.322 ms, Inlining 0.000 ms, Optimization 0.261 ms, Emission 3.648 ms, Total 4.231 ms
Execution Time: 1093.209 ms



Those responses are reasonably quick...  A suggestion from the Linux list
was to run fstrim on /mnt/work, the NVMe mount point for the DB:

$ fstrim -nv /mnt/work
/mnt/work: 0 B (dry run) trimmed

$ fstrim -v /mnt/work
/mnt/work: 2 TiB (2248993087488 bytes) trimmed

but that didn't change anything noticeable.
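
While hardware remains a suspect, the drive's own counters may be worth
a look too; a sketch, assuming nvme-cli and smartmontools are installed
and the device is /dev/nvme0 (both assumptions):

$ nvme smart-log /dev/nvme0     # media errors, temperature, wear
$ smartctl -a /dev/nvme0        # similar view via smartmontools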

Something I hadn't noticed before: SQL-workbench/J (build 129.6) displays
an execution timer at the bottom of the window.  I see all the queries
reporting 0-47ms, which is plenty fast.  It's the results that often take
a long time to display.
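
One way to separate server execution from client fetch/render time is to
repeat a query in psql with the output discarded; a sketch, using one of
the tables above and an arbitrary LIMIT:

\timing on
-- send rows to /dev/null so client rendering cost is excluded:
\o /dev/null
SELECT * FROM ncvhis_2016_12_03 LIMIT 100000;
\o
EXPLAIN (ANALYZE) SELECT * FROM ncvhis_2016_12_03 LIMIT 100000;

If the SELECT's \timing roughly matches the EXPLAIN (ANALYZE) execution
time, the time is going to the server; a much larger gap points at row
transfer or the client.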


>You haven't shown any postgresql logs or your settings, so it's hard to
>know what you have already turned on.

data/pg_hba.conf:
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust


data/postgresql.conf:
max_connections = 100
shared_buffers = 128MB
dynamic_shared_memory_type = posix
max_wal_size = 4GB
min_wal_size = 80MB
log_statement = 'all'
log_timezone = 'America/New_York'
datestyle = 'iso, mdy'
timezone = 'America/New_York'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'



>There are a ton of logging related parameters.

I'll turn some more on...  The only log I found with info I could use
was the journald journal -- see the attached files.  The queries were
automatically triggered when selecting the tables in SQL-workbench/J.
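
For the record, a starting set I'm considering; these are all standard
postgresql.conf parameters, and the values are only suggestions:

log_min_duration_statement = 250   # log statements slower than 250 ms
log_lock_waits = on                # log waits longer than deadlock_timeout
log_temp_files = 0                 # log every temp file, with its size
track_io_timing = on               # I/O timings in EXPLAIN (ANALYZE, BUFFERS)
logging_collector = on             # write to files under log_directory
log_directory = 'log'              # relative to the data directory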

Attached files:

Journal: contains several queries:
1. under 1 second
2. under 1 second
3. under 1 second; i/o response split across 2 iostat queries
4. under 1 second; i/o response split across 2 iostat queries
5. about 2 seconds; i/o response split across 2 iostat queries -- re-read?
6. under 1 second

journal2: contains a table where the i/o dribbled out slowly over ~97
seconds.
Which tables respond slowly is random, and the time they take to respond
is random too.
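
To check whether the slow tables are actually hitting disk rather than
shared buffers, the cumulative per-table counters may help; a sketch
against the standard PG 15 statistics views:

SELECT relname, heap_blks_read, heap_blks_hit,
       idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;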

In case I didn't share this: I get new tables on Sat. mornings.  The
entire DB is backed up by a cron job on Sun. morning: stop PG, "cp -auf"
to a platter drive, restart PG.  When I switched (via symlinks) from the
DB to the backup, the backup responds the same way, which is why I no
longer believe it's strictly hardware -- until it's found _and_ fixed,
I'll keep investigating.
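
Since both copies misbehave, one repeatable test is to force cold reads
and time the same query twice; a sketch, Linux-specific, run as root:

sync
echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries, inodes

Then run the query once cold and once warm; if only the cold run is
slow, the waits are plain uncached reads rather than random stalls.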

>> I've only found logging in the journal;
>
>By "the journal" you mean the one maintained by journald? (Did you
>mention what OS you are using? From several outputs I'm guessing it is
>Linux). On Linux systems postgresql is usually set up to log into files
>somewhere under /var/log/postgresql (see the log_directory entry in
>postgresql.conf) but your setup might be different, of course.

Answered above.

I've been responding to a number of suggestions, so I'm not sure I
mentioned this here: I found and installed a firmware update for the
NVMe SSD, but that didn't change anything.

>        hp
>

