Thread: Why is my table continuousely written?

Why is my table continuousely written?

From
Peter
Date:
Hello,

 this is postgres version 12.11_1 on Freebsd 13.1

I have a table "mess", filename "6309215", that behaves strange.
Data is currently only inserted/appended, there is no update and no
(successful) delete done, autovacuum is NOT enabled.

This is the definition:

CREATE TABLE IF NOT EXISTS dnstap.mess
(
    id bigint NOT NULL DEFAULT nextval('dnstap.mess_id_seq'::regclass),
    mess text COLLATE pg_catalog."default",
    hash text COLLATE pg_catalog."default",
    CONSTRAINT mess_pkey PRIMARY KEY (id),
    CONSTRAINT mess_hash_key UNIQUE (hash)
)
TABLESPACE l1only;


And this is the table:

-rw-------  1 770  770  1073741824 Aug  2 04:38 6309212
-rw-------  1 770  770  1073741824 Aug  2 04:38 6309212.1
-rw-------  1 770  770  1073741824 Aug  2 04:38 6309212.2
-rw-------  1 770  770  1073741824 Aug  2 04:39 6309212.3
-rw-------  1 770  770  1073741824 Aug  2 04:39 6309212.4
-rw-------  1 770  770   405872640 Aug  3 04:28 6309212.5
-rw-------  1 770  770     1441792 Aug  3 03:25 6309212_fsm
-rw-------  1 770  770      180224 Aug  2 05:13 6309212_vm
-rw-------  1 770  770           0 Aug  2 00:21 6309213
-rw-------  1 770  770        8192 Aug  2 04:40 6309214
-rw-------  1 770  770  1073741824 Aug  3 04:29 6309215
-rw-------  1 770  770  1073741824 Aug  3 04:30 6309215.1
-rw-------  1 770  770  1073741824 Aug  3 04:13 6309215.10
-rw-------  1 770  770   995811328 Aug  3 04:16 6309215.11
-rw-------  1 770  770  1073741824 Aug  3 04:32 6309215.2
-rw-------  1 770  770  1073741824 Aug  3 04:34 6309215.3
-rw-------  1 770  770  1073741824 Aug  3 04:35 6309215.4
-rw-------  1 770  770  1073741824 Aug  3 04:37 6309215.5
-rw-------  1 770  770  1073741824 Aug  3 04:39 6309215.6
-rw-------  1 770  770  1073741824 Aug  3 04:41 6309215.7
-rw-------  1 770  770  1073741824 Aug  3 04:42 6309215.8
-rw-------  1 770  770  1073741824 Aug  3 04:12 6309215.9
-rw-------  1 770  770     3170304 Aug  3 04:38 6309215_fsm
-rw-------  1 770  770      393216 Aug  3 04:38 6309215_vm
-rw-------  1 770  770  1073741824 Aug  3 03:17 6309216
-rw-------  1 770  770  1073741824 Aug  2 04:33 6309216.1
-rw-------  1 770  770   241360896 Aug  3 03:18 6309216.2
-rw-------  1 770  770      606208 Aug  3 03:18 6309216_fsm
-rw-------  1 770  770       73728 Aug  3 03:17 6309216_vm
-rw-------  1 770  770    43139072 Aug  3 03:18 6309217


There is another table, "6309212", that is also only appended data,
and that behaves as expected: only the final segment is written, the
other segments are a day old.
But in "6309215" all the segments were recently written. ("6309216" is
the TOAST for "6309215", and "6309217" seems to be an index on that.)


This is what happens:

2022-08-03 03:09:03 CEST
-- ------- Check for hash value clash with old entries
SELECT i1.id
  FROM dnstap.incoming AS i1,
       dnstap.mess AS mess
 WHERE mess.hash = md5(i1.mess)
   AND mess.mess <> i1.mess

I don't think a SELECT like this has reasons to write the
referenced table - but it does:

-rw-------  1 770  770  1073741824 Aug  3 03:09:25 2022 6309215
-rw-------  1 770  770  1073741824 Aug  3 03:09:25 2022 6309215.1
-rw-------  1 770  770  1073741824 Aug  3 03:09:47 2022 6309215.10
-rw-------  1 770  770   965394432 Aug  3 03:10:17 2022 6309215.11
-rw-------  1 770  770  1073741824 Aug  3 03:09:25 2022 6309215.2
-rw-------  1 770  770  1073741824 Aug  3 03:09:20 2022 6309215.3
-rw-------  1 770  770  1073741824 Aug  3 03:09:25 2022 6309215.4
-rw-------  1 770  770  1073741824 Aug  3 03:09:25 2022 6309215.5
-rw-------  1 770  770  1073741824 Aug  3 03:10:18 2022 6309215.6
-rw-------  1 770  770  1073741824 Aug  3 03:10:21 2022 6309215.7
-rw-------  1 770  770  1073741824 Aug  3 03:09:25 2022 6309215.8
-rw-------  1 770  770  1073741824 Aug  3 03:09:21 2022 6309215.9

And this goes on in that way with all the queries.
And when finally the vacuum analyze is run, it is the same, only
that one writes the segments in sequence (only inserts were appended
to the end, there are no updates or deletes):

2022-08-03 04:28:16 CEST
VACUUM (ANALYZE) dnstap.mess;

-rw-------  1 770  770  1073741824 Aug  3 04:29 6309215
-rw-------  1 770  770  1073741824 Aug  3 04:30 6309215.1
-rw-------  1 770  770  1073741824 Aug  3 04:13 6309215.10
-rw-------  1 770  770   995811328 Aug  3 04:16 6309215.11
-rw-------  1 770  770  1073741824 Aug  3 04:32 6309215.2
-rw-------  1 770  770  1073741824 Aug  3 04:33 6309215.3
-rw-------  1 770  770  1073741824 Aug  3 04:07 6309215.4
-rw-------  1 770  770  1073741824 Aug  3 04:08 6309215.5
-rw-------  1 770  770  1073741824 Aug  3 04:09 6309215.6
-rw-------  1 770  770  1073741824 Aug  3 04:10 6309215.7
-rw-------  1 770  770  1073741824 Aug  3 04:11 6309215.8
-rw-------  1 770  770  1073741824 Aug  3 04:12 6309215.9
-rw-------  1 770  770     3170304 Aug  3 03:18 6309215_fsm
-rw-------  1 770  770      393216 Aug  3 04:28 6309215_vm

What is worse is this:

-rw-------  1 770   770  16777216 Aug  3 04:28 000000010000007800000067.ok
-rw-------  1 770   770  16777216 Aug  3 04:28 000000010000007800000068.ok
-rw-------  1 770   770  16777216 Aug  3 04:29 000000010000007800000069.ok
-rw-------  1 770   770  16777216 Aug  3 04:29 00000001000000780000006A.ok
-rw-------  1 770   770  16777216 Aug  3 04:29 00000001000000780000006B.ok
-rw-------  1 770   770  16777216 Aug  3 04:30 00000001000000780000006C.ok
-rw-------  1 770   770  16777216 Aug  3 04:30 00000001000000780000006D.ok
-rw-------  1 770   770  16777216 Aug  3 04:30 00000001000000780000006E.ok
-rw-------  1 770   770  16777216 Aug  3 04:30 00000001000000780000006F.ok
-rw-------  1 770   770  16777216 Aug  3 04:31 000000010000007800000070.ok
-rw-------  1 770   770  16777216 Aug  3 04:31 000000010000007800000071.ok

The table is not only touched, it also creates lots of WAL - and these
here are redo-logs, i.e. I need storage for them. I am seriousely
drowning in the mass of continuousely created data - and the table is
now at 15% of the intended size, so this will apparently get more and
more and more!

History:

I created the table with an 'EXCLUDE =' constraint with hash index on
the "mess" field, as a kind of unique constraint. And with a timestamp
field to be occasionally updated - so at that time did updates happen.

The EXCLUDE constraint suffered from performance issues at 5 mio. rows,
so I had to add my own hash and check for collisions myself (the
"mess" field can be between 100 and maybe 100'000 characters). 

Then, already because of excessive write activity, I moved the
timestamp to a separate table. But that obviousely didn't help the
write activity.

I am running nightly vacuum, and did a vacuum FULL two days ago.
also I moved the table to another tablespace yesterday - it should
be cleanly rewritten. But the problem stays.

The only activity currently is a handful of inserts per day, of some
500'000 rows, run in a single transaction with repeatable read.


How can this be stopped and the table made to behave like the others?

-- PMc



Why is my table continuousely written? -> XID issue?

From
Peter
Date:
I had a look into what actually changed in the table. At the first
write that I grabbed, four rows in that segment had such a change:

117ee000  77 00 00 00 00 df b8 82  8e a4 00 00 64 00 a0 00  |w...........d...|
117ee000  77 00 00 00 f0 22 b4 f3  68 d3 00 00 64 00 a0 00  |w...."..h...d...|

117ee0a0  f8 5c 00 00 f8 5c 00 00  09 00 00 00 06 00 f7 8b  |.\...\..........|
117ee0b0  13 00 04 00 93 00 18 0d  d8 27 fe 01 00 00 00 00  |.........'......|
117ee0b0  13 00 04 00 93 01 18 0d  d8 27 fe 01 00 00 00 00  |.........'......|

1aad8000  77 00 00 00 30 fc 8c a5  80 8a 00 00 58 00 a8 00  |w...0.......X...|
1aad8000  77 00 00 00 e8 12 8a f3  6e 5b 00 00 58 00 a8 00  |w.......n[..X...|

1aad80a0                           f6 5d 00 00 00 00 00 00  |.........]......|
1aad80b0  09 00 00 00 06 00 6c d5  10 00 04 00 03 08 18 0d  |......l.........|
1aad80b0  09 00 00 00 06 00 6c d5  10 00 04 00 03 09 18 0d  |......l.........|

If I get that right, then it is the HEAP_XMIN_COMMITTED from
t_infomask that got set for the rows.

The next write is a bit different:

0000a000  76 00 00 00 60 9d 84 d0  23 4c 04 00 70 00 50 01  |v...`...#L..p.P.|
0000a000  78 00 00 00 80 81 e9 05  78 0d 04 00 70 00 50 01  |x.......x...p.P.|

0000a690                           cf 51 00 00 00 00 00 00  |504c54...Q......|
0000a690                           cf 51 00 00 13 5e 00 00  |504c54...Q...^..|
0000a6a0  00 00 00 00 06 00 05 00  12 00 04 00 03 2b 18 0d  |.............+..|
0000a6a0  00 00 00 00 06 00 05 00  12 00 04 00 93 23 18 0d  |.............#..|

003ec000  76 00 00 00 30 98 85 d0  59 e9 04 00 60 00 48 01  |v...0...Y...`.H.|
003ec000  78 00 00 00 08 33 1b 03  bd 97 04 00 60 00 48 01  |x....3......`.H.|

003eddf0  cf 51 00 00 00 00 00 00  00 00 00 00 06 00 f6 01  |.Q..............|
003eddf0  cf 51 00 00 13 5e 00 00  00 00 00 00 06 00 f6 01  |.Q...^..........|
003ede00  02 00 04 00 03 2b 18 0d  f3 56 2a 00 00 00 00 00  |.....+...V*.....|
003ede00  02 00 04 00 93 23 18 0d  f3 56 2a 00 00 00 00 00  |.....#...V*.....|

It seems here is the t_xmax for the row updated, and also flags in
t_infomask changed - and there are about 1000 rows (of some 3 mio.
that are contained in this segment) where this is done.

So if I get this right, it looks like some out-of-band housekeeping is
done here.
Now, while I would understand that this is done as occasion offers
when a record is read anyway, and I do not think that I grok the
XID housekeeping in fullness, I also do not think that I have any open
transactions that would persist a server restart. So I was under the
impression that a server restart plus a vacuum FULL, without any
further update/delete operations, should resolve all the needs for
such housekeeping for the time being, and quiesce the situation.

Apparently it does not. So, then, what does?