Thread: Append only tables
Hi,

I have a few tables that are append only. There are only insert and select queries, never update or delete.

What I see is that every file is still being updated. The table is currently about 500 GB big, and each of those almost 500 files has been touched in the past 24 hours. I assume that the free space map is being used, and that it still finds places where it can insert a row in one of the files. (Auto)vacuum is not happening on the table.

Is there a way I can turn off this behaviour, so that it really only writes to the last few pages?

Kurt
On Mar 20, 2020, at 5:50 PM, Kurt Roeckx <kurt@roeckx.be> wrote:
> Hi,
>
> I have a few tables that are append only. There are only insert
> and select queries, never update or delete.
>
> What I see is that every file is still being updated. It's
> currently about 500 GB big, and each of those almost 500 files has
> been touched in the past 24 hours.
>
> (Auto)vacuum is not happening on the table.
It could be the result of records being frozen during select statements; a select can freeze tuples that meet the criteria for being frozen.

I would recommend vacuuming the table to get the existing records frozen; then I believe you should see fewer updates to all the base files.
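A one-off freeze pass along these lines would do it (a sketch; the table name comes from later in this thread, and VERBOSE is optional but shows how much work was done):

```sql
-- Freeze all existing tuples in one pass; VERBOSE reports pages and tuples processed.
VACUUM (FREEZE, VERBOSE) raw_certificates;

-- Afterwards, confirm that relfrozenxid has advanced:
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relname = 'raw_certificates';
```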
On Sat, Mar 21, 2020 at 01:44:47AM -0400, Rui DeSousa wrote:
> Vacuum is still required as the tuples need to be frozen. Have you vacuumed the table?

I did not manually vacuum them, nor did auto vacuum ever get triggered, at least not according to what is in pg_stat_user_tables.

> It could be the result of records being frozen during select statements; select can freeze tuples that meet the criteria for being frozen.

I guess it's doing that now. iotop shows:

  TID  PRIO  USER      DISK READ  DISK WRITE  SWAPIN     IO  COMMAND
10227  be/4  postgres    67.68 G     67.56 G  0.00 %  79.66 % postgres: 12/main: kurt certs [local] VACUUM

> I would recommend vacuuming the table to get the existing records frozen; then I believe you should see fewer updates to all the base files.

If this is needed, why doesn't autovacuum do this for me? Or do I need to modify some parameters so that autovacuum does get triggered?

I actually didn't run any select query on that table yet, it's still importing the data. Can an insert also trigger the freeze?

Anyway, the vacuum is running, I will let you know if it helps or not.

Kurt
Kurt Roeckx wrote on Fri 2020-03-20 at 22:50 [+0100]:
> Hi,
>
> I have a few tables that are append only. There are only insert
> and select queries, never update or delete.

Hello Kurt,

I would use cstore in your case (an FDW for more efficient storage; it is columnar storage, for append-only data): https://github.com/citusdata/cstore_fdw

> What I see is that every file is still being updated. It's
> currently about 500 GB big, and each of those almost 500 files has
> been touched in the past 24 hours.
>
> I assume that the free space map is being used, and that it
> still finds places where it can insert a row in one of the files.
>
> (Auto)vacuum is not happening on the table.
>
> Is there a way I can turn off this behaviour, so that it really
> only writes to the last few pages?
>
> Kurt

--
Kind regards,
Wim Bertels
Lector, UC Leuven-Limburg

> He that breaks a thing to find out what it is has left the path of wisdom.
>   -- J.R.R. Tolkien
On Sat, Mar 21, 2020 at 4:31 AM Kurt Roeckx <kurt@roeckx.be> wrote:
> On Sat, Mar 21, 2020 at 01:44:47AM -0400, Rui DeSousa wrote:
> > Vacuum is still required as the tuples need to be frozen. Have you vacuumed the table?
>
> I did not manually vacuum them, nor did auto vacuum ever get
> triggered, at least not according to what is in
> pg_stat_user_tables.
>
> > It could be the result of records being frozen during select statements; select can freeze tuples that meet the criteria for being frozen.
>
> I guess it's doing that now. iotop shows:
>
>   TID  PRIO  USER      DISK READ  DISK WRITE  SWAPIN     IO  COMMAND
> 10227  be/4  postgres    67.68 G     67.56 G  0.00 %  79.66 % postgres: 12/main: kurt certs [local] VACUUM
>
> > I would recommend vacuuming the table to get the existing records frozen; then I believe you should see fewer updates to all the base files.
>
> If this is needed, why doesn't autovacuum do this for me? Or do I
> need to modify some parameters so that autovacuum does get
> triggered?
>
> I actually didn't run any select query on that table yet, it's
> still importing the data. Can an insert also trigger the freeze?
>
> Anyway, the vacuum is running, I will let you know if it helps or
> not.
>
> Kurt
I wouldn't worry so much about the files themselves simply being touched. Postgres has to do things over time even with old pages to keep the visibility maps up to date. However, you can do things to keep what it does when it touches those files to a minimum.
If at any time this table was getting updates or deletes, then yes, new inserts may be using the free space map to fill in space that vacuum had previously marked as available. If you'd like to clean this "free" space up, you can run a VACUUM FULL on the table which will completely rewrite it and all its indexes, compacting them down into the smallest space possible (basically a defrag). Note this will lock the table for the duration, but since this is now an insert only table, you should rarely, if ever, have to do this again, so it would be worth the outage time right now.
Insert-only tables will not trigger autovacuum. Autovacuum is triggered only by counters that are incremented by updates and deletes (see autovacuum_analyze_scale_factor, autovacuum_analyze_threshold, autovacuum_vacuum_scale_factor, autovacuum_vacuum_threshold). However, at some point, the xid values on this table will become old enough that autovacuum will kick in to prevent xid exhaustion/wraparound (see autovacuum_freeze_max_age). If you're on at least version 9.6, vacuuming the table, especially a vacuum full, will mark all old pages as frozen, so autovacuum should be able to skip right over them. Version 12 improves upon this even more. So this emergency vacuum should have relatively little impact, depending only on how much new data there is.
I'd be careful with cstore at this time unless your data is easily recreatable. Unless something has changed, it provides zero crash safety and cannot be replicated because it's not part of the WAL stream.
Keith
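As an aside on the autovacuum triggers above: whether autovacuum has ever touched a table, and how the counters look, can be checked in pg_stat_user_tables, e.g. (a sketch; the table name is illustrative):

```sql
-- last_autovacuum stays NULL if autovacuum never ran on the table;
-- n_dead_tup is what the vacuum thresholds are compared against.
SELECT relname, last_vacuum, last_autovacuum, n_tup_ins, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'raw_certificates';
```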
On Mar 21, 2020, at 4:30 AM, Kurt Roeckx <kurt@roeckx.be> wrote:
> > I would recommend vacuuming the table to get the existing records frozen; then I believe you should see fewer updates to all the base files.
>
> If this is needed, why doesn't autovacuum do this for me? Or do I
> need to modify some parameters so that autovacuum does get
> triggered?
>
> I actually didn't run any select query on that table yet, it's
> still importing the data. Can an insert also trigger the freeze?
As Keith stated, due to the lack of updates and deletes, the counters are not necessarily going to trigger an autovacuum. Have a look at your autovacuum settings and freeze settings to determine the best approach.

You want to be more aggressive with those settings, or manually vacuum the table periodically. What you don't want is for an extremely large table that has never been vacuumed to be hit with a wraparound freeze vacuum. The select freezing the tuples is a good thing here, as it helps advance the xmin age. You really don't want wraparound vacuums to occur; that would indicate not enough vacuums are occurring for your current workload/volume.
Here’s a set of queries that I use to monitor transaction id age of the database and tables.
create or replace function util.numeric_format(_precision int, _scale int)
returns varchar(100)
as $$
declare
  _grpCnt int;
  _decCnt int;
begin
  _grpCnt := ((_precision - _scale) / 3);
  _decCnt := ((_precision - _scale) % 3);

  return trim(leading ',' from (repeat('9', _decCnt) || repeat('G999', _grpCnt) || 'D' || repeat('9', _scale)));
end;
$$ language plpgsql
immutable returns null on null input
;

comment on function util.numeric_format(int, int)
  is 'Returns a format string for a numeric value given the precision.'
;
with stat as (
select datname as database
, (datfrozenxid::text)::bigint as frozen_txid
, age(datfrozenxid) as txid_age
, (2^31 - 1000000) as max_txid
, (2^31 - 1000000) - age(datfrozenxid) as txid_free
, current_setting('autovacuum_freeze_max_age'::text)::double precision as freeze_age
, case pg_is_in_recovery() when true then 0 else txid_current() end & (~(-1::bigint << 31)) as current_txid
, case pg_is_in_recovery() when true then 0 else txid_current() end as epoch_txid
from pg_database
where datname = current_database()
)
select stat.database
, to_char(stat.frozen_txid, util.numeric_format(10,0)) as frozen_txid
, to_char(stat.current_txid, util.numeric_format(10,0)) as current_txid
, to_char(stat.epoch_txid, util.numeric_format(10,0)) as epoch_txid
, to_char(txid_age, util.numeric_format(10, 0)) as txid_age
, to_char(txid_free, util.numeric_format(10, 0)) as txid_free
, to_char(txid_age / max_txid * 100, '9G999D99%') as used_txid_pct
, to_char(txid_age / freeze_age * 100, '9G999D99%') as used_freeze_pct
, now() as asof
from stat
;
select n.nspname as schema
, c.relname as table_name
, c.relpages::bigint as pages
, pg_size_pretty(c.relpages::bigint * current_setting('block_size')::bigint) as size
, to_char((c.relfrozenxid::text)::bigint, util.numeric_format(15, 0)) as frozen_txid
, to_char(age(c.relfrozenxid), util.numeric_format(15, 0)) as txid_age
, to_char((2^31 - 1000000) - age(c.relfrozenxid), util.numeric_format(15, 0)) as txid_free
, to_char(age(c.relfrozenxid) / (2^31 - 1000000) * 100, '9G999D99%') as used_txid_pct
, to_char(age(c.relfrozenxid) / current_setting('autovacuum_freeze_max_age'::text)::double precision * 100, '9G999D99%') as used_freeze_pct
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind in ('r', 'm')
order by (c.relfrozenxid::text)::bigint
, c.relpages::bigint desc
limit 50
;
On Mar 21, 2020, at 1:15 PM, Rui DeSousa <rui@crazybean.net> wrote:
(2^31 - 1000000)
I can’t recall why I subtracted 1 million; I would change all the occurrences to just: (2^31)
On Fri, 2020-03-20 at 22:50 +0100, Kurt Roeckx wrote:
> I have a few tables that are append only. There are only insert
> and select queries, never update or delete.
>
> What I see is that every file is still being updated. It's
> currently about 500 GB big, and each of those almost 500 files has
> been touched in the past 24 hours.
>
> I assume that the free space map is being used, and that it
> still finds places where it can insert a row in one of the files.
>
> (Auto)vacuum is not happening on the table.

This is probably the first reader setting hint bits on the table rows.

To determine whether a row is visible or not, the first reader has to consult the commit log to see if the xmin and xmax system columns of the row belong to committed transactions or not.

To make life easier for future readers, it will then set special flags on the row that provide that information without the requirement to consult the commit log.

This modifies the row, even if the data don't change, and the row has to be written again.

> Is there a way I can turn off this behaviour, so that it really
> only writes to the last few pages?

You can explicitly read or vacuum the new rows; that will set the hint bits.

But, as has been explained, at some point the table will have to receive an anti-wraparound vacuum that will freeze old rows.

So the best you can do is to VACUUM (FREEZE) the table after you load data. Then the table should not be modified any more.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
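The load-then-freeze pattern Laurenz describes, as a sketch (the table is the one from this thread; the file path is a placeholder):

```sql
-- Bulk-load a batch, then immediately freeze it so that neither hint-bit
-- writes nor a later anti-wraparound vacuum have to rewrite these pages.
COPY raw_certificates (sha256_fingerprint, certificate, pre_certificate)
    FROM '/path/to/batch' WITH (FORMAT binary);

VACUUM (FREEZE) raw_certificates;
```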
On Mon, Mar 23, 2020 at 02:35:48PM +0100, Laurenz Albe wrote:
> You can explicitly read or vacuum the new rows, that will set the
> hint bits.
>
> But, as has been explained, at some point the table will have to receive
> an anti-wraparound vacuum that will freeze old rows.
>
> So the best you can do is to VACUUM (FREEZE) the table after you load
> data. Then the table should not be modified any more.

I did a normal vacuum, and it seems to be behaving better; it's not writing all over the old files anymore.

I think I'll set autovacuum_freeze_max_age a lot lower than the default 200 M. Note that this is not a static table, it will always be adding more rows.

The behaviour I'm now still seeing is that it's reading the table all over during import of new data. I assume that also caused the writes before. I would really like to avoid all the random reads, but I'm not sure it can use the data from the index to avoid needing to read the datafile itself.

My table looks like this:

                      Table "public.raw_certificates"
       Column       |  Type   | Collation | Nullable |                   Default
--------------------+---------+-----------+----------+---------------------------------------------
 certificate        | bytea   |           | not null |
 id                 | bigint  |           | not null | nextval('raw_certificate_id_seq'::regclass)
 sha256_fingerprint | bytea   |           | not null |
 pre_certificate    | boolean |           | not null |
Indexes:
    "raw_certificates_pkey" PRIMARY KEY, btree (id)
    "raw_certificates_sha256_fingerprint_key" UNIQUE CONSTRAINT, btree (sha256_fingerprint)
Referenced by:
    TABLE "certificates" CONSTRAINT "certificates_raw_certificate_id_fkey" FOREIGN KEY (raw_certificate_id) REFERENCES raw_certificates(id)
    TABLE "ct_entry" CONSTRAINT "ct_entry_raw_certificate_id_fkey" FOREIGN KEY (raw_certificate_id) REFERENCES raw_certificates(id)

To import data into it, I currently do:

create temp table import_certs (certificate bytea not null, sha256_fingerprint bytea);
copy import_certs (certificate) from stdin;
update import_certs set sha256_fingerprint = digest(certificate, 'sha256');
insert into raw_certificates (sha256_fingerprint, certificate, pre_certificate)
    select i.sha256_fingerprint, i.certificate, false
    from import_certs as i
on conflict do nothing;

The behaviour I currently see is:
- It's doing a read from a raw_certificates_sha256_fingerprint_key file, then from a raw_certificates file, then again from raw_certificates_sha256_fingerprint_key, repeating this about 5 times.
- Then it does a write and a read to the import_certs table.

I guess that after reading from the index, it needs to check the table that it's still visible or something. Isn't there a way to avoid this? The write to the import_certs is very confusing to me.

Anyway, the major reason for the sha256_fingerprint field is just to remove the duplicates. I would only like to have 1 copy of each certificate in that table.

Does anybody have a suggestion on how to improve the performance? Once I catch up with all the old data again, I expect this table alone to be in the order of 10 TB, and grow at around 250 GB / month. And I think I need to start to consider moving it to SSDs to keep up.

Kurt
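One possible tweak to the import above (a sketch, not tested against this schema; it assumes the same pgcrypto digest() function already in use): compute the fingerprint directly in the INSERT ... SELECT, which avoids rewriting every row of the temp table with the UPDATE:

```sql
create temp table import_certs (certificate bytea not null);
copy import_certs (certificate) from stdin;

-- digest() is evaluated per row during the insert, so the temp table
-- is written once (by COPY) instead of twice (COPY plus UPDATE).
insert into raw_certificates (sha256_fingerprint, certificate, pre_certificate)
    select digest(i.certificate, 'sha256'), i.certificate, false
    from import_certs as i
on conflict do nothing;
```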
On Mar 24, 2020, at 2:41 PM, Kurt Roeckx <kurt@roeckx.be> wrote:
> I think I'll set autovacuum_freeze_max_age a lot lower than the default
> 200 M.
Hmm… not sure what you really want to accomplish?
Administering a high volume database, I went the other way, with it set to 800M; however, autovacuum was set aggressively. I don't want to see wraparound/freeze vacuums occurring, as they are more aggressive than normal autovacuum, are a good indicator that vacuuming is not occurring enough, and will block where normal autovacuums will yield. You really shouldn't run into blocking issues unless issuing DDL statements, which the application shouldn't do anyway. It does happen, though: despite DDLs not being allowed, an application developer thought it would be wise to embed statistics updating on a table. The statement would be denied, but it still required a lock on the table to attempt it; the result was that the freeze operation blocked the application.
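For reference, "set aggressively" here usually means postgresql.conf settings along these lines (values are illustrative, not the actual configuration described above):

```
# Let autovacuum do more work, more often (illustrative values)
autovacuum_naptime = 15s
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 2000
autovacuum_max_workers = 6
autovacuum_freeze_max_age = 800000000
```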
On Tue, Mar 24, 2020 at 04:59:39PM -0400, Rui DeSousa wrote:
> > On Mar 24, 2020, at 2:41 PM, Kurt Roeckx <kurt@roeckx.be> wrote:
> >
> > I think I'll set autovacuum_freeze_max_age a lot lower than the default
> > 200 M.
>
> Hmm… not sure what you really want to accomplish?
>
> Administering a high volume database, I went the other way, with it set to 800M; however, autovacuum was set aggressively.

The only other setting I see is autovacuum_vacuum_threshold, which only looks at updates and deletes.

Kurt
On Mar 24, 2020, at 6:02 PM, Kurt Roeckx <kurt@roeckx.be> wrote:
> The only other setting I see is autovacuum_vacuum_threshold, which
> only looks at updates and deletes.

Is it just the one table? Might want to set autovacuum_freeze_max_age on the table itself.
On Tue, Mar 24, 2020 at 07:01:08PM -0400, Rui DeSousa wrote:
> Is it just the one table? Might want to set autovacuum_freeze_max_age on the table itself.

That is at least what I meant. And it would be for at least 2 tables.

Kurt
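Setting it per table, as a sketch (the table names are the two from this thread; the threshold is illustrative):

```sql
-- Make the anti-wraparound vacuum trigger earlier on just these tables,
-- leaving the cluster-wide autovacuum_freeze_max_age alone.
alter table raw_certificates set (autovacuum_freeze_max_age = 50000000);
alter table ct_entry set (autovacuum_freeze_max_age = 50000000);
```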