Re: New criteria for autovacuum - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: New criteria for autovacuum
Date
Msg-id CAJ7c6TPZf5EfLFohUyZXqmqFKOLhMnamZL+CFZuuuGSrcNm=Hw@mail.gmail.com
Whole thread Raw
In response to New criteria for autovacuum  (Konstantin Knizhnik <knizhnik@garret.ru>)
Responses Re: New criteria for autovacuum
Re: New criteria for autovacuum
List pgsql-hackers
Hi Konstantin,

> But the problem can be quite easily reproduced. We can just populate table with some data with some other transaction
withassigned XID active.
 
> Then explicitly vacuum this tables or wait until autovacuum does it.
> At this moment table has no more dead or inserted tuples so autovacuum will not be called for it. But heap pages of
thistable are still not marked as all-visible.
 
> And will never be marked as all-visible unless table is updated or is explicitly vacuumed.

I decided to experiment with the scenario you are describing. For
those who likes to have exact steps to reproduce the issue, as I do,
here they are:

Session 1:

```
CREATE TABLE humidity(
  ts TIMESTAMP NOT NULL,
  city TEXT NOT NULL,
  humidity INT NOT NULL);

CREATE INDEX humidity_idx ON humidity (ts, city) INCLUDE (humidity);

INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2010-01-01' :: TIMESTAMP,
                     '2020-12-31', '1 hour') AS ts,
     unnest(array['Moscow', 'Berlin']) AS city;
```

Session 2:

```
BEGIN;

INSERT INTO humidity (ts, city, humidity)
SELECT ts + (INTERVAL '60 minutes' * random()), city, 100*random()
FROM generate_series('2022-01-01' :: TIMESTAMP,
                     '2025-12-31', '1 hour') AS ts,
     unnest(array['Moscow', 'Berlin']) AS city;

-- no COMMIT
```

Session 1:

```
VACUUM humidity;
```

Session 2:

```
COMMIT;
```

Session 1:

```
EXPLAIN (ANALYZE, BUFFERS ON) SELECT humidity FROM humidity WHERE ts
>= '2022-01-01' AND ts < '2022-05-02' AND city = 'Moscow';
```

The result is:

```
 Index Only Scan using humidity_idx on humidity  (cost=0.42..58.75
rows=67 width=4) (actual time=0.060..7.490 rows=2904.00 loops=1)
   Index Cond: ((ts >= '2022-01-01 00:00:00'::timestamp without time
zone) AND (ts < '2022-05-02 00:00:00'::timestamp without time zone)
AND (city = 'M
oscow'::text))
   Heap Fetches: 2904
   Index Searches: 1
   Buffers: shared hit=123
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.840 ms
 Execution Time: 7.964 ms
```

... and it is claimed that autovacuum will never be triggered in order
to set hint bits, assuming we never modify the table again.

I have mixed feelings about addressing this. Although this behavior is
somewhat counterintuitive, if the user has a read-only lookup table
he/she can always execute VACUUM manually. In order to relieve him of
this unbearable burden we are going to need to introduce some overhead
that will affect all the users (not to mention people maintaining the
code). This would be convenient for sure but I'm not entirely sure if
it's worth it.

Thoughts?

-- 
Best regards,
Aleksander Alekseev



pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Next
From: Andrey Borodin
Date:
Subject: Re: [PATCH] Add sortsupport for range types and btree_gist