Why is this query touching 4gb of buffers? - Mailing list pgsql-general

From hubert depesz lubaczewski
Subject Why is this query touching 4gb of buffers?
Date
Msg-id aPtcnI2BZkeHjRvF@depesz.com
Whole thread Raw
Responses Re: Why is this query touching 4gb of buffers?
List pgsql-general
Hi,
I have weird-ish case, that I can't grok, or at least explain in
hand-wavy way.

Very simple query:

SELECT
    some_table.communication_channel_id,
    some_table.root_account_id
FROM
    some_schema.some_table
WHERE
    workflow_state = 'pending' AND
    send_at <= '2025-10-23 12:01:13';

On complex-ish table:

                                                      Table "some_schema.some_table"
             Column              │            Type             │ Collation │ Nullable │
Default

═════════════════════════════════╪═════════════════════════════╪═══════════╪══════════╪═══════════════════════════════════════════════════════════════════
 id                              │ bigint                      │           │ not null │
nextval('some_schema.some_table_id_seq'::regclass)
 notification_id                 │ bigint                      │           │          │
 notification_policy_id          │ bigint                      │           │          │
 context_id                      │ bigint                      │           │          │
 context_type                    │ character varying(255)      │           │          │
 communication_channel_id        │ bigint                      │           │          │
 frequency                       │ character varying(255)      │           │          │
 workflow_state                  │ character varying(255)      │           │          │
 batched_at                      │ timestamp without time zone │           │          │
 created_at                      │ timestamp without time zone │           │          │
 updated_at                      │ timestamp without time zone │           │          │
 send_at                         │ timestamp without time zone │           │          │
 link                            │ text                        │           │          │
 name_of_topic                   │ text                        │           │          │
 summary                         │ text                        │           │          │
 root_account_id                 │ bigint                      │           │          │
 notification_policy_override_id │ bigint                      │           │          │
Indexes:
    "some_table_pkey" PRIMARY KEY, btree (id), tablespace "data1"
…
    "index_some_table_pending" btree (send_at) WHERE workflow_state::text = 'pending'::text, tablespace "data1"
…

Running this via explain shows:

 Index Scan using index_some_table_pending on some_table  (cost=0.43..399992.44 rows=2215063 width=16) (actual
time=382.466..382.466rows=0 loops=1)
 
   Index Cond: (send_at <= '2025-10-23 12:01:13'::timestamp without time zone)
   Buffers: shared hit=554347
 Planning Time: 0.082 ms
 Execution Time: 382.482 ms
(5 rows)

Specifically, I see that it returned 0 rows, and to do it, it had to "touch" 554k buffers - ~ 4GB of data.

Sizes of the relavant things:
                 oid                  │ relkind │ pg_relation_size │ pg_table_size │ pg_total_relation_size
══════════════════════════════════════╪═════════╪══════════════════╪═══════════════╪════════════════════════
 some_schema.some_table               │ r       │      15339020288 │   15376539648 │            22347988992
 some_schema.index_some_table_pending │ i       │         77127680 │      77168640 │               77168640
(2 rows)


I didn't check analyze time, so can't comment on this, but I did reindex of
this index, which shrunk it to 16973824 bytes, and now the query runs, as expected, in < 1ms:

 Index Scan using index_some_table_pending on some_table  (cost=0.43..392423.37 rows=2215272 width=16) (actual
time=0.005..0.005rows=0 loops=1)
 
   Index Cond: (send_at <= '2025-10-23 12:01:13'::timestamp without time zone)
   Buffers: shared hit=3
 Planning:
   Buffers: shared hit=3
 Planning Time: 0.787 ms
 Execution Time: 0.016 ms
(7 rows)

While talking with others, there have been used certain terms, like
"unbounded range", which I understand, but I still don't see why, with
the same stats, and NO rows returned by index - before index, pg has to
do stuff to 4GB of data?! Anyone could try to explain?

If it matters, this is rather old Pg: PostgreSQL 14.7, and the query was
running on streaming replica, but results on primary were basically the
same.

Best regards,

depesz




pgsql-general by date:

Previous
From: Gregory Smith
Date:
Subject: Re: Index corruption issue after migration from RHEL 7 to RHEL 9 (PostgreSQL 11 streaming replication)
Next
From: hubert depesz lubaczewski
Date:
Subject: Re: Why is this query touching 4gb of buffers?