Home > mailing lists

Entire index scanned, but only when in SQL function? - Mailing list pgsql-performance

From	Philip Semanchuk
Subject	Entire index scanned, but only when in SQL function?
Date	July 11, 2023 16:07:26
Msg-id	9D9DBF30-B6EF-49DB-864D-610849C37D41@americanefficient.com Whole thread Raw
List	pgsql-performance

Tree view

Hi there,
I’m on Postgres 13.11 and I'm seeing a situation where an INSERT...SELECT statement seq scans an index, but only when
wrappedin a SQL function. When invoked directly (via psql) or when called via a PL/pgSQL function, it only reads the
indextuples it needs, resulting in much better performance. I can solve my problem by writing the function in PL/pgSQL,
butI'm curious why the pure SQL version behaves the way it does. 

Here's my table --

\d documents
+-------------------+------------------+----------------------------------------+
| Column            | Type             | Modifiers                              |
|-------------------+------------------+----------------------------------------|
| document_id       | integer          |  not null generated always as identity |
| product_id        | integer          |  not null                              |
| units_sold        | integer          |  not null                              |
| sale_date         | date             |  not null                              |
... some other columns ...
+-------------------+------------------+----------------------------------------+

CREATE INDEX idx_philip_tmp on documents (document_id, product_id);

Here's the SQL function which will use that index --

CREATE OR REPLACE FUNCTION fn_create_tasks(product_ids int[])
RETURNS void
AS $$
    -- Create processing tasks for documents related to these products
    INSERT INTO
        processing_queue (document_id)
    SELECT
        DISTINCT document_id
    FROM
        documents
    JOIN unnest(product_ids::int[]) AS product_id USING (product_id)
    ;

$$ LANGUAGE sql VOLATILE PARALLEL SAFE;

96498 is a product_id that has one associated document_id. When I copy/paste this statement into psql, it executes
quickly,and pg_stat_user_indexes.idx_tup_read reports 2 tuples read for the index. 

INSERT INTO
    processing_queue (document_id)
SELECT
    DISTINCT document_id
FROM
    documents
JOIN unnest(ARRAY[96498]::int[]) AS product_id USING (product_id)
;

When I copy/paste this into psql, I expect it to perform just as quickly but it does not.
pg_stat_user_indexes.idx_tup_readreports 64313783 tuples read (which is the entire index). 

SELECT fn_create_tasks(ARRAY[96498]::int[])

If I rewrite fn_create_tasks() in PL/pgSQL, it behaves as I expect (executes quickly, pg_stat_user_indexes.idx_tup_read
=2). 

SELECT fn_create_tasks_plpgsql(ARRAY[96498]::int[])

My rule of thumb is that SQL functions always perform as well as or better than a PL/pgSQL equivalent, but this is a
casewhere that's not true. If anyone can give me some clues as to what's happening here, I'd appreciate it. 

Thanks
Philip

pgsql-performance by date:

From: Dimitrios Apostolou
Date: 10 July 2023, 14:28:51
Subject: Performance implications of 8K pread()s

From: Thomas Munro
Date: 11 July 2023, 17:12:45
Subject: Re: Performance implications of 8K pread()s

Entire index scanned, but only when in SQL function? - Mailing list pgsql-performance

Previous

Next