Partial index creation always scans the entire table - Mailing list pgsql-performance

From MingJu Wu
Subject Partial index creation always scans the entire table
Date
Msg-id CAKVFrvFY-f7kgwMRMiPLbPYMmgjc8Y2jjUGK_Y0HVcYAmU6ymg@mail.gmail.com
Whole thread Raw
Responses Re: Partial index creation always scans the entire table  (Sergei Kornilov <sk@zsrv.org>)
Re: Partial index creation always scans the entire table  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Partial index creation always scans the entire table  (Laurenz Albe <laurenz.albe@cybertec.at>)
List pgsql-performance
Hello,

When creating partial indexes, can postgres utilize another index for figuring which rows should be included in the partial index, without performing a full table scan?

My scenario is that I have a table with 50M rows that are categorized into 10K categories. I need to create a partial index for each category. I have created a index on the category column, hoping that postgres can use this information when creating the partial indexes. However, postgres always performs full table scan.

I've tested with PostgreSQL 12.2. Below is an example setup showing the problem.

TEST1 shows that building a full index covering all rows takes 18 seconds.

TEST2 shows that creating a partial index for one of the category1 (category1=1) takes 3 seconds. This means that for creating 10K partial indexes for each category, it will take over 8 hours. Compared to just 18 seconds in TEST1, it is much longer due to repeated full table scans.

TEST3 shows that even with another index (index_category2 created in SETUP) covering category2, creating a partial index for one of the category2 (category2=1) still takes 3 seconds. I think postgres is still doing a full table scan here.

My question is: can postgres utilize index_category2 is TEST3?

Thank you.

---------
-- SETUP
---------

CREATE TABLE test_data (
    id bigint PRIMARY KEY,
    category1 bigint,
    category2 bigint
);


INSERT INTO test_data(id, category1, category2)
SELECT id, category, category FROM (
    SELECT
        generate_series(1, 50000000) AS id,
        (random()*10000)::bigint AS category
) q;
--  Query returned successfully in 1 min 47 secs.

CREATE INDEX index_category2 ON test_data(category2);
-- Query returned successfully in 32 secs 347 msec.


--------------
-- TEST1: CREATE FULL INDEX
--------------

CREATE INDEX index_full ON test_data(id);
-- Query returned successfully in 18 secs 713 msec.


--------------
-- TEST2: CREATE PARTIAL INDEX, using category1
--------------

CREATE INDEX index_partial_1 ON test_data(id) WHERE category1=1;
-- Query returned successfully in 3 secs 523 msec.


--------------
-- TEST3: CREATE PARTIAL INDEX, using category2
--------------

CREATE INDEX index_partial_2 ON test_data(id) WHERE category2=1;
-- Query returned successfully in 3 secs 651 msec.


--- END ---

pgsql-performance by date:

Previous
From: Michael Lewis
Date:
Subject: Re: How to avoid UPDATE performance degradation in a transaction
Next
From: Sergei Kornilov
Date:
Subject: Re: Partial index creation always scans the entire table