Re: [BUGS] BUG #14768: CREATE INDEX CONCURRENTLY IF NOT EXISTScancels autovacuum even if the index already exists. - Mailing list pgsql-bugs

From Marcin Barczyński
Subject Re: [BUGS] BUG #14768: CREATE INDEX CONCURRENTLY IF NOT EXISTScancels autovacuum even if the index already exists.
Date
Msg-id CAP3o3PdRbOk0dkiQcH7JJxhHa6-9RV8vsWCtiPSPjAkdUfeSWQ@mail.gmail.com
Whole thread Raw
In response to Re: [BUGS] BUG #14768: CREATE INDEX CONCURRENTLY IF NOT EXISTS cancels autovacuum even if the index already exists.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Thu, Aug 24, 2017 at 7:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I'm afraid this complaint is just wishful/sloppy thinking.  

Cancelling autovacuum was a problem for two reasons:

1. We ran a bunch of CREATE INDEX CONCURRENTLY IF NOT EXISTS queries on our service start up. We thought it's a no-op if an index already exists, and wanted to have some kind of background migrations: the service works normally, but some operations are slower until the index is created.

2. Autovacuum takes days/weeks in our scale (billions of rows), so effectively it never completed due to service restarts. We investigated the problem, and it turned out that most of the time was spent on vacuuming a GIST index on timestamp range. I took a look at the code, and during vacuum GIST index is traversed in a logical order which translates into random disk accesses (function gistbulkdelete in gistvacuum.c). Our index has almost 800 GB, so random accesses affect us badly. On the other hand, btree indexes are vacuumed in physical order (function btvacuumscan in nbtree.c).

We replaced CREATE INDEX CONCURRENTLY IF NOT EXISTS with proper migrations. 
By the way, CREATE INDEX IF NOT EXISTS also cancels autovacuum task. I get your point with locks, but from a user's perspective, I don't understand why I have to resort to the following code to avoid cancelling autovacuum:

    DO $$
    BEGIN
        IF NOT EXISTS (
                SELECT 1
                    FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
                    WHERE n.nspname = 'my_namespace' AND c.relname = 'my_index') THEN
            CREATE INDEX my_index ...;
        END IF;
    END$$;


As for the long autovacuum, maybe I should report it as a separate bug. For now, I'm planning to replace all uses of 'contains' operator with the following function:

    CREATE OR REPLACE FUNCTION tstzrange_contains(
        range tstzrange,
        ts timestamptz)
    RETURNS bool AS
    $$
    SELECT (ts >= lower(range) AND (lower_inc(range) OR ts > lower(range)))
       AND (ts <= upper(range) AND (upper_inc(range) OR ts < upper(range)))
    $$ LANGUAGE SQL IMMUTABLE;

and create btree indexes on lower and upper bound:

    CREATE INDEX my_table_time_range_lower_idx ON my_table (lower(time_range));
    CREATE INDEX my_table_time_range_upper_idx ON my_table (upper(time_range));

Is it the best approach? 

-- 
Regards,
Marcin

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: [BUGS] BUG #14768: CREATE INDEX CONCURRENTLY IF NOT EXISTS cancels autovacuum even if the index already exists.
Next
From: soleuu@gmail.com
Date:
Subject: [BUGS] BUG #14790: pg_restore - segfault