Thread: Making table reloading easier

Making table reloading easier

From
Craig Ringer
Date:
Hi all

A very common operation that users perform is reloading tables.
Sometimes as part of an ETL process. Sometimes as part of a dump and
reload. Sometimes loading data from external DBs, etc.

Right now users have to jump through a bunch of hoops to do this efficiently:

BEGIN;

TRUNCATE TABLE my_table;

SELECT pg_get_indexdef(indexrelid::regclass)
FROM pg_index WHERE indrelid = 'table_name'::regclass;

-- Drop 'em all
DROP INDEX ... ;

COPY my_table FROM 'file';

-- Re-create indexes
CREATE INDEX ...;

COMMIT;


This is pretty clunky.

We already have support for disabling indexes, it's just not exposed
to the user. So the simplest option would seem to be to expose it with
something like:
 ALTER TABLE my_table DISABLE INDEX ALL;

which would take an ACCESS EXCLUSIVE lock then set:
 indistready = 'f' indislive = 'f' indisvalid = 'f'

on each index, or the named index if the user specifies one particular index.

After loading the table, a REINDEX on the table would rebuild and
re-enable the indexes.

That changes the process to:

BEGIN;
TRUNCATE TABLE my_table;
ALTER TABLE my_table DISABLE INDEX ALL;
COPY ...;
REINDEX TABLE my_table;
COMMIT;

It'd be even better to also add a REINDEX flag to COPY, where it
disables indexes and re-creates them after it finishes. But that could
be done separately.

Thoughts?

I'm not sure I can tackle this in the current dev cycle, but it looks
simple enough that I can't help wondering what obvious thing I'm
missing about why it hasn't been done yet.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Making table reloading easier

From
Corey Huinker
Date:

  ALTER TABLE my_table
  DISABLE INDEX ALL;

+1
This very thing came up in a conversation with PeterG early last year. I was in favor then and I was surprised that the only thing standing in the way was a lack of ALTER TABLE syntax.
Creating temporary data structures to mimic existing metadata structures is a pain.
 

It'd be even better to also add a REINDEX flag to COPY, where it
disables indexes and re-creates them after it finishes. But that could
be done separately.

I'm iffy on the COPY change. If we add index rebuilding, why not disabling as well? If the COPY fails, what state do we leave the indexes in?

I'm not sure I can tackle this in the current dev cycle,

I may have some spare cycles to devote to this, but it's unfamiliar territory. I'm happy to do the grunt work if I had some higher level guidance.

Re: Making table reloading easier

From
Michael Paquier
Date:
On Thu, Nov 3, 2016 at 11:37 PM, Corey Huinker <corey.huinker@gmail.com> wrote:
>>   ALTER TABLE my_table
>>   DISABLE INDEX ALL;
>
> +1
> This very thing came up in a conversation with PeterG early last year. I was
> in favor then and I was surprised that the only thing standing in the way
> was a lack of ALTER TABLE syntax.
> Creating temporary data structures to mimic existing metadata structures is
> a pain.

As long as the exclusive lock is kept until the end of the transaction
that would make it. I got scared when reading $subject that you were
going to propose a ENABLE ALL. This command should be a no-opt outside
a transaction block though.

>> It'd be even better to also add a REINDEX flag to COPY, where it
>> disables indexes and re-creates them after it finishes. But that could
>> be done separately.
>
> I'm iffy on the COPY change. If we add index rebuilding, why not disabling
> as well? If the COPY fails, what state do we leave the indexes in?

Yeah, I am -1 on that. Leaving the indexes in an unclean state would
force the users to REINDEX anyway afterwards.
-- 
Michael