Thread: Vacuuming

Vacuuming

From
Paul Lambert
Date:
Is there any point to vacuuming a table if it has been bulk-populated by
data after a truncate?

I.e. If I do this:
TRUNCATE TABLE vehicles;
INSERT INTO vehicles (SELECT DISTINCT ON (dealer_id,vehicle_address) *
FROM vehicles_temp_load WHERE (dealer_id,vehicle_address) is not null);

Is there any point in vacuuming?

Also, is there any point in recreating indexes on this table after a
load like this or will indexes have been correctly maintained/updated by
the above insert. Note: This insert often loads tens of millions of records.

BTW, this is on Windows.

Thanks,
Paul.

--
Paul Lambert
Database Administrator
AutoLedgers

Re: Vacuuming

From
Tom Lane
Date:
Paul Lambert <paul.lambert@autoledgers.com.au> writes:
> Is there any point to vacuuming a table if it has been bulk-populated by
> data after a truncate?

> I.e. If I do this:
> TRUNCATE TABLE vehicles;
> INSERT INTO vehicles (SELECT DISTINCT ON (dealer_id,vehicle_address) *
> FROM vehicles_temp_load WHERE (dealer_id,vehicle_address) is not null);

> Is there any point in vacuuming?

The only thing a vacuum would do for you there is set the commit hint
bits on the newly-inserted rows.  Which might be worth doing if you want
to get the table into a totally "clean" state, but it's probably a bit
excessive.  SELECTs on the table will set the hint bits anyway as
they visit not-yet-hinted rows, so it's really a matter of do you want
to pay that overhead all at once or spread-out.

What you *do* want to do in this situation is an ANALYZE.

            regards, tom lane

Re: Vacuuming

From
Paul Lambert
Date:
Tom Lane wrote:
>
> The only thing a vacuum would do for you there is set the commit hint
> bits on the newly-inserted rows.  Which might be worth doing if you want
> to get the table into a totally "clean" state, but it's probably a bit
> excessive.  SELECTs on the table will set the hint bits anyway as
> they visit not-yet-hinted rows, so it's really a matter of do you want
> to pay that overhead all at once or spread-out.
>
> What you *do* want to do in this situation is an ANALYZE.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>
>

Thanks Tom.

Should the ANALYZE be done before or after indexes are built? Or is that
irrelevant? Should I not even bother rebuilding indexes when I do these
loads?

Currently I:
1) Drop Indexes
2) Truncate and copy in new data
3) Vacuum - now changed to analyze.
4) Create indexes

I add steps one and four on the assumption that adding 40 million
records in one hit might get the indexes confused - but if they are
pretty stable I can remove these steps.

P.

--
Paul Lambert
Database Administrator
AutoLedgers


Re: Vacuuming

From
Tom Lane
Date:
Paul Lambert <paul.lambert@autoledgers.com.au> writes:
> Tom Lane wrote:
>> What you *do* want to do in this situation is an ANALYZE.

> Should the ANALYZE be done before or after indexes are built? Or is that
> irrelevant?

For ordinary indexes it doesn't matter.  If you have any expression
indexes then you should build them before running ANALYZE, because
ANALYZE takes the hint to collect stats on those expressions as well
as the raw column values.  (Eventually this advice might apply to
multicolumn and partial indexes as well, but right now ANALYZE doesn't
treat those specially, AFAIR.)  In any case there's no good reason
to do ANALYZE first if you have a free choice.

> Should I not even bother rebuilding indexes when I do these loads?

There's some value in the advice to "drop indexes, load data, recreate
indexes".  TRUNCATE will happily truncate the indexes to nothing along
with the table, but when you then load data you are building the indexes
incrementally instead of in-bulk.  This process is slower than a bulk
index build and ends up with a more-fragmented index.  (At least for
btree indexes --- I'm not sure which other index types are smarter
about bulk vs incremental build.)

> Currently I:
> 1) Drop Indexes
> 2) Truncate and copy in new data
> 3) Vacuum - now changed to analyze.
> 4) Create indexes

I'd interchange steps 3 and 4; otherwise you are good.

            regards, tom lane

Re: Vacuuming

From
Tom Lane
Date:
I wrote:
>> Is there any point in vacuuming?

> The only thing a vacuum would do for you there is set the commit hint
> bits on the newly-inserted rows.  Which might be worth doing if you want
> to get the table into a totally "clean" state, but it's probably a bit
> excessive.  SELECTs on the table will set the hint bits anyway as
> they visit not-yet-hinted rows, so it's really a matter of do you want
> to pay that overhead all at once or spread-out.

I forgot to mention that any other operation that examines every table
row will fix all the hint bits as well.  In particular a CREATE INDEX
would do that --- so if you are planning to create some indexes then
there's certainly no point in a VACUUM just after a table load.

            regards, tom lane

Re: Vacuuming

From
Paul Lambert
Date:
Tom Lane wrote:
>
> I forgot to mention that any other operation that examines every table
> row will fix all the hint bits as well.  In particular a CREATE INDEX
> would do that --- so if you are planning to create some indexes then
> there's certainly no point in a VACUUM just after a table load.
>
>             regards, tom lane
>

Thanks for all the help Tom, educational as always.

--
Paul Lambert
Database Administrator
AutoLedgers

Re: Vacuuming

From
"Pavan Deolasee"
Date:

On 5/8/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I forgot to mention that any other operation that examines every table
row will fix all the hint bits as well.  In particular a CREATE INDEX
would do that ---

I might be missing something, but I think CREATE INDEX work on
SnapshotAny and hence may not actually examine any table row ?

Thanks,
Pavan

--

EnterpriseDB     http://www.enterprisedb.com

Re: Vacuuming

From
Tom Lane
Date:
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
> On 5/8/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I forgot to mention that any other operation that examines every table
>> row will fix all the hint bits as well.  In particular a CREATE INDEX
>> would do that ---

> I might be missing something, but I think CREATE INDEX work on
> SnapshotAny and hence may not actually examine any table row ?

SnapshotAny is a no-op, but HeapTupleSatisfiesVacuum isn't.

            regards, tom lane

Re: Vacuuming

From
"Pavan Deolasee"
Date:


On 5/8/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

SnapshotAny is a no-op, but HeapTupleSatisfiesVacuum isn't.

Oh yes. My apologies for forgetting IndexBuildHeapScan()
 

Thanks,
Pavan

--

EnterpriseDB     http://www.enterprisedb.com