Home > mailing lists

Re: questions on (parallel) COPY and when to REINDEX - Mailing list pgsql-general

From	Andy Colson
Subject	Re: questions on (parallel) COPY and when to REINDEX
Date	August 2, 2009 22:19:00
Msg-id	4A76367D.5060802@squeakycode.net Whole thread Raw
In response to	Re: questions on (parallel) COPY and when to REINDEX (Janet Jacobsen <jsjacobsen@lbl.gov>)
Responses	Re: questions on (parallel) COPY and when to REINDEX
List	pgsql-general

Tree view

>> On 1 Aug 2009, at 23:24, Janet Jacobsen wrote:
>>> My questions are:
>>> (2) Should I REINDEX these two tables daily after the pipeline
>>> completes?  Is this what other people do in practice?
>>
>> No need, but as people are querying as soon as data is arriving, an
>> analyse of the table you just copied to should help performance - the
>> query plans will be derived from more accurate table statistics that way.
>
> The files that are being copied into the table(s) are between 200 to 1,000
> each, but there are hundreds of these small files every day.  Would you
> recommend running analyze after every COPY?

Just once, at the end.  Assuming you are only deleting 100k records and re-adding/updating another 100k in a batch.
That'snot so many records it'll through the stats out of whack.  (If you were dropping/updating 15M in a batch you
mightwanna analyze in the middle) 

>>
>> You wrote that your pipeline runs for a period of 4 hours and the
>> table is about 15M rows now. What is taking up all that time? I
>> understand why you'd want to parallelise that process, but where do
>> you expect the gains?
>
>
> We're processing images, and the data processing and analysis takes
> up most of the time, but the images can be processed/analyzed in parallel.
> We've been doing all of the data loading at the end - one COPY at a
> time.  Originally that made sense because the researchers wanted to
> check the images before loading the data/analysis results into the db.

Ah!  Images!  When you are doing the COPY are you escaping the data?  You cant "just" copy a binary file.

-Andy

pgsql-general by date:

From: andy
Date: 02 August 2009, 21:37:29
Subject: Re: questions on (parallel) COPY and when to REINDEX

From: Ilya Urikh
Date: 02 August 2009, 23:46:09
Subject: Problem with Numerics multiplication in C-function

Re: questions on (parallel) COPY and when to REINDEX - Mailing list pgsql-general

Previous

Next