On Mon, Sep 10, 2012 at 9:34 AM, Craig Gibson <craiggib@gmail.com> wrote:
> Hi all
>
> I am no database wizard so I am hoping someone may be able to assist me :)
>
> I get a daily CSV file of 6.5 million records. I create a temporary
> table and COPY them in. On completion I create an index on the mdnid
> column. This column is also indexed in table 2. This part is very
> fast. I had some 'checkpoint too often' issues, but that I have
> resolved.
>
> I then use the following procedure to update all the records, and if a
> record does not exist, insert it instead:
>
...
>
> From my understanding, a for loop is encapsulated in a cursor anyway
> so no need to do that. Am I fundamentally doing something wrong as the
> operation is slow as molasses?
How big is the parent table? Are you CPU limited or IO limited?
If you are not CPU limited, then I would guess that the indexes on the
parent table do not fit in RAM or shared_buffers and that maintaining
the indexes on the parent table during the updates/inserts is the
bottleneck.
> Maybe there is a better way altogether
> that I have not thought of? The bottom line is that at no point can
> the e_lookup table be unavailable to clients,
It always has to be available for updates, or just for selects?
> else I would have just
> done a simple drop and rename post the COPY.
Maybe you can just do an atomic rename.
Cheers,
Jeff