Thread: Does writing new records during a massive update generate locks?

Does writing new records during a massive update generate locks?

From: Victor d'Agostino
Date:
Hi everybody,

I added a timestamp column to a table with 51 million (51×10^6) rows.
ALTER TABLE MYBIGTABLE ADD COLUMN date timestamp without time zone;

I'm updating this column (for more than 48 hours now) on a RAID5 server.
UPDATE MYBIGTABLE SET date = (SELECT date FROM INDEXEDTABLE WHERE INDEXEDTABLE.email_id=MYBIGTABLE.email_id) WHERE date is null;

This transaction is still running and will not finish for several more days. It only uses one core.



My question is: can I add new records to the table, or will that generate locks?


I am using PostgreSQL 8.4.

Thanks for your help!

Regards,

Victor


Re: Does writing new records during a massive update generate locks?

From: Shaun Thomas
Date:
On 08/21/2014 08:41 AM, Victor d'Agostino wrote:

> UPDATE MYBIGTABLE SET date = (SELECT date FROM INDEXEDTABLE WHERE
> INDEXEDTABLE.email_id=MYBIGTABLE.email_id) WHERE date is null;

I may be wrong here, but wouldn't this style of query force a nested
loop? Over several million rows, that would take an extremely long time.
You might want to try this syntax instead:

UPDATE MYBIGTABLE big
    SET date = idx.date
   FROM INDEXEDTABLE idx
  WHERE idx.email_id = big.email_id
    AND big.date IS NULL;

> This transaction is still running and will not finish for several more
> days. It only uses one core.

The core count isn't your problem. I suspect that if you checked your RAID
I/O, you'd see 100% utilization, because instead of one sequential scan the
query is performing a random seek for every row it updates.
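
One way to sanity-check the rewritten query before launching another
multi-day update is to look at its plan first; EXPLAIN without ANALYZE only
plans the statement and does not execute it, so it is cheap to run. The table
and column names below are just the ones from this thread:

-- Plan only; nothing is written to the table.
EXPLAIN
UPDATE MYBIGTABLE big
    SET date = idx.date
   FROM INDEXEDTABLE idx
  WHERE idx.email_id = big.email_id
    AND big.date IS NULL;

If the planner picks a hash or merge join there, rather than a nested loop
with an index lookup per row, the update should behave much more like one
big sequential pass over the table.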

> My question is: can I add new records to the table, or will that
> generate locks?

Your update statement only locks the rows it is actually updating. You should
be able to add new rows, but with that update saturating your RAID's I/O,
you'll probably see significant write delays that look a lot like lock waits.
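
If you want to be sure an insert that seems to hang is waiting on a lock
rather than on the disks, one quick check is to look for ungranted locks
while it appears stuck; rows here mean real lock waits, while an empty
result during slow inserts points at I/O saturation instead:

-- Backends currently waiting on a lock (empty when the slowdown is pure I/O)
SELECT locktype, relation::regclass AS relation, pid, mode
  FROM pg_locks
 WHERE NOT granted;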

--
Shaun Thomas
OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Re: Does writing new records during a massive update generate locks?

From: Raymond O'Donnell
Date:
On 21/08/2014 14:41, Victor d'Agostino wrote:
> Hi everybody,
>
> I added a timestamp column to a table with 51 million (51×10^6) rows.
>
> ALTER TABLE MYBIGTABLE ADD COLUMN date timestamp without time zone;
>
> I'm updating this column (for more than 48 hours now) on a RAID5
> server.
>
> UPDATE MYBIGTABLE SET date = (SELECT date FROM INDEXEDTABLE WHERE
> INDEXEDTABLE.email_id=MYBIGTABLE.email_id) WHERE date is null;
>
> This transaction is still running and will not finish for several more
> days. It only uses one core.
>
> My question is: can I add new records to the table, or will that
> generate locks?
>
> I am using PostgreSQL *8.4*


Not that this helps your issue, but you may not be aware that 8.4 is now
end-of-life and so is no longer supported:

  http://www.postgresql.org/support/versioning

Ray.


--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


Re: Does writing new records during a massive update generate locks?

From: Alban Hertroys
Date:

On 21 August 2014 15:41, Victor d'Agostino <victor.d.agostino@fiducial.net> wrote:
> I'm updating this column (for more than 48 hours now) on a RAID5 server.

RAID5? That's probably the worst-performing RAID configuration you can have for write-heavy work, and it is usually advised against on this list.
You would be better off with RAID 10 (i.e. RAID 1+0) or even plain RAID 1, though you would give up more raw disk space to mirroring.

That said, if your I/O is not already saturated by that query, you could spread the update across multiple CPUs by splitting the email_id range into chunks and running one query per chunk, each from its own session, along the lines of the sketch below.
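
A rough sketch of that idea, assuming email_id is a numeric key (the range boundaries below are only examples) and reusing the UPDATE ... FROM form suggested earlier in the thread; each statement would run in its own session:

-- Session 1: first chunk of email_id values
UPDATE MYBIGTABLE big
    SET date = idx.date
   FROM INDEXEDTABLE idx
  WHERE idx.email_id = big.email_id
    AND big.date IS NULL
    AND big.email_id >= 1 AND big.email_id < 10000000;

-- Session 2: next chunk, and so on until the whole range is covered
UPDATE MYBIGTABLE big
    SET date = idx.date
   FROM INDEXEDTABLE idx
  WHERE idx.email_id = big.email_id
    AND big.date IS NULL
    AND big.email_id >= 10000000 AND big.email_id < 20000000;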

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.