stored proc and inserting hundreds of thousands of rows - Mailing list pgsql-performance

From Joel Reymont
Subject stored proc and inserting hundreds of thousands of rows
Date
Msg-id D5DE36B9-17CF-4A55-9119-CDC0BDC4E48D@gmail.com
Whole thread Raw
Responses Re: stored proc and inserting hundreds of thousands of rows  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
I have a stored proc that potentially inserts hundreds of thousands, potentially millions, of rows (below).

This stored proc is part of the the sequence of creating an ad campaign and links an ad to documents it should be
displayedwith. 

A few of these stored procs can run concurrently as users create ad campaigns.

We have 2 million documents now and linking an ad to all of them takes 5 minutes on my top-of-the-line SSD MacBook Pro.

Last but not least, the system has to quickly serve ads while documents are being linked which is a problem at the
moment.

What can I do to make linking documents to ads faster or have less impact on the system. I would like the system to be
asresponsive with serving ads while the linking itself is allowed to take a few minutes.  

One thing I'm concerned with, for example, is the whole multi-million row insert running within the stored proc
transaction.I think inserting rows one by one or in small batches may be an improvement. I don't know how to accomplish
this,though. 

    Thanks, Joel

---

CREATE DOMAIN doc_id AS varchar(64);
CREATE DOMAIN id AS int;

CREATE TABLE doc_ads
(
  doc_id    id NOT NULL REFERENCES docs,
  ad_id     id NOT NULL REFERENCES ads,
  distance  float NOT NULL
);

CREATE INDEX doc_ads_idx ON doc_ads(doc_id);

CREATE OR REPLACE FUNCTION link_doc_to_ads(doc id, threshold float)
RETURNS void AS $$
BEGIN
  INSERT INTO doc_ads (doc_id, ad_id, distance)
  SELECT doc, (t).ad_id, (t).distance
  FROM (SELECT ads_within_distance(topics, threshold) AS t
       FROM docs
       WHERE id = doc) AS x;
  ANALYZE doc_ads;
END;
$$ LANGUAGE plpgsql;

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------




pgsql-performance by date:

Previous
From: Kenneth Marshall
Date:
Subject: Re: REINDEX takes half a day (and still not complete!)
Next
From: "Kevin Grittner"
Date:
Subject: Re: stored proc and inserting hundreds of thousands of rows