Home > mailing lists

Re: [GENERAL] plpgsql function with offset - Postgres 9.1 - Mailing list pgsql-general

From	Patrick B
Subject	Re: [GENERAL] plpgsql function with offset - Postgres 9.1
Date	June 16, 2017 01:19:45
Msg-id	CAJNY3iud3BehfEBCamyn_ou9pPT=1UUGai0b5ore1Q-VPt5OpA@mail.gmail.com Whole thread
In response to	Re: [GENERAL] plpgsql function with offset - Postgres 9.1 (Albe Laurenz <laurenz.albe@wien.gv.at>)
Responses	Re: [GENERAL] plpgsql function with offset - Postgres 9.1 Re: [GENERAL] plpgsql function with offset - Postgres 9.1
List	pgsql-general

Tree view

2017-05-29 19:27 GMT+12:00 Albe Laurenz <laurenz.albe@wien.gv.at>:

Patrick B wrote:
> I am running a background task on my DB, which will copy data from tableA to tableB. For
> that, I'm writing a PL/PGSQL function which basically needs to do the following:
>
>
> 1. Select the data from tableA
> 2. The limit will be put when calling the function
> 3. insert the selected data on Step 1 onto new table
>
> Question:
>
> * When I stop it and start it again, how can the query "know" that it has already
> processed some rows so it won't do it twice on the same rows? If it stopped on row number
> 100, I need it to continue on row number 101, for example.
>
> * How can I ask the function to return the number of processed rows?
>
>
> I can add a column on TableB if needed, but not on tableA.
>
> This is what I've done so far:

> CREATE or REPLACE FUNCTION data_copy(rows integer)
> RETURNS SETOF bigint AS $$

CREATE or REPLACE FUNCTION data_copy(p_limit integer, p_offset integer)
RETURNS integer;

> declare
> row record;
> offset_num integer;

num_rows integer := 0;

> BEGIN
> FOR row IN EXECUTE '
> SELECT
> id,
> path,
> name,
> name_last,
> created_at
> FROM
> tablea
> WHERE
> ready = true
> ORDER BY 1 LIMIT ' || rows || ' OFFSET ' || rows || ''

'... LIMIT ' || p_limit || ' OFFSET ' || p_offset

> LOOP

num_rows := num_rows + 1;

> INSERT INTO tableB (id,path,name,name_last,created_at)
> VALUES (row.id,row.path,row.name,row.name_last,row.created_at);
>
> END LOOP;

RETURN num_rows;

> END
> $$ language 'plpgsql';

There are two problems with this approach:

1. It will do the wrong thing if rows are added or deleted in "tablea" while
you process it.

There will be actually records being inserted in tablea while processing the migration.... Any ideas here?

I can add another column in tablea, like example: row_migrated boolean --> if that helps

2. Queries with hight OFFSET values have bad performance.

No problem. The plan is to perform 2k rows at once, which is not much.

The solution is to avoid OFFSET and to use "keyset pagination":
http://use-the-index-luke.com/no-offset

Thanks

Patrick

pgsql-general by date:

From: Tom Lane
Date: 16 June 2017, 01:16:14
Subject: Re: [GENERAL] Connection options

From: "David G. Johnston"
Date: 16 June 2017, 01:35:00
Subject: Re: [GENERAL] plpgsql function with offset - Postgres 9.1

Re: [GENERAL] plpgsql function with offset - Postgres 9.1 - Mailing list pgsql-general

Previous

Next