PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick Baker
Date:
Hi guys,

I need a function ( PL/PGSQL ) to perform a deletion of some BLOBS...

I have four tables:

- original_table1_b = Original table, where the BLOBS are
- table1_n_b = Table where everything related to the BLOBS is stored (file_id, account_id, note_id, etc)
- table2_y_b = Table BACKUP - The blobs+data will be copied to here before being deleted
- table3_n_b = On table1_n_b, each blob is related to a note_id. Each note_id has three different file_ids. I want to delete just the greatest one, so in this table3_n_b table I'm storing the greatest file_id (by size).


How is the table3_n_b table created:

SELECT * INTO table3_n_b FROM
(
        SELECT account_id, note_id, st_ino, size
        FROM (
                SELECT DISTINCT ON (note_id)
                        note_id,
                        MAX(size),
                        file_id,
                        id
                FROM table1_n_b
                GROUP BY note_id, size, file_id, id
                ORDER BY note_id, size DESC
        ) AS r1
) AS r2;

The function must perform the following:

1 - Copy note_id + size + file_id + full_path from the table1_n_b table into the new table2_y_b one, but only for those file_ids that are the greatest; here we use the table created above, table3_n_b:

- Something like this?

INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            INTO
                    table2_y_b
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id
    )

2 - Once the blobs' metadata is inside the table2_y_b table, we can copy the blobs themselves into the same table.

- something like this?

INSERT INTO table2_y_b (data)
    (
            SELECT
                    o1.data
            FROM
                    original_table1_b o1
            JOIN
                    table3_n_b t3 ON t3.file_id = o1.file_id
    )

3 - Change the table2_y_b.migrated column from 0 to 1 (1 means the blob has already been copied):
FOR crtRow IN execute
'UPDATE table2_y_b SET migrated = 1 WHERE file_id = crtRow.file_id AND migrated = 0 ' || $1 ||' offset '||


4 - After we have a backup of the blobs+data, we can delete the blob (setting the column to NULL):
FOR crtRow IN execute
'UPDATE original_table1_b SET data = NULL WHERE file_id = crtRow.file_id ' || $1 ||' offset '||


This is what I've done so far:

CREATE or REPLACE FUNCTION function_1_name(rows integer)
RETURNS INTEGER AS $$

declare
  completed integer;
  crtRow record;

BEGIN
  offset_num = 0;

-- Copying the data into the table which will store the data+blobs
FOR crtRow IN execute
    'INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            INTO
                    table2_y_b
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id
    ) ' || $1 ||' offset '||

-- Copying the BLOBS
FOR crtRow IN execute
    'INSERT INTO table2_y_b (data)
    (
            SELECT
                    o1.data
            FROM
                    original_table1_b o1
            JOIN
                    table3_n_b t3 ON t3.file_id = o1.file_id
            JOIN
                    table2_y_b t2 ON t2.file_id = o1.file_id
            WHERE
                    t2.migrated = 0
    ) ' || $1 ||' offset '||

-- Update the migrated column from 0 to 1, for those rows that have been modified/copied.
FOR crtRow IN execute
'UPDATE table2_y_b SET migrated = 1 WHERE file_id = crtRow.file_id AND migrated = 0 ' || $1 ||' offset '||

FOR crtRow IN execute
'UPDATE original_table1_b SET data = NULL WHERE file_id = crtRow.file_id ' || $1 ||' offset '||


RETURN file_id;

END

$$ language 'plpgsql';


Am I doing this right?
When I call the function, e.g. select function_1_name(5000) or select function_1_name(15000), will it respect the limit on the number of rows?

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
"David G. Johnston"
Date:
On Wed, Jun 1, 2016 at 8:10 PM, Patrick Baker <patrickbakerbr@gmail.com> wrote:
Hi guys,

I need a function ( PL/PGSQL ) to perform a deletion of some BLOBS...


​[...]
When I call the function, e.g. select function_1_name(5000) or select function_1_name(15000), will it respect the limit on the number of rows?

I'd suggest you set up a test environment with some unimportant data on a non-production machine and try it yourself.

David J.
 

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick Baker
Date:



I'd suggest you set up a test environment with some unimportant data on a non-production machine and try it yourself.

David J.
 


Thanks.. but if I'm asking the list that's because I'm already testing it and it's not working... ;)


Patrick

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
"David G. Johnston"
Date:
On Wednesday, June 1, 2016, Patrick Baker <patrickbakerbr@gmail.com> wrote:



I'd suggest you set up a test environment with some unimportant data on a non-production machine and try it yourself.

David J.
 


Thanks.. but if I'm asking the list that's because I'm already testing it and it's not working... ;)


Are you getting errors?  A quick look suggested it shouldn't even run due to syntax problems.

David J. 

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Adrian Klaver
Date:
On 06/01/2016 05:10 PM, Patrick Baker wrote:
> Hi guys,
>
> I need a function ( PL/PGSQL ) to perform a deletion of some BLOBS...
>
> I have four tables:
>
>     *- original_table1_b =* Original table, where the BLOBS are
>     *- table1_n_b =* Table where everything related to the BLOBS is
>     stored (file_id, account_id, note_id, etc)
>     *- table2_y_b =* Table BACKUP - The blobs+data will be copied to
>     here before being deleted
>     *- table3_n_b =* On the *table1_n_b*, each blob is related to a
>     note_id. Each note_id has three different file_id. I want to delete
>     just the greatest one. So on this *table3_n_b* table I'm storing the
>     greatest file_id (by size)
>
>
>
> How is the *table3_n_b* table created:
>
> SELECT * INTO table3_n_b FROM
> (       SELECT account_id, note_id, st_ino, size
>         FROM (
>                 SELECT DISTINCT ON (note_id) note_id, MAX(size), file_id, id
>                 FROM table1_n_b
>                 GROUP BY note_id, size, file_id, id
>                 ORDER BY note_id, size DESC
>         ) AS r1
> ) AS r2;
>
>
> The function must perform the following:
>
> 1 - Select /_note_id + size + file_id + full_path_/ from *table1_n_b*
> table to the new *table2_y_b* one, but only those file_id that are
> greatest, so here we use the table created above: *table3_n_b*:
>
> - Something like this?
>
>     INSERT INTO table2_y_b (note_id, size, file_id, full_path)
>         (
>                 SELECT
>                         t1.note_id,
>                         t1.size,
>                         t1.file_id,
>                         t1.full_path
>                 INTO
>                         table2_y_b
>                 FROM
>                         table1_n_b t1
>                 JOIN
>                         table3_n_b t3 ON t3.file_id = t1.file_id
>         )
>
>
> 2 - Once the Blob's data is inside the *table2_y_b* table, we can now
> copy the blobs into the same table.
>
> - something like this?
>
>     INSERT INTO table2_y_b (data)
>         (
>                 SELECT
>                         o1.data
>                 FROM
>                         original_table1_b o1
>                 JOIN
>                         table3_n_b t3 ON t3.file_id = o1.file_id
>         )
>
>
> 3 - Changing the table2_y_b.migrated column from 0 to 1 (1 means the
> blob has been already copied):
>
>     FOR crtRow IN execute
>     'UPDATE table2_y_b SET migrated = 1 WHERE file_id = crtRow.file_id
>     AND migrated = 0 ' || $1 ||' offset '||
>
>
>
> 4 - After we have a backup of the blobs+data, we can now delete the blob
> (setting the column as NULL)
>
>     FOR crtRow IN execute
>
>     'UPDATE original_table1_b SET data = NULL WHERE file_id =
>     crtRow.file_id ' || $1 ||' offset '||
>
>
>
> *This is what I've done so far:*
>
>         CREATE or REPLACE FUNCTION function_1_name(rows integer)
>
>         RETURNS INTEGER AS $$
>
>
>         declare
>
>           completed integer;
>
>           crtRow record;
>
>
>         BEGIN
>
>           offset_num = 0;
>
>
>         -- Copying the data into the table which will store the data+blobs
>
>         FOR crtRow IN execute
>
>             'INSERT INTO table2_y_b (note_id, size, file_id, full_path)
>
>             (
>
>                     SELECT
>
>                             t1.note_id,
>
>                             t1.size,
>
>                             t1.file_id,
>
>                             t1.full_path
>
>                     INTO
>
>                             table2_y_b
>
>                     FROM
>
>                             table1_n_b t1
>
>                     JOIN
>
>                             table3_n_b t3 ON t3.file_id = t1.file_id
>
>             ) ' || $1 ||' offset '||
>
>
>         -- Copying the BLOBS
>
>         FOR crtRow IN execute
>
>             'INSERT INTO table2_y_b (data)
>
>             (
>
>                     SELECT
>
>                             o1.data
>
>                     FROM
>
>                             original_table1_b o1
>
>                     JOIN
>
>                             table3_n_b t3 ON t3.file_id = o1.file_id
>
>                     JOIN
>
>                             table2_y_b t2 ON t2.file_id = o1.file_id
>
>                     WHERE
>
>                             t2.migrated = 0
>
>             ) ' || $1 ||' offset '||
>
>
>         -- Update the migrated column from 0 to 1, for those rows that
>         have been modified/copied.
>
>         FOR crtRow IN execute
>
>         'UPDATE table2_y_b SET migrated = 1 WHERE file_id =
>         crtRow.file_id AND migrated = 0 ' || $1 ||' offset '||
>
>
>         FOR crtRow IN execute
>
>         'UPDATE original_table1_b SET data = NULL WHERE file_id =
>         crtRow.file_id ' || $1 ||' offset '||
>
>
>
>         RETURN file_id;
>
>
>         END
>
>
>         $$ language 'plpgsql';
>
>
>
> Am I doing this right?
> When I call the function: *select function_1_name(5000)* or *select
> function_1_name(15000)*, will it respect the limit on the number of rows?
>

I may be missing it, but I see no LIMIT in the function.

I do see OFFSET, and it looks backwards to me:

|| $1 ||' offset '||

https://www.postgresql.org/docs/9.5/static/sql-select.html

LIMIT Clause

The LIMIT clause consists of two independent sub-clauses:

LIMIT { count | ALL }
OFFSET start

Also, I'm not sure what offset_num is supposed to do; it is declared but
not used?
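For reference, the docs put LIMIT count first and OFFSET start after it, both at the end of the statement. A minimal sketch of where they would go in a dynamic query (untested; batch_size and offset_num are hypothetical names for the function's parameter and variable, and the table name is the poster's):

```sql
-- Sketch only: shows where LIMIT/OFFSET belong in the EXECUTEd string.
-- "batch_size" is an assumed function parameter, "offset_num" an
-- assumed declared variable.
FOR crtRow IN EXECUTE
    'SELECT file_id
     FROM table1_n_b
     ORDER BY file_id
     LIMIT ' || batch_size || '
     OFFSET ' || offset_num
LOOP
    -- work with crtRow.file_id here
END LOOP;
```

Note the ORDER BY: without it, LIMIT/OFFSET give no reproducible slice of the table.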




--
Adrian Klaver
adrian.klaver@aklaver.com


Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick Baker
Date:


I may be missing it, but I see no LIMIT in the function.

I do see OFFSET, and it looks backwards to me:

|| $1 ||' offset '||

https://www.postgresql.org/docs/9.5/static/sql-select.html

LIMIT Clause

The LIMIT clause consists of two independent sub-clauses:

LIMIT { count | ALL }
OFFSET start

Also, I'm not sure what offset_num is supposed to do; it is declared but not used?

Yep.. it's declared but it's not used..

Hmm.. interesting, that point about the LIMIT clause.
However.. I had to change the function...

Here is the new one:
CREATE or REPLACE FUNCTION function_data_1(rows integer)
RETURNS INTEGER AS $$

declare
  completed integer;
  offset_num integer;
  crtRow record;

BEGIN
  offset_num = 0;

INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id
    );

UPDATE table2_y_b t2 SET segment_data =
    (
            SELECT
                    o1.data
            FROM
                    original_table1_b o1
            JOIN
                    table3_n_b t3 ON t3.file_id = o1.file_id
            WHERE
                    t2.migrated = 0
            AND
                    t2.file_id = o1.file_id
    );

UPDATE table2_y_b SET migrated = 1 WHERE file_id = crtRow.file_id AND migrated = 0;

UPDATE original_table1_b SET data = NULL WHERE file_id = crtRow.file_id;

END

$$ language 'plpgsql';


It's all working, except the LIMIT...  if possible can you please give me an example of that LIMIT in some of those queries?

Thanks

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Adrian Klaver
Date:
On 06/01/2016 10:04 PM, Patrick Baker wrote:
>
>
>     I may be missing it, but I see no LIMIT in the function.
>
>     I do see OFFSET, and it looks backwards to me:
>
>     || $1 ||' offset '||
>
>     https://www.postgresql.org/docs/9.5/static/sql-select.html
>
>     LIMIT Clause
>
>     The LIMIT clause consists of two independent sub-clauses:
>
>     LIMIT { count | ALL }
>     OFFSET start
>
>     Also, I'm not sure what offset_num is supposed to do; it is declared
>     but not used?
>
>
> Yep.. it's declared but it's not used..
>
> Hmm.. interesting, that point about the LIMIT clause
> However.. I had to change the function...
>
> Here is the new one:
>
>         CREATE or REPLACE FUNCTION function_data_1(rows integer)
>
>         RETURNS INTEGER AS $$
>
>
>         declare
>
>           completed integer;
>
>           offset_num integer;
>
>           crtRow record;
>
>
>         BEGIN
>
>           offset_num = 0;
>
>
>         INSERT INTO table2_y_b (note_id, size, file_id, full_path)
>
>             (
>
>                     SELECT
>
>                             t1.note_id,
>
>                             t1.size,
>
>                             t1.file_id,
>
>                             t1.full_path
>
>                     FROM
>
>                             table1_n_b t1
>
>                     JOIN
>
>                             table3_n_b t3 ON t3.file_id = t1.file_id
>
>             );
>
>
>         UPDATE table2_y_b t2 SET segment_data =
>
>             (
>
>                     SELECT
>
>                             o1.data
>
>                     FROM
>
>                             original_table1_b o1
>
>                     JOIN
>
>                             table3_n_b t3 ON t3.file_id = o1.file_id
>
>                     WHERE
>
>                             t2.migrated = 0
>
>                     AND
>
>                             t2.file_id = o1.file_id
>
>             );
>
>
>         UPDATE table2_y_b SET migrated = 1 WHERE file_id =
>         crtRow.file_id AND migrated = 0;
>
>
>         UPDATE original_table1_b SET data = NULL WHERE file_id =
>         crtRow.file_id;
>
>
>         END
>
>
> $$ language 'plpgsql';
>
>
> It's all working, except the LIMIT...  if possible can you please give
> me an example of that LIMIT in some of those queries?

I do not know what you are trying to LIMIT/OFFSET, so I have no idea where to
place the LIMIT/OFFSET.

Maybe an example query showing what you are trying to do will help?

>
> Thanks


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
"David G. Johnston"
Date:
On Thu, Jun 2, 2016 at 1:04 AM, Patrick Baker <patrickbakerbr@gmail.com> wrote:

It's all working, except the LIMIT...  if possible can you please give me an example of that LIMIT in some of those queries?

You should also use ORDER BY when using LIMIT and OFFSET, though depending on the setup it could be omitted.  Usually, as long as the second execution cannot select any of the records the first execution touched, you can choose a random quantity.  But if you want random, then using OFFSET is pointless.

​SELECT *
FROM generate_series(1, 10)
ORDER BY 1
LIMIT 5
OFFSET 3

generate_series
----------------------
4
5
6
7
8
You are going to have difficulty finding people willing to help when you cannot put together a self-contained and syntax-error-free example (I think the last one is...) of what you want to do.  The PostgreSQL parser is very good at reading code and telling you what it doesn't like.  I'm not inclined to spend time reading queries that obviously cannot run and point out those same problems.  If you get a particular error you don't understand, I'll be happy to try and explain what it is trying to tell you.

​You probably need to reformulate your update to read:

UPDATE tbl
FROM (
SELECT 50 RECORDS
)​ src
WHERE src = tbl;

​And ensure that the 50 being selected each time through are a different 50.

Writable CTEs will probably help here.


​David J.

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick Baker
Date:


2016-06-03 2:10 GMT+12:00 David G. Johnston <david.g.johnston@gmail.com>:
On Thu, Jun 2, 2016 at 1:04 AM, Patrick Baker <patrickbakerbr@gmail.com> wrote:

It's all working, except the LIMIT...  if possible can you please give me an example of that LIMIT in some of those queries?

You should also use ORDER BY when using LIMIT and OFFSET, though depending on the setup it could be omitted.  Usually, as long as the second execution cannot select any of the records the first execution touched, you can choose a random quantity.  But if you want random, then using OFFSET is pointless.

​SELECT *
FROM generate_series(1, 10)
ORDER BY 1
LIMIT 5
OFFSET 3

generate_series
----------------------
4
5
6
7
8
You are going to have difficulty finding people willing to help when you cannot put together a self-contained and syntax-error-free example (I think the last one is...) of what you want to do.  The PostgreSQL parser is very good at reading code and telling you what it doesn't like.  I'm not inclined to spend time reading queries that obviously cannot run and point out those same problems.  If you get a particular error you don't understand, I'll be happy to try and explain what it is trying to tell you.

​You probably need to reformulate your update to read:

UPDATE tbl
FROM (
SELECT 50 RECORDS
)​ src
WHERE src = tbl;

​And ensure that the 50 being selected each time through are a different 50.

Writeable CTEs will probably help here.


​David J.



Hi David.

The SQL statements inside the function work.... I'm just having a problem limiting the query to the number of rows I want, and also teaching the update SQL to touch only the records the other SQL statements inside the function have touched.

This is the function updated:

CREATE or REPLACE FUNCTION function_data_1(rows integer)
RETURNS INTEGER AS $$

declare
  completed integer;
  offset_num integer;
  crtRow record;

BEGIN
  offset_num = 0;

INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id
    );

UPDATE table2_y_b t2 SET segment_data =
    (
            SELECT
                    o1.data
            FROM
                    original_table1_b o1
            JOIN
                    table3_n_b t3 ON t3.file_id = o1.file_id
            WHERE
                    t2.migrated = 0
            AND
                    t2.file_id = o1.file_id
    );

UPDATE table2_y_b SET migrated = 1 WHERE file_id = crtRow.file_id AND migrated = 0;

UPDATE original_table1_b SET data = NULL WHERE file_id = crtRow.file_id;

END

$$ language 'plpgsql';




- As you can see, the first insert inserts data into a new table from a select. This query must be limited by the number of rows I'll provide when calling the function; example:

select function_data_1(5000);
select function_data_1(60000);
select function_data_1(15000);

- The first update copies the BLOBs from the original_table1_b table into the new one (as above). Here, I also need the query to touch only those records that have been touched by the query above.


- The second update sets the table2_y_b.migrated column from 0 to 1, telling me that the record has been touched by the query, so the next call ( select function_data_1(60000); ) will already know that it does not need to touch that record; example:

WHERE
                    t2.migrated = 0

- The third and last update deletes the blobs (sets the blob column to NULL) that have already been touched by the above queries.... Still, I don't know how to tell Postgres to touch only the rows that have been touched by the above queries....



Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
"David G. Johnston"
Date:
On Thu, Jun 2, 2016 at 5:03 PM, Patrick Baker <patrickbakerbr@gmail.com> wrote:


2016-06-03 2:10 GMT+12:00 David G. Johnston <david.g.johnston@gmail.com>:
On Thu, Jun 2, 2016 at 1:04 AM, Patrick Baker <patrickbakerbr@gmail.com> wrote:

It's all working, except the LIMIT...  if possible can you please give me an example of that LIMIT in some of those queries?

You should also use ORDER BY when using LIMIT and OFFSET, though depending on the setup it could be omitted.  Usually, as long as the second execution cannot select any of the records the first execution touched, you can choose a random quantity.  But if you want random, then using OFFSET is pointless.

​SELECT *
FROM generate_series(1, 10)
ORDER BY 1
LIMIT 5
OFFSET 3

generate_series
----------------------
4
5
6
7
8
You are going to have difficulty finding people willing to help when you cannot put together a self-contained and syntax-error-free example (I think the last one is...) of what you want to do.  The PostgreSQL parser is very good at reading code and telling you what it doesn't like.  I'm not inclined to spend time reading queries that obviously cannot run and point out those same problems.  If you get a particular error you don't understand, I'll be happy to try and explain what it is trying to tell you.

​You probably need to reformulate your update to read:

UPDATE tbl
FROM (
SELECT 50 RECORDS
)​ src
WHERE src = tbl;

​And ensure that the 50 being selected each time through are a different 50.

Writeable CTEs will probably help here.


​David J.



Hi David.

The SQL statements inside the function work....

Really?  You seem to have lost your FOR loop for starters, and your RETURN statement, and a semi-colon after END, and I doubt crtRow.file_id works; should I go on?  So yes, you can run the four individual SQL statements correctly, but the function itself is bogus.
 
I'm just having a problem limiting the query to the number of rows I want, and also teaching the update SQL to touch only the records the other SQL statements inside the function have touched.
This is the function updated:

CREATE or REPLACE FUNCTION function_data_1(rows integer)
RETURNS INTEGER AS $$

declare
  completed integer;
  offset_num integer;
  crtRow record;

BEGIN
  offset_num = 0;

INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id
    );

UPDATE table2_y_b t2 SET segment_data =
    (
            SELECT
                    o1.data
            FROM
                    original_table1_b o1
            JOIN
                    table3_n_b t3 ON t3.file_id = o1.file_id
            WHERE
                    t2.migrated = 0
            AND
                    t2.file_id = o1.file_id
    );

UPDATE table2_y_b SET migrated = 1 WHERE file_id = crtRow.file_id AND migrated = 0;

UPDATE original_table1_b SET data = NULL WHERE file_id = crtRow.file_id;

END

$$ language 'plpgsql';




- As you can see, the first insert inserts data into a new table from a select. This query must be limited by the number of rows I'll provide when calling the function; example:

select function_data_1(5000);
select function_data_1(60000);
select function_data_1(15000);

- The first update copies the BLOBs from the original_table1_b table into the new one (as above). Here, I also need the query to touch only those records that have been touched by the query above.


- The second update sets the table2_y_b.migrated column from 0 to 1, telling me that the record has been touched by the query, so the next call ( select function_data_1(60000); ) will already know that it does not need to touch that record; example:

WHERE
                    t2.migrated = 0

- The third and last update deletes the blobs (sets the blob column to NULL) that have already been touched by the above queries.... Still, I don't know how to tell Postgres to touch only the rows that have been touched by the above queries....


Here's a fish - though you will still need to clean it.

This is not tested, and I haven't ever built this exact query for real, but it should work in theory...

--assumes that to be migrated records have previously had their migrated flag set to 0

function name (number_of_rows_to_process integer)
LANGUAGE sql -- this no longer requires procedural logic so no need for plpgsql
RETURNS SETOF bigint --returns the affected ids
AS $$
WITH the_records_I_want_to_affect AS (
-- pick N records to process
SELECT id, ...
FROM source_tbl
WHERE migrated = 0
​ORDER BY ...
LIMIT number_of_rows_to_process  -- your function argument goes here
FOR UPDATE
)​,
migrate_the_data AS (
-- place a copy of them into the archive table
INSERT INTO migration_table
SELECT id, ...
FROM the_records_I_want_to_affect 
RETURNING *
),
mark_as_migrated AS (
-- mark them as having been archived and nullify the blob data
UPDATE source_tbl
SET migrated = 1, data = null
FROM migrate_the_data recs
RETURNING source_tbl.id
)
SELECT id FROM mark_as_migrated;
$$

I am sure a fully working version of this idiom is present in one or more places on the internet.  Feel free to search out fully working examples with additional commentary.
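For illustration, a syntactically complete rendering of that fish might look like the following (untested; source_tbl, migration_table, and the id/data/migrated columns are placeholder names carried over from the sketch, not the poster's real schema):

```sql
-- Untested sketch: the writable-CTE idiom as a complete SQL function.
-- All table/column names are placeholders, not the poster's schema.
CREATE OR REPLACE FUNCTION migrate_batch(number_of_rows_to_process integer)
RETURNS SETOF bigint          -- returns the affected ids
LANGUAGE sql
AS $$
WITH the_records_i_want_to_affect AS (
    -- pick N unmigrated records, locking them against concurrent runs
    SELECT id, data
    FROM source_tbl
    WHERE migrated = 0
    ORDER BY id
    LIMIT number_of_rows_to_process
    FOR UPDATE
),
migrate_the_data AS (
    -- place a copy of them into the archive table
    INSERT INTO migration_table (id, data)
    SELECT id, data FROM the_records_i_want_to_affect
    RETURNING id
)
-- mark them as archived, nullify the blob, and return the affected ids
UPDATE source_tbl
SET migrated = 1, data = NULL
FROM migrate_the_data recs
WHERE source_tbl.id = recs.id
RETURNING source_tbl.id;
$$;
```

Because the batch is selected once in the first CTE and every later step joins back to it, each call touches a different, disjoint set of rows, which is exactly the "different 50 each time" property described above.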

You can make a FOR loop version of this work, as people had to many years ago before writable CTEs were implemented.

David J.

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Adrian Klaver
Date:
On 06/02/2016 02:03 PM, Patrick Baker wrote:
>
>
> 2016-06-03 2:10 GMT+12:00 David G. Johnston <david.g.johnston@gmail.com
> <mailto:david.g.johnston@gmail.com>>:
>

>
> Hi David.
>
> The SQLs inside the function works.... I'm just having problem about
> limiting the query to the number of rows I want, and also, to teach the
> update SQL to only touch the records the other SQLs inside the function
> have touched.

See notes inline.
>
> This is the function updated:
>
>             CREATE or REPLACE FUNCTION function_data_1(rows integer)
>
>             RETURNS INTEGER AS $$
>
>
>             declare
>
>               completed integer;
>
>               offset_num integer;
>
>               crtRow record;
>
>
>             BEGIN
>
>               offset_num = 0;
>
>
>             INSERT INTO table2_y_b (note_id, size, file_id, full_path)
>
>                 (
>
>                         SELECT
>
>                                 t1.note_id,
>
>                                 t1.size,
>
>                                 t1.file_id,
>
>                                 t1.full_path
>
>                         FROM
>
>                                 table1_n_b t1
>
>                         JOIN
>
>                                 table3_n_b t3 ON t3.file_id = t1.file_id
>
>                 );


Why are you joining to table3_n_b?
You do not use any fields from it.

How do you know what data in table1_n_b to get?
I see this grabbing the same information over and over again.

>
>
>             UPDATE table2_y_b t2 SET segment_data =
>
>                 (
>
>                         SELECT
>
>                                 o1.data
>
>                         FROM
>
>                                 original_table1_b o1
>
>                         JOIN
>
>                                 table3_n_b t3 ON t3.file_id = o1.file_id
>
>                         WHERE
>
>                                 t2.migrated = 0
>
>                         AND
>
>                                 t2.file_id = o1.file_id
>
>                 );
>
>
>             UPDATE table2_y_b SET migrated = 1 WHERE file_id =
>             crtRow.file_id AND migrated = 0;
>
>
>             UPDATE original_table1_b SET data = NULL WHERE file_id =
>             crtRow.file_id;

All the above would seem to be handled in a LOOP.
Grab the data from:

         SELECT

                             t1.note_id,

                             t1.size,

                             t1.file_id,

                             t1.full_path

                     FROM

                             table1_n_b t1

with suitable WHERE clause and use:

https://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING

to iterate over the results. As part of the iteration do your INSERT and
UPDATE using the RECORD.file_id. This includes setting migrated=1 and
data=NULL.
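A loop-based sketch of that advice (untested; it reuses the poster's table names, assumes a hypothetical batch_size parameter for the row limit, and uses a NOT EXISTS guard, which is an assumption here, as the "suitable WHERE clause" that skips already-copied file_ids):

```sql
-- Untested sketch: iterate over a bounded batch and do the per-record
-- INSERT/UPDATEs.  "batch_size" is an assumed function parameter; the
-- NOT EXISTS guard (an assumption) stands in for a suitable WHERE clause.
FOR crtRow IN
    SELECT t1.note_id, t1.size, t1.file_id, t1.full_path
    FROM table1_n_b t1
    JOIN table3_n_b t3 ON t3.file_id = t1.file_id
    WHERE NOT EXISTS (SELECT 1 FROM table2_y_b t2
                      WHERE t2.file_id = t1.file_id)
    ORDER BY t1.file_id
    LIMIT batch_size
LOOP
    -- back up the metadata and the blob for this file_id
    INSERT INTO table2_y_b (note_id, size, file_id, full_path, migrated)
    VALUES (crtRow.note_id, crtRow.size, crtRow.file_id,
            crtRow.full_path, 1);

    UPDATE table2_y_b
    SET segment_data = (SELECT data FROM original_table1_b
                        WHERE file_id = crtRow.file_id)
    WHERE file_id = crtRow.file_id;

    -- only then delete the original blob
    UPDATE original_table1_b
    SET data = NULL
    WHERE file_id = crtRow.file_id;
END LOOP;
```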


>
>
>             END
>
>
>             $$ language 'plpgsql';
>
>
>
>
>
> - As you can see, the first *insert*, inserts data into a new table from
> another select. This query must be limited by the number of rows I'll
> provide when calling the function; example:
>
>     select function_data_1(5000);
>     select function_data_1(60000);
>     select function_data_1(15000);
>
>
> - The first *update*, copies the BLOBS from the original_table1_b table
> into the new one (as above). Here, I also need the query knows to only
> touch those records that have been touched by the above query.
>
>
> - The second *update*, set the table2_y_b.migrated column from 0 to 1,
> telling me that, that record has been touched by the query. So the next
> call ( select function_data_1(60000); ) will already know that it does
> not need to touch that record; example:
>
>     WHERE
>                         t2.migrated = 0
>
>
> - The third and last *update*, deletes (set the blobs column as null)
> the blobs that have already been touched by the above queries....
> Still.. don't know how to tell postgres to only touches the rows that
> have been touched by the above queries....
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick Baker
Date:



Why are you joining to table3_n_b?
You do not use any fields from it.

How do you know what data in table1_n_b to get?
I see this grabbing the same information over and over again.

SELECT * INTO table3_n_b FROM 
(
        SELECT account_id, note_id, file_id
        FROM (
                SELECT DISTINCT ON (note_id)
                        note_id,
                        MAX(size),
                        file_id,
                        company_id
                FROM table1_n_b
                GROUP BY note_id, size, file_id, company_id
                ORDER BY note_id, size DESC
        ) AS r1
) AS r2;

Because I just want to touch the greatest file_id (by size) of each note_id, and the file_id I must change is in table3.

That's why:

table3_n_b t3 ON t3.file_id = t1.file_id







            UPDATE table2_y_b t2 SET segment_data =
                (
                        SELECT
                                o1.data
                        FROM
                                original_table1_b o1
                        JOIN
                                table3_n_b t3 ON t3.file_id = o1.file_id
                        WHERE
                                t2.migrated = 0
                        AND
                                t2.file_id = o1.file_id
                );


            UPDATE table2_y_b SET migrated = 1 WHERE file_id =
            crtRow.file_id AND migrated = 0;


            UPDATE original_table1_b SET data = NULL WHERE file_id =
            crtRow.file_id;

All the above would seem to be handled in a LOOP.
Grab the data from:

        SELECT
                t1.note_id,
                t1.size,
                t1.file_id,
                t1.full_path
        FROM
                table1_n_b t1

with suitable WHERE clause and use:

https://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING

Hmm ok...  but...

INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id
    );

I don't need anything else in the WHERE clause, as the ON t3.file_id = t1.file_id is already doing what I need... (and it works; I tested it).



to iterate over the results. As part of the iteration do your INSERT and UPDATE using the RECORD.file_id. This includes setting migrated=1 and data=NULL.

Yep... that's the way I started doing this...

Can you please tell me if this would be right?


CREATE or REPLACE FUNCTION function_data_1()
RETURNS INTEGER AS $$

declare
        row record;

BEGIN

-- copying the data to the backup table (not the blobs)
-- Limiting in 5000 rows each call
FOR row IN EXECUTE '
INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id
ORDER BY 1
LIMIT 5000
    )'

LOOP
-- copying the blobs to the table above
        UPDATE table2_y_b t2 SET segment_data =
        (
            SELECT
                    o1.data
            FROM
                    original_table1_b o1
            JOIN
                    table3_n_b t3 ON t3.file_id = o1.file_id
            WHERE
                    t2.migrated = 0
            AND
                    t2.file_id = o1.file_id
        )
        WHERE t2.file_id = row.file_id
END LOOP;

-- updating the migrated column from 0 to 1
LOOP
        UPDATE 
                table2_y_b t2
        SET 
                migrated = 1 
        WHERE 
                t2.file_id = row.file_id 
        AND 
                migrated = 0
END LOOP;

LOOP
        UPDATE 
                original_table1_b o1
        SET 
                data = NULL 
        WHERE 
                o1.file_id = row.file_id;
END LOOP;

END

$$ language 'plpgsql';
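As written, the draft above has a few mechanical problems: the statements inside the loops are missing semicolons, the two free-standing LOOPs have no exit condition, FOR ... IN EXECUTE needs a row-returning query (a plain INSERT returns no rows), and the function never RETURNs. A sketch that folds all the per-row work into one FOR loop (names taken from the thread, not tested against the real schema):

```sql
CREATE OR REPLACE FUNCTION function_data_1()
RETURNS integer AS $$
DECLARE
    crt_row record;
    counter integer := 0;
BEGIN
    -- iterate over up to 5000 candidate rows
    FOR crt_row IN
        SELECT t1.note_id, t1.size, t1.file_id, t1.full_path
        FROM table1_n_b t1
        JOIN table3_n_b t3 ON t3.file_id = t1.file_id
        ORDER BY 1
        LIMIT 5000
    LOOP
        -- back up the metadata for this row
        INSERT INTO table2_y_b (note_id, size, file_id, full_path)
        VALUES (crt_row.note_id, crt_row.size, crt_row.file_id, crt_row.full_path);

        -- copy the blob for this row into the backup table
        UPDATE table2_y_b t2
        SET segment_data = (SELECT o1.data
                            FROM original_table1_b o1
                            WHERE o1.file_id = crt_row.file_id)
        WHERE t2.file_id = crt_row.file_id
          AND t2.migrated = 0;

        -- mark the row migrated, then null out the original blob
        UPDATE table2_y_b SET migrated = 1
        WHERE file_id = crt_row.file_id AND migrated = 0;

        UPDATE original_table1_b SET data = NULL
        WHERE file_id = crt_row.file_id;

        counter := counter + 1;
    END LOOP;

    RETURN counter;   -- number of rows processed in this call
END;
$$ LANGUAGE plpgsql;
```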

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick Baker
Date:
I did:

CREATE or REPLACE FUNCTION function_1_data()
RETURNS INTEGER AS $$

declare
        row record;

BEGIN

-- copying the data to the backup table (not the blobs)
FOR row IN EXECUTE '
            SELECT
                    t1.file_id
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id ORDER BY 1 LIMIT 3' LOOP

-- Creating the backup table with the essential data
EXECUTE '
INSERT INTO table2_y_b (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            FROM
                    table1_n_b t1
            JOIN
                    table3_n_b t3 ON t3.file_id = t1.file_id
    ) ';

-- copying the blobs to the table above
EXECUTE '
        UPDATE table2_y_b t2 SET data =
        (
            SELECT
                    o1.data
            FROM
                    original_table1_b o1
            JOIN
                    table3_n_b t3 ON t3.file_id = o1.file_id
            WHERE
                    t3.migrated = 0
            AND
                    t2.file_id = o1.file_id
        )
        WHERE t2.file_id = row.file_id ';
        
-- updating the migrated column from 0 to 1
EXECUTE '
        UPDATE
                table2_y_b t2
        SET
                migrated = 1
        WHERE
                t2.file_id = row.file_id
        AND
                migrated = 0 ';
                
-- setting the blob as null
EXECUTE '
        UPDATE
                original_table1_b o1
        SET
                data = NULL
        WHERE
                o1.file_id = row.file_id ';
END LOOP;

return row.file_id;

END

$$ language 'plpgsql';


And I'm getting the error:
missing FROM-clause entry for table "row"
WHERE t2.st_ino = row.st_ino

Why does that happen? 
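The variable row is not visible inside an EXECUTE string: the string reaches the server as literal SQL, so "row" there is parsed as a table name. In a dynamic statement the value can be passed through a USING parameter instead; a sketch of the migrated update:

```sql
-- $1 is bound to row.file_id when the statement runs
EXECUTE '
        UPDATE table2_y_b t2
        SET migrated = 1
        WHERE t2.file_id = $1
          AND migrated = 0'
USING row.file_id;
```

None of these statements are actually dynamic, though, so plain embedded SQL (no EXECUTE at all) would also see row.file_id directly.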

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick Baker
Date:
Hi guys,

  • The function works... all the data is updated as expected. However, when I call the function for the second time, it touches the rows that had already been touched by the previous call.

  • It triplicates ( LIMIT 3 ) the records.

Question:

How can I make the function get the next 3 rows and not use the same rows that have been used before?

Function updated:

CREATE or REPLACE FUNCTION function_data_1()
RETURNS SETOF bigint AS $$

declare       row record;

BEGIN

-- copying the data to the backup table (not the blobs)
-- Limiting in 5000 rows each call
FOR row IN EXECUTE '
            SELECT
                    t1.file_id
            FROM
                    table1 t1
            JOIN
                    table3 t3 ON t3.file_id = t1.file_id
            ORDER BY 1 LIMIT 3 '
LOOP

-- Creating the backup table with the essential data
INSERT INTO table2 (note_id, size, file_id, full_path)
    (
            SELECT
                    t1.note_id,
                    t1.size,
                    t1.file_id,
                    t1.full_path
            FROM
                    table1 t1
            JOIN
                    table3 t3 ON t3.file_id = t1.file_id
            WHERE
                    t1.file_id = row.file_id
    );

-- copying the blobs to the table above (table2)
UPDATE junk.table2 t2 SET data =
    (
            SELECT
                    o1.data
            FROM
                    original_table1_b o1
            JOIN
                    table3 t3 ON t3.file_id = o1.file_id
            WHERE
                    t3.migrated = 0
            AND
                    t2.file_id = o1.file_id
            AND
                    o1.file_id = row.file_id
    )
    WHERE t2.file_id = row.file_id;

-- updating the migrated column from 0 to 1
UPDATE
        table3 t2
SET
        migrated = 1
WHERE
        t2.file_id = row.file_id
AND
        migrated = 0;

-- set the blobs as null
UPDATE
        original_table1_b o1
SET
        data = NULL
WHERE
        o1.file_id = row.file_id;
END LOOP;

END

$$ language 'plpgsql';

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
"David G. Johnston"
Date:
On Thu, Jun 2, 2016 at 11:37 PM, Patrick Baker <patrickbakerbr@gmail.com> wrote:

How can I make the function get the next 3 rows and not use the same rows that have been used before?

​WHERE migrated = 0
​David J.
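Applied to the driving query, that filter could look like this (a sketch, assuming the migrated flag lives on table3):

```sql
FOR row IN EXECUTE '
        SELECT t1.file_id
        FROM table1 t1
        JOIN table3 t3 ON t3.file_id = t1.file_id
        WHERE t3.migrated = 0   -- skip rows already processed by earlier calls
        ORDER BY 1 LIMIT 3 '
LOOP
    NULL;  -- per-row work as before
END LOOP;
```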


Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick Baker
Date:


2016-06-03 15:50 GMT+12:00 David G. Johnston <david.g.johnston@gmail.com>:
On Thu, Jun 2, 2016 at 11:37 PM, Patrick Baker <patrickbakerbr@gmail.com> wrote:

How can I make the function get the next 3 rows and not use the same rows that have been used before?

​WHERE migrated = 0
​David J.




lol... that's right David J. Thanks for that! it's working... ;)


Last thing... how do I select the number of rows that have been modified?

I mean, when doing: select function_data_1();

I want to get back the number of rows that have been touched...

Do you know how?

thanks again

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Adrian Klaver
Date:
On 06/02/2016 08:37 PM, Patrick Baker wrote:
> Hi guys,
>
>   *
>
>     The function works... All the data is updated as expected. However,
>     when I call the function for the second time, it touches the rows
>     that had already been touched by the previous call....
>
>   *
>
>     It triplicate ( |LIMIT 3| ) the records.
>
> *Question:*
>
> How can I make the function get the next 3 rows and not use the same
> rows that have been used before?
>
> Function updated:
>
> [function_data_1() definition quoted as posted above]



"
CREATE or REPLACE FUNCTION function_data_1()
RETURNS SETOF bigint AS $$

declare
         row record;

BEGIN

-- copying the data to the backup table (not the blobs)
-- Limiting in 5000 rows each call
FOR row IN EXECUTE '
             SELECT
                     t1.file_id
             FROM
                     table1 t1
             JOIN
                     table3 t3 ON t3.file_id = t1.file_id
             ORDER BY 1 LIMIT 3 '
LOOP

-- Creating the backup table with the essential data
INSERT INTO table2 (note_id, size, file_id, full_path)
     (
             SELECT
                     t1.note_id,
                     t1.size,
                     t1.file_id,
                     t1.full_path
             FROM
                     table1 t1
             JOIN
                     table3 t3 ON t3.file_id = t1.file_id
             WHERE
                     t1.file_id = row.file_id
     );

......."

Aren't you repeating yourself? Why not:

CREATE or REPLACE FUNCTION function_data_1()
RETURNS SETOF bigint AS $$

declare
         row record;

BEGIN

-- copying the data to the backup table (not the blobs)
-- Limiting in 5000 rows each call
FOR row IN EXECUTE '
             SELECT
                     t1.note_id,
                     t1.size,
                     t1.file_id,
                     t1.full_path
             FROM
                     table1 t1
             JOIN
                     table3 t3 ON t3.file_id = t1.file_id
             ORDER BY 1 LIMIT 3 '
LOOP

-- Creating the backup table with the essential data
INSERT INTO table2 (note_id, size, file_id, full_path)
VALUES (row.note_id, row.size, row.file_id, row.full_path);

.....

I'm still not seeing what the JOIN to table3 t3 gets you.

Anyway, the function works.



--
Adrian Klaver
adrian.klaver@aklaver.com


Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick Baker
Date:


-- Creating the backup table with the essential data
INSERT INTO table2 (note_id, size, file_id, full_path) VALUES (row.note_id, row.size, row.file_id, row.full_path)

.....

Still not seeing what the JOIN to table3 t3 gets you?

Any way the function works.


I changed the function to use row.note_id, row.size, etc... I think it's more intelligent that way! :)


 Is there any way to create another function to restore the data back?

Example:


select function_data_1_restore(123414);

Where 123414 = file_id

How can I tell the function to get the file_id that I'll pass in the call?
Can you guys please tell me?

cheers

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Adrian Klaver
Date:
On 06/03/2016 12:23 AM, Patrick Baker wrote:
>
>
>     -- Creating the backup table with the essential data
> INSERT INTO table2 (note_id, size, file_id, full_path) VALUES (row.note_id, row.size, row.file_id, row.full_path)
>
>     .....
>
>     Still not seeing what the JOIN to table3 t3 gets you?
>
>     Any way the function works.
>
>
> I changed the function to use row.note_id, row.size, etc... think it's
> more intelligent that way! :)
>
>
>  Is there any way to create another function to restore the data back?

I am sure there is, but it will probably be more difficult than copying
the data in the first place. From your previous function there seem to
be a lot of moving parts. Unwinding those tables and any other data that
is dependent on those tables could be a chore.

>
> Example:
>
>
> select function_data_1_restore(123414);
>
> Where 123414 = file_id
>
> How can I tell the function to get the file_id that I'll insert into the
> call?

That would depend on why and what you want to restore. The function is
going to need some sort of prompting from the user on what criteria to
use to determine the records to select and restore.

> Can you please guys tell me?

My help would be to say, first sit down and draw out the dependencies
you have between the data and the various tables. Then work out an
outline form of how to walk the data back from those tables into its
original location(s).

>
> cheers


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
"David G. Johnston"
Date:
On Fri, Jun 3, 2016 at 3:16 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 06/03/2016 12:23 AM, Patrick Baker wrote:


    -- Creating the backup table with the essential data
    INSERT INTO table2 (note_id, size, file_id, full_path) VALUES (row.note_id, row.size, row.file_id, row.full_path)

    .....

    Still not seeing what the JOIN to table3 t3 gets you?

    Any way the function works.


I changed the function to use row.note_id, row.size, etc... think it's
more intelligent that way! :)


 Is there any way to create another function to restore the data back?

I am sure there is, but it will probably be more difficult than copying that data in the first place. From your previous function there seem to be a lot of moving parts. Unwinding those tables and any other data that is dependent on those tables could be a chore.


​Adrian,​

​I don't think it's that bad.  All that was done is updating a bytea (or text...) field to NULL after saving the original contents elsewhere.  Restoring should be as simple as

UPDATE tbl SET data = archived_data
FROM archive_tbl
WHERE tbl.file_id = archive_tbl.file_id
AND tbl.file_id = <user input>;

Updating the main migrated flag and cleaning up extraneous entries in the archive would be simple.  No rows in the main tables were added or removed.
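Wrapped in a function that takes the file_id, that restore could be sketched as (table and column names assumed from earlier in the thread):

```sql
CREATE OR REPLACE FUNCTION function_data_1_restore(p_file_id bigint)
RETURNS integer AS $$
DECLARE
    restored integer;
BEGIN
    -- copy the archived blob back into the original table
    UPDATE original_table1_b o1
    SET data = t2.segment_data
    FROM table2_y_b t2
    WHERE o1.file_id = t2.file_id
      AND o1.file_id = p_file_id;

    GET DIAGNOSTICS restored = ROW_COUNT;

    -- clear the migrated flag so the row is no longer treated as archived
    UPDATE table2_y_b SET migrated = 0 WHERE file_id = p_file_id;

    RETURN restored;   -- 0 means nothing matched that file_id
END;
$$ LANGUAGE plpgsql;
```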

Patrick,

You already wrote the archive function; you should be capable of at least attempting to write its inverse.

If you are wondering how to pass the value 123414 in:

select function_data_1_restore(123414);

That would depend on the client.  In psql you'd just type it in.  In Java you probably do something like:

stmt = conn.prepareStatement("SELECT function_data_1_restore(?)");
stmt.setInt(1, 123414);
stmt.execute();

David J.


Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
"David G. Johnston"
Date:
On Thu, Jun 2, 2016 at 11:59 PM, Patrick Baker <patrickbakerbr@gmail.com> wrote:


2016-06-03 15:50 GMT+12:00 David G. Johnston <david.g.johnston@gmail.com>:
On Thu, Jun 2, 2016 at 11:37 PM, Patrick Baker <patrickbakerbr@gmail.com> wrote:

How can I make the function get the next 3 rows and not use the same rows that have been used before?

​WHERE migrated = 0
​David J.




lol... that's right David J. Thanks for that! it's working... ;)


Last thing.. how to select the number of rows that have been modified?

I mean.. when doing: select function_data_1():

I want to get back the number of rows that have been touched.. 

Do you know how?

thanks again

​Add a counter variable, increment it within the loop, and return it.

David J.
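That counter pattern, as a minimal skeleton (a sketch; the per-row work is elided):

```sql
CREATE OR REPLACE FUNCTION function_data_1()
RETURNS integer AS $$
DECLARE
    crt_row record;
    counter integer := 0;
BEGIN
    FOR crt_row IN SELECT t1.file_id FROM table1 t1
                   JOIN table3 t3 ON t3.file_id = t1.file_id
                   WHERE t3.migrated = 0 ORDER BY 1 LIMIT 3
    LOOP
        -- existing INSERT/UPDATE work for crt_row.file_id goes here
        counter := counter + 1;
    END LOOP;
    RETURN counter;   -- SELECT function_data_1(); now reports the rows touched
END;
$$ LANGUAGE plpgsql;
```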

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick B
Date:
Ok so guys....

CREATE or REPLACE FUNCTION function_data_1()
RETURNS SETOF bigint AS $$

declare
        row record;

BEGIN

[...]

FOR row IN EXECUTE '
            SELECT
                    t1.file_id,
                    t1.path,
                    t1.account_id
            FROM
                    table1 t1
            JOIN
                    table3 t3 ON t3.file_Id = t1.file_id
            WHERE
                    t3.migrated = 0
            AND 
                    t3.account_id = 1112
            ORDER BY 1 LIMIT 30 '

[...]


 How can I make the function work with account_id?

Example: select function_data_1(1112)

and then it will do all the work just for that specific account_id?

If you guys could please show me the way to do that...
thanks
Patrick

Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Adrian Klaver
Date:
On 06/06/2016 09:01 PM, Patrick B wrote:
> Ok so guys....
>
>         CREATE or REPLACE FUNCTION function_data_1()
>
>         RETURNS SETOF bigint AS $$
>
>
>         declare
>
>                 row record;
>
>
>         BEGIN
>
>
>         [...]
>
>
>         FOR row IN EXECUTE '
>
>                     SELECT
>
>                             t1.file_id,
>
>                             t1.path,
>
>                             t1.account_id
>
>                     FROM
>
>                             table1 t1
>
>                     JOIN
>
>                             table3 t3 ON t3.file_Id = t1.file_id
>
>                     WHERE
>
>                             t3.migrated = 0
>
>                     AND
>
>                            *t3.account_id = 1112*
>
>                     ORDER BY 1 LIMIT 30 '
>
>
>         [...]
>
>
>
>  How can I make the function works with account_id?
>
> *Example: select function_data_1(1112)*
>
> and then it will do all the work just for that specific account_id?

https://www.postgresql.org/docs/9.5/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-PARAMETERS

>
> If you guys please could give me the way to do that..
> thanks
> Patrick


--
Adrian Klaver
adrian.klaver@aklaver.com
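Per the declaration-parameters page above, the account id becomes a function parameter that replaces the hard-coded 1112 (a sketch with assumed names; the loop body is elided):

```sql
CREATE OR REPLACE FUNCTION function_data_1(p_account_id bigint)
RETURNS integer AS $$
DECLARE
    crt_row record;
    counter integer := 0;
BEGIN
    FOR crt_row IN
        SELECT t1.file_id, t1.path, t1.account_id
        FROM table1 t1
        JOIN table3 t3 ON t3.file_id = t1.file_id
        WHERE t3.migrated = 0
          AND t3.account_id = p_account_id   -- was the literal 1112
        ORDER BY 1 LIMIT 30
    LOOP
        -- per-row work as in the earlier function
        counter := counter + 1;
    END LOOP;
    RETURN counter;
END;
$$ LANGUAGE plpgsql;

-- called as: SELECT function_data_1(1112);
```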


Re: PL/PGSQL + inserts+updates+limit - Postgres 9.3

From
Patrick B
Date:
Thanks Adrian... it's working ;)