Thread: two questions about toast

two questions about toast

From
Luca Ferrari
Date:
Hi all,
I've a couple of doubts about toast:
1) is a REINDEX DATABASE hitting also toast tables or they need to be
reindexed manually?
2) while executing a query against toasted values I got this debug
message that I don't know what is meaning
DEBUG:  building index "pg_toast_33875_index" on table "pg_toast_33875" serially

The query I was executing was:
SELECT  lower( f::text )  ||  lower( t::text )  FROM crashy_table
WHERE id = '16385'
via a plpgsql PERFORM.

Thanks,
Luca



Re: two questions about toast

From
Adrian Klaver
Date:
On 11/15/20 9:03 AM, Luca Ferrari wrote:
> Hi all,
> I've a couple of doubts about toast:
> 1) is a REINDEX DATABASE hitting also toast tables or they need to be
> reindexed manually?

https://www.postgresql.org/docs/current/sql-reindex.html

"DATABASE

     Recreate all indexes within the current database. Indexes on shared 
system catalogs are also processed. This form of REINDEX cannot be 
executed inside a transaction block.
"

"TABLE

     Recreate all indexes of the specified table. If the table has a 
secondary “TOAST” table, that is reindexed as well.
"

Database --> Table --> TOAST table.

> 2) while executing a query against toasted values I got this debug
> message that I don't know what is meaning

This happened when you where doing the REINDEX DATABASE?

> DEBUG:  building index "pg_toast_33875_index" on table "pg_toast_33875" serially
> 
> The query I was executing was:
> SELECT  lower( f::text )  ||  lower( t::text )  FROM crashy_table
> WHERE id = '16385'
> via a plpgsql PERFORM.
> 
> Thanks,
> Luca
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: two questions about toast

From
Luca Ferrari
Date:
On Sun, Nov 15, 2020 at 6:45 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Database --> Table --> TOAST table.

Thanks, I was not sure about.
>
> > 2) while executing a query against toasted values I got this debug
> > message that I don't know what is meaning
>
> This happened when you where doing the REINDEX DATABASE?


No, there was no indeed involved at that time.

Luca



INSERT Trigger to check for existing records

From
Hagen Finley
Date:

Hello,

I am definitely out over my skis here so I’ll apologize in advance 😉. Running version 12.5-1.pgdg20.04+1 on ubuntu. It’s essentially a personal database I use to ingest sales forecast spreadsheets from which I  create custom reports for my job function.

I pull a new forecast spreadsheet each Monday. 80% of the records are the same as the existing records from the week before.

Here’s what I (REALLY) want:

Trigger looks at three fields prior to new insert: Deal ID (numeric), revenue (numeric), stage(char)     Example: 19743576    22072.37    Commit - 90%

  1. If the NEW dealid doesn't match any of the OLD dealids, insert the new row
  2. if the NEW dealid, revenue and stage fields ALL match the OLD dealid, revenue and stage, skip (don't insert the NEW row)
  3. If the NEW dealid matches an OLD dealid but either the NEW revenue OR the stage fields have changed (don't match OLD record) insert new row (I'll review both rows manually)


Attempt 1: Update chk field with 'same' if OLD revusd OR stage are different than the NEW revusd OR stage

CREATE TRIGGER chk4chg
BEFORE
    INSERT ON sfdc
    FOR EACH ROW
BEGIN
    UPDATE sfdc
    SET chk = 'same'
    WHERE ndealid = :NEW.ndealid
      AND revusd = :NEW.revusd
      AND stage = :NEW.stage
END chk4chg;

Remarkably, that works in that it will UPDATE the chk field with 'same'


|ndealid |revusd |stage                 |chk       |

|17713063|1300000|Propose - 60%         |same      |

However, I must manually enter the parameters in dialogue box that (inexplicably) pops up when I run this command.


Attempt 2:

CREATE OR REPLACE FUNCTION update_insert() RETURNS trigger AS $$
BEGIN
    UPDATE sfdc
    SET chk = 'same'
    WHERE ndealid = OLD.ndealid;
    AND NEW.revusd = OLD.revusd
    AND NEW.stage = OLD.stage;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION Query returned successfully in 136 msec.

That's good news but the trigger doesn't actually update. It lacks BEFORE INSERT ON sfdc FOR EACH ROW so low prospect for success :-).


Attempt 3: A little more sophisticated executing Function from Trigger

CREATE TRIGGER smart_update_same BEFORE INSERT ON sfdc
FOR EACH ROW EXECUTE FUNCTION update_insert();

CREATE OR REPLACE FUNCTION update_insert() RETURNS trigger AS $$
BEGIN
    UPDATE sfdc
    SET sfdc.chk = 'same'
    WHERE NEW.ndealid = OLD.ndealid
      AND NEW.revusd = OLD.revusd
      AND NEW.stage = OLD.stage;
END;
$$ LANGUAGE plpgsql;

These 2 CREATEs return successfully but do not update the chk field on a successful INSERT:

sales=# select ndealid,revusd,stage,chk from sfdc where ndealid = 19743576;

  ndealid   |  revusd  |         stage          | chk

 19743576 | 22072.37 | Commit - 90%           |
   19743576 | 22072.37 | Commit - 90%           |
   19743576 | 22072.37 | Commit - 90%           |

These 3 attempts won't give me what I REALLY want but I figure I could use the chk field to delete the new inserts I didn't need.

Am I anywhere close (same county) to the right code?

Hagen

Larimer County, CO

Re: INSERT Trigger to check for existing records

From
"David G. Johnston"
Date:


On Saturday, November 21, 2020, Hagen Finley <hagen@datasundae.com> wrote:


I pull a new forecast spreadsheet each Monday. 80% of the records are the same as the existing records from the week before.

Here’s what I (REALLY) want:

Trigger looks at three fields prior to new insert: Deal ID (numeric), revenue (numeric), stage(char)     Example: 19743576    22072.37    Commit - 90%

  1. If the NEW dealid doesn't match any of the OLD dealids, insert the new row
  2. if the NEW dealid, revenue and stage fields ALL match the OLD dealid, revenue and stage, skip (don't insert the NEW row)
  3. If the NEW dealid matches an OLD dealid but either the NEW revenue OR the stage fields have changed (don't match OLD record) insert new row (I'll review both rows manually)


Am I anywhere close (same county) to the right code?



IMO, don’t use triggers.  Load the data into a temporary, or unlogged table, and then run commands to do what you want against the live tables.  Truncate/drop before doing that again the following week.

David J.

Re: INSERT Trigger to check for existing records

From
Adrian Klaver
Date:
On 11/21/20 8:00 AM, Hagen Finley wrote:
> Hello,
> 
> I am definitely out over my skis here so I’ll apologize in advance 😉. 
> Running version 12.5-1.pgdg20.04+1 on ubuntu. It’s essentially a 
> personal database I use to ingest sales forecast spreadsheets from which 
> I  create custom reports for my job function.
> 
> I pull a new forecast spreadsheet each Monday. 80% of the records are 
> the same as the existing records from the week before.
> 
> Here’s what I (REALLY) want:
> 
> Trigger looks at three fields prior to new insert: Deal ID (numeric), 
> revenue (numeric), stage(char)     Example: 19743576 22072.37    Commit 
> - 90%
> 
>  1. If the NEW dealid doesn't match any of the OLD dealids, insert the
>     new row
>  2. if the NEW dealid, revenue and stage fields ALL match the OLD
>     dealid, revenue and stage, skip (don't insert the NEW row)
>  3. If the NEW dealid matches an OLD dealid but either the NEW revenue
>     OR the stage fields have changed (don't match OLD record) insert new
>     row (I'll review both rows manually)
> 
> 
> *Attempt 1: *Update chk field with 'same' if OLD revusd OR stage are 
> different than the NEW revusd OR stage
> 
> CREATE TRIGGER chk4chg
> BEFORE
>      INSERT ON sfdc
>      FOR EACH ROW
> BEGIN
>      UPDATE sfdc
>      SET chk = 'same'
>      WHERE ndealid = :NEW.ndealid
>        AND revusd = :NEW.revusd
>        AND stage = :NEW.stage
> END chk4chg;
> 
> Remarkably, that works in that it will UPDATE the chk field with 'same'

Not sure how.
More comments below.

> 
> 
> |ndealid |revusd |stage                 |chk       |
> 
> |17713063|1300000|Propose - 60%         |same      |
> 
> However, I must manually enter the parameters in dialogue box that 
> (inexplicably) pops up when I run this command.

What client are you using?

> 
> 
> *Attempt 2:*
> 
> CREATE OR REPLACE FUNCTION update_insert() RETURNS trigger AS $$
> BEGIN
>      UPDATE sfdc
>      SET chk = 'same'
>      WHERE ndealid = OLD.ndealid;
>      AND NEW.revusd = OLD.revusd
>      AND NEW.stage = OLD.stage;
>      RETURN NEW;
> END;
> $$ LANGUAGE plpgsql;
> 
> CREATE FUNCTION Query returned successfully in 136 msec.
> 
> That's good news but the trigger doesn't actually update. It lacks 
> BEFORE INSERT ON sfdc FOR EACH ROW so low prospect for success :-).
> 
> 
> *Attempt 3: *A little more sophisticated executing Function from Trigger
> 
> CREATE TRIGGER smart_update_same BEFORE INSERT ON sfdc
> FOR EACH ROW EXECUTE FUNCTION update_insert();
> 
> CREATE OR REPLACE FUNCTION update_insert() RETURNS trigger AS $$
> BEGIN
>      UPDATE sfdc
>      SET sfdc.chk = 'same'
>      WHERE NEW.ndealid = OLD.ndealid
>        AND NEW.revusd = OLD.revusd
>        AND NEW.stage = OLD.stage;
> END;
> $$ LANGUAGE plpgsql;
> 
> These 2 CREATEs return successfully but do not update the chk field on a 
> successful INSERT:
> 
> sales=# select ndealid,revusd,stage,chk from sfdc where ndealid = 19743576;
> 
>    ndealid   |  revusd  |         stage          | chk
> 
>   19743576 | 22072.37 | Commit - 90%           |
>     19743576 | 22072.37 | Commit - 90%           |
>     19743576 | 22072.37 | Commit - 90%           |
> 
> These 3 attempts won't give me what I REALLY want but I figure I could 
> use the chk field to delete the new inserts I didn't need.
> 
> Am I anywhere close (same county) to the right code?

Don't do the UPDATE. Also I thought when the values matched you did
not want INSERT?

Instead:

IF NEW.ndealid = OLD.ndealid AND NEW.revusd = OLD.revusd
         AND NEW.stage = OLD.stage THEN
    RETURN NULL; --Will cancel INSERT
ELSE
     RETURN NEW;

END IF;

> 
> Hagen
> 
> Larimer County, CO
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: INSERT Trigger to check for existing records

From
Adrian Klaver
Date:
On 11/21/20 8:20 AM, Adrian Klaver wrote:
> On 11/21/20 8:00 AM, Hagen Finley wrote:
>> Hello,

> 
> Instead:
> 
> IF NEW.ndealid = OLD.ndealid AND NEW.revusd = OLD.revusd
>          AND NEW.stage = OLD.stage THEN
>      RETURN NULL; --Will cancel INSERT
> ELSE
>      RETURN NEW;
> 
> END IF;

Well this is what happens when I answer BC(before coffee). The above 
will not work, if for no other reason then OLD does not exist in an 
INSERT. Will try to come up with something that is in the realm of 
possibility.

> 
>>
>> Hagen
>>
>> Larimer County, CO
>>
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: INSERT Trigger to check for existing records

From
Adrian Klaver
Date:
On 11/21/20 8:47 AM, Adrian Klaver wrote:
> On 11/21/20 8:20 AM, Adrian Klaver wrote:
>> On 11/21/20 8:00 AM, Hagen Finley wrote:
>>> Hello,
> 
>>
>> Instead:
>>
>> IF NEW.ndealid = OLD.ndealid AND NEW.revusd = OLD.revusd
>>          AND NEW.stage = OLD.stage THEN
>>      RETURN NULL; --Will cancel INSERT
>> ELSE
>>      RETURN NEW;
>>
>> END IF;
> 
> Well this is what happens when I answer BC(before coffee). The above 
> will not work, if for no other reason then OLD does not exist in an 
> INSERT. Will try to come up with something that is in the realm of 
> possibility.

Alright caffeine in the blood stream, so something that might actually work:

DECLARE
     match_ct integer;
BEGIN

     SELECT INTO
    match_ct count(*)
     FROM
         sfdc
     WHERE
         ndealid = NEW.ndealid
     AND
         revusd = NEW.revusd
     AND
        stage = NEW.stage;

    IF match_ct > 0 THEN
        RETURN NULL; --Will cancel INSERT
    ELSE
        RETURN NEW;
    END IF;

END;

Though I would also point you at David's solution. Given that you are 
only looking at ~20% of the records being different it would save you a 
lot of churning through INSERTs.

> 
>>
>>>
>>> Hagen
>>>
>>> Larimer County, CO
>>>
>>
>>
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: INSERT Trigger to check for existing records

From
Hagen Finley
Date:
Thanks so much Adrian,

I like this approach but as you indicated it doesn't actually NULL the 
INSERT.

Could we UPDATE the existing record (per my fledgling chk UPDATE and 
then RETURN NULL? (More proof I don't know what I am talking about ;-).

Hagen


On 11/21/20 10:11 AM, Adrian Klaver wrote:
> On 11/21/20 8:47 AM, Adrian Klaver wrote:
>> On 11/21/20 8:20 AM, Adrian Klaver wrote:
>>> On 11/21/20 8:00 AM, Hagen Finley wrote:
>>>> Hello,
>>
>>>
>>> Instead:
>>>
>>> IF NEW.ndealid = OLD.ndealid AND NEW.revusd = OLD.revusd
>>>          AND NEW.stage = OLD.stage THEN
>>>      RETURN NULL; --Will cancel INSERT
>>> ELSE
>>>      RETURN NEW;
>>>
>>> END IF;
>>
>> Well this is what happens when I answer BC(before coffee). The above 
>> will not work, if for no other reason then OLD does not exist in an 
>> INSERT. Will try to come up with something that is in the realm of 
>> possibility.
>
> Alright caffeine in the blood stream, so something that might actually 
> work:
>
> DECLARE
>     match_ct integer;
> BEGIN
>
>     SELECT INTO
>     match_ct count(*)
>     FROM
>         sfdc
>     WHERE
>         ndealid = NEW.ndealid
>     AND
>         revusd = NEW.revusd
>     AND
>        stage = NEW.stage;
>
>    IF match_ct > 0 THEN
>        RETURN NULL; --Will cancel INSERT
>    ELSE
>        RETURN NEW;
>    END IF;
>
> END;
>
> Though I would also point you at David's solution. Given that you are 
> only looking at ~20% of the records being different it would save you 
> a lot of churning through INSERTs.
>
>>
>>>
>>>>
>>>> Hagen
>>>>
>>>> Larimer County, CO
>>>>
>>>
>>>
>>
>>
>
>



Re: INSERT Trigger to check for existing records

From
Hagen Finley
Date:

David,

That's an interesting idea. I WOULD like  to retain the OLD records that are the same and only INSERT new or changed records. Is there a way to compare the old and the new records without a trigger?

Hagen

On 11/21/20 9:15 AM, David G. Johnston wrote:


On Saturday, November 21, 2020, Hagen Finley <hagen@datasundae.com> wrote:


I pull a new forecast spreadsheet each Monday. 80% of the records are the same as the existing records from the week before.

Here’s what I (REALLY) want:

Trigger looks at three fields prior to new insert: Deal ID (numeric), revenue (numeric), stage(char)     Example: 19743576    22072.37    Commit - 90%

  1. If the NEW dealid doesn't match any of the OLD dealids, insert the new row
  2. if the NEW dealid, revenue and stage fields ALL match the OLD dealid, revenue and stage, skip (don't insert the NEW row)
  3. If the NEW dealid matches an OLD dealid but either the NEW revenue OR the stage fields have changed (don't match OLD record) insert new row (I'll review both rows manually)


Am I anywhere close (same county) to the right code?



IMO, don’t use triggers.  Load the data into a temporary, or unlogged table, and then run commands to do what you want against the live tables.  Truncate/drop before doing that again the following week.

David J.

Re: INSERT Trigger to check for existing records

From
Adrian Klaver
Date:
On 11/21/20 9:47 AM, Hagen Finley wrote:
> Thanks so much Adrian,
> 
> I like this approach but as you indicated it doesn't actually NULL the 
> INSERT.

It should cause the INSERT not to happen if a row exists with the same 
values for ndealid, revusd and stage. Are you seeing an INSERT for those 
conditions?

> 
> Could we UPDATE the existing record (per my fledgling chk UPDATE and 
> then RETURN NULL? (More proof I don't know what I am talking about ;-).

The INSERT won't happen so I'm not sure what you want to check against?

> 
> Hagen
> 
> 
> On 11/21/20 10:11 AM, Adrian Klaver wrote:
>> On 11/21/20 8:47 AM, Adrian Klaver wrote:
>>> On 11/21/20 8:20 AM, Adrian Klaver wrote:
>>>> On 11/21/20 8:00 AM, Hagen Finley wrote:
>>>>> Hello,
>>>
>>>>
>>>> Instead:
>>>>
>>>> IF NEW.ndealid = OLD.ndealid AND NEW.revusd = OLD.revusd
>>>>          AND NEW.stage = OLD.stage THEN
>>>>      RETURN NULL; --Will cancel INSERT
>>>> ELSE
>>>>      RETURN NEW;
>>>>
>>>> END IF;
>>>
>>> Well this is what happens when I answer BC(before coffee). The above 
>>> will not work, if for no other reason then OLD does not exist in an 
>>> INSERT. Will try to come up with something that is in the realm of 
>>> possibility.
>>
>> Alright caffeine in the blood stream, so something that might actually 
>> work:
>>
>> DECLARE
>>     match_ct integer;
>> BEGIN
>>
>>     SELECT INTO
>>     match_ct count(*)
>>     FROM
>>         sfdc
>>     WHERE
>>         ndealid = NEW.ndealid
>>     AND
>>         revusd = NEW.revusd
>>     AND
>>        stage = NEW.stage;
>>
>>    IF match_ct > 0 THEN
>>        RETURN NULL; --Will cancel INSERT
>>    ELSE
>>        RETURN NEW;
>>    END IF;
>>
>> END;
>>
>> Though I would also point you at David's solution. Given that you are 
>> only looking at ~20% of the records being different it would save you 
>> a lot of churning through INSERTs.
>>
>>>
>>>>
>>>>>
>>>>> Hagen
>>>>>
>>>>> Larimer County, CO
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: INSERT Trigger to check for existing records

From
"David G. Johnston"
Date:
On Saturday, November 21, 2020, Hagen Finley <hagen@datasundae.com> wrote:

David,

That's an interesting idea. I WOULD like  to retain the OLD records that are the same and only INSERT new or changed records. Is there a way to compare the old and the new records without a trigger?


A where clause?

David J. 

Re: INSERT Trigger to check for existing records

From
"David G. Johnston"
Date:
On Saturday, November 21, 2020, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Saturday, November 21, 2020, Hagen Finley <hagen@datasundae.com> wrote:

David,

That's an interesting idea. I WOULD like  to retain the OLD records that are the same and only INSERT new or changed records. Is there a way to compare the old and the new records without a trigger?


A where clause?


Did you get the part in the plan where there are two tables, existing and new? You write queries that join the two tables together and use the where clause in those queries to limit records.

David J.
 

RE: INSERT Trigger to check for existing records

From
Date:

Yes but it didn’t sink in but the two table join idea does make sense – I’ll give that a try. THANK YOU.

 

From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Saturday, November 21, 2020 11:25 AM
To: Hagen Finley <hagen@datasundae.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: INSERT Trigger to check for existing records

 

On Saturday, November 21, 2020, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Saturday, November 21, 2020, Hagen Finley <hagen@datasundae.com> wrote:

David,

That's an interesting idea. I WOULD like  to retain the OLD records that are the same and only INSERT new or changed records. Is there a way to compare the old and the new records without a trigger?

 

A where clause?

 

 

Did you get the part in the plan where there are two tables, existing and new? You write queries that join the two tables together and use the where clause in those queries to limit records.

 

David J.

 

Re: INSERT Trigger to check for existing records

From
Michael Lewis
Date:
If you can modify your insert statement, and live with an extra column in the data, no trigger is needed as best I can figure.

Create a unique index over the existing columns, add a "created_on" field and call insert on conflict (unique index) do nothing.

This should give the behavior you want.

Re: INSERT Trigger to check for existing records

From
Hagen Finley
Date:

Hello Michael,

Thanks so much for this advice. As I mentioned previously, I'm not very good at this yet, so forgive me if my response is obtuse. I really love databases but my sentiments may be unrequited.

In reality my table has lots of columns (~30) including a report date (repdate) and each week's pull has a new repdate ( in this case 2020-11-02 and 2020-11-09) which could function as a "created on" field.

To clarify, I would create an unique index on all the columns in the old report records (2020-11-02)  or just the three I am comparing (dealid,stage and revenue)?

In either case, so far in my efforts it looks like the create index fails because there are lots of rows with the same stage value, and a few with the same revenue value.

Create UNIQUE INDEX idx_sfdc
ON sfdc(ndealid, stage, revusd);

ERROR: could not create unique index "idx_sfdc" DETAIL: Key (ndealid, stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated. SQL state: 23505

I probably could create an unique index on the dealid column as that should be unique. Would that be enough? It seems like that would insert ONLY the new records with a new (unique) dealid and that would definitely by an important step forward.

I hesitate to admit I have no idea how I would code the "call insert on conflict (unique index) do nothing" syntax, but I would be excited to learn.

Best,

Hagen


On 11/22/20 8:54 AM, Michael Lewis wrote:

If you can modify your insert statement, and live with an extra column in the data, no trigger is needed as best I can figure.

Create a unique index over the existing columns, add a "created_on" field and call insert on conflict (unique index) do nothing.

This should give the behavior you want.

Re: INSERT Trigger to check for existing records

From
Adrian Klaver
Date:
On 11/22/20 9:53 AM, Hagen Finley wrote:
> Hello Michael,
> 
> Thanks so much for this advice. As I mentioned previously, I'm not very 
> good at this yet, so forgive me if my response is obtuse. I really love 
> databases but my sentiments may be unrequited.

The overriding issue is lack of a plan. From your first post:

"Here’s what I (REALLY) want:


Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage(char)     Example: 19743576 22072.37    Commit
- 90%


  1. If the NEW dealid doesn't match any of the OLD dealids, insert the
     new row
  2. if the NEW dealid, revenue and stage fields ALL match the OLD
     dealid, revenue and stage, skip (don't insert the NEW row)
  3. If the NEW dealid matches an OLD dealid but either the NEW revenue
     OR the stage fields have changed (don't match OLD record) insert new
     row (I'll review both rows manually)
"

And from later post:

" I figure I could
use the chk field to delete the new inserts I didn't need."


 From this I come up with the following:

1) Data rules

   a) If dealid in new data does not exist in old data INSERT row.
   b) Id dealid is in both new and old data AND revenue OR stage don't 
  match then INSERT and mark for review.
   c) If new dealid, revenue, stage match old dealid, revenue, stage 
then do not INSERT.

2) Process the data. Choices
    a) Use trigger on table sfdc
    b) Use staging table to hold new data and then process into sfdc table

3) Process the data. Same basic principle for both choices in 2) Flowchart
    a) In new data search for dealid in table sfdc if it does not exist 
add data to sfdc.
    b) If new data dealid does exist in sfdc
    1) If revenue or stage field differ mark for review
    2) If they do match skip further processing
4) Thoughts about above.
    a) To me table sfdc should only hold vetted data that is known to be 
unique per row.
    b) The data for review  1)b) 3)b) should end up in another review 
table e.g. sfdc_review.
    c) Since from OP ' 80% of the records are
the same as the existing records from the week before.' it makes sense 
to use the staging table 2)b) process rather then throwing away a lot of 
INSERTs.

If this makes sense then it comes down to decision in which choice in 2) 
to use. At that point it is filling in the flowchart with the exact 
steps to take.

> 
> In reality my table has lots of columns (~30) including a report date 
> (repdate) and each week's pull has a new repdate ( in this case 
> 2020-11-02 and 2020-11-09) which could function as a "created on" field.
> 
> To clarify, I would create an unique index on all the columns in the old 
> report records (2020-11-02)  or just the three I am comparing 
> (dealid,stage and revenue)?
> 
> In either case, so far in my efforts it looks like the create index 
> fails because there are lots of rows with the same stage value, and a 
> few with the same revenue value.
> 
> Create UNIQUE INDEX idx_sfdc
> ON sfdc(ndealid, stage, revusd);
> 
> ERROR: could not create unique index "idx_sfdc" DETAIL: Key (ndealid, 
> stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated. SQL 
> state: 23505
> 
> I probably could create an unique index on the dealid column as that 
> should be unique. Would that be enough? It seems like that would insert 
> ONLY the new records with a new (unique) dealid and that would 
> definitely by an important step forward.
> 
> I hesitate to admit I have no idea how I would code the "call insert on 
> conflict (unique index) do nothing" syntax, but I would be excited to learn.
> 
> Best,
> 
> Hagen
> 
> 
> On 11/22/20 8:54 AM, Michael Lewis wrote:
>> If you can modify your insert statement, and live with an extra column 
>> in the data, no trigger is needed as best I can figure.
>>
>> Create a unique index over the existing columns, add a "created_on" 
>> field and call insert on conflict (unique index) do nothing.
>>
>> This should give the behavior you want.


-- 
Adrian Klaver
adrian.klaver@aklaver.com



RE: INSERT Trigger to check for existing records

From
Date:
Thank you Adrian,

Your summary looks exactly right to me. I think option 2b looks more in reach for my limited skillset.
Let me see if I can make that work (dubious) and report.

Best,

Hagen

-----Original Message-----
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Sunday, November 22, 2020 11:26 AM
To: Hagen Finley <hagen@datasundae.com>; pgsql-general@lists.postgresql.org
Subject: Re: INSERT Trigger to check for existing records

On 11/22/20 9:53 AM, Hagen Finley wrote:
> Hello Michael,
>
> Thanks so much for this advice. As I mentioned previously, I'm not
> very good at this yet, so forgive me if my response is obtuse. I
> really love databases but my sentiments may be unrequited.

The overriding issue is lack of a plan. From your first post:

"Here’s what I (REALLY) want:


Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage(char)     Example: 19743576 22072.37    Commit
- 90%


  1. If the NEW dealid doesn't match any of the OLD dealids, insert the
     new row
  2. if the NEW dealid, revenue and stage fields ALL match the OLD
     dealid, revenue and stage, skip (don't insert the NEW row)
  3. If the NEW dealid matches an OLD dealid but either the NEW revenue
     OR the stage fields have changed (don't match OLD record) insert new
     row (I'll review both rows manually) "

And from later post:

" I figure I could
use the chk field to delete the new inserts I didn't need."


 From this I come up with the following:

1) Data rules

   a) If dealid in new data does not exist in old data INSERT row.
   b) Id dealid is in both new and old data AND revenue OR stage don't
  match then INSERT and mark for review.
   c) If new dealid, revenue, stage match old dealid, revenue, stage then do not INSERT.

2) Process the data. Choices
    a) Use trigger on table sfdc
    b) Use staging table to hold new data and then process into sfdc table

3) Process the data. Same basic principle for both choices in 2) Flowchart
    a) In new data search for dealid in table sfdc if it does not exist add data to sfdc.
    b) If new data dealid does exist in sfdc
    1) If revenue or stage field differ mark for review
    2) If they do match skip further processing
4) Thoughts about above.
    a) To me table sfdc should only hold vetted data that is known to be unique per row.
    b) The data for review  1)b) 3)b) should end up in another review table e.g. sfdc_review.
    c) Since from OP ' 80% of the records are the same as the existing records from the week before.' it makes sense to
usethe staging table 2)b) process rather then throwing away a lot of INSERTs. 

If this makes sense then it comes down to decision in which choice in 2) to use. At that point it is filling in the
flowchartwith the exact steps to take. 

>
> In reality my table has lots of columns (~30) including a report date
> (repdate) and each week's pull has a new repdate ( in this case
> 2020-11-02 and 2020-11-09) which could function as a "created on" field.
>
> To clarify, I would create an unique index on all the columns in the
> old report records (2020-11-02)  or just the three I am comparing
> (dealid,stage and revenue)?
>
> In either case, so far in my efforts it looks like the create index
> fails because there are lots of rows with the same stage value, and a
> few with the same revenue value.
>
> Create UNIQUE INDEX idx_sfdc
> ON sfdc(ndealid, stage, revusd);
>
> ERROR: could not create unique index "idx_sfdc" DETAIL: Key (ndealid,
> stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated. SQL
> state: 23505
>
> I probably could create an unique index on the dealid column as that
> should be unique. Would that be enough? It seems like that would
> insert ONLY the new records with a new (unique) dealid and that would
> definitely by an important step forward.
>
> I hesitate to admit I have no idea how I would code the "call insert
> on conflict (unique index) do nothing" syntax, but I would be excited to learn.
>
> Best,
>
> Hagen
>
>
> On 11/22/20 8:54 AM, Michael Lewis wrote:
>> If you can modify your insert statement, and live with an extra
>> column in the data, no trigger is needed as best I can figure.
>>
>> Create a unique index over the existing columns, add a "created_on"
>> field and call insert on conflict (unique index) do nothing.
>>
>> This should give the behavior you want.


--
Adrian Klaver
adrian.klaver@aklaver.com




Re: INSERT Trigger to check for existing records

From
Hagen Finley
Date:

Adrian and Michael,

My current insert logic (which works) is in a psycopg2 python script which reads a spreadsheet row into an array, so for the moment I didn't want to add that integration to my struggle.

cur = conn.cursor()
                                                                                                                                                                                                                      \
query = "INSERT INTO sfdc(theater,country,account,smotion,opname,cprod,opid,ndealid,qnum,stage,revusd,cdate,bdate,age,opown,opnum,sonum,fbdate,region,dqnum,pid,closed,won,onum,repdate)

VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s);"
                                   
cur.executemany(query, frecords)
conn.commit()
conn.close()


However, the following is something of a stepping stone towards the destination and, (dare I say it? ;-) it works:

CREATE OR REPLACE FUNCTION same_test(did numeric)
RETURNS numeric AS $$
BEGIN
  IF $1 IN
      (SELECT ndealid from hygiene_112)
  THEN
    UPDATE hygiene_119 SET paid = 'SAME';
  ELSE
      UPDATE hygiene_119 SET paid = 'NEW';
  END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;


I changed the dealid to something that doesn't exist (14593030) in the 11-2 table and the function updates the 11-9 table.status field to "NEW":

sales=# UPDATE hygiene_112 SET ndealid = 14593030 WHERE ndealid = 14593039;
UPDATE 1

SELECT same_test(14593039);

+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|repdate   |ndealid |revusd    |stage                 |status  |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|2020-11-09|14593039|1015624.23|Propose - 60%         |NEW   |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+

When I changed it back I get the proper "SAME" update:

sales=# UPDATE hygiene_112 SET ndealid = 14593039 WHERE ndealid = 14593030;
UPDATE 1

SELECT same_test(14593039);

+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|repdate   |ndealid |revusd    |stage                 |status  |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|2020-11-09|14593039|1015624.23|Propose - 60%         |SAME   |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+


I'm generally amazed when ANYTHING works so this is good news (to me). It seems logical I could replace the UPDATE statement with an INSERT statement at this point.

However, that only addresses one of the data rules on my checklist.

I'll keep forging ahead here and see what additional progress I can attain. Very much appreciate your patient assistance here.

Best,

Hagen


On 11/22/20 11:26 AM, Adrian Klaver wrote:
On 11/22/20 9:53 AM, Hagen Finley wrote:
Hello Michael,

Thanks so much for this advice. As I mentioned previously, I'm not very good at this yet, so forgive me if my response is obtuse. I really love databases but my sentiments may be unrequited.

The overriding issue is lack of a plan. From your first post:

"Here’s what I (REALLY) want:


Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage(char)     Example: 19743576 22072.37    Commit
- 90%


 1. If the NEW dealid doesn't match any of the OLD dealids, insert the
    new row
 2. if the NEW dealid, revenue and stage fields ALL match the OLD
    dealid, revenue and stage, skip (don't insert the NEW row)
 3. If the NEW dealid matches an OLD dealid but either the NEW revenue
    OR the stage fields have changed (don't match OLD record) insert new
    row (I'll review both rows manually)
"

And from later post:

" I figure I could
use the chk field to delete the new inserts I didn't need."


From this I come up with the following:

1) Data rules

  a) If dealid in new data does not exist in old data INSERT row.
  b) Id dealid is in both new and old data AND revenue OR stage don't  match then INSERT and mark for review.
  c) If new dealid, revenue, stage match old dealid, revenue, stage then do not INSERT.

2) Process the data. Choices
   a) Use trigger on table sfdc
   b) Use staging table to hold new data and then process into sfdc table

3) Process the data. Same basic principle for both choices in 2) Flowchart
   a) In new data search for dealid in table sfdc if it does not exist add data to sfdc.
   b) If new data dealid does exist in sfdc
    1) If revenue or stage field differ mark for review
    2) If they do match skip further processing
4) Thoughts about above.
   a) To me table sfdc should only hold vetted data that is known to be unique per row.
   b) The data for review  1)b) 3)b) should end up in another review table e.g. sfdc_review.
   c) Since from OP ' 80% of the records are
the same as the existing records from the week before.' it makes sense to use the staging table 2)b) process rather then throwing away a lot of INSERTs.

If this makes sense then it comes down to decision in which choice in 2) to use. At that point it is filling in the flowchart with the exact steps to take.


In reality my table has lots of columns (~30) including a report date (repdate) and each week's pull has a new repdate ( in this case 2020-11-02 and 2020-11-09) which could function as a "created on" field.

To clarify, I would create an unique index on all the columns in the old report records (2020-11-02)  or just the three I am comparing (dealid,stage and revenue)?

In either case, so far in my efforts it looks like the create index fails because there are lots of rows with the same stage value, and a few with the same revenue value.

Create UNIQUE INDEX idx_sfdc
ON sfdc(ndealid, stage, revusd);

ERROR: could not create unique index "idx_sfdc" DETAIL: Key (ndealid, stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated. SQL state: 23505

I probably could create an unique index on the dealid column as that should be unique. Would that be enough? It seems like that would insert ONLY the new records with a new (unique) dealid and that would definitely by an important step forward.

I hesitate to admit I have no idea how I would code the "call insert on conflict (unique index) do nothing" syntax, but I would be excited to learn.

Best,

Hagen


On 11/22/20 8:54 AM, Michael Lewis wrote:
If you can modify your insert statement, and live with an extra column in the data, no trigger is needed as best I can figure.

Create a unique index over the existing columns, add a "created_on" field and call insert on conflict (unique index) do nothing.

This should give the behavior you want.


Folks,

Just a quick question. Using this FUNCTION:

CREATE OR REPLACE FUNCTION same_test(did numeric)
RETURNS numeric AS $$
BEGIN
  IF $1 IN
      (SELECT dealid from hygiene_112)
  THEN
    UPDATE hygiene_119 SET paid = 'SAME';
  ELSE
      UPDATE hygiene_119 SET paid = 'NEW';
  END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;

Does the following query input the the dealids that result from the SELECT statement into the parameter of the sames_test() FUNCTION?

Select dealid sametest(dealid) FROM hygiene_123;

I doubt it does (my query runs a long time) :-). I know I can utilize python to push SELECT results into a array and then run a 'FOR d in dealids' LOOP to feed the FUNCTION parameter but I'd like to learn how to do that with nested SQL statements or FUNCTIONS.

Thanks!


Hagen


On 11/22/20 4:28 PM, Hagen Finley wrote:

Adrian and Michael,

My current insert logic (which works) is in a psycopg2 python script which reads a spreadsheet row into an array, so for the moment I didn't want to add that integration to my struggle.

cur = conn.cursor()
                                                                                                                                                                                                                      \
query = "INSERT INTO sfdc(theater,country,account,smotion,opname,cprod,opid,ndealid,qnum,stage,revusd,cdate,bdate,age,opown,opnum,sonum,fbdate,region,dqnum,pid,closed,won,onum,repdate)

VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s);"
                                   
cur.executemany(query, frecords)
conn.commit()
conn.close()


However, the following is something of a stepping stone towards the destination and, (dare I say it? ;-) it works:

CREATE OR REPLACE FUNCTION same_test(did numeric)
RETURNS numeric AS $$
BEGIN
  IF $1 IN
      (SELECT ndealid from hygiene_112)
  THEN
    UPDATE hygiene_119 SET paid = 'SAME';
  ELSE
      UPDATE hygiene_119 SET paid = 'NEW';
  END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;


I changed the dealid to something that doesn't exist (14593030) in the 11-2 table and the function updates the 11-9 table.status field to "NEW":

sales=# UPDATE hygiene_112 SET ndealid = 14593030 WHERE ndealid = 14593039;
UPDATE 1

SELECT same_test(14593039);

+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|repdate   |ndealid |revusd    |stage                 |status  |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|2020-11-09|14593039|1015624.23|Propose - 60%         |NEW   |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+

When I changed it back I get the proper "SAME" update:

sales=# UPDATE hygiene_112 SET ndealid = 14593039 WHERE ndealid = 14593030;
UPDATE 1

SELECT same_test(14593039);

+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|repdate   |ndealid |revusd    |stage                 |status  |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+
|2020-11-09|14593039|1015624.23|Propose - 60%         |SAME   |
+----------+-----------------------------------+------------------------------------------------------+--------+----------+----------------------+------+


I'm generally amazed when ANYTHING works so this is good news (to me). It seems logical I could replace the UPDATE statement with an INSERT statement at this point.

However, that only addresses one of the data rules on my checklist.

I'll keep forging ahead here and see what additional progress I can attain. Very much appreciate your patient assistance here.

Best,

Hagen


On 11/22/20 11:26 AM, Adrian Klaver wrote:
On 11/22/20 9:53 AM, Hagen Finley wrote:
Hello Michael,

Thanks so much for this advice. As I mentioned previously, I'm not very good at this yet, so forgive me if my response is obtuse. I really love databases but my sentiments may be unrequited.

The overriding issue is lack of a plan. From your first post:

"Here’s what I (REALLY) want:


Trigger looks at three fields prior to new insert: Deal ID (numeric),
revenue (numeric), stage(char)     Example: 19743576 22072.37    Commit
- 90%


 1. If the NEW dealid doesn't match any of the OLD dealids, insert the
    new row
 2. if the NEW dealid, revenue and stage fields ALL match the OLD
    dealid, revenue and stage, skip (don't insert the NEW row)
 3. If the NEW dealid matches an OLD dealid but either the NEW revenue
    OR the stage fields have changed (don't match OLD record) insert new
    row (I'll review both rows manually)
"

And from later post:

" I figure I could
use the chk field to delete the new inserts I didn't need."


From this I come up with the following:

1) Data rules

  a) If dealid in new data does not exist in old data INSERT row.
  b) Id dealid is in both new and old data AND revenue OR stage don't  match then INSERT and mark for review.
  c) If new dealid, revenue, stage match old dealid, revenue, stage then do not INSERT.

2) Process the data. Choices
   a) Use trigger on table sfdc
   b) Use staging table to hold new data and then process into sfdc table

3) Process the data. Same basic principle for both choices in 2) Flowchart
   a) In new data search for dealid in table sfdc if it does not exist add data to sfdc.
   b) If new data dealid does exist in sfdc
    1) If revenue or stage field differ mark for review
    2) If they do match skip further processing
4) Thoughts about above.
   a) To me table sfdc should only hold vetted data that is known to be unique per row.
   b) The data for review  1)b) 3)b) should end up in another review table e.g. sfdc_review.
   c) Since from OP ' 80% of the records are
the same as the existing records from the week before.' it makes sense to use the staging table 2)b) process rather then throwing away a lot of INSERTs.

If this makes sense then it comes down to decision in which choice in 2) to use. At that point it is filling in the flowchart with the exact steps to take.


In reality my table has lots of columns (~30) including a report date (repdate) and each week's pull has a new repdate ( in this case 2020-11-02 and 2020-11-09) which could function as a "created on" field.

To clarify, I would create an unique index on all the columns in the old report records (2020-11-02)  or just the three I am comparing (dealid,stage and revenue)?

In either case, so far in my efforts it looks like the create index fails because there are lots of rows with the same stage value, and a few with the same revenue value.

Create UNIQUE INDEX idx_sfdc
ON sfdc(ndealid, stage, revusd);

ERROR: could not create unique index "idx_sfdc" DETAIL: Key (ndealid, stage, revusd)=(19743576, Commit - 90% , 22072.37) is duplicated. SQL state: 23505

I probably could create an unique index on the dealid column as that should be unique. Would that be enough? It seems like that would insert ONLY the new records with a new (unique) dealid and that would definitely by an important step forward.

I hesitate to admit I have no idea how I would code the "call insert on conflict (unique index) do nothing" syntax, but I would be excited to learn.

Best,

Hagen


On 11/22/20 8:54 AM, Michael Lewis wrote:
If you can modify your insert statement, and live with an extra column in the data, no trigger is needed as best I can figure.

Create a unique index over the existing columns, add a "created_on" field and call insert on conflict (unique index) do nothing.

This should give the behavior you want.


On 11/25/20 7:41 AM, Hagen Finley wrote:
> Folks,
> 
> Just a quick question. *Using this FUNCTION:*
> 
>     CREATE OR REPLACE FUNCTION same_test(did numeric)
>     RETURNS numeric AS $$
>     BEGIN
>        IF $1 IN
>            (SELECT dealid from hygiene_112)
>        THEN
>          UPDATE hygiene_119 SET paid = 'SAME';
>        ELSE
>            UPDATE hygiene_119 SET paid = 'NEW';
>        END IF;
>     RETURN NULL;
>     END;
>     $$ LANGUAGE plpgsql;

The above is broken in multiple ways:

1) You have RETURNS numeric and then RETURN NULL; This means you will 
not actually return anything

2) You have the input argument did but you never use it to restrict your 
UPDATEs.

3) Not sure the logic in the IF actually works even if you filtered by 
did. This assumes that there will always be a row in hygiene_119 that 
matches one in hygiene_112. Given that you setting a 'NEW' flag I'm 
guessing that is not the case.

You will need to sketch out the thought process at work here before we 
can go any further with this.



> 
> *Does the following query input the the dealids that result from the 
> SELECT statement into the parameter of the sames_test() FUNCTION?*

> 
> Select dealid sametest(dealid) FROM hygiene_123;

Have no idea what that is supposed to do?

If you want to use the function(after fixing it) you would have to do:

select * from some_test(some_number);

> 
> I doubt it does (my query runs a /long time)/ :-). I know I can utilize 
> python to push SELECT results into a array and then run a 'FOR d in 
> dealids' LOOP to feed the FUNCTION parameter but I'd like to learn how 
> to do that with nested SQL statements or FUNCTIONS.
> 
> Thanks!
> 
> 
> Hagen
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Adrian,

Thanks for your detailed response. That's very kind and much appreciated.

1. OK that's just me groping for a RETURN statement that doesn't throw a rod. I don't actually need to return anything
asthe goal of the FUNCTION (for the moment)  is to perform updates to a table. It might be nice to return some sort of
confirmationbut it's not necessary. Apparently I don't fully understand the RETURN concept (oh really? 😉. Any
suggestionswhere to research or read? 

2. I have two tables:
    a) sfdc which is the baseline - doesn't change -  isn't updated by this FUNTION
    b) hygiene_119 a new table which has some records (~80%) which are identical to those already in sfdc.

The logic flow is:
        i) SELECT the dealids from hygiene_119 (latest or new  report dated 11/9)
        ii) compare those hygiene_119.dealids with the existing sfdc.dealids  -  hence the IF $1 (one result from the
hygiene_119.dealdidSELECT) is IN (matches) any of the sfdc.dealids THEN 
        iii) UPDATE hygiene_119 SET status = 'SAME' WHERE dealid = $1; --flag that hygiene_119 record as the SAME or a
duplicaterecord 
        iv) ELSE UPDATE hygiene_119 SET status = 'NEW' WHERE dealid = $1; --flag that hygiene_119 record as NEW  or a
newrecord 
    Once I have inspected the "NEW" records in hygiene_119 I will INSERT then into sfdc. Then rinse and repeat each
weekwith a new report. 

3. Not positive the IF is doing what I want,  but if I copy a sfdc.dealid into the same_test() parameter field the
FUNTIONdoes update the hygiene_119.status field properly. To me, it appears I just need a way to iterate through and
insertone hygiene_119.dealid in the same_test parameter field. Then the UPDATE should flag all the hygiene_119 records
asSAME or NEW. Obviously I don't REALLY need both flags as the absence of a flag would indicate status too. 

Does that articulate the thought process adequately?

Best,

Hagen


-----Original Message-----
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Wednesday, November 25, 2020 9:07 AM
To: Hagen Finley <hagen@datasundae.com>; pgsql-general@lists.postgresql.org
Subject: Re: INSERT Trigger to check for existing records : Does this do what I hope it is doing?

On 11/25/20 7:41 AM, Hagen Finley wrote:
> Folks,
>
> Just a quick question. *Using this FUNCTION:*
>
>     CREATE OR REPLACE FUNCTION same_test(did numeric)
>     RETURNS numeric AS $$
>     BEGIN
>        IF $1 IN
>            (SELECT dealid from sfdc)
>        THEN
>          UPDATE hygiene_119 SET status = 'SAME';
>        ELSE
>            UPDATE hygiene_119 SET status = 'NEW';
>        END IF;
>     RETURN NULL;
>     END;
>     $$ LANGUAGE plpgsql;

The above is broken in multiple ways:

1) You have RETURNS numeric and then RETURN NULL; This means you will not actually return anything

2) You have the input argument did but you never use it to restrict your UPDATEs.

3) Not sure the logic in the IF actually works even if you filtered by did. This assumes that there will always be a
rowin hygiene_119 that matches one in hygiene_112. Given that you setting a 'NEW' flag I'm guessing that is not the
case.

You will need to sketch out the thought process at work here before we can go any further with this.



>
> *Does the following query input the the dealids that result from the
> SELECT statement into the parameter of the sames_test() FUNCTION?*

>
> Select dealid sametest(dealid) FROM hygiene_123;

Have no idea what that is supposed to do?

If you want to use the function(after fixing it) you would have to do:

select * from some_test(some_number);

>
> I doubt it does (my query runs a /long time)/ :-). I know I can utilize
> python to push SELECT results into a array and then run a 'FOR d in
> dealids' LOOP to feed the FUNCTION parameter but I'd like to learn how
> to do that with nested SQL statements or FUNCTIONS.
>
> Thanks!
>
>
> Hagen
>

--
Adrian Klaver
adrian.klaver@aklaver.com






On 11/25/20 8:43 AM, hagen@datasundae.com wrote:
> Adrian,
> 
> Thanks for your detailed response. That's very kind and much appreciated.
> 
> 1. OK that's just me groping for a RETURN statement that doesn't throw a rod. 

Things can still work, sort of. I once cranked up and ran(for a short 
time) a JD 4020 that had a rod coming through the block. It was smoky 
and rough, but it ran. OT I know but that image came back clear as day.

I don't actually need to return anything as the goal of the FUNCTION 
(for the moment)  is to perform updates to a table. It might be nice to 
return some sort of confirmation but it's not necessary. Apparently I 
don't fully understand the RETURN concept (oh really? 😉. Any 
suggestions where to research or read?
> 
> 2. I have two tables:
>     a) sfdc which is the baseline - doesn't change -  isn't updated by this FUNTION
>     b) hygiene_119 a new table which has some records (~80%) which are identical to those already in sfdc.
> 
> The logic flow is:
>         i) SELECT the dealids from hygiene_119 (latest or new  report dated 11/9)

Not seeing where that is done?

>         ii) compare those hygiene_119.dealids with the existing sfdc.dealids  -  hence the IF $1 (one result from the
hygiene_119.dealdidSELECT) is IN (matches) any of the sfdc.dealids THEN
 

Again not seeing any comparison to sfdc?

>         iii) UPDATE hygiene_119 SET status = 'SAME' WHERE dealid = $1; --flag that hygiene_119 record as the SAME or
aduplicate record
 
>         iv) ELSE UPDATE hygiene_119 SET status = 'NEW' WHERE dealid = $1; --flag that hygiene_119 record as NEW  or a
newrecord
 
>     Once I have inspected the "NEW" records in hygiene_119 I will INSERT then into sfdc. Then rinse and repeat each
weekwith a new report.
 

Until the previous questions are addressed the above is not doable.

> 
> 3. Not positive the IF is doing what I want,  but if I copy a sfdc.dealid into the same_test() parameter field the
FUNTIONdoes update the hygiene_119.status field properly. To me, it appears I just need a way to iterate through and
insertone hygiene_119.dealid in the same_test parameter field. Then the UPDATE should flag all the hygiene_119 records
asSAME or NEW. Obviously I don't REALLY need both flags as the absence of a flag would indicate status too.
 

Before continuing with the function I would try some SELECT functions 
that do what you want.

> 
> Does that articulate the thought process adequately?
> 
> Best,
> 
> Hagen
> 
> 
> -----Original Message-----
> From: Adrian Klaver <adrian.klaver@aklaver.com>
> Sent: Wednesday, November 25, 2020 9:07 AM
> To: Hagen Finley <hagen@datasundae.com>; pgsql-general@lists.postgresql.org
> Subject: Re: INSERT Trigger to check for existing records : Does this do what I hope it is doing?
> 
> On 11/25/20 7:41 AM, Hagen Finley wrote:
>> Folks,
>>
>> Just a quick question. *Using this FUNCTION:*
>>
>>      CREATE OR REPLACE FUNCTION same_test(did numeric)
>>      RETURNS numeric AS $$
>>      BEGIN
>>         IF $1 IN
>>             (SELECT dealid from sfdc)
>>         THEN
>>           UPDATE hygiene_119 SET status = 'SAME';
>>         ELSE
>>             UPDATE hygiene_119 SET status = 'NEW';
>>         END IF;
>>      RETURN NULL;
>>      END;
>>      $$ LANGUAGE plpgsql;
> 
> The above is broken in multiple ways:
> 
> 1) You have RETURNS numeric and then RETURN NULL; This means you will not actually return anything
> 
> 2) You have the input argument did but you never use it to restrict your UPDATEs.
> 
> 3) Not sure the logic in the IF actually works even if you filtered by did. This assumes that there will always be a
rowin hygiene_119 that matches one in hygiene_112. Given that you setting a 'NEW' flag I'm guessing that is not the
case.
> 
> You will need to sketch out the thought process at work here before we can go any further with this.
> 
> 
> 
>>
>> *Does the following query input the the dealids that result from the
>> SELECT statement into the parameter of the sames_test() FUNCTION?*
> 
>>
>> Select dealid sametest(dealid) FROM hygiene_123;
> 
> Have no idea what that is supposed to do?
> 
> If you want to use the function(after fixing it) you would have to do:
> 
> select * from some_test(some_number);
> 
>>
>> I doubt it does (my query runs a /long time)/ :-). I know I can utilize
>> python to push SELECT results into a array and then run a 'FOR d in
>> dealids' LOOP to feed the FUNCTION parameter but I'd like to learn how
>> to do that with nested SQL statements or FUNCTIONS.
>>
>> Thanks!
>>
>>
>> Hagen
>>
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



[Hagen] Answers inline

-----Original Message-----
From: Adrian Klaver <adrian.klaver@aklaver.com>
Sent: Wednesday, November 25, 2020 10:13 AM
To: hagen@datasundae.com; pgsql-general@lists.postgresql.org
Subject: Re: INSERT Trigger to check for existing records : Does this do what I hope it is doing?

On 11/25/20 8:43 AM, hagen@datasundae.com wrote:
> Adrian,
>
> Thanks for your detailed response. That's very kind and much appreciated.
>
> 1. OK that's just me groping for a RETURN statement that doesn't throw a rod.

Things can still work, sort of. I once cranked up and ran(for a short
time) a JD 4020 that had a rod coming through the block. It was smoky and rough, but it ran. OT I know but that image
cameback clear as day. 

I don't actually need to return anything as the goal of the FUNCTION (for the moment)  is to perform updates to a
table.It might be nice to return some sort of confirmation but it's not necessary. Apparently I don't fully understand
theRETURN concept (oh really? 😉. Any suggestions where to research or read? 
>
> 2. I have two tables:
>     a) sfdc which is the baseline - doesn't change -  isn't updated by this FUNTION
>     b) hygiene_119 a new table which has some records (~80%) which are identical to those already in sfdc.
>
> The logic flow is:
>         i) SELECT the dealids from hygiene_119 (latest or new  report dated
> 11/9)

Not seeing where that is done?

[Hagen] I was hoping to do the SELECT from hygiene_119 when I called the FUNCTION same_test() by SELECTING
hygiene_119.dealidand then using that list as an input via same_test (hygiene_119.dealid) 

[Hagen] SELECT dealid sametest(dealid) FROM hygiene_123; (more precisely SELECT hygiene_119.dealid,
same_test(hygiene_119.dealid)FROM hygiene_119;  ). 

>         ii) compare those hygiene_119.dealids with the existing sfdc.dealids
> -  hence the IF $1 (one result from the hygiene_119.dealdid SELECT) is
> IN (matches) any of the sfdc.dealids THEN

Again not seeing any comparison to sfdc?

[Hagen] Assuming the same_test(hygiene_119.dealid) call worked - IF $1 = hygiene_119.dealid[0] in python array
vernacularwould be compared against the list of the SELECT sfdc.dealid results. 

[Hagen] Spelling it out more clearly isn't exactly boosting my confidence in my approach 😉

>         iii) UPDATE hygiene_119 SET status = 'SAME' WHERE dealid = $1; --flag that hygiene_119 record as the SAME or
aduplicate record 
>         iv) ELSE UPDATE hygiene_119 SET status = 'NEW' WHERE dealid = $1; --flag that hygiene_119 record as NEW  or a
newrecord 
>     Once I have inspected the "NEW" records in hygiene_119 I will INSERT then into sfdc. Then rinse and repeat each
weekwith a new report. 

Until the previous questions are addressed the above is not doable.

>
> 3. Not positive the IF is doing what I want,  but if I copy a sfdc.dealid into the same_test() parameter field the
FUNTIONdoes update the hygiene_119.status field properly. To me, it appears I just need a way to iterate through and
insertone hygiene_119.dealid in the same_test parameter field. Then the UPDATE should flag all the hygiene_119 records
asSAME or NEW. Obviously I don't REALLY need both flags as the absence of a flag would indicate status too. 

Before continuing with the function I would try some SELECT functions that do what you want.

>
> Does that articulate the thought process adequately?
>
> Best,
>
> Hagen
>
>
> -----Original Message-----
> From: Adrian Klaver <adrian.klaver@aklaver.com>
> Sent: Wednesday, November 25, 2020 9:07 AM
> To: Hagen Finley <hagen@datasundae.com>;
> pgsql-general@lists.postgresql.org
> Subject: Re: INSERT Trigger to check for existing records : Does this do what I hope it is doing?
>
> On 11/25/20 7:41 AM, Hagen Finley wrote:
>> Folks,
>>
>> Just a quick question. *Using this FUNCTION:*
>>
>>      CREATE OR REPLACE FUNCTION same_test(did numeric)
>>      RETURNS numeric AS $$
>>      BEGIN
>>         IF $1 IN
>>             (SELECT dealid from sfdc)
>>         THEN
>>           UPDATE hygiene_119 SET status = 'SAME';
>>         ELSE
>>             UPDATE hygiene_119 SET status = 'NEW';
>>         END IF;
>>      RETURN NULL;
>>      END;
>>      $$ LANGUAGE plpgsql;
>
> The above is broken in multiple ways:
>
> 1) You have RETURNS numeric and then RETURN NULL; This means you will
> not actually return anything
>
> 2) You have the input argument did but you never use it to restrict your UPDATEs.
>
> 3) Not sure the logic in the IF actually works even if you filtered by did. This assumes that there will always be a
rowin hygiene_119 that matches one in hygiene_112. Given that you setting a 'NEW' flag I'm guessing that is not the
case.
>
> You will need to sketch out the thought process at work here before we can go any further with this.
>
>
>
>>
>> *Does the following query input the the dealids that result from the
>> SELECT statement into the parameter of the sames_test() FUNCTION?*
>
>>
>> Select dealid sametest(dealid) FROM hygiene_123;
>
> Have no idea what that is supposed to do?
>
> If you want to use the function(after fixing it) you would have to do:
>
> select * from some_test(some_number);
>
>>
>> I doubt it does (my query runs a /long time)/ :-). I know I can
>> utilize python to push SELECT results into a array and then run a
>> 'FOR d in dealids' LOOP to feed the FUNCTION parameter but I'd like
>> to learn how to do that with nested SQL statements or FUNCTIONS.
>>
>> Thanks!
>>
>>
>> Hagen
>>
>


--
Adrian Klaver
adrian.klaver@aklaver.com






On 11/25/20 9:13 AM, Adrian Klaver wrote:
> On 11/25/20 8:43 AM, hagen@datasundae.com wrote:
>> Adrian,
>>

> 
> Until the previous questions are addressed the above is not doable.
> 
>>
>> 3. Not positive the IF is doing what I want,  but if I copy a 
>> sfdc.dealid into the same_test() parameter field the FUNTION does 
>> update the hygiene_119.status field properly. To me, it appears I just 
>> need a way to iterate through and  insert one hygiene_119.dealid in 
>> the same_test parameter field. Then the UPDATE should flag all the 
>> hygiene_119 records as SAME or NEW. Obviously I don't REALLY need both 
>> flags as the absence of a flag would indicate status too.
> 
> Before continuing with the function I would try some SELECT functions 
> that do what you want.
> 


Would probably help if I expanded on that. So to find dealids that are 
new, something like:

SELECT
     new_data.dealid
FROM
     hygiene_119 AS new_data
LEFT JOIN
     sfdc sp AS old_data
ON
     new_data.dealid = old_data.dealid
WHERE
     old_data.dealid IS NULL;

The LEFT JOIN will return all the dealids from hygiene_119 and if a 
dealid does not exist in sfdc the fields for it will be set to NULL. So 
by filtering on old_data.dealid IS NULL you find the dealids that exist 
in the new data but not the old.


>> Best,
>>
>> Hagen
>>



-- 
Adrian Klaver
adrian.klaver@aklaver.com