Thread: How to always run UPDATE FROM despite missing records in the source table?

How to always run UPDATE FROM despite missing records in the source table?

From
Alexander Farber
Date:
Good afternoon

I have prepared a simplified test case for my question: https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0

In PostgreSQL 10.6 there are 2 tables:

CREATE TABLE users (
        uid SERIAL PRIMARY KEY,
        created       timestamptz NOT NULL,
        visited       timestamptz NOT NULL,
        ip            inet        NOT NULL,
        lat           double precision,
        lng           double precision
  );

  CREATE TABLE geoip (
        block   inet    PRIMARY KEY,
        lat     double precision,
        lng     double precision
);

CREATE INDEX ON geoip USING SPGIST (block);

which are filled with the following test data:

INSERT INTO users (created, visited, ip) VALUES
  (now(), now(), '1.2.3.4'::inet),
  (now(), now(), '1.2.3.5'::inet),
  (now(), now(), '1.2.3.6'::inet);

INSERT INTO geoip (block, lat, lng) VALUES
 ('1.2.3.0/24', -33.4940, 143.2104),
 ('10.0.0.0/8', 34.6617, 133.9350);

Then in a stored function I run the following UPDATE command -

UPDATE users u SET
    visited = now(),
    ip      = '10.10.10.10'::inet,
    lat     = i.lat,
    lng     = i.lng
FROM geoip i
WHERE u.uid = 1 AND '10.10.10.10'::inet <<= i.block;

(the 1 and the ip address are actually in_uid and in_ip parameters in my stored function).

The above query works well and updates all 4 fields in the users table.

However the following query does not work as intended and does not update any fields, because there is no matching block in the geoip table found:

UPDATE users u SET
    visited = now(),                  -- HOW TO ALWAYS UPDATE THIS FIELD?
    ip      = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
    lat     = i.lat,
    lng     = i.lng
FROM geoip i
WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;

The field visited and ip however should be always updated - regardless if the block was found or not.

Kind of LEFT JOIN, but for UPDATE - how to achieve this please?

The only workaround that I could think of is -

UPDATE users SET
    visited = now(),
    ip      = '20.20.20.20'::inet,
    lat     = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
    lng     = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
WHERE uid = 2;

But that would run the same subquery twice (correct?) and my geoip table is already slow with 3073410 records (and that is why I am trying to cache its lat and lng values in the users table on each user login event)

Regards
Alex

RE: How to always run UPDATE FROM despite missing records in thesource table?

From
Kevin Brannen
Date:

From: Alexander Farber <alexander.farber@gmail.com>

  • The only workaround that I could think of is -

    UPDATE users SET
        visited = now(),
        ip      = '20.20.20.20'::inet,
        lat     = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
        lng     = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
    WHERE uid = 2;

    But that would run the same subquery twice (correct?) and my geoip table is already slow with 3073410 records (and that is why I am trying to cache its lat and lng values in the users table on each user login event)

 

Have you considered using a WITH clause to get the data so the query is only run once?
See section 7.8.2 at https://www.postgresql.org/docs/9.6/queries-with.html

Kevin

This e-mail transmission, and any documents, files or previous e-mail messages attached to it, may contain confidential information. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are hereby notified that any disclosure, distribution, review, copy or use of any of the information contained in or attached to this message is STRICTLY PROHIBITED. If you have received this transmission in error, please immediately notify us by reply e-mail, and destroy the original transmission and its attachments without reading them or saving them to disk. Thank you.

Re: How to always run UPDATE FROM despite missing records in thesource table?

From
Adrian Klaver
Date:
On 1/11/19 4:50 AM, Alexander Farber wrote:
> Good afternoon
> 
> I have prepared a simplified test case for my question: 
> https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0
> 
> In PostgreSQL 10.6 there are 2 tables:
> 
> CREATE TABLE users (
>          uid SERIAL PRIMARY KEY,
>          created       timestamptz NOT NULL,
>          visited       timestamptz NOT NULL,
>          ip            inet        NOT NULL,
>          lat           double precision,
>          lng           double precision
>    );
> 
>    CREATE TABLE geoip (
>          block   inet    PRIMARY KEY,
>          lat     double precision,
>          lng     double precision
> );
> 
> CREATE INDEX ON geoip USING SPGIST (block);
> 
> which are filled with the following test data:
> 
> INSERT INTO users (created, visited, ip) VALUES
>    (now(), now(), '1.2.3.4'::inet),
>    (now(), now(), '1.2.3.5'::inet),
>    (now(), now(), '1.2.3.6'::inet);
> 
> INSERT INTO geoip (block, lat, lng) VALUES
>   ('1.2.3.0/24 <http://1.2.3.0/24>', -33.4940, 143.2104),
>   ('10.0.0.0/8 <http://10.0.0.0/8>', 34.6617, 133.9350);
> 
> Then in a stored function I run the following UPDATE command -
> 
> UPDATE users u SET
>      visited = now(),
>      ip      = '10.10.10.10'::inet,
>      lat     = i.lat,
>      lng     = i.lng
> FROM geoip i
> WHERE u.uid = 1 AND '10.10.10.10'::inet <<= i.block;
> 
> (the 1 and the ip address are actually in_uid and in_ip parameters in my 
> stored function).
> 
> The above query works well and updates all 4 fields in the users table.
> 
> However the following query does not work as intended and does not 
> update any fields, because there is no matching block in the geoip table 
> found:
> 
> UPDATE users u SET
>      visited = now(),                  -- HOW TO ALWAYS UPDATE THIS FIELD?
>      ip      = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
>      lat     = i.lat,
>      lng     = i.lng
> FROM geoip i
> WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;
> 
> The field visited and ip however should be always updated - regardless 
> if the block was found or not.
> 
> Kind of LEFT JOIN, but for UPDATE - how to achieve this please?
> 
> The only workaround that I could think of is -
> 
> UPDATE users SET
>      visited = now(),
>      ip      = '20.20.20.20'::inet,
>      lat     = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
>      lng     = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
> WHERE uid = 2;
> 
> But that would run the same subquery twice (correct?) and my geoip table 
> is already slow with 3073410 records (and that is why I am trying to 
> cache its lat and lng values in the users table on each user login event)

Why not put a test for the block in the function and then use different 
UPDATE's depending on the result?

Pseudo code:

IF ip IN block THEN
    UPDATE users u SET
     visited = now(),
     ip      = '10.10.10.10'::inet,
     lat     = i.lat,
     lng     = i.lng
FROM geoip i
WHERE u.uid = 1;

ELSE

UPDATE users u SET
     visited = now(),
     ip      = '20.20.20.20'::inet
FROM geoip i
WHERE u.uid = 2 ;

END IF;

> 
> Regards
> Alex


-- 
Adrian Klaver
adrian.klaver@aklaver.com


Re: How to always run UPDATE FROM despite missing records in the source table?

From
Andrew Gierth
Date:
>>>>> "Alexander" == Alexander Farber <alexander.farber@gmail.com> writes:

 Alexander> However the following query does not work as intended and
 Alexander> does not update any fields, because there is no matching
 Alexander> block in the geoip table found:

 Alexander> UPDATE users u SET
 Alexander>     visited = now(),                  -- HOW TO ALWAYS UPDATE THIS FIELD?
 Alexander>     ip      = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
 Alexander>     lat     = i.lat,
 Alexander>     lng     = i.lng
 Alexander> FROM geoip i
 Alexander> WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;

 Alexander> The field visited and ip however should be always updated -
 Alexander> regardless if the block was found or not.

 Alexander> Kind of LEFT JOIN, but for UPDATE - how to achieve this please?

It can be done like this (this assumes you want to preserve the previous
values of u.lat/u.lng if the block was not found; if you want to set
them to null instead, then remove the coalesce() calls):

UPDATE users u
   SET visited = now(),
       ip = v.ip,
       lat = coalesce(i.lat, u.lat),
       lng = coalesce(i.lng, u.lng)
  FROM (VALUES ('20.20.20.20'::inet)) v(ip)
       LEFT JOIN geoip i ON (v.ip <<= i.block)
 WHERE u.uid = 2;

 Alexander> But that would run the same subquery twice (correct?) and my
 Alexander> geoip table is already slow with 3073410 records

Slow even with a gist or spgist index? what does the explain analyze
look like?

(You could also try using the ip4r module; I've not done any serious
benchmarking to see if it's faster than the built-in index types, though
it has some theoretical advantages due to not being restricted to CIDR
ranges. In pg versions before the built-in inet type got a gist index
method, ip4r was _the_ way to do ip block lookups for geoip etc.)

-- 
Andrew (irc:RhodiumToad)


Re: How to always run UPDATE FROM despite missing records in thesource table?

From
Alexander Farber
Date:
Thank you Adrian -

On Fri, Jan 11, 2019 at 4:55 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/11/19 4:50 AM, Alexander Farber wrote:
> https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0

Why not put a test for the block in the function and then use different
UPDATE's depending on the result?

I didn't want to use IF and switch that statement to PL/pgSQL from pure SQL, so Andrew's answer

 UPDATE users u SET 
    visited = now(),
    ip = v.ip,
    lat = i.lat,
    lng = i.lng
FROM (VALUES ('20.20.20.20'::inet)) v(ip)
      LEFT JOIN geoip i ON (v.ip <<= i.block)
WHERE u.uid = 2;

suits me better, even though I wonder what is the (VALUES ('20.20.20.20'::inet)) v(ip) construct there, some temporary table which is then LEGT JOINed to the geoip table?

Also, Andrew you have been right - with spgist index my queries against geoip are fast enough, I was looking at the wrong spot in my EXPLAIN ANALYZE output (the average values are slow, I am going to cache them soon)

Regards
Alex

Re: How to always run UPDATE FROM despite missing records in the source table?

From
Andrew Gierth
Date:
>>>>> "Alexander" == Alexander Farber <alexander.farber@gmail.com> writes:

 Alexander> even though I wonder what is the (VALUES
 Alexander> ('20.20.20.20'::inet)) v(ip) construct there, some temporary
 Alexander> table which is then LEGT JOINed to the geoip table?

The SQL spec calls it a <table value constructor>. The VALUES clause
behaves like a SELECT that returns a fixed number of rows (1 or more)
whose columns contain the results of the specified expressions. The
v(ip) part is just a table and column alias (I omitted the optional AS
keyword out of long habit) to name the constructed table.

-- 
Andrew (irc:RhodiumToad)