Thread: How to always run UPDATE FROM despite missing records in the source table?
How to always run UPDATE FROM despite missing records in the source table?
From
Alexander Farber
Date:
Good afternoon
I have prepared a simplified test case for my question: https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0
In PostgreSQL 10.6 there are 2 tables:
CREATE TABLE users (
uid SERIAL PRIMARY KEY,
created timestamptz NOT NULL,
visited timestamptz NOT NULL,
ip inet NOT NULL,
lat double precision,
lng double precision
);
CREATE TABLE geoip (
block inet PRIMARY KEY,
lat double precision,
lng double precision
);
CREATE INDEX ON geoip USING SPGIST (block);
which are filled with the following test data:
INSERT INTO users (created, visited, ip) VALUES
(now(), now(), '1.2.3.4'::inet),
(now(), now(), '1.2.3.5'::inet),
(now(), now(), '1.2.3.6'::inet);
INSERT INTO geoip (block, lat, lng) VALUES
('1.2.3.0/24', -33.4940, 143.2104),
('10.0.0.0/8', 34.6617, 133.9350);
Then in a stored function I run the following UPDATE command -
UPDATE users u SET
visited = now(),
ip = '10.10.10.10'::inet,
lat = i.lat,
lng = i.lng
FROM geoip i
WHERE u.uid = 1 AND '10.10.10.10'::inet <<= i.block;
(the 1 and the ip address are actually in_uid and in_ip parameters in my stored function).
The above query works well and updates all 4 fields in the users table.
However the following query does not work as intended and does not update any fields, because there is no matching block in the geoip table found:
UPDATE users u SET
visited = now(), -- HOW TO ALWAYS UPDATE THIS FIELD?
ip = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
lat = i.lat,
lng = i.lng
FROM geoip i
WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;
The field visited and ip however should be always updated - regardless if the block was found or not.
Kind of LEFT JOIN, but for UPDATE - how to achieve this please?
The only workaround that I could think of is -
UPDATE users SET
visited = now(),
ip = '20.20.20.20'::inet,
lat = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
lng = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
WHERE uid = 2;
But that would run the same subquery twice (correct?) and my geoip table is already slow with 3073410 records (and that is why I am trying to cache its lat and lng values in the users table on each user login event)
I have prepared a simplified test case for my question: https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0
In PostgreSQL 10.6 there are 2 tables:
CREATE TABLE users (
uid SERIAL PRIMARY KEY,
created timestamptz NOT NULL,
visited timestamptz NOT NULL,
ip inet NOT NULL,
lat double precision,
lng double precision
);
CREATE TABLE geoip (
block inet PRIMARY KEY,
lat double precision,
lng double precision
);
CREATE INDEX ON geoip USING SPGIST (block);
which are filled with the following test data:
INSERT INTO users (created, visited, ip) VALUES
(now(), now(), '1.2.3.4'::inet),
(now(), now(), '1.2.3.5'::inet),
(now(), now(), '1.2.3.6'::inet);
INSERT INTO geoip (block, lat, lng) VALUES
('1.2.3.0/24', -33.4940, 143.2104),
('10.0.0.0/8', 34.6617, 133.9350);
Then in a stored function I run the following UPDATE command -
UPDATE users u SET
visited = now(),
ip = '10.10.10.10'::inet,
lat = i.lat,
lng = i.lng
FROM geoip i
WHERE u.uid = 1 AND '10.10.10.10'::inet <<= i.block;
(the 1 and the ip address are actually in_uid and in_ip parameters in my stored function).
The above query works well and updates all 4 fields in the users table.
However the following query does not work as intended and does not update any fields, because there is no matching block in the geoip table found:
UPDATE users u SET
visited = now(), -- HOW TO ALWAYS UPDATE THIS FIELD?
ip = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD?
lat = i.lat,
lng = i.lng
FROM geoip i
WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block;
The field visited and ip however should be always updated - regardless if the block was found or not.
Kind of LEFT JOIN, but for UPDATE - how to achieve this please?
The only workaround that I could think of is -
UPDATE users SET
visited = now(),
ip = '20.20.20.20'::inet,
lat = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
lng = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
WHERE uid = 2;
But that would run the same subquery twice (correct?) and my geoip table is already slow with 3073410 records (and that is why I am trying to cache its lat and lng values in the users table on each user login event)
Regards
Alex
RE: How to always run UPDATE FROM despite missing records in thesource table?
From
Kevin Brannen
Date:
From: Alexander Farber <alexander.farber@gmail.com>
- The only workaround that I could think of is -
UPDATE users SET
visited = now(),
ip = '20.20.20.20'::inet,
lat = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block),
lng = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block)
WHERE uid = 2;
But that would run the same subquery twice (correct?) and my geoip table is already slow with 3073410 records (and that is why I am trying to cache its lat and lng values in the users table on each user login event)
Have you considered using a WITH clause to get the data so the query is only run once?
See section 7.8.2 at https://www.postgresql.org/docs/9.6/queries-with.html
Kevin
Re: How to always run UPDATE FROM despite missing records in thesource table?
From
Adrian Klaver
Date:
On 1/11/19 4:50 AM, Alexander Farber wrote: > Good afternoon > > I have prepared a simplified test case for my question: > https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0 > > In PostgreSQL 10.6 there are 2 tables: > > CREATE TABLE users ( > uid SERIAL PRIMARY KEY, > created timestamptz NOT NULL, > visited timestamptz NOT NULL, > ip inet NOT NULL, > lat double precision, > lng double precision > ); > > CREATE TABLE geoip ( > block inet PRIMARY KEY, > lat double precision, > lng double precision > ); > > CREATE INDEX ON geoip USING SPGIST (block); > > which are filled with the following test data: > > INSERT INTO users (created, visited, ip) VALUES > (now(), now(), '1.2.3.4'::inet), > (now(), now(), '1.2.3.5'::inet), > (now(), now(), '1.2.3.6'::inet); > > INSERT INTO geoip (block, lat, lng) VALUES > ('1.2.3.0/24 <http://1.2.3.0/24>', -33.4940, 143.2104), > ('10.0.0.0/8 <http://10.0.0.0/8>', 34.6617, 133.9350); > > Then in a stored function I run the following UPDATE command - > > UPDATE users u SET > visited = now(), > ip = '10.10.10.10'::inet, > lat = i.lat, > lng = i.lng > FROM geoip i > WHERE u.uid = 1 AND '10.10.10.10'::inet <<= i.block; > > (the 1 and the ip address are actually in_uid and in_ip parameters in my > stored function). > > The above query works well and updates all 4 fields in the users table. > > However the following query does not work as intended and does not > update any fields, because there is no matching block in the geoip table > found: > > UPDATE users u SET > visited = now(), -- HOW TO ALWAYS UPDATE THIS FIELD? > ip = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD? > lat = i.lat, > lng = i.lng > FROM geoip i > WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block; > > The field visited and ip however should be always updated - regardless > if the block was found or not. > > Kind of LEFT JOIN, but for UPDATE - how to achieve this please? > > The only workaround that I could think of is - > > UPDATE users SET > visited = now(), > ip = '20.20.20.20'::inet, > lat = (SELECT lat FROM geoip WHERE '20.20.20.20'::inet <<= block), > lng = (SELECT lng FROM geoip WHERE '20.20.20.20'::inet <<= block) > WHERE uid = 2; > > But that would run the same subquery twice (correct?) and my geoip table > is already slow with 3073410 records (and that is why I am trying to > cache its lat and lng values in the users table on each user login event) Why not put a test for the block in the function and then use different UPDATE's depending on the result? Pseudo code: IF ip IN block THEN UPDATE users u SET visited = now(), ip = '10.10.10.10'::inet, lat = i.lat, lng = i.lng FROM geoip i WHERE u.uid = 1; ELSE UPDATE users u SET visited = now(), ip = '20.20.20.20'::inet FROM geoip i WHERE u.uid = 2 ; END IF; > > Regards > Alex -- Adrian Klaver adrian.klaver@aklaver.com
Re: How to always run UPDATE FROM despite missing records in the source table?
From
Andrew Gierth
Date:
>>>>> "Alexander" == Alexander Farber <alexander.farber@gmail.com> writes: Alexander> However the following query does not work as intended and Alexander> does not update any fields, because there is no matching Alexander> block in the geoip table found: Alexander> UPDATE users u SET Alexander> visited = now(), -- HOW TO ALWAYS UPDATE THIS FIELD? Alexander> ip = '20.20.20.20'::inet, -- HOW TO ALWAYS UPDATE THIS FIELD? Alexander> lat = i.lat, Alexander> lng = i.lng Alexander> FROM geoip i Alexander> WHERE u.uid = 2 AND '20.20.20.20'::inet <<= i.block; Alexander> The field visited and ip however should be always updated - Alexander> regardless if the block was found or not. Alexander> Kind of LEFT JOIN, but for UPDATE - how to achieve this please? It can be done like this (this assumes you want to preserve the previous values of u.lat/u.lng if the block was not found; if you want to set them to null instead, then remove the coalesce() calls): UPDATE users u SET visited = now(), ip = v.ip, lat = coalesce(i.lat, u.lat), lng = coalesce(i.lng, u.lng) FROM (VALUES ('20.20.20.20'::inet)) v(ip) LEFT JOIN geoip i ON (v.ip <<= i.block) WHERE u.uid = 2; Alexander> But that would run the same subquery twice (correct?) and my Alexander> geoip table is already slow with 3073410 records Slow even with a gist or spgist index? what does the explain analyze look like? (You could also try using the ip4r module; I've not done any serious benchmarking to see if it's faster than the built-in index types, though it has some theoretical advantages due to not being restricted to CIDR ranges. In pg versions before the built-in inet type got a gist index method, ip4r was _the_ way to do ip block lookups for geoip etc.) -- Andrew (irc:RhodiumToad)
Re: How to always run UPDATE FROM despite missing records in thesource table?
From
Alexander Farber
Date:
Thank you Adrian -
On Fri, Jan 11, 2019 at 4:55 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/11/19 4:50 AM, Alexander Farber wrote:
> https://www.db-fiddle.com/f/22jfWnsvqD8hVeFPXsyLbV/0
Why not put a test for the block in the function and then use different
UPDATE's depending on the result?
I didn't want to use IF and switch that statement to PL/pgSQL from pure SQL, so Andrew's answer
UPDATE users u SET
visited = now(),
ip = v.ip,
lat = i.lat,
lng = i.lng
FROM (VALUES ('20.20.20.20'::inet)) v(ip)
LEFT JOIN geoip i ON (v.ip <<= i.block)
WHERE u.uid = 2;
suits me better, even though I wonder what is the (VALUES ('20.20.20.20'::inet)) v(ip) construct there, some temporary table which is then LEGT JOINed to the geoip table?
Also, Andrew you have been right - with spgist index my queries against geoip are fast enough, I was looking at the wrong spot in my EXPLAIN ANALYZE output (the average values are slow, I am going to cache them soon)
Regards
Alex
Re: How to always run UPDATE FROM despite missing records in the source table?
From
Andrew Gierth
Date:
>>>>> "Alexander" == Alexander Farber <alexander.farber@gmail.com> writes: Alexander> even though I wonder what is the (VALUES Alexander> ('20.20.20.20'::inet)) v(ip) construct there, some temporary Alexander> table which is then LEGT JOINed to the geoip table? The SQL spec calls it a <table value constructor>. The VALUES clause behaves like a SELECT that returns a fixed number of rows (1 or more) whose columns contain the results of the specified expressions. The v(ip) part is just a table and column alias (I omitted the optional AS keyword out of long habit) to name the constructed table. -- Andrew (irc:RhodiumToad)