Thread: How to speed up product code and subcode match

How to speed up product code and subcode match

From
Andrus
Date:
Hi!

Price list of main products vordlusajuhinnak contains 3 prices for 
product (column toode) and has 39433 products:

     create table vordlusajuhinnak( toode varchar(60), n2 numeric(8,2), 
n3 numeric(8,2), n4 numeric(8,2) );

toode column in unique, may be primary key in table and contains upper 
case letters, digits and - characters.

product table (toode) contains 733021 products:

     CREATE TABLE toode (
         grupp character(1),
         toode character(60) primary key,
         ... lot of other columns
       );

Both tables have pattern indexes to speed up queries:

     CREATE INDEX toode_toode_pattern_idx
         ON toode (toode bpchar_pattern_ops ASC NULLS LAST);
     -- This index is probably not used, should removed:
     CREATE INDEX vordlusajuhinnak_toode_pattern_idx ON 
vordlusajuhinnak(toode bpchar_pattern_ops);

Product table as both main products and subproducts with sizes. Size is 
separated by main product code always by / character:


     SHOE1-BLACK
     SHOE1-BLACK/38
     SHOE1-BLACK/41
     SHOE1-BLACK/42
     SHOE1-BLACK/43
     SHOE2/XXL
     SHOE2/L

Product codes contain upper characers only in this table also.

Trying to get prices for all products using

     create table peatoode as
     select toode.toode , n2, n3, n4
     from toode, vordlusajuhinnak
     where  toode.toode between vordlusajuhinnak.toode and 
vordlusajuhinnak.toode||'/z'

Takes 4.65 hours. How to speed this query up?
Output from explain:

     > "Nested Loop  (cost=0.55..272273178.12 rows=3202240012 width=78)" "
     > ->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433 
width=32)" "  ->  Index Only Scan using toode_pkey on toode
     > (cost=0.55..6092.62 rows=81207 width=60)" "        Index Cond: (toode
     > >= (vordlusajuhinnak.toode)::bpchar)" "        Filter: 
((toode)::text <= ((vordlusajuhinnak.toode)::text || '/z'::text))"

Using  PostgreSQL 13.2, compiled by Visual C++ build 1900, 64-bit in 
Windows server and psqlODBC driver.
Upgrading Postgres is possible, if this helps.

Tried also using like:

     WHERE toode.toode=vordlusajuhinnak.toode OR
       toode.toode LIKE vordlusajuhinnak.toode||'/%'

Posted also in

https://stackoverflow.com/questions/76311957/how-to-match-character-columns-with-separated-subcolumns

Andrus.




Re: How to speed up product code and subcode match

From
Andrus
Date:

Hi!

I ran

analyze toode;
create index vordlusajuhinnak_toode_pattern_idx on vordlusajuhinnak(toode bpchar_pattern_ops);

create index vordlusajuhinnak_toode_idx on vordlusajuhinnak(toode);
analyze vordlusajuhinnak;

Select runs now more than one hour. Output from explain

explain create table peatoode as
select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
WHERE toode.toode=vordlusajuhinnak.toode OR  toode.toode LIKE vordlusajuhinnak.toode||'/%'

"Gather  (cost=1000.55..443361894.55 rows=143828910 width=78)"
"  Workers Planned: 2"
"  ->  Nested Loop  (cost=0.55..428978003.55 rows=59928712 width=78)"
"        Join Filter: ((toode.toode = (vordlusajuhinnak.toode)::bpchar) OR (toode.toode ~~ ((vordlusajuhinnak.toode)::text || '/%'::text)))"
"        ->  Parallel Index Only Scan using toode_pkey on toode  (cost=0.55..95017.93 rows=303869 width=60)"
"        ->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433 width=32)"

with

Set enable_nestloop to off;

explain output is:

"Gather  (cost=10000001000.55..10443361906.55 rows=143828910 width=78)"
"  Workers Planned: 2"
"  ->  Nested Loop  (cost=10000000000.55..10428978015.55 rows=59928712 width=78)"
"        Join Filter: ((toode.toode = (vordlusajuhinnak.toode)::bpchar) OR (toode.toode ~~ ((vordlusajuhinnak.toode)::text || '/%'::text)))"
"        ->  Parallel Index Only Scan using toode_pkey on toode  (cost=0.55..95029.93 rows=303869 width=60)"
"        ->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433 width=32)"

How to speed it up?

Andrus.

23.05.2023 14:32 Bzm@g kirjutas:
Great,

However I think it is still way to slow. 
Next step is to run analyze also for the other table  vordlusajuhinnak. 

And make sure you have an index on vordlusajuhinnak.toode similar to the index on toode.toode

--
Boris


Am 23.05.2023 um 12:56 schrieb Andrus <kobruleht2@hot.ee>:



Hi!

I ran analyze firma2.toode and changed where clause to use like:

create table peatoode as
select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
WHERE toode.toode=vordlusajuhinnak.toode OR  toode.toode LIKE vordlusajuhinnak.toode||'/%'

In this case it took 37 minutes, returned 277966 rows.

Thank you for help.

Andrus.

23.05.2023 11:24 Bzm@g kirjutas:
Also your row count is way off I guess. Did you ever run analyze bigtable?

--
Boris


Am 23.05.2023 um 10:22 schrieb bzm@2bz.de:

Hi there,

I guess the main problem is the nested loop. 

As a quick recheck what happened if you run your query Without nested loop?

This is not a solution but a quickt test

In a Session 

Set enable_nestedloop = off;
Explain Select your query ;
--
Boris


Am 23.05.2023 um 08:53 schrieb Andrus <kobruleht2@hot.ee>:

Hi!

Price list of main products vordlusajuhinnak contains 3 prices for product (column toode) and has 39433 products:
   create table vordlusajuhinnak( toode varchar(60), n2 numeric(8,2), n3 numeric(8,2), n4 numeric(8,2) );

toode column in unique, may be primary key in table and contains upper case letters, digits and - characters.

product table (toode) contains 733021 products:
   CREATE TABLE toode (       grupp character(1),       toode character(60) primary key,       ... lot of other columns     );

Both tables have pattern indexes to speed up queries:
   CREATE INDEX toode_toode_pattern_idx       ON toode (toode bpchar_pattern_ops ASC NULLS LAST);   -- This index is probably not used, should removed:   CREATE INDEX vordlusajuhinnak_toode_pattern_idx ON vordlusajuhinnak(toode bpchar_pattern_ops);

Product table as both main products and subproducts with sizes. Size is separated by main product code always by / character:

   SHOE1-BLACK   SHOE1-BLACK/38   SHOE1-BLACK/41   SHOE1-BLACK/42   SHOE1-BLACK/43   SHOE2/XXL   SHOE2/L

Product codes contain upper characers only in this table also.

Trying to get prices for all products using
   create table peatoode as   select toode.toode , n2, n3, n4   from toode, vordlusajuhinnak   where  toode.toode between vordlusajuhinnak.toode and vordlusajuhinnak.toode||'/z'

Takes 4.65 hours. How to speed this query up?
Output from explain:

"Nested Loop  (cost=0.55..272273178.12 rows=3202240012 width=78)" "
->  Seq Scan on vordlusajuhinnak  (cost=0.00..721.33 rows=39433 width=32)" "  ->  Index Only Scan using toode_pkey on toode
(cost=0.55..6092.62 rows=81207 width=60)" "        Index Cond: (toode
= (vordlusajuhinnak.toode)::bpchar)" "        Filter: ((toode)::text <= ((vordlusajuhinnak.toode)::text || '/z'::text))"
Using  PostgreSQL 13.2, compiled by Visual C++ build 1900, 64-bit in Windows server and psqlODBC driver.
Upgrading Postgres is possible, if this helps.

Tried also using like:
   WHERE toode.toode=vordlusajuhinnak.toode OR     toode.toode LIKE vordlusajuhinnak.toode||'/%'

Posted also in

https://stackoverflow.com/questions/76311957/how-to-match-character-columns-with-separated-subcolumns

Andrus.



Re: How to speed up product code and subcode match

From
Tom Lane
Date:
Andrus <kobruleht2@hot.ee> writes:
> Product table as both main products and subproducts with sizes. Size is 
> separated by main product code always by / character:

>      SHOE1-BLACK
>      SHOE1-BLACK/38
>      SHOE1-BLACK/41
>      SHOE1-BLACK/42
>      SHOE1-BLACK/43
>      SHOE2/XXL
>      SHOE2/L

You could probably have devised a worse data representation if
you really tried, but it would have taken some effort.  Separate
the product code and size into two columns --- if there's somebody
who really wants to see them in the above format, give them a
view or generated column.  Then instead of the impossible-to-optimize
queries you showed, you could do something like

select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where toode.toode = vordlusajuhinnak.toode;

Anytime you're trying to join two tables on something that isn't
a plain equality condition (or ANDed conditions), you're in for
a world of hurt.

            regards, tom lane



Re: How to speed up product code and subcode match

From
Andrus
Date:

Hi!

Separate
the product code and size into two columns --- if there's somebody
who really wants to see them in the above format, give them a
view or generated column.  Then instead of the impossible-to-optimize
queries you showed, you could do something like

select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where toode.toode = vordlusajuhinnak.toode;

Can function index

create index on toode ( split_part( toode, '/',1) )

and query

select toode.toode , n2, n3, n4

from toode, vordlusajuhinnak

where split_part( toode.toode, '/',1) = vordlusajuhinnak.toode;
used and keeping existing table structure? Functional index should produce same speed improvement as using separate column?

Andrus.

Re: How to speed up product code and subcode match

From
Andrus
Date:

Hi!

Using index

create index on toode ( split_part( toode, '/',1) )

and query

select toode.toode , n2, n3, n4

from toode, vordlusajuhinnak

where split_part( toode.toode, '/',1) = vordlusajuhinnak.toode;

reduces run time to 5 minutes.

Andrus.


23.05.2023 17:26 Andrus kirjutas:

Hi!

Separate
the product code and size into two columns --- if there's somebody
who really wants to see them in the above format, give them a
view or generated column.  Then instead of the impossible-to-optimize
queries you showed, you could do something like

select toode.toode , n2, n3, n4
from toode, vordlusajuhinnak
where toode.toode = vordlusajuhinnak.toode;

Can function index

create index on toode ( split_part( toode, '/',1) )

and query

select toode.toode , n2, n3, n4

from toode, vordlusajuhinnak

where split_part( toode.toode, '/',1) = vordlusajuhinnak.toode;
used and keeping existing table structure? Functional index should produce same speed improvement as using separate column?

Andrus.