Thread: OPtimize the performance of a query

OPtimize the performance of a query

From

hmidi slim

Date:

16 January 2018, 19:32:47

Hi,

I have two tables in the same database: geoname and test_table.

The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and so on.

The second table 'test_table' contains only the columns: city, state.

There is no join between the two tables and I want to make a match between the data contained in each of them because I need the result for a farther process.

I wrote this query:

select g.name, t.city

from geoname as g, test_table as t

where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')

and lower(g.country_code) like 'US'

and lower(g.admin1) like lower(t.state)

and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))

The table geoname contains 370260 rows and the table test_table contains 10270 rows.

The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table which contains the country_code and make an inner join with the geoname table or should I use indexs to accelerate the process?

Re: OPtimize the performance of a query

From

hmidi slim

Date:

16 January 2018, 19:44:29

Sorry I forget the lower command when I wrote the code, it is like this: lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like lower('L'))

2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:

>Hi,
>I have two tables in the same database: geoname and test_table.
>The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and so on.
>The second table 'test_table' contains only the columns: city, state.
>There is no join between the two tables and I want to make a match between the data contained in each of them because I need the result for a farther process.
>I wrote this query:
>select g.name, t.city
>from geoname as g, test_table as t
>where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
>and lower(g.country_code) like 'US'
>and lower(g.admin1) like lower(t.state)
>and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))

>The table geoname contains 370260 rows and the table test_table contains 10270 rows.
>The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table which contains the country_code and make an inner join with the geoname table or >should I use indexs to accelerate the process?

Indexes are your friends ☺

I’d certainly add indexes on lower(g.feature_class, g.country_code) and lower(t.state)

Note “and lower(g.country_code) like 'US'” will not return any results as you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')

Why are you using LIKE? Equals (=) is surely correct and probably faster?

Martin.

Re: OPtimize the performance of a query

From

James Keener

Date:

16 January 2018, 19:46:50

Do you have any indecies? https://www.postgresql.org/docs/current/static/indexes-expressional.html might be helpful to you.

Also, EXPLAIN will help you understand how your query is being run and where it can be improved.

https://www.postgresql.org/docs/current/static/using-explain.html

http://postgresguide.com/performance/explain.html

http://jimkeener.com/posts/explain-pg

Jim

On Tue, Jan 16, 2018 at 11:32 AM, hmidi slim <hmidi.slim2@gmail.com> wrote:

Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match between the data contained in each of them because I need the result for a farther process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))

The table geoname contains 370260 rows and the table test_table contains 10270 rows.
The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table which contains the country_code and make an inner join with the geoname table or should I use indexs to accelerate the process?

Re: OPtimize the performance of a query

From

Melvin Davidson

Date:

16 January 2018, 19:48:40

On Tue, Jan 16, 2018 at 11:46 AM, James Keener <jim@jimkeener.com> wrote:

Do you have any indecies? https://www.postgresql.org/docs/current/static/indexes-expressional.html might be helpful to you.

Also, EXPLAIN will help you understand how your query is being run and where it can be improved.

https://www.postgresql.org/docs/current/static/using-explain.html
http://postgresguide.com/performance/explain.html
http://jimkeener.com/posts/explain-pg

Jim

On Tue, Jan 16, 2018 at 11:32 AM, hmidi slim <hmidi.slim2@gmail.com> wrote:
Hi,
I have two tables in the same database: geoname and test_table.
The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and so on.
The second table 'test_table' contains only the columns: city, state.
There is no join between the two tables and I want to make a match between the data contained in each of them because I need the result for a farther process.
I wrote this query:
select g.name, t.city
from geoname as g, test_table as t
where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
and lower(g.country_code) like 'US'
and lower(g.admin1) like lower(t.state)
and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))

The table geoname contains 370260 rows and the table test_table contains 10270 rows.
The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table which contains the country_code and make an inner join with the geoname table or should I use indexs to accelerate the process?

It would have been more helpful if you had included
the actual table structures for both tables.
However, I would start by creating separate indexes on
lower(feature_class)
lower(country_code)
lower(admin1)
lower(name)
lower(city)

That being said, you are better off forcing lowercase on all fields
BEFORE inserting into the table.
EG:

INSERT INTO test_table VALUES (lower(some_key), lower(name), lower(feature_class), ....)

Then you would would not need to use lower() in the indexes or the query.

Please, in the future, always include your version of PostgreSQL and O/S

--

Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

Re: OPtimize the performance of a query

From

Pavel Stehule

Date:

16 January 2018, 19:49:08

2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:

Sorry I forget the lower command when I wrote the code, it is like this: lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like lower('L'))

please, don't do top post.

Your query must be slow. When you use LIKE instead =. It is terrible performance issue.

So don't use "LIKE" is first rule. Second - you can create functional indexes

CREATE INDEX ON geoname ((lower(name)))

Regards

Pavel

2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:

>Hi,
>I have two tables in the same database: geoname and test_table.
>The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and so on.
>The second table 'test_table' contains only the columns: city, state.
>There is no join between the two tables and I want to make a match between the data contained in each of them because I need the result for a farther process.
>I wrote this query:
>select g.name, t.city
>from geoname as g, test_table as t
>where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
>and lower(g.country_code) like 'US'
>and lower(g.admin1) like lower(t.state)
>and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))

>The table geoname contains 370260 rows and the table test_table contains 10270 rows.
>The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table which contains the country_code and make an inner join with the geoname table or >should I use indexs to accelerate the process?

Indexes are your friends ☺

I’d certainly add indexes on lower(g.feature_class, g.country_code) and lower(t.state)

Note “and lower(g.country_code) like 'US'” will not return any results as you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')

Why are you using LIKE? Equals (=) is surely correct and probably faster?

Martin.

Re: OPtimize the performance of a query

From

Alban Hertroys

Date:

16 January 2018, 19:49:30

> On 16 Jan 2018, at 17:32, hmidi slim <hmidi.slim2@gmail.com> wrote:
>
> Hi,
> I have two tables in the same database: geoname and test_table.
> The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and
soon. 
> The second table 'test_table' contains only the columns: city, state.
> There is no join between the two tables and I want to make a match between the data contained in each of them because
Ineed the result for a farther process. 
> I wrote this query:
> select g.name, t.city
> from geoname as g, test_table as t
> where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
> and lower(g.country_code) like 'US'
> and lower(g.admin1) like lower(t.state)
> and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))

That query is equivalent to:

select g.name, t.city
from geoname as g, test_table as t
where false or lower(g.name) = lower(t.city || 'city'));

So those are probably not the results you want.

At the very least, if you're lower-casing column contents, don't compare those to an upper-cased constant ;)
Also, AND has precedence over OR, which is the other reason why my equivalent query is so much shorter.
And finally, LIKE is equivalent to = (equals) without any wildcards.

> The table geoname contains 370260 rows and the table test_table contains 10270 rows.
> The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table
whichcontains the country_code and make an inner join with the geoname table or should I use indexs to accelerate the
process?

Some indices on lower(g.name) and lower(t.city) from your query would be useful, but in that case make sure you take
theconcatenation of 'city' out of the lower()-call in your query. 

Just reading your latest addition - using lower() on constants is just a waste of cycles. It won't hurt your query much
though.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: OPtimize the performance of a query

From

hmidi slim

Date:

16 January 2018, 20:57:36

I changed the operator like and I'm using the operator = .I got the results much faster but I still have another question about operator. For difference should I use '<>' or 'is distinct from' with indexes?

2018-01-16 17:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

Hi

2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:
Sorry I forget the lower command when I wrote the code, it is like this: lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like lower('L'))

please, don't do top post.

Your query must be slow. When you use LIKE instead =. It is terrible performance issue.

So don't use "LIKE" is first rule. Second - you can create functional indexes

CREATE INDEX ON geoname ((lower(name)))

Regards

Pavel

2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:

>Hi,
>I have two tables in the same database: geoname and test_table.
>The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and so on.
>The second table 'test_table' contains only the columns: city, state.
>There is no join between the two tables and I want to make a match between the data contained in each of them because I need the result for a farther process.
>I wrote this query:
>select g.name, t.city
>from geoname as g, test_table as t
>where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
>and lower(g.country_code) like 'US'
>and lower(g.admin1) like lower(t.state)
>and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))

>The table geoname contains 370260 rows and the table test_table contains 10270 rows.
>The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table which contains the country_code and make an inner join with the geoname table or >should I use indexs to accelerate the process?

Indexes are your friends ☺

I’d certainly add indexes on lower(g.feature_class, g.country_code) and lower(t.state)

Note “and lower(g.country_code) like 'US'” will not return any results as you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')

Why are you using LIKE? Equals (=) is surely correct and probably faster?

Martin.

Re: OPtimize the performance of a query

From

Pavel Stehule

Date:

16 January 2018, 21:08:22

2018-01-16 18:57 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:

I changed the operator like and I'm using the operator = .I got the results much faster but I still have another question about operator. For difference should I use '<>' or 'is distinct from' with indexes?

https://en.wikipedia.org/wiki/Posting_style#Top-posting .. please, don't do it.

IS DISTINCT FROM has sense if your data - or your queries has NULL. If not, and it is probably your case, then <> should be preferred.

Regards

Pavel

2018-01-16 17:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:
Hi

2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:
Sorry I forget the lower command when I wrote the code, it is like this: lower(g.country_code) like lower('US')
(lower(g.feature_class) like lowwer('P') or lower(g.feature_class) like lower('L'))

please, don't do top post.

Your query must be slow. When you use LIKE instead =. It is terrible performance issue.

So don't use "LIKE" is first rule. Second - you can create functional indexes

CREATE INDEX ON geoname ((lower(name)))

Regards

Pavel

2018-01-16 17:40 GMT+01:00 Martin Moore <martin.moore@avbrief.com>:

>Hi,
>I have two tables in the same database: geoname and test_table.
>The geoname table contains many columns which are: name, feature_class, feature_code, admin1, admin2,admin3, name and so on.
>The second table 'test_table' contains only the columns: city, state.
>There is no join between the two tables and I want to make a match between the data contained in each of them because I need the result for a farther process.
>I wrote this query:
>select g.name, t.city
>from geoname as g, test_table as t
>where (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')
>and lower(g.country_code) like 'US'
>and lower(g.admin1) like lower(t.state)
>and (lower(g.name) like lower(t.city) or lower(g.name) like lower(t.city || 'city'))

>The table geoname contains 370260 rows and the table test_table contains 10270 rows.
>The query took a long time to accomplish more than half an hour.Should I add another column in the table test_table which contains the country_code and make an inner join with the geoname table or >should I use indexs to accelerate the process?

Indexes are your friends ☺

I’d certainly add indexes on lower(g.feature_class, g.country_code) and lower(t.state)

Note “and lower(g.country_code) like 'US'” will not return any results as you’ve got the ‘US’ in UPPER…. Same as (lower(g.feature_class) like 'P' or lower(g.feature_class) like 'L')

Why are you using LIKE? Equals (=) is surely correct and probably faster?

Martin.

Re: OPtimize the performance of a query

From

hmidi slim

Date:

16 January 2018, 21:35:03

Thank you for your advices and thanks for all people who give me some best practises and useful ideas.

Re: OPtimize the performance of a query

From

Pavel Stehule

Date:

16 January 2018, 21:36:36

2018-01-16 19:35 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com>:

Thank you for your advices and thanks for all people who give me some best practises and useful ideas.

you are welcome

Regards

Pavel

Re: OPtimize the performance of a query

From

Gavin Flower

Date:

16 January 2018, 23:23:10

Hi Hmidi,

On 17/01/18 06:57, hmidi slim wrote:
> I changed the operator like and I'm using the operator = .I got the 
> results much faster but I still have another question about operator. 
> For difference should I use '<>' or 'is distinct from' with indexes?
>
> 2018-01-16 17:49 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com 
> <mailto:pavel.stehule@gmail.com>>:
>
>     Hi
>
>     2018-01-16 17:44 GMT+01:00 hmidi slim <hmidi.slim2@gmail.com
>     <mailto:hmidi.slim2@gmail.com>>:
>
>         Sorry I forget the lower command when I wrote the code, it is
>         like this: lower(g.country_code) like lower('US')
>         (lower(g.feature_class) like lowwer('P') or
>         lower(g.feature_class) like lower('L'))
>
>
>     please, don't do top post.
>
[...]

Top posting is when you reply at the top of the email, rather than at 
the bottom like this.

Bottom posting allows people to see the context before your reply.  You 
can trim excess, or no longer relevant, content - but note the bits that 
you have omitted with '[...]'

Bottom posting is preferred.


Cheers,
Gavin