Thread: Index on integer or on string field

Index on integer or on string field

From
Job
Date:
Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve
performanceand reduce workload. 

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with
numericalinteger values, the 50 recurring names. 

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Thank you!
Francesco

Re: Index on integer or on string field

From
Arthur Silva
Date:

You should probably experiment with a btree-gin index on those.

Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it> escreveu:
Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

R: Index on integer or on string field

From
Job
Date:
Hello Arthur!

So, i read that btree-gin have got "the ability to enforce uniqueness".
 
If in this 10.millions long table i have, in index, 50 recurring values, i can leave the alphabetical field and change to btree-gin the index on it?!

Thank you!
Francesco
 

Da: Arthur Silva [arthurprs@gmail.com]
Inviato: venerdì 15 maggio 2015 17.26
A: Job
Cc: pgsql-general@postgresql.org
Oggetto: Re: [GENERAL] Index on integer or on string field

You should probably experiment with a btree-gin index on those.

Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it> escreveu:
Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: R: Index on integer or on string field

From
Rob Sargent
Date:
Are you saying your indexed field has only 50 distinct values? Seems a horrible candidate for an index. Might be good to partition on those fifty values but ten million records probably doesn't warrant that. 

Sent from my iPhone

On May 15, 2015, at 9:34 AM, Job <Job@colliniconsulting.it> wrote:

Hello Arthur!

So, i read that btree-gin have got "the ability to enforce uniqueness".
 
If in this 10.millions long table i have, in index, 50 recurring values, i can leave the alphabetical field and change to btree-gin the index on it?!

Thank you!
Francesco
 

Da: Arthur Silva [arthurprs@gmail.com]
Inviato: venerdì 15 maggio 2015 17.26
A: Job
Cc: pgsql-general@postgresql.org
Oggetto: Re: [GENERAL] Index on integer or on string field

You should probably experiment with a btree-gin index on those.

Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it> escreveu:
Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: R: Index on integer or on string field

From
Arthur Silva
Date:

Yes that's my suggestion. Btree-Gin deals with lots of repeated values much better than the Btree index as repeated keys are only stored once.

Em 15/05/2015 12:38, "Job" <Job@colliniconsulting.it> escreveu:
Hello Arthur!

So, i read that btree-gin have got "the ability to enforce uniqueness".
 
If in this 10.millions long table i have, in index, 50 recurring values, i can leave the alphabetical field and change to btree-gin the index on it?!

Thank you!
Francesco
 

Da: Arthur Silva [arthurprs@gmail.com]
Inviato: venerdì 15 maggio 2015 17.26
A: Job
Cc: pgsql-general@postgresql.org
Oggetto: Re: [GENERAL] Index on integer or on string field

You should probably experiment with a btree-gin index on those.

Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it> escreveu:
Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: R: Index on integer or on string field

From
William Dunn
Date:
Hello Francesco,

You should probably set timing on, run an explain analyze, and use pgbadger to diagnose your performance issue.

While it may be the case that comparison in the index might be slightly faster because of the modulo arithmetic, those in-memory operations are extremely fast and it is likely that the seek in that index is the fastest part of your query. And since you only have 50 distinct values the btree is probably extremely shallow and there will be very few comparisons anyway.

I don't know much about your query but I suspect that the issue is that your index scan is not selective enough so Postgres needs to scan a lot from disk (which is extremely slow). If you want to improve the performance you should first try to make the query as selective as possible, and try to put an index on a more selective column of the WHERE clause.

If you really must rely primarily on that column which has only 50 distinct values you can try periodically running a CLUSTER command on the table for that column index (doc: http://www.postgresql.org/docs/devel/static/sql-cluster.html), or partition the table (doc: http://www.postgresql.org/docs/devel/static/ddl-partitioning.html) so that the data you are scanning is close together on disk and you can get as much of it per IO operation as possible.

Will J. Dunn

On Fri, May 15, 2015 at 1:32 PM, Arthur Silva <arthurprs@gmail.com> wrote:

Yes that's my suggestion. Btree-Gin deals with lots of repeated values much better than the Btree index as repeated keys are only stored once.

Em 15/05/2015 12:38, "Job" <Job@colliniconsulting.it> escreveu:
Hello Arthur!

So, i read that btree-gin have got "the ability to enforce uniqueness".
 
If in this 10.millions long table i have, in index, 50 recurring values, i can leave the alphabetical field and change to btree-gin the index on it?!

Thank you!
Francesco
 

Da: Arthur Silva [arthurprs@gmail.com]
Inviato: venerdì 15 maggio 2015 17.26
A: Job
Cc: pgsql-general@postgresql.org
Oggetto: Re: [GENERAL] Index on integer or on string field

You should probably experiment with a btree-gin index on those.

Em 15/05/2015 12:22, "Job" <Job@colliniconsulting.it> escreveu:
Hello,

i have a table of about 10 millions of records, with the index on a string field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify, with numerical integer values, the 50 recurring names.

Is index are integer and not characteral, performance are better and workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Index on integer or on string field

From
Scott Marlowe
Date:
On Fri, May 15, 2015 at 9:18 AM, Job <Job@colliniconsulting.it> wrote:
> Hello,
>
> i have a table of about 10 millions of records, with the index on a string field.
> Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve
performanceand reduce workload. 
>
> The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify,
withnumerical integer values, the 50 recurring names. 
>
> Is index are integer and not characteral, performance are better and workload reduces?
>
> Is there any comparisons?

Have you considered using a multi-column index here? if there's a more
selective field you could index along with your rather non-selective
one that might work better. But it's hard to tell without looking at
you database usage etc.


Re: Index on integer or on string field

From
"FarjadFarid\(ChkNet\)"
Date:
Hi,

My approach would be to improve the uniqueness of each record/row. Otherwise
you'll have to traverse the entire table for every query. At 100/200 queries
per second you are asking for trouble on several fronts. Including crashing
your hard disk faster than need be.

Hope this helps. Good luck.

Best Regards


Farjad Farid


-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Job
Sent: 15 May 2015 16:19
To: pgsql-general@postgresql.org
Subject: [GENERAL] Index on integer or on string field

Hello,

i have a table of about 10 millions of records, with the index on a string
field.
Actually is alphabetical; since queries are about 100/200 per seconds, i was
looking for a better way to improve performance and reduce workload.

The unique values, of that fields, are about the 50 (category name), and we
could create a second table to codify, with numerical integer values, the 50
recurring names.

Is index are integer and not characteral, performance are better and
workload reduces?

Is there any comparisons?

Thank you!
Francesco

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



Re: Index on integer or on string field

From
Yves Dorfsman
Date:
On 2015-05-15 18:31, Scott Marlowe wrote:
> On Fri, May 15, 2015 at 9:18 AM, Job <Job@colliniconsulting.it> wrote:
>> i have a table of about 10 millions of records, with the index on a string field.
>> Actually is alphabetical; since queries are about 100/200 per seconds, i was looking for a better way to improve
performanceand reduce workload. 
>>
>> The unique values, of that fields, are about the 50 (category name), and we could create a second table to codify,
withnumerical integer values, the 50 recurring names. 
>>
>> Is index are integer and not characteral, performance are better and workload reduces?
>>
>> Is there any comparisons?
>
> Have you considered using a multi-column index here? if there's a more
> selective field you could index along with your rather non-selective
> one that might work better. But it's hard to tell without looking at
> you database usage etc.

To add to Scott suggestion, think about what columns you include in your
queries when you are looking for a unique row (what columns are on the right
hand side of the WHEN clause). This will help you narrow down what makes
unique conditions on your table, from which you can create a better index.

--
http://yves.zioup.com
gpg: 4096R/32B0F416