Thread: Please suggest me on my table design (indexes!)
Hi guys, I'm trying to optimize a simple table meant to contain users. My table at the moment is:

-----
CREATE TABLE contact(
  id BIGSERIAL PRIMARY KEY,
  company_id BIGINT,
  code varchar(10),
  company_name varchar(120),
  name varchar(120),
  surname varchar(120),
  phone varchar(80),
  email varchar(80),
  kind varchar(8)
);
-----

I use this layout to store 3 kinds of users: Private, Company and Company Employee. The 'kind' column will contain just 'private', 'company' or 'employee', but strictly speaking it is unnecessary, since I can tell what kind a user is by these rules:
Private are stand-alone users; company_name and company_id are always NULL;
Company rows have a company_name;
Employee rows have the company name and the company_id (the id of the company's row).

Example:

id | company_id | code | company_name | name | surname | phone | email            | kind
1  | NULL       | C001 | Sunday Inc.  | John | Doe     | 88888 | j@sunday.com     | company
2  | 1          | E001 | Sunday Inc.  | Paul | Smith   | 77777 | smith@sunday.com | employee
3  | NULL       | P001 | NULL         | Rose | Mary    | 66666 | rose@mary.com    | private

So, first of all: does this layout look good? I used to keep employees in a separate table, because employees usually carry only a little data (name, surname, direct email and direct phone; all the addresses, bank data, etc. belong to the company), but now I prefer this way to avoid constant inner joins.

I expect my users will search the contact table only on the fields company_name, name, surname, email and code. That kind of query won't be WHERE company_name = '$x' but rather WHERE company_name LIKE '$x%', both because I use an autocomplete field for the quick search and because, well, that's how users search data (in my experience).
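If the 'kind' column stays, one option is to let the database itself enforce the three rules above, so a row can never contradict its kind. This is only a sketch; the constraint names are invented:

```sql
-- Enforce the private/company/employee rules from the post
-- (assumes the table is named contact, as in the queries below):
ALTER TABLE contact
  ADD CONSTRAINT contact_kind_chk
      CHECK (kind IN ('private', 'company', 'employee')),
  ADD CONSTRAINT contact_kind_rules_chk CHECK (
      (kind = 'private'  AND company_name IS NULL     AND company_id IS NULL)
   OR (kind = 'company'  AND company_name IS NOT NULL AND company_id IS NULL)
   OR (kind = 'employee' AND company_name IS NOT NULL AND company_id IS NOT NULL)
  );
```

With these in place, an INSERT that, say, sets kind = 'private' together with a company_id would be rejected instead of silently stored.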
So I created these indexes, to let the query planner use them for the LIKE queries:

-----
CREATE INDEX contact_companyname_i ON contact
    USING btree (lower(company_name::text) varchar_pattern_ops);
-- the same kind of index exists for name and surname
-- the email must be unique:
CREATE UNIQUE INDEX contact_email_i ON contact
    USING btree (lower(email::text) varchar_pattern_ops);
-----

The phone column will also be searched with a LIKE condition, but it doesn't need the case-insensitive lower() trick:

-----
CREATE INDEX contact_n_phone_i ON contact USING btree (phone varchar_pattern_ops);
-----

However, for the email, code, id and company_id columns I also want to allow an index search on the exact value, so I added these indexes too:

-----
CREATE INDEX contact_n_email_i ON contact USING btree (email);
CREATE INDEX contact_n_code_i ON contact USING btree (code);
CREATE UNIQUE INDEX contact_pkey ON contact USING btree (id); -- PRIMARY KEY
CREATE INDEX contact_n_idcompany_i ON contact USING btree (company_id);
-----

Again, feel free to suggest/insult me if this approach looks bad (I'm here to learn ;)

Well, the table will have other fields that don't need to be searched directly (like the addresses) and so don't need indexes. My doubt is: am I using too many indexes? Will my insert/delete/update queries be too slow, and is the select optimization worth the price? (And does this really speed up the select queries?) Consider that my application won't do many inserts/deletes/updates in sequence.
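For reference, a sketch of the query shapes I expect these indexes to serve. The key point is that the WHERE expression has to match the indexed expression exactly, lower() included:

```sql
-- Can use contact_companyname_i (expression index + varchar_pattern_ops):
SELECT id FROM contact WHERE lower(company_name) LIKE 'sun%';

-- Can use the plain btree contact_n_email_i (exact, case-sensitive match):
SELECT id FROM contact WHERE email = 'j@sunday.com';

-- Cannot use a btree prefix scan: the leading wildcard defeats it.
SELECT id FROM contact WHERE lower(company_name) LIKE '%day%';
```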
Almost forgot: one of the heavier select queries can be:

-----
SELECT contact.id, contact.company_id,
       contact.name AS nome, contact.surname AS cognome,
       contact.email AS email, contact.company_name AS azienda
FROM contact
WHERE (
    lower(contact.company_name) LIKE 'smit%'
    OR lower(contact.name) LIKE 'smit%'
    OR lower(contact.surname) LIKE 'smit%'
    OR contact.phone LIKE 'smit%'
    OR lower(contact.code) LIKE 'smit%'
    OR lower(contact.email) LIKE 'smit%'
)
ORDER BY contact.company_name ASC, contact.company_id DESC;
-----

And this is the query plan: http://explain.depesz.com/s/Il
It looks OK in the explain output, that is, the cost isn't too high. So what's the problem?
The `problem` is that I don't know if having so many indexes will cause problems as the data size grows. And I'm not even sure this design is truly reliable. For example, to know how many employees each company has, I have to run this query:

-----
EXPLAIN ANALYZE
SELECT c.id, c.company_name, count(e.id) AS num_employee
FROM contact AS c
LEFT JOIN contact AS e ON (c.id = e.company_id)
WHERE c.kind = 'company'
GROUP BY c.id, c.company_name;
-----

That doesn't seem very efficient: http://explain.depesz.com/s/Q0m

(I loaded some test data from http://www.generatedata.com/#generator, so now I have 23893 rows in the table. Oh, and I added an index on the 'kind' column, named 'tipo' in my actual schema, too.)

On 23 Jun, 13:47, gryz...@gmail.com (Grzegorz Jaśkiewicz) wrote:
> it looks ok on explain, that is - the cost isn't too high.
> So what's the problem ?
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
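As an aside, the same count can be sketched without the self-join, assuming the kind/company_id rules hold. Note this variant only lists companies that have at least one employee, unlike the LEFT JOIN version, which also reports zero counts:

```sql
SELECT company_id, count(*) AS num_employee
FROM contact
WHERE kind = 'employee'
GROUP BY company_id;
```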
2009/6/23 DaNieL..! <daniele.pignedoli@gmail.com>:
> The `problem` is that i dont know if having so many indexes will raise
> problems as the data dimension grown.
> That seem to be not very efficient: http://explain.depesz.com/s/Q0m

Well, this is slow because for some reason postgres decided to use a seq scan on contact e.

As for speed with many indices: a btree on varchar is going to be slower than one on, say, integer or any other fixed-length type. That is due to the nature of the index. Having said that, if you expect a lot of repetition, split it out / normalize it. Index performance also hurts when the data has many distinct values (worst case, all varchar rows are different and don't share many leaves in the index). So it is always beneficial to have a separate table if data is redundant, especially when it is text/varchar/bytea.

Also, index size grows pretty badly in postgresql when you modify a table's content a lot. Rule of thumb: the bigger the index gets relative to the data size, the worse.

--
GJ
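The kind of split suggested here might look something like this (a sketch only; the table and column names are invented, not from the thread):

```sql
-- Pull the repeated company data out into its own table:
CREATE TABLE company (
    id   BIGSERIAL PRIMARY KEY,
    name varchar(120) NOT NULL
);

-- People reference the company instead of repeating its name:
CREATE TABLE person (
    id         BIGSERIAL PRIMARY KEY,
    company_id BIGINT REFERENCES company (id),  -- NULL for private contacts
    name       varchar(120),
    surname    varchar(120),
    email      varchar(80)
);
```

The company name is then stored once per company, so its index stays small no matter how many employees repeat it.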
Yes, after my post I tried the version with 2 separate tables (a copy of the contact table containing just the employees) and, to my surprise, the query plan looks identical with 1 big table and with the 2 split tables. This sounds a bit strange to me, because in my test the employee rows are about 13K, the companies about 3K, and the private users around 7K. So I thought that moving the employees to another table would greatly decrease the index weight on the original table, and so raise the speed for the remaining contact table (containing just private and company users).

On 23 Jun, 16:29, gryz...@gmail.com (Grzegorz Jaśkiewicz) wrote:
> Well, this is slow, because for some reason postgres decided to use
> seq scan on contact e.
IMHO, running queries on 23k-ish rows isn't liable to stress any reasonably modern server, and several times that amount shouldn't either, for simple "LIKE" searches. What kind of growth are you expecting?

> -----Original Message-----
> From: DaNieL..!
> Sent: Tuesday, June 23, 2009 9:44 AM
> Subject: Re: [GENERAL] Please suggest me on my table design (indexes!)
>
> Yes, after my post i've tryed the versione with 2 separate table (a
> copy of the contact table) with inside just the employees, and, with
> my surprise, the query planner looks identical, both with 1 big table
> and with 2 splitted table.
DaNieL wrote:
> I use this layout in order to store 3 kind of users: Private, Company
> and Company's Employee..
> [schema and kind rules snipped]

The layout looks fine, although I would not use BIGSERIAL unless you expect to exceed 2.1 billion records.

> Now, i aspect that my users will search the contact table just for the
> fields company_name, name, surname, email, code.
> That kind of query cant be as
> WHERE company_name = '$x'
> but will be much like
> WHERE company_name LIKE '$x%',
> both becose i use an autocomplete field for the quick search, both
> becose.. well, that's how users search data's (in my experience).
What I have done with searches on small strings, where the user is unsure what they are looking for or of the spelling, is something like this:

  where substr(company_name, 1, length($searchtext$UserSearchString$searchtext$)) ilike $searchtext$UserSearchString$searchtext$

and mix it with soundex. This way the user gets a list of possible matches with only a handful of typed characters. The drawback of this type of search is that it can't be indexed.

> So i created those index, to let the query planner use the indexes in
> the LIKE query:
>
> My doubt is: am i using too many indexes?
> Will my insert/delete/update queryes be too slow, and does the select
> optimization worth the price? (and, does this way really optimize the
> selects queryes?)

The more indexes you have, the slower updates will be, yet not by a horrible amount. The answer is: it depends on the workload; can the system absorb the overhead of the indexes and still give adequate response times on queries?

> Consider that my application wont do many insert-delete-update
> sequentially.
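A concrete sketch of that pattern, with the search string inlined instead of dollar-quoted. Note that soundex() is not built in; it comes from the fuzzystrmatch contrib module, which has to be installed first:

```sql
-- substr() trims the column to the length of the input, so short,
-- partial input still matches; soundex() catches misspellings.
SELECT id, company_name
FROM contact
WHERE substr(company_name, 1, length('smit')) ILIKE 'smit'
   OR soundex(company_name) = soundex('smit');
```

Neither predicate is a plain prefix LIKE, so as noted above the planner cannot use the btree indexes for this; it will scan.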
I thought to analyze the input characters to avoid useless searches: for example, if the input is EXXXXX, where X is a number, it is the user code, and I'll search just that field; otherwise, if the input is an email, I'll look only at the email column.

But things get a little deeper with the custom fields. Like in wordpress, if for example someone needs 2 email fields for the user table, they can define as many custom fields as they want, with 2 tables:

-----
CREATE TABLE contact_custom_field (
    id serial NOT NULL UNIQUE,
    kind varchar(20) UNIQUE,
    input varchar(10)
);
CREATE TABLE contact_custom_values (
    id serial NOT NULL UNIQUE,
    id_contact integer NOT NULL REFERENCES contact (id),
    kind varchar(20) REFERENCES contact_custom_field (kind),
    value varchar(250)
);
-----

That layout is just a draft. So, if a user needs 2 email fields, I'll have this situation:

table contact_custom_field (I'll need a shorter name ;):
1 | email | manual

table contact_custom_values:
1 | 20 | email | something@example.com

And so on for every contact who has 2 email addresses. And obviously the 'master search' query must look even in those custom fields (just in some kinds of them). So, the situation is a bit complicated -.-

> What i have done with searches on small strings where the user is unsure
> what they are looking for or the spelling, I do something like this
> and mix it with soundex.
> Draw back is this type of search is it can't be indexed.
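One way the 'master search' could reach into the custom fields, sketched against the draft tables above (in practice only the 'searchable' kinds would be included, and the contact id 20 is just the example row):

```sql
-- Rows matching on the built-in email column...
SELECT c.id, c.name, c.surname
FROM contact AS c
WHERE lower(c.email) LIKE 'smit%'
UNION
-- ...plus rows matching on any custom field of kind 'email'.
SELECT c.id, c.name, c.surname
FROM contact AS c
JOIN contact_custom_values AS v ON v.id_contact = c.id
WHERE v.kind = 'email'
  AND lower(v.value) LIKE 'smit%';
```

UNION (rather than UNION ALL) deduplicates a contact that matches on both its built-in and its custom email.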
Yes, surely bigserial is overdimensioned. Anyway, I need to keep the postgresql cpu and ram usage as low as possible, same for the disk usage. I've started an application that would be similar to shopify.com, but free and opensource (it will be opensource only when I finish at least the first beta), so everyone can register, everyone will have their own database, with the same structure for all, and even the 'engine', php and python scripts, will be the same for everyone. And all that must run on one server, at the moment.

Till now I have tested with 10 beta-testers (the old version of the project, using a mysql database) and everything works fine, but I'll need to have at least 1000 users using it 24/7 while keeping the performance acceptable.

On 23 Jun, 17:35, rfu...@gmail.com ("Chris Spotts") wrote:
> IMHO running queries on 23k'ish worth of rows isn't liable to stress any
> reasonably modern server, likely several times over that shouldn't either
> for simple "LIKE" searches.
> What kind of growth are you expecting?