Re: Comparing user attributes with bitwise operators - Mailing list pgsql-performance
From: Greg Stark
Subject: Re: Comparing user attributes with bitwise operators
Date:
Msg-id: 87sm9eluc3.fsf@stark.xeocode.com
In response to: Re: Comparing user attributes with bitwise operators (Patrick Clery <patrick@phpforhire.com>)
Responses: Re: Comparing user attributes with bitwise operators
List: pgsql-performance
Patrick Clery <patrick@phpforhire.com> writes:

>                                  PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=6.03..6.03 rows=1 width=68) (actual time=69.391..69.504 rows=10 loops=1)
>    ->  Sort  (cost=6.03..6.03 rows=1 width=68) (actual time=69.381..69.418 rows=10 loops=1)
>          Sort Key: age
>          ->  Index Scan using people_attributes_search on people_attributes  (cost=0.00..6.02 rows=1 width=68) (actual time=0.068..61.648 rows=937 loops=1)
>                Index Cond: (('{30,31,32,33,34,35,36,37,38,39,40}'::integer[] && age) AND ('{2}'::integer[] && gender) AND ('{1,2,4}'::integer[] && orientation))
>  Total runtime: 69.899 ms
> (6 rows)

...

> - Is there a way of speeding up the sort?

The sort seems to have taken only 8ms out of 69ms, or just over 10%. As long as the index scan doesn't match too many records, the sort should never be any slower, so it shouldn't be the performance bottleneck.

You might consider putting a subquery inside the order by with a limit, to ensure that the sort never gets more than some safe maximum. Something like:

  select * from
    (select * from people_attributes where ... limit 1000) as x
  order by age
  limit 10

(Note that the subquery needs an alias.) This means that if the query matches more than 1000 records it won't be sorted properly by age; you'll get the top 10 out of some arbitrary subset of 1000. But you're protected against ever having to sort more than 1000 records.

> - Will using queries like " WHERE orientation IN (1,2,4) " be any better/worse?

Well, they won't use the GiST index, so no. If there were a single column with a btree index, then this would be the cleanest way to go.

> - The queries with the GiST index are faster, but is it of any benefit when
>   the int[] arrays all contain a single value?

Well, you've gone from 5 minutes to 60ms. You'll have to do more than one test to be sure, but it certainly seems to be of some benefit.
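To make the btree alternative concrete, here is a minimal sketch. It assumes orientation were stored as a plain integer column rather than an int[] (which is not how the poster's schema is set up, so this is illustrative only, and the index name is made up):

```sql
-- Hypothetical: if orientation were a plain integer column,
-- a btree index would let IN (...) be used directly.
create index people_attributes_orientation_idx
    on people_attributes (orientation);

select *
  from people_attributes
 where orientation in (1, 2, 4);
```

With the actual int[] columns, though, only the GiST overlap operators can use the index, which is why IN won't help here.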
If they're always a single value, you could make it an expression index instead and not have to change your data model. Just have the fields be individual integers and make an index as:

  create index idx on people_attributes using gist (
    (array[age]) gist__int_ops,
    (array[gender]) gist__int_ops,
    ...
  )

However, I would go one step further. I would make the index simply:

  create index idx on people_attributes using gist (
    (array[age,gender,orientation,...]) gist__int_ops
  )

And ensure that all of these attributes have distinct domains, i.e. that they don't share any values. There are 4 billion integer values available, so that shouldn't be an issue.

Then you could use query_int to compare them the way you want. You misunderstood how query_int is supposed to work: you index an array column, and then you can check it against a query_int just as you're currently checking for overlap. Think of @@ as a more powerful version of the overlap operator that can do complex logical expressions.

The equivalent of

  where '{30,31,32,33,34,35,36,37,38,39,40}'::int[] && age
    and '{2}'::int[] && gender
    and '{1,2,4}'::int[] && orientation

would then become:

  WHERE array[age,gender,orientation] @@ '(30|31|32|33|34|35|36|37|38|39|40)&(2)&(1|2|4)'

except you would have to change orientation and gender so they don't both have a value of 2.

You might consider the expression index a bit of overkill, actually. You might instead just store a column "attributes" with an integer array directly in the table. You would also want a table that lists the valid attributes, to be sure not to have any overlaps:

    1  age          1
    2  age          2
  ...
  101  gender       male
  102  gender       female
  103  orientation  straight
  104  orientation  gay
  105  orientation  bi
  106  bodytype     scrawny
  ...

> - Is there any hope for this structure?

You'll have to test this carefully. I tried using GiST indexes for my project and found that I couldn't load the data and build the GiST indexes fast enough.
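Putting the "attributes" column idea together, a minimal end-to-end sketch might look like the following. The table definition and the attribute codes are illustrative (taken from the example code table above, not from the poster's schema), and it assumes the contrib intarray module, which provides gist__int_ops, query_int, and @@:

```sql
-- Requires the intarray contrib module (gist__int_ops, query_int, @@).
create table people_attributes (
    person_id  integer primary key,
    attributes integer[] not null   -- codes drawn from distinct domains
);

create index people_attributes_idx
    on people_attributes using gist (attributes gist__int_ops);

-- Using the illustrative codes above: ages map to 1..n, 102 = gender
-- female, 103/104/105 = orientation straight/gay/bi.
select person_id
  from people_attributes
 where attributes @@ '(30|31|32|33|34|35|36|37|38|39|40) & 102 & (103|104|105)';
```

Because every attribute value is unique across domains, one array column and one single-column GiST index can answer the whole query.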
You have to test the costs of building and maintaining this index, especially since it has so many columns in it. But it looks like your queries are in trouble without it, so hopefully it'll be OK on the insert/update side for you.

--
greg