optimizing 2-table join w/millions of rows - Mailing list pgsql-sql
From | Michael Olivier |
---|---|
Subject | optimizing 2-table join w/millions of rows |
Date | |
Msg-id | 19981120022608.22274.rocketmail@send102.yahoomail.com Whole thread Raw |
Responses |
Re: [SQL] optimizing 2-table join w/millions of rows
|
List | pgsql-sql |
Hi, I'm trying to optimize a query between two large tables, which ultimately will have millions of entries each. I'd appreciate any suggestions folks have on optimizing. I've tried several things but haven't gotten the performance I need yet. The tables are a user table and a background table. Each user can have multiple backgrounds, that's why they're separate tables. Relationship is: 1 user <--------> many backgrounds +------------------+----------------------------------+----------+ | Owner | Relation | Type | +------------------+----------------------------------+----------+ | michael | bgndtest | table | | michael | bgndacctname_key | index | | michael | bgndage_key | index | | michael | bgndloc_key | index | | michael | usertest | table | | michael | userage_key | index | | michael | userloc_key | index | | michael | usertestkey | index | +------------------+----------------------------------+----------+ Table = bgndtest +------------------------+---------------------------------+-------+ | Field | Type | Length| +------------------------+---------------------------------+-------+ | acctname | text | var |* | part_needed | int4 | 4 |* | loc_needed | int4 | 4 |* +------------- ----------+---------------------------------+-------+ Table = usertest +-------------------------+---------------------------------+-------+ | Field | Type | Length| +-------------------------+---------------------------------+-------+ | acctname | text | var |* | email | text | var | | part | int4 | 4 |* | loc | int4 | 4 |* +-------------------------+---------------------------------+-------+ * indicates a single column index is available select U.acctname from usertest U, bgndtest B where B.part_needed=3 and B.loc_needed=5 and B.acctname=U.acctname and U.acctname in (select acctname from usertest where part=2 and loc=3) (query plan at bottom of this message) There are other similar queries I'll need to do which will have even more conditions in the where clause than this... I noticed it's about 4x slower when I change just one of the = comparisons to a < ... but I thought with btree indexes, < comparisons would run fast. Aren't btrees the default for indexes? I ultimately will be doing a lot of range comparisons, not equality. If I change "U.acctname in (select...)" to "B.acctname in (select...)", is there any difference in performance? Seems to be but I don't know why. I'm currently benchmarking performance that degrades 4x for every 2x growth in the tables. Presumably the performance is linear to (users * backgrounds), and in my tests the number of users rows and backgrounds rows is the same. But this won't be acceptable once I get to millions of users, when 20k users costs 3.4 secs on this query (RH 5.1, Postgres 6.3.2-10, 350 MHz Pentium II). Re. optimization tips from the FAQ: * vacuum analyze made a world of difference (10x) after creating indexes and adding tons of test data * I've tried running postmaster with -o -F to get rid of fsync(), didn't seem to help much * I've tried using -B 1280 because I'm happy to eat up memory for performance, didn't notice a big difference - where can I read more about using this parameter? * I'm not doing a lot of sorts, so increasing memory for sorts won't help, I don't think * Cluster doesn't seem like it will help because I'm finding rows based on several parameters * returning OID from subquery and indexing OID column didn't seem to help performance Is there any way to get postgres to load a whole table into memory? I'd like to do some benchmarking with small number of table entries to find out how it's going to scale. In production, I can buy a machine with lots of memory and take the altavista approach, but this requires a DB that will hold tables in memory. If you think I'd be better off with another DBMS (could be Linux or NT, must have eval period of 60 days or more, or be cheap or free) -- please let me know. Thanks very much for any input, pointers to docs, etc! I'm new to query optimization and I'm working hard to get an understanding of how PostgreSQL will scale for my project. --Michael ---------------- query plan: Nested Loop (cost=11.09 size=1 width=24) -> Index Scan on b (cost=9.04 size=1 width=12) SubPlan -> Index Scan on usertest (cost=2.05 size=1 width=12) -> Index Scan on u (cost=2.05 size=10200 width=12) verbose query plan: {NESTLOOP :cost 11.0861 :size 1 :width 24 :state <> :qptargetlist ({TLE :resdom {RESDOM :resno 1 :restype 25 :restypmod -1 :resname acctname :reskey 0 :reskeyop 0 :resjunk 0 } :expr {VAR :varno 65001 :varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1 }}) :qpqual <> :lefttree {INDEXSCAN :cost 9.03614 :size 1 :width 12 :state <> :qptargetlist ({TLE :resdom {RESDOM :resno 1 :restype 25 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :resjunk 0 } :expr {VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1 }}) :qpqual ({EXPR :typeOid 0 :opType op :oper {OPER :opno 96 :opid 65 :opresulttype 16 } :args ({VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2 } {CONST :consttype 23 :constlen 4 :constisnull false :constvalue 4 [ 26 0 0 0 ] :constbyval true})} {EXPR :typeOid 16 :opType subp :oper {SUBPLAN :plan {INDEXSCAN :cost 2.05 :size 1 :width 12 :state <> :qptargetlist ({TLE :resdom {RESDOM :resno 1 :restype 25 :restypmod -1 :resname acctname :reskey 0 :reskeyop 0 :resjunk 0 } :expr {VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1 }}) :qpqual ({EXPR :typeOid 0 :opType op :oper {OPER :opno 96 :opid 65 :opresulttype 16 } :args ({VAR :varno 1 :varattno 3 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 3 } {CONST :consttype 23 :constlen 4 :constisnull false :constvalue 4 [ 12 0 0 0 ] :constbyval true})}) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 :indxid ( 189085 ) :indxqual (({EXPR :typeOid 0 :opType op :oper {OPER :opno 96 :opid 65 :opresulttype 16 } :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 4 } {CONST :consttype 23 :constlen 4 :constisnull false :constvalue 4 [ 27 0 0 0 ] :constbyval true})}))} :planid 0 :rtable ({RTE :relname usertest :refname usertest :relid 189056 :inh false :inFromCl true :skipAcl false}) :setprm () :parprm () :slink {SUBLINK :subLinkType 2 :useor false :lefthand ({VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1 }) :oper ({EXPR :typeOid 16 :opType op :oper {OPER :opno 98 :opid 67 :opresulttype 16 } :args ({VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1 } {CONST :consttype 25 :constlen 0 :constisnull true :constvalue <> :constbyval true})}) :subselect <>}} :args <>}) :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 1 :indxid ( 189094 ) :indxqual (({EXPR :typeOid 0 :opType op :oper {OPER :opno 96 :opid 65 :opresulttype 16 } :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 3 } {CONST :consttype 23 :constlen 4 :constisnull false :constvalue 4 [ 5 0 0 0 ] :constbyval true})}))} :righttree {INDEXSCAN :cost 2.05 :size 10200 :width 12 :state <> :qptargetlist ({TLE :resdom {RESDOM :resno 1 :restype 25 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :resjunk 0 } :expr {VAR :varno 2 :varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 1 }}) :qpqual <> :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0 :scanrelid 2 :indxid ( 189079 ) :indxqual (({EXPR :typeOid 0 :opType op :oper {OPER :opno 98 :opid 67 :opresulttype 16 } :args ({VAR :varno 65001 :varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1 } {VAR :varno 2 :varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 1 })}))} :extprm () :locprm () :initplan <> :nprm 0 } _________________________________________________________ DO YOU YAHOO!? Get your free @yahoo.com address at http://mail.yahoo.com