Seq Scan vs. Index Scan - Mailing list pgsql-performance

From Nassib Nassar
Subject Seq Scan vs. Index Scan
Date
Msg-id BD2648C0-836B-4F4C-834A-4B26CF092B20@renci.org
Whole thread Raw
Responses Re: Seq Scan vs. Index Scan
List pgsql-performance
In this example it looks to me like the planner is choosing a Seq Scan resulting in 18x running time compared to
runningit with enable_seqscan = 'off'.  Adding more indexes to public.gene (please see below) seemed to make things
worse. I definitely have run VACUUM ANALYZE on everything, manually.  What am I missing?  Thank you for any feedback. 

Query:
SELECT *
    FROM gene_af_polyphen
    WHERE dataset_id = '001-1' AND
          (vartype = 'snp' OR
              vartype = 'ins' OR
              vartype = 'del' OR
              vartype = 'sub');

Query plan:
http://explain.depesz.com/s/qnZ

Query plan after SET enable_seqscan TO 'off':
http://explain.depesz.com/s/N5q

Hardware:
24GB memory / 8 core running Linux 2.6.32 x86_64

Database configuration:
 version                       | PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Ubuntu
4.4.3-4ubuntu5)4.4.3, 64-bit 
 autovacuum                    | off
 default_transaction_isolation | serializable
 effective_cache_size          | 18GB
 lc_collate                    | en_US.UTF-8
 lc_ctype                      | en_US.UTF-8
 listen_addresses              | *
 max_connections               | 100
 max_stack_depth               | 2MB
 server_encoding               | UTF8
 shared_buffers                | 6GB
 TimeZone                      | US/Eastern
 work_mem                      | 1GB

Table row counts:
public.gene    ~1 billion (may have lots of NULLs in several columns)
public.af    38878319
public.polyphen    25821

Database:

                    Table "public.gene"
        Column         |          Type          | Modifiers
-----------------------+------------------------+-----------
 dataset_id            | character varying(255) |
 referencename         | character varying(255) |
 index                 | integer                |
 locus                 | integer                |
 haplotype             | integer                |
 chromosome            | character varying(255) |
 begincoord            | integer                |
 endcoord              | integer                |
 vartype               | character varying(255) |
 reference             | character varying(255) |
 call                  | character varying(255) |
 xref                  | text                   |
 geneid                | integer                |
 mrnaacc               | character varying(255) |
 proteinacc            | character varying(255) |
 symbol                | character varying(255) |
 orientation           | character(1)           |
 exoncategory          | character varying(255) |
 exon                  | integer                |
 codingregionknown     | character(1)           |
 aacategory            | character varying(255) |
 nucleotidepos         | character varying(255) |
 proteinpos            | character varying(255) |
 aaannot               | character varying(255) |
 aacall                | character varying(255) |
 aaref                 | character varying(255) |
 allele                | character varying(255) |
 component             | character varying(255) |
 componentindex        | character varying(255) |
 impact                | character varying(255) |
 annotationrefsequence | character varying(255) |
 samplesequence        | character varying(255) |
 genomerefsequence     | character varying(255) |
 pfam                  | character varying(255) |
 unknown1              | character varying(255) |
Indexes:
    "gene_dataset_id_idx" btree (dataset_id), tablespace "indexspace"

                Table "public.af"
   Column    |          Type          | Modifiers
-------------+------------------------+-----------
 chromosome  | character varying(255) | not null
 endcoord    | integer                | not null
 rs_id       | character varying(255) |
 reference   | character varying(255) | not null
 call        | character varying(255) | not null
 allele_freq | numeric                |
Indexes:
    "af_allele_freq_idx" btree (allele_freq), tablespace "indexspace"
    "af_call_idx" btree (call), tablespace "indexspace"
    "af_chromosome_idx" btree (chromosome), tablespace "indexspace"
    "af_endcoord_idx" btree (endcoord), tablespace "indexspace"
    "af_reference_idx" btree (reference), tablespace "indexspace"

                   Table "public.polyphen"
         Column          |          Type          | Modifiers
-------------------------+------------------------+-----------
 mrnaacc                 | character varying(255) | not null
 proteinpos              | character varying(255) | not null
 annotationrefsequence   | character varying(255) | not null
 samplesequence          | character varying(255) | not null
 prediction              | character varying(255) |
 probability_deleterious | numeric                |
Indexes:
    "polyphen_annotationrefsequence_idx1" btree (annotationrefsequence), tablespace "indexspace"
    "polyphen_mrnaacc_idx1" btree (mrnaacc), tablespace "indexspace"
    "polyphen_proteinpos_idx1" btree (proteinpos), tablespace "indexspace"
    "polyphen_samplesequence_idx1" btree (samplesequence), tablespace "indexspace"

CREATE VIEW gene_af_polyphen AS
SELECT gene.dataset_id dataset_id,
       gene.referencename referencename,
       gene.index "index",
       gene.locus locus,
       gene.haplotype haplotype,
       gene.chromosome chromosome,
       gene.begincoord begincoord,
       gene.endcoord endcoord,
       gene.vartype vartype,
       gene.reference reference,
       gene.call call,
       gene.xref xref,
       gene.geneid geneid,
       gene.mrnaacc mrnaacc,
       gene.proteinacc proteinacc,
       gene.symbol symbol,
       gene.orientation orientation,
       gene.exoncategory exoncategory,
       gene.exon exon,
       gene.codingregionknown codingregionknown,
       gene.aacategory aacategory,
       gene.nucleotidepos nucleotidepos,
       gene.proteinpos proteinpos,
       gene.aaannot aaannot,
       gene.aacall aacall,
       gene.aaref aaref,
       gene.allele allele,
       gene.component component,
       gene.componentindex componentindex,
       gene.impact impact,
       gene.annotationrefsequence annotationrefsequence,
       gene.samplesequence samplesequence,
       gene.genomerefsequence genomerefsequence,
       gene.pfam pfam,
       gene.unknown1 unknown1,
       af.rs_id rs_id,
       af.allele_freq allele_freq,
       polyphen.prediction prediction,
       polyphen.probability_deleterious probability_deleterious
    FROM gene
    LEFT JOIN af
        ON gene.chromosome = af.chromosome AND
           gene.endcoord = af.endcoord AND
           gene.reference = af.reference AND
           gene.call = af.call
    LEFT JOIN polyphen
        ON gene.mrnaacc = polyphen.mrnaacc AND
           gene.proteinpos = polyphen.proteinpos AND
           gene.annotationrefsequence = polyphen.annotationrefsequence AND
           gene.samplesequence = polyphen.samplesequence;


pgsql-performance by date:

Previous
From: "Nicholson, Brad (Toronto, ON, CA)"
Date:
Subject: Re: Need to tune for Heavy Write
Next
From: Craig Ringer
Date:
Subject: Re: Need to tune for Heavy Write