I've created a small dump of my database that recreates the problem. I hope that this will help recreate the problem. It is attached. I'd be happy to hear if there is an easier way of doing this.
To rebuild the database:
- create a database
- run from the commandline `$ psql database-name < 1000genomes-schema.sql`
- run this within a psql REPL ` #\copy public.qcregions FROM '/tmp/1000genomes-qcregions.tsv' DELIMITER ' ' CSV;` (where the delimiter is a tab)
- similarly run this within a psql REPL, `#\copy public.vcf FROM '/tmp/1000genomes-vcf.tsv' DELIMITER ' ' CSV;`
To see that the GIST index is not being hit, try running the following query:
EXPLAIN SELECT * FROM vcf
WHERE EXISTS (SELECT region FROM qcregions
WHERE qcregions.chr = vcf.chr
AND vcf.pos <@ qcregions.region);
The actual query I am trying to run is:
EXPLAIN SELECT * FROM vcf
WHERE EXISTS (SELECT region FROM qcregions
WHERE qcregions.chr = vcf.chr
AND qcregions.type = 'include'
AND vcf.pos <@ qcregions.region);
Let me know what else I can try,
Gideon.
What's a good way for me to create a self-contained test case. AFAIU the only way to make these test cases more self-contained would be to inline the second table and its index. How do you create an index to an inlined table of values?
Or perhaps I could send over a dump of a subset of the data?
Yes, I am fairly sure that I am running 9.4.4:
$ psql --version
psql (PostgreSQL) 9.4.4
# select version();
version
-----------------------------------------------------------------------------------
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 5.1.0, 64-bit
(1 row)
Thanks for the help,
Gideon.
Gideon Dresdner <gideond@gmail.com> writes:
> I had a discussion on IRC today with RhodiumToad regarding optimizing a
> specific query. We didn't manage to figure out how to get postgres to hit a
> GIST index.
FWIW, I couldn't reproduce the described behavior. Can you provide a
self-contained test case? Are you sure your server is 9.4.4?
regards, tom lane