Hi!
I have 3 databases, the first one is 125 MB,the second one is 1200 MB e the last one is 11 GB.
The bigger table is data and its size is 108 MB in the first one,1060 MB in the second one and 10 GB in the last one.
I tested the dbs performances (time) with 6 queries.
The first test is without indexes,instead the second two are with indexes.
When I used "explain analyze" I noticed that in the bigger db the planner query used a different strategy,but this isn't a problem.
The problem is about time, infact to solve i.e. this query:
"SELECT * FROM data d WHERE d.type = 7 AND (((d.metadata->'chromosome'->>'value')::text IN ('3','7')) AND ((d.metadata->'somatic'->>'value')::boolean = 'true')) and ((d.metadata->'filter'->>'value')::text IN ('pass')) LIMIT 30;"
the first db has an average time about 52.876 ms, the second about 470.1172 ms and the bigger one about 58.3572 ms.
Why is there this difference?it is only for the different strategy?
and then if I created an index on metadata->'chromosome'->>'value' those times increase (115.7986 ms, 572.8914 ms, 246.6348 ms). What is the bug?
Each time I stop the db ( sudo service postgresql stop ), I get empty the cache ( sudo sync; sudo echo 3 > /proc/sys/vm/drop_caches) and I start postgresql ( sudo service postgresql start).
I used the command "\timing" to compute those times.
The last question:
I have this query:
"WITH pd AS (SELECT * FROM personal_details ), nested_1 AS (SELECT * FROM sample WHERE type = 5 AND (((metadata->'mass'->>'value')::float <> '50' AND (metadata->'mass'->>'unit')::text LIKE 'g'))), nested_2 AS (SELECT * FROM sample WHERE type = 6 AND (((metadata->'quantity'->>'value')::float <= '10' AND (metadata->'quantity'->>'unit')::text LIKE 'μg'))), nested_3 AS (SELECT * FROM data WHERE type = 7 AND (((metadata->'chromosome'->>'value')::text IN ('7'))) and ((metadata->'pos'->>'value')::integer >= '17000000') AND ((metadata->'pos'->>'value')::integer <= '28000000')) SELECT DISTINCT d.id FROM subject d LEFT JOIN pd ON pd.id = d.personal_info INNER JOIN nested_1 ON nested_1.parent_subject = d.id INNER JOIN nested_2 ON nested_2.parent_sample = nested_1.id INNER JOIN nested_3 ON nested_3.parent_sample = nested_2.id WHERE d.type = 2 LIMIT 30;"
without index the average time is 701332.4934 ms in the bigger db, with index on metadata->'chromosome'->>'value' is 336978.4922 ms and with the index cast((metadata->'pos'->>'value') as integer) it is 697710.3606 ms.
why does the index on 'pos' not affect the times? I saw in "explain analyze " that the query planner uses this index!
Thank you for your support!
Valentina