Optimizing the Query - Mailing list pgsql-novice

From Shmagi Kavtaradze
Subject Optimizing the Query
Date
Msg-id CAHY6mawRXcM91J84Ht4RO4TOoC2RVvDnAtSCcw7MFv5yXZRM3w@mail.gmail.com
List pgsql-novice
I have a table with columns doc (varchar), sentenceId (int), and bow (varchar[]). Into the bow column I import a bag-of-words vector. In total there are 4623 rows. The table (nlptable) looks like this (for 10-word vectors):

 doc    | sentenceId | bow
--------+------------+------------------------
 corpus |          1 | {1,0,0,1,0,0,0,1,0,1}
 corpus |          2 | {0,1,1,1,0,1,0,0,0,0}
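
For reference, the table is set up roughly like this (a simplified sketch; the inserts just reproduce the two sample rows above):

create table nlptable (
    doc        varchar,
    sentenceid int,
    bow        varchar[]   -- one '0'/'1' entry per common word
);

insert into nlptable values
    ('corpus', 1, '{1,0,0,1,0,0,0,1,0,1}'),
    ('corpus', 2, '{0,1,1,1,0,1,0,0,0,0}');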

The query I run (it compares the bag-of-words representations of every pair of sentences):

select a.doc, a.sentenceid, b.doc, b.sentenceid,
       cardinality(array(
         select unnest(array_positions(a.bow, '1'))
         intersect
         select unnest(array_positions(b.bow, '1'))
       ))::float / cardinality(a.bow)::float
from
  nlptable a, nlptable b
where
  a.sentenceid < b.sentenceid;
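
To illustrate what the expression computes: for the two sample rows above, the positions of '1' are {1,4,8,10} and {2,3,4,6}, the intersection is {4}, so the similarity is 1/10 = 0.1. The same expression on the two literal vectors:

-- returns 0.1 for the two sample vectors
select cardinality(array(
         select unnest(array_positions('{1,0,0,1,0,0,0,1,0,1}'::varchar[], '1'))
         intersect
         select unnest(array_positions('{0,1,1,1,0,1,0,0,0,0}'::varchar[], '1'))
       ))::float / 10;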

The problem is that for 10 vectors (the 10 most common words) the execution time is about 3 minutes, for 100 vectors about 25 minutes, and for 500 vectors about 80 minutes. I have to run the calculation for the 10,000 most common words, which will possibly take a whole day. The query is too slow and I want to optimize it, but I have no idea how. Is there an option to use some boolean type instead of varchar[] to decrease the size of the data?
I am relatively new to Postgres, so I have no idea about optimization. I have also heard that arrays in Postgres are heavy to deal with. Because of this I searched for alternative ways to store the bag of words in a table, but cannot find functionality other than arrays.
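
For example, I was wondering whether something along these lines could work (an untested sketch; nlptable_bits and bow_bits are just made-up names for a version of the table where the vector is stored as a fixed-length bit string instead of varchar[]):

-- bitwise AND keeps only the shared '1' positions; counting the '1'
-- characters in the text form gives the intersection size
select a.doc, a.sentenceid, b.doc, b.sentenceid,
       length(replace((a.bow_bits & b.bow_bits)::text, '0', ''))::float
         / length(a.bow_bits)
from nlptable_bits a, nlptable_bits b
where a.sentenceid < b.sentenceid;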
