Thread: Custom index structure and strange count problem

Custom index structure and strange count problem

From

Carsten Kropf

Date:

09 June 2010, 08:45:33

Hi *,
during the last few months I've been building a new index structure as part of a research project.
Everything seems to work properly, however I have some strange issues with the count sql command.
I introduced some custom structures (mainly document and hybrid_query) with which my index access method is supposed to
work.
There is an operator "&&" which is supposed to use my index structure (what also works properly).
The function that maps to the operator "&&" is called hybrid_index_query, which I use to compare my results given from
theindex with the real results that are supposed to appear in the final result set. 
Having described the outer circumstances (in a very short way), I will now show the strange stuff that happens:

test=# select id from documents where hybrid_index_query(to_document(words, points), row('radio pleas news'::tsvector,
'[(-90,-180),(90,180)]')) order by id; id   
------2137215121682207220822092210221122662296
(10 rows)

This query takes a sequential scan and works properly (returning 10 rows).

test=# select id from documents where to_document(words, points) && row('pleas radio news'::tsvector,
'[(-90,-180),(90,180)]')order by id; id   
------2137215121682207220822092210221122662296
(10 rows)

This query uses my index structure and returns the same result as in the sequential scan above.
Until here, everything seems to work fine. However, if I issue the same queries using the count aggregate function in
SQL,there are some odd results: 
test=# select count(*) from documents where hybrid_index_query(to_document(words, points), row('radio pleas
news'::tsvector,'[(-90,-180),(90, 180)]'));count  
-------   10
(1 row)

Using the sequential scan, still, everything seems fine.
However, if I now do the index scan (my function will be called 11 times, returning false at the end), I get the
followingresult: 
test=# select count(*) from documents where to_document(words, points) && row('pleas radio news'::tsvector,
'[(-90,-180),(90,180)]');count 
-------    7
(1 row)

This seems strange, because the same query returned 10 rows (when I didn't use the aggregate). If I issue queries that
countthe id column, I receive the following: 
test=# select count(id) from documents where hybrid_index_query(to_document(words, points), row('radio pleas
news'::tsvector,'[(-90,-180),(90, 180)]'));count  
-------   10
(1 row)

test=# select count(id) from documents where to_document(words, points) && row('pleas radio news'::tsvector,
'[(-90,-180),(90,180)]');count 
-------   10
(1 row)

These two queries do again return the same results.
Thus, I don't know, what's wrong here, does anybody know about that behaviour, or is it my fault that the results are
wrong,somehow? 
Thanks in advance

Carsten Kropf

Re: Custom index structure and strange count problem

From

Robert Haas

Date:

09 June 2010, 17:10:08

On Wed, Jun 9, 2010 at 4:35 AM, Carsten Kropf <ckropf2@fh-hof.de> wrote:
> Hi *,
> during the last few months I've been building a new index structure as part of a research project.
> Everything seems to work properly, however I have some strange issues with the count sql command.
> I introduced some custom structures (mainly document and hybrid_query) with which my index access method is supposed
towork. 
> There is an operator "&&" which is supposed to use my index structure (what also works properly).
> The function that maps to the operator "&&" is called hybrid_index_query, which I use to compare my results given
fromthe index with the real results that are supposed to appear in the final result set. 
> Having described the outer circumstances (in a very short way), I will now show the strange stuff that happens:
>
> test=# select id from documents where hybrid_index_query(to_document(words, points), row('radio pleas
news'::tsvector,'[(-90,-180),(90, 180)]')) order by id; 
>  id
> ------
>  2137
>  2151
>  2168
>  2207
>  2208
>  2209
>  2210
>  2211
>  2266
>  2296
> (10 rows)
>
> This query takes a sequential scan and works properly (returning 10 rows).
>
> test=# select id from documents where to_document(words, points) && row('pleas radio news'::tsvector,
'[(-90,-180),(90,180)]')order by id; 
>  id
> ------
>  2137
>  2151
>  2168
>  2207
>  2208
>  2209
>  2210
>  2211
>  2266
>  2296
> (10 rows)
>
> This query uses my index structure and returns the same result as in the sequential scan above.
> Until here, everything seems to work fine. However, if I issue the same queries using the count aggregate function in
SQL,there are some odd results: 
> test=# select count(*) from documents where hybrid_index_query(to_document(words, points), row('radio pleas
news'::tsvector,'[(-90,-180),(90, 180)]')); 
>  count
> -------
>    10
> (1 row)
>
> Using the sequential scan, still, everything seems fine.
> However, if I now do the index scan (my function will be called 11 times, returning false at the end), I get the
followingresult: 
> test=# select count(*) from documents where to_document(words, points) && row('pleas radio news'::tsvector,
'[(-90,-180),(90,180)]');
>  count
> -------
>     7
> (1 row)
>
> This seems strange, because the same query returned 10 rows (when I didn't use the aggregate). If I issue queries
thatcount the id column, I receive the following: 
> test=# select count(id) from documents where hybrid_index_query(to_document(words, points), row('radio pleas
news'::tsvector,'[(-90,-180),(90, 180)]')); 
>  count
> -------
>    10
> (1 row)
>
> test=# select count(id) from documents where to_document(words, points) && row('pleas radio news'::tsvector,
'[(-90,-180),(90,180)]');
>  count
> -------
>    10
> (1 row)
>
> These two queries do again return the same results.
> Thus, I don't know, what's wrong here, does anybody know about that behaviour, or is it my fault that the results are
wrong,somehow? 
> Thanks in advance

I am guessing this is a bug in your code - have you used EXPLAIN to
verify that the second-to-last of the above queries is really hitting
your code?  If so, I'd recommend attaching with gdb and setting a
breakpoint wherever you return the tuples, and then poke around...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: Custom index structure and strange count problem

From

Carsten Kropf

Date:

09 June 2010, 18:29:04

Hi,
thanks so far.
However, if I attach a Debugger (which I did in advance, too) and I use explain, I get the same results.
My first guess in each case is always that it is my fault. However, I don't know exactly, why this strange behaviour
occurshere. The problem I have is that EXPLAIN, too, always tells me that it uses an index scan (in the cases where the
queryis supposed to use one). 
The query plan looks exactly the same in any case (if I apply count(id) or count(*), respectively). However, the
resultsdiffer. 
The query plan is also the same, if I use the select * or select id query without applying an aggregate with the small
differencethat the aggregate is used, where it is supposed to be. 
I just thought, that somebody has already had problems with something like that (actually it is no "real" problem,
exceptthat the aggregate applied to * queries causes a different count). 
The query data my index structure is called with, stays the same in all cases (mentioned in the previous mail).
Does anybody have some hints according to which checks to perform in order to determine the problem here?

Best regardsCarsten Kropf
Am 09.06.2010 um 19:09 schrieb Robert Haas:

> On Wed, Jun 9, 2010 at 4:35 AM, Carsten Kropf <ckropf2@fh-hof.de> wrote:
>> Hi *,
>> during the last few months I've been building a new index structure as part of a research project.
>> Everything seems to work properly, however I have some strange issues with the count sql command.
>> I introduced some custom structures (mainly document and hybrid_query) with which my index access method is supposed
towork. 
>> There is an operator "&&" which is supposed to use my index structure (what also works properly).
>> The function that maps to the operator "&&" is called hybrid_index_query, which I use to compare my results given
fromthe index with the real results that are supposed to appear in the final result set. 
>> Having described the outer circumstances (in a very short way), I will now show the strange stuff that happens:
>>
>> test=# select id from documents where hybrid_index_query(to_document(words, points), row('radio pleas
news'::tsvector,'[(-90,-180),(90, 180)]')) order by id; 
>>  id
>> ------
>>  2137
>>  2151
>>  2168
>>  2207
>>  2208
>>  2209
>>  2210
>>  2211
>>  2266
>>  2296
>> (10 rows)
>>
>> This query takes a sequential scan and works properly (returning 10 rows).
>>
>> test=# select id from documents where to_document(words, points) && row('pleas radio news'::tsvector,
'[(-90,-180),(90,180)]')order by id; 
>>  id
>> ------
>>  2137
>>  2151
>>  2168
>>  2207
>>  2208
>>  2209
>>  2210
>>  2211
>>  2266
>>  2296
>> (10 rows)
>>
>> This query uses my index structure and returns the same result as in the sequential scan above.
>> Until here, everything seems to work fine. However, if I issue the same queries using the count aggregate function
inSQL, there are some odd results: 
>> test=# select count(*) from documents where hybrid_index_query(to_document(words, points), row('radio pleas
news'::tsvector,'[(-90,-180),(90, 180)]')); 
>>  count
>> -------
>>    10
>> (1 row)
>>
>> Using the sequential scan, still, everything seems fine.
>> However, if I now do the index scan (my function will be called 11 times, returning false at the end), I get the
followingresult: 
>> test=# select count(*) from documents where to_document(words, points) && row('pleas radio news'::tsvector,
'[(-90,-180),(90,180)]');
>>  count
>> -------
>>     7
>> (1 row)
>>
>> This seems strange, because the same query returned 10 rows (when I didn't use the aggregate). If I issue queries
thatcount the id column, I receive the following: 
>> test=# select count(id) from documents where hybrid_index_query(to_document(words, points), row('radio pleas
news'::tsvector,'[(-90,-180),(90, 180)]')); 
>>  count
>> -------
>>    10
>> (1 row)
>>
>> test=# select count(id) from documents where to_document(words, points) && row('pleas radio news'::tsvector,
'[(-90,-180),(90,180)]');
>>  count
>> -------
>>    10
>> (1 row)
>>
>> These two queries do again return the same results.
>> Thus, I don't know, what's wrong here, does anybody know about that behaviour, or is it my fault that the results
arewrong, somehow? 
>> Thanks in advance
>
> I am guessing this is a bug in your code - have you used EXPLAIN to
> verify that the second-to-last of the above queries is really hitting
> your code?  If so, I'd recommend attaching with gdb and setting a
> breakpoint wherever you return the tuples, and then poke around...
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company