Re: [PATCH] Phrase search ported to 9.6 - Mailing list pgsql-hackers

From Artur Zakirov
Subject Re: [PATCH] Phrase search ported to 9.6
Date
Msg-id 56D464C0.8010107@postgrespro.ru
Whole thread Raw
In response to [PATCH] Phrase search ported to 9.6  (Dmitry Ivanov <d.ivanov@postgrespro.ru>)
Responses Re: [PATCH] Phrase search ported to 9.6
Re: [PATCH] Phrase search ported to 9.6
List pgsql-hackers
Hello, Dmitry

This is my initial review for you patch. Below are my comments.

Introduction
------------

This patch introduce new operator and new functions.
New operator:
- ??
New functions:
- phraseto_tsquery([ config regconfig , ] query text)
- setweight(tsquery, "char")
- tsquery_phrase(query1 tsquery, query2 tsquery)
- tsquery_phrase(query1 tsquery, query2 tsquery, distance integer)

New regression tests are included in the patch. Information about new 
features is provided in the documentation.

Initial Run
-----------

The patch applies correctly to HEAD. Regression tests pass successfully, 
without errors. It seems that the patch work as you expected.

Performance
-----------

I have not tested possible performance issues yet. Maybe later I will 
prepare some data to test performance.

Coding
------

I agree with others that there is a lack of comments for new functions. 
Most of them without comments.

> ../pg_patches/phrase_search.patch:1086: trailing whitespace.
>      * QI_VALSTOP nodes should be cleaned and
> ../pg_patches/phrase_search.patch:1124: trailing whitespace.
>         if (priority < parentPriority)
> ../pg_patches/phrase_search.patch:1140: trailing whitespace.
>         if (priority < parentPriority)
> ../pg_patches/phrase_search.patch:1591: space before tab in indent.
>                      /*    (a|b) ? c => (a?c) | (b?c) */
> ../pg_patches/phrase_search.patch:1603: space before tab in indent.
>                      /*    !a ? b    => b & !(a?b) */

It is git apply output. There are trailing whitespaces in the code.

> +typedef struct MorphOpaque
> +{
> +    Oid        cfg_id;
> +    int        qoperator;    /* query operator */
> +} MorphOpaque;

Maybe you should move structure definition to the top of the to_tsany.c

> +                    pushValue(state,
> +                              prs.words[count].word,
> +                              prs.words[count].len,
> +                              weight,
> +                              ((prs.words[count].flags & TSL_PREFIX) || prefix) ?
> +                                    true :
> +                                    false);

There is not need in ternary operator here. You can write:

pushValue(state,      prs.words[count].word,      prs.words[count].len,      weight,      (prs.words[count].flags &
TSL_PREFIX)|| prefix);
 

>  /*
> + * Wrapper of check condition function for TS_execute.
> + * We are using the fact GIN_FALSE = 0 and GIN_MAYBE state
> + * should be treated as true, so, any non-zero value should be
> + * converted to boolean true.
> + */
> +static bool
> +checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
> +{
> +    return !!checkcondition_gin_internal((GinChkVal *) checkval, val, data);
> +}

Maybe write like this?

static bool
checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
{return checkcondition_gin_internal((GinChkVal *) checkval, val, data) 
!= GIN_FALSE;
}

Then you don't need in the comment above of the function.

> +#define PRINT_PRIORITY(x)    ( (((QueryOperator*)(x))->oper == OP_NOT) ? 5 : QO_PRIORITY(x) )

What is mean "5" here? You can define a new value near the:

#define OP_OR        1
#define OP_AND        2
#define OP_NOT        3
#define OP_PHRASE    4

> +Datum
> +tsquery_setweight(PG_FUNCTION_ARGS)
> +{
> +    TSQuery        in = PG_GETARG_TSQUERY(0);
> +    char        cw = PG_GETARG_CHAR(1);
> +    TSQuery     out;
> +    QueryItem  *item;
> +    int         w = 0;
> +
> +    switch (cw)
> +    {
> +        case 'A':
> +        case 'a':
> +            w = 3;
> +            break;
> +        case 'B':
> +        case 'b':
> +            w = 2;
> +            break;
> +        case 'C':
> +        case 'c':
> +            w = 1;
> +            break;
> +        case 'D':
> +        case 'd':
> +            w = 0;
> +            break;
> +        default:
> +            /* internal error */
> +            elog(ERROR, "unrecognized weight: %d", cw);
> +    }
> +
> +    out = (TSQuery) palloc(VARSIZE(in));
> +    memcpy(out, in, VARSIZE(in));
> +    item = GETQUERY(out);
> +
> +    while(item - GETQUERY(out) < out->size)    
> +    {
> +        if (item->type == QI_VAL)
> +            item->qoperand.weight |= (1 << w);
> +
> +        item++;
> +    }
> +
> +    PG_FREE_IF_COPY(in, 0);
> +    PG_RETURN_POINTER(out);
> +}

This function has the duplicated piece from the tsvector_setweight() 
from tsvector_op.c. You can define new function for it.

> +/*
> + * Check if datatype is the specified type or equivalent to it.
> + *
> + * Note: we could just do getBaseType() unconditionally, but since that's
> + * a relatively expensive catalog lookup that most users won't need, we
> + * try the straight comparison first.
> + */
> +static bool
> +is_expected_type(Oid typid, Oid expected_type)
> +{
> +    if (typid == expected_type)
> +        return true;
> +    typid = getBaseType(typid);
> +    if (typid == expected_type)
> +        return true;
> +    return false;
> +}
> +
> +/* Check if datatype is TEXT or binary-equivalent to it */
> +static bool
> +is_text_type(Oid typid)
> +{
> +    /* varchar(n) and char(n) are binary-compatible with text */
> +    if (typid == TEXTOID || typid == VARCHAROID || typid == BPCHAROID)
> +        return true;
> +    /* Allow domains over these types, too */
> +    typid = getBaseType(typid);
> +    if (typid == TEXTOID || typid == VARCHAROID || typid == BPCHAROID)
> +        return true;
> +    return false;
> +}

These functions was removed in 9acb9007de30b3daaa9efc16763c3bc6e3e0a92d. 
It seems that tsvector_op.c was not synchronized with the master.

Conclusion
----------

This patch is large and it needs more research. I will be reviewing it 
and will give another notes later.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



pgsql-hackers by date:

Previous
From: Anastasia Lubennikova
Date:
Subject: Re: WIP: Covering + unique indexes.
Next
From: Tom Lane
Date:
Subject: Re: Typo fix