Re: DRAFT: Pass sk_attno to consistent function - Mailing list pgsql-hackers

From Michał Kłeczek
Subject Re: DRAFT: Pass sk_attno to consistent function
Date
Msg-id E17B60DE-42A3-4AAD-81C4-56CFF1D4CDD5@kleczek.org
Whole thread Raw
In response to Re: DRAFT: Pass sk_attno to consistent function  (Michał Kłeczek <michal@kleczek.org>)
Responses Re: DRAFT: Pass sk_attno to consistent function
List pgsql-hackers
Hi All,

Since it looks like there is not much interest in the patch I will try to provide some background to explain why I
thinkit is needed. 

We are in the process of migration from an old db platform to PostgreSQL.
Our database is around 10TB big and contains around 10 billion financial transactions in a single table.
Each transaction is assigned to an account (column acc_number).

We have partitioned the table BY HASH (acc_number).

A client can query transactions belonging to his accounts using several criteria - among them is te xt search.
Queries are of type TOP N (ie ORDER BY … LIMIT ).

The list of accounts that we are querying is provided as a parameter to the query.

We have decided to use a single Gist index supporting all queries (reasons described in [1]).

There are several problems with Gist usage (but I still think we have no other choice) but the most important is
that we cannot use SAOP in our queries - since Gist does not support it the planner decides to perform Bitmap Scan
which in turn does not support ORDER BY … LIMIT well because requires Sort.

So when we use “= ANY (array of account numbers) … ORDER BY ...” the plan requires reading all records meeting
search criteria and then sort.

As a workaround we have to perform LATERAL joins:

unnest(list of account numbers) AS a(n) LATERAL JOIN (SELECT * FROM … WHERE account_number = a.n ORDER BY … LIMIT …)
ORDERBY … LIMIT … 

It is still bad because requires multiple scans of the same partition if account number hashes are the same.

What we really need is for Gist to support “= ANY (…)”.
As Gist index is extensible in terms of queries it supports it is quite easy to implement an operator class/family with
operator:

||= (text, text[])

that has semantics the same as “= ANY (…)”

With this operator we can write our queries like:

account_number ||= [list of account numbers] AND
account_number = ANY ([list of account numbers]) — redundant for partition pruning as it does not understand ||=

and have optimal plans:

Limit
— Merge Append
—— Index scan of relevant partitions

The problem is that now each partition scan is for the same list of accounts.
The “consistent” function cannot assume anything about contents of the table so it has to check all elements of the
list
and that in turn causes reading unnecessary index pages for accounts not in this partition.

What we need is a way for the “consistent” function to be able to pre-process input query array and remove elements
that are not relevant for this scan. To do that it is necessary to have enough information to read necessary metadata
fromthe catalog. 

The proposed patch addresses this need and seems (to me) largely uncontroversial as it does not break any existing
extensions.

Attached is a patch with consolidated changes (I am pretty new to managing patches so previous two were partial and not
somethingshareable, I am afraid). 

—
Michal

Attachment

pgsql-hackers by date:

Previous
From: Jelte Fennema-Nio
Date:
Subject: Re: Possibility to disable `ALTER SYSTEM`
Next
From: Tom Lane
Date:
Subject: Re: Possibility to disable `ALTER SYSTEM`