Thread: What's needed for cache-only table scan?

What's needed for cache-only table scan?

From: Kohei KaiGai
Hello,

This is a brief design proposal for a feature I'd like to implement on top
of the custom-scan APIs. Because it (probably) requires a few additional
base features beyond custom-scan itself, I'd like to get feedback from the
hackers.

The cache-only table scan, as in the subject line, is an alternative scan
logic to sequential scan, usable when all the referenced columns are cached.
It allows a particular table to be scanned without storage access, thus
improving scan performance.
So what? How is this different from a large shared_buffers configuration?
This mechanism caches only the columns referenced by the query, not the
whole of each record. It makes sense for workloads that scan a table with
many columns while the qualifiers reference just a few of them, as is
typical of analytic queries, because it reduces the memory consumed per
cached record, so more records can be cached.
In addition, it has another role from my standpoint: it also serves as a
fast data supplier for GPU/MIC devices. When we move data to a GPU
device, the source address has to be in a region marked as "page-locked",
which is exempt from being swapped out, if we want CUDA or OpenCL to
run in asynchronous DMA transfer mode, the fastest one.

There is probably no problem with the construction of this table cache.
All we need to do is inject a custom-scan node instead of the seq-scan
node; it can then construct the table cache concurrently with a regular
sequential scan, even though the first access becomes a little slower.

My concern is how to handle the case when the table gets modified.
A straightforward idea is that each cached entry being modified is
invalidated through a callback mechanism.
Triggers can help in the case of INSERT, UPDATE, DELETE and
TRUNCATE. Furthermore, it would be better if an extension could inject
its own trigger definitions at RelationBuildTriggers() on the fly, to make
the feature work transparently.
On the other hand, we have no way to get control around VACUUM.
I want a hook that allows extensions to get control when a page gets
vacuumed. Once we have such a hook, it becomes possible to invalidate
cached entries that are indexed by TID but have already been vacuumed
away. Its best location is probably lazy_scan_heap(), calling back into
the extension for each page being vacuumed.

What's your opinion?

I'd like to find the best way to implement this table-caching
mechanism within the scope of the v9.4 feature set.
Any ideas, comments, or suggestions are welcome.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: What's needed for cache-only table scan?

From: Tom Lane
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
> The cache-only table scan, as in the subject line, is an alternative scan
> logic to sequential scan, usable when all the referenced columns are cached.

This seems like a pretty dubious idea to me --- you're talking about
expending large amounts of memory on a single-purpose cache, without
any clear way to know if the data will ever be used before it gets
invalidated (and the life expectancy of the cache doesn't sound like
it'd be very high either).

> I'd like to find the best way to implement this table-caching
> mechanism within the scope of the v9.4 feature set.

There's no possible way you'll finish this for 9.4.  Keep in mind that
CF3 starts Friday, and CF4 starts 2014-01-15, and project policy is that
major feature patches should arrive (in at least rough form) by CF3.
Even ignoring that policy, this is more than 2 months worth of work.
        regards, tom lane



Re: What's needed for cache-only table scan?

From: Robert Haas
On Tue, Nov 12, 2013 at 9:45 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> This is a brief design proposal for a feature I'd like to implement on top
> of the custom-scan APIs. Because it (probably) requires a few additional
> base features beyond custom-scan itself, I'd like to get feedback from the
> hackers.
>
> The cache-only table scan, as in the subject line, is an alternative scan
> logic to sequential scan, usable when all the referenced columns are cached.
> It allows a particular table to be scanned without storage access, thus
> improving scan performance.
> So what? How is this different from a large shared_buffers configuration?
> This mechanism caches only the columns referenced by the query, not the
> whole of each record. It makes sense for workloads that scan a table with
> many columns while the qualifiers reference just a few of them, as is
> typical of analytic queries, because it reduces the memory consumed per
> cached record, so more records can be cached.
> In addition, it has another role from my standpoint: it also serves as a
> fast data supplier for GPU/MIC devices. When we move data to a GPU
> device, the source address has to be in a region marked as "page-locked",
> which is exempt from being swapped out, if we want CUDA or OpenCL to
> run in asynchronous DMA transfer mode, the fastest one.
>
> There is probably no problem with the construction of this table cache.
> All we need to do is inject a custom-scan node instead of the seq-scan
> node; it can then construct the table cache concurrently with a regular
> sequential scan, even though the first access becomes a little slower.
>
> My concern is how to handle the case when the table gets modified.
> A straightforward idea is that each cached entry being modified is
> invalidated through a callback mechanism.
> Triggers can help in the case of INSERT, UPDATE, DELETE and
> TRUNCATE. Furthermore, it would be better if an extension could inject
> its own trigger definitions at RelationBuildTriggers() on the fly, to make
> the feature work transparently.
> On the other hand, we have no way to get control around VACUUM.
> I want a hook that allows extensions to get control when a page gets
> vacuumed. Once we have such a hook, it becomes possible to invalidate
> cached entries that are indexed by TID but have already been vacuumed
> away. Its best location is probably lazy_scan_heap(), calling back into
> the extension for each page being vacuumed.
>
> What's your opinion?

I think it's hard to say in the abstract.  I'd suggest adding the
hooks you feel like you need and see what the resulting patch looks
like.  Then post that and we can talk about how (and whether) to do it
better.  My personal bet is that triggers are the wrong way to do
something like this; I'd look to do it all with hooks.  Of course,
figuring how to position those hooks so that they are maintainable and
don't affect performance when not used is the tricky part.

> I'd like to find the best way to implement this table-caching
> mechanism within the scope of the v9.4 feature set.
> Any ideas, comments, or suggestions are welcome.

I think that Tom is right: there's not time to get something like this
done for 9.4.  If you start working relatively soon, I think you could
hope to have a good proposal in time for 9.5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: What's needed for cache-only table scan?

From: Kohei KaiGai
2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:
> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>> The cache-only table scan, as in the subject line, is an alternative scan
>> logic to sequential scan, usable when all the referenced columns are cached.
>
> This seems like a pretty dubious idea to me --- you're talking about
> expending large amounts of memory on a single-purpose cache, without
> any clear way to know if the data will ever be used before it gets
> invalidated (and the life expectancy of the cache doesn't sound like
> it'd be very high either).
>
Thanks for your comments. I assume this table cache will be applied to
tables that hold data sets for analytic workloads. Even though it may
consume a large amount of memory, it will be hit often by the major
workload in that scenario.
It will probably need an upper limit on cache memory usage, and a
mechanism to reclaim cache blocks starting from the oldest; nothing
special there. Also, even though I call it a "table cache", it is similar
to an in-memory database backed by rich memory hardware.

>> I'd like to find the best way to implement this table-caching
>> mechanism within the scope of the v9.4 feature set.
>
> There's no possible way you'll finish this for 9.4.  Keep in mind that
> CF3 starts Friday, and CF4 starts 2014-01-15, and project policy is that
> major feature patches should arrive (in at least rough form) by CF3.
> Even ignoring that policy, this is more than 2 months worth of work.
>
Yes, I understand it is not possible to submit the whole patch by the
CF3 deadline. So, I'd like to find a way to implement it as an extension,
using facilities already supported or to be enhanced in v9.4.

Best regards,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: What's needed for cache-only table scan?

From: Kohei KaiGai
2013/11/12 Robert Haas <robertmhaas@gmail.com>:
> On Tue, Nov 12, 2013 at 9:45 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> This is a brief design proposal for a feature I'd like to implement on top
>> of the custom-scan APIs. Because it (probably) requires a few additional
>> base features beyond custom-scan itself, I'd like to get feedback from the
>> hackers.
>>
>> The cache-only table scan, as in the subject line, is an alternative scan
>> logic to sequential scan, usable when all the referenced columns are cached.
>> It allows a particular table to be scanned without storage access, thus
>> improving scan performance.
>> So what? How is this different from a large shared_buffers configuration?
>> This mechanism caches only the columns referenced by the query, not the
>> whole of each record. It makes sense for workloads that scan a table with
>> many columns while the qualifiers reference just a few of them, as is
>> typical of analytic queries, because it reduces the memory consumed per
>> cached record, so more records can be cached.
>> In addition, it has another role from my standpoint: it also serves as a
>> fast data supplier for GPU/MIC devices. When we move data to a GPU
>> device, the source address has to be in a region marked as "page-locked",
>> which is exempt from being swapped out, if we want CUDA or OpenCL to
>> run in asynchronous DMA transfer mode, the fastest one.
>>
>> There is probably no problem with the construction of this table cache.
>> All we need to do is inject a custom-scan node instead of the seq-scan
>> node; it can then construct the table cache concurrently with a regular
>> sequential scan, even though the first access becomes a little slower.
>>
>> My concern is how to handle the case when the table gets modified.
>> A straightforward idea is that each cached entry being modified is
>> invalidated through a callback mechanism.
>> Triggers can help in the case of INSERT, UPDATE, DELETE and
>> TRUNCATE. Furthermore, it would be better if an extension could inject
>> its own trigger definitions at RelationBuildTriggers() on the fly, to make
>> the feature work transparently.
>> On the other hand, we have no way to get control around VACUUM.
>> I want a hook that allows extensions to get control when a page gets
>> vacuumed. Once we have such a hook, it becomes possible to invalidate
>> cached entries that are indexed by TID but have already been vacuumed
>> away. Its best location is probably lazy_scan_heap(), calling back into
>> the extension for each page being vacuumed.
>>
>> What's your opinion?
>
> I think it's hard to say in the abstract.  I'd suggest adding the
> hooks you feel like you need and see what the resulting patch looks
> like.  Then post that and we can talk about how (and whether) to do it
> better.  My personal bet is that triggers are the wrong way to do
> something like this; I'd look to do it all with hooks.  Of course,
> figuring how to position those hooks so that they are maintainable and
> don't affect performance when not used is the tricky part.
>
Yep, right now the whole idea is still in my head, so it will take some
additional time to shape it into a patch.
The reason I'm inclined to use triggers is that they are already supported
at the places where I want to get control, so they let me avoid the effort
of adding and maintaining similar features within the limited time slot.

>> I'd like to find the best way to implement this table-caching
>> mechanism within the scope of the v9.4 feature set.
>> Any ideas, comments, or suggestions are welcome.
>
> I think that Tom is right: there's not time to get something like this
> done for 9.4.  If you start working relatively soon, I think you could
> hope to have a good proposal in time for 9.5.
>
Yes, I agree that his suggestion is realistic. It is nearly impossible to
get the whole table-cache mechanism done by the CF3 deadline this week.
So, I plan to implement it as an extension using the facilities in v9.4.

It seems to me the last piece needed to implement this feature (custom-scan
being a prerequisite, of course) is a hook in vacuum.
Does it make sense to propose some other worthwhile but small contrib
module that utilizes this last piece? For example, a contrib module that
reports the progress of vacuum...

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: What's needed for cache-only table scan?

From: Tom Lane
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
> 2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:
>> There's no possible way you'll finish this for 9.4.

> Yes, I understand it is not possible to submit the whole patch by the
> CF3 deadline. So, I'd like to find a way to implement it as an extension,
> using facilities already supported or to be enhanced in v9.4.

Oh!  Okay, I misunderstood the context --- you meant this as an example
use-case for the custom plan feature, right?  Makes more sense now.

I'm still dubious that it'd actually be very useful in itself, but it
seems reasonable as a test case to make sure that a set of hooks for
custom plans are sufficient to do something useful with.
        regards, tom lane



Re: What's needed for cache-only table scan?

From: Claudio Freire
On Tue, Nov 12, 2013 at 11:45 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> Hello,
>
> This is a brief design proposal for a feature I'd like to implement on top
> of the custom-scan APIs. Because it (probably) requires a few additional
> base features beyond custom-scan itself, I'd like to get feedback from the
> hackers.
>
> The cache-only table scan, as in the subject line, is an alternative scan
> logic to sequential scan, usable when all the referenced columns are cached.
> It allows a particular table to be scanned without storage access, thus
> improving scan performance.
> So what? How is this different from a large shared_buffers configuration?
> This mechanism caches only the columns referenced by the query, not the
> whole of each record. It makes sense for workloads that scan a table with
> many columns while the qualifiers reference just a few of them, as is
> typical of analytic queries, because it reduces the memory consumed per
> cached record, so more records can be cached.
> In addition, it has another role from my standpoint: it also serves as a
> fast data supplier for GPU/MIC devices. When we move data to a GPU
> device, the source address has to be in a region marked as "page-locked",
> which is exempt from being swapped out, if we want CUDA or OpenCL to
> run in asynchronous DMA transfer mode, the fastest one.


Wouldn't a columnar heap format be a better solution to this?



Re: What's needed for cache-only table scan?

From: Kohei KaiGai
2013/11/12 Claudio Freire <klaussfreire@gmail.com>:
> On Tue, Nov 12, 2013 at 11:45 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>> Hello,
>>
>> This is a brief design proposal for a feature I'd like to implement on top
>> of the custom-scan APIs. Because it (probably) requires a few additional
>> base features beyond custom-scan itself, I'd like to get feedback from the
>> hackers.
>>
>> The cache-only table scan, as in the subject line, is an alternative scan
>> logic to sequential scan, usable when all the referenced columns are cached.
>> It allows a particular table to be scanned without storage access, thus
>> improving scan performance.
>> So what? How is this different from a large shared_buffers configuration?
>> This mechanism caches only the columns referenced by the query, not the
>> whole of each record. It makes sense for workloads that scan a table with
>> many columns while the qualifiers reference just a few of them, as is
>> typical of analytic queries, because it reduces the memory consumed per
>> cached record, so more records can be cached.
>> In addition, it has another role from my standpoint: it also serves as a
>> fast data supplier for GPU/MIC devices. When we move data to a GPU
>> device, the source address has to be in a region marked as "page-locked",
>> which is exempt from being swapped out, if we want CUDA or OpenCL to
>> run in asynchronous DMA transfer mode, the fastest one.
>
>
> Wouldn't a columnar heap format be a better solution to this?
>
I've implemented that using FDW; however, it requires applications to
adjust their SQL to replace "CREATE TABLE" with "CREATE FOREIGN TABLE".
In addition, it loses good features of the regular heap, such as index
scans when their cost is smaller than a sequential columnar scan.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: What's needed for cache-only table scan?

From: Kohei KaiGai
2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:
> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>> 2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:
>>> There's no possible way you'll finish this for 9.4.
>
>> Yes, I understand it is not possible to submit the whole patch by the
>> CF3 deadline. So, I'd like to find a way to implement it as an extension,
>> using facilities already supported or to be enhanced in v9.4.
>
> Oh!  Okay, I misunderstood the context --- you meant this as an example
> use-case for the custom plan feature, right?  Makes more sense now.
>
> I'm still dubious that it'd actually be very useful in itself, but it
> seems reasonable as a test case to make sure that a set of hooks for
> custom plans are sufficient to do something useful with.
>
Yes. I intend to put most of this table-caching feature on top of the
custom-scan API set, even though it may need additional hooks that are,
by their nature, independent of the planner and executor structure.

So, do you think it is a feasible approach to focus on the custom-scan
APIs during the upcoming CF3, and then on the table-caching feature as a
use case of these APIs in CF4?

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: What's needed for cache-only table scan?

From: Tom Lane
Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
> So, do you think it is a feasible approach to focus on the custom-scan
> APIs during the upcoming CF3, and then on the table-caching feature as a
> use case of these APIs in CF4?

Sure.  If you work on this extension after CF3, and it reveals that the
custom scan stuff needs some adjustments, there would be time to do that
in CF4.  The policy about what can be submitted in CF4 is that we don't
want new major features that no one has seen before, not that you can't
make fixes to previously submitted stuff.  Something like a new hook
in vacuum wouldn't be a "major feature", anyway.
        regards, tom lane



Re: What's needed for cache-only table scan?

From: Kohei KaiGai
2013/11/12 Tom Lane <tgl@sss.pgh.pa.us>:
> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
>> So, do you think it is a feasible approach to focus on the custom-scan
>> APIs during the upcoming CF3, and then on the table-caching feature as a
>> use case of these APIs in CF4?
>
> Sure.  If you work on this extension after CF3, and it reveals that the
> custom scan stuff needs some adjustments, there would be time to do that
> in CF4.  The policy about what can be submitted in CF4 is that we don't
> want new major features that no one has seen before, not that you can't
> make fixes to previously submitted stuff.  Something like a new hook
> in vacuum wouldn't be a "major feature", anyway.
>
Thanks for this clarification.
Three days are too short to write a patch; however, two months may be
sufficient to develop a feature on top of the scheme discussed in the
previous commitfest.

Best regards,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>