Thread: Searching in varchar column having 100M records
My table has about 100M records with data like the sample below (all dummy data). I have a btree index on the column "field".
Searching for any text in that column takes a long time (more than 1 minute).
user_id                              | field
-------------------------------------+---------------------------------
d848f466-5e12-46e7-acf4-e12aff592241 | Northern Arkansas College
24c32757-e6a8-4dbd-aac7-1efd867156ce | female
6e225c57-c1d1-48a5-b9aa-513223efc81b | 1.0, 3.67, 3.67, 4.67, 7.0, 3.0
088c6342-a240-45a7-9d12-e0e707292031 | Weber
b05088cf-cba6-4bd7-8f8f-1469226874d0 | addd#$e@aaa.com
The table and index are created using the following queries.
create table fields(user_id varchar(64), field varchar(64));
CREATE INDEX index_field ON public.fields USING btree (field);
Search Query:
EXPLAIN (ANALYZE, BUFFERS) select * from fields where field='Mueller';
Bitmap Heap Scan on fields  (cost=72.61..10069.32 rows=2586 width=55) (actual time=88.017..65358.548 rows=31882 loops=1)
  Recheck Cond: ((field)::text = 'Mueller'::text)
  Heap Blocks: exact=31403
  Buffers: shared hit=2 read=31492
  ->  Bitmap Index Scan on index_field  (cost=0.00..71.96 rows=2586 width=0) (actual time=55.960..55.960 rows=31882 loops=1)
        Index Cond: ((field)::text = 'Mueller'::text)
        Buffers: shared read=91
Planning Time: 0.331 ms
Execution Time: 65399.314 ms
Any suggestions for improvement?
Best Regards,
Mayank
Hello

Please recheck with track_io_timing = on in the configuration. With this option, explain (analyze, buffers) will report how much time we spend during I/O.

> Buffers: shared hit=2 read=31492

31492 blocks / 65 sec ~ 480 IOPS, not bad if you are using an HDD.

Your query reads table data from disks (well, or from the OS cache). You need more RAM for shared_buffers, or disks with better performance.

regards, Sergei
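For example, something like this should turn it on (a minimal sketch, assuming superuser access; the setting can also be put in postgresql.conf):

    -- enable I/O timing instrumentation cluster-wide (requires superuser)
    ALTER SYSTEM SET track_io_timing = on;
    SELECT pg_reload_conf();

    -- or only for the current session, to re-check a single query
    SET track_io_timing = on;
    EXPLAIN (ANALYZE, BUFFERS) select * from fields where field='Mueller';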
On Wed, Jul 17, 2019 at 02:53:20PM +0300, Sergei Kornilov wrote:
>Hello
>
>Please recheck with track_io_timing = on in configuration. explain
>(analyze,buffers) with this option will report how many time we spend
>during i/o
>
>> Buffers: shared hit=2 read=31492
>
>31492 blocks / 65 sec ~ 480 IOPS, not bad if you are using HDD
>
>Your query reads table data from disks (well, or from OS cache). You need
>more RAM for shared_buffers or disks with better performance.
>

Either that, or try creating a covering index, so that the query can do an
index-only scan. That might reduce the amount of IO against the table, and
in the index the data should be located close to each other (same page or
pages close to each other).

So try something like

    CREATE INDEX ios_idx ON table (field, user_id);

and make sure the table is vacuumed often enough (so that the visibility
map is up to date).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
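Applied to the table in this thread, that could look roughly like the following (a sketch; the INCLUDE form needs PostgreSQL 11 or later, otherwise the plain two-column index does the same job):

    -- covering index so the query can be answered from the index alone
    CREATE INDEX ios_idx ON fields (field) INCLUDE (user_id);
    -- on older versions: CREATE INDEX ios_idx ON fields (field, user_id);

    -- keep the visibility map up to date so index-only scans stay index-only
    VACUUM (ANALYZE) fields;

    -- then check whether the plan switches to an Index Only Scan
    EXPLAIN (ANALYZE, BUFFERS) SELECT user_id FROM fields WHERE field = 'Mueller';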
On 17.07.19 at 14:48, Tomas Vondra wrote:
> Either that, or try creating a covering index, so that the query can do an
> index-only scan. That might reduce the amount of IO against the table, and
> in the index the data should be located close to each other (same page or
> pages close to each other).
>
> So try something like
>
>     CREATE INDEX ios_idx ON table (field, user_id);
>
> and make sure the table is vacuumed often enough (so that the visibility
> map is up to date).

yeah, and please don't use varchar(64), but instead UUID for the user_id
field, to save space on disk and for faster comparison.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
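A minimal sketch of that conversion (assuming every existing user_id is a valid UUID string; the cast fails otherwise):

    -- convert the column in place
    ALTER TABLE fields ALTER COLUMN user_id TYPE uuid USING user_id::uuid;

    -- new rows can then use generated UUIDs
    -- (gen_random_uuid() is built in from PostgreSQL 13, or comes from the pgcrypto extension)
    INSERT INTO fields (user_id, field) VALUES (gen_random_uuid(), 'Weber');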
On Wed, Jul 17, 2019 at 4:04 AM mayank rupareliya <mayankjr03@gmail.com> wrote:
> create table fields(user_id varchar(64), field varchar(64));
> CREATE INDEX index_field ON public.fields USING btree (field);
>
> Any suggestions for improvement?
Reduce the number of rows by constructing a relationally normalized data model.
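A sketch of what that could look like (hypothetical table and column names; the point is that each distinct field value is stored once and referenced by a small key):

    CREATE TABLE field_values (
        field_id serial PRIMARY KEY,
        field    text NOT NULL UNIQUE
    );

    CREATE TABLE user_fields (
        user_id  uuid NOT NULL,
        field_id integer NOT NULL REFERENCES field_values (field_id)
    );

    CREATE INDEX ON user_fields (field_id);

    -- the search becomes a lookup of one field value plus a scan on a small integer key
    SELECT uf.user_id
    FROM user_fields uf
    JOIN field_values fv USING (field_id)
    WHERE fv.field = 'Mueller';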
David J.
On 17/07/2019 23:03, mayank rupareliya wrote:
[...]
> Table and index are created using following query.
>
> create table fields(user_id varchar(64), field varchar(64));
> CREATE INDEX index_field ON public.fields USING btree (field);
[...]

Any particular reason for using varchar instead of text, for field?

Also, as Andreas pointed out, use UUID for the user_id.

Cheers,
Gavin
> Please recheck with track_io_timing = on in configuration. explain
> (analyze,buffers) with this option will report how many time we spend
> during i/o
>
> > Buffers: shared hit=2 read=31492
>
> 31492 blocks / 65 sec ~ 480 IOPS, not bad if you are using HDD
>
> Your query reads table data from disks (well, or from OS cache). You need
> more RAM for shared_buffers or disks with better performance.

Thanks Sergei. track_io_timing = on helps. Following is the result after changing that config.
Aggregate  (cost=10075.78..10075.79 rows=1 width=8) (actual time=63088.198..63088.199 rows=1 loops=1)
  Buffers: shared read=31089
  I/O Timings: read=61334.252
  ->  Bitmap Heap Scan on fields  (cost=72.61..10069.32 rows=2586 width=0) (actual time=69.509..63021.448 rows=31414 loops=1)
        Recheck Cond: ((field)::text = 'Klein'::text)
        Heap Blocks: exact=30999
        Buffers: shared read=31089
        I/O Timings: read=61334.252
        ->  Bitmap Index Scan on index_field  (cost=0.00..71.96 rows=2586 width=0) (actual time=58.671..58.671 rows=31414 loops=1)
              Index Cond: ((field)::text = 'Klein'::text)
              Buffers: shared read=90
              I/O Timings: read=45.316
Planning Time: 66.435 ms
Execution Time: 63088.774 ms
> So try something like
>
>     CREATE INDEX ios_idx ON table (field, user_id);
>
> and make sure the table is vacuumed often enough (so that the visibility
> map is up to date).

Thanks Tomas... I tried this and the result improved, but not much.
Thanks Andreas, David, Gavin

> Any particular reason for using varchar instead of text, for field?

No

> use UUID for the user_id.

Agreed
Regards,
Mayank
On Thu, Jul 18, 2019 at 4:25 AM Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
On 17/07/2019 23:03, mayank rupareliya wrote:
[...]
> Table and index are created using following query.
>
> create table fields(user_id varchar(64), field varchar(64));
> CREATE INDEX index_field ON public.fields USING btree (field);
[...]
Any particular reason for using varchar instead of text, for field?
Also, as Andreas pointed out, use UUID for the user_id.
Cheers,
Gavin
On Thu, Jul 18, 2019 at 05:21:49PM +0530, mayank rupareliya wrote:
>*Please recheck with track_io_timing = on in configuration. explain
>(analyze,buffers) with this option will report how many time we spend
>during i/o*
>
>*> Buffers: shared hit=2 read=31492*
>
>*31492 blocks / 65 sec ~ 480 IOPS, not bad if you are using HDD*
>
>*Your query reads table data from disks (well, or from OS cache). You need
>more RAM for shared_buffers or disks with better performance.*
>
>
>Thanks Sergei..
>*track_io_timing = on helps.. Following is the result after changing that
>config.*
>
>Aggregate (cost=10075.78..10075.79 rows=1 width=8) (actual
>time=63088.198..63088.199 rows=1 loops=1)
>  Buffers: shared read=31089
>  I/O Timings: read=61334.252
>  -> Bitmap Heap Scan on fields (cost=72.61..10069.32 rows=2586 width=0)
>(actual time=69.509..63021.448 rows=31414 loops=1)
>     Recheck Cond: ((field)::text = 'Klein'::text)
>     Heap Blocks: exact=30999
>     Buffers: shared read=31089
>     I/O Timings: read=61334.252
>     -> Bitmap Index Scan on index_field (cost=0.00..71.96 rows=2586
>width=0) (actual time=58.671..58.671 rows=31414 loops=1)
>        Index Cond: ((field)::text = 'Klein'::text)
>        Buffers: shared read=90
>        I/O Timings: read=45.316
>Planning Time: 66.435 ms
>Execution Time: 63088.774 ms
>

How did that help? It only gives you insight that it's really the I/O that
takes time. You need to reduce that, somehow.

>
>*So try something like*
>
>* CREATE INDEX ios_idx ON table (field, user_id);*
>
>*and make sure the table is vacuumed often enough (so that the visibility*
>*map is up to date).*
>
>Thanks Tomas... I tried this and result improved but not much.
>

Well, you haven't shown us the execution plan, so it's hard to check why
it did not help much and give you further advice.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 18/07/2019 23:51, mayank rupareliya wrote:
[...]
> Thanks Andreas, David, Gavin
>
> Any particular reason for using varchar instead of text, for field? No
>
> use UUID for the user_id. Agreed
[...]

Use of text is preferred, but I can't see it making any significant
difference to performance -- but I could be wrong!

Cheers,
Gavin
> Well, you haven't shown us the execution plan, so it's hard to check why
> it did not help much and give you further advice.

This is the latest query execution plan from EXPLAIN, after adding the index on both columns.
Aggregate  (cost=174173.57..174173.58 rows=1 width=8) (actual time=65087.657..65087.658 rows=1 loops=1)
  ->  Bitmap Heap Scan on fields  (cost=1382.56..174042.61 rows=52386 width=0) (actual time=160.340..65024.533 rows=31857 loops=1)
        Recheck Cond: ((field)::text = 'Champlin'::text)
        Heap Blocks: exact=31433
        ->  Bitmap Index Scan on index_field  (cost=0.00..1369.46 rows=52386 width=0) (actual time=125.078..125.079 rows=31857 loops=1)
              Index Cond: ((field)::text = 'Champlin'::text)
Planning Time: 8.595 ms
Execution Time: 65093.508 ms
On Thu, Jul 18, 2019 at 6:11 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Thu, Jul 18, 2019 at 05:21:49PM +0530, mayank rupareliya wrote:
>*Please recheck with track_io_timing = on in configuration. explain
>(analyze,buffers) with this option will report how many time we spend
>during i/o*
>
>*> Buffers: shared hit=2 read=31492*
>
>*31492 blocks / 65 sec ~ 480 IOPS, not bad if you are using HDD*
>
>*Your query reads table data from disks (well, or from OS cache). You need
>more RAM for shared_buffers or disks with better performance.*
>
>
>Thanks Sergei..
>*track_io_timing = on helps.. Following is the result after changing that
>config.*
>
>Aggregate (cost=10075.78..10075.79 rows=1 width=8) (actual
>time=63088.198..63088.199 rows=1 loops=1)
> Buffers: shared read=31089
> I/O Timings: read=61334.252
> -> Bitmap Heap Scan on fields (cost=72.61..10069.32 rows=2586 width=0)
>(actual time=69.509..63021.448 rows=31414 loops=1)
> Recheck Cond: ((field)::text = 'Klein'::text)
> Heap Blocks: exact=30999
> Buffers: shared read=31089
> I/O Timings: read=61334.252
> -> Bitmap Index Scan on index_field (cost=0.00..71.96 rows=2586
>width=0) (actual time=58.671..58.671 rows=31414 loops=1)
> Index Cond: ((field)::text = 'Klein'::text)
> Buffers: shared read=90
> I/O Timings: read=45.316
>Planning Time: 66.435 ms
>Execution Time: 63088.774 ms
>
How did that help? It only gives you insight that it's really the I/O that
takes time. You need to reduce that, somehow.
>
>*So try something like*
>
>* CREATE INDEX ios_idx ON table (field, user_id);*
>
>*and make sure the table is vacuumed often enough (so that the visibility*
>*map is up to date).*
>
>Thanks Tomas... I tried this and result improved but not much.
>
Well, you haven't shown us the execution plan, so it's hard to check why
it did not help much and give you further advice.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jul 19, 2019 at 8:13 AM mayank rupareliya <mayankjr03@gmail.com> wrote:
> Well, you haven't shown us the execution plan, so it's hard to check why
> it did not help much and give you further advice.
>
> This is the latest query execution with explain after adding indexing on
> both columns.
>
> Aggregate (cost=174173.57..174173.58 rows=1 width=8) (actual time=65087.657..65087.658 rows=1 loops=1)
>   -> Bitmap Heap Scan on fields (cost=1382.56..174042.61 rows=52386 width=0) (actual time=160.340..65024.533 rows=31857 loops=1)
>         Recheck Cond: ((field)::text = 'Champlin'::text)
>         Heap Blocks: exact=31433
>         -> Bitmap Index Scan on index_field (cost=0.00..1369.46 rows=52386 width=0) (actual time=125.078..125.079 rows=31857 loops=1)
>               Index Cond: ((field)::text = 'Champlin'::text)
> Planning Time: 8.595 ms
> Execution Time: 65093.508 ms
Are you on a solid state drive? If so, have you tried setting effective_io_concurrency to 200 or 300 and checking performance? Given nearly all of the execution time is doing a bitmap heap scan, I wonder about adjusting this.
"The allowed range is 1 to 1000, or zero to disable issuance of asynchronous I/O requests. Currently, this setting only affects bitmap heap scans."
"The default is 1 on supported systems, otherwise 0. "
On Fri, Jul 19, 2019 at 07:43:26PM +0530, mayank rupareliya wrote:
>Well, you haven't shown us the execution plan, so it's hard to check why
>it did not help much and give you further advice.
>
>
>This is the latest query execution with explain after adding indexing on
>both columns.
>
>Aggregate (cost=174173.57..174173.58 rows=1 width=8) (actual
>time=65087.657..65087.658 rows=1 loops=1)
>  -> Bitmap Heap Scan on fields (cost=1382.56..174042.61 rows=52386
>width=0) (actual time=160.340..65024.533 rows=31857 loops=1)
>     Recheck Cond: ((field)::text = 'Champlin'::text)
>     Heap Blocks: exact=31433
>     -> Bitmap Index Scan on index_field (cost=0.00..1369.46
>rows=52386 width=0) (actual time=125.078..125.079 rows=31857 loops=1)
>        Index Cond: ((field)::text = 'Champlin'::text)
>Planning Time: 8.595 ms
>Execution Time: 65093.508 ms
>

That very clearly does not use the index-only scan, so it's not
surprising it's not any faster. You need to find out why the planner
makes that decision.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
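A few checks that may help pin that down (a sketch; ios_idx is the covering index suggested earlier, and disabling the bitmap scan is only for diagnosis, not for production):

    -- is the visibility map up to date? relallvisible should be close to relpages
    SELECT relpages, relallvisible FROM pg_class WHERE relname = 'fields';

    -- if not, vacuum and look again
    VACUUM (ANALYZE) fields;

    -- see which plan the planner picks once the bitmap scan is taken off the table
    SET enable_bitmapscan = off;
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM fields WHERE field = 'Champlin';
    RESET enable_bitmapscan;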
Another suggestion: try clustering the table on the index for the "field" column, then analyze. If you're on a spinning disk, it also helps to sort your data by "field" during bulk inserts.
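A sketch of that (note that CLUSTER takes an exclusive lock and rewrites the table, so on 100M rows it needs a maintenance window):

    -- physically reorder the table by the index on "field"
    CLUSTER fields USING index_field;
    ANALYZE fields;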
--
regards
marie g. bacuno ii
On Fri, Jul 19, 2019 at 11:39 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Fri, Jul 19, 2019 at 07:43:26PM +0530, mayank rupareliya wrote:
>Well, you haven't shown us the execution plan, so it's hard to check why
>it did not help much and give you further advice.
>
>
>This is the latest query execution with explain after adding indexing on
>both columns.
>
>Aggregate (cost=174173.57..174173.58 rows=1 width=8) (actual
>time=65087.657..65087.658 rows=1 loops=1)
> -> Bitmap Heap Scan on fields (cost=1382.56..174042.61 rows=52386
>width=0) (actual time=160.340..65024.533 rows=31857 loops=1)
> Recheck Cond: ((field)::text = 'Champlin'::text)
> Heap Blocks: exact=31433
> -> Bitmap Index Scan on index_field (cost=0.00..1369.46
>rows=52386 width=0) (actual time=125.078..125.079 rows=31857 loops=1)
> Index Cond: ((field)::text = 'Champlin'::text)
>Planning Time: 8.595 ms
>Execution Time: 65093.508 ms
>
That very clearly does not use the index-only scan, so it's not
surprising it's not any faster. You need to find out why the planner
makes that decision.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services