Thread: need suggestion on querying big tables
Ebin
Hi Team,

In PostgreSQL version 11 we have two tables of size 435 GB and 347 GB. If we query a single table, or do an inner join on both of the big tables, it does not display any output; it just keeps running. We can see the wait event is IO and directDatafile.

Server spec: 8 cores and 64 GB RAM
PG config: 53 GB (effective_cache_size), 12 GB (shared_buffers)

Can you please suggest some ideas on how we can query these big tables and speed them up to get the output?

Thanks & Regards,
Ebin
There are basically two "secrets" to performance in this case:
- make sure that the join operation uses indexes
- make sure that you have enough work_mem available
work_mem can be set in postgresql.conf, but also per session. So don't set it too high globally, because it is allocated separately for every sort and hash operation, potentially many times per query.
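As a sketch, work_mem can be raised for one session only (the table and column names here are hypothetical, just to illustrate the pattern):

```sql
-- Raise work_mem for this session only; the server-wide default is untouched.
SET work_mem = '256MB';

-- Hypothetical join between the two large tables; EXPLAIN (ANALYZE, BUFFERS)
-- shows whether sorts/hashes now fit in memory instead of spilling to disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT a.id, b.payload
FROM big_table_a AS a
JOIN big_table_b AS b ON b.a_id = a.id;

-- Return to the configured default afterwards.
RESET work_mem;
```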
Kind Regards,
Holger
-- Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
On 05/12/2022 12:14, Bo Guo wrote:
> Partitioning the table if possible should also help.

This really only helps if the partitioning is relevant to the query. Otherwise it can slow things down. E.g., if you partition on date and search on date, the partitioning will probably help; but if you partition on date and search on name, it'll be no quicker, and will possibly be slower.

Paul

-- Paul Smith, Computer Services, Tel: 01484 855800
A "just smart enough" user could perform an inadvertent Denial of Service attack on the database by cranking his session work_mem really high, no?
Angular momentum makes the world go 'round.
The first thing you should realize is that your server is really small. Things like NVMe storage, more memory and more cores would help. It's a business decision: if your application is important to your business, you should invest some money in the appropriate hardware.

You should also consider partitioning both tables and upgrading to at least Postgres 14. The fear of the latest version is understandable, and I feel it myself, but newer versions behave much better with respect to partitioning. Be aware, though, that PostgreSQL has no global indexes, so any unique index on a partitioned table will have to include the partitioning key. Parallel query execution also helps.
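A minimal sketch of declarative range partitioning, assuming (hypothetically) that the big tables have a timestamp column the queries filter on; all names here are invented for illustration:

```sql
-- Hypothetical parent table, partitioned by a timestamp column.
CREATE TABLE events (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    text,
    -- Unique indexes on a partitioned table must include the partition key.
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- One partition per year; queries that filter on created_at can skip
-- ("prune") every partition whose range cannot match.
CREATE TABLE events_2022 PARTITION OF events
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
```

The benefit only materializes when the WHERE clause constrains the partition key, which is exactly the caveat raised earlier in the thread.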
64 GB and 8 cores is a desktop class machine. Here is my desktop machine:
[mgogala@umajor ~]$ tail -20 /proc/cpuinfo
cache size : 512 KB
physical id : 0
siblings : 16
core id : 7
cpu cores : 8
apicid : 15
initial apicid : 15
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es
bugs : sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed
bogomips : 7385.36
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
[mgogala@umajor ~]$ head /proc/meminfo
MemTotal: 65781164 kB
MemFree: 48364252 kB
MemAvailable: 47110532 kB
Buffers: 14812 kB
Cached: 10199584 kB
SwapCached: 0 kB
Active: 3949732 kB
Inactive: 9434600 kB
Active(anon): 4304 kB
Inactive(anon): 3211636 kB
It's faster than your DB server and it has an NVMe drive. And this is a $2K machine from late 2018. Today, I would go for a DDR5 machine with at least 64 cores, 1 TB RAM and at least 20 TB of NVMe drives. Such a machine would cost around $50K.
-- Mladen Gogala Database Consultant Tel: (347) 321-1217 https://dbwhisperer.wordpress.com
On 12/5/22 09:45, Ron wrote:
> A "just smart enough" user could perform an inadvertent Denial of Service attack on the database by cranking his session work_mem really high, no?
Well, some of the applications that I've seen during my career could qualify as Denial of Service suicide attack.
-- Mladen Gogala Database Consultant Tel: (347) 321-1217 https://dbwhisperer.wordpress.com
> Server Spec : 8 cores and 64GB RAM
> PG config : 53 GB(effective_cache), 12 GB(shared buffer)
> can you please suggest some ideas, how we can query on big tables and fasten them to get the output??
https://www.postgresql.org/docs/11/runtime-config-query.html
effective_cache_size is how much RAM a single query gets. "When setting this parameter you should consider both PostgreSQL's shared buffers".
https://www.postgresql.org/docs/11/runtime-config-resource.html
"a reasonable starting value for shared_buffers is 25% of the memory in your system. There are some workloads where even larger settings for shared_buffers are effective, but because PostgreSQL also relies on the operating system cache, it is unlikely that an allocation of more than 40% of RAM to shared_buffers will work better than a smaller amount."
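Following the 25% starting point quoted above, shared_buffers on a 64 GB machine could be adjusted roughly like this (the value is only an example, and the setting takes effect only after a server restart):

```sql
-- ~25% of 64 GB of RAM, per the documentation's starting-point guidance.
-- shared_buffers cannot be changed at runtime; a restart is required.
ALTER SYSTEM SET shared_buffers = '16GB';

-- After restarting, verify the active value.
SHOW shared_buffers;
```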
You've set effective_cache_size to 83%, when it should be at most 20%.
Angular momentum makes the world go 'round.
Ron schrieb am 06.12.2022 um 15:51:
>> You've set effective_cache_size to 83%, when it should be at most 20%.

effective_cache_size does not allocate any memory. It is not a setting that controls "how much memory a query gets". effective_cache_size is merely a hint to the optimizer about the amount of data that is likely to be in the cache (either the file system cache or shared_buffers).

The current setting of 12GB for shared_buffers is well inside the recommended limits (40% would be about 25GB).
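As an illustration of it being a planner hint only (table name hypothetical): changing effective_cache_size within a session alters cost estimates, never memory allocation, so the same query can be planned under different assumptions and compared.

```sql
-- Planner hint only: no memory is reserved or allocated by this setting.
SET effective_cache_size = '48GB';
-- EXPLAIN (without ANALYZE) shows the chosen plan and its estimated costs.
EXPLAIN SELECT * FROM big_table_a WHERE id = 42;

-- A much smaller hint makes repeated index scans look more expensive,
-- which can tip the planner toward a different plan for the same query.
SET effective_cache_size = '4GB';
EXPLAIN SELECT * FROM big_table_a WHERE id = 42;
```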
Ron <ronljohnsonjr@gmail.com> writes:
> You've set effective_cache_size to 83%, when it should be at most 20%.

I think you're confusing effective_cache_size with the size of shared buffers. It's meant to be an estimate of the total disk cache space available, in both shared buffers and OS caches.

In any case, it's a fairly spongy parameter. It doesn't correspond to any allocation actually made or requested in Postgres; it just serves to tweak the planner's guesses about how expensive repeated indexscans are likely to be. 83% of available RAM might be too much, but it's probably not causing any real problems.

regards, tom lane
On 12/6/22 09:03, Tom Lane wrote:
> I think you're confusing effective_cache_size with the size of shared buffers. It's meant to be an estimate of the total disk cache space available, in both shared buffers and OS caches.

The docs say that ECS is an estimate of how much cache one query can use, no? If more than one query is running on a server at a time, then wouldn't a huge ECS confuse things?

-- Angular momentum makes the world go 'round.
On Dec 6, 2022, at 8:51 AM, Ron <ronljohnsonjr@gmail.com> wrote:
> You've set effective_cache_size to 83%, when it should be at most 20%.

effective_cache_size is a hint as to how much RAM is available for the operating system file cache after all other uses are taken into account.
Samed YILDIRIM