


Re: Sporadic high IO bandwidth and Linux OOM killer


Hi,

To be honest, I've never seen the OOM killer in action on those instances. My Xmx was 8GB, just like yours, which makes me think some other process is competing for memory. Is that the case? Do you have any cron job, backup, or anything else that could trigger the OOM killer?

My unresponsiveness was seconds long. This is/was bad because the gossip protocol was going crazy, marking nodes down, with all the consequences that can have in a distributed system: think about hints, the dynamic snitch, and whatever else depends on node availability.
Can you share some numbers from your `tpstats`, or about system load in general?


No rollbacks, just moving forward! Right now we are upgrading the instance size to something more recent than m1.xlarge (for many different reasons, including security, ECU and network). Nevertheless, it might be a good idea to upgrade to the 3.X branch to take advantage of better off-heap memory management.

Best,


On Thu, Dec 6, 2018 at 2:33 PM Oleksandr Shulgin <oleksandr.shulgin@xxxxxxxxxx> wrote:
On Thu, Dec 6, 2018 at 11:14 AM Riccardo Ferrari <ferrarir@xxxxxxxxx> wrote:

I had a few instances in the past that showed that unresponsive behaviour. Back then I saw with iotop/htop/dstat ... that the system was stuck on a single thread processing (full throttle) for seconds. According to iotop, that was the kswapd0 process. That system was an Ubuntu 16.04, actually "Ubuntu 16.04.4 LTS".

Riccardo,

Did you by chance also observe Linux OOM?  How long did the unresponsiveness last in your case?

From there I started to dig into why the kswapd0 process was involved on a system with no swap, and found that it is also used for mmapping. This erratic (allow me to say erratic) behaviour was not showing up when I was on 3.0.6, but started right after upgrading to 3.0.17.
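For anyone who wants to check the same two things on their own node (no swap configured, and how much memory is mmapped), here is a minimal Python sketch. It assumes the usual /proc/meminfo layout with `SwapTotal` and `Mapped` fields; the function names are made up for illustration, and the sample text stands in for the real file.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most values are reported in kB; a few fields (e.g. HugePages_Total)
    are plain counts. Lines that do not look like "Name: <number> ..." are skipped.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_and_mapped(text):
    """Return (no_swap_configured, mapped_kb) from meminfo-style text."""
    info = parse_meminfo(text)
    no_swap = info.get("SwapTotal", 0) == 0
    return no_swap, info.get("Mapped", 0)


if __name__ == "__main__":
    # Illustrative sample only; on a real node, read the text from /proc/meminfo.
    sample = (
        "MemTotal:       15400000 kB\n"
        "Mapped:          2048000 kB\n"
        "SwapTotal:             0 kB\n"
    )
    print(swap_and_mapped(sample))
```

This only reports a snapshot; it does not explain kswapd0 activity by itself.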

By "load" I refer to the load as reported by `nodetool status`. On my systems, when disk_access_mode is auto (read: mmap), it is the sum of the node load plus the JVM heap size. Of course, this is just what I noted on my systems; I'm not really sure whether that should be the case on yours too.

I've checked and indeed we are using disk_access_mode=auto (well, implicitly, because it's not even part of the config file anymore): DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap.
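To run the same check across several nodes, a small Python sketch like the one below can pull the effective modes out of that log line. The pattern is based only on the wording quoted above and may not match other Cassandra versions; `parse_access_modes` and the group names are made up for illustration.

```python
import re

# Pattern built from the log line quoted above. The exact wording is an
# assumption and may differ between Cassandra versions.
_MODE_LINE = re.compile(
    r"DiskAccessMode '(?P<configured>\w+)' determined to be (?P<data>\w+), "
    r"indexAccessMode is (?P<index>\w+)"
)


def parse_access_modes(line):
    """Return a dict with the configured, data and index access modes,
    or None if the line does not match the expected wording."""
    match = _MODE_LINE.search(line)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    sample = "DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap."
    print(parse_access_modes(sample))
```

Feeding it each line of the node's log and keeping the non-None results is enough to see which mode was actually picked.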

I hope someone with more experience than me will comment on your settings. Reading the configuration file, writers and compactors should be 2 at minimum. I can confirm that when I tried in the past to change concurrent_compactors to 1, really bad things happened (high system load, high message drop rate, ...).

As I've mentioned, we did not observe any other issues with the current setup: system load is reasonable, no dropped messages, no big number of hints, request latencies are OK, no big number of pending compactions.  Also during repair everything looks fine.

I have the "feeling" that, when running on constrained hardware, optimizing the underlying kernel is a must. I agree with Jonathan H. that you should think about increasing the instance size; CPU and memory matter a lot.

How did you solve your issue in the end?  You didn't roll back to 3.0.6?  Did you tune kernel parameters?  Which ones?

Thank you!
--
Alex