Active RMA stops responding and service won't stop
I have the Active RMA 3.04 Beta installed on several machines (all Windows 2003 R2 Server SP2). Everything works fine for several hours, then some agents start to drop off.
For example, out of 40 agents I had running yesterday, when I came in this morning 11 of the agents were in the "RMA not connected" state. When I go to the remote machine and try to stop the service, it hangs.
The only way to restart is to use Task Manager to kill rma_active.exe and then start the service again. Then everything returns to normal.
I am communicating with HM 7.08, but plan to upgrade to 7.10. Since the problem appears to be the RMA, I doubt this will help. Is there a new RMA I should be using? What else can I do to help troubleshoot this problem?
Thank you.
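The manual workaround described in this post (kill rma_active.exe, then start the service again) could be scripted roughly as follows. This is only a sketch: the service name below is an assumption taken from the Event Log text quoted later in the thread, and `build_recovery_commands` and `recover_agent` are hypothetical helper names. It relies on the standard Windows `taskkill` and `net start` commands.

```python
import subprocess

# Assumptions: the process image name comes from this post; the service
# display name is guessed from the Event Log message quoted in the thread.
PROCESS_NAME = "rma_active.exe"
SERVICE_NAME = "KS Active Remote Monitoring Agent"


def build_recovery_commands(process_name=PROCESS_NAME, service_name=SERVICE_NAME):
    """Return the two commands: force-kill the process, then start the service."""
    return [
        ["taskkill", "/F", "/IM", process_name],
        ["net", "start", service_name],
    ]


def recover_agent(run=subprocess.run):
    """Run the recovery commands in order and return their exit codes."""
    codes = []
    for cmd in build_recovery_commands():
        result = run(cmd)
        codes.append(result.returncode)
    return codes
```

Calling `recover_agent()` on the affected machine (with administrative rights) would run both commands. The `run` parameter exists only so the command sequence can be checked without touching a real service.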
Could you provide more information, please?
- Which Windows version is installed on the machine where HostMonitor is running? Which Service Pack?
- Do you have an antivirus monitor installed? A personal firewall? Content monitoring software? Non-standard Winsock components? A network packet analyzer?
- Do you see any suspicious error messages in Event Viewer (Start > Settings > Control Panel > Administrative Tools > Event Viewer applet)?
- Do you see any error messages in the System Log (the file, if one is specified, in menu "Options" -> "System Log")? You may access the System Log using menu "View" -> "System Log".
- Do you see any error messages in the RMA's log files (successful and failure audit logs, which are specified with the rma_cfg.exe utility)?
Regards,
Max
All machines including the Host Monitor machine are running Windows 2003 R2 Server with Service Pack 2. Some of the RMA machines are running the 64 bit version, but the problem occurs with both 64 and 32 bit machines.
All machines have McAfee VirusScan Enterprise Server 8.5.0.781 and ePolicy Orchestrator Agent 3.6.0.574 running. This is true for machines where the RMA stops communicating as well as machines where there is no problem. I am having our AV admin take a look to make sure McAfee is not interfering.
I do not see anything unusual in the syslog or log for HM. In one check the machine responds, in the next check it fails. There is nothing in between to indicate a problem.
I do not see anything unusual in the Event Viewer for the Host Monitor machine. On the RMA machine's Event log I see "The KS Active Remote Monitoring Agent service terminated unexpectedly." That is probably when I killed the process using Task Manager. Otherwise nothing unusual.
russionix wrote: I am having our AV admin take a look to make sure McAfee is not interfering.
In fact, we recommend installing HostMonitor/RMA on a clean system. However, if you are unable to disable the antivirus, we recommend at least disabling the real-time protection module or adding HostMonitor/RMA to the exclusions list.
russionix wrote: On the RMA machine's Event log I see "The KS Active Remote Monitoring Agent service terminated unexpectedly." That is probably when I killed the process using Task Manager. Otherwise nothing unusual.
Correct. This message is related to killing the process.
What about the RMA's log files? Each RMA writes information to log files. You may start the rma_cfg.exe utility from the folder where the RMA is running to find out the log filenames. You may also open the corresponding rma.ini file and look for the "[Logging]" section. Could you send the log files from one of the failed RMAs to support@ks-soft.net?
Regards,
Max
Log from failing RMA
This is the tail of the log. Prior to this, just wrong reply packet errors while I was setting things up. If you still want the entire log, let me know.
[2/11/2008 12:22 PM] active-name2 Connection error: Wrong reply packet received
[2/15/2008 5:57 PM] active-name2 Decode error: Cannot read data
[2/15/2008 5:57 PM] active-name2 Connection error
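For anyone comparing logs across several agents, lines in the format shown above could be split into timestamp, agent name, and message with a short script. This is a sketch that assumes every line follows the `[M/D/YYYY H:MM AM/PM] agent message` shape seen in this excerpt; `parse_line` and `count_messages` are hypothetical helpers, not part of HostMonitor or RMA.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, inferred only from the excerpt in this post.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<agent>\S+)\s+(?P<message>.+)$")


def parse_line(line):
    """Return (timestamp, agent, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %I:%M %p")
    return ts, m.group("agent"), m.group("message")


def count_messages(lines):
    """Count how often each message text appears among the parsable lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            counts[parsed[2]] += 1
    return counts
```

Running `count_messages` over a full log would show whether the "Decode error" entries appear only around the time an agent stops responding.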
Could you provide more information, please?
- Do these RMAs perform tests and actions, or tests only?
- What is the estimated load of HostMonitor (tests per second)? You may find this information using menu "View" -> "Estimate Load".
- What exact value is specified in "Don't start more than [N] tests per second" box in "Options" -> "Behavior" tab?
Regards,
Max
- Do these RMAs perform tests and actions, or tests only?
Ping, two shell scripts to test disk usage, CPU Usage, and an NTP check on each machine. All tests are dependent upon Ping.
- What is an estimate load of HostMonitor (tests per second)?
Load: 2 test/sec "System is able to perform given tests without significant load"
- What exact value is specified in "Don't start more than [N] tests per second" box in "Options" -> "Behavior" tab?
"Don't start more than 32 tests per second"
Still seeing 50% of Active RMA's failing
I am still experiencing the same problem with Active RMA. After running for a while, the agents stop communicating. The process is still running, but it no longer communicates. When I try to stop the service, I am not able to do so. I must kill the rma_active.exe process and restart the service. Then it works for a while and the cycle starts all over again.
Currently half the hosts I am monitoring are not responding, which means I am not really monitoring them.
Could you please try the following update www.ks-soft.net/download/test/actrma305t.zip ?
Do not apply this module on all systems. This is a test version with some "configuration limitations", so you will need to replace it with the normal version again. Just install it on several systems and check how it works.
If my guess is right and some functions that are documented as thread-safe are not really thread-safe, this version should be stable or almost stable (hanging much more rarely). If it is, we will know how to fix the problem.
Regards
Alex
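To illustrate the kind of problem suspected here: when a function that is assumed to be thread-safe is not, one generic remedy is to serialize all calls to it with a lock. The sketch below shows that general technique only. The thread does not say what the test build actually changes, and `Counter` and `run_serialized` are hypothetical names used purely for illustration.

```python
import threading


class Counter:
    """Stand-in for a routine that is not safe to call from several threads."""

    def __init__(self):
        self.value = 0

    def increment(self):
        # Read-modify-write: two threads interleaving here can lose an update.
        current = self.value
        self.value = current + 1


def run_serialized(counter, n_threads=8, n_iter=10000):
    """Call counter.increment() from several threads, one caller at a time."""
    lock = threading.Lock()

    def work():
        for _ in range(n_iter):
            with lock:  # only one thread may be inside the unsafe call
                counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the final count always equals `n_threads * n_iter`. Without it, updates can be lost, which is the sort of intermittent failure that is hard to reproduce on demand.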
active rma
Hi Alex,
Remember me? I had the same problem with Active RMA and it is still not solved.
The only good thing now is that I'm not the only one with this problem. I was already going crazy, because no one except me had this problem...
I will also try your beta download from the previous post and will report back...
Bye,
Alex
Installed new rma_active.exe
I just installed the new RMA on 4 machines and will watch them over the next day. I'll let you know if they continue to run or if they hang.
Thanks for continuing to work on this.
Russ
Active RMA still running
Good news. All 4 test Active RMA agents are still running this morning. I think this means you may have identified the problem. I'll let you know if the status changes.
Active RMA still running
More good news. The Active RMA survived the weekend. This is the longest the RMA has ever run, so it appears you have found the problem. I just wanted to let you know that you are on the right track. Let me know when the new Active RMA client is ready and I will deploy it on all our systems and report back to you.