RMA Client in Unknown status causing false alerts

All questions related to installation, configuration and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
andrep
Posts: 2
Joined: Thu Dec 11, 2008 2:55 am


Post by andrep »

I have the latest version installed at about 18 sites, with one RMA on each site and 17 tests per site. Every now and then the RMA at a site does not respond, which causes an Unknown status. When I force the test, it's alive. It seems unstable. What could be causing this?
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Could you provide more information, please?
- Do you use Active or Passive RMA?
- What exact error message do you see in "Reply field" of the test? "Connection error"? Other?

Regards,
Max
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Also, do you have any antivirus monitors, personal firewalls or content monitoring software installed? Any non-standard Winsock components?
What test method(s) exactly do you use?
What version of Windows is installed on the local and remote systems? Which Service Pack?

Regards
Alex
ldean
Posts: 17
Joined: Fri Nov 14, 2008 8:15 am

Post by ldean »

I'm experiencing the same issue occasionally... What is strange is that sometimes several tests are performed by one agent, and one test will time out and show Unknown while the others show OK. I'm kind of lost on it; the only thing I can think of is to play around with the setting for the number of tests initiated at once.
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Could you please answer our questions?

Regards
Alex
ldean
Posts: 17
Joined: Fri Nov 14, 2008 8:15 am

Post by ldean »

I don't want to hijack this thread, but in the interest of getting a resolution for both of us, here are our details.

The issue has occurred on several different environments. All use active RMA.
Here is one example of what happens when the issue occurs:

Code:

Test: server.domain.local C Drive
Method: Drive space

12/15/2008 6:34:41 PM	Unknown	Timed out
12/15/2008 6:44:42 PM	Ok	23 Gb
12/15/2008 6:56:37 PM	Unknown	RMA not connected
12/15/2008 6:58:56 PM	Ok	23 Gb
12/16/2008 12:00:18 AM	Ok	23 Gb
That particular test was on a Server 2008 machine; however, it occurred at the same time on several servers, all of the rest of which run Server 2003. For example:

Code:

Test: WMI Service
Method: check service

12/14/2008 12:00:04 AM	Ok	0 ms
12/15/2008 12:00:07 AM	Ok	0 ms
12/15/2008 6:56:37 PM	Unknown	RMA not connected
12/15/2008 6:59:11 PM	Ok	0 ms
12/16/2008 12:00:17 AM	Ok	0 ms
What would be a good setting for the max # of tests initiated at once? Could lowering or raising it have an effect? I have set it to 100, and the problem hasn't recurred yet...
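The same "RMA not connected" timestamp showing up on several tests can be checked mechanically. Below is a minimal Python sketch. It assumes the statistics rows are copied out as tab-separated `date time<TAB>status<TAB>reply` lines. The `STATS` dictionary and the helper names are hypothetical, and the rows are the ones quoted in this post:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical input: statistics rows copied from several tests, assumed to be
# formatted as "<M/D/YYYY h:mm:ss AM|PM>\t<status>\t<reply>".
STATS = {
    "server.domain.local C Drive": [
        "12/15/2008 6:34:41 PM\tUnknown\tTimed out",
        "12/15/2008 6:44:42 PM\tOk\t23 Gb",
        "12/15/2008 6:56:37 PM\tUnknown\tRMA not connected",
    ],
    "WMI Service": [
        "12/15/2008 12:00:07 AM\tOk\t0 ms",
        "12/15/2008 6:56:37 PM\tUnknown\tRMA not connected",
    ],
}

def parse_row(row):
    """Split one tab-separated statistics row into (timestamp, status, reply)."""
    stamp, status, reply = row.split("\t")
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), status, reply

def shared_unknowns(stats):
    """Map each timestamp to the tests that reported Unknown at that moment."""
    by_time = defaultdict(list)
    for test, rows in stats.items():
        for row in rows:
            when, status, reply = parse_row(row)
            if status == "Unknown":
                by_time[when].append((test, reply))
    # Keep only the moments where more than one test went Unknown together.
    return {t: hits for t, hits in by_time.items() if len(hits) > 1}

for when, hits in sorted(shared_unknowns(STATS).items()):
    print(when, hits)
```

A shared timestamp across tests (and across servers) points at the agent's connection rather than at any single test.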
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

ldean wrote:
RMA not connected
It looks like the connection was dropped and the agent could not reconnect for several minutes.
Could you check the RMA logs? By default, these text logs are stored in the same folder where the agent is installed (unless you changed the location).
ldean wrote:
What would be a good setting for the max # of tests initiated at once? Could lowering or raising it have an effect? I have set it to 100, and the problem hasn't recurred yet...
I think there is some external problem: a network error, a firewall or an antivirus monitor...
Do you have an antivirus monitor installed on these systems?
What versions of HostMonitor and RMA do you use?

Regards
Alex
ldean
Posts: 17
Joined: Fri Nov 14, 2008 8:15 am

Post by ldean »

But I don't understand how, out of 2 tests from one single RMA, one will come back OK and the other will come back Unknown when the tests run at the same time.

EDIT: All RMAs and HM are on the latest available versions.

Also, today we added a simple connectivity test for each of our remote clients, which just pings 127.0.0.1 and returns the result to HM. I was watching it, and when it came time for the test to run, many of the agents showed "Checking" for a long time and eventually went to Unknown. I am connected to those agents through Remote Desktop, though, so I know it can't be a network connection issue.
ldean
Posts: 17
Joined: Fri Nov 14, 2008 8:15 am

Post by ldean »

Here are the test statistics for the connectivity tests on 2 separate RMAs:

Code:

12/16/2008 1:43:30 PM	Host is alive	16 ms
12/16/2008 1:43:30 PM	Host is alive	0 ms
12/16/2008 2:01:33 PM	Unknown	Timed out
12/16/2008 2:04:35 PM	Host is alive	0 ms
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:15:17 PM	Host is alive	0 ms
12/16/2008 4:15:20 PM	Host is alive	0 ms

Code:

12/16/2008 1:43:30 PM	Host is alive	16 ms
12/16/2008 1:43:30 PM	Host is alive	16 ms
12/16/2008 1:43:30 PM	Host is alive	0 ms
12/16/2008 2:01:33 PM	Unknown	Timed out
12/16/2008 2:04:35 PM	Host is alive	0 ms
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:14:55 PM	Unknown	Timed out
12/16/2008 4:15:17 PM	Host is alive	0 ms
12/16/2008 4:15:20 PM	Host is alive	0 ms
As I was watching and they showed "Checking", I right-clicked one and told it to refresh the selected test, and it returned to OK.


EDIT: I'm not sure the statistics are accurate... The tests have the same name; perhaps that is throwing it off and giving me the stats for the same test?
ldean
Posts: 17
Joined: Fri Nov 14, 2008 8:15 am

Post by ldean »

The issue is occurring for me at the moment. I have an agent whose tests came back in HM as Unknown with "RMA not connected"; however, RMA Manager shows it as connected. Any idea what could cause that? Also, this test is supposed to run every couple of minutes, but if I don't do anything, it just stays Unknown. If I manually refresh it, it comes back OK. (BTW, I have access to the remote network via RDP, and that is still connected as well.)

Any help is appreciated.
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

ldean wrote:
But I don't understand how, out of 2 tests from one single RMA, one will come back OK and the other will come back Unknown when the tests run at the same time.
According to your previous post, 2 different test items failed at the same time:
================
Test: server.domain.local C Drive Method: Drive space
12/15/2008 6:56:37 PM Unknown RMA not connected
...
Test: WMI Service Method: check service
12/15/2008 6:56:37 PM Unknown RMA not connected
================

Could you please check the RMA logs?
ldean wrote:
Also, today we added a simple connectivity test for each of our remote clients, which just pings 127.0.0.1 and returns the result to HM. I was watching it, and when it came time for the test to run, many of the agents showed "Checking" for a long time and eventually went to Unknown. I am connected to those agents through Remote Desktop, though, so I know it can't be a network connection issue.
Well, "Timed out" and "RMA not connected" are 2 different problems.

1) "RMA not connected" means the agent could not connect to HostMonitor for several minutes. This error may also appear right after a connection drop if you manually force the test to be refreshed.
If you check the RMA logs, you will probably find more information about why RMA cannot reconnect.

2) "Timed out" means the agent did not return a test result within 15 minutes. That looks strange...
Could you try to set up Passive RMA instead of Active? Just for testing...
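To apply the distinction above to a longer history, one could tally the Unknown results by reply text. This is a hypothetical helper, not part of HostMonitor. It assumes the same tab-separated statistics rows and recognizes only the 2 reply strings quoted in this thread:

```python
from collections import Counter

# Hypothetical mapping of reply text to the 2 problems described above.
CATEGORIES = {
    "RMA not connected": "connection",  # agent could not (re)connect to HostMonitor
    "Timed out": "timeout",             # agent did not return a result in time
}

def classify(reply: str) -> str:
    """Return 'connection', 'timeout' or 'other' for one reply string."""
    for prefix, category in CATEGORIES.items():
        if reply.startswith(prefix):
            return category
    return "other"

def tally(rows):
    """Count Unknown results per category in tab-separated statistics rows."""
    counts = Counter()
    for row in rows:
        _stamp, status, reply = row.split("\t")
        if status == "Unknown":
            counts[classify(reply)] += 1
    return counts

rows = [
    "12/15/2008 6:34:41 PM\tUnknown\tTimed out",
    "12/15/2008 6:44:42 PM\tOk\t23 Gb",
    "12/15/2008 6:56:37 PM\tUnknown\tRMA not connected",
]
print(tally(rows))
```

If one agent shows mostly "connection" results, the RMA logs are the place to look. Mostly "timeout" results would fit the Passive RMA experiment suggested above.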
ldean wrote:
EDIT: I'm not sure the statistics are accurate... The tests have the same name; perhaps that is throwing it off and giving me the stats for the same test?
It's better to use unique test names. Patterns can help you do this:
http://www.ks-soft.net/hostmon.eng/mfra ... tterns.htm
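Duplicate names can be spotted before they confuse the statistics. Below is a minimal sketch. It assumes you have the test names as a plain list, for example copied from the test list. `duplicate_names` is a hypothetical helper and does not use HostMonitor's pattern feature:

```python
from collections import Counter

def duplicate_names(names):
    """Return the test names that occur more than once, in first-seen order."""
    counts = Counter(names)
    dupes = []
    for name in names:
        if counts[name] > 1 and name not in dupes:
            dupes.append(name)
    return dupes

# Hypothetical list of test names, e.g. the same connectivity test per agent.
names = ["Connectivity", "Connectivity", "WMI Service", "server.domain.local C Drive"]
print(duplicate_names(names))  # ['Connectivity']
```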

You may use the Quick Log to check the latest test results for a specific item:
http://www.ks-soft.net/hostmon.eng/mfra ... ickLogPane
ldean wrote:
The issue is occurring for me at the moment. I have an agent whose tests came back in HM as Unknown with "RMA not connected"; however, RMA Manager shows it as connected. Any idea what could cause that?
That's possible. The 2 applications use different sockets (TCP ports) and/or IP addresses.
As I said several times, let's check the logs.
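RDP and the agent use different connections, so a working RDP session does not prove that the agent's own TCP path to HostMonitor is open. (RDP uses TCP 3389 by default.) Below is a minimal sketch for checking that path from the agent machine. The host and port are placeholders. Replace them with the address and port your Active RMA is configured to connect to; no default port is assumed here:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake with host:port completes within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `can_connect("hm-server.example.local", PORT)` (both values hypothetical) returning False while RDP still works would point at something between the agent and HostMonitor, such as a firewall or antivirus monitor.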
ldean wrote:
Also, this test is supposed to run every couple of minutes, but if I don't do anything, it just stays Unknown. If I manually refresh it, it comes back OK. (BTW, I have access to the remote network via RDP, and that is still connected as well.)
Could you please run the Auditing tool (menu View -> Auditing tool)?
Any warnings?

Regards
Alex
ldean
Posts: 17
Joined: Fri Nov 14, 2008 8:15 am

Post by ldean »

Auditing shows no issues, except that a couple of alert WAV files could not be found.

I checked the RMA on one server that we have specifically been having issues with. This is, I believe, the only one that has had the "not connected" issue; the others are just timing out. Here is a segment of that log; maybe you can tell me what it means.

Code:

[12/16/2008 9:43 PM]	SERVER2.flnet.local	Decode error: Cannot read data (RMA Manager)
[12/16/2008 10:14 PM]	SERVER2.flnet.local	Decode error: Cannot read data
[12/16/2008 10:14 PM]	SERVER2.flnet.local	Decode error: Cannot read data (RMA Manager)
[12/16/2008 10:15 PM]	SERVER2.domain.local	Connection error
[12/16/2008 11:58 PM]	SERVER2.domain.local	Connection error
[12/17/2008 2:31 AM]	SERVER2.domain.local	Decode error: Cannot read data
[12/17/2008 2:31 AM]	SERVER2.domain.local	Decode error: Cannot read data (RMA Manager)
[12/17/2008 2:31 AM]	SERVER2.domain.local	Connection error
[12/17/2008 5:59 AM]	SERVER2.domain.local	Decode error: Cannot read data
[12/17/2008 9:29 AM]	SERVER2.domain.local	Decode error: Cannot read data (RMA Manager)
However, I do see similar errors in some other RMA logs:

Code:

[11/16/2008 11:44 AM]	server.domain.com	Decode error: Cannot read data. An existing connection was forcibly closed by the remote host.
[11/16/2008 11:45 AM]	server.domain.com	Connection error
[11/16/2008 11:45 AM]	server.domain.com	Connection error
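With dozens of agents, it can help to summarize these logs rather than read them line by line. Below is a minimal sketch. It assumes each log line has the `[date time]`, host, message layout seen in the excerpts above. The parsing rules are inferred from these excerpts, not from any documented RMA log format:

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts above (not a documented format):
# "[M/D/YYYY h:mm AM|PM]  <host>  <message>", fields separated by whitespace.
LINE = re.compile(r"^\[(?P<stamp>[^\]]+)\]\s+(?P<host>\S+)\s+(?P<msg>.+)$")

def error_kind(msg: str) -> str:
    """Reduce a message to its leading error type, e.g. 'Decode error'."""
    return msg.split(":", 1)[0].strip()

def tally_log(lines):
    """Count (host, error type) pairs; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            counts[(match["host"], error_kind(match["msg"]))] += 1
    return counts

SAMPLE = [
    "[12/16/2008 9:43 PM]\tSERVER2.flnet.local\tDecode error: Cannot read data (RMA Manager)",
    "[12/16/2008 10:15 PM]\tSERVER2.domain.local\tConnection error",
    "[12/17/2008 2:31 AM]\tSERVER2.domain.local\tDecode error: Cannot read data",
    "[12/17/2008 2:31 AM]\tSERVER2.domain.local\tConnection error",
]
for (host, kind), n in sorted(tally_log(SAMPLE).items()):
    print(f"{host:24} {kind:18} {n}")
```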
I also changed all the test names so there are no repeats.

I am out of the office at the moment, but I will try to set one of them up for Passive RMA. We were hoping to avoid this so we wouldn't have to mess with firewall rules on all of our remote clients.

Thanks for your help
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

ldean wrote:
[12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data
[12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data (RMA Manager)
It looks like some other application (not HostMonitor and not RMA Manager) accepted the connection request from the agent :o
I'm not sure how this is possible...
I still think there is some external problem. Do you have any antivirus monitors, personal firewalls or content monitoring software installed? Any non-standard Winsock components?

Regards
Alex
doodleman99
Posts: 38
Joined: Tue Sep 02, 2008 5:45 am

Workaround

Post by doodleman99 »

I had the same problem, which was a REAL PAIN!!!!
I hated waking up in the morning to 400 emails on my BlackBerry.

Aaaaaanyway... my workaround is to untick the "treat unknown reply as bad" option in the properties of the tests.
It's not perfect, but it works for me.
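Unticking the option stops Unknown results from alerting at all, which also hides an agent that is really down. A middle ground is to treat Unknown as bad only when it repeats. The sketch below is plain logic, not a HostMonitor setting. The threshold and status names are assumptions for illustration:

```python
def should_alert(statuses, threshold=3, unknown_is_bad=False):
    """Return True when the last `threshold` results all count as bad.

    unknown_is_bad=False models the workaround above (Unknown never alerts);
    True lets a sustained run of Unknown results still raise an alert.
    """
    bad = {"Bad"} | ({"Unknown"} if unknown_is_bad else set())
    recent = statuses[-threshold:]
    return len(recent) == threshold and all(s in bad for s in recent)

history = ["Ok", "Unknown", "Ok", "Unknown", "Unknown", "Unknown"]
print(should_alert(history, threshold=3, unknown_is_bad=True))   # True
print(should_alert(history, threshold=3, unknown_is_bad=False))  # False
```

An isolated Unknown, like the ones in the statistics posted in this thread, would not trigger an alert. A run of them, which is what a dead agent would produce, still would.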