"Unknown" status in ping test

All questions related to installation, configuration and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
rodionov.ka
Posts: 8
Joined: Mon Apr 08, 2013 11:55 pm

Post by rodionov.ka »

Hello.
What does the "Unknown" status mean in a ping test?

Recently I have been receiving many false negatives with this status.

First e-mail: service FAIL

Code:

Test     : DC71  Ping
Method: ping (timeout - 2000 ms)
Status  : Unknown
StatusChangedTime: 26.04.2013 1:22:07  
Reply   : Timed out
Suggested Reply: 0 ms
Agent (host, who performed test): gw71.*domain*.ru

Last status: Host is alive
LastReply: 0 ms 
PreviousStatusDuration: 6 days 18:06:15 

Folder: DC71
Interval of test: 00:01:00 
TaskComment: Ping DC71 
Test Object Info: Ping DC71 (timeout: 2000 ms)

MasterTests (depend on):  

Recurrences : 1
Total tests: 1368772
Alive ratio : 95,60 %
Dead ratio: 0,18 %
Second e-mail, with HOST ALIVE:

Code:

Test     : DC71  Ping
Method: ping (timeout - 2000 ms)
Status  : Host is alive
StatusChangedTime: 26.04.2013 1:23:08  
Reply   : 0 ms
Suggested Reply: 0 ms
Agent (host, who performed test): gw71.*domain*.ru 

Last status: Unknown
LastReply: Timed out 
PreviousStatusDuration: 00:01:00 

Folder: DC71
Interval of test: 00:01:00 
TaskComment: Ping DC71 
Test Object Info: Ping DC71 (timeout: 2000 ms)

MasterTests (depend on):  

Recurrences : 1
Total tests: 1368773
Alive ratio : 95,60 %
Dead ratio: 0,18 %
When the host really is in a FAIL state, I receive a message like this:

Code:

Test     : Ping  Lipetsk VPN-GW48
Method: ping (timeout - 2000 ms)
Status  : No answer
StatusChangedTime: 26.04.2013 9:36:18  
Reply   : 100 %
Suggested Reply: 100 %
Agent (host, who performed test): HostMonitor 

Last status: Host is alive
LastReply: 0 % 
PreviousStatusDuration: 2 days 10:50:40 

Folder: VPN's
Interval of test: 00:05:00 
TaskComment: Ping 192.168.48.254 
Test Object Info: Ping 192.168.48.254 (timeout: 2000 ms)

MasterTests (depend on): GW5 Firewall
Ping  Lipetsk-EXT 

Recurrences : 1
Total tests: 291302
Alive ratio : 93,05 %
Dead ratio: 0,59 %
...and many other tests with the same status.
I use two Active RMAs in that network: the first as the main agent, the second in backup-only mode.

How can I troubleshoot these messages to eliminate false negatives?
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Status : Unknown
StatusChangedTime: 26.04.2013 1:22:07
Reply : Timed out
This means HostMonitor could not perform the test within 15 minutes.
How can I troubleshoot these messages to eliminate false negatives?
Well, it's not "false". Unknown status means the test could not be executed correctly. There is some problem...

You can easily tell HostMonitor not to start "bad" actions on Unknown status, but I think it's better to find the reason for this problem.
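If the alerts are also processed by a mail filter or script outside HostMonitor, the same distinction can be drawn there. The sketch below is a minimal, hypothetical Python example: it assumes the alert body uses the `Field : value` layout shown in the e-mails above and that `No answer` is the only failure status of interest; `parse_alert` and `should_fire_bad_action` are illustrative names, not part of HostMonitor.

```python
# Hypothetical helper: field names and status strings are taken from the sample alerts above.
FAILURE_STATUSES = {"No answer"}  # assumption: extend with other failure statuses you use


def parse_alert(body: str) -> dict:
    """Split 'Field : value' lines into a dict, keeping the first value per field."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # continuation lines, e.g. a second master test name
        fields.setdefault(key.strip(), value.strip())
    return fields


def should_fire_bad_action(fields: dict) -> bool:
    """True only for real failures; Unknown means the test could not be executed."""
    status = fields.get("Status", "")
    if status == "Unknown":
        return False  # not evidence that the host is down
    return status in FAILURE_STATUSES


if __name__ == "__main__":
    sample = "Test     : DC71  Ping\nStatus  : Unknown\nReply   : Timed out\n"
    print(should_fire_bad_action(parse_alert(sample)))  # False
```

Splitting on the first colon only keeps values such as `26.04.2013 1:22:07` intact.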

Could you please provide more information?
- HostMonitor version?
- Test performed by Active RMA? RMA version?
- What Windows do you use?
- Service pack?
- Do you use ODBC logging or the ODBC test method? If yes, what ODBC driver do you use?
- Do you have any antivirus monitors, personal firewalls or content-monitoring software installed? Any non-standard Winsock components?

Regards
Alex
rodionov.ka

Post by rodionov.ka »

Could you please provide more information?
- HostMonitor version?
v.9.32
- Test performed by Active RMA? RMA version?
ActiveRMA 4.52 and 4.53 (both behave the same way)
- What Windows do you use?
- Service pack?
For HostMonitor server: Win2003 R2 SP2 Eng (Xeon 3 GHz, 4 GB RAM, SAS disks)
For agents in the "problem" remote office (linked via VPN over medium-quality Internet channels):
Main Agent - Windows NT 5.2 Build 3790 Service Pack 2
Backup agent - Windows NT 6.1 Build 7601 Service Pack 1
- Do you use ODBC logging or ODBC test method? If yes, what ODBC driver do you use?
Yes, ODBC backup logging to an SQL server within the LAN for the HM server, not for remote agents. No other ODBC-related tests.
SQL Native Client - 2005.90.2047.00
- Do you have any antivirus monitors, personal firewalls or content-monitoring software installed? Any non-standard Winsock components?
I think not.

KS-Soft

Post by KS-Soft »

It looks like the connection was dropped after the test probe started...
Do you see errors in the RMA log file (by default it's loga_bad.txt, unless you changed the settings)?
Any errors in the HostMonitor system log file (specified on the System Log page of the HostMonitor Options dialog)?
Any warnings displayed by the Auditing Tool (View menu)?

Regards
Alex
rodionov.ka

Post by rodionov.ka »

KS-Soft wrote: Looks like connection was dropped after test probe started...
Do you see errors in RMA log file (by default its loga_bad.txt file, unless you changed settings)?
GW72 (single agent):

Code:

[26.04.2013 0:59]	gw72.*domain*.ru	Connection error
[26.04.2013 0:59]	gw72.*domain*.ru	Connection error
[26.04.2013 1:07]	gw72.*domain*.ru	Decode error: Cannot read data. An existing connection was forcibly closed by the remote host.
[26.04.2013 1:07]	gw72.*domain*.ru	Connection error
GW71 (main agent):

Code:

[26.04.2013 0:59]	gw71.*domain*.ru	Connection error
[26.04.2013 1:07]	gw71.*domain*.ru	Connection error
DC71 (backup agent for GW71):

Code:

[26.04.2013 0:59]	dc71.*domain*.ru	Connection error
[26.04.2013 0:59]	dc71.*domain*.ru	Connection error
[26.04.2013 1:07]	dc71.*domain*.ru	Connection error
[26.04.2013 1:07]	dc71.*domain*.ru	Connection error
[26.04.2013 2:05]	dc71.*domain*.ru	Decode error: Cannot read data. An existing connection was forcibly closed by the remote host.
[26.04.2013 2:06]	dc71.*domain*.ru	Agent "dc71.*domain*.ru" already connected!
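To check whether these errors hit all agents at the same time (which points at the shared VPN/Internet link rather than at a single agent), the log lines can be grouped by minute. A minimal sketch, assuming each line of loga_bad.txt has the form `[dd.mm.yyyy h:mm]<TAB>agent<TAB>message` as rendered above; `parse_line` and `shared_outages` are hypothetical names.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed layout: "[26.04.2013 0:59]<TAB>gw72.example.ru<TAB>Connection error"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\t(?P<agent>[^\t]+)\t(?P<msg>.*)$")


def parse_line(line: str):
    """Return (timestamp, agent, message), or None for lines in another format."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        return None
    ts = datetime.strptime(match["ts"], "%d.%m.%Y %H:%M")
    return ts, match["agent"], match["msg"]


def shared_outages(lines, min_agents: int = 2):
    """Minutes in which at least `min_agents` different agents logged an error."""
    agents_by_minute = defaultdict(set)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            ts, agent, _msg = parsed
            agents_by_minute[ts].add(agent)
    return sorted(ts for ts, agents in agents_by_minute.items() if len(agents) >= min_agents)
```

Fed with the lines from all three agents' logs above, it reports 0:59 and 1:07 as shared outages, while the 2:05 decode error on dc71 stands alone.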
KS-Soft wrote: Any errors in HostMonitor system log file (specified on System Log page in HostMonitor Options dialog)?

Code:

 26.04.2013 0:58:47   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:00:58   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:07:13   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:08:13   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:13:30   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:13:35   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:13:39   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:13:43   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:13:49   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:13:56   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:13:56   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:14:05   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:14:06   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:14:13   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:14:19   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:14:26   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:14:30   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:14:40   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:14:56   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:16:07   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:22:08   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 1:23:08   E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)  
 26.04.2013 2:06:05   192.168.71.2: Agent "dc71.*domain*.ru" already connected!  
KS-Soft wrote: Any warnings displayed by Auditing Tool (menu View)?
Only warnings about incorrect .bat or .vbs files in disabled tests and wrong sound file names.


Another question:
Why are there lots of "Already connected" messages every 30 seconds for some agents?
KS-Soft

Post by KS-Soft »

Why are there lots of "Already connected" messages every 30 seconds for some agents?
Two possible reasons:
- you have installed two agents using the same name (could you please check this?)
- there is some mistake in our code

Regards
Alex
KS-Soft

Post by KS-Soft »

PS
We checked the code: this may happen when a TCP connection is dropped (not closed in the normal way). The HostMonitor system may wait for packets from the remote system for some time until it recognizes the problem and closes the socket.
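The general technique for noticing a dropped (half-open) TCP connection sooner is TCP keepalive: the operating system sends probes on an idle connection and reports an error once the peer stops answering. The Python sketch below only illustrates that technique; it is not how HostMonitor or RMA are implemented, and the 30 s / 10 s / 3 probes values are arbitrary assumptions.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 30,
                     interval_s: int = 10, probes: int = 3) -> None:
    """Turn on TCP keepalive so a silently dropped peer surfaces as a socket error."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuned = False
    # Per-socket tuning constants exist only on some platforms/Python builds.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            tuned = True
    # Fallback for Windows builds without those constants: idle time and
    # interval in milliseconds via ioctl (the probe count is not configurable there).
    if not tuned and hasattr(socket, "SIO_KEEPALIVE_VALS"):
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
```

With these values a dead peer is noticed after roughly idle_s + interval_s * probes = 60 seconds of silence, whereas Linux and Windows by default wait 2 hours of idle time before sending the first probe.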

Regards
Alex
rodionov.ka

Post by rodionov.ka »

KS-Soft wrote:
Why are there lots of "Already connected" messages every 30 seconds for some agents?
2 possible reasons:
- you have installed 2 agents using the same name (could you please check this?)
- there is some mistake in our code

Regards
Alex
Hi!
About "already connected":
I've double-checked: there are no configs on the remote systems with the same agent names.
Right now some agents show this message and send it constantly while RMA Manager is active. Restarting the agent (and waiting about 30 seconds) solves this "problem". Not starting RMA Manager solves it too.

But what about the first problem, with the Unknown statuses?
What do I need to do to solve it?
KS-Soft

Post by KS-Soft »

If you cannot use a more reliable channel, there is nothing you can do.
We can add some code in the next version to handle such problems better (faster reconnect, etc.).
But if the connection is bad and the test cannot be performed, you will still see Unknown status sometimes. You can set up action profiles (test settings) to ignore Unknown status: either start no actions or start different actions on Unknown status.
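One way to express "tolerate occasional Unknown results, but still react to real failures" is a small routing rule. The sketch below is hypothetical: the `Action` names, the threshold of three consecutive Unknown results and evaluating the rule outside HostMonitor are all assumptions; inside HostMonitor the equivalent is configured through the action profiles mentioned above.

```python
from enum import Enum
from typing import Sequence


class Action(Enum):
    NONE = "none"
    ALERT = "alert"        # the usual "bad" action profile
    LOG_ONLY = "log_only"  # a milder profile reserved for Unknown


FAILURE_STATUSES = {"No answer"}  # assumption: add other failure statuses you use


def choose_action(history: Sequence[str], unknown_threshold: int = 3) -> Action:
    """history lists statuses oldest-first; the last entry is the current result."""
    if not history:
        return Action.NONE
    current = history[-1]
    if current in FAILURE_STATUSES:
        return Action.ALERT
    if current == "Unknown":
        # Count how many of the most recent results in a row were Unknown.
        run = 0
        for status in reversed(history):
            if status != "Unknown":
                break
            run += 1
        return Action.ALERT if run >= unknown_threshold else Action.LOG_ONLY
    return Action.NONE
```

A single Unknown between two "Host is alive" results, as in the DC71 e-mails above, would only be logged.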

Regards
Alex
rodionov.ka

Post by rodionov.ka »

KS-Soft wrote: If you cannot use more reliable channel, there is nothing you can do.
What is the difference between Active and Passive RMA in that case?
I know that the connection did not disappear for longer than 2-3 minutes (it is checked by more than 2 tests from the main HM server: pings of the external IP and VPN session pings). If Unknown status is assigned only after 15 minutes, why is it set here? Can you imagine a 15-minute breakdown of an office Internet connection??
Maybe this is an error in RMA session management?
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

Passive RMA waits for a connection from HostMonitor.
HostMonitor may set Unknown status even if you use Passive RMA (when HostMonitor cannot connect to the agent and cannot connect to the backup RMA either).
As I said, we will add some code...

Regards
Alex
KS-Soft

Post by KS-Soft »

We changed some code in the new versions of HostMonitor, RMA Manager and Active RMA.
Now they should work better over unreliable connections, e.g.:
- when the connection from the primary RMA is dropped and there is an active (connected) backup agent, HostMonitor may switch to the backup RMA right away, even if the test probe has already started;
- if there is no backup RMA, HostMonitor may update results for started test probes right after the agent successfully reconnects;
and so on.
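The failover behaviour described above can be modelled roughly as follows. This is a hypothetical Python sketch of the described rules, not KS-Soft's code: `Agent`, `Dispatcher` and their methods are invented for illustration, and handing held probes back after a reconnect is a simplification of "update results for started test probes".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Agent:
    name: str
    connected: bool = True


@dataclass
class Dispatcher:
    """Hypothetical model of primary/backup RMA selection for test probes."""
    primary: Agent
    backup: Optional[Agent] = None
    waiting: List[str] = field(default_factory=list)  # probes held for a reconnect

    def route(self, test: str) -> Optional[str]:
        """Name of the agent that should run `test`, or None if it has to wait."""
        if self.primary.connected:
            return self.primary.name
        if self.backup is not None and self.backup.connected:
            return self.backup.name  # switch at once instead of waiting for a timeout
        self.waiting.append(test)
        return None

    def on_primary_dropped(self, in_flight: List[str]) -> Dict[str, Optional[str]]:
        """Re-route probes that were already running when the primary went away."""
        self.primary.connected = False
        return {test: self.route(test) for test in in_flight}

    def on_primary_reconnected(self) -> List[str]:
        """Hand back held probes so their results can be produced right away."""
        self.primary.connected = True
        held, self.waiting = self.waiting, []
        return held
```

In this model a probe only ends up waiting when neither agent is reachable, which is the case where Unknown can still appear.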

If you have version 9.50 installed, we can provide updated hostmon.exe, rma_active.exe and rma_mgr.exe modules.

Regards
Alex
rodionov.ka

Post by rodionov.ka »

KS-Soft wrote: If you have installed version 9.50, we can provide updated hostmon.exe, rma_active.exe, rma_mgr.exe modules.
Hello
We have version 9.32 installed. Do you have an updated version for us?
KS-Soft

Post by KS-Soft »

Why do you want to use the old version? Does your license no longer allow updates?
Please send a request to support@ks-soft.net and provide your registration name and/or order number.

Regards
Alex
rodionov.ka

Post by rodionov.ka »

KS-Soft wrote: If you have installed version 9.50, we can provide updated hostmon.exe, rma_active.exe, rma_mgr.exe modules.
Hello!
Can you provide the updated modules?