RMA: 301 - Timeout on SNMP tests

All questions related to the installation, configuration and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
c.tenner
Posts: 16
Joined: Wed Jun 30, 2010 4:16 am

Post by c.tenner »

Hi!
We are experiencing "RMA: 301 - Timeout" on the SNMP tests that check Dell's global system state and system temperature on our hardware servers. We set the timeout to 2500, but when we repeat the test it instantly fails (times out) again. After 3-4 retries, though, the global system state/temperature is returned correctly.

We have seen this behaviour since the last update (9.50).


We are running HM 9.50 Build 1028 and active RMA 4.58 (I've just updated to 4.60).

The settings of the two tests are shown below.
SNMP Global System State:

Code:

;-----------------------------------------------------------------------------
;- HostMonitor`s export/import file                                          -
;- Generated by RCC 4.50 at 05.08.2013 12:46:48                              -
;- Source file: C:\Program Files (x86)\HostMonitor8\Abtis-Demo-Monitoring.hml-
;- Generation mode: Selected_Tests                                           -
;-----------------------------------------------------------------------------


; ------- Test #01 -------


Method      = SNMP
;--- Common properties ---
DestFolder  = Root\Abtis (abtis.local)\HQ Pforzheim\Application-Server\SRV-HQ-BU01\
RMAgent     = abtis - SRV-HQ-BU01.abtis.local
Title       = SNMP: SRV-HQ-BU01 - Global System State failed
Comment     = SNMP Get 1.3.6.1.4.1.674.10892.1.200.10.1.2.1 from SRV-HQ-BU01
RelatedURL  = http://support.dell.com/support/edocs/software/svradmin/6.2/en/SNMP/PDF/SNMP.pdf
TargetPattern= %folder%
NamePattern = SNMP: %host% - Global System State failed
CmntPattern = SNMP Get %mibnameshort% from %host%
PLogPattern = %fvar_logfile%
ScheduleMode= Regular
Schedule    = abtis-ExcludeMaintenanceWindows
Interval    = 14400
Alerts      = abtis - 1 Fehler -> Ticket
ReverseAlert= No
UnknownIsBad= No
WarningIsBad= No
UseWarning  = Yes
WarningExpr = ('%SuggestedReply%'==4)
TuneupReply = Yes
TuneReplyExp= if ('%SuggestedReply%'==3) OK; if ('%SuggestedReply%'==4) warning, noncritical; if ('%SuggestedReply%'==5) critical (failure); if ('%SuggestedReply%'==6)  nonrecoverable (dead); if ('%SuggestedReply%'=="RMA: 301 - Timeout") TimeOut; else Unkown
UseCommonLog= No
PrivateLog  = C:\Program Files (x86)\HostMonitor8\Logs\abtis_HQ.htm
PrivLogMode = Default
CommLogMode = Default
SyncCounters= Yes
SyncAlerts  = No
DependsOn   = list
MasterTest-Alive = Ping: SRV-HQ-BU01.abtis.local
;--- Test specific properties ---
Agent       = SRV-HQ-BU01
Profile     = public
Timeout     = 2500
Retries     = 1
OID         = 1.3.6.1.4.1.674.10892.1.200.10.1.2.1
Condition   = DifferentFrom
Value       = 3

;-----------------------------------------------------------------------------
; Exported 1 items
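
The TuneReplyExp line in the export above translates the numeric reply into a status label. As a rough illustration only, here is the same mapping as a Python sketch. This is not HostMonitor syntax; the helper name `describe_state` is hypothetical, and the labels are copied from the export (with "Unknown" as the fallback).

```python
# Labels copied from the TuneReplyExp line of the exported test.
STATE_LABELS = {
    3: "OK",
    4: "warning, noncritical",
    5: "critical (failure)",
    6: "nonrecoverable (dead)",
}

TIMEOUT_REPLY = "RMA: 301 - Timeout"


def describe_state(reply: str) -> str:
    """Translate a raw test reply into a readable label (hypothetical helper)."""
    if reply == TIMEOUT_REPLY:
        return "TimeOut"
    try:
        # Any numeric value outside the table falls back to "Unknown".
        return STATE_LABELS.get(int(reply), "Unknown")
    except ValueError:
        # Non-numeric replies other than the timeout message.
        return "Unknown"


if __name__ == "__main__":
    for sample in ("3", "4", "RMA: 301 - Timeout", "something else"):
        print(sample, "->", describe_state(sample))
```

Note that the test itself uses Condition = DifferentFrom with Value = 3, so any reply other than 3, including the timeout message, counts as a failure.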

SNMP Temperature:

Code:

;-----------------------------------------------------------------------------
;- HostMonitor`s export/import file                                          -
;- Generated by RCC 4.50 at 05.08.2013 12:45:38                              -
;- Source file: C:\Program Files (x86)\HostMonitor8\Abtis-Demo-Monitoring.hml-
;- Generation mode: Selected_Tests                                           -
;-----------------------------------------------------------------------------


; ------- Test #01 -------


Method      = SNMP
;--- Common properties ---
DestFolder  = Root\Abtis (abtis.local)\HQ Pforzheim\Application-Server\SRV-HQ-BU01\
RMAgent     = abtis - SRV-HQ-BU01.abtis.local
Title       = SNMP: Temperatur SRV-HQ-BU01.abtis.local
Comment     = SNMP Get 1.3.6.1.4.1.674.10892.1.700.20.1.6.1.1 from SRV-HQ-BU01.abtis.local
RelatedURL  = http://support.dell.com/support/edocs/software/svradmin/6.2/en/SNMP/PDF/SNMP.pdf
TargetPattern= %folder%%fvar_domain%
NamePattern = SNMP: Temperatur %folder%%fvar_domain%
CmntPattern = SNMP Get %mibnameshort% from %host%
PLogPattern = %fvar_logfile%
ScheduleMode= Regular
Schedule    = 
Interval    = 3600
Alerts      = abtis - 1 Fehler -> Ticket
ReverseAlert= No
UnknownIsBad= No
WarningIsBad= No
TuneupReply = Yes
TuneReplyExp= [%SuggestedReply% div 10],[%SuggestedReply% mod 10] °C
UseCommonLog= No
PrivateLog  = C:\Program Files (x86)\HostMonitor8\Logs\abtis_HQ.htm
PrivLogMode = Default
CommLogMode = Default
SyncCounters= Yes
SyncAlerts  = No
DependsOn   = list
MasterTest-Alive = Ping: SRV-HQ-BU01.abtis.local
;--- Test specific properties ---
Agent       = SRV-HQ-BU01.abtis.local
Profile     = public
Timeout     = 2500
Retries     = 1
OID         = 1.3.6.1.4.1.674.10892.1.700.20.1.6.1.1
Condition   = MoreThan
Value       = 450

;-----------------------------------------------------------------------------
; Exported 1 items
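
The TuneReplyExp of the temperature test splits the raw reply with div 10 and mod 10, which suggests the value is reported in tenths of a degree Celsius; that reading is an assumption here. Under it, the threshold Value = 450 with Condition = MoreThan corresponds to 45.0 °C. A minimal Python sketch of the same arithmetic follows; the function names are made up for illustration and only non-negative readings are considered.

```python
def format_temperature(raw: int) -> str:
    """Mimic '[reply div 10],[reply mod 10] °C' for a non-negative raw reading."""
    whole, tenth = divmod(raw, 10)
    return f"{whole},{tenth} °C"


def is_too_hot(raw: int, threshold: int = 450) -> bool:
    """Mirror Condition = MoreThan: strictly greater than the threshold."""
    return raw > threshold


if __name__ == "__main__":
    for reading in (423, 450, 451):
        print(reading, "->", format_temperature(reading), "alert:", is_too_hot(reading))
```

For example, a raw reply of 423 is displayed as "42,3 °C" and does not trigger the alert, while 451 does.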

What can we change so that we no longer get these timeouts?

Regards
Christian Tenner
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Which HostMonitor and RMA versions did you use before the upgrade?
You could try increasing the Retries parameter in the Test Properties dialog.
c.tenner
Posts: 16
Joined: Wed Jun 30, 2010 4:16 am

Post by c.tenner »

Hi,
We used RMA 4.58 and HM 9.46.
Yesterday I increased the retries to 2 and today to 3. We still get the 301, though.

Cheers
Christian Tenner
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

We did not change anything related to SNMP.
Perhaps something is wrong with the target server?

Do you have an antivirus monitor installed? Could you try disabling it?
Could you check the network traffic using the packet analyzer Wireshark?

Regards
Alex
c.tenner
Posts: 16
Joined: Wed Jun 30, 2010 4:16 am

Post by c.tenner »

We'll give it a whirl and see what comes up.

Thank you
oakyuz
Posts: 74
Joined: Thu Feb 08, 2007 5:48 am

Post by oakyuz »

Hi Alex

We are using AHM v9.56 on Windows 2008 R2.

We are experiencing the same issue. After upgrading from v9.46 to v9.56, some SNMP tests started to time out. The timeouts are intermittent, i.e. they do not occur on every run.

In our case, we are checking the interface traffic of Brocade SAN switch ports through the Traffic Monitor test method. We first suspected that something was wrong with the SAN switch and upgraded the firmware of one of our SAN switches to the latest version. The timeouts did not disappear. Two SAN switches with different firmware versions show the same behaviour.

Besides the timeouts, I noticed that the tests sometimes return results beyond the total interface bandwidth. For example, this morning a 4 Gbps interface (total in/out 8 Gbps) returned 18104.20 Mbit.

These two situations did not happen before v9.56.
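
The reading quoted above can be checked against the link capacity with simple arithmetic. A minimal Python sketch, assuming 1 Gbps = 1000 Mbit/s and that the reported figure is a rate in Mbit/s (the post only says "Mbit"); the helper name `exceeds_capacity` is made up for illustration.

```python
def exceeds_capacity(reading_mbit: float, link_gbps: float, directions: int = 2) -> bool:
    """Return True if a traffic reading is larger than the link can carry.

    directions=2 counts inbound plus outbound traffic, matching the
    "total in/out 8 Gbps" figure for a 4 Gbps interface.
    """
    capacity_mbit = link_gbps * 1000 * directions
    return reading_mbit > capacity_mbit


if __name__ == "__main__":
    # 4 Gbps port, in + out = 8000 Mbit/s; the reported value was 18104.20.
    print(exceeds_capacity(18104.20, 4))
```

With these assumptions the capacity is 8000 Mbit/s, so a reading of 18104.20 is physically implausible and points to a measurement or calculation problem rather than real traffic.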

Thanks,
Oguzhan
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Could you check the network traffic using the packet analyzer Wireshark?

Regards
Alex
c.tenner
Posts: 16
Joined: Wed Jun 30, 2010 4:16 am

Post by c.tenner »

Hmm, I didn't get a notification about your last message, hence the delay.
The active RMA is installed on the server that is being queried via SNMP.
So there is no network communication except for the result that is sent to HM.
Should I try it nonetheless?

Cheers
Christian
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

There is still UDP traffic within the system, and Wireshark can capture those packets.

Regards
Alex
c.tenner
Posts: 16
Joined: Wed Jun 30, 2010 4:16 am

Post by c.tenner »

We have checked with Wireshark. To capture local traffic we applied this "fix"* (http://ig2600.blogspot.de/2011/03/power ... l).

This way we determined that the SNMP GET was indeed transmitted to the server but remained unanswered. We also found that the backup RMA was running this test instead of the "normal" RMA that is set for it (see the test details in the opening post). Once we knew that, we configured both servers to accept SNMP messages from both servers, listed by server name and by FQDN.

Since then, the behaviour described in the opening post has not recurred.


*: After removing the "fix", we could not connect to or ping the server until we disabled the NIC, re-enabled it and set the default gateway again in the Windows Network and Sharing Center.

We will monitor the monitoring until tomorrow and give you a heads up if anything changes.

Cheers
Christian
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

"This way we determined that the SNMP get was indeed transmitted to the server but remained unanswered, but also found out that the backup RMA was running this test instead of the "normal" RMA (in contrast to the set RMA - see test details in opening post)."

This could be normal behavior: maybe the primary RMA did not respond, or maybe you are using the "Load balancing" or "Redundant check" backup mode for the agent.

Regards
Alex
c.tenner
Posts: 16
Joined: Wed Jun 30, 2010 4:16 am

Post by c.tenner »

Both RMAs are set to "Backup only", and in the Wireshark log we saw tests performed by the normal server.
Anyway, it works now. Thanks for the tip!