Log Test: count # of bad records

stesan100
Posts: 5
Joined: Thu Jun 18, 2009 12:50 am

Post by stesan100 »

In our system, we have known conditions where an error occurs occasionally, and it is expected. We don't want to have the status be Bad for a single occurrence, or the test would fail 100% of the time.

We want the test to fail only if we get TOO MANY of those errors during a test interval.

For instance, it would be great if a log test status could only become bad if we have > 5 errors in the log over the test interval. If we have 4 or fewer errors, status remains Good.

The test Reply value would equal the number of times the test string has appeared in the log.
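
The requested behavior can be sketched in a few lines of Python. This is only an illustration of the request, not an existing HostMonitor feature: the function name `evaluate_interval` and its parameters are hypothetical, one log record is assumed to be one line, and exactly 5 errors is treated as Good because the request only says "> 5" is Bad.

```python
def evaluate_interval(log_lines, needle="ERROR", max_allowed=5):
    """Return (status, reply) for the records logged during one test interval.

    Hypothetical helper: assumes one record per line and counts the records
    that contain `needle` at least once.
    """
    count = sum(1 for line in log_lines if needle in line)
    # Status becomes Bad only when the count exceeds the allowed maximum.
    status = "Bad" if count > max_allowed else "Good"
    # The count doubles as the Reply value described in the request.
    return status, count
```

With this reading, four matching records in an interval give `("Good", 4)` and six give `("Bad", 6)`.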
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Actually, you can use the "Use Normal status if" option (in the Test Properties dialog) with an expression like the following:
('%FailureIteration%'<5) AND ('%FailureIteration%'>0)

Please check for details at:
http://www.ks-soft.net/hostmon.eng/mfra ... rmalstatus
stesan100
Posts: 5
Joined: Thu Jun 18, 2009 12:50 am

Post by stesan100 »

Let me make sure I understand this right with an example:

* I have a Text Log test which searches for the string "ERROR" in the log. This test runs every 5 minutes.
* I set up the conditional "Use Normal status if" and specify
('%FailureIteration%' < 20)

* In one 5-minute span, the word "ERROR" appears 15 times. The test shows normal status.
* In another 5-minute span, the word "ERROR" appears 24 times. The test shows Bad status.

Is this correct? I always assumed FailureIteration meant number of consecutive test failures, not number of bad records in the log.
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Correct, %FailureIteration% means the number of consecutive failed test probes.
There is no variable that holds the number of Bad records in the log.
However, you can use %FailureIteration% as a workaround to count the number of Bad records in the log.
> * In one 5-minute span, the word "ERROR" appears 15 times. The test shows normal status.
> * In another 5-minute span, the word "ERROR" appears 24 times. The test shows Bad status.
In that case you will need the following configuration:
1. Set the "warn of all new events" option in the Test Properties dialog for the Text Log test.
2. Set the "Use Normal status if" option with an expression like:
('%FailureIteration%'<20) AND ('%FailureIteration%'>0)
3. Set up an additional "Repeat test" action, select "advanced mode", and provide an expression like the following:
('%SimpleStatus%'=='DOWN') OR ('%Status%'=='Normal')

With these settings you will get:
- if there were 15 Bad records within 5 minutes (Test Interval), HostMonitor will not set Bad status (Normal and Ok statuses will be used).
- if there were 20 or more Bad records within 5 minutes (Test Interval), HostMonitor will set Bad status and trigger the assigned actions.
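
One way to read this workaround is as a counter with a threshold. The Python sketch below simulates that reading; it is not HostMonitor code. It assumes that each repeated probe reports exactly one new Bad record, that `failure_iteration` (a stand-in for %FailureIteration%) grows by one per consecutive failed probe, and that a final probe with no new Bad records returns Ok. The function name `simulate_interval` is hypothetical.

```python
def simulate_interval(bad_records, threshold=20):
    """Return the statuses seen while one interval's Bad records are processed.

    Assumption: with "warn of all new events" plus the "Repeat test" action,
    each probe consumes one Bad record and the test is re-run immediately.
    """
    statuses = []
    failure_iteration = 0  # stands in for %FailureIteration%
    for _ in range(bad_records):
        failure_iteration += 1
        # Mirrors: ('%FailureIteration%'<20) AND ('%FailureIteration%'>0)
        if 0 < failure_iteration < threshold:
            statuses.append("Normal")
        else:
            statuses.append("Bad")
    # Assumed: the last probe finds no new Bad records, so the status is Ok.
    statuses.append("Ok")
    return statuses
```

Under these assumptions, 15 Bad records never produce a Bad status, while the 20th consecutive Bad record does, which matches the two outcomes listed above.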
mp1
Posts: 200
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

Hi,

I have the same request and tried your suggestion.

Will the "Repeat test" action always be executed by HostMonitor?

The test itself will be executed by the RMA agent (Linux).
I get the following error: "RMA: Wrong Command"

I need the ability to count an expression within a log file ... (Linux)

Thanks in advance

Martin
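
Counting an expression in a Linux log file can also be done outside HostMonitor. The following is a minimal standalone Python sketch, not an RMA or HostMonitor feature: the function name `count_new_matches`, its parameters, and the idea of storing a byte offset between runs are all assumptions made for illustration.

```python
import os
import re

def count_new_matches(path, pattern, offset=0):
    """Count lines matching `pattern` that were appended after byte `offset`.

    Returns (match_count, new_offset); pass new_offset to the next call so
    that each run only looks at records written since the previous run.
    """
    regex = re.compile(pattern)
    # If the file is now shorter than the saved offset (for example after
    # log rotation), start again from the beginning.
    if os.path.getsize(path) < offset:
        offset = 0
    count = 0
    with open(path, "rb") as handle:
        handle.seek(offset)
        for raw in handle:
            if regex.search(raw.decode("utf-8", errors="replace")):
                count += 1
        new_offset = handle.tell()
    return count, new_offset
```

A caller would run this once per test interval, keep the returned offset somewhere persistent, and compare the returned count against its own threshold (for example 10).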
KS-Soft
Posts: 12869
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

> Will the "Repeat test" action always be executed by HostMonitor?
HostMonitor sets the test execution time, so yes, this action tells HostMonitor to schedule test execution.
The test will be executed by the specified agent (HostMonitor or RMA) in any case.
> The test itself will be executed by the RMA agent (Linux).
> I get the following error: "RMA: Wrong Command"
HostMonitor version?
RMA version?
Test method? Text Log?

Regards
Alex
mp1
Posts: 200
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

I just saw that I had an old Linux agent (1.25).
I have updated it to 1.29 and now it basically works, although I still have a problem with the alerting.

I use the "Text Log" test method and want to get an alert when the bad text is written more than 10 times in 5 minutes.

I have this configuration:

Use normal status if: ('%FailureIteration%'<10) AND ('%FailureIteration%'>0)

Alert profile with "Check host again":
('%SimpleStatus%'=='DOWN') OR ('%Status%'=='Normal')


What do I have to select in the Text Log properties?

set "OK" status when no new "bad" records detected
....

Thanks,

Martin
KS-Soft
Posts: 12869
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

> Alert profile with "Check host again":
> ('%SimpleStatus%'=='DOWN') OR ('%Status%'=='Normal')
I would not use this action.
I would set the "warn of all new events" test option.
> What do I have to select in the Text Log properties?
I don't know. It depends on exactly what data you want to find in the log.

Regards
Alex