In our system, there are known conditions under which an error occurs occasionally, and this is expected. We don't want the status to become Bad on a single occurrence, or the test would fail 100% of the time.
We want the test to report an error only if we get TOO MANY of those errors during a test interval.
For instance, it would be great if a log test status could become Bad only when there are > 5 errors in the log over the test interval. If there are 4 or fewer errors, the status remains Good.
The test Reply value would equal the number of times the test string has appeared in the log.
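The requested rule is a plain threshold on a per-interval count. Below is a minimal sketch of that rule, taking "> 5" literally. It is a hypothetical model of the desired behaviour, not a HostMonitor feature. The names `interval_status`, `pattern` and `max_allowed` are illustrative only.

```python
def interval_status(log_lines, pattern="ERROR", max_allowed=5):
    # Hypothetical model of the requested rule, not a HostMonitor feature:
    # the Reply value is how many times `pattern` appears in the lines
    # written to the log during one test interval.
    reply = sum(line.count(pattern) for line in log_lines)
    # Bad only when the count exceeds the number of tolerated errors.
    status = "Bad" if reply > max_allowed else "Good"
    return reply, status


new_lines = ["ERROR: timeout"] * 4 + ["INFO: ok"]
print(interval_status(new_lines))  # (4, 'Good')
```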
Log Test: count # of bad records
-
- Posts: 2832
- Joined: Tue May 16, 2006 4:41 am
- Contact:
Actually, you may use the "Use Normal status if" option (located on the Test properties dialog) with an expression like the following:
('%FailureIteration%'<5) AND ('%FailureIteration%'>0)
Please check for details at:
http://www.ks-soft.net/hostmon.eng/mfra ... rmalstatus
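The following sketch shows how the expression above maps a failure count to a status. It rests on two assumptions that are not confirmed here. First, the expression is consulted only when a probe fails, and a true result reports Normal instead of Bad. Second, %FailureIteration% includes the current failed probe. The function names are hypothetical.

```python
def use_normal_status(failure_iteration: int, limit: int = 5) -> bool:
    # Mirrors ('%FailureIteration%'<5) AND ('%FailureIteration%'>0),
    # with the constant 5 exposed as `limit`.
    return 0 < failure_iteration < limit


def displayed_status(probe_failed: bool, failure_iteration: int, limit: int = 5) -> str:
    # Assumption: the expression only matters for a failed probe;
    # when it is true, Normal is reported instead of Bad.
    if not probe_failed:
        return "Ok"
    return "Normal" if use_normal_status(failure_iteration, limit) else "Bad"


# Consecutive failed probes 1..6 with the expression from the reply.
for fi in range(1, 7):
    print(fi, displayed_status(True, fi))
```

Under this model, failures 1 to 4 report Normal and the 5th reports Bad. The constant in the expression is therefore the first failure count that produces Bad.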
Let me make sure I understand this right with an example:
* I have a Text Log test which searches for the string "ERROR" in the log. This test runs every 5 minutes.
* I set up the conditional "Use Normal status if" and specify
('%FailureIteration%' < 20)
* In one 5-minute span, the word "ERROR" appears 15 times. The test shows normal status.
* In another 5-minute span, the word "ERROR" appears 24 times. The test shows Bad status.
Is this correct? I always assumed FailureIteration meant number of consecutive test failures, not number of bad records in the log.
Correct: %FailureIteration% is the number of consecutive failed test probes.
There is no variable that holds the number of Bad records in the log.
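The definition above explains why the counter is not a record count on its own. Below is a minimal model of a consecutive-failure counter. It increases by one for each failed probe and resets to 0 on a successful one. The model assumes the scenario from the question: one probe per 5-minute interval that simply fails if any "ERROR" line appeared. The function name is hypothetical.

```python
def failure_iterations(probe_results):
    # probe_results: list of booleans, True = the probe failed.
    # Returns the counter value after each probe.
    counter = 0
    history = []
    for failed in probe_results:
        # +1 on a failure, reset on a success.
        counter = counter + 1 if failed else 0
        history.append(counter)
    return history


# Two 5-minute intervals with 15 and then 24 "ERROR" lines: each is one failed probe.
print(failure_iterations([True, True]))  # [1, 2]
```

The counter reaches 2, not 15 or 24, because it counts failed probes rather than matching records.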
However, you may use %FailureIteration% as a workaround to count the number of Bad records in the log. In such a case, you will need the following configuration:
1. Set the "Warn of all new events" option on the Test properties dialog of the Text Log test.
2. Set the "Use Normal status if" option with an expression like:
('%FailureIteration%'<20) AND ('%FailureIteration%'>0)
3. Set up the additional action "Repeat test", select "advanced mode", and provide an expression like the following:
('%SimpleStatus%'=='DOWN') OR ('%Status%'=='Normal')
With these settings you will get:
- if there were 15 Bad records within 5 minutes (Test Interval), HostMonitor will not set Bad status (Normal and Ok statuses will be used).
- if there were 20 or more Bad records within 5 minutes (Test Interval), HostMonitor will set Bad status and trigger assigned actions.
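The three settings above work together as a loop. The following sketch simulates one test interval to show why 15 records stay Normal while 20 or more turn Bad. It is a hypothetical model, and its assumptions are inferred from the outcome described in the reply, not from HostMonitor documentation:

- With "Warn of all new events", each probe reports one new Bad record.
- "Repeat test" re-runs the probe immediately while its expression holds.
- A Normal status has SimpleStatus UP.

```python
def run_interval(new_bad_records: int, limit: int = 20) -> list[str]:
    """Statuses reported during one test interval under the workaround.

    Hypothetical model; assumptions (not confirmed by HostMonitor docs):
    * with "Warn of all new events", each probe reports one new Bad record,
      so N new records give N failed probes in a row;
    * %FailureIteration% counts consecutive failed probes, current one included;
    * "Repeat test" re-runs the probe at once while
      ('%SimpleStatus%'=='DOWN') OR ('%Status%'=='Normal') holds.
    """
    statuses = []
    failure_iteration = 0
    remaining = new_bad_records
    while True:
        if remaining > 0:
            # A failed probe consumes one new Bad record.
            remaining -= 1
            failure_iteration += 1
            # "Use Normal status if" (FI < limit) AND (FI > 0)
            status = "Normal" if 0 < failure_iteration < limit else "Bad"
        else:
            # No new Bad records: the probe succeeds and the counter resets.
            failure_iteration = 0
            status = "Ok"
        statuses.append(status)
        # Assumption: Bad maps to SimpleStatus DOWN, Normal and Ok to UP.
        simple_status = "DOWN" if status == "Bad" else "UP"
        # Repeat-test condition; when false, wait for the next interval.
        if not (simple_status == "DOWN" or status == "Normal"):
            return statuses


print(run_interval(15).count("Normal"), "Bad" in run_interval(15))  # 15 False
print(run_interval(24).index("Bad"))                                # 19
```

In this model, 15 records yield 15 Normal probes followed by Ok. With 24 records, the 20th record (index 19) is the first Bad probe, which matches the two outcomes listed above.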
Hi,
I have the same request and tried your suggestion.
Will the "Repeat test" action always be executed by HostMonitor?
The test itself is executed by the RMA agent (Linux).
I get the following error: "RMA: Wrong Command"
I need the possibility to count an expression within a log file ... (Linux)
Thanks in advance
Martin
HostMonitor sets the test execution time, so yes, this action tells HostMonitor to schedule the test execution.
The test will be executed by the specified agent (HostMonitor or RMA) in any case.
As for the "RMA: Wrong Command" error: which HostMonitor version are you using?
Which RMA version?
Which test method? Text Log?
Regards
Alex
I just saw that I had an old Linux agent (1.25).
I have just updated to 1.29 and now it's basically OK, although I still have a problem with the alerting.
I use the "Text Log" test method and want to get an alert when the bad text is written more than 10 times in 5 minutes.
I have this configuration:
Use normal status if: ('%FailureIteration%'<10) AND ('%FailureIteration%'>0)
Alert profile with "Check host again":
('%SimpleStatus%'=='DOWN') OR ('%Status%'=='Normal')
What do I have to select in the Text Log properties?
set "OK" status when no new "bad" records detected
....
Thanks,
Martin
I would not use the "Check host again" action in the alert profile.
I would set the "warn of all new events" test option.
As for what to select in the Text Log properties: I don't know. It depends on exactly what data you want to find in the log.
Regards
Alex