Avoid Bad status, but perform actions

Rod · Post by **Rod** » Mon Apr 11, 2011 11:55 pm

Task:
Perform system maintenance when certain conditions are met.

How I did it:
- Added a few tests to act as basic condition checks (e.g. time since the last reboot, ping, TCP data send/reply to see whether anyone is using the system). These change to Bad when the conditions are met.
- Set up tests that prepare the system for a reboot when the basic conditions are met and those checks have Bad status. For example, I take the system out of service through a TCP Data Send action when it has not been rebooted for some time and no one is currently using it.
- Set up a test that reboots the system when everything is in sequence and all of the above tests have reached Bad status (by making the reboot test dependent on the checks having “dead” status).

Problem:
It all works well, but the checks display Bad statuses, change the system folder color, and show up in the All Bad Items list on the Web Interface.

The question is:
How do I set up a test so that it
- performs its actions when the alert is active
- does not change the folder color and does not show up as Bad on the Web Interface
- can be used by dependent tests to perform their actions

After reading the manual and searching the forum I understand that:
- ... there is an option to force the status to Normal and then use an expression like '%::MasterTestName::Status%'=='Normal' in the dependent test to do the reboot, which avoids the Bad status; but when I set Normal status, the alert actions in that test do not work.
- ... there is an option for tests not to change the folder color, but I have some critical tests in the same folder for that system that should change the color. I would then prefer not to create a separate folder just for the maintenance tests.

And as a general thought, I believe it would be logical to separate two needs: one is a test section for alerting people about broken stuff, and the other is a check for conditions in order to perform actions, run scripts etc. Currently those are kind of merged into one. There are Alerts and Alert Profile, but they are used not just for alerting but also to run scripts and do other things. Maybe it’s time to separate them and make it all clearer? Say, keep the Alert When option and add Perform Action When? That’s just something to think about; the more critical question is the one above.

KS-Soft · Post by **KS-Soft** » Tue Apr 12, 2011 11:16 am

- ... there is an option to force the status to Normal and then use an expression like '%::MasterTestName::Status%'=='Normal' in the dependent test to do the reboot, which avoids the Bad status; but when I set Normal status, the alert actions in that test do not work.

You may use similar expressions for "advanced mode" actions, but there is a problem: Normal, Ok and "Host is alive" are all "good" statuses, so HostMonitor does not reset the Recurrences counter when a test changes status from Ok to Normal.
If you set up the test to fail when it should fail and override the "bad" status using the "Set Normal status if" option, then you can set the condition for such advanced actions using the %FailureIteration% macro variable.
E.g. if you need to start the action just once, when the test status changes from Ok to Bad (and then to Normal through the "Optional status processing" procedure), use an expression like ('%Status%'=='Normal') and (%FailureIteration%==1)
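
To illustrate how that condition behaves over several checks, here is a minimal Python sketch. It is only a model, not HostMonitor code: it assumes %FailureIteration% counts consecutive failed checks and resets on a good result, and the function name `run_checks` and the sample result sequence are made up for illustration.

```python
def run_checks(raw_results):
    """Return the 1-based check numbers at which the action would fire."""
    failure_iteration = 0
    fired = []
    for n, raw in enumerate(raw_results, start=1):
        # Assumed counter behaviour: count consecutive failures,
        # reset to zero on a good result.
        failure_iteration = failure_iteration + 1 if raw == "Bad" else 0
        # "Set Normal status if" override: a failed check is shown as Normal.
        status = "Normal" if raw == "Bad" else raw
        # Advanced-mode condition:
        # ('%Status%'=='Normal') and (%FailureIteration%==1)
        if status == "Normal" and failure_iteration == 1:
            fired.append(n)
    return fired


print(run_checks(["Ok", "Bad", "Bad", "Bad", "Ok", "Bad"]))  # [2, 6]
```

Under these assumptions the action starts only on the first failure of each failure series, even though the displayed status never becomes Bad.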

And as a general thought, I believe it would be logical to separate two needs: one is a test section for alerting people about broken stuff, and the other is a check for conditions in order to perform actions, run scripts etc. Currently those are kind of merged into one. There are Alerts and Alert Profile, but they are used not just for alerting but also to run scripts and do other things. Maybe it’s time to separate them and make it all clearer? Say, keep the Alert When option and add Perform Action When? That’s just something to think about; the more critical question is the one above.

I am not sure, but I am afraid this would help you and 5 other people while causing problems and confusion for 1000 customers.

- ... there is an option for tests not to change the folder color, but I have some critical tests in the same folder for that system that should change the color. I would then prefer not to create a separate folder just for the maintenance tests.

I think it is easier than redesigning the software.

You can create a subfolder in the same parent folder...

Regards
Alex

Rod · Post by **Rod** » Wed Apr 13, 2011 10:42 pm

Alex, thanks for the response. I figured out how to make actions work with Normal status thanks to the Advanced Mode in Action Properties and the ('%Status%'=='Normal') expression.

There is, however, a possible bug that doesn’t allow me to use that function. I noticed strange behaviour with status processing:

Use “Normal” status if: '%Status%'=='Bad'

With that set up, Normal status is not always set instead of Bad. Here is what happens:

1:50:00 - manual test refresh, status is normal
1:51:03 - auto refresh - Bad
1:52:04 - auto refresh - Normal
1:53:07 - auto refresh - Bad
1:54:08 - auto refresh - Normal

My guess is that HostMonitor has problems setting the status from Bad to Normal when the previous status was Normal. That seems likely, because when I changed the expression to

Use “Normal” status if: ('%Status%'=='Bad') or ('%Status%'=='Normal')

it stopped switching to Bad. Could it be a timing issue between the moment the expression is checked and the moment the %Status% variable is assigned?

As the next troubleshooting step, with the test status displaying Normal and the processing option left as above, I changed the test to point to a wrong port so that it would time out. And guess what: it timed out, but the test was still showing Normal. If I understand correctly, it should be displaying No answer. And it did display that when I unticked the Normal status processing option and refreshed the test.

Here is an example of this test on a screenshot:

KS-Soft · Post by **KS-Soft** » Thu Apr 14, 2011 12:24 am

You are just using the wrong variable.
Quote from the manual
====================
I.e.

- HostMonitor performs the test;
- processes "Reverse alert" option;
- sets "suggested" macro variables (%SuggestedStatus%, %SuggestedSimpleStatus%, %SuggestedReply%, %SuggestedRecurrences% and %FailureIteration%) without touching regular counters (%Status%, %Reply%, %Recurrences%, etc);
- then HostMonitor evaluates "Warning", "Normal" and "Tune up Reply" expressions and finally modifies current test status, reply field and statistics counters (Status, Reply, Alive%, Passed tests, Failed tests, etc)

====================

Also, there are many examples in the manual and help file...
As you can see, we never use the %Status% variable in such expressions, just %SuggestedReply% and %SuggestedStatus%.
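
The processing order quoted from the manual also accounts for the alternating Bad/Normal pattern reported earlier in the thread. Below is a small Python model of it (a sketch, not HostMonitor code; the function `simulate` and its arguments are invented for illustration). It assumes the test itself fails on every check and that the only difference is which variable the "Normal" expression reads.

```python
def simulate(expression_variable, raw_results, initial_status="Normal"):
    """Return the displayed status after each check.

    expression_variable selects what the "Use Normal status if ... == 'Bad'"
    expression reads: "Status" (the status left over from the previous
    check) or "SuggestedStatus" (the result of the check just performed).
    """
    status = initial_status
    history = []
    for raw in raw_results:
        # The "suggested" variables are set first; %Status% is untouched.
        suggested_status = raw
        # The "Normal" expression is evaluated while %Status% still
        # holds the value from the previous check.
        seen = status if expression_variable == "Status" else suggested_status
        # Only now is the current test status modified.
        status = "Normal" if seen == "Bad" else suggested_status
        history.append(status)
    return history


always_failing = ["Bad"] * 4
print(simulate("Status", always_failing))           # ['Bad', 'Normal', 'Bad', 'Normal']
print(simulate("SuggestedStatus", always_failing))  # ['Normal', 'Normal', 'Normal', 'Normal']
```

In this model the %Status% variant flips on every check because the expression always looks at the previous result, while the %SuggestedStatus% variant overrides every failure.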

Regards
Alex

Rod · Post by **Rod** » Wed Apr 20, 2011 5:25 pm

That works fine now Alex, thank you. I am pretty sure I read that section in the manual, but with so much information I guess it is about putting it together when needed.

A few more questions, if you don’t mind:
- Would you recommend something for this issue: when the reboot action is performed by HM, an application on the server is forced to close and does so non-gracefully. Is there a tool or a way to basically “click buttons” to shut the app down?
- Is there an explanation why %SuggestedStatus% works in the “This test depends on” expression but not in Action -> Advanced mode? I used ('%Status%'=='Normal') there instead.
- Is there a reason why a test configured to use Normal status instead of Bad via ('%SuggestedStatus%'=='Bad') has its actions triggered from the “Bad” status actions and not from the “Good” status actions? Normal is considered a good status, right?
- Is there an easy way to run a test only when a test dependent on it requires it?

And some feature suggestions, if I may:
- The ability to move items from “Bad” status actions to “Good” ones.
- Sorting and prioritising of actions, so that one can be sure that the action at the top will be performed sooner and, say, with a configurable delay relative to the action at the bottom of the list (I know this has been discussed, I used search, but in my opinion it would really help to reduce the complexity and the number of tests)

Thanks again for your help,
Have great holidays

KS-Soft · Post by **KS-Soft** » Thu Apr 21, 2011 2:48 pm

Would you recommend something for this issue: when the reboot action is performed by HM, an application on the server is forced to close and does so non-gracefully. Is there a tool or a way to basically “click buttons” to shut the app down?

Do you mean that some application does not process WM_QUERYENDSESSION requests (Windows sends this message on logoff/shutdown events), so you need to send a WM_QUIT message to that application?
There is a kill.exe utility included in the Advanced Host Monitor package (check the Utils\ subfolder). You may use this utility with the -t parameter.

Regards
Alex

KS-Soft · Post by **KS-Soft** » Thu Apr 21, 2011 2:59 pm

Is there an explanation why %SuggestedStatus% works in the “This test depends on” expression but not in Action -> Advanced mode? I used ('%Status%'=='Normal') there instead.

As far as I can see, this variable works just fine.
Maybe you made a mistake in the expression?

Is there a reason why a test configured to use Normal status instead of Bad via ('%SuggestedStatus%'=='Bad') has its actions triggered from the “Bad” status actions and not from the “Good” status actions? Normal is considered a good status, right?

Perhaps you are using an "advanced mode" action?
Then the "good"/"bad" division has meaning for users, not for HostMonitor. It checks the logical expression regardless of the section where you put the action.

Is there an easy way to run a test only when a test dependent on it requires it?

What exactly does "requires" mean in this context? Do you want to perform the master test only when it is time to perform the dependent test?
Then simply set a longer test interval for the master test. E.g. if you perform the dependent test every 5 min, set a 6 min (or 60 min) interval for the master test.

Regards
Alex

KS-Soft · Post by **KS-Soft** » Thu Apr 21, 2011 3:06 pm

The ability to move items from “Bad” status actions to “Good” ones.

I think we will implement a "copy" option for actions.

Sorting and prioritising of actions, so that one can be sure that the action at the top will be performed sooner and, say, with a configurable delay relative to the action at the bottom of the list (I know this has been discussed, I used search, but in my opinion it would really help to reduce the complexity and the number of tests)

Yes, we have such a task on the list. I hope it will be implemented in version 9.

Thanks again for your help, Have great holidays

And you

Regards
Alex

Rod · Post by **Rod** » Sun May 08, 2011 10:18 pm

KS-Soft wrote:
Would you recommend something for this issue: when the reboot action is performed by HM, an application on the server is forced to close and does so non-gracefully. Is there a tool or a way to basically “click buttons” to shut the app down?
Do you mean that some application does not process WM_QUERYENDSESSION requests (Windows sends this message on logoff/shutdown events), so you need to send a WM_QUIT message to that application?
There is a kill.exe utility included in the Advanced Host Monitor package (check the Utils\ subfolder). You may use this utility with the -t parameter.

Regards
Alex

Hi Alex,
The thing is that the application receives the shutdown request fine and starts to quit, but then an additional message box comes up asking Are You Sure? (OK/Cancel). Because there is no one to click OK, the app is closed forcefully after a timeout and data is lost. So the question is whether there is something that can emulate clicking X to close the app and then answering OK to the final message.

KS-Soft · Post by **KS-Soft** » Mon May 09, 2011 8:53 am

We do not have such software, but I think it exists. You may try to google it.

Regards
Alex