WatchDog, RCI connection problems

All questions related to installation, configuration and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
mp1
Posts: 199
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

Hi,

Recently we have repeatedly had problems with the RCI connection.
Because of that, I have enabled RCI logging in the options.

Now I get the following entries:

09/13/2011 10:06:11 RCI connection terminated. Operator: xxxx (Connected at 09/13/2011 09:37:52)
09/13/2011 10:06:11 RCI connection terminated. Operator: xxxx1 (Connected at 09/13/2011 04:06:00)
09/13/2011 10:06:11 WatchDog connection closed (Connected at 09/13/2011 07:15:18).
09/13/2011 10:23:17 WatchDog connection closed (Connected at 09/13/2011 10:11:23).
09/13/2011 11:18:02 WatchDog connection closed (Connected at 09/13/2011 10:23:17).
09/13/2011 12:26:28 WatchDog connection closed (Connected at 09/13/2011 11:18:02).


I can't see any performance problems.
Do you have any ideas?

HM 8.86 on Windows Server 2008 R2.

Thanks in advance

Martin
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

"RCI connection closed" means that RCC closed the session normally.
"RCI connection terminated" means that the connection was dropped because of a network error.
So there is some problem.

"WatchDog connection closed", however, is recorded in both cases: network error or session closed by the user.
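
The distinction above can be applied to a saved log with a short script. This is only a sketch: the line layout and the MM/DD/YYYY timestamp format are assumptions taken from the six entries quoted in this thread, not from any official description of the RCI log, and `classify` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the log excerpt in this thread.
TS_FORMAT = "%m/%d/%Y %H:%M:%S"
TS = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}"

LINE_RE = re.compile(
    rf"^(?P<ended>{TS}) (?P<kind>RCI|WatchDog) connection "
    rf"(?P<how>closed|terminated).*\(Connected at (?P<started>{TS})\)"
)

def classify(line):
    """Return kind, likely cause and session duration, or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ended = datetime.strptime(m["ended"], TS_FORMAT)
    started = datetime.strptime(m["started"], TS_FORMAT)
    if m["kind"] == "WatchDog":
        # WatchDog logs "closed" for both a network error and a user close.
        cause = "unknown (network error or closed by user)"
    elif m["how"] == "terminated":
        cause = "network error"
    else:
        cause = "normal close"
    return {"kind": m["kind"], "cause": cause, "duration": ended - started}

if __name__ == "__main__":
    sample = ("09/13/2011 10:06:11 RCI connection terminated. "
              "Operator: xxxx (Connected at 09/13/2011 09:37:52)")
    print(classify(sample))
```

The session duration is included because very short sessions that repeat would point at a reconnect loop rather than a single outage.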

Regards
Alex
mp1
Posts: 199
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

KS-Soft wrote: RCI connection terminated - means connection was dropped because of some network error.
So, there is some problem.
And what could this be?
Whenever WatchDog raises an alert, the RCI connections are closed at the same time.
The WatchDog condition is set to:
... does not respond for 3 min
That is already a high value, I think.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

And what could this be?
How can we possibly know? It could be anything: a router, a defective network card, a buggy network driver, antivirus, especially antivirus... :(
Are RCC and WatchDog located in the same LAN where HostMonitor is running?
Do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Non-standard Winsock components?
Have you tried to collect some statistics, e.g. using the tracert or ping utilities (or HostMonitor Ping/Trace test methods)?
Are you using the "Try to reconnect automatically" option for RCC? Does it reconnect without problems?
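
If ping or tracert is not convenient, statistics of this kind can also be collected with a small TCP probe run from a client machine. The sketch below is not a HostMonitor test method; `probe` and `watch` are hypothetical names, and the host and port are placeholders to be replaced with the HostMonitor machine and whatever port RCI listens on in your setup.

```python
import socket
import time
from datetime import datetime

def probe(host, port, timeout=5.0):
    """Try one TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers refused connections and timeouts
        return False, time.monotonic() - start, str(exc)

def watch(host, port, interval=10.0, rounds=6, timeout=5.0):
    """Probe repeatedly, print one timestamped line per attempt."""
    results = []
    for i in range(rounds):
        ok, elapsed, err = probe(host, port, timeout)
        stamp = datetime.now().strftime("%m/%d/%Y %H:%M:%S")
        status = "ok" if ok else f"FAILED ({err})"
        print(f"{stamp} {host}:{port} {status} {elapsed * 1000:.0f} ms")
        results.append(ok)
        if i < rounds - 1:
            time.sleep(interval)
    return results

if __name__ == "__main__":
    # Placeholder target: substitute the HostMonitor host and its RCI port.
    watch("127.0.0.1", 9, interval=1.0, rounds=3, timeout=1.0)
```

Comparing the timestamps of failed probes with the times in the RCI log shows whether the connection drops coincide with the host being unreachable.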

Regards
Alex
mp1
Posts: 199
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

KS-Soft wrote:
And what could this be?
How can we possibly know? It could be anything: a router, a defective network card, a buggy network driver, antivirus, especially antivirus... :(
Are RCC and WatchDog located in the same LAN where HostMonitor is running?
Do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Non-standard Winsock components?
Have you tried to collect some statistics, e.g. using the tracert or ping utilities (or HostMonitor Ping/Trace test methods)?
Are you using the "Try to reconnect automatically" option for RCC? Does it reconnect without problems?
Sorry, some more details:

The RCC connection is dropped on all our RCC clients, and so is the WatchDog service connection. Even the RCC running locally on the machine where HM runs is disconnected, so I don't think this is a network problem.
Reconnecting works.
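
Whether the drops really happen at the same moment can be checked by grouping the RCI log entries by their timestamp. This is a rough sketch that assumes the line layout of the entries quoted in the first post; `simultaneous_drops` is a hypothetical helper, and it only groups events logged in exactly the same second.

```python
import re
from collections import defaultdict

# Line layout assumed from the log excerpt in the first post.
EVENT_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) "
    r"(RCI|WatchDog) connection (closed|terminated)"
)

def simultaneous_drops(lines, min_connections=2):
    """Map each timestamp to its events, keeping only shared timestamps."""
    by_time = defaultdict(list)
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m:
            stamp, kind, how = m.groups()
            by_time[stamp].append(f"{kind} {how}")
    return {stamp: events for stamp, events in by_time.items()
            if len(events) >= min_connections}

SAMPLE = [
    "09/13/2011 10:06:11 RCI connection terminated. Operator: xxxx (Connected at 09/13/2011 09:37:52)",
    "09/13/2011 10:06:11 RCI connection terminated. Operator: xxxx1 (Connected at 09/13/2011 04:06:00)",
    "09/13/2011 10:06:11 WatchDog connection closed (Connected at 09/13/2011 07:15:18).",
    "09/13/2011 10:23:17 WatchDog connection closed (Connected at 09/13/2011 10:11:23).",
    "09/13/2011 11:18:02 WatchDog connection closed (Connected at 09/13/2011 10:23:17).",
    "09/13/2011 12:26:28 WatchDog connection closed (Connected at 09/13/2011 11:18:02).",
]

if __name__ == "__main__":
    for stamp, events in simultaneous_drops(SAMPLE).items():
        print(stamp, events)
```

On the quoted excerpt, only 10:06:11 has more than one event: two RCI sessions and the WatchDog session ended in the same second, while the later WatchDog drops happened on their own.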

HM 8.86 is running on Windows Server 2008 R2. No antivirus or other special software is installed.

I think we have had more problems since the number of RMA agents increased. At the moment we have 37 passive and 60 active agents.
The load is about 50 tests/sec. HM is using about 320 MB VMem, 1749 handles, and 570 MB of address space.

Do you think the number of agents could be a problem?

Regards,

Martin
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

The RCC connection will be closed on all our RCC clients and also the Watchdog service.
At the same time?
HM 8.86 is running on Windows Server 2008 R2. No antivirus or other special software is installed.
I think we have had more problems since the number of RMA agents increased. At the moment we have 37 passive and 60 active agents.
The load is about 50 tests/sec. HM is using about 320 MB VMem, 1749 handles, and 570 MB of address space.
Hmm, we don't have a good idea what the reason could be :(
97 agents should not be a problem; HostMonitor can handle more. Resource usage is a little high, but if you have a lot of tests, this can be normal.
How many threads has the hostmon.exe process created?

Regards
Alex
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

I forgot about the timeout. What timeout is specified for the connection? Could you increase it?

Regards
Alex
mp1
Posts: 199
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

KS-Soft wrote:I forgot about timeout. What timeout is specified for connection? Could you increase it?
The WatchDog service and the RCI client on the HM server both have a timeout of 20 sec.
KS-Soft wrote:How many threads created by hostmon.exe process?
At the moment about 100 threads.

Regards,

Martin
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

100 threads is not a problem.
Could you set up some additional tests to check the network connection (using Ping or Trace tests) and check CPU usage on the system where HostMonitor is running?

Regards
Alex
mp1
Posts: 199
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

KS-Soft wrote:100 threads is not a problem.
Could you setup some additional tests to check network connection (using Ping or Trace tests) and check CPU Usage on system where HostMonitor is started?
I have added some further checks, but I couldn't find anything.
CPU usage is at 3-10 %. I installed the RCI client on a Windows XP SP3 machine, and it gets disconnected several times a day.

It's not a big problem; I have now increased the WatchDog action to 10 min.
On the other hand, it would be nice if we could find a solution ;-)

Regards,

Martin