Hi,
Since we updated Hostmon to v9.59 (via the zip hotfix) we have had timeout issues on all our DNS tests. The DNS tests work fine for the most part, but several times a day all of them fail with timeouts. Each test checks one A record and compares it to a fixed IP address. We have configured the DNS tests to run every 10 minutes and to be set to Bad when 3 consecutive failures occur. I know that the DNS servers work 100%, since we have other external tests and we also test manually internally. It's only Hostmon that fails on this. I have also started the test manually and seen it hang on "Checking". During this time I know that DNS works.
This started with v9.59; it never happened before the upgrade.
It only affects the DNS tests, but it affects all of them, on all our different DNS servers. The tests run from different active RMAs and also from the Hostmon server directly. We test all DNS servers with other tests and they work fine.
Any ideas why this happens?
//Andreas..
DNS query problems since v9.59
We have tried lots of variations and different firewalls and paths. I can switch to a closer RMA to test DNS using the internal IP address, and I have also tried testing with a very remote RMA that goes through other firewalls I don't have any control over, just to test this. The error is still there. We have traced all logs in the firewalls, and when Hostmon times out there are no DNS packets even reaching the DNS server. None of the routers/firewalls along the path log any DNS traffic from the RMA/Hostmon either. I have also executed the test manually, and sometimes it just hangs and no DNS packet leaves the server. It hangs for a long time on "Checking", but if I simply run the test manually again it works (or sometimes not). And as I said, I have other internal and external DNS tests that do not use Hostmon, and they work fine.
Sounds like a network card/driver related problem.
Unfortunately, this does not explain why RMAs installed on remote systems have a similar problem.

We will check our code again, but I don't think there is anything wrong; it has worked for years and it's not very complicated code...
You may use the "repeat test" action to make it more reliable.
Regards
Alex
Just thought I'd let you know that this is still a problem for us, and provide some updates.
We have done lots more tests, and the only thing we can see is that sometimes we don't get a reply from our DNS servers. This is probably one DNS request in 70 or so: simply packet loss. This is probably normal when it comes to UDP and not a problem, since DNS clients just time out after a couple of seconds, retry, and then get a reply. So our DNS traffic works as it should.
But the test from Hostmon still gives us errors on this a couple of times every day on both our DNS servers. Those tests check for a specific A record using UDP on port 53 with a timeout value of 5 seconds. The test interval is set to 10 minutes and we need 3 consecutive errors to get an alarm. This should never happen according to all our other tests.
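As a rough sanity check of "this should never happen", the arithmetic can be sketched as below. It assumes (my assumptions, not anything confirmed about Hostmon) that each test sends a single query with no retry, that losses are independent, and that the loss rate is about 1 in 70 as estimated above.

```python
# Assumptions (not confirmed for Hostmon): one query per test, no retry,
# independent packet losses at a rate of roughly 1 in 70.
loss_probability = 1 / 70
consecutive_failures_needed = 3
tests_per_day = 24 * 60 // 10  # one test every 10 minutes

# Probability that three given consecutive tests all lose their packet.
p_three_in_a_row = loss_probability ** consecutive_failures_needed

# Rough estimate of false alarms per day from packet loss alone.
expected_false_alarms_per_day = tests_per_day * p_three_in_a_row

print(f"P(3 consecutive losses) = {p_three_in_a_row:.2e}")
print(f"Expected false alarms per day = {expected_false_alarms_per_day:.2e}")
```

Under these assumptions, packet loss alone would produce an alarm only once every several years, not a couple of times per day, so something other than plain packet loss seems to be involved.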
What I have discovered is that when the test starts and it doesn't get a reply from the DNS server (due to packet loss), it doesn't time out after the 5 seconds I have set. In RCC I can see that it starts with the status Checking and then never leaves that status. I would expect that after 5 seconds it would change the status to Bad (or Normal in this case), and then the next test would most likely work. But what happens is that it just stays on Checking for the next two tests (10-minute intervals), and when the third has failed I get an alarm.
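For comparison, here is a minimal sketch of the behaviour I would expect from a UDP DNS check with a hard timeout. It is not Hostmon's code; the function names, the `retries` parameter, and the "Normal"/"Bad" return values are made up for illustration. It only checks that a reply with the matching query ID arrives before the deadline and does not compare the returned A record to a fixed IP address.

```python
import socket
import struct
import time


def build_a_query(hostname: str, query_id: int) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name is encoded as length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def check_dns(server, hostname, port=53, timeout=5.0, retries=0,
              query_id=0x1234):
    """Return "Normal" if a reply with our ID arrives in time, else "Bad"."""
    query = build_a_query(hostname, query_id)
    for _ in range(retries + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(query, (server, port))
            deadline = time.monotonic() + timeout
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break  # hard deadline reached, give up on this attempt
                sock.settimeout(remaining)
                try:
                    data, _ = sock.recvfrom(512)
                except OSError:
                    break  # timeout or socket error counts as no reply
                if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == query_id:
                    return "Normal"
    return "Bad"
```

With `timeout=5.0` and `retries=0`, a lost packet makes `check_dns` return "Bad" after about 5 seconds instead of waiting indefinitely, so the next scheduled test starts from a clean state.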
Could there be a bug in the timeout setting in Hostmon for the DNS test? Any other suggestions as to why the test just stays on Checking?
Regards,
//Andreas..