
ongoing call failure problem with large deployment of HDX systems

jtiner
Occasional Contributor

We have a large deployment of HDX systems (about 120 in total: 6000s, 7000s, and 8000s, most running 3.0.6 or 3.1.9). Starting about a year ago, they began exhibiting a new behavior: incoming calls are not answered and outbound calls cannot be placed, even though the systems appear to be in a completely normal state locally and are reachable on the network. There have not been any systemic changes to the network, codec configuration, etc.

Real-time local and/or web monitoring shows no indication of an incoming call, yet the codec system logs show that the calls did reach the codec. Thorough inspection of the codec logs seems to show one or more system processes stopping after stray network traffic touches the codec. After that, remote viewing via the Web Director utility in the codec's web interface fails, and the diagnostics page reports that the time server is down. A codec in this state will not answer calls made from any other endpoint, MCU, or desktop client, and cannot place calls. Rebooting the codec restores service, and calls can be placed and received as expected.

The problem doesn't have anything to do with the Web Director or time server functions; those symptoms are merely the manifestation of whatever process or processes have stalled on the codec.

None of these systems are under service contract, but this is not a codec-specific problem; it has occurred across many different codecs and software versions throughout our system. Because of the number of codecs involved, we now see this problem at least several times a week across the system. I'd be very interested to hear from other users who have experienced this same problem.

Message 1 of 4
3 REPLIES
jdicn
Member

Re: ongoing call failure problem with large deployment of HDX systems

I have the same problem, I think. One note I would add is that this problem has not been found on any HDX 3.0.3 or 3.0.4 units or on GroupSeries. It has been found on many variations of 3.1.x HDX. Usually the time server status is in a failed state, and the call failures are due to the codec, not the network or infrastructure. If telnet is enabled, it will connect but give no login prompt, or it will respond that too many connections are established. If the gatekeeper is forced to deactivate the registration, the codec does not recognize or report the status change.
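
For anyone who wants to check for the "connects but gives no login" symptom without sitting at a terminal, here is a minimal Python sketch. It is not a Polycom tool; the function name, the default port 23, and the timeout are my own assumptions, so adjust them for your environment. It only opens a TCP connection and waits briefly to see whether the codec sends anything back.

```python
import socket

def probe_banner(host, port=23, timeout=5.0):
    """Classify a TCP service as 'unreachable', 'silent', or 'responsive'.

    'silent' means the connection was accepted but no data (such as a login
    prompt) arrived before the timeout, which matches the symptom described
    above. The port and timeout defaults are assumptions, not Polycom values.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            try:
                data = sock.recv(256)
            except socket.timeout:
                # Connected, but nothing was sent within the timeout.
                return "silent"
            # Empty data means the peer closed without sending anything.
            return "responsive" if data else "silent"
    except OSError:
        # Refused, timed out while connecting, reset, or no route to host.
        return "unreachable"
```

A result of "silent" on a codec that normally presents a prompt would be one hint that it is in the stalled state. A reply of "too many connections" would still count as "responsive" here, so read the result together with the other symptoms.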

Message 2 of 4
jtiner
Occasional Contributor

Re: ongoing call failure problem with large deployment of HDX systems

It appears that this problem has been resolved in our deployment. Tests were conducted over extended periods with various security and feature configurations enabled/disabled, and all of the logs we collected and parsed were passed on to Polycom.

It appears that the problems were mainly due to telnet being enabled on the codecs. While nobody ever gained access to the codecs, malicious probes overwhelmed the stack and bogged down some processes on the codec (manifested as the "time server" problem, odd web interface behavior, Web Director failure, etc., etc.). Polycom was able to verify the network activity was the root of the problem even though we could only see the indirect end results when we scoured the logs.

In our tests, we disabled SNMP, whitelisted IP ranges for web access, altered session settings, stopped API access, and finally disabled telnet. Although we had routinely used telnet in the past without problems, it's clear there was a sudden and severe increase in malicious telnet probes from many, many sources.
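
With this many codecs, it can help to confirm from the network side that telnet really is closed everywhere after the change. Below is a rough Python sketch of that kind of audit. The function names are made up for illustration, and the port list (23 and 24) is an assumption about which telnet-style ports your HDX units expose, so verify it against your own configuration.

```python
import socket

# Assumption: ports to treat as telnet-style access on the codecs.
TELNET_PORTS = (23, 24)

def open_ports(host, ports=TELNET_PORTS, timeout=2.0):
    """Return the subset of `ports` that accept a TCP connection on `host`.

    `host` should be an IP address; a name that fails to resolve will raise.
    """
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found

def audit(hosts, ports=TELNET_PORTS, timeout=2.0):
    """Map each codec address to the list of telnet-style ports still open."""
    return {host: open_ports(host, ports, timeout) for host in hosts}
```

Calling `audit()` with your list of codec addresses returns a dictionary; any codec with a non-empty list still has a port accepting connections and is worth a second look.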

Message 3 of 4
jtiner
Occasional Contributor

Re: ongoing call failure problem with large deployment of HDX systems

I should also note that when you're troubleshooting an affected codec, any configuration changes you attempt, and the configuration reported via the web interface, may not be accurate.

The failure of some processes when this problem occurs actually causes the web interface to behave strangely, so you may see incorrect reporting, like features appearing to be off when they're on or vice versa, and changes you make may not take. Make sure to restart the codec before checking or making changes.

Message 4 of 4