I believe that I listed our Polycom ticket # in a previous reply. I was told by our reseller that it is 1-494027911; this should be the Polycom case # and not the case # that our reseller uses internally in their own ticketing system. I don't expect EMEA support to be aware of or working on it since I am in the United States. These latest notes of mine may not even be posted in the ticket yet since I sent them to our reseller after they had closed for the day.
I am being told by our reseller that our ticket is working its way up the escalation chain. Polycom U.S. now has a set of scripts that I wrote, which flood the phone with SIP NOTIFY BLF update messages (dialog-info+xml), and I have reason to believe that somebody at Polycom EMEA (perhaps Steffen?) has obtained a copy as well. I hope that before too long the actual software engineers will be merrily reproducing the issue in their lab, based on my report and with the assistance of my tool.
In the meantime, this process has dragged on for long enough that my co-workers are quickly losing patience with the "horrible new phones." What I learned about the nature of this bug while running tests and preparing my report made me realize that there might be a way to at least work around the problem on the PBX side while we all wait for an official fix from Polycom. This afternoon, I dived into the chan_sip source code for Asterisk and came up with a one-line patch that should prevent Asterisk from sending gratuitous dialog-info SIP NOTIFYs containing "<state>terminated</state>" to the phone. In a normal environment, this should be all that is needed to prevent the phones from falling on their faces.
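For readers who have not looked at one of these messages, here is a minimal sketch of the kind of body such a NOTIFY carries. It follows the general dialog-info+xml layout from RFC 4235, where the element holding the dialog state is named <state>. The function name build_dialog_info, its parameters, and the example entity URI are hypothetical and are not taken from Asterisk or from my test scripts; the exact attributes Asterisk emits may differ.

```c
#include <stdio.h>

/* Hypothetical helper (not part of Asterisk): formats a minimal
 * dialog-info+xml body in the general shape described by RFC 4235.
 * The inputs are not XML-escaped, so pass only plain identifiers.
 * Returns the number of bytes written, or -1 if buf is too small. */
int build_dialog_info(char *buf, size_t size, unsigned version,
                      const char *entity, const char *dialog_id,
                      const char *state)
{
    int n = snprintf(buf, size,
        "<?xml version=\"1.0\"?>\n"
        "<dialog-info xmlns=\"urn:ietf:params:xml:ns:dialog-info\" "
        "version=\"%u\" state=\"full\" entity=\"%s\">\n"
        "<dialog id=\"%s\">\n"
        "<state>%s</state>\n"
        "</dialog>\n"
        "</dialog-info>\n",
        version, entity, dialog_id, state);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Calling build_dialog_info(buf, sizeof buf, 0, "sip:100@pbx.example.com", "100", "terminated") fills buf with a body whose dialog state is "terminated", which is the kind of idle-extension update the workaround suppresses.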
If you are affected by this UC bug and you are an Asterisk user, and you have the option of rebuilding Asterisk from source for your production environment, apply this unified diff patch to your source tree before building:
diff -r -d -u a/channels/chan_sip.c b/channels/chan_sip.c
--- a/channels/chan_sip.c 2012-07-09 07:38:18.000000000 -0700
+++ b/channels/chan_sip.c 2014-03-13 17:50:59.000000000 -0700
@@ -25047,6 +25047,7 @@
 }
 ast_set_flag(&p->flags, SIP_PAGE2_DIALOG_ESTABLISHED);
 transmit_response(p, "200 OK", req);
+if (firststate != AST_EXTENSION_NOT_INUSE && firststate != AST_EXTENSION_UNAVAILABLE)
 transmit_state_notify(p, firststate, 1, FALSE); /* Send first notification */
 append_history(p, "Subscribestatus", "%s", ast_extension_state2str(firststate));
 /* hide the 'complete' exten/context in the refer_to field for later display */
This patch is based on an untouched Asterisk 18.104.22.168 source tree and may not apply (cleanly or otherwise) to any other version. Asterisk 10, 11, and 12 probably have a similar construct, but you may have to go hunting for it, and the constants used in later versions of Asterisk may be different.
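If you need to port the change to another version by hand, it may help to see the added condition in isolation. The following is only a sketch: the enum ext_state and the function should_send_first_notify are hypothetical stand-ins that I made up for illustration, and the enum values are not Asterisk's real numeric values. In the real source the test sits directly in front of the first transmit_state_notify() call made after the SUBSCRIBE is answered with 200 OK.

```c
#include <stdbool.h>

/* Illustrative stand-ins for Asterisk's AST_EXTENSION_* constants.
 * The names and numeric values here are assumptions for this sketch;
 * check the headers in your own source tree for the real ones. */
enum ext_state {
    EXT_NOT_INUSE,
    EXT_INUSE,
    EXT_BUSY,
    EXT_UNAVAILABLE,
    EXT_RINGING,
    EXT_ONHOLD
};

/* Mirrors the condition the patch adds: skip the initial NOTIFY when
 * the watched extension is idle or unavailable, and send it for every
 * other state. */
bool should_send_first_notify(enum ext_state firststate)
{
    return firststate != EXT_NOT_INUSE && firststate != EXT_UNAVAILABLE;
}
```

Only the initial notification that follows a SUBSCRIBE is gated this way. Later state changes still go through the normal notification path, so BLF lamps keep updating as calls start and end.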
If you aren't using Asterisk, or if you can't or won't rebuild from source, then I'm afraid that at this point you will have to wait it out.
...and just to clarify: this patch was not written to correct something that Asterisk is doing, but just to work around the issue in Polycom UC. Asterisk is not doing anything "wrong" here by sending those NOTIFY dialog updates immediately after SUBSCRIBE, and even if it were, the phones should not respond by eating themselves and crashing. Remember the robustness principle: "be conservative in what you send, and liberal in what you accept."
As an update, I have positive news to report: I have still not heard anything back from Polycom or my reseller about any further progress on my ticket (other than that it was escalated last week), BUT I have not gotten any more complaints about our phones crashing since I applied my Asterisk-side workaround for the firmware bug last week. Not one peep! Not only that, but I've been watching the Memory Usage graph on the phone ([Menu] -> 2 -> 4 -> 2 -> 3), which would previously creep up and up until the phone crashed, and it hasn't budged a single percentage point this whole time.
Glad to have found this. I am having the same issue with one particular phone that has 3 sidecars and over 40 BLF instances. It does not like it when one particular phone is used to dial all the phones at once. Too many SIP NOTIFY messages at once seem to cause a problem.
It sounds to me like you might be running a version of UC older than 4.0.5 on your phones. Before 4.0.5, you would experience the problem you are talking about, which does seem to be related to the rate at which SIP packets are sent to the phone. In 4.0.5 and 4.0.6, though, the BLF implementation seems to have undergone some significant engineering changes. The old problem, as far as I have been able to tell in my testing, appears to be gone, but it has been replaced by a new one: the memory leak that eventually causes the phone to crash.
If you are using Asterisk and have the option to recompile Asterisk from source on your production server, you should be able to achieve stability by upgrading to 4.0.6 and rebuilding Asterisk using my patch. Otherwise, you will have to wait until Polycom releases a fix, and choose the lesser of two evils in the meantime: having the phones freak out when they are sent too many SIP NOTIFYs (UC 4.0.1 - 4.0.4), or having the phones crash every day or every few days, requiring a reboot (UC 4.0.5 - 4.0.6). (If, while running 4.0.5+, you discover that your phones can generally get through an entire workday without crashing, you might try scheduling a mass reboot of all affected phones every 24 hours, say at midnight. That will at least free up whatever memory has been consumed by the leak during the day.)
Any updates on this? I am fighting this problem with 3CX and a Polycom 650 with BLFs. Version 4.0.6 is showing the same behavior.
I have asked for an update, but have not heard anything back yet. Yes, 4.0.6 still has the issue.
In my case, the lesser of two evils is to just let it ride (and I am on 4.0.3, so good call there). I have one phone that is affected and only two cases where it can happen that are very rare (2-3 times a year). I plan to keep an eye on this thread for updates to see if/when the memory leak is fixed as I have quite a few Polycoms at 9 different locations, so I don't upgrade on a whim, especially since this affects one phone.
Thanks for the updated information and what you did to troubleshoot this. I look forward to hearing about a corrected firmware in the future.