What’s your problem?

August 14, 2012

One of the questions most frequently raised with our support team is what a platform is expected to do on a particular failure from a (normally upstream) 3rd party. There’s a lot of FUD around about SIP error codes, so here’s a short guide including some pointers to the appropriate Q.850 codes. For those of you not based in the UK, note that the Q.850 to SIP error code mapping seen here isn’t what you’d expect if you’ve read ITU-T Recommendation Q.1912.5 (03/2004): “Interworking between Session Initiation Protocol (SIP) and Bearer Independent Call Control Protocol or ISDN User Part” – the NICC have issued modifications to Q.1912.5 in ND1017.

Sledgehammer

Essentially then, when you send an INVITE to a third party, you may get a wide variety of responses. If you are a responsible carrier, you’ll have multiple routes to reach a given PSTN destination, which you’ll pick based on your own policy (usually least-cost-first). If you receive an error message from your upstream carrier, you’ll need to know what to do about it. I use the term “recurse” to describe this re-attempt via a different route; some people may know it as “route advance”.
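As a rough illustration of route advance, here is a minimal Python sketch. The `Route` tuple, the carrier names, the `send_invite` callable and the `should_recurse` policy are all hypothetical stand-ins for whatever your platform provides; the only idea taken from the text is "try routes least-cost-first and move on when policy says so".

```python
from typing import Callable, List, Optional, Tuple

# (cost per minute, carrier name): both fields are hypothetical examples.
Route = Tuple[float, str]


def place_call(
    routes: List[Route],
    send_invite: Callable[[str], int],
    should_recurse: Callable[[int], bool],
) -> Optional[Tuple[str, int]]:
    """Try routes cheapest-first; return (carrier, final SIP status) or None."""
    result = None
    for _cost, carrier in sorted(routes):
        status = send_invite(carrier)
        result = (carrier, status)
        if 200 <= status < 300:
            break  # call answered, stop here
        if not should_recurse(status):
            break  # policy says give up rather than route advance
    return result


# Usage with a fake upstream: the cheapest carrier fails, the next one answers.
fake_responses = {"carrier-a": 503, "carrier-b": 200}
outcome = place_call(
    routes=[(0.012, "carrier-b"), (0.009, "carrier-a")],
    send_invite=lambda carrier: fake_responses[carrier],
    should_recurse=lambda status: 500 <= status < 600,
)
print(outcome)  # ('carrier-b', 200)
```

Keeping the recursion policy as a separate callable means the routing loop never changes when you tune which codes trigger a re-attempt.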

The broadest approach is based on the response code class:

Response codes in the SIP protocol

As a general principle you shouldn’t ever recurse on a 6xx error – it’s a global error indicating that you’ll never reach that end user for some reason. A 604 is a good example – the global brother to the well-known 404, it indicates that the user “Does not exist anywhere”. You’ll typically see that error when the service that you’re contacting is authoritative for that number, but the number isn’t associated with a user.

You should however recurse on 5xx error messages – a 5xx class indicates a problem with the server you’re talking to. These are typically sent when an exception occurs in software: the server itself can’t handle the message, so you should try another route.

4xx class messages are a bit awkward. Essentially, they indicate that the request has reached its destination but that there was something wrong that means the call could not be completed. You shouldn’t really reattempt the call to this location without modification, but you can try another server if you wish. Providers and carriers approach 4xx messages differently – some try other routes on receipt of one, some don’t. The best approach we’ve found is a selective approach, based on experience and an understanding of what generates the 4xx message.

3xx messages are not errors. They are a redirection attempt to a new user or location that can serve this request.
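The class-based rules above can be sketched as a small lookup. This is only an illustration in Python; the `Action` enum and the `action_for_class` function are names made up for this example, and the 4xx class is deliberately left as "decide per code" because carriers differ on it.

```python
from enum import Enum


class Action(Enum):
    NOT_A_FAILURE = "not a failure"      # 1xx provisional, 2xx success
    FOLLOW_REDIRECT = "follow redirect"  # 3xx: a redirection, not an error
    SELECTIVE = "decide per code"        # 4xx: depends on the specific code
    RECURSE = "recurse"                  # 5xx: this server has a problem
    STOP = "stop"                        # 6xx: global failure, never recurse


def action_for_class(status: int) -> Action:
    """Map a SIP status code to a coarse action based only on its class."""
    if not 100 <= status <= 699:
        raise ValueError(f"not a SIP status code: {status}")
    by_class = {
        1: Action.NOT_A_FAILURE,
        2: Action.NOT_A_FAILURE,
        3: Action.FOLLOW_REDIRECT,
        4: Action.SELECTIVE,
        5: Action.RECURSE,
        6: Action.STOP,
    }
    return by_class[status // 100]


print(action_for_class(503).value)  # recurse
print(action_for_class(604).value)  # stop
```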

Nutcracker

Let’s dig down a bit deeper – 3xx, 5xx and 6xx messages are pretty clear, but there’s some ambiguity on 4xx class. Some are pretty clear cut, like the 400 Bad Request indicating that there’s something fundamentally wrong with your request – or at least that’s what the server thinks. And that really brings us to the crux of the problem of dealing with 4xx messages – some 4xx errors are returned when your provider is having problems, some are returned when your end user simply doesn’t answer the phone. There simply is no easy answer to this but we’ve found a reasonable balance over time.

Essentially, we suggest that you should definitely not recurse if you get:

  • 401 Unauthorized – The server is requesting authentication, so the client now needs to re-request the resource with authentication
  • 407 Proxy authentication required – similar to 401, you need to do something at the client side before re-requesting
  • 481 Call/Transaction does not exist – this indicates an out-of-dialogue message has been received with no context – something has gone very wrong
  • 482 Loop detected – this typically indicates some sort of misconfiguration which needs investigation
  • 484 Address Incomplete – Unless you’re using overlapped dialling, the most common cause for this is a mis-typed number. You should catch this in your own normalisation before sending it upstream
  • 485 Ambiguous – Very rarely implemented and almost never seen between carriers
  • 486 Busy Here – Now, this is an anomaly. Strictly speaking, you should recurse on this code since there may be other ways to reach a user, unless you know a user is busy in which case you return a 600 Busy Everywhere. However, most platforms return this error when they actually mean to use the 6xx equivalent. So we suggest that you set this to not recurse.

We use a slightly different list when dealing with inbound requests into BroadWorks; for those we add these to the stop-recurse list:

  • 403 Forbidden – You don’t have permission, no amount of attempts will succeed here.
  • 404 Not Found – The user isn’t at this location or you’ve supplied a URI that you can’t reach through me.
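The two stop-recurse lists above could be expressed roughly as follows. This is a sketch in Python, not a complete policy: the names `NO_RECURSE`, `BROADWORKS_INBOUND_EXTRA` and `should_recurse_4xx` are invented for the example, and treating every unlisted 4xx code as "recurse" is an assumption made here to keep the function total, not a recommendation from the lists themselves.

```python
# 4xx codes on which we suggest you definitely do not recurse.
NO_RECURSE = frozenset({401, 407, 481, 482, 484, 485, 486})

# Extra codes added to the stop list for inbound requests into BroadWorks.
BROADWORKS_INBOUND_EXTRA = frozenset({403, 404})


def should_recurse_4xx(status: int, broadworks_inbound: bool = False) -> bool:
    """Return True if a 4xx response should trigger a re-attempt elsewhere."""
    if not 400 <= status <= 499:
        raise ValueError(f"not a 4xx status code: {status}")
    stop_list = NO_RECURSE
    if broadworks_inbound:
        stop_list = NO_RECURSE | BROADWORKS_INBOUND_EXTRA
    # ASSUMPTION: any 4xx code that is not on the stop list is recursed on.
    return status not in stop_list


print(should_recurse_4xx(486))                           # False
print(should_recurse_4xx(404))                           # True
print(should_recurse_4xx(404, broadworks_inbound=True))  # False
```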

The most difficult response to deal with is 480 Temporarily Unavailable. In theory, this is generally used to indicate that the user isn’t available to accept this call and you should try elsewhere. In reality, trying elsewhere will rarely get you any success. Moreover, in the RFC, they added the following gem:

This status is also returned by a redirect or proxy server that recognizes the user identified by the Request-URI, but does not currently have a valid forwarding location for that user.

Which suggests that you should recurse. To make matters worse, SIP protocol programmers seem to have chosen this error code as their catch-all error if they’re not quite sure what else to use, and a number of SIP stacks return this when things go wrong. In particular, I know of one UK trunking provider that sends this message when certain error conditions occur internally.

So, what should we do about 480?

LASER

Part of the problem with the specification is that it defines a whole set of error codes but misses out a whole range of scenarios that traditional telephony worked well with. This is partly because SIP isn’t just to handle voice – it’s just that an extension of the PSTN is our most common use of it. Handily, there’s a way of using that extended information from the PSTN in error responses – Q.850 response codes.

The traditional telephony network (specifically, ISDN) has used these codes for a significant period of time to indicate the cause of a call being cleared, or finished. They are more plentiful, accurate and descriptive than, for example, 480 Temporarily Unavailable. You can map one SIP error code to one or more Q.850 codes, which doesn’t help that much when there are 4 or 5 Q.850 codes that match. However, RFC 3326 introduced a new header – the Reason header.

Simply put, where a particular response is sent, it can be useful to provide more information, for example:

408 Request Timeout
Reason: Asleep

would be a nice idea. In essence, this header supports embedding a Q.850 code from the PSTN in addition to the SIP message. Now that comes in useful because not only do we know that an error matches the conditions for a 480, for example, but we also know that it might be one of the following Q.850 reasons:

  • 8 – Preemption
  • 9 – Preemption – circuit reserved for reuse
  • 16 – Normal call clearing
  • 19 – No answer from user (user alerted)
  • 20 – Subscriber absent
  • 27 – Destination out of order
  • 31 – Normal (unspecified)…

In addition, the Q.850 Reason header can be priceless in debugging what went wrong with a particular call.
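For debugging, it helps to pull the Q.850 cause out of the Reason header and log it next to the SIP status. The Python sketch below assumes the header value follows the RFC 3326 form, for example `Q.850;cause=20;text="Subscriber absent"`. The helper names `q850_cause` and `describe` are made up for this example, the table only contains the causes listed above, and the regular expression is a simplification that can be confused by a comma inside a quoted `text` parameter.

```python
import re
from typing import Optional

# Only the Q.850 causes listed above; a real table would be longer.
Q850_TEXT = {
    8: "Preemption",
    9: "Preemption - circuit reserved for reuse",
    16: "Normal call clearing",
    19: "No answer from user (user alerted)",
    20: "Subscriber absent",
    27: "Destination out of order",
    31: "Normal (unspecified)",
}

# Finds the cause parameter of a Q.850 reason value, skipping any other
# reason values (such as "SIP;cause=200") in the same header.
_Q850_CAUSE = re.compile(r"Q\.850\b[^,]*?;\s*cause\s*=\s*(\d+)", re.IGNORECASE)


def q850_cause(reason_value: str) -> Optional[int]:
    """Return the Q.850 cause from a Reason header value, or None."""
    match = _Q850_CAUSE.search(reason_value)
    return int(match.group(1)) if match else None


def describe(status: int, reason_value: Optional[str]) -> str:
    """Build a log-friendly description of a failed call."""
    cause = q850_cause(reason_value) if reason_value else None
    if cause is None:
        return f"SIP {status} (no Q.850 cause)"
    text = Q850_TEXT.get(cause, "not in this table")
    return f"SIP {status}, Q.850 {cause}: {text}"


print(describe(480, 'Q.850;cause=20;text="Subscriber absent"'))
# SIP 480, Q.850 20: Subscriber absent
```

With the cause extracted, a recursion policy for 480 can key on the Q.850 value rather than on the SIP code alone; which causes to recurse on is left to your own experience with each upstream.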

So, in summary, enable Q.850 codes and keep an eye on your recursion.

 
