Identifying, Avoiding and Resolving Network Problems

01 March 2008

Not all relationships should be based on trust. The relationship between IT managers and their networks is one such example, according to Claranet's Mike Rogers.

While IT managers must be familiar with the multitude of technologies that make up a modern network, too much trust in the reliability of individual components can lead to the complacent attitude of "if I don't touch it, nothing will go wrong".

The danger of this approach is that when a part of the network malfunctions – and one inevitably will – those responsible for maintaining the network will be unfamiliar with individual components and will be unable to remedy the fault.

The costs of network downtime can be enormous. Not only can it cause short-term revenue loss; it can also have a devastating impact on a business's reputation. Yet, according to Claranet's Technical Director Mike Rogers, by following a few simple precautions network administrators can minimise the threat to the networks that enable businesses to function.

Planning For Downtime

The holy grail of successful network management is planning. "Start with a pen and paper," Rogers suggests. "Plot a network map that shows everything in your network, from routers to individual cables. Then begin removing components from the map, and watch what happens to the rest of the network.

"When you can remove a point that completely disables the network, you've identified a Single Point Of Failure (SPOF)."

SPOFs are the weak spots in a network that need to be made more resilient. They can be difficult to spot, especially given the amount of cabling and the number of connections in today's offices. Identifying areas with multiple SPOFs – where they are in the same location or run from the same power circuit, for instance – is critical to preventing downtime.

"Using multiple technologies is paramount in increasing resilience," explains Rogers. "For example, if an office has one DSL line to a datacentre which is the SPOF, adding another DSL line along the same copper wires will hardly help when a builder puts a spade through the bunch, or a DSL exchange goes out of action. Instead mixing delivery technologies and cable routes greatly reduces the chances of something going wrong."
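The pen-and-paper exercise Rogers describes can also be sketched in a few lines of Python. The network map below is entirely hypothetical (the component names and links are assumptions for illustration, not Claranet's topology): each component is removed in turn, and any removal that cuts the office off from the datacentre is reported as a SPOF.

```python
from collections import deque


def reachable(links, start, removed):
    """Return every component reachable from start when `removed` is taken out."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def find_spofs(links, source, targets):
    """List components whose removal disconnects source from any target."""
    spofs = []
    for candidate in links:
        if candidate == source or candidate in targets:
            continue  # only test the components in between
        still_up = reachable(links, source, candidate)
        if any(target not in still_up for target in targets):
            spofs.append(candidate)
    return sorted(spofs)


# Hypothetical map: two routers and two DSL lines, but both lines
# share the same duct in the street.
links = {
    "office-lan": ["switch"],
    "switch": ["office-lan", "router-a", "router-b"],
    "router-a": ["switch", "dsl-line-1"],
    "router-b": ["switch", "dsl-line-2"],
    "dsl-line-1": ["router-a", "street-duct"],
    "dsl-line-2": ["router-b", "street-duct"],
    "street-duct": ["dsl-line-1", "dsl-line-2", "datacentre"],
    "datacentre": ["street-duct"],
}

spofs = find_spofs(links, "office-lan", {"datacentre"})
print(spofs)  # ['street-duct', 'switch']
```

In this made-up map the duplicated routers and DSL lines are not SPOFs, but the shared duct and the single switch still are, which is the situation Rogers warns about when a second line follows the same route as the first.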

To increase resilience, network managers also have to plan for the unexpected and take unconventional routes around problems. Network outages are often caused by unforeseen or uncontrollable events, and administrators must have plans in place to mitigate their effects.

"IT managers are excellent at spotting the obvious weak spots such as cable breaks, hardware malfunctions or power loss," says Rogers. "But they also need to think about the unexpected – what happens to an SME if their only network engineer is off work for a month? What happens if we have an abnormally hot summer and equipment is damaged by overheating from a broken air conditioning unit? What if a supplier goes out of business?"

Achieving a Rapid Response

To act quickly and efficiently during downtime, administrators need an intimate knowledge of their network: each component, where it is sourced from and how to replace it. Fixing network problems in real time is almost impossible, according to Rogers, so during downtime staff need to know how long an individual part will take to repair or replace.

"Staff have to make informed decisions on network components, from purchasing to using them. Which components have the longest fix time? In financial terms, is it better to wait for a technician to fix it, or just to replace it? Is the part from a shop down the road or must an engineer come from Europe? These become major issues during downtime, so in-depth knowledge of the network is paramount."

Service Level Agreements (SLAs) can be a useful method of determining a provider's confidence in its network, but should be treated with caution. Often a comprehensive repair agreement or guaranteed response time can better safeguard an organisation's interests.

"Don't choose a provider purely on SLAs, which are commercial agreements rather than technical ones," says Rogers. "Just because one service provider offers a better SLA than its rivals, this is no guarantee of availability – it's simply a more optimistic prediction of the service to be provided. It's best to discuss your SLA requirements – and your willingness to pay to achieve these – frankly with your service provider. Consider 'time to fix' or 'hours to respond' criteria that meet your business needs, mitigating downtime problems with other technology."

Network Software

"An unmonitored network is an uncontrolled network," says Rogers. "Businesses must deploy network monitoring software that is appropriate to the size and relative importance of their network. Solutions should also be scalable, able to monitor a network extended across different geographical areas and flexible in their reporting ability."

Knowledge of historical network trends is crucial – IT managers need to know what is normal for a network to prevent them making knee-jerk reactions when something goes wrong. A sudden spike in traffic that causes an outage does not necessarily warrant a costly, full-scale response if it occurs once in a year.
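One way to put "knowing what is normal" into practice is to compare each new reading against a baseline built from past measurements. The sketch below is a minimal illustration, not a description of any particular monitoring product: the traffic figures are invented, and the threshold of three standard deviations is an assumed rule of thumb that would need tuning for a real network.

```python
from statistics import mean, stdev


def is_abnormal(history, reading, threshold=3.0):
    """Flag a reading that sits far outside the historical baseline.

    history   -- past measurements (e.g. Mbit/s at the same time of day)
    threshold -- how many standard deviations count as abnormal (assumed)
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return reading != baseline
    return abs(reading - baseline) / spread > threshold


# Hypothetical weekday-morning traffic samples in Mbit/s.
history_mbps = [40, 42, 38, 41, 39, 43, 40, 37]

print(is_abnormal(history_mbps, 44))  # False: within normal variation
print(is_abnormal(history_mbps, 95))  # True: worth investigating
```

A flagged reading is only a prompt to look closer. Whether it justifies spending money still depends on how often such events recur in the historical record.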

Cost vs. Benefit

IT managers need to conduct a cost/benefit analysis of increasing network resilience. The costs of network downtime can be catastrophic as customers melt away and revenues dry up.

"For each SPOF, work out how much it will cost if that fails for a minute, an hour, a day, and the cost of preparing a backup," says Rogers.

"If a DSL line to a single office fails, the cost could be, hypothetically, £1000 per hour. However, if the router or switch connecting ten offices to the company mainframe fails, the cost will be correspondingly higher and the cost of having a backup technology in place seems lower than it did in the pre-purchase discussions."

Using this simple analysis, administrators can build a list of the critical components and the SPOFs. Then they can prioritise the ones that need extra resilience.
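The arithmetic behind this analysis is simple enough to sketch in Python. Only the £1000 per hour figure comes from Rogers' hypothetical; the tenfold cost for the shared router and both backup prices are assumptions made up for this example. Components are ranked by how few hours of outage it takes for the loss to exceed the price of a backup.

```python
from dataclasses import dataclass


@dataclass
class Spof:
    name: str
    cost_per_hour: float  # loss per hour of outage, in pounds
    backup_cost: float    # price of putting a backup in place (assumed)

    def outage_cost(self, hours: float) -> float:
        """Loss for an outage lasting the given number of hours."""
        return self.cost_per_hour * hours

    def break_even_hours(self) -> float:
        """Hours of outage after which the backup would have paid for itself."""
        return self.backup_cost / self.cost_per_hour


components = [
    Spof("DSL line, single office", cost_per_hour=1000, backup_cost=6000),
    Spof("core router, ten offices", cost_per_hour=10000, backup_cost=15000),
]

# Shortest break-even time first: these need extra resilience soonest.
ranked = sorted(components, key=Spof.break_even_hours)

for spof in ranked:
    print(spof.name)
    for label, hours in (("minute", 1 / 60), ("hour", 1), ("day", 24)):
        print(f"  one {label} down: £{spof.outage_cost(hours):,.0f}")
    print(f"  backup pays for itself after {spof.break_even_hours():.1f} hours")
```

With these invented figures the shared router comes first: a backup pays for itself after an hour and a half of downtime, against six hours for the single DSL line.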

Mike Rogers' Top Tips For Network Managers

1.    Have a network plan that identifies the downtime from individual component loss (e.g. cables, routers, power), including an uninterruptible power supply (UPS) to protect critical points.
2.    Use a monitoring platform that can instantly alert you when things are down.
3.    Make sure bills are paid! There may be a £200-£300 SDSL connection on a £30 phone line, but if someone doesn't know what the phone line is for, it's often cancelled accidentally.
4.    Engage an Internet and communications provider that can work with you to identify and tackle resiliency needs. Don't manage any component of a solution yourself if you only have one person in the business who understands how it works.
5.    Look to host key business infrastructure in commercial data centres where theft, power outages and air conditioning risks are minimised.

Mike Rogers, Network Operations Manager, Claranet
