Over the past three weeks, we have experienced a number of network issues within our London facility. The majority of these have been small and isolated, generally affecting individual clients or small numbers of clients for short periods. Others have been larger and resulted in broader traffic routing issues from segments of our network to the wider internet.
Between 17:03 and 17:26 UTC today, we experienced a widespread outage of our London facility that affected a large number of our clients. This was unfortunately caused by adjustments made by the network engineering team of our datacentre partner, NTT Communications, as part of their diagnosis of the underlying intermittent issues. They needed to perform intrusive testing on one of the redundant routers that serve the facility, and so they initiated a routine move of facility-wide traffic between their redundant core routers.
This is a routine operation that happens regularly behind the scenes without impact. Today, however, traffic began to drop unexpectedly, which amplified the original issue and affected more clients and a much larger set of remote destinations. In short, outbound traffic was unable to progress past the routing infrastructure to various destinations on the internet, and services largely went offline.
Our engineering team has been liaising closely with the NTT network engineering team over the past few weeks. As a result of the data collected both prior to and during today’s outage, they have determined that either a software or hardware issue within the Juniper routing platform is the cause.
A replacement of the routing equipment within the facility was already planned for the first two weeks of March as part of a scheduled equipment refresh. Given the severity of this issue, NTT has decided to bring that forward to tomorrow (22nd February). A separate maintenance announcement will be made shortly.
Again, we apologise for the inconvenience that this has caused. We’d like to assure you that all possible steps are being taken to restore full stability to the network within our London facility, and we look forward to providing our regular level of network uptime once again.
The issue has now been resolved. We are awaiting a follow-up from the network team.
We apologise for the inconvenience this has caused.
We are experiencing a further partial network outage today.
This was the result of the datacentre network administrators making an update that was not intended to affect service.
This is currently being reviewed as a matter of priority.