Downtime, failures and errors - Understand their true costs
This content is brought to you by Evolven. Evolven Change Analytics is a unique AIOps solution that tracks and analyzes all actual changes made in the enterprise cloud environment. Evolven helps leading companies reduce the number of incidents, improve troubleshooting time and eliminate unauthorized changes.
When it comes to mission-critical applications or data center performance quality, companies are willing to make huge investments. Unfortunately, these investments do not always deliver full performance.
Dealing with system downtime
Despite the efforts that have been put into infrastructure resilience, many IT organizations continue to deal with database, hardware and software downtime lasting from just a few minutes to several days, completely crippling the business and causing huge losses.
Downtime is expected
The world of IT outages can seem strange at times.
Despite the variety of advanced solutions and the growing amount of data being collected by major software vendors and IT departments (from ERP to CRM and more), outages are still a real and frightening threat to the industry.
On the other hand, IT failures have somehow become an accepted, even expected, part of corporate life.
This is counterintuitive...
IT downtime revisited
While IT professionals experience downtime from time to time and are fully focused on overcoming it, the business organization as a whole suffers the "financial pain" of the impact, which is usually very significant.
Previously, we delved deeply into the many ways IT downtime can impact the bottom line of organizations (you can read more about it here: Cost and scope of unplanned outages). We looked at various aspects, from direct sales losses and damage to reputation to indirect effects such as reduced productivity.
Now, I want to revisit the issue and examine how organizations should address and assess threats to their IT operations, including systems, applications and data, by analyzing robust (and established) benchmarks that represent the potential costs of downtime and outages.
Measuring major outages
When will the industry start measuring the financial impact of major outages such as the recent Facebook outage, the one that affected hundreds of thousands of Lloyds Bank customers, or the Jetstar failure, which led to hundreds of flight delays?
In other words, when is an outage "significant enough" that a cost analysis becomes valuable for the industry to learn from and predict the impact of future outage events?
Well, apparently at some point the fallout creates an impact that, PR-wise, can't be ignored. It is the point of no return, followed by financial consequences.
The cost of downtime varies significantly between industries. The size of the respective company is of course a critical factor, but not the only big one. The role of IT systems in the company is also central.
Putting a numeric value on an IT outage means pre-defining its impact across multiple business and organizational aspects so the entire industry can learn and optimize accordingly.
A failure in a critical application can result in two different types of losses:
- Application service outage – the impact of downtime varies by application and organization;
- Data Loss – The potential loss of data due to a system failure can have significant legal and financial consequences.
Well, I'm sure you'll agree that today's data centers must never go down; applications need to be available 24/7, and internal (let alone external) end users worldwide need to be able to rely on data center availability (for critical data and application availability) at all times.
Well, reality bites. This is not the case in the back office (i.e. within the data center). No organization enjoys 100% uptime. Should you strive to achieve 100%? Naturally. But you should also develop a deep understanding of the impact of downtime and ways to minimize it.
Worst outage nightmare ever? Probably the one that happened to you...
Some previous outages have turned into PR disasters, like the legendary Virgin Blue debacle of 2010 or the most recent one that hit Facebook.
Why? The sheer number of people affected probably had something to do with it.
As a reminder, Virgin Blue's outage prevented passengers from boarding flights for 11 days (!!), resulting in negative press, a damaged reputation and millions of dollars in losses.
More specifically, Navitaire, Virgin Blue's reservations management company, eventually compensated Virgin Blue for more than $20 million (Navitaire booking error brings Virgin $20 million into compo).
There are many other incidents that still attract media attention. Here's just one of the latest: a USA Today article about the Wells Fargo outage, which prevented customers from accessing their accounts for many hours.
It's safe to say that everyone in IT will agree that outages or downtime are VERY bad for business. They are undesirable, economically very harmful and must be combated with all available means.
Misconfigurations are key
The IT Process Institute's Visible Ops Handbook previously reported that "80% of unplanned outages are due to poorly planned changes made by administrators ("ops") or developers" (Visible operations).
Enterprise Management Associates reported that 60% of availability and performance failures are due to misconfigurations.
What does it cost?
Downtime can cost organizations $5,600 per minute and up to $300,000 per hour in web application downtime (according to a 2014 Gartner analysis).
[Chart: The average hourly cost of enterprise server downtime worldwide, 2017-2018]
Application maintenance costs are increasing at 20% annually. But spending more cannot solve all your problems. A previous industry survey found that at least a quarter of respondents' downtime was caused by configuration errors. (How much will you spend on application downtime this year?).
How common are downtime and outages?
Ok, downtime can be a financial nightmare. That much is clear. However, if you want to properly assess the risk potential of a disruption to your business, the immediate question should be, "How likely is it to happen?"
Source: Data Center Knowledge
Ok, so outages are way too common to be ignored by thinking, "I probably won't have a major outage." Now the question arises as to how you can calculate the specific risk for your company.
Production and application costs clarified
Unplanned outages must be resolved by IT. Nonetheless, and as previously mentioned, these disruptions ultimately affect the entire organization.
An important part of a thorough disruption risk assessment is estimating how much money you will lose per hour (or minute, or other time interval of your choosing) in the event of downtime.
For businesses that rely solely on data centers' ability to provide IT and network services to customers, such as telecom service providers or e-commerce companies, downtime can be particularly costly, with the highest cost of a single event reaching about $1 million (more than $11,000 per minute), according to expert estimates.
In a USA Today survey of 200 data center managers, over 80% said their downtime costs exceeded $50,000 per hour. Over 25% reported downtime costs in excess of $500,000 per hour (!!).
According to another study, while companies cannot achieve zero downtime, one in ten companies states that their availability must be greater than 99.999%.
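To make an availability target such as 99.999% more tangible, it can help to convert it into a yearly downtime budget. The following is a minimal sketch; the function name and the assumption of a 365-day year are illustrative and not taken from the study cited above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare a few common availability targets.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability -> {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, "five nines" leaves a budget of only a little over five minutes of downtime per year.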
To get a thorough understanding of the impact of production and release downtime, let's take a look at how the consequences of downtime manifest themselves.
Downtime costs - per year or per incident?
A 2017 survey found that 46% of 400 IT decision makers experienced more than four hours of IT-related downtime over a 12-month period; 23% said they had costs between $12,000 and more than $1 million per hour.
Over 35% admitted they are unsure of the cost of an outage to their business.
If you ask Delta Airlines, which canceled 280 flights in 2017, the losses from a single disruption event can reach over $150 million.
A few years ago, Dun & Bradstreet reported that 59% of Fortune 500 companies experience at least 1.6 hours of downtime per week.
If you take the average Fortune 500 company (or any company with at least 10,000 employees) and assume that it pays its employees an average of $56 an hour, then (assuming all of them are idled by the 1.6 hours of weekly downtime) the labor component alone of downtime for a company this size would reach $896,000 per week, which is more than $46 million per year (Assessing the financial impact of downtime).
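The arithmetic behind that labor estimate can be reproduced in a few lines. This is only a sketch of the calculation as described: it assumes all 10,000 employees are idle for the full 1.6 hours each week, and the function name is hypothetical.

```python
def weekly_labor_cost(employees: int, hourly_rate: float, downtime_hours_per_week: float) -> float:
    """Labor cost of downtime per week, assuming every employee is idle for its duration."""
    return employees * hourly_rate * downtime_hours_per_week


# Figures quoted in the text: 10,000 employees, $56 per hour, 1.6 hours per week.
weekly = weekly_labor_cost(10_000, 56.0, 1.6)
yearly = weekly * 52  # 52 weeks per year

print(f"Weekly labor cost of downtime: ${weekly:,.0f}")
print(f"Yearly labor cost of downtime: ${yearly:,.0f}")
```

Changing any one input (for example, assuming only a fraction of employees are affected) scales the result proportionally, which is why the real figure depends heavily on who is actually idled by an outage.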
Of course, the reality is more complicated since you have to take into account many parameters such as the timing of the event (mid-week or weekend? day or night?) and more. However, understanding the cost of downtime greatly helps in assessing your risk potential and the ROI of tools that can help minimize the impact of downtime.
Has the industry been able to learn from the past and minimize collateral damage in the event of an outage?
How have things changed from before?
So we already know that downtime and outages are still happening today and the industry has yet to successfully eliminate them. But how have their costs changed over time? Are these incidents less harmful today?
In 2010, a study by Coleman Parkes found that IT downtime costs organizations a total of more than 127 million man-hours per year in employee productivity, an average of 545 man-hours per company.
In 2009, it was reported that the average cost of downtime varies significantly by industry, from about $90,000 per hour in the media sector to about $6.48 million per hour for large online brokers (How to quantify downtime).
According to a survey of IT managers conducted over the past few years, companies are becoming more aware of the direct financial cost of computer failures. The study found that one in five companies loses $12,000 an hour due to system downtime (How to quantify downtime).
As mentioned above, later analysis conducted by Gartner in 2014 found average costs of $5,600 per minute and over $300,000 per hour.
As early as 2004, a conservative estimate by Gartner put the hourly cost of computer network downtime at $42,000. Therefore, a company that suffers an above-average 175 hours of downtime per year can lose more than $7 million annually. But the cost of each disruption affects every business differently, so it's important to know how to calculate the exact financial impact (How to quantify downtime).
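The per-minute, per-hour and per-year figures quoted in this section are related by simple multiplication. The sketch below shows the conversions using the Gartner numbers from the text; the function names are illustrative.

```python
def per_minute_to_hourly(cost_per_minute: float) -> float:
    """Convert a per-minute downtime cost into an hourly cost."""
    return cost_per_minute * 60


def annual_downtime_cost(hourly_cost: float, downtime_hours_per_year: float) -> float:
    """Estimate the yearly cost from an hourly cost and the hours of downtime per year."""
    return hourly_cost * downtime_hours_per_year


# 2014 Gartner figure: $5,600 per minute.
print(per_minute_to_hourly(5_600))         # 336000
# 2004 Gartner figure: $42,000 per hour, with 175 hours of downtime per year.
print(annual_downtime_cost(42_000, 175))   # 7350000
```

The second result matches the "more than $7 million annually" quoted above.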
It makes sense to think that the cost of outages will only increase over time (since we all rely more on data systems today). You can therefore understand why past figures need to be multiplied by a significant factor to reflect today's reality...
Every minute counts
Over a decade ago, the average cost of a data center outage was estimated to be approximately $5,600 per minute across industries (Unplanned IT outages cost more than $5,000 a minute), a number that, according to Gartner, remained the same until 2014. The earlier Ponemon Institute study referenced above calculated the minimum, median, mean, and maximum cost per minute of unplanned outages, based on inputs from 41 data centers. The largest cost of an unplanned outage was found to exceed $11,000 per minute.
On average, the cost of an unplanned outage is likely to be over $5,000 per minute.
It will only gain in importance
A 2013 survey recorded an increase of over 41% over the previous averages described above, to an average cost of more than $7,900 per minute.
A 2015 ITIC survey clearly showed that hourly costs have increased by 25% to 30% (compared to 2008 data).
Impact of downtime per year
A previous analysis by Gartner calculated that downtime can reach an average of 87 hours per year. It is of course the sum of many incidents, each lasting anything from a few minutes to several hours (An average large enterprise experiences 87 hours of network downtime per year).
How have things changed?
Later research from 2011 revealed that while the industry has managed to combat the downtime epidemic and reduce its frequency, we are still seeing significant downtime and huge revenue losses. The recent Facebook outage, for instance, resulted in over 3 million users (apparently WhatsApp users) migrating to Telegram.
Impact on reputation and loyalty
How much is your company's reputation worth? This can be extremely difficult to assess, as can the long-term impact of a damaged reputation and its impact on sales and profitability.
In this case, the cost of failure includes lost customers (both short- and long-term) and other tangible costs that reflect the reputational damage, such as stock price declines, marketing time (crisis management and brand recovery) and the media budget required to relaunch and polish the organization's profile.
Which parameters should influence your calculation?
When trying to estimate the cost of downtime, there are the obvious direct costs (e.g., lost business during the downtime). However, many indirect costs, such as the overhead or reputation issues mentioned above, should also be considered.
Personnel costs derive from the cost of staffing "war room" efforts focused on getting the IT systems back up and running, the cost of delays in all other scheduled tasks, the cost of staff overtime (if applicable) and more. Add to this the value of data loss, emergency maintenance fees (especially if the outage occurs outside of business hours), and additional repair costs that can persist long after service is restored.
Of course, you need to take these costs into account when evaluating the impact of downtime, as they are usually very significant. But even a rough estimate can prove extremely helpful in understanding the risks and deciding what level of technology to lean on to combat them.
There's also the impact of lost sales. To get an accurate estimate of total lost sales, the impact rate needs to be increased to reflect the true lifetime value of customers who permanently switch to a competitor. Take Facebook (and WhatsApp), as I mentioned before (Unconscious Cost: Denies the true cost of network downtime): how much revenue will be lost if these users are shown fewer billable ad impressions?
The stock fell 25%
Although it is difficult to quantify so many parameters, they are nonetheless real and significant. For example, when Amazon.com went offline for several hours in the early days, the stock fell 25% in a single day (Unconscious Cost: Denies the true cost of network downtime)!
During the Amazon cloud outage, for example, the company struggled to bring its cloud services back online. As a result, many customers questioned the reliability of its cloud and how Amazon communicated about the outage. Other customers felt they should be compensated for the downtime as part of their SLA.
I know you're curious: As for the SLA, Amazon's EC2 SLA was not breached despite nearly four days of outage (Seven lessons to learn from Amazon's disruption).
Downtime costs: Calculate them yourself
How much do you have to lose from an unexpected server or business application failure?
According to multiple sources, the easiest way to calculate potential lost revenue during an outage is to use this equation:
LOST REVENUE = (GR/TH) x I x H
GR = gross annual revenue
TH = total annual business hours
I = percentage impact
H = number of hours of downtime
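The equation above translates directly into code. The sketch below is one possible rendering; the function name and the sample inputs ($100 million in gross annual revenue, 2,000 business hours per year, a 50% impact and a 4-hour outage) are hypothetical values chosen only to illustrate the arithmetic.

```python
def lost_revenue(gross_annual_revenue: float,
                 total_annual_hours: float,
                 impact_percent: float,
                 downtime_hours: float) -> float:
    """Estimate lost revenue as (GR / TH) x I x H, with I given as a percentage."""
    if total_annual_hours <= 0:
        raise ValueError("total annual hours must be positive")
    revenue_per_hour = gross_annual_revenue / total_annual_hours
    return revenue_per_hour * (impact_percent / 100) * downtime_hours


# Hypothetical inputs, for illustration only.
estimate = lost_revenue(
    gross_annual_revenue=100_000_000,
    total_annual_hours=2_000,
    impact_percent=50,
    downtime_hours=4,
)
print(f"Estimated lost revenue: ${estimate:,.0f}")
```

With these inputs, revenue per business hour is $50,000, so a 4-hour outage with a 50% impact works out to $100,000. Note that this covers lost revenue only and leaves out the indirect costs discussed earlier.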
How do you minimize the risk of breakdowns and downtime?
Downtime and failures are catastrophic, but their impact doesn't have to be that big. By using solutions that focus on getting to the root of the problem, failures can be prevented before they even happen.
Evolven Change Analytics is a unique AIOps solution that focuses on change - the real cause of performance incidents. Evolven helps enterprise IT and cloud ops teams prevent and remediate incidents before problems arise.
Contact us to see how we are helping leading companies reduce incidents and MTTR.
Quick downtime calculator
To get a quick estimate of your company's probable downtime costs, use the following formula, based on the size of your business and the number of minutes your most recent incident lasted: Downtime cost = minutes of downtime x cost-per-minute. For small business, use $427 as cost-per-minute.
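As a rough illustration of the quick formula, the sketch below multiplies minutes of downtime by a cost-per-minute rate. The $427 default is the small-business figure quoted above; the function name and the 90-minute incident are assumptions made for the example.

```python
SMALL_BUSINESS_COST_PER_MINUTE = 427  # small-business rate quoted in the text, in dollars


def quick_downtime_cost(minutes_of_downtime: float,
                        cost_per_minute: float = SMALL_BUSINESS_COST_PER_MINUTE) -> float:
    """Downtime cost = minutes of downtime x cost-per-minute."""
    if minutes_of_downtime < 0:
        raise ValueError("minutes of downtime cannot be negative")
    return minutes_of_downtime * cost_per_minute


# Hypothetical incident: a small business that was down for 90 minutes.
print(quick_downtime_cost(90))  # 38430
```

Larger organizations would pass their own cost-per-minute rate as the second argument.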
In industrial environments, downtime may refer to failures in production equipment. This type of downtime is often measured as downtime per work shift or downtime per a 12- or 24-hour period. Downtime duration is the period of time when a system fails to perform its primary function.
What are the costs associated with downtime?
Downtime cost is defined as any profit that a company loses when its equipment or network stops functioning. The cost of downtime implies not only direct financial loss but can have an impact on your company in at least the other 4 ways.
What are the two major considerations when calculating the cost of downtime?
Calculating Downtime Cost
The duration of the downtime and the cost incurred per minute you're offline are the two variables that most affect the financial impact of an outage.
TDC is a methodology of analyzing all cost factors associated with downtime, and using this information for cost justification and day to day management decisions. Most likely, this data is already being collected in your facility, and need only be consolidated and organized according to the TDC guidelines.
What are the three types of downtime?
Common categories of downtime include excessive tool changeover, excessive job changeover, lack of operator, and unplanned machine maintenance.
What are the main causes of downtime?
This can be due to several reasons including hardware or software failure, human error, malicious attacks or natural disasters. Since unplanned downtime is unexpected and occurs without a warning, preventing it can be a challenge.
How do you explain downtime?
A time during a regular working period when an employee is not actively productive. An interval during which a machine is not productive, as during repair, malfunction, maintenance.
What are the two types of downtime?
Downtime falls into two categories: planned and unplanned. Planned downtime is notable because it offers advanced warning and gives users a chance to prepare. Planned downtime is usually done for upgrades or maintenance to the network infrastructure.
What is a high cost of downtime?
How Much Does Downtime Cost a Company? The average cost of downtime is significant. Each minute costs an average of $9,000, according to the Ponemon Institute, bringing the downtime cost per hour to over $500,000.
- Not-Utilizing Talent.
- Motion Waste.
- Excess Processing.
- Track Downtime. Before jumping into the steps of reducing downtime, it is critical to track it. ...
- Monitor Production. Having a system to monitor production can also help reduce downtime. ...
- Create a Preventative Maintenance Schedule. ...
- Provide Operator Decision Support. ...
- Perform DMAIC Analysis.
For example, the average automotive manufacturer loses $22,000 per minute when the production line stops. That quickly adds up. Overall, unplanned downtime costs industrial manufacturers as much as $50 billion a year. Downtime costs aren't limited to direct labor, production or finances.
What are the two basic stages of costs?
Explanation: Costs are accounted for in two basic stages: accumulation followed by assignment. An actual cost is the cost incurred-a historical or past cost. Accountants define a cost as a resource to be sacrificed to achieve a specific objective.
What are the two techniques used in the cost control process?
- Planning the budget properly. ...
- Monitoring all expenses using checkpoints. ...
- Using change control systems. ...
- Having time management. ...
- Tracking earned value.
All manufacturing downtime reduces overall output by stopping production. Unplanned downtime can cost 15 times more than planned downtime. The loss of revenue during any type of asset maintenance can be as high as $3 million per incident.
What is true cost analysis?
True Cost – From Costs to Benefits in Food and Farming
True Cost Accounting (TCA) is a new way of identifying the real costs of a specific product or service. TCA calculates not only the direct costs like raw materials and labour, but also the effects on the natural and social environment in which a company operates.
Key Takeaways. Mean Downtime is the average amount of time that an asset is required to be down to perform maintenance and repairs.
What is Level 3 downtime?
Downtime Level 3 - Operations are defined as localized, scheduled or unscheduled problem involving the loss of multiple functions, applications, or systems, not anticipated to exceed 24 hours of unavailability. For a level 3 the problem can be resolved using all available resources.
What is the difference between breakdown and downtime?
Downtime can be planned or unplanned activity but the breakdown is entirely an unplanned activity. A planned event such as scheduled downtime is cost-effective compared to an unplanned event such as a sudden breakdown. Planned downtime does not delay production whereas breakdown time can cause delays in production.
Consequences of unplanned downtime
Lost productivity and revenue: Every minute of downtime can result in lost productivity and revenue, affecting a business's bottom line. Decreased customer satisfaction: Unplanned downtime can lead to delayed deliveries, canceled orders, and frustrated customers.
We define downtime as a time when employees are involuntarily idle in their work tasks, due to equipment or technological malfunction, project bottlenecks, or a lower volume of in-person customer interaction.
What is an example sentence for downtime?
After a busy day at work, I look forward to some downtime at home. The kids napped during their downtime. We need to minimize network downtime.
What is downtime in maintenance?
In manufacturing, "downtime" occurs when an unplanned event halts production for a period of time. This event can be a malfunction, repair, or changeover of tools or equipment. Maintenance downtime in particular is when a machine is not operating or being productive due to required maintenance work.
What are the benefits of downtime?
Downtime gives us time and space to enjoy our personal lives and get personal tasks done. It grants us time with family, friends, and our hobbies. On a brain level, it allows us to reach homeostasis and is a necessary break from the aroused state, Dr. Hanson says.
What is managing downtime?
Downtime management enables you to exclude periods of time from being calculated for events, alerts, or views that can skew CI data. To access. Administration > Service Health > Downtime Management. Alternatively, click Downtime Management.
Why is it called downtime?
downtime (n.) also down-time, 1952, "time when a machine or vehicle is out of service or otherwise unavailable;" from down (adj.) + time (n.). Of persons, "opportunity for rest and relaxation," by 1982.
How can we minimize the risk of system downtime?
- Test Server Backups On A Regular Basis. When a server goes down, you can mitigate damage by restoring it quickly. ...
- Utilize Cloud Solutions. ...
- Keep Everything Up To Date. ...
- Invest In Reliable Equipment.
For example, in the auto industry, downtime can cost up to $50,000 per minute. That's $3 million per hour. The true downtime cost includes a variety of wasted business support costs and lost business opportunity costs because resources were needed to resolve a downtime incident that probably didn't need to happen.
What is the industry standard for downtime?
World Class Standards For Downtime
Aim for unscheduled downtime to be 10% or less.
Database outages can have a significant impact on top line revenue. In fact, according to a survey conducted by ITIC, 98% of organizations say a single hour of downtime costs over $100,000, while 81% report that it costs over $300,000. And that's just for a single hour!
What is the average cost of downtime in a data center?
According to Gartner, downtime costs $5,600 per minute on average. This results in average costs between $140,000 and $540,000 per hour depending on the organization. Some factors that contribute to the costs associated with downtime include: Lost sales.
What is average downtime?
Average downtime is usually built into the price of goods produced to recover its costs through the sales revenue. Opposite of "uptime." Also called "waiting time."
Why is automotive work so expensive?
Diagnostic Labor – This requires significantly more training than a repair laborer, as well as different tools, both of which require training and exact a significant expense. Repair Labor – This requires a significant amount of training and experience, which master technicians take many years to accrue.
The first way to measure your equipment downtime is in actual time. For a given asset (or set of assets), record the amount of time during each month that the asset is broken down. Keeping a running tally and comparing it to past months will help you know when an asset is having more issues than normal.
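A running monthly tally of this kind could be kept with a small script. The sketch below is one way to do it under stated assumptions: the asset name and incident timestamps are invented, each incident is attributed to the month in which it started, and "more issues than normal" is interpreted as exceeding the average of all earlier months.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical breakdown log: (asset, breakdown start, back in service).
incidents = [
    ("press-1", datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 10, 30)),
    ("press-1", datetime(2024, 2, 12, 14, 0), datetime(2024, 2, 12, 15, 0)),
    ("press-1", datetime(2024, 3, 3, 9, 0), datetime(2024, 3, 3, 16, 0)),
    ("press-1", datetime(2024, 3, 20, 11, 0), datetime(2024, 3, 20, 12, 30)),
]


def monthly_downtime_hours(log):
    """Sum downtime hours per (asset, 'YYYY-MM'), keyed by the month the incident started."""
    totals = defaultdict(float)
    for asset, start, end in log:
        totals[(asset, start.strftime("%Y-%m"))] += (end - start).total_seconds() / 3600
    return dict(totals)


def months_above_average(totals, asset):
    """Return the months in which an asset's downtime exceeded the average of its earlier months."""
    history, flagged = [], []
    for (name, month), hours in sorted(totals.items()):
        if name != asset:
            continue
        if history and hours > sum(history) / len(history):
            flagged.append(month)
        history.append(hours)
    return flagged


totals = monthly_downtime_hours(incidents)
flagged = months_above_average(totals, "press-1")
print(totals)
print(flagged)
```

In this made-up log, March stands out because its total downtime is well above the average of January and February.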