Preventive Maintenance Will Not Single-Handedly Solve Reliability Problems
Preventive maintenance has a role in industry but it should not be the dominant strategy. The focus should be on improving the reliability of equipment at every stage of its life.
In an ideal world, rotating machinery operates efficiently, at full speed, without ever failing. If this were true, there would be no unplanned downtime and no need for planned downtime to perform maintenance work.
Moreover, there would be no safety or environmental incidents as a result of failure. In other words, the machinery would be perfectly reliable. Unfortunately, reliability requires proactive steps, and no matter what we do we will never achieve perfect reliability.
Traditionally, maintenance departments have developed strategies for dealing with equipment failures. They either simply react to failures and get very good at returning equipment to service, or they attempt to detect failures, prevent them from occurring, and react in a more orderly way if something does happen.
Reacting to the failures, typically referred to as reactive maintenance or breakdown maintenance, is very costly and dangerous – but also the most common strategy in industry today.
Detecting the onset of failure is called condition monitoring. Although simply monitoring the equipment is not enough, it does allow us to avoid catastrophic failure.
What Is Preventive Maintenance?
Many organizations turn to a preventive maintenance strategy after recognizing that they are paying a high cost by simply reacting to failure. But there is confusion over what preventive maintenance really means. There is even more confusion over whether preventive maintenance actually prevents failure from occurring in the first place.
By some definitions, preventive maintenance includes all tasks that may be performed in order to reduce the likelihood of failure. However, in the majority of implementations, it is more easily defined as restorative or replacement tasks performed at some predetermined interval, before the equipment gets a chance to fail. Thus, preventive maintenance may be best described as interval-based maintenance.
The interval mentioned could be running hours, distance travelled, production cycles completed, or some other interval related to age.
The maintenance tasks, on the other hand, could be the replacement of bearings or other components, or the opening and inspection of the machine to determine whether repairs or replacements are required. These tasks may have been defined by the OEM or by regulators, or they may have been added to the maintenance plan because of some historical failure.
Overall, the expectation is that these maintenance tasks restore the machine and therefore help to prevent failure from occurring at a future date. However, in many cases these maintenance tasks actually harm the machine and increase the likelihood of failure. At the same time, they can create opportunities for safety incidents and waste valuable resources.
Why Do Machines Fail?
It is tempting to assume that components such as rolling element bearings wear out at approximately the same rate. For example, if you were to operate 30 bearings over a period of time, you might expect them all to fail at approximately the same time, as depicted in graph 1.
If we were to plot the probability of failure of a large family of machines with rolling element bearings over time, we might expect a graph shaped like the one shown in graph 2.
The flat region of the graph represents a low probability of failure. At some point in time the probability of failure increases – and then the machine fails if no action is taken.
If the two graphs above were accurate, you could safely replace the bearings after approximately 225 million revolutions, avoid all the failures, and waste none of the bearings' residual life.
Unfortunately, this is not the reality of the situation.
First of all, components like bearings tend to fail early in their life, resulting in so-called infant mortality. Graph 3, updated to include these early failures, is commonly called a bathtub curve. In fact, even this curve does not reflect the reality of most equipment in industrial plants.
Instead of there being a period of time with a low probability of failure followed by a period where the probability of failure is higher, the probability of failure is actually constant over a long period of time. The bearings will fail at random times and there is no real way to determine (without condition monitoring) when. Graph 4 is the actual result achieved when 30 bearings were tested.
Graph 5 is an updated graph that shows infant mortality followed by a period of random failures.
Research was performed in association with United Airlines to study the failure patterns of a wide range of equipment. This study has been repeated many times in traditional industry and the results have been very similar – the only difference typically relates to maintenance practices and the percentage of equipment that suffers from infant mortality.
According to the study, most of the equipment we monitor suffers from random failure. In graph 6, Types D, E and F, which together represent 89 percent of all equipment failures, all show random failure. The flat region of the graph indicates that failure is equally likely to occur after one month, one year, or 10 years.
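The flat region can be illustrated with a short calculation. The sketch below is only an illustration, assuming that a constant probability of failure can be modelled with an exponential survival curve; the failure rate of 0.2 per year is a made-up value, not a figure from the study.

```python
import math

# Hypothetical constant failure rate (failures per year); not a figure from the study.
FAILURE_RATE = 0.2


def survival(age_years: float) -> float:
    """Probability that a component is still running at the given age."""
    return math.exp(-FAILURE_RATE * age_years)


def survive_one_more_year(age_years: float) -> float:
    """Probability of surviving another year, given survival to age_years."""
    return survival(age_years + 1.0) / survival(age_years)


ages = [1 / 12, 1.0, 10.0]  # one month, one year, 10 years
chances = [survive_one_more_year(age) for age in ages]

for age, chance in zip(ages, chances):
    print(f"age {age:5.2f} years: chance of surviving one more year = {chance:.3f}")
```

Under this assumption the printed chance is the same at every age, which is another way of saying that the age of the component tells us nothing about when it will fail.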
What Does All This Mean?
First of all, performing interval-based preventive maintenance on equipment that fails randomly does not make sense. If we replace a bearing after two years, for example, we have not improved the likelihood of the equipment running smoothly for the next two years.
Statistically the new bearing is no better than the old one. In fact, we have made the situation worse because we will be entering the infant mortality region.
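To see why a replacement can make matters worse, consider a rough sketch in which a small share of newly installed bearings carries an installation or manufacturing defect and fails quickly, while the rest fail at a constant random rate. The model (a mixture of two exponential lifetimes) and every number in it are assumptions chosen purely for illustration.

```python
import math

# All values below are hypothetical and chosen only to illustrate the argument.
WEAK_FRACTION = 0.10  # share of new bearings with an installation/manufacturing defect
WEAK_RATE = 2.0       # failures per year for defective bearings (infant mortality)
RANDOM_RATE = 0.1     # failures per year for sound bearings (random failure)


def survival(age_years: float) -> float:
    """Probability that a bearing drawn from the whole population reaches this age."""
    weak = WEAK_FRACTION * math.exp(-WEAK_RATE * age_years)
    sound = (1.0 - WEAK_FRACTION) * math.exp(-RANDOM_RATE * age_years)
    return weak + sound


def prob_fail_within(age_years: float, horizon_years: float) -> float:
    """Probability of failing in the next horizon, given survival to age_years."""
    return 1.0 - survival(age_years + horizon_years) / survival(age_years)


new_bearing = prob_fail_within(0.0, 2.0)  # freshly installed replacement
old_bearing = prob_fail_within(2.0, 2.0)  # bearing that has already run two years

print(f"new bearing, next two years: {new_bearing:.1%}")
print(f"two-year-old bearing, next two years: {old_bearing:.1%}")
```

With these assumed values the freshly installed bearing is the more likely of the two to fail over the next two years, because the bearing that has already run for two years has left the infant mortality region behind.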
We therefore have to change our maintenance practices with these failure patterns in mind:
- Do everything possible to reduce the likelihood of failure, which includes operating the equipment properly and conducting proactive maintenance tasks such as cleaning and proper lubrication. This is called precision operation and proactive maintenance.
- Do everything possible to reduce the likelihood of failure when the machine is installed and commissioned in order to reduce the likelihood of infant mortality, which includes the practices used when installing new bearings and aligning machines. This is called precision maintenance.
- Test the condition of the equipment to determine if failure will occur in the foreseeable future so that the components in question can be repaired or replaced at a time that is most convenient. This is called condition-based maintenance.
- Test the condition of equipment to determine if there is a problem that may cause the machine to fail so that we can prevent failure from occurring. We might call this proactive condition monitoring.
- If we cannot cost-effectively determine the condition of equipment via non-intrusive visual inspections or testing with scientific equipment (including vibration analysis and infrared analysis), then it is appropriate to perform preventive maintenance. Of course, we need to know approximately how long it will take a component to fail so that we can determine the optimal time to perform this repair/replacement maintenance action.
- Avoid all of the planned maintenance tasks that can lead to an increased probability of failure. These include intrusive inspections, and component replacements mandated due to warranty requirements or a misunderstanding of the failure modes. Moreover, ensure that the design and procurement process prioritizes a reduction in lifecycle costs; ensure that we are storing spare parts in a manner that does not degrade the condition; and develop optimal planning and scheduling procedures to ensure the work is performed in the most effective manner.
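The choice between condition-based and interval-based tasks described in the list above can be sketched as a simple decision rule. This is only one possible reading of the list: the `Asset` fields and the `select_task` function are hypothetical names, and the final fallback branch is an assumption rather than something the list spells out.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical description of a component for task selection."""
    name: str
    condition_monitoring_cost_effective: bool  # can we test its condition affordably?
    age_related_failure: bool                  # does it wear out rather than fail randomly?
    typical_life_known: bool                   # do we know roughly how long it lasts?


def select_task(asset: Asset) -> str:
    """Pick a maintenance approach, preferring condition-based over interval-based."""
    if asset.condition_monitoring_cost_effective:
        return "condition-based maintenance"
    if asset.age_related_failure and asset.typical_life_known:
        return "interval-based preventive maintenance"
    # Assumption: with no usable interval, avoid intrusive scheduled tasks.
    return "no interval-based task; rely on precision and proactive practices"


examples = [
    Asset("pump bearing", True, False, False),
    Asset("wear liner", False, True, True),
    Asset("sealed electronic module", False, False, False),
]
for asset in examples:
    print(f"{asset.name}: {select_task(asset)}")
```

Precision operation, precision maintenance and proactive tasks such as cleaning and lubrication apply in every case, so they are not modelled as a choice here.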
How Do We Improve Reliability?
Much can be done to improve reliability. Some of it involves the maintenance department, and some of it does not. Let’s now look at all of the reasons why equipment fails, including:
1. The way equipment is designed affects reliability.
2. The procurement process affects reliability.
3. The way in which equipment is transported to site and stored on shelves affects reliability.
4. And the way the equipment is operated certainly affects reliability.
None of these activities are maintenance activities. But there are steps the maintenance department can take, including:
1. All maintenance jobs can be properly planned and scheduled.
2. All maintenance jobs can be performed with precision; that includes shaft alignment, belt alignment, balancing, tightening and fastening.
3. Proactive maintenance tasks should also be performed, such as precision lubrication and frequent cleaning.
Moreover, regular non-intrusive inspections should be made and cost-effective condition monitoring tasks should be performed in order to detect when condition-based maintenance tasks are required.
Defect elimination is the name given to the proactive philosophy of looking for every root cause of equipment failure and proactively seeking to eliminate those root causes, whether they are related to the maintenance department or not.
This does not mean that we wait for failure to occur and then perform root cause failure analysis. Instead, we learn from industry about all the common reasons why rotating machinery and electrical and process equipment fail, and we act proactively to eliminate those root causes.
This is not a simple task as it involves the majority of the people in the organization – from the highest levels of management through to operators and craftspeople on the plant floor. But it is an important task and it is the only way to truly improve reliability and achieve the highest levels of production, competitiveness, safety, and protection of the environment.