As you may know, at Welltory we analyse heart rate variability data, the performance of the sympathetic and parasympathetic nervous systems, physical and mental stress, and energy levels.
Recently, and completely by accident, we noticed that our server load graph looks very similar to a heart rate variability graph.
Figure 1. HDD rotational latency
Figure 2. RR intervals
What is heart rate variability analysis?
Heart rate variability analysis is a method of analysing RR intervals, the intervals between heartbeats. This is what they look like on a standard ECG.
Roman Markovich Baevsky is a prominent scientist, Doctor of Medicine, Professor, Member of the Russian Academy of Sciences, one of the founders of aerospace cardiology and Honoured Scholar of the Russian Federation (2003). He was tasked with remotely assessing astronauts’ health in order to know when to take control of the rocket if they suddenly felt unwell. This is when the heart rate variability analysis method was developed. Scientists were able to detect a lot of things using this method.
Let’s skip the specific medical terminology and explain it in simple terms. Heart rate variability (HRV) is the language that our heart speaks. Heart rate variability data is a map that reflects the condition of our autonomic nervous system. This index shows quite well whether your body is exhausted and unable to restore its energy, or well adapted to everyday activities.
At first glance it may seem that our heart beats perfectly regularly, but that is not true. Just like any high-tech gadget, our heart has a “sensitive sensor”, a pacemaker, which controls how much blood our body needs and when it is released. This process is governed by the brain, which monitors the state of our body and assesses its needs. Because those needs keep changing, the intervals between beats keep changing too, which is why this index is called variability. And it is variability that can tell us what state our nervous system is in.
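To make “variability” a bit more concrete, here is a minimal Python sketch of two standard time-domain HRV measures, SDNN (the standard deviation of the RR intervals) and RMSSD (the root mean square of the differences between successive RR intervals). It is an illustration only: the function names and the sample RR values are made up for this example, and it does not describe the energy-level algorithms mentioned in this post.

```python
import math
import statistics


def sdnn(rr_ms):
    """Sample standard deviation of the RR intervals, in milliseconds."""
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences, in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical RR intervals in milliseconds (not real measurements).
rr = [812, 790, 845, 830, 798, 860, 815]
print(f"SDNN:  {sdnn(rr):.1f} ms")
print(f"RMSSD: {rmssd(rr):.1f} ms")
```

A perfectly regular series of intervals gives zero for both measures; the more the intervals fluctuate from beat to beat, the larger the numbers get.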
What did we do next?
We asked our friends to send us their server load data. It looked pretty similar.
Then we did a simple thing: we took the algorithms that are used to assess people’s energy levels and applied them to servers. What do you think happened next?
We were able to estimate HDD wear and tear with a 5% margin of error.
Let’s look at the details.
Rotational latency is the time that is needed to rotate the disk to bring the required sector under the read-write head. You can read more about it on Wikipedia.
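As a rough worked example of the definition above: on average the required sector is half a revolution away from the head, so the average rotational latency is half the time of one full rotation. The Python sketch below computes that for a few common spindle speeds; the function name is ours and purely illustrative.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: half of one full revolution."""
    revolution_ms = 60_000 / rpm  # duration of one full rotation in ms
    return revolution_ms / 2


for rpm in (5400, 7200, 15000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

Real measured latency will scatter around this average, and it is that scatter over time that the rest of this post looks at.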
If the disk fragmentation level is high, its latency keeps changing. Fragmentation can be caused by many things, and bad sectors are one of them. So if the rotational latency in background mode shows an evident periodicity (just like a human ECG), then the disk is not in a critical state. To continue the analogy, a sprinter’s heartbeat intervals during a run and a hard drive activity graph recorded while the drive processes large numbers of small files share a common trait: distinct spikes (QRS complexes) per unit of time.
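One possible way to check a latency series for “evident periodicity” is autocorrelation: compare the series with a copy of itself shifted by a number of samples and see at which shift the two line up best. The post does not say how periodicity was actually judged, so treat the Python sketch below as an assumption; the function names and the sample latency values are hypothetical.

```python
def autocorrelation(samples, lag):
    """Normalised autocorrelation of `samples` at `lag` (roughly between -1 and 1)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples)
    if var == 0:
        return 0.0  # a constant series has no structure to correlate
    cov = sum((samples[i] - mean) * (samples[i + lag] - mean) for i in range(n - lag))
    return cov / var


def dominant_period(samples, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation, and its score.

    This works for spiky series; for slowly varying ones lag 1 tends to win.
    """
    scores = {lag: autocorrelation(samples, lag) for lag in range(1, max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up latency samples in ms: a spike on every fourth reading.
latency = [4.0, 4.0, 4.0, 9.0] * 10
period, score = dominant_period(latency, max_lag=10)
print(period, round(score, 2))
```

A score close to 1 at some lag suggests a regular rhythm with that period; scores near zero at every lag suggest there is none.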
From theory to practice. Let’s remove the noise from the original signal (using the usual Fourier transform) and imagine that Figure 1 is a human cardiogram. Now let’s create a correspondence table between a person’s energy level, a hard drive’s percentage of bad sectors and its power-on time.
| № | Person’s energy | Bad sectors | Power-on time count |
Note: the first hard drive was bought second-hand, so its actual power-on time is in question. It may have been manually changed.
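The noise-removal step mentioned before the table is not spelled out in this post. One minimal reading of it is a Fourier low-pass filter: transform the signal, zero out the high-frequency bins and transform back. The Python sketch below does this with a naive discrete Fourier transform from the standard library; the function names, the `keep` cut-off parameter and the synthetic test signal are all assumptions made for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(spectrum):
    """Inverse transform; returns the real part, since the input signal was real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def low_pass(signal, keep):
    """Zero every frequency bin above `keep`, keeping the mirrored bins as well."""
    n = len(signal)
    spectrum = dft(signal)
    filtered = [c if (k <= keep or k >= n - keep) else 0
                for k, c in enumerate(spectrum)]
    return inverse_dft(filtered)


# Synthetic example: one slow wave plus a faster "noise" component.
n = 32
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]
noisy = [s + 0.3 * math.sin(2 * math.pi * 10 * t / n) for t, s in enumerate(slow)]
smooth = low_pass(noisy, keep=3)
print(max(abs(a - b) for a, b in zip(smooth, slow)))  # tiny: only rounding error is left
```

For real data an FFT library would be much faster than this quadratic loop, and choosing the cut-off is a judgement call that depends on the sampling rate.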
- Uneven wear of the drives in a RAID array obviously raises questions about both performance (RAID 0) and reliability (RAID 1). We are planning to create a new RAID-RR specification in which hard drives will wear out simultaneously! This RAID configuration will ensure that the hard drives have the same lifespan, which will be easy to predict.
- Developing a Zabbix plug-in to see HDD lifespan graphs.
- Applying the technology to analyse processor and video card lifespan.
We are planning to investigate the phenomenon more thoroughly. If you would like to participate in developing this technology, please let us know in the comments on this post. We will contact you and discuss what to do next!