Wednesday, May 15, 2019

Firstbeat VO2 estimation - valid or voodoo?


Although I've had a Garmin Fenix 5 for a couple of years, I've never paid much attention to its VO2 max number.  I had no idea how it was measured and was more concerned with my maximal aerobic power (MAP, in watts).  

For training purposes, knowing your MAP allows you to train near your VO2 max, which has been shown to be among the best stimuli for fitness enhancement.  In addition, tracking your MAP over time may provide feedback on aerobic performance status.  With the release of a new line of watches (Marq and Forerunner 945), Garmin has brought VO2 max tracking to the forefront.  In fact, on the Marq Athlete, the VO2 max has a hard coded indicator on the bezel.  Although the VO2 max noted by the watch is not helpful in choosing a cycling power for training, it may still be a reasonable way of tracking changes in cardiac fitness.  Given my skeptical nature and experience with the conflicts of corporate profit with medical science, I decided to look into the validity of the Firstbeat VO2 max estimation.  

First some early basic physiology observations.
VO2 is the rate of oxygen usage by the body during a given amount of exercise.  As an analogy, think of it as the amount of O2 (air) used by your auto engine cruising along the road.  The gasoline used is analogous to the carbohydrates your body is using as fuel.  In either case a certain amount of oxygen (along with the fuel) is required depending on the load.  Running up a mountain at the same speed as on flat land will require more O2 (and carbs), just like your automobile will require more fuel and O2 to tackle the mountain than level ground.  The VO2 is simply a "gas" measurement, a volume of O2 used per minute (ml/min or ml/kg/min).  The VO2 max is the greatest amount of O2 (per minute) you are able to utilize at a high work load.  Going higher on the workload (more watts) is possible but at the expense of anaerobic metabolism since O2 usage is already maxed out.  Measuring the true VO2 is difficult and needs elaborate gear, so many alternative formulas, nomograms and other methods have been developed that do not rely upon gas exchange measurements.
By using running speed or cycling power at (near) maximum heart rate and ventilation rates, it is possible to infer the VO2 max equivalent speed or power as well as derive an actual VO2 value in ml/kg/min.  This type of calculation relies on principles of physics to some extent.  For example, equate running speed to the speed of the car.  If we know the speed of the vehicle, it should be possible to use multiple parameters (vehicle mass, aerodynamics, wind speed, grade and empiric correction factors) to figure the instantaneous fuel usage (or VO2).  In my post on VO2 peak/max, some formulas were reviewed.  However, a huge downside to this type of process is that it needs to be done at a peak or maximum effort.  Over the years many groups have attempted to come up with ways to infer a VO2 peak without doing a maximal load.  Firstbeat (the company Garmin uses for VO2 metrics) has developed methods to estimate VO2 max without the max effort being done.  Their rationale is sound - many subjects either can't physically exercise at a maximal level or it's just not part of their training program for that particular cycle.  If a non gas exchange, sub maximal method was accurate in estimating VO2 peak, it would be valuable indeed.

Some early attempts to estimate VO2 max date back to the 1950s with Astrand and Ryhming's notable work.  They investigated the relationship of cycling power (or stepping work), heart rate and VO2.  


Based on evaluating both men and women with true gas exchange exercise testing, they were able to calculate a nomogram to estimate the VO2 at submaximal power.  Note that the unit of work is a kg-m/min (not watts).  This was due to the cycling ergometers of the time that did not read in watts.
The conversion is 1 watt = 6.118 kg-m/min.  The test is also meant to be a sub max test, so the watts will not scale to conventional high power zones.  The time of testing was to be a steady 6 minutes, which should provide good equilibrium.
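
As a quick check on the unit conversion, here is a minimal Python sketch.  The function names are my own, chosen for illustration; the only number taken from the text is the 6.118 factor.

```python
# Conversion factor quoted in the text: 1 watt = 6.118 kg-m/min
KGM_PER_MIN_PER_WATT = 6.118


def watts_to_kgm_per_min(watts):
    """Convert cycling power in watts to the nomogram's kg-m/min units."""
    return watts * KGM_PER_MIN_PER_WATT


def kgm_per_min_to_watts(kgm_per_min):
    """Convert kg-m/min back to watts."""
    return kgm_per_min / KGM_PER_MIN_PER_WATT


# 200 w works out to a little over 1200 kg-m/min
print(round(watts_to_kgm_per_min(200)))
```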



If I did 200 watts (about 1200 kg-m/min) with a resultant heart rate of 142 after about 6 minutes of continuous pedaling (their time figure), my calculated VO2 max would be about 4.4 liters/min, or 54 ml/kg/min at a weight of 81 kg.


The accuracy/error was plotted and was not too bad:

Still not a super accurate predictor, but the major advance was the usage of power in a non maximal test.

I decided to plot several low power figures and 1 moderate value of my data.  The 3 lower lines are from power values of about 160 to 165w (40-45% VO2 max power) and the higher one is 217w (about 60% VO2 peak) done for 7 minutes after a VO2 peak interval.  The results from the low power figures seem excessively high (and apparently that power range is not a good one to use) but the 217w line is about what both Garmin and a more accurate equation (to be discussed below) indicate.

Storer cycling calculation
This test is usually done as a ramp with a 15w per min rise in power to exhaustion.  However, we will see in a bit that VO2 max constant power "time to exhaustion" is generally 4 minutes for cycling.  Since I don't like to do ramp testing indoors, my data will be from a 4 minute maximum interval instead.  The "time to exhaustion" test is simply the time you are able to cycle at VO2 max power until you are unable to continue at that load.

Here is the abstract from 1990:

My calculation:

For age 62, with a 4 min max power of 350w, and 81 kg weight:

The equation with my numbers would be:
(10.51 x 350w) + (6.35 x 81kg) - (10.49 x 62years) + 519.3  = 4062 ml/min
or 50.1 ml/kg/min.

Which also agrees with the Astrand figure at moderate power and with my Garmin watch (48-50).
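
For anyone who wants to plug in their own numbers, the arithmetic above can be sketched in a few lines of Python.  The coefficients are the ones quoted in the equation above; the function names are my own and purely illustrative.

```python
def storer_vo2max_ml_min(max_power_w, weight_kg, age_years):
    """Absolute VO2 max (ml/min) from the Storer coefficients quoted above."""
    return 10.51 * max_power_w + 6.35 * weight_kg - 10.49 * age_years + 519.3


def relative_vo2(vo2_ml_min, weight_kg):
    """Convert ml/min to ml/kg/min."""
    return vo2_ml_min / weight_kg


absolute = storer_vo2max_ml_min(350, 81, 62)
print(round(absolute))                       # 4062
print(round(relative_vo2(absolute, 81), 1))  # 50.1
```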

Indeed, the correlation and standard error with the Storer formula were very reasonable:






They did do a before and after endurance training intervention and did not see any change in calculated values in a group of 9 subjects.  However, they did not define what the exercise protocol was and it may not have been strenuous enough to change VO2 over the study:






VO2 max estimation equations for running: 
If interested, here is a large study looking into the relation of treadmill speed/inclination with VO2 max prediction. 
Here is the equation with correlation coefficient and standard error:



  • The error and correlation factors are not as good as the Storer cycling equation.

My Garmin Marq Athlete gives an estimate of a 5k run of 24 minutes.  Although this is an interval time less suitable than the 3-5 minutes needed for VO2 max estimates, I plugged it into the above equation.  The corresponding VO2 max given my age and weight was 45 ml/kg/min.  Considering that the power figure at 24 minutes would be low, it still gives a roughly similar value to my cycling estimate.


Heart rate min/max ratio
We are all aware that athletes with lower resting heart rates are usually aerobically fitter than other individuals with higher resting HR.  In addition, since VO2 max is related to cardiac output (which is stroke volume x heart rate), it follows that a higher max heart rate will yield a better VO2 max value.
To better examine this phenomenon and estimate VO2 max from the relation of min to max heart rate, let's look at a study that was done about 14 years ago:




This was the equation from the paper:
A subgroup (n=10) demonstrated that the proportionality factor
between HRmax/HRrest and mass-specific VO2max was
15.3. Using this value, VO2max in the
remaining 36 individuals could be estimated with an
SEE of 0.21 l/min or 2.7 ml/min/kg (4.5%).
The method for obtaining resting and max HR was the following:
Resting heart rate was measured over a 5-min period by the subject in
the morning the day after the test (supine, while in bed). HRrest
(beats/min, with one decimal) was defined as the lowest value of any
1-min average during the 5-min sampling period. HRmax was defined
as the highest 5-s average during the treadmill test.

In my case, recent resting HR is 47, max HR 171 
(171/47) x 15.3 = about 55.7 ml/min/kg without any power testing.

  • The downside to this is the higher error zone, but it does ballpark the value and gives credit to reductions in resting HR as well as higher max HR, as indicators of fitness.
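
The heart rate ratio estimate is simple enough to sketch in Python.  The 15.3 factor is the one reported in the paper excerpt above; the function name is my own.

```python
def hr_ratio_vo2max(hr_max, hr_rest, factor=15.3):
    """Estimate VO2 max (ml/min/kg) as factor x HRmax / HRrest."""
    return factor * hr_max / hr_rest


# My numbers: max HR 171, resting HR 47
print(round(hr_ratio_vo2max(171, 47), 1))  # 55.7
```

Note that the result is only as good as the two heart rate readings.  Using the formula as written, a 2 beat error in resting HR (45 instead of 47) moves the estimate by more than 2 ml/min/kg.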


Comparing the formulas:
Let's also examine a study comparing many different VO2 max predictive equations.
From the abstract:
The purpose of this investigation was to cross-validate existing VO2max prediction equations on samples of aerobically trained males and females. Methods: A total of 142 aerobically trained males (mean ± SD; 39.0 ± 11.1 yr, N = 93) and females (39.7 ± 10.1 yr, N = 49) performed a maximal
incremental test to determine actual VO2max on a cycle ergometer. The predicted VO2max values from 18 equations (nine for each
gender) were compared with actual VO2max by examining the constant error (CE), standard error of estimate (SEE), correlation
coefficient (r), and total error (TE).
After the group was tested with gas exchange, each subject's VO2 max was also predicted with each equation, and the predictions were compared with the measured values.  The Storer formula reviewed above had the best fit and least amount of error.  The correlation coefficient was high, near 0.9 for males:
From the conclusion:
The results of the present study indicated that the equations
(equations 3M and 3F) of Storer et al., which
included age, body weight, and Ẇmax, most accurately estimated
VO2max in aerobically trained males (TE = 413 mL/min)
and females (317 mL/min). The applicability
of these equations, however, is limited because they require
a maximal exercise bout to determine Ẇmax in order to
estimate VO2max.
  • In conclusion, the Storer equation is a very good model but requires a maximum effort.
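
To make the error terms in that abstract less abstract, here is a rough Python sketch of how CE, r, SEE and TE are commonly computed in cross-validation studies.  The abstract does not spell out its formulas, so the definitions in the comments are an assumption on my part, and the sample values are invented for illustration only (they are not from the study).

```python
import math
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cross_validation_stats(predicted, actual):
    """Assumed conventional definitions:
    CE  = mean(predicted - actual)
    SEE = SD(actual) * sqrt(1 - r^2)
    TE  = sqrt(sum((predicted - actual)^2) / n)
    """
    diffs = [p - a for p, a in zip(predicted, actual)]
    r = pearson_r(predicted, actual)
    return {
        "CE": mean(diffs),
        "r": r,
        "SEE": stdev(actual) * math.sqrt(max(0.0, 1 - r ** 2)),
        "TE": math.sqrt(sum(d ** 2 for d in diffs) / len(diffs)),
    }


# Invented values in ml/min, for illustration only
measured = [3900, 4200, 4500, 4800, 5100]
predicted = [4000, 4100, 4600, 4700, 5300]
print(cross_validation_stats(predicted, measured))
```

The point of TE is that it penalizes both a consistent offset (CE) and scatter, which is why an equation can have a small CE and still be a poor predictor for an individual.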

Ramp test vs Constant power load?
Since the Storer formula was based on a ramp test protocol, can we simply do this on the road with a constant cycling power output? If this could be a field based observation (done during a routine road ride), it would be much more practical.
I have discussed this in a past post but wanted to review the subject again.  According to estimates, the time one is able to cycle at their VO2 max related power is limited.  One study, as an example, performed two "time to exhaustion" tests a week apart in which the subjects (highly trained cyclists with VO2 max over 60) cycled at their VO2 max power.  The VO2 max power was defined as:
Pmax was calculated from the progressive exercise
test and defined as the power output that elicited a VO2 reading within 2.1ml/kg/min of the subsequent reading despite an increase in workload (i.e., 15W in a 30-s period).
The authors were concerned that the time to exhaustion would vary between trials, which it did but to a very small degree:


The time that the athletes were capable of sustaining VO2 max power was about 240 seconds or about 4 minutes, plus or minus 1 minute.

For curiosity's sake, I plugged the average stats for the above subjects into the Storer equation:

(10.51 x 426w) + (6.35 x 75kg) - (10.49 x 25years) + 519.3  = 5211 ml/min
or 69.5 ml/kg/min.
This is at the upper end of the 65 (+-5) value mentioned in the abstract.

Further support for the timing of VO2 equilibrium comes from kinetic observations:

B: increases in exercise intensity modify the time course of changes in oxygen uptake and the steady-state amount of oxygen uptake (left). Oxygen uptake at 5 minutes of exercise is plotted as a function of exercise intensity in watts (right). Note that when exercise intensity was increased from 250 to 300 W, oxygen uptake did not increase. The increased workload was fueled by anaerobic processes.

The figure indicates that even at higher power zones, VO2 levels off by 3 minutes during constant power cycling.  At loads above VO2 max, oxygen uptake does not increase any further to a significant extent.
Therefore the idea of using either a time to exhaustion figure of 4 minutes or the 3 to 5 minute VO2 equilibrium graphic above appears sound in lieu of a ramp test for VO2 max power.


Firstbeat Estimation of VO2 max
Finally we are going to discuss the Firstbeat VO2 max estimation.  The reason I spent time with the background discussion was to take some of the mystery out of their estimates as well as give alternate ways of deriving this figure.  I also wanted to make it very clear that they are not the first group attempting to estimate VO2 max from heart rate, load and baseline demographics.  Whether they can do as well as prior attempts is the question.

We have seen how power is related to heart rate and VO2, as well as how a simple resting/max heart rate ratio can somewhat predict VO2 max.

Let's look at their web site literature to get an idea of how they do it.

What they are saying here (I think) is that the heart rate to work load relation is used for a VO2 max calculation, with the added benefit of filtering out bad data.
And:
More emphasis on filtering out low quality data.


Avoiding garbage in:
To derive a VO2 value, a valid measure of cycling or running power is needed (in power related modeling).  So a source of error would be in the measurement of running speed (the equivalent of watts in cycling) from faulty GPS coordinates.  If you can't track your position with certainty, speed precision will suffer.  Unfortunately, this is a major issue with wrist based units and the Garmin forums are loaded with complaints of poor GPS performance.
The heart rate itself must be measured with as much accuracy as possible but this should be fine with a chest belt or even a Moov sweat on the forehead.  Given prior studies on arm based optical sensors, they are not reliable enough in my opinion for high precision, but may provide a ballpark figure.
Using cycling power as a load metric should be relatively immune to erroneous power values and should be highly reliable.
From the Firstbeat site, the figure below seems to indicate that their formula is taking both heart rate and power into consideration.
Is this novel?
Not really - We are still left with using the same relationship of power (running speed or cycling watts) to heart rate as Astrand and other investigators have done many years ago.  Once we get a work/power figure, then conversion to VO2 is possible.

Another "advance" that Firstbeat discusses is the use of RR interval variation to improve the VO2 max estimate.  The study they point to was a poster presentation listed at the ACSM journal website by members of their staff.  It was not a peer reviewed study and is only a very brief sketch.
They are claiming (?) that the VO2 estimation can be improved by looking at the on/off heart rate kinetics (the speed the HR ramps up and down at the start and stop of activity) as well as factoring in breathing rate, derived from the heart rate variability.
Here is the poster:
The on off kinetics and modeling follows:
The tracings on the left are representative of constant load, while those on the right (RLT - random tasks) show VO2 values going up and down, which does make sense.

They then factor in the (derived) respiratory rate and the heart rate on-off data, without saying how their algorithm does it, to get lower error estimates:
The reference to the usage of the RR interval data is to the thesis done by one of the principals of Firstbeat which was noted above:
The thesis:
Part of Firstbeat's reference material is about statistical modeling and "neural networks".  This is discussed in a thesis presentation by one of the Firstbeat staff which was reviewed and supervised by other members of the Firstbeat company.  The majority was over my head math wise, but the following figure caught my attention regarding heart rate, true respiratory rate and derived rate from HRV:
Indeed, there does seem to be good correlation of true vs derived breathing rate by simple visual inspection, but this data has not been published in a peer reviewed journal.  I am also not aware of anyone repeating this for confirmation using Firstbeat's methods.
Here is the conclusion of the thesis, with the mention of respiratory rate derived from HRV (number 5) as a notable advance.  Although this is interesting, there is no comment about how this will improve VO2 max estimation:

Back to Firstbeat VO2 estimation
The heart rate to work relationship was used in the Astrand nomogram 60 years ago and has been a repeatable metric in many studies since.  It is certainly nice that the Firstbeat formulas are filtering out bad data, but does that add to the accuracy of the final result especially if you had valid numbers to start with?


ECG Derived Respiration:
Let's take a moment to review the issue of deriving respiratory rate from cardiac signals (ECG Derived Respiration or EDR).  Although one of the oldest methods is based on the observation of respiratory sinus arrhythmia (RSA), other perhaps better modalities exist.  The RSA is the beat to beat variation in heart rate during inhalation and exhalation (measured with the R-R interval).  That is what Firstbeat is using in their attempt to estimate breathing frequency.  A review and analysis of breathing rate estimation from ECG compared 4 different methods for accuracy:
Four different methods to obtain the EDR (ECG Derived
Respiration) signal were selected: HRV (based on RSA), Amplitude, Area and AMEA (based on beat morphology).
HRV method uses the beat-to-beat intervals for the
construction of the EDR signal.
Amplitude method calculates the EDR from the change
in the amplitude of each QRS complex. The amplitude is
computed as the difference between the maximum and
minimum value within a time window of 100ms around
the R peak within each beat.
Area method is a variation of the previous method. The
area of the QRS complex is computed against its baseline.
The baseline is defined as the mean value around each
beat. The baseline is subtracted to the ECG and the area is
calculated within a 100 ms time window around the R
peak
Angle of Mean Electrical Axis (AMEA) method
estimates the EDR as the variations of the heart axis. The
area of two ECG leads is calculated. The angle of the
mean electrical axis is obtained as arc tangent of the ratio
of these areas.
The EDR (ECG Derived Respiration) field is a fascinating one!

One issue that is immediately obvious is that only HRV will be possible with a simple chest strap (Polar H10).  The other methods need at least a single lead ECG if not more (Hexoskin) since they rely on analysis of the entire QRS complex.  

How did each method do?  
There were two types of breathing trials, paced (a defined regular pattern), and free (associated with motion and light exercise).
Here are the results:

The accuracy of the non HRV methods was quite impressive across the board.  Unfortunately, HRV analysis was poor during free breathing and light exercise.  With an error rate of 42% we can hardly say this is ready for prime time usage as a trustworthy physiologic metric.


Firstbeat/Garmin ECG Derived Respiration
The real time display of respiratory rate from HRV gives us the opportunity to see if Firstbeat's EDR is as accurate as a documented respiratory monitor, namely the Hexoskin.  They clearly have stated and shown in the thesis that their software can calculate a breathing rate derived from heart rate variability that not only correlates in character but in numeric values as well.  Although the claim is an old one (2003), for the first time, Garmin is showing the respiratory rate data both on the Connect website (if you have a new watch such as Forerunner 945, Marq) as well as on the watch itself in real time.  Since I have a Hexoskin shirt that records accurate respiratory activity, this gives us a chance to see if Firstbeat's derivation is valid.

Where is the EDR data in the Garmin .fit file?
If you open the .fit file in Golden Cheetah, you will need to go to the Extra's tab.  It is under field 108 (you may have a different field number).  The value is the EDR x 100, so you will need to divide by 100 for a net result.  
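
If you export that field and want to rescale it outside of Golden Cheetah, the divide by 100 step could look like the Python sketch below.  The function name and the sample values are hypothetical; it assumes you already have the raw field values as a list, with None where a sample is missing.

```python
def scale_edr(raw_values, scale=100):
    """Convert raw field values (breaths/min x 100) to breaths/min.

    Missing samples (None) are passed through unchanged.
    """
    return [None if v is None else v / scale for v in raw_values]


raw = [1850, 2230, None, 3010]  # hypothetical raw samples
print(scale_edr(raw))  # [18.5, 22.3, None, 30.1]
```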



I did a series of two test sessions.  One ride was done with the Moov HR optical sensor on the forehead and another used the Polar H10 electrical chest unit.  I was curious to see if the Moov could track as well as the H10, considering its unique accuracy for heart rate but unknown accuracy for HRV.  The Polar H10 is a gold standard device for both rate and HRV.  The EDR of each was compared with the Hexoskin shirt worn on each ride.


3 minutes at near VO2 max power (350w) 
followed by 7 min of tempo riding just under MLSS:

Polar/Garmin Respiratory rate vs Hexoskin:

  • Although the absolute value of respiratory rate is quite a bit lower with the Garmin/Polar H10 combo, the overall correlation is apparent.  I was impressed with the tracking of the HRV derived respiratory rate, although not with the ability to match absolute rates.
  • The true respiratory rate (Hexoskin in blue) rises in a linear fashion to a maximal value at the end of a VO2 max time to exhaustion interval.  The EDR (Garmin in red) did rise initially, but reached a plateau midway through the 3 minute MAP interval.
What does the "normal" breathing rate response to intense interval work look like?  Here is an excerpt from a study looking at various HIT protocols comparing respiratory rate vs tidal volume (the amount of air in each breath).  I show this as an illustration of breathing rate increasing with intense work loads over time:
  • The top panel shows the continuous increase in breathing frequency over the duration of the HIT intervals, with a maximum value at the test termination.  
  • This continuous increase was seen with the Hexoskin but not the EDR of the Garmin.
  • Note as well the absolute values of respiratory rate of about 60/min in the top graph.  That is consistent with my Hexoskin values but not the Garmin EDR.
Back to testing...
How about the Moov HR optical on the forehead vs the Hexoskin?
Similar interval, different day:
  • Although a bit slow to rise initially, the Moov/Garmin still did well in character.  As in the Polar tracing, the absolute values are much lower with the HRV derived rate.
  • The Hexoskin continuous rise in breathing rate is the same as above.
Polar H10/Moov HR vs Hexoskin ECG:
Although not a subject of review, how does the raw heart rate of each device compare to the Hexoskin?






  • Both the Moov and H10 (below it) are virtually identical to the Hexoskin heart rate.
  • The HRV data is not easily extracted and I decided to skip it.

Wingate 60 second:
The one minute, all out Wingate 60 interval is a challenge for all recording devices, given the severe motion artifacts and rapid physiologic shifts. 

Here is the Polar H10/Garmin EDR vs Hexoskin respiratory rate:

Moov HR/Garmin EDR vs Hexoskin respiratory rate (same interval different day):

  • It is interesting how close the breathing rate pattern is on two separate days with the Hexoskin.  There is a quick spike, slight dip then increase to maximum.
  • This micro pattern was not seen with the derived respiratory rates in either the Moov or Polar tracings.  The overall trend was to increase, but the absolute values were significantly lower.
Heart rate comparison:
Excellent pure heart rate correlation as expected with the H10 but the Moov HR is very close:






Moov HR vs Hexoskin









 Polar H10 vs Hexoskin
 

 

Lastly here is a 5 minute MLSS interval using the H10/Garmin vs Hexoskin
Respiratory rates: 



  • With excellent heart rate correlation


  • This interval showed a matching pattern of relative respiratory rate, but the absolute values are much lower with the HRV derived data

Thus far the pattern matching and relative values of the derived respiratory rate of the Garmin watch vs the Hexoskin seem plausible.  Absolute accuracy is not good at all.  I have no explanation for that.  
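
One way to put numbers on "good pattern, poor absolute accuracy" is to compute a correlation (pattern) and a mean bias (absolute offset) between the two devices.  The Python sketch below assumes the two series have already been time aligned, sample for sample.  The sample values are invented for illustration and are not taken from the rides above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: how well the two series move together."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_bias(estimates, references):
    """Average of (estimate - reference); negative means underestimation."""
    diffs = [e - r for e, r in zip(estimates, references)]
    return sum(diffs) / len(diffs)


# Invented breathing rates (breaths/min), for illustration only
hexoskin = [30, 36, 42, 48, 54, 60]
garmin_edr = [22, 26, 30, 33, 35, 36]

print(pearson_r(garmin_edr, hexoskin))  # high when the pattern matches
print(mean_bias(garmin_edr, hexoskin))  # negative when EDR reads low
```

A high correlation together with a large negative bias would describe the situation seen in the tracings: the shape follows, the numbers do not.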

What happened when I held my breath?
Notice how the sinusoidal tracing abruptly stops in the middle of the screen.  This is a screenshot of the Hexoskin app on my android bike unit during the ride (with the H10) as I held my breath.  The two lines represent thoracic and abdominal motion.
I did this several times and on each, the Garmin watch respiratory rate did not change.  Fortunately I have a Varia Vision so just a flick of the eyes gave me a continuous look at the Garmin respiratory data.  I was somewhat disappointed that there wasn't even a small change in the Garmin values. Either there is substantial time averaging or the Firstbeat EDR just does not pick up this change. 
 
summary of article published in Sensors 


Summary of personal testing of the Garmin/Firstbeat respiratory rate prediction:
  • The absolute values for derived respiratory frequency are not correct and underestimate breathing rates significantly.
  • The relative values and pattern of respiratory frequency are generally in line with the Hexoskin.
  • Fine detail is missed in the Firstbeat EDR.
  • There is a short lag in resp rate elevation estimation during the initial section of an intense interval.
  • During breath holding (which should stop the respiratory sinus arrhythmia - RSA), there is no change in Firstbeat rate estimation.


Other studies exploring Firstbeat:

Smolander and associates published a study looking at VO2 in postal workers, comparing the Firstbeat derived VO2 vs traditional gas exchange VO2.  They were not looking at VO2 max, they were simply looking for oxygen consumption doing various activities.

The study was also funded by Firstbeat.
Here is an excerpt from the methods:
VO2-HRV was calculated with Firstbeat PRO heartbeat
analysis software version 1.4.1 (information available at:
http://www.firstbeattechnologies.com). In addition to HR,
the software calculates and takes into account respiratory
frequency as well as on- and off-response phases
(on/off-dynamics), which are both derived from ambulatory
RRI data. On/off-dynamics data are used since HR and
VO2 are known to have different response patterns when
intensity of physical activity changes (Bernard et al., 1997;
Davies et al., 1972; Pulkkinen et al., 2004).


They did do a correlation of derived vs traditional VO2 against heart rate - with the Firstbeat method being somewhat accurate:


  • The correlation was better at higher heart rates.
  • VO2 by gas exchange was 12.2 ml/kg/min vs 10.9 for the Firstbeat estimate at heart rates above 100.  That is still about 10% off the correct value.

A Real world study of wrist based VO2:
The most suitable study that I have found looking at the validity of a Garmin+Firstbeat product is the following:
This is from the abstract:

PURPOSE: To determine the validity of the GF5 VO2max estimation capabilities against the ParvoMedics TrueOne 2400 (PMT) metabolic measurement system in recreational runners.

METHODS: Twenty-five recreational runners (17 male and 8 female) ages 18-55 participated in this study. Participants underwent two testing sessions: one consisting of the Bruce Protocol utilizing the PMT, while the other test incorporated the Garmin Fenix 5 using the Garmin outdoor protocol. Both testing sessions were conducted within a few days of each other, with a minimum of 24 hours rest between sessions.

RESULTS: The mean VO2max values for the PMT trial (49.1 ± 8.4 mL/kg/min) and estimation for the GF5 trial (47 ± 6.0 mL/kg/min) were found to be significantly different (t = 2.21, p = 0.037).

They used an actual Garmin device (which I have had as well) under real world conditions (running outside) and compared VO2 max to a true metabolic cart value with gas exchange monitoring.
  • On first glance the VO2 max estimates are surprisingly spot on (49 vs 47ml/kg/min)!  
But let's look at each participant's comparison values for a better insight:


Even though the overall average was quite close, some subjects had huge variation in VO2max compared to the true, measured value.  This was either up or down (circled in red or green) with no consistency.  The derived VO2 values were very scattered, but if they were averaged together, the net average results were close.  

A thought exercise in VO2 max power training:
Studies have shown that training near or at your VO2 max is beneficial for improving your aerobic fitness.  What are the implications of using Garmin/Firstbeat estimates based on the above study?  
For subject 5, the VO2 max looks like 51 by Fenix vs 62 ml/kg/min by gas exchange.  If subject 5 wanted to train at their VO2 max cycling power, let's see what the difference would be according to Garmin (51 ml/kg/min) vs true (62 ml/kg/min) using the Storer equation in reverse (with an age of 40 and a weight of 75 kg):

Fenix 5 MAP based on Firstbeat VO2 max of 51  = MAP of 309 watts
Vs
True VO2 max from metabolic cart of 62 = MAP of 388 watts
The difference is almost 80 watts!
  • Therefore, if you wanted to train at your VO2 max power using the Garmin/Firstbeat equation you would be well below your goal, possibly barely above the MLSS.

Let's now examine subject 22, who had a VO2 max of 45 by gas exchange but 51 by the Fenix (same age and weight for this example):
Fenix 5 MAP based on Firstbeat VO2 max of 51  = MAP of 309 watts
Vs
True VO2 max from metabolic cart of 45 = MAP of 266 watts
The difference is now 43 watts too high a power.  If this individual was attempting to train 43 watts above their true MAP of 266, they would be using a much higher intensity than desired with potential results being over reaching, excess fatigue as well as inadequate time spent at VO2 max (since they would be physically unable to maintain this pace for long).  
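
The "Storer equation in reverse" used for these examples is just the same formula solved for power.  Here is a minimal Python sketch; the function name is my own, and the age (40) and weight (75 kg) are the assumed values from the thought exercise, not data from the study.

```python
def storer_map_watts(vo2max_ml_kg_min, weight_kg, age_years):
    """Solve the Storer equation for power (watts) given a relative VO2 max."""
    vo2_ml_min = vo2max_ml_kg_min * weight_kg
    return (vo2_ml_min - 6.35 * weight_kg + 10.49 * age_years - 519.3) / 10.51


for label, vo2 in [
    ("Fenix estimate (51)", 51),
    ("Subject 5 measured (62)", 62),
    ("Subject 22 measured (45)", 45),
]:
    # Prints MAP of 309, 388 and 266 watts respectively
    print(label, round(storer_map_watts(vo2, 75, 40)))
```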

The bottom line for cyclists:
Put the VO2 max calculation into perspective with regards to training.  First, it may not be accurately derived, and second, if it is not your true value and you attempt to use it for a training power calculation, the result can be unpleasant.  Fortunately, I believe cyclists don't put much stock in that number and instead use other zone related formulas for training.  As far as Garmin products are concerned, they really do not aid the cyclist in choosing training metrics or monitoring worthwhile physiologic markers.  For instance, they could keep track of your best 3 or 4 minute average (continuous) power as a surrogate for VO2 max/MAP.  Or they could monitor your Wingate 60 second results as a surrogate for VO2 max.

Back to the runners Fenix 5 study:
This brings up a few comments on the methods of how the Fenix 5 was measuring running speed and heart rate.  The GPS coordinates are used to measure speed.  As Garmin watch users know, the GPS tracking can be of variable accuracy.  In addition, the study used the optical heart rate sensor of the watch (believe it or not) and did not use a chest belt for heart rate.  The error in heart rate is obviously problematic, but one wonders how they are able to use HRV data from this type of device.  So one could argue, garbage in = garbage out.  Despite these issues, the ballpark accuracy was not horrible, possibly related to Firstbeat's advertised ability to filter out poor quality data.

Wish list - Better error reporting:
It would be helpful if some sort of indication was given as to the error of the measurements.  Common sense dictates that the software (especially if it is using sophisticated "neural networks") should qualify a run's result if it was based on optical HR, for example with a comment on questionable validity.  In addition, a confidence range or some other statistical information would be helpful.  So if you had a VO2 max measured at 50 ml/kg/min, based on a poor GPS signal and shoddy HR/HRV data, they could provide a plus/minus figure with error bars or reject the session outright.  Otherwise, a user could have made real improvements in their VO2 max over time but become frustrated by the lack of "progress" since the increment is buried in the error zone.  On the other hand, perhaps one is using a chest belt and has a clear GPS signal with excellent tracking.  In that case, the error zone is small and a change may be apparent.  If you were interested in that type of detail, you would be more apt to train with the high accuracy devices.
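
As a crude illustration of the "buried in the error zone" idea, the Python sketch below treats a change as detectable only when the plus/minus ranges of the two estimates do not overlap.  This is my own simplification, not anything Garmin or Firstbeat publishes, and the error figures are hypothetical.

```python
def change_is_detectable(old_value, new_value, old_error, new_error):
    """Return True when the two plus/minus ranges do not overlap."""
    return abs(new_value - old_value) > (old_error + new_error)


# Hypothetical error figures in ml/kg/min
print(change_is_detectable(50, 52, 4.0, 4.0))  # False: poor GPS/optical HR
print(change_is_detectable(50, 52, 0.5, 0.5))  # True: chest belt, clean GPS
```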

Wish list - Validation by non funded sources:
A common theme among the Firstbeat papers is the proprietary nature of their methods.  Most studies (except the Fenix 5 study) are done by company staff.  This brings up the obvious financial incentive to present a positive outcome.  If Garmin wants to lend some credibility to these calculated values, encouragement should be given to outside groups not affiliated with Firstbeat to validate and repeat their work.


Summary:
  • There is a relationship between heart rate, work load (running speed, cycling power) and oxygen consumption.
  • Using this relationship, an estimation of VO2 max can be made at moderate work loads.  Several formulas are available to derive a VO2 max from either submaximal or maximal efforts, or even from the resting/maximal heart rate ratio.
  • Astrand and Ryhming's nomogram for cycling power (or steps) was published in the 1950s and is notable for its ability to estimate a VO2 max from submaximal data.
  • The Storer equation seems the most accurate for cycling, incorporating power, age and weight.  This formula uses a ramp protocol but evidence indicates that a 4 minute maximal interval power will be equivalent.
  • Respiratory rate can be derived from ECG signals by multiple means.  Only HRV related methods are possible with a chest belt.
  • Firstbeat claims to have superior HRV data filtering and algorithms to derive a continuous VO2, a VO2 max and ECG derived respiration rates.
  • Objective and personal data do not show equivalency in Firstbeat respiratory rate metrics compared to true physical measurements (Hexoskin).  However, there is a loose correlation in pattern between the two methods that may or may not have value in exercise training.
  • VO2 max estimation by Firstbeat systems depends on accurate heart rate and power measurements.  For running, a clear GPS signal must be obtained.  For both running and cycling, heart rate should be measured by chest belt for optimal precision in both rate and HRV.
  • There is a lack of published, peer reviewed data on the correlation and error range of the Firstbeat VO2 max estimates when compared to gas exchange.  Hopefully, outside groups will be able to repeat and corroborate their findings.
  • Using the VO2 max estimate to compute a maximal aerobic power value (MAP) for training purposes is problematic.  If one is interested in a MAP value for training purposes, using a continuous 3 to 5 minute cycling interval is recommended.
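
For readers who want to try the Storer approach from the summary, here is a minimal sketch.  The coefficients are the commonly cited ones from Storer et al. (1990) and should be checked against the original paper before relying on them; the function names and the example rider are hypothetical.  Per the summary above, a best 4 minute maximal power is used in place of the ramp test's final power.

```python
def storer_vo2max_ml_min(power_w, weight_kg, age_yr, sex):
    """Estimate absolute VO2 max (ml/min) from maximal cycling power.

    Coefficients as commonly cited from Storer et al. (1990);
    verify against the original paper before use.
    """
    if sex == "male":
        return 10.51 * power_w + 6.35 * weight_kg - 10.49 * age_yr + 519.3
    if sex == "female":
        return 9.39 * power_w + 7.7 * weight_kg - 5.88 * age_yr + 136.7
    raise ValueError("sex must be 'male' or 'female'")


def relative_vo2max(vo2_ml_min, weight_kg):
    """Convert absolute VO2 max (ml/min) to relative (ml/kg/min)."""
    return vo2_ml_min / weight_kg


# Hypothetical rider: 300 W best 4 minute power, 70 kg, 40 years old, male.
absolute = storer_vo2max_ml_min(300, 70, 40, "male")
print(round(absolute, 1), "ml/min")
print(round(relative_vo2max(absolute, 70), 1), "ml/kg/min")
```

Like every equation discussed here, this gives a ballpark figure rather than a substitute for gas exchange testing.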

Addendum 8/18/2019
I performed a true gas exchange VO2 max test and then a comparison to the Garmin/Firstbeat as well as the other methods discussed.  The results were quite interesting!



Yes, I have been a bit critical of Garmin/Firstbeat.  However, unlike the presumption of innocence in the US justice system, in scientific advancement a "method" is used.  From Wikipedia:
The scientific method is an empirical method of acquiring knowledge that has characterized the development of science since at least the 17th century.
It involves careful observation, applying rigorous skepticism about what is observed, given that cognitive assumptions can distort how one interprets the observation.

6 comments:

  1. Quite impressed by your study, congrats...

  2. Great post! I am interested in this subject and have done a small blog post on it too (in French). I am wondering whether, with the latest Firstbeat algorithms, the race time predictions are more accurate. Have you had the opportunity to make the comparison? Right now, with a VO2max of 60 (on par with a lab test), it gives me a marathon time of 2:43, which is frankly impossible.

  3. PS I used to have a NIRS monitor for anesthesia stuff at my job... frankly, this technology seems really complicated to me for measuring precise things. I will navigate through your blog, it looks fascinating.

  4. "I am wondering with the latest Firstbeat algorithms if the race time predictions are more accurate"

    The post above was based on the latest (still) released algorithm. I did get confirmation from Firstbeat themselves about this. I think the problem is that one simply can't estimate VO2 max by any equation with precision. Sure, you get a ballpark, but what I found out with my gas exchange data was how inaccurate they all are (especially Firstbeat's). I am also very skeptical about the race predictors and training recommendations that my Garmin watch (Marq Athlete) puts out. Garbage in, garbage out as they say. I think one's best bet is to get a few "benchmarks" like best 5 minute cycling power or running pace, and compare them through the year of training. The muscle desaturation pattern for MLSS determination is pretty good as well.

  5. Respiratory rate is calculated from the change in RR values. Inspiration causes the waveform of the RR data to change.
