Thursday, November 21, 2019

Stryd running power - validation with VO2 testing

I am always looking for interesting wearable devices, but since I don't run, I have neglected that particular sport.  I recently reviewed some impressive Stryd data metrics and wanted to share them.  First, what is Stryd?  It is essentially a power meter for runners.  Using data from various motion sensors and the rules of physics, it is able to compute effective running power.  The hardware includes a 3-axis gyroscope, a 3-axis accelerometer and, in the newest version, wind sensing.

Is this tech new?  No, in fact many years ago I used something called an "Ibike", now rebranded as the "Powerpod".  This device used similar principles of physics to estimate cycling power:

Unfortunately, it was a pain to calibrate and was not very accurate compared to conventional power meters.

Given my prior poor experience with the "Powerpod" and my personal avoidance of running (bad knees), I never paid much attention to Stryd.  However, my friend the cross country skier recently did his VO2 (near) max test while wearing the Stryd.  This gives us an ideal opportunity to see how well the Stryd tracks with VO2.  Why is this important?  With cycling power, we can simply test an established strain gauge based power meter (like the Powertap or Assioma) against something like the Powerpod that uses environmental sensors (speed, wind, acceleration).  The Stryd has no easy direct comparison device.  But, since work rate (power) is linearly related to VO2 (oxygen consumption), especially up to the second lactate threshold, we can check the Stryd against an alternate reliable benchmark.

Are there any validation studies? 
I searched through PubMed and found very little published work on the Stryd.  The company claims to have VO2 vs Stryd power testing results on their website, but of course that is open to skepticism.

We do have this comparison study:

Which does not look good at all.  A very weak relation of Stryd power to VO2 exists, hence their conclusion:
The main issue here is that the test was done with an early device, the Stryd Pioneer, which is a chest mounted triaxial accelerometer:

I almost wonder if they would have been better served not even producing this device.  

In response to the comment from the Stryd developers (see below), here is the reply from the authors of the above paper:

A different study looked at the newer Stryd from a different point of view.  They simply wanted to see how consistent the measure of power was on a treadmill at a constant velocity.  Theoretically, with no change in wind, incline or running dynamics, there should be a steady power output as in cycling on an indoor trainer at constant cadence.

From the study:
To the best of the authors’ knowledge, just one study has examined its validity and reliability (in this case, to measure spatiotemporal gait characteristics [9]), with no data to demonstrate the validity and reliability of this device for measuring power and related variables.
For this study, only two out of twelve metrics were used (running velocity and power output).
The agreement between measured intervals (watts) at the same treadmill speed was good.
For example the last graph showed near identical power over 0-120 vs 0-180 seconds:

With the conclusion:
The results show that power data during running, as measured through the Stryd™ system, is a stable metric with negligible differences, in practical terms, between shorter (i.e., 10, 20, 30, 60 or 120 s) and longer recording intervals (i.e., 180 s).
In summary, these results show that power output during running, measured through the Stryd™ system, is stable over time when velocity is constant and under controlled conditions, with no differences between different time intervals recorded during a 3-min run. Nevertheless, it is worth noting that the analysis conducted shows that longer recording intervals yield smaller systematic bias and narrower limits of agreement.  Therefore, if maximum accuracy is required (e.g., scientific approach), longer recording periods must be used (i.e., 2–3 min).
  • So one still needs to be careful with the shorter time recordings, but they are still reasonable.
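The "systematic bias" and "limits of agreement" in the quote are Bland-Altman style statistics.  As a rough sketch of how they could be computed when comparing a short recording window against the full 180 s, see below.  The function name and the power values are made up for illustration and are not taken from the study.

```python
import statistics

def bland_altman(short_means, long_means):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    diffs = [s - l for s, l in zip(short_means, long_means)]
    bias = statistics.mean(diffs)        # systematic bias
    sd = statistics.stdev(diffs)         # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical mean power (watts) for five runners at one treadmill speed
power_10s = [251.0, 238.0, 262.0, 245.0, 270.0]
power_180s = [249.0, 240.0, 260.0, 247.0, 268.0]

bias, lower, upper = bland_altman(power_10s, power_180s)
print(f"bias = {bias:.1f} W, limits of agreement = {lower:.1f} to {upper:.1f} W")
```

A narrower gap between the two limits means the short window agrees more closely with the long one, which is the pattern the authors describe for the longer intervals.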

We still don't have a published VO2 vs power study of the newer, foot mounted unit.  Thanks to my friend the cross country skier, we at least have data in one individual.

To start with, let's look at what VO2 vs cycling power looks like from my data.  This was using the Assioma Duo pedals (1% accuracy), with VO2 measured at the University of Florida Sports Performance Center.

Here is an example from my VO2 max testing - cycling power vs VO2 (oxygen usage):
(VO2 average from last 60 sec of interval, cycling power average from each 3 minute stage)

Note the last value falls off (an error in the gas exchange measurement), so we won't count it.  Otherwise, the curve is quite linear, with only a few watts deviation off the fit.
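To illustrate the averaging described above (VO2 from the last 60 seconds of a stage, power from the whole stage), here is a small sketch.  The function name `stage_means`, the one sample per second layout and the numbers are assumptions for the example, not the actual test data.

```python
def stage_means(samples, stage_len_s, tail_s=None):
    """Average (time_s, value) samples per stage.

    If tail_s is given, only the last tail_s seconds of each stage are used.
    """
    buckets = {}
    for t, value in samples:
        stage = int(t // stage_len_s)
        time_into_stage = t - stage * stage_len_s
        if tail_s is not None and time_into_stage < stage_len_s - tail_s:
            continue  # skip the early part of the stage
        buckets.setdefault(stage, []).append(value)
    return [sum(v) / len(v) for _, v in sorted(buckets.items())]

# Hypothetical VO2 trace (L/min), one sample per second, two 3 minute stages.
# VO2 sits lower for the first 2 minutes of each stage, then settles higher.
vo2 = [(t, (1.0 if t % 180 < 120 else 2.0) + t // 180) for t in range(360)]

whole_stage = stage_means(vo2, stage_len_s=180)
last_minute = stage_means(vo2, stage_len_s=180, tail_s=60)
print(whole_stage, last_minute)
```

Using only the tail of each stage avoids diluting the average with the period when VO2 is still climbing toward its steady value.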

How did the cross country skier do in regards to VO2 vs Stryd power?
Here is his treadmill test, plotting power over time with each stage encompassing 5 minutes, followed by a brief time between stages to do lactate testing, then resumption of running.  The trend of higher power per stage is very evident.

A close up of the last 2 stages shows no major lag from zero power to full treadmill speed. 

Therefore, "heart rate like lag" is avoided (from Kubios):

Finally, the all important VO2 vs power:
(VO2 average from last 60 sec, Stryd power from the entire 5 minute interval)

  • This looks every bit as close as my cycling ramp shown above!
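For readers who want to repeat this kind of check on their own stage averages, a least squares line and its R² are enough to judge how linear the VO2 vs power relation is.  The sketch below uses invented numbers (not the skier's data), and the helper name `fit_line` is just for this example.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical stage averages: running power (W) and VO2 (L/min)
power_w = [180.0, 210.0, 240.0, 270.0, 300.0]
vo2_l_min = [2.10, 2.46, 2.79, 3.16, 3.50]

slope, intercept, r_squared = fit_line(power_w, vo2_l_min)
print(f"VO2 = {slope:.4f} * power + {intercept:.3f}, R^2 = {r_squared:.4f}")
```

An R² close to 1 says the device tracks oxygen consumption in a consistent, linear way.  It says nothing about whether the absolute watt values are correct.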

Why do we need power when the heart rate to VO2 curve is just as good if not better?
After all, the heart rate to VO2 relation is very tight and predictable.
  • In this case, if he knew his heart rate, the VO2 would be easily extrapolated.
Why do we need power?
  • One problem with simple heart rate is measuring high intensity work rates during HIT.  For instance, if you do a 30 second burst interval at high power, the heart rate does not fully stabilize until the interval is over.  
  • In addition, if you train by pace/speed, knowing the speed is problematic given the state of GPS tracking accuracy.  
  • Many studies point to the importance of intensities beyond LT2/MLSS as critical for optimal training.  Those higher power zones can't be properly monitored by heart rate, especially in the early part of the stage.  
  • If one is an advocate of the "fast start" strategy, power is an essential way of achieving that type of session.  Careful power modulation is the only way to perform a fast start interval.
Where we don't necessarily need power monitoring is during the mind numbing, long, slow runs that should be kept below VT1.  In that case, heart rate settles quickly enough to be a perfectly acceptable way of keeping effort down.  However, in situations like hill and trail running, having running power may be of additional benefit for staying in the proper zone 1 range.

  • The current Stryd device appears to measure power at stable levels at constant treadmill speeds.  The older Pioneer device is not recommended.
  • According to a plot of VO2 vs power in one individual, the Stryd relative power is linear.  In other words, with a given percent boost in power, the percent oxygen consumption rises the same relative amount.  This indicates there is relative accuracy of the device.  Absolute accuracy is not easily testable with this method.
  • Usage with HIT and running intensities above MLSS would make sense since heart rate will lag well behind the effort.  Studies have shown the downsides to intervals at "threshold" intensity (MLSS).  Stryd usage may be a way to avoid excessive zone 2 activity.
  • Usage in recovery and zone 1 training is potentially useful especially with less than stable running conditions.  As discussed in previous posts, high volumes of zone 1 are essential to proper performance development.
Many thanks to my friend for sharing his data!

See also:
How to get your training zones from HRV and muscle O2 data
VT1 correlation to HRV indexes - revisited  

Friday, November 8, 2019

VT1 correlation to HRV indexes - revisited

In a previous post, the correlation of the first ventilatory threshold with various HRV indexes was discussed.  Since that post was based on my data only, was written before I knew my true VT1 by gas exchange and came early in my learning curve regarding HRV, I thought it might be interesting to take another look at the situation addressing these issues.  To that end, let's revisit the possibility of getting an idea of VT1 power or VT1 heart rate based on HRV.  The data will be mine and that of the cross country skier (on a treadmill, running).  We will focus on the following HRV indexes:
  • SDNN
  • SD1
  • HF power x HF freq 
  • DFA a1

The SDNN index is defined as:
The standard deviation of the IBI of normal sinus beats (SDNN) is measured in ms. "Normal" means that abnormal beats, like ectopic beats (heartbeats that originate outside the right atrium’s sinoatrial node), have been removed. While the conventional short-term recording standard is 5 min (11), researchers have proposed ultra-short-term recording periods from 60 s (30) to 240 s (31).
If the RR intervals are artifact and arrhythmia free, this equates to calling this SDRR - standard deviation of RR intervals over a time period.  
Note that this is a time domain index, compared to HF power, which is frequency domain, with SD1 and DFA a1 both in the non linear category.
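In code terms, SDNN over a window is simply the standard deviation of the RR intervals in that window.  Here is a minimal sketch, assuming the intervals are already artifact free and given in milliseconds.  The sample standard deviation is used here, which is an assumption about the convention, and the RR values are invented.

```python
import statistics

def sdnn(rr_ms):
    """Standard deviation of RR intervals (ms) over the given window."""
    return statistics.stdev(rr_ms)  # sample SD (n - 1 in the denominator)

# Hypothetical RR intervals in milliseconds
rr_easy = [800, 810, 790, 805, 795]   # wide spread, low intensity
rr_hard = [400, 401, 399, 400, 400]   # narrow spread, high intensity

print(f"SDNN easy: {sdnn(rr_easy):.2f} ms, SDNN hard: {sdnn(rr_hard):.2f} ms")
```

The wider the spread of RR values in the window, the higher the SDNN.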

SDNN as a measure of VT1 detection was looked at by Karapetian et al in 2007

The testing consisted of 3-min stages [17, 32], allowing more stability of RR intervals, and began with the subject resting on the bike, with no pedaling as a baseline rest. Pedaling began at the second stage, at which exercise intensity started at 25 W. Every 3 min, intensity increased at 25-W increments. Subjects were instructed to maintain a cycling speed of 50 revolutions per min. Exercise test time ranged from 15 to 35 min. The RR intervals from the last 2 min of rest and each stage of exercise were used for analysis of HRV.

To determine the HRVT, the MSD and SD of heart rate intervals for each stage of exercise were graphically plotted against work rate (Fig. 1 and 2, respectively). Then, in a manner similar to the determination of LT, a visual interpretation was made to locate the point at which there was no further decline in HRV, thus indicating vagal withdrawal. Thus, this HRV deflection point was defined as the HRVT.

This resulted in a plot of work rate vs SDNN, second figure on the right.  The flattening of the curve was the recorded deflection point:
Interestingly they found a difference between LT1 and VT1 that was statistically significant:
The mean difference between VO2 vt and VO2 lt was small (0.12 L/min) but statistically significant (p < 0.05); however, a strong linear relationship was observed (r = 0.89).
SDNN deflection vs LT1:
The results for the determination of HRVT during incremental exercise testing, using RR interval data, showed similarities in VO2 (L/min) values between LT detection and HRVT detected by HRV deflection point. The mean difference between VO2 hrvt and VO2 lt was nonsignificant (p > 0.05), and shows no bias between mean values. A strong correlation between VO2 values for LT and HRVT was observed (r = 0.82). The mean VO2 hrvt–VO2 lt was 1.40 ± 0.46 (L/min).
SDNN deflection vs VT1:
The mean difference between VO2 hrvt and VO2 vt was also nonsignificant (p > 0.05), and shows no bias between mean values. A strong linear relationship between VO2 hrvt and VO2 vt was also detected (r = 0.89), where the mean VO2 hrvt–VO2 vt was 1.46 ± 0.46 (L/min).
  • Therefore, from this study, SDNN seems a valid HRV index to explore for VT1/LT1 association.
  • They also found a disparity between VT1 and LT1 power.
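The study located the deflection point by visual inspection.  If one wanted a crude numerical stand-in, one option is to look for the first stage after which SDNN no longer drops by a meaningful amount.  The sketch below does that.  The function name, the `min_drop` tolerance of 1 ms and the stage values are all hypothetical choices for illustration, not the method used in the paper.

```python
def hrv_deflection(work_rates, sdnn_values, min_drop=1.0):
    """Return the work rate of the first stage after which SDNN stops declining.

    A stage qualifies when no later stage is min_drop (ms) or more below it.
    Returns None if SDNN keeps falling through the last stage.
    """
    for i in range(len(sdnn_values) - 1):
        later = sdnn_values[i + 1:]
        if all(sdnn_values[i] - value < min_drop for value in later):
            return work_rates[i]
    return None

# Hypothetical 25 W stages with SDNN (ms) from the last 2 minutes of each stage
watts = [25, 50, 75, 100, 125, 150, 175]
sdnn_ms = [40.0, 28.0, 18.0, 10.0, 5.0, 4.5, 4.2]

print(hrv_deflection(watts, sdnn_ms))
```

A real tracing is noisier than this, so the tolerance would need tuning, and a visual check of the curve is still worthwhile.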

More on what SDNN signifies:
How does SDNN look graphically at low and high intensities?
This is a distribution of RR values from Kubios of a 2 minute window while coasting before a 5 minute maximal interval:

  • Notice the wide area of distribution of the values of RR intervals (in red).

There is a more narrow RR value spread with moderate effort:

And finally, a very restricted set of RR values at high intensity (this is the 5 minute maximal interval):

In a prior post, SD1 was covered and I would refer back to those studies for details.  An early study not reviewed previously was done by Tulppo et al showing correlation of the VT1 with the SD1 index:

With SD1 dropping from rest to VT1 with minimal drop thereafter:
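For reference, SD1 is the short axis of the Poincaré plot and can be computed directly from the successive RR differences, since SD1² is half the variance of those differences.  A minimal sketch follows, assuming clean RR intervals in milliseconds and the sample variance (the choice of sample vs population variance is an assumption here).  The RR values are invented.

```python
import math
import statistics

def sd1(rr_ms):
    """Poincare SD1 (ms): sqrt of half the variance of successive RR differences."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(statistics.variance(diffs) / 2.0)

# Hypothetical RR intervals in milliseconds
rr_rest = [800, 810, 790, 805, 795]
rr_steady = [500, 500, 500, 500, 500]  # no beat to beat change at all

print(f"SD1 rest: {sd1(rr_rest):.2f} ms, SD1 steady: {sd1(rr_steady):.2f} ms")
```

Because SD1 depends only on beat to beat changes, it reflects the fast (vagal) component of variability, which is why it is expected to fall as intensity rises toward VT1.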

HF power and HF peak frequency:
The HF power and HF frequency were used by Cottin et al to find VT1.  They also concluded that in some subjects whose VT1 could not be detected by HF power alone, the product of HF power x HF frequency peak uncovered the breakpoint:

Individual examples of two subjects data:

The cross country skier underwent a partial VO2 max test recently.  I say partial, in that the test was not done to total exhaustion and the peak lactate was 4.5 mmol.  Therefore, the metabolic cart software could not spit out its usual results such as VT1 or LT1.  Before looking at the HRV index correlations, we will need to get the VT1 and LT1 ourselves (putting into practice some of the concepts covered in other posts).

Here are the raw values of the ramp.  The protocol was 5 minute per stage, increasing by 1 km/hr with both gas exchange and Hexoskin vest as monitoring units.

Here is my attempt to get the VT1 by the V slope technique.  

First a plot of VO2 vs heart rate was done to confirm that the relationship is linear:

It looks great, with a nice linear regression.  As discussed before, this relationship is the foundation of Garmin's attempt at the "performance condition" metric which should be valid but will be subject to error if the comparison is of a session with different ambient temperature, altitude, hydration status or cardiac drift.

Next - a plot of VO2 vs VCO2.  The V slope technique is where we pick two different linear sets of values, draw the lines and see where they intersect.  The underlying idea is that the linear relation shifts after the VT1:

As the legend says, the fit lines were ended/started at a VO2 of about 3 L/min.  The exact intersection was 3.08 L/min which corresponded to a heart rate of 155 bpm on the first graph (HR vs VO2).
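The intersection itself is simple algebra once the two lines are fitted.  Here is a sketch of that step, assuming the split point between the lower and upper sets of values is chosen by eye (as was done here at about 3 L/min).  The data points are synthetic and built to cross at exactly 3.0 L/min, so they are not the skier's values.

```python
def fit_line(x, y):
    """Least squares fit, returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def v_slope_intersection(vo2, vco2, split_vo2):
    """Fit one line at or below split_vo2, one at or above, return the VO2 where they cross."""
    low = [(x, y) for x, y in zip(vo2, vco2) if x <= split_vo2]
    high = [(x, y) for x, y in zip(vo2, vco2) if x >= split_vo2]
    m1, b1 = fit_line(*zip(*low))
    m2, b2 = fit_line(*zip(*high))
    # m1 * x + b1 = m2 * x + b2  ->  x = (b1 - b2) / (m2 - m1)
    return (b1 - b2) / (m2 - m1)

# Synthetic stage values (L/min): slope 0.9 below 3.0, slope 1.3 above
vo2 = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
vco2 = [1.35, 1.80, 2.25, 2.70, 3.35, 4.00]

vt1_vo2 = v_slope_intersection(vo2, vco2, split_vo2=3.0)
print(f"VT1 at VO2 = {vt1_vo2:.2f} L/min")
```

The VO2 at the intersection can then be converted to a heart rate using the linear HR vs VO2 fit from the first graph.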

LT1 determination as per the Newell formula:
The LT via log log corresponds to a heart rate of 147 bpm.  The accuracy may be affected by not having a lactate higher than 4.5 mmol on the ramp and instead, using another value of 5 mmol during a separate 5 minute constant 15 km/hr interval later during the testing.  Regardless, the LT1 and VT1 are seldom at the same work rate which is part of the problem of training intensity exercise prescriptions.
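The log-log approach is usually described as follows: take the log of both work rate and lactate, fit two straight line segments, and use the split that gives the smallest total squared error.  The sketch below is a simplified version under that assumption.  It is not the Newell software, and the speeds and lactate values are synthetic, built so that the break sits at 12 km/hr.

```python
import math

def fit_line(x, y):
    """Least squares fit, returns (slope, intercept, sum of squared errors)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, sse

def log_log_threshold(work, lactate):
    """Try every split (at least 2 points per segment), return work rate at the best break."""
    lx = [math.log(w) for w in work]
    ly = [math.log(l) for l in lactate]
    best = None
    for k in range(2, len(lx) - 1):  # k points in the first segment
        m1, b1, e1 = fit_line(lx[:k], ly[:k])
        m2, b2, e2 = fit_line(lx[k:], ly[k:])
        if m1 == m2:
            continue  # parallel lines never cross
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, math.exp((b1 - b2) / (m2 - m1)))
    return None if best is None else best[1]

# Synthetic ramp: flat lactate up to 12 km/hr, then a steep rise
speeds = [9, 10, 11, 12, 13, 14, 15, 16]
lactate = [1.0 if s <= 12 else (s / 12) ** 5 for s in speeds]

lt1_speed = log_log_threshold(speeds, lactate)
print(f"LT1 at about {lt1_speed:.1f} km/hr")
```

With only a handful of stages and a low peak lactate, the second segment has few points to work with, which is one reason the result here should be treated with caution.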

Now that the LT1, VT1 are known, how do they match up with the HRV indexes? 

Here is a tracing of VO2 vs SDNN, with a marker at the corresponding VT1:

And Heart rate vs SDNN with the range of VT1/LT1 in red:
  • The SDNN deflection point conveniently fits right between the VT1/LT1!

My data (5 minute ramp stages, 2 minute Kubios window calculations):

  • The SDNN deflection point yet again falls between the VT1, LT1 as in the cross country skier example.

SD1 examples:
Cross country skier (5 minute ramp stages, 2 minute Kubios windows at the end of each stage):
  • The SD1 curve "breakpoint" occurs within the range of the VT1 to LT1 zone.
  • No further change occurs after that heart rate.

My ramp (5 minute stages, 30w per stage, 2 minute Kubios windows):
  • Here things are not so clear.
  • There is no pattern, no breakpoint.
  • The dynamic range of SD1 is very restricted, 2.1 to 2.7.  In the XC skier example, he starts above 6 and the nadir is around 3.  For some reason I don't have high values even at low power.  On other tracings this pattern is the same:

Another look at SD1:
The following is from my VO2 max test (3 minute stages, 2 minute Kubios windows).  Perhaps SD1 will now show a pattern.

  • Same issues, no pattern and restricted range of values.

The possibility of having an SD1 rise at even trivial levels of effort (for whatever reason) was explored.  I recently traveled to North Carolina where the roads were either going up or down, with nothing flat.  Therefore, the efforts were either above VT1 power or coasting downhill with no pedaling.  

As you can see in the following SD1 tracing (2 minute windows), the SD1 is capable of increasing to "normal" levels (value=8) with a rise during each episode of coasting:

The bottom line is that SD1 may be quite valid in some individuals for VT1/LT1 demarcation, however in my case it was not.  In my example, SD1 drops very early with super low intensity efforts, making one wonder if it may signify another physiologic/biochemical threshold that in most subjects takes place near the VT1.

HF power x HF peak freq:
Using the metric derived by Cottin et al, this is the XC skier data:
Raw data

The plot:
  • Both the raw data (HF peak freq, HF power and product of the two) and plots appear to have no correlation to the work rate - and certainly not the VT1/LT1.

This is a plot of HF power in the time varying feature of Kubios, since there may be a bug in the software, giving inaccurate HF power results as "m2" (see below):
  • There still appears to be no relation of heart rate to HF power in normalized units.

My data:
Here is my HF peak freq x HF power vs cycling power:
 Based on the raw data:

Since this did not make any sense given the published study, I went back into Kubios and looked at the time varying curve for HF power in normalized units:
The grey colored area is the normalized HF power which clearly rises at the VT1 and plateaus at the VT1 +30 watts.  There may be a bug in the Kubios software, since the single display values do not show this.  I have reported it and will update this post as needed.

HF peak x power validity:
This is much murkier than the SDNN/SD1 and I don't know if it's related to a Kubios bug, my failure to calculate the index correctly or simply a less than universal index.  Remember why the authors derived an additional formula in the first place: some subjects did not have a proper HF power relation.
Furthermore, Blain et al. have demonstrated that it was possible to detect both VTs from fHF (TRSA) when TRSA1 was corresponding to VT1 and TRSA2 was corresponding to VT2. However, in 20% of their data it was not possible to detect VTs
Given these issues, it may not be the best choice for VT1 derivation.

DFA a1
I have already presented my data as well as the cross country skier in a previous post.  You may notice that the XC skier had a loss of RR complexity at a much lower heart rate (in that post) than during the VO2 max test below.  Part of that differential may be because the VO2 max test was done at altitude, whereas the previous ramps were not.  

How does the DFA a1 tracing look during the VO2 max test done at altitude?

  • The plot shows that the heart rate corresponding to a DFA a1 of .7 is about 145, just below LT1.
  • A DFA a1 of .5 (white noise) is reached at the stage above the VT1.  Avoiding exercise associated with this index value should provide sessions at or below the VT1.
  • The relationship of complexity loss with work rate is present with the heart rate shifted up about 10 bpm, presumably from altitude. 
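For anyone curious what goes into the DFA a1 number, here is a bare bones sketch of detrended fluctuation analysis.  It assumes the short term scaling exponent is taken over box sizes of 4 to 16 beats with linear detrending in non-overlapping boxes.  Those settings are an assumption for this example rather than something stated in this post, and Kubios applies its own preprocessing, so values will not match exactly.  The input here is random noise, not real RR data.

```python
import math
import random

def fit_line(x, y):
    """Least squares fit, returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def dfa_alpha1(rr, box_sizes=range(4, 17)):
    """Slope of log F(n) vs log n, with linear detrending in boxes of n beats."""
    mean_rr = sum(rr) / len(rr)
    profile, total = [], 0.0
    for value in rr:                      # integrate the mean-subtracted series
        total += value - mean_rr
        profile.append(total)
    log_n, log_f = [], []
    for n in box_sizes:
        xs = list(range(n))
        sq_resid, count = 0.0, 0
        for start in range(0, len(profile) - n + 1, n):
            box = profile[start:start + n]
            slope, intercept = fit_line(xs, box)
            sq_resid += sum((b - (slope * x + intercept)) ** 2 for x, b in zip(xs, box))
            count += n
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq_resid / count))  # log of RMS fluctuation
    return fit_line(log_n, log_f)[0]

# Uncorrelated noise around 800 ms: expect a value in the vicinity of 0.5
rng = random.Random(1)
white = [800 + rng.gauss(0, 30) for _ in range(1000)]
print(f"DFA a1 of white noise: {dfa_alpha1(white):.2f}")
```

This is why a reading near .5 is described as white noise: the beat to beat pattern has lost its correlation structure.  A perfectly constant series would make the log undefined, so real use needs a guard for that case.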

Comparison between SDNN and DFA a1 during and after a 5 minute interval at VO2 max power:
Since SDNN and DFA a1 seem to be the most reliable and consistent of the markers tested for delineating low intensity exercise, here is a comparison of each side by side, before, during and after a 5 minute interval at VO2 peak power (watt level derived from my last and highest stage on the VO2 max test).  After a 5 minute interval at that high power, the following 5 minutes were done at about 20 watts above the VT1, then a brief active rest for 3 minutes then another 6 minutes at 173w (at VT1):

The RR intervals were extracted from the Hexoskin data, raw values and tracing below:

  • Although the scaling is different, the patterns are remarkably similar (including overlap of values).  Initial data values during coasting are high, typical of at rest readings.  At the end of the 5 minute max interval, DFA a1 and SDNN are at historic lows, then rise somewhat but are still in the area above VT1/LT1.  Both indexes still don't recover fully 15 minutes later but are getting closer to the usual values at VT1.
  • Both DFA a1 and SDNN appear to be reasonable markers for VT1/LT1 transition and behave in a parallel fashion over an undulating set of intervals.