Tuesday, February 20, 2024

HRVTs: Agreement with Established Approaches and Reproducibility, MSSE 2024

The field of HRV threshold research has progressed over the past several years.  We have learned quite a bit, but some questions still remain.  One issue that continues is just how well the first threshold, HRVT1, agrees with established measures such as lactate and gas exchange.  In addition, the repeatability of these thresholds is quite important for everyday use.  Today, our newest work was released in one of the most well known sports medicine journals, MSSE.  The manuscript stands on its own merits and I encourage you to read through it (I'm including the intro and discussion below).

I want to thank Juan Murias, the team at Calgary and his grad student, Pablo Fleitas-Paniagua, whose hard work made this possible.  What started as some email correspondence with Juan many years ago has blossomed into some very cool research into a series of cutting edge topics.  This has been a fun project and a valuable learning experience on my end.


An important consideration for endurance exercise training and fitness intervention is the proper delineation of the boundaries separating exercise intensity domains (i.e., moderate, heavy, and severe) (1–5), and the various exercise intensity zones that can be compartmentalized within them. To that end, efforts are continuously made to circumvent reliance on the more invasive, costly, and laboratory-based testing modalities such as gas exchange and lactate blood testing (6–9).
Detrended fluctuation analysis alpha 1 (DFA a1) is an index of heart rate variability (HRV) that has gathered interest as a means to demarcate the boundaries separating the moderate, heavy, and severe exercise intensity domains (9). Unlike other well-established practices for threshold determination that require gas-exchange, ventilatory, and lactate evaluations during incremental tests, or the performance of relatively long and/or exhaustive trials (i.e., maximal lactate steady state and critical power) (10–14), these boundaries can be estimated by analyzing DFA a1 using inexpensive wearable devices and software applications (15–18). This makes the use of HRV-derived thresholds (HRVT) suitable for exercise training prescription and monitoring for the large population unable to utilize “gold standard” methods. Although several publications have found good group agreement of these HRVTs with either gas exchange or lactate-based measures (7, 8, 16–19), others have not and have shown large degrees of bias in the moderate to heavy boundary (7, 20, 21). Even when group mean measures are in good agreement, individual levels of bias may remain high. To help reduce individual bias, recent attempts have been made to improve the accuracy of surrogate threshold determination by combining HRVT with other physiological variables such as breathing frequency (20) and near-infrared spectroscopy-derived thresholds (8).
Similar to other HRV indexes studied during an incremental test (22, 23), the DFA a1 index decreases as exercise intensity increases. A value of 0.75 (HRVT1) has been associated with responses corresponding with those observed at the gas exchange threshold (GET) or the first lactate threshold (LT1) (16, 17, 19). Although HRVT1 has been reported as an effective tool to demarcate the moderate-heavy exercise intensity boundary during incremental tests on the treadmill (17) as well as during cycling exercise in male participants (19), other studies have not been as clear (7, 20, 21). Careful evaluation of the results seen in a study comparing different ramps (7), another involving cycling (20) and one composed of only female participants (21), revealed an overestimation of the HRVT1 in comparison to the GET. As the intensity of the exercise increases above the GET, DFA a1 continues to decline with a value of 0.5 (HRVT2) being strongly associated with the respiratory compensation point (RCP) (7, 8, 18, 20, 21) or second lactate threshold (19), which can be used to identify the metabolic rate demarcating the boundary between the heavy and the severe exercise intensity domains (24, 25). Studies that have investigated HRVT2 during incremental exercise on a treadmill (18) and on a cycle-ergometer in females and males (8, 20, 21) have shown a strong agreement between HRVT2 and the boundary separating heavy from severe intensity exercise. Additionally, a recent study has shown during three different ramp incremental rates (15 W·min-1, 30 W·min-1, and 45 W·min-1) that the heart rate (HR) and the oxygen consumption (V̇O2) at which HRVT1 and HRVT2 occur were consistent across ramps in both female and male participants, independently of the intensity increase rate (7).
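To make the fixed cut-offs concrete, here is a minimal sketch of how a heart rate could be read off at DFA a1 = 0.75 (HRVT1) and 0.5 (HRVT2). It assumes a straight-line fit of DFA a1 against HR over the declining part of the curve. The function name `hr_at_threshold`, the fit range of 0.5 to 1.0, and the data are illustrative assumptions, not the procedure used in the manuscript.

```python
import numpy as np

def hr_at_threshold(hr, a1, threshold, fit_range=(0.5, 1.0)):
    """Estimate the HR at which DFA a1 crosses `threshold`.

    Assumption: a1 falls roughly linearly with HR inside `fit_range`,
    so a first-order fit over those points can be solved for HR.
    """
    hr = np.asarray(hr, dtype=float)
    a1 = np.asarray(a1, dtype=float)
    # keep only the points inside the assumed near-linear region
    mask = (a1 >= fit_range[0]) & (a1 <= fit_range[1])
    slope, intercept = np.polyfit(hr[mask], a1[mask], 1)
    # invert a1 = slope * hr + intercept
    return (threshold - intercept) / slope

# Hypothetical incremental test: a1 drops by 0.01 per bpm
hr = np.arange(100, 175, 5)
a1 = 2.0 - 0.01 * hr
hrvt1_hr = hr_at_threshold(hr, a1, 0.75)  # HR at DFA a1 = 0.75
hrvt2_hr = hr_at_threshold(hr, a1, 0.50)  # HR at DFA a1 = 0.5
```

The same inversion works with V̇O2 in place of HR if V̇O2 is time-aligned with the DFA a1 windows.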
Despite the practical value of using HRVTs, the reasons for some of the inconsistent results indicated above remain unknown. Thus, to confidently use DFA a1 derived thresholds to demarcate exercise intensity domains, more research is needed to establish the validity and reproducibility of DFA a1 as a proxy to estimate physiological thresholds (9). Additionally, to date, no study has examined the reproducibility of identical incremental protocols of HRVT measures. This is a critical aspect because consistent responses are needed for HRVT estimations to be useful in practical settings. Therefore, given the conflicting results concerning HRVT1 and GET/LT1 agreement, a study with larger sample sizes of both male and female participants, and consideration of both lactate and gas exchange thresholds would be helpful.
The present study aimed to determine i) the agreement between GET, LT1 and HRVT1; ii) the agreement between RCP, LT2 and HRVT2 and; iii) the reproducibility of HRVT1 and HRVT2 determination during 2 min incremental step protocols in female and male trained individuals. We hypothesize that, based on prior studies (7, 20, 21), the HR and V̇O2 at the HRVT1 may be overestimated compared to those at the GET/LT1, but that the HR and V̇O2 at the HRVT2 would show excellent agreement to the RCP/LT2. Moreover, based on the previous observation that individual HRVTs performed using ramp slopes between 15 to 45 W·min-1 showed excellent correlation and agreement (7), we hypothesize that HRVT1/2 reproducibility during similar step-tests would be high as well.

The aim of this study was to explore the strength of agreement between the HRVT1 with the GET/LT1, the HRVT2 with the RCP/LT2, as well as to assess the reproducibility of HRVT1 and HRVT2 across two identical testing protocols. The main findings were that: i) there was a strong agreement between the V̇O2 and HR reached at the HRVT2 compared to the values observed at the RCP/LT2; ii) there was significant bias in the V̇O2 and HR seen at the HRVT1 compared to the values obtained at the GET/LT1; iii) regardless of whether HRVT1 or HRVT2 agreed with recognized “gold standard” measurements, the reproducibility of individual HRV related thresholds was good.
HRVT2 and RCP/LT2 agreement.
The strong HRVT2 agreement with established methods to determine the heavy-severe intensity boundary, such as the RCP and the LT2, reinforces results from previous investigations using both cycling and running test modalities (8, 18, 20, 21). Group bias of the HRVT2 compared to the RCP was negligible (i.e., < ±1 bpm for HR and < ±1 mL∙kg-1∙min-1 for V̇O2), with LOAs of ~15 bpm and 9 mL∙kg-1∙min-1 and no meaningful differences when stratified by sex (Figure 3). The HRVT2 agreement to the LT2 was also high (i.e., < ±1 bpm for HR and < ±1 mL∙kg-1∙min-1 for V̇O2), with LOAs of ~17 bpm and 10 mL∙kg-1∙min-1, which also displayed no meaningful differences between females and males (Table 2). Furthermore, moderate to strong correlations of HR and V̇O2 at the HRVT2 to the HR and V̇O2 at the RCP (r > 0.65) and the LT2 (r > 0.50) were found. The confidence intervals within the limits of agreement confirm equivalence for V̇O2-HRVT2 and HR-HRVT2 for both females and males. Although the biases were small and the correlations high between the HRVT2 and well-established outcomes of the threshold separating the heavy and severe intensity domains (i.e., the RCP and the LT2), one should still consider the moderately wide LOA of either V̇O2 or HR, which could make exact individual participant threshold identification challenging. As previously proposed, using a combination of non-invasive, consumer grade device-based signals such as HRV, near-infrared spectroscopy deoxyhemoglobin, and/or breathing frequency (e.g., HRVT2, [HHb]BP, and EDRT2, respectively) for the estimation of the heavy-severe boundary of exercise appears to lead to more precise delineation of this threshold (8, 20). Further, it should be considered that identification of exercise intensity thresholds is not absolutely precise, even when using the commonly accepted gold-standard evaluations. For example, gas exchange and ventilatory parameters are known to have noise in the signal of ~100-150 mL∙min-1 (38, 39), and to rely on the interpretation of an experienced evaluator (with typically at least two evaluators contributing to the analysis) (3). Lactate derived thresholds also include limitations for the analysis that can be impacted by the approach used for their estimations (e.g., visual inspection, mathematical model, etc.) (31, 40), and by the amplitude of the steps selected for the evaluation. Then, given the uncertainties that even the so-called “gold-standard” approaches carry, and considering the negligible bias in HR and V̇O2 and the high correlations reported in this study, the HRVT2 can be considered a viable low-cost, non-invasive surrogate to approximate the RCP or LT2.
HRVT1 and GET/LT1 agreement.
The HRVT1 agreement to the GET/LT1 was considerably weaker than that of the HRVT2 to the RCP/LT2. The mean bias for the GET was significant and relatively high in females and males for both HR and V̇O2 (i.e., > ±11 bpm and > ±5 mL∙kg-1∙min-1) (Figure 2). This was similar for the HRVT1 to LT1 agreement, which showed a significant bias in both females and males that was greater than 14 bpm (Table 2). Similarly, the LOA for HRVT1 in relation to the GET for both HR and V̇O2 were large (>22 bpm and 9 mL∙kg-1∙min-1, respectively), and the correlations were also weaker than those seen for the HRVT2 metrics. Thus, unlike what was observed for the HRVT2 in relation to the RCP and LT2, the viability of the HRVT1 as a low-cost, non-invasive surrogate to the GET or LT1 is not evident based on the present data. The positive bias observed for the HRVT1 (i.e., greater HR and V̇O2) in relation to the GET has been noted in several other reports (7, 20, 21). In a study exploring HRVTs in female participants, the positive bias in the HRVT1 was speculated to be a consequence of sex hormone differences. However, in the current report, although the positive bias seen in the female cohort was similar to the previous study in women (21), the group bias in males was even larger, which is at odds with the idea postulated above. Nevertheless, contrary to the current study, Mateo March et al. (19), using a similar cycling model with LT1 determination criteria, the same HRV interpretation software (Kubios), and recording device (Polar H10), found strong agreement and negligible bias in the HRVT1 compared to the LT1. Strong agreement with minimal bias was also seen in smaller studies using both the Polar H10 (16, 41) and ECG recording devices (17, 42).
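For readers less familiar with the bias and LOA figures quoted in these two sections, the sketch below shows the usual Bland-Altman calculation: the mean of the paired differences, plus or minus 1.96 standard deviations. The function name and the heart rate values are hypothetical, and the manuscript's exact statistical settings are not restated here.

```python
import numpy as np

def bland_altman(test, reference):
    """Return (bias, lower LOA, upper LOA) for paired measurements."""
    diff = np.asarray(test, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical HR (bpm) at HRVT1 and at the GET for four participants
hr_hrvt1 = [150, 160, 170, 180]
hr_get = [140, 152, 158, 170]
bias, loa_low, loa_high = bland_altman(hr_hrvt1, hr_get)
# A positive bias means the HRVT1 sits at a higher HR than the GET
```

A small group bias with wide limits, as reported for the HRVT2, means the method is right on average but can still miss for a given individual.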

Explaining the discrepancy.
The reasons for the ambiguities noted in the literature investigating HRVT1 are not easily explained. The strong agreement observed when assessing HRVT1, HRVT2, GET, and RCP in three different ramps and using the same methodology for threshold detection might indicate that there is a tight physiological mechanism behind them. Studies where GET was assessed based on the concept of a change in ventilation at the beginning of the isocapnic buffering (Figure 1A) related to an increase in [La-]b (Figure 1B) found an overestimation of HRVT1. The discrepancy in the results shown by different research groups might be due to the different incremental protocols and analysis models used to estimate thresholds. The rate of increase in workload can affect the dynamic profiles of some physiological responses such as V̇O2 and [La-]b (40, 43). Further, the criteria or mathematical model used to evaluate exercise thresholds can also affect the V̇O2 or HR assigned to each of them (31). For instance, methods such as excess carbon dioxide detect a later breakpoint related to the change in the V̇E/V̇CO2 relationship (Figure 1A) (44). Several potential factors have been noted to affect DFA a1 measurement including artifact correction methodology (33), recording device bias (33), ECG vs chest strap sensor pad placement and HRV preprocessing/detrending (9, 45). Although recording devices have varied per published report, it is apparent that studies using the identical device (Polar H10), HRV software (Kubios HRV), and working off low artifact data have shown markedly different HRVT1 agreements to what is reported here (19). To explore other considerations, an examination of the logic behind the HRVT1 concept is in order. The HRVT1 was originally founded on the notion that cardiac beat to beat correlation patterns decline as exercise intensity rises, reaching certain benchmark values that denote the GET, then eventually, the RCP.
DFA a1 is a dimensionless number representative of the fractal complexity and correlation of the cardiac beat series and is primarily determined by autonomic nervous system balance. After reviewing past studies concerning DFA a1 decline with increasing work rates, it was observed that index values were well above 1.0 at very low intensities, moved through a “partially” correlated zone at moderate intensities, passed the “uncorrelated” value of 0.5 near the heavy/severe intensity boundary, and finally reached values below 0.5, signifying an “anticorrelated” pattern at severe intensities (15).  It was originally theorized that at the GET/LT1, the cardiac beat pattern would be found in an intermediate zone between well correlated (DFA a1 > 1) and uncorrelated (DFA a1 = 0.5) behavior, which was set to 0.75 (17). Since the HRVT1 was based on a hypothetical midpoint, it can be argued that some individuals may have higher baselines and hence altered midpoints from the fixed 0.75. Although not done in the current data set, recalculation based on either adjusted midpoints or back calculation from the GET to derive “personal” DFA a1 HRVT1 values should be further explored. As seen in Figure 2, utilizing the HRVT1 boundaries for training purposes would have led to inappropriate training targets in the majority of individuals, illustrating the need to better define the HRVT1.
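As a rough illustration of what the index measures, the following is a bare-bones DFA a1 sketch over box sizes of 4 to 16 beats. It is not the Kubios implementation: it skips artifact correction and the detrending preprocessing mentioned elsewhere in this post, uses non-overlapping boxes, and the function name `dfa_alpha1` is made up for this example.

```python
import numpy as np

def dfa_alpha1(rr_ms, n_min=4, n_max=16):
    """Short-term DFA scaling exponent of an RR interval series (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    profile = np.cumsum(rr - rr.mean())  # integrated, mean-centred series
    scales = np.arange(n_min, n_max + 1)
    fluct = []
    for n in scales:
        n_win = len(profile) // n
        # non-overlapping boxes of n beats, one box per column
        segs = profile[: n_win * n].reshape(n_win, n).T
        t = np.arange(n)
        coeffs = np.polyfit(t, segs, 1)  # one straight line per box
        trend = np.outer(t, coeffs[0]) + coeffs[1]
        fluct.append(np.sqrt(np.mean((segs - trend) ** 2)))
    # a1 is the slope of log F(n) against log n
    slope, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return slope

# Synthetic check: uncorrelated beats vs a strongly correlated random walk
rng = np.random.default_rng(0)
a1_white = dfa_alpha1(rng.normal(800, 30, 2000))
a1_walk = dfa_alpha1(800 + np.cumsum(rng.normal(0, 5, 2000)))
```

Uncorrelated noise lands in the neighbourhood of 0.5 (a little higher at these short box sizes), while the random walk gives a much larger exponent, which matches the "uncorrelated" versus "well correlated" language above.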
In contrast, virtually all studies to date show strong agreement of the RCP to the HRVT2 (8, 18, 20, 21). The weaker agreement of the HRVT2 to the LT2 seen in one report could be attributed to a variation in LT2 definition (19). As opposed to the HRVT1, the reasoning behind the HRVT2 (a DFA a1 value of 0.5 corresponding to the RCP), is more straightforward. The value of 0.5 represents a random, uncorrelated beat pattern with loss of fractal properties (46). A potential functional objective of these correlation rhythm patterns may be a physiologic optimization and/or stabilization strategy to best suit internal load requirements (47). As DFA a1 values drop below the uncorrelated boundary (below 0.5), they enter an anticorrelated range. This behavior may denote a condition of maximum organismic energy flow at the cost of cardiovascular self-regulation and can only be maintained for short time spans (48). In view of these considerations, the HRVT2 association to a discrete mathematical concept of cardiac regulation differs from the observational “midpoint” nature of the HRVT1 and therefore may display less individual variability.

HRVT reproducibility.

When depending upon a given surrogate value for intensity threshold purposes, not only is it important to have good group and individual agreement, but it is imperative that the test is reproducible when done on a repeated basis. Thus far, no studies evaluating reproducibility of DFA a1 related HRVTs have been published. Even with a four-month gap between sessions, the results seen here indicate a good to excellent degree of reliability based on two repeat tests with an ICC > 0.86, Pearson’s r > 0.79 for HRVT1, an ICC > 0.81, Pearson’s r > 0.68 for HRVT2, and no difference between mean values according to paired t testing (p > 0.05).  These results somewhat mimic those seen in the comparison of HRVTs derived from cycling ramps of different slopes (7). That analysis showed very strong correlation with negligible bias of both the HRVT1 and HRVT2 over ramp slopes of 15, 30 and 45 W·min-1. Based on the present results and those of the cycling ramp slope report, it appears that HRVT1 and HRVT2 are highly reproducible and repeatable even when a significant bias is present as in the case of the HRVT1 in relation to the GET/LT1. Therefore, if an athlete was able to calculate a personal HRVT1 and HRVT2 against an established method (GET, RCP, LT1, LT2, or others), they potentially could use HRVT boundaries for training intensity distribution and longitudinal physiologic benchmarks. Additionally, since deriving the HRVTs may not require exercise testing to exhaustion, incorporating brief testing sessions for the purpose of monitoring fitness status or assessing autonomic fatigue may be practical.
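The reliability statistics above can be sketched as follows. The text here does not state which ICC form was used, so this example assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement). The function name and the test-retest values are hypothetical.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1) for an (n participants x k tests) array."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between participants
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between tests
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical HR at HRVT1 (bpm): test 2 reads exactly 10 bpm higher
test1 = np.array([150.0, 160.0, 170.0])
test2 = test1 + 10.0
r = np.corrcoef(test1, test2)[0, 1]
icc = icc_2_1(np.column_stack([test1, test2]))
```

In this toy case Pearson's r is 1 because the ranking is preserved, while the absolute-agreement ICC drops to about 0.67 because of the systematic 10 bpm shift. That is why the two statistics are worth reporting together.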
Experimental considerations
Discussions concerning HRVT and DFA a1 methodological limitations have been reviewed previously (9). Considerations include proper software preprocessing (detrending), low artifact containing data, optimal chest belt placement and mixed female/male participants. Further, in this study in which short steps were used, the dissociation between V̇O2 and PO from incremental compared to constant load exercise could not be fully resolved. Thus, the HR and V̇O2 rather than the PO associated with the HRVTs were calculated. Alternatively, when identifying the PO associated with the HRVTs is relevant, longer steps that allow for steady state response (49), or short steps/ramp incremental test that consider the mean response time and the slow component of V̇O2 (50, 51) and/or shallow ramp slopes (43) would be recommended. Additionally, since an autonomic index such as DFA a1 may be affected by fatigue, testing after prior intense exercise, stress or infection may affect the results. Although the 2-minute measurement window with a short-term fluctuation (N1) setting of 4 ≤ n ≤ 16 beats was used in previous HRVT studies, modification of these settings may or may not improve agreements with gold standards. The modifications could be further evaluated through a sensitivity and/or exploratory factor analysis. Finally, the relationship of DFA a1 related behavior with constant work exercises below, at, and above threshold estimations needs further exploration.
The agreement of the DFA a1 based HRVT1 HR and V̇O2 to the GET/LT1 HR and V̇O2 exhibited a substantial bias with moderate correlation in both females and males. Despite this, there was strong agreement/correlation between the HRVT2 HR and V̇O2 with those values derived from the RCP/LT2. Test to test reproducibility of HRVT1/2 was excellent with similar ICC when comparing to established methods during test 1 and test 2. The current study provides evidence that the HRVT2 is a reproducible estimate of the RCP/LT2 but further investigation into better defining and improving HRVT1 methodology is needed.  

Reviewer comments.

As a change of pace, I wanted to share a couple of interesting questions posed by the reviewers.  The first has to do with the sample rate of the Polar H10 and the second with altering the "window width" of the DFA a1 calculation (4 minutes instead of 2 minutes).

Comment #1 - As another example, although a sampling rate of 1,000 Hz is mentioned, as far as I know the ECG waveform with that sampling rate is not available from the H10 sensor, only the RR interval time series, which does not have a sampling rate of 1,000 Hz unless it was somehow resampled as such. A few additional details on the processing steps would help the reader understand potential limitations.

Thank you for your comment. Polar does publish an API that allows for ECG recording with a fixed sample rate of 130 Hz (as seen in the Polar API https://github.com/polarofficial/polar-ble-sdk). We used the Android app “ECG logger” which uses the Polar API and records the Bluetooth output from the H10 as a continuous ECG data stream for the purpose of waveform optimization and belt placement. The actual sample rate for RR recording for the H10 is conventionally noted as being 1000 Hz (https://www.nature.com/articles/s41598-023-38329-w#Sec8 and https://pubmed.ncbi.nlm.nih.gov/31004219/); however, there is no “official” documentation currently on the Polar site showing this (to our surprise). It has also been discussed on the Polar API GitHub site (https://github.com/polarofficial/polar-ble-sdk/issues/343) and it seems that there is some relation to a 1/1024 time fractionation. The Polar H10 related DFA a1 agreement to a reference ECG (lead 2, 500 Hz sample rate) during incremental exercise has been evaluated previously (Schaffarczyk et al., Validity of the Polar H10 Sensor for Heart Rate Variability Analysis during Resting State and Incremental Exercise in Recreational Men and Women. Sensors 2022, 22, 6536. https://doi.org/10.3390/s22176536) and the correlation/agreement was strong.

Take home lesson - there is no published "official" sample rate for the H10 RR series!
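For illustration only: the standard Bluetooth Heart Rate Measurement characteristic specifies RR intervals in units of 1/1024 s, which may be what the "1/1024 time fractionation" refers to. Assuming the H10 RR values arrive in that unit (an assumption, since Polar does not document it officially), the conversion to milliseconds would look like this. The function name is hypothetical.

```python
def rr_ticks_to_ms(ticks):
    """Convert an RR interval from 1/1024 s units to milliseconds."""
    return ticks * 1000.0 / 1024.0

one_second_ms = rr_ticks_to_ms(1024)  # 1024 ticks make one full second
resolution_ms = rr_ticks_to_ms(1)     # smallest step, just under 1 ms
```

Under that assumption the timing step is close to, but not exactly, the 1 ms implied by a 1000 Hz figure.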

Comment #2 - The authors claim that 2 min recordings were chosen based on the minimum number of RR intervals recommended for DFA and cite Chen et al. as evidence. I'm not certain Chen et al. advocated for such short windows to obtain a consistent/robust DFA metric. Although short time series (256-400 RR intervals) can produce accurate DFA values, it is by no means

Thank you for highlighting this as it is a very interesting topic for discussion. This has been a concern from the start, with some theoretical exploration as well (clinical eval here, https://www.mdpi.com/1424-8220/23/6/3325 and theoretical here, https://www.mdpi.com/1099-4300/24/1/61). Originally, based on a previous study from Chen’s research group (https://journals.aps.org/pre/pdf/10.1103/PhysRevE.64.011114), they recommended a ratio of N/10 between the total number of points included in the window and the short term fluctuation window width. The reference was updated including their previous paper (line 166). Assuming HRVT1 above 80 bpm (or 160 beats in the 2-minute time window), this time window has been often chosen in the literature as a compromise between accurate physiologic evaluation and DFA a1 validity. If we used longer time spans, we potentially would not capture the dynamic changes to the index as exercise intensity rises. For instance, with a 45 watt/minute ramp, a 4-minute DFA a1 measurement will encompass 180 watts of intensity change. Therefore, even at peak intensity, a good portion of the HRV window will contain substantially lower intensity effects. The example below may shed some light on this issue. The following plot of DFA a1 over time is from a participant who performed a 45 watt/min ramp (data from https://physoc.onlinelibrary.wiley.com/doi/10.14814/phy2.15782 and the same participant in Figure 1B). The plot below includes warmup, a step up for MRT calculation (the dip in a1 at 1000 seconds), the full ramp and cool down. We plotted the DFA a1 using the “conventional” 2-minute sample window as well as a 4-minute sample window. One can observe that during steady state exercise (warmup/cooldown), the a1 is quite similar. However, there is a difference in both curve shape and curve nadir during the ramp:

Take home lesson - a1 measurement windows longer than 2 min are not suitable for rapidly changing situations such as incremental ramps, but could be useful for more steady state conditions (pre/post ramp) or constant intensity testing.
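The window trade-off in the response can be put into numbers with a few lines of arithmetic. This sketch reads the N/10 recommendation as "the window should hold at least 10 beats for every beat of the largest box size", which is my interpretation. The function name is hypothetical.

```python
def window_summary(ramp_w_per_min, window_min, hr_bpm, n_max=16):
    """Watts spanned by a DFA a1 window, beats it holds, and an N/10 check."""
    watts_spanned = ramp_w_per_min * window_min
    beats = hr_bpm * window_min
    # assumed reading of N/10: total beats N >= 10 * largest box size
    meets_ratio = beats >= 10 * n_max
    return watts_spanned, beats, meets_ratio

two_min = window_summary(45, 2, 80)   # (90, 160, True)
four_min = window_summary(45, 4, 80)  # (180, 320, True)
```

At 80 bpm a 2-minute window just reaches the 160 beats needed for a largest box size of 16, while on a 45 W/min ramp it spans 90 W instead of the 180 W covered by a 4-minute window.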

Overall study comments:
  • This report adds to the existing evidence that the HRVT2 is a valid proxy for the RCP/LT2, at least on a group basis.  Additionally, adding EDRT and/or NIRS O2 desaturation breakpoints should get you even closer to individual gold standards.
  • There needs to be further thought on the HRVT1 concept since this report along with what we cited above (and recently this) indicate overestimation compared to the GET/LT1/VT1.  We are currently working on that topic.....😁
  • Day to day repeatability of HRVT1/2 is quite good as noted in the current study and the new report in Frontiers. Therefore, even if the HRV threshold is biased, at least it will be repeatable, making it useful for following fitness, illness, stress and fatigue. This repeatability is also independent of ramp slope (15 to 45 W/min).