Brad Schoenfeld came out with a new paper,
–Resistance Training Volume Enhances Muscle Hypertrophy
I want to give some details and thoughts I haven’t seen discussed, with the background of Lyle McDonald’s review, here.
What did the study do?
Trained subjects performed,
- 1 set
- 3 sets
- 5 sets
Per exercise, per workout, over 3 workouts per week, 8-12 reps per set.
“Using previously established criteria, this translated into a total weekly number of sets per muscle group of 6 and 9 sets for 1 SET, 18 and 27 sets for 3 SET, and 30 and 45 sets for 5 SET in the upper and lower limbs, respectively. The number of sets was greater for the lower-body as compared to the upper-body musculature.”
Another way to view it is, over 3 workouts per week,
- 1 set group = ~20 reps per muscle group, per workout (~80 reps per week)
- 3 set group = ~76 reps per muscle group, per workout (~225 reps per week)
- 5 set group = ~123 reps per muscle group, per workout (~370 reps per week)
What did the study find?
There was no difference in 1 rep max strength between the groups.
There was a difference in muscle size, with, in general, more volume meaning more hypertrophy.
This was over 8 weeks, with 34 subjects.
The critique
One of Lyle’s main critiques, and the one that caught on the most in Schoenfeld’s comment section, was that the subjects performing 5 sets started from a lower baseline. Lyle’s main contention being that, had they not, he doubts there would have been much improvement beyond the 3 set group, or 76 reps per muscle group, per workout, at 3 times per week.
The response to the critique
From Schoenfeld,
“The claim that the highest volume group was “simply catching up” with growth is not based on a scientific approach to the concept. It assumes that you can just look at raw data scores and make such inferences. You can’t. We actually employed a statistical measure called an ANCOVA that adjusted for pre-test means. Thus, differences at baseline were accounted for statistically and did not influence results.”
Alright, so now it’s getting very technical. Before examining ANCOVA, I want to back up,
More obvious concerns
I preface this section by saying I am a big fan of Brad Schoenfeld’s work. But good research is hard, incredibly so when we’re trying to model humans.
This is a study of 34 people. Thirty. Four. Roughly 11 per group.
25% of the subjects dropped out.
If this were a drug, would a doctor change his prescribing habits?
Reading some people’s response to this study you’d think five thousand people were assessed. My god, flip a coin 34 times and you might not know much about its probability.
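To put a rough number on that coin flip intuition, here’s a quick back-of-envelope calculation (mine, not anything from the paper), using the standard normal approximation for a proportion’s confidence interval:

```python
import math

# 95% confidence interval for a coin's heads-probability after n flips,
# using the normal approximation (p-hat +/- 1.96 * standard error).
def coin_ci(heads, flips):
    p = heads / flips
    se = math.sqrt(p * (1 - p) / flips)
    return (p - 1.96 * se, p + 1.96 * se)

# Flip a fair coin 34 times and happen to get 20 heads:
lo, hi = coin_ci(20, 34)
print(f"estimated probability of heads: somewhere in [{lo:.2f}, {hi:.2f}]")
# -> [0.42, 0.75]
```

After 34 flips of a perfectly fair coin, you still can’t rule out a heavily crooked one. That’s the trouble with roughly 11 subjects per group.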
Furthermore, look at how often this kind of statement is mentioned,
“Squat 1-RM
We were unable to gauge successful 1RM in one of the subjects from SET1 and one of the subjects from SET3 in the allotted number of trials, and thus had to exclude their data from analysis.”
“Elbow Extensor Thickness
We were unable to achieve satisfactory imaging in one of the subjects from SET3, and thus had to exclude his data from analysis.”
“Mid-Thigh Thickness
We were unable to achieve satisfactory imaging in three of the subjects from SET3 and two of the subjects from SET5, and thus had to exclude their data from analysis”
The sample size wasn’t always 34.
The study was only done for 8 weeks. 8 weeks!
Next, there are multiple instances where the researchers weren’t blinded. At least, the paper doesn’t say they were,
- “All routines were directly supervised by the research team…”
- “The lead researcher, a trained ultrasound technician, performed all testing…” (this is how muscle size was measured)
- “Confirmation of squat depth was obtained by a research assistant…”
Including the research assistants thanked in the study, we’re dealing with 35 people. How many different people were part of carrying out the routines? Are we sure they were all on the same page? Were any biased one way or another?
-> I interned at one of the more well known private gyms in America and have worked at one of the biggest commercial gyms out there. I’ve never seen any two coaches (never mind dozens) exactly on the same page. I’m not even sure it’s possible.
Again, if a drug study isn’t blinded, what’s your first response?
Another critical point Lyle brought up was the dietary monitoring. What I’d add to his critique is the subjects were only monitored one week before the study started, and the final week of the study. There’s a lot of room for change there.
Did the higher volume group start eating more at some point?
Now we’re getting really nitty gritty, but how about this one-
“all subjects reported performing multi-set routines prior to the onset of the study and a majority did not regularly train to momentary failure. It is unclear how the novelty of altering these variables affected the respective groups.”
The subjects were college aged males. In my experience, these guys loooove higher volume upper body training, but not lower body. I could see how giving them a break from higher volume upper body work might make them stronger bench press wise, while pushing the lower body volume might make them stronger squat wise.
That’s what you see in this study.
- The 5 SET group did get stronger than the 1 SET group in the squat movement; it just didn’t hit statistical significance
- While the 1 SET group got stronger than the 5 SET in the bench press
Did the upper body simply get some time to deload?
My takeaway
Again, I think Brad is doing some very good work. For example, they state they waited until 48 hours after training to make muscle size measurements, to try and ensure inflammation had calmed down. That’s attention to detail.
And they acknowledge some of these limitations. My only point here is you can’t read a study like this and dramatically change your opinion on the topic. There are too many ifs and buts.
Warren Buffett and Charlie Munger are the most famous investors ever. When they examine a potential new investment, they group each one into,
- “Yes”
- “No”
- “Too hard”
For instance, if the business is in software, they’ve historically said that’s too hard. Not because they can’t understand the business, but because it’s too tough to know what the business will look like in ten years, because of how much software tends to change.
With banks, sometimes it’s too hard because, even with their enormous accounting backgrounds, they can’t figure out the numbers. That makes them suspicious.
With the “more obvious concerns,” I find it easy to throw this into the too hard pile. What would happen with a bigger sample size? What would happen with older participants? It’s too hard to project this study out to my clients, which is my benchmark for valuing a study.
At most, if a client is struggling with hypertrophy, in general, I have ammunition to say, “Let’s try more volume.” If strength is the concern, “We don’t need to think about volume as much.” But that’s already been fairly well accepted.
As far as figuring out the numbers…
Mathematics and Statistics and Humans, oh my!
The bolding of Brad’s statement was what really piqued my interest,
“The claim that the highest volume group was “simply catching up” with growth is not based on a scientific approach to the concept. It assumes that you can just look at raw data scores and make such inferences. You can’t. We actually employed a statistical measure called an ANCOVA that adjusted for pre-test means. Thus, differences at baseline were accounted for statistically and did not influence results.”
That’s a strong statement in the world of statistics.
I’m not intimately familiar with ANCOVA though. So I started digging around.
Wikipedia:
Another use of ANCOVA is to adjust for preexisting differences in nonequivalent (intact) groups. This controversial application aims at correcting for initial group differences (prior to group assignment) that exists on DV among several intact groups. In this situation, participants cannot be made equal through random assignment, so CVs are used to adjust scores and make participants more similar than without the CV. However, even with the use of covariates, there are no statistical techniques that can equate unequal groups. Furthermore, the CV may be so intimately related to the IV that removing the variance on the DV associated with the CV would remove considerable variance on the DV, rendering the results meaningless.
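To make the mechanics concrete for myself: mechanically, an ANCOVA here is just a linear regression of the post score on group membership plus the pre score, so the group coefficients come out “adjusted” for baseline. A minimal sketch on fabricated data (every number and name below is mine, not the study’s):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 11  # roughly the per-group sample size in the study

# Fabricated data: baseline muscle thickness (mm) for three groups,
# where higher-volume groups gain more on average.
pre = rng.normal(30, 3, 3 * n)
group = np.repeat([0, 1, 2], n)  # 0 = SET1, 1 = SET3, 2 = SET5
true_gain = np.array([1.0, 2.0, 2.5])[group]
post = pre + true_gain + rng.normal(0, 1, 3 * n)

# ANCOVA as a linear model: post ~ intercept + group dummies + pre.
# The dummy coefficients are group effects *adjusted* for baseline.
X = np.column_stack([
    np.ones(3 * n),               # intercept (reference group: SET1)
    (group == 1).astype(float),   # SET3 vs SET1
    (group == 2).astype(float),   # SET5 vs SET1
    pre,                          # the baseline covariate
])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
print("adjusted SET3 - SET1 effect:", round(beta[1], 2))
print("adjusted SET5 - SET1 effect:", round(beta[2], 2))
```

The catch, as the Wikipedia excerpt hints, is in the assumptions baked into that one shared `pre` coefficient, not in the arithmetic.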
I then found this paper,
–Misunderstanding Analysis of Covariance
Which is a deep dive into how often ANCOVA is misapplied.
I still couldn’t make sense of it. For what it’s worth, I have a minor in mathematics, but my math is no doubt rusty.
I hit up three people. My brother and two former clients. Between the three of them:
- B.S. computer science
- B.S. physics
- B.S. mathematics
- B.S. mathematics
- M.S. biostatistics
- in-the-middle-of-his PhD
I exchanged about three thousand words with them.
None could speak absolutely about the topic, but all felt the ANCOVA use was probably alright. Certainly not out of the ordinary. All I can say is I have to lean towards their opinions, because I can’t say I’ve been able to garner a valid enough one of my own. That is, none of them swayed me to put more weight into this paper.
Tangent
If after extensive effort you still can’t wrap your head around a paper’s methods section, or you’re not getting a strong opinion one way or another, you need to seriously question if you can put that paper into real world use. Now, you can say that’s incredibly arrogant. Maybe the person is too ignorant / stupid / lazy / biased to grasp it. That’s fine. I won’t argue those are possibilities with anyone, myself included.
But Warren Buffett and Charlie Munger are again a fitting mention. First, a quote, paraphrasing,
“We don’t need to, nor is it possible, to have an opinion about every investment.”
The majority of investments they see cause them to take no action. That’s how the majority of research you read should go. If you’re changing your practice after every study you read, you’re way off. Most research is not a home run.
Second, back in 2007, they, along with people like David Einhorn (‘nother billionaire), chastised the esoteric mathematical modeling being done in finance to assess risk. It’s a debate I would highly recommend to anyone who deals with risk in any manner. Personal trainers deal with this every training session when having to assess how risky an exercise is.
Actually, this debate was also going on back in the 1990s. In 1994, some Nobel laureates, along with a group of what many would consider the highest IQs ever assembled, possibly in any endeavor, started a firm called Long Term Capital Management.
This is how they started:
[chart: LTCM’s early returns]
Those are outrageously good returns.
But we all remember what happened: the fund imploded in 1998, and ten years later, in 2008 and 2009, the broader economy did the same. Five sigma events -“million year floods”- were happening daily. Turns out, people like Buffett and Munger were correct. Guys who largely say formulas are useless, and that if you’re ever doing anything more than algebra with a pencil, you’ve gone wrong.
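For scale, here’s a quick calculation (mine) of how rare a 5 sigma daily move should be if market moves really were normally distributed, which is exactly the assumption that failed:

```python
from math import erf, sqrt

# Probability of exceeding +5 sigma on any given day, if daily moves
# were normally distributed (one-sided tail of the standard normal).
def upper_tail(z):
    return 0.5 * (1 - erf(z / sqrt(2)))

p = upper_tail(5)
years = 1 / p / 252  # roughly 252 trading days per year
print(f"P(move > 5 sigma) = {p:.2e}, i.e. about once per {years:,.0f} years")
```

On that model, a single 5 sigma day should show up once in over ten thousand years of trading. They were happening daily. The model, not reality, was what broke.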
I have a lot of respect for math, and it can be hard for those who haven’t gotten to a certain level to grasp what higher level math is even remotely doing. However, my experience in that world, and one reason I stopped when I got to that level, is that it is damn near impossible to speak with any real certainty when applying that math to human beings. (Physics? Different ballgame.) The assumptions many mathematical and statistical models make, when applied to humans, are downright comedic. And if you don’t get the assumptions right, the math is pointless.
Sometimes high level math, or methods you don’t understand, is not a reflection of your intellect lacking. Sometimes it’s a reflection of overthinking, of misleading, of excessive comfort with false precision. Taking on risk feels better when you can quantify it to the tenth decimal point, but that doesn’t mean it’s right.
Here’s what a Nobel prize and tenth decimal precision gets you in investing, which is all about understanding human behavior,
–When Genius Failed: The Rise and Fall of Long-Term Capital Management <- great book!
kierfinnegan
September 10, 2018
This is a great article. I’d like to address Lyle’s comments and Brad’s response. I think Brad’s faith in the ANCOVA method is perhaps a bit over the top and I think Lyle has a point. It’s true that ANCOVA adjusts for baseline measurements but you still assume that the slopes are equal and linear which doesn’t seem to be the case here. So, the assumption is that all groups change at a similar rate. Failing that, you have to change your interpretation somewhat. Here’s a nice explanation:
https://www.theanalysisfactor.com/ancova-assumptions-when-slopes-are-unequal/
I think it would have been more appropriate to use a mixed effects model which allows for direct measurement of subject-specific responses as well. We can usually safely assume that individual differences outweigh average differences. It can be summed up by 2 simple examples:
1) you give subjects a drug. You will typically see high and low responders to the treatment. If you measure them over time you will likely see higher measurements over time for the high responders and vice versa. With vaccines, you tend to see high responders protected for a long time but low responders may need a booster shortly after. By allowing subjects to vary randomly in the model you can measure the heterogeneity in subjects alongside the mean response of their group. Sometimes the differences between subjects can be more interesting/informative.
2) This ties in more closely to Lyle’s comment. If you think about treatment of patients with depression: mental wellbeing is usually measured by a score, based on a standardised questionnaire. You can treat them and measure their score afterwards. Patients with a low score have less room to improve whereas patients more severely depressed can be expected to make bigger improvements. Even if they end up at the same score after treatment, the people with higher mental impairment will have had a bigger change in their score. So their baseline is negatively correlated with their response to treatment (higher baseline = bigger decrease in score). So, I have to agree with Lyle; a lower baseline could allow for a bigger change, or simply a shift towards the norm.
That said, I know next to nothing about responses to exercise. And I still haven’t read the whole paper…
b-reddy
September 11, 2018
Hey Kier!
First, I did read your last email before I posted this :). After 10 days or so of discussion with people, I just wanted to get this post up while the study was still fresh in people’s minds.
(To other readers, Kier is one of those I asked to help me due to his stat background.)
That is a good link you sent. One of the more clarifying reads for me. The parallel lines section of that post remains one of my hiccups with the study we’ve been discussing.
Your second point is actually a point I just sent to the PhD student. That the covariate -starting amount of muscle mass- will, in some way, certainly at some point, dictate how much muscle can be gained. That, best I can tell, is a big wrinkle when trying to adjust for baselines.
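For any readers trying to follow the “parallel lines” issue: you can actually probe it inside the model, by letting each group have its own slope on the baseline and seeing whether that matters. A rough sketch on fabricated numbers (nothing here is from the study):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 11  # roughly the per-group sample size

# Fabricated data where baseline *changes* the treatment effect:
# the higher-volume group's post score rises more slowly with baseline
# (slope 0.5 vs 1.0), so the "parallel lines" assumption is violated.
pre = rng.normal(30, 3, 2 * n)
group = np.repeat([0.0, 1.0], n)  # 0 = lower volume, 1 = higher volume
post = pre + 1.0 + group * (16.0 - 0.5 * pre) + rng.normal(0, 1, 2 * n)

def fit(X, y):
    """Least-squares fit; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid

ones = np.ones(2 * n)
# Plain ANCOVA (one shared slope on the baseline covariate)...
_, ss_plain = fit(np.column_stack([ones, group, pre]), post)
# ...versus a model that lets each group have its own slope.
beta_i, ss_inter = fit(np.column_stack([ones, group, pre, group * pre]), post)

# A clearly nonzero interaction coefficient and a large drop in residual
# sum of squares both flag non-parallel slopes, i.e. the plain ANCOVA
# baseline adjustment is suspect for this data.
print("group-by-baseline interaction coefficient:", round(beta_i[3], 2))
print("residual SS, plain ANCOVA:", round(ss_plain, 1))
print("residual SS, separate slopes:", round(ss_inter, 1))
```

When the interaction matters, the “adjusted for baseline” group effects from a plain ANCOVA stop meaning one single thing, which is exactly the hiccup with leaning on that adjustment to dismiss the baseline critique.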
It’s interesting. At first, everyone I asked about this seemed ok with the statistics methods. Not overly convinced, but ok. As I’ve talked to you all more though, I can feel you all becoming a little less ok with it. (The PhD student just yesterday mentioned the mixed effects model to me too.)
One reason I think that may be, and something that is very hard with papers like this, is you can’t fully separate the statistics from the biology, due to everyone needing to understand the assumptions. As you hit on, it’s hard for you to speak too affirmatively because you don’t have the exercise background. While I have the opposite problem.
That was more or less the rationale behind the bottom section of what I wrote.
Something I’ve discussed with the other stat client is I think there is a lot of opportunity in that world to somehow help practitioners and researchers better use statistics. I think, as Lyle alluded to, without enough background so many practitioners revert to “Screw this, I’m assessing it in more basic terms.” That might be valid, but many times it’s not. Perhaps a field for you to get into now that the thesis is all done!
Thank you again for looking at all this with me.
Iason P.
September 13, 2018
For me the main thing is that, before this study, if you told somebody that a person would be doing 45s/wk for quads, for 8 straight weeks, he would tell you that he would be dead by the second week. Well, guess what…
To your drop-out point, we don’t know if the dropouts were solely from the 5-set group. Brad did mention that in the exit interviews, the subjects in the 5-set group didn’t report anything particularly bad. And to your blinding point, this is something you practically cannot avoid in training studies.
Main takeaways: if you have a muscle group, that you haven’t been able to bring up for the duration of your lifting career (“Stupid calf genetics, bruh!”), you probably haven’t tried EVERYTHING. There is room for more. The guys were doing 45 sets and were STILL gaining. The threshold could be even higher (50-60 sets), which is INSANE.
b-reddy
September 14, 2018
I think the amount of workload is a very crucial point when trying to generalize this research. Common sense says trying to extrapolate research on college aged males to the average person (e.g. 35 years old) is precarious.
My experience seeing personal trainers work with athletes, then try to write about that experience and translate it to everyday people, along with how researchers will write a study like this and then say “If people are interested in hypertrophy, they should X, Y, Z” (when they should say “if college aged males are interested in yada yada”), tells me the average person in exercise science way, way underestimates the difference between a college aged male with a college lifestyle and an everyday person, with kids and a 9-5 job.
The workload doesn’t sound crazy to me for a college dude. For a single workout, 5 sets of leg pressing, 5 sets of squats, 5 sets of leg extensions, and you’re there for the quads. It’s a lot, but when you have that much free time…But for a regular person, you’re right, that is pushing it, to say the least.
I had the same question about where the dropouts came from. They were actually split evenly amongst the three groups. (They have a flow chart in the full text showing this.) Now, what would happen if this was done for longer than 8 weeks?
I like your takeaway. That’s largely mine too.
I have another post coming about this study, as I got more data from a James Krieger post (coauthor). To play a little devil’s advocate to our takeaway though: in one muscle group, you see 20% of the 5 set people *lose* muscle, while 66% of the 3 set group and 50% of the 1 set group gained muscle.
Maybe our takeaway is too simple. After all, what if we have a 3 set person, we push them to 5 sets, and they lose muscle? Is the right answer to go to 6 sets, or 2?
Iason P.
September 14, 2018
Of course you don’t just generalize to everybody. Even in the study they only applied really-extreme (by previous standards) volume to one muscle group. A ramping approach is probably best. Nobody said what Lyle thinks this study said. That “hey, you should switch your program so that you train EVERY muscle group with 45 sets”. He lost a bit of credibility in my eyes with that statement.
Regarding the muscle group where the 5-set group had a high muscle loss percentage. You are talking about the triceps, right? The distribution of results is similar to the other muscles (high responders on 5>HR-3>HR-1). But given that the triceps showed the most muscle losses in all groups, it seems that, maybe, possibly(?), they were doing less work than they were previously doing, or than they needed to progress more. This is where the insane part (50-60 sets) comes in (maybe through isolation work and whatnot).
Your last question contains many of the main questions everyone has on their mind. 1) How do you track volume or measure hypertrophy EXACTLY (lab-level accuracy) in day-to-day applications, when scientists haven’t figured out how to do so themselves?! What proxies are useful (soreness, general tiredness, repetition strength, max strength)? It is especially frustrating given the fact that we cannot be sure that we will know before we die!
But to answer your question, my hunch still seems to be 6 more than 2, for the 2nd-paragraph reasons. One thing that I didn’t mention in my first comment: they did 45 sets TO FAILURE. Really, let’s all let that sink in. Think of all the muscle damage, think of everything else. What % of people push themselves adequately (less than 4 reps in reserve)? Not many (https://www.ncbi.nlm.nih.gov/pubmed/29112055). Working harder seems to be the answer, a lot of the times.
b-reddy
September 17, 2018
I’ll have another post about the study up today, and I agree with pretty much everything you’re hitting.
The only pushback I have is the conclusion of the study does generalize, to *everyone*. It specifically says “Thus, those seeking to maximize muscular growth [should go with high volume training]…” It doesn’t say college aged males seeking. It doesn’t say those seeking over an 8 week period. It doesn’t say one should ramp first and adapt based on results. It doesn’t say those with a trainer watching their technique, those eating a certain level of protein, with a certain level of lifting experience, etc.
I get that may be pedantic, and that someone like you isn’t just saying everyone should jump to 45 sets for hypertrophy or 13 minute workouts for strength, but I had clients pushing 70 years old asking me, half jokingly, why we’re doing anything more than 13 minute workouts…linking me to this study. Due to coverage in the NYTimes. (Where Schoenfeld more or less gives the green light for older people to do the same kind of training to get the same kind of results.)
My point being while you clearly have cautious, inquisitive eyes, with a study like this, I’m not so sure your level of caution was exemplified by the authors in their discussions of it. In my view, the conclusion is where all the hoopla has come from. In the least, when you have at least two of the authors having to write multiple blog responses to the critiques (James Krieger, who I’m an enormous fan of, has also felt the need to throw his hat in the ring), you can make an argument something wasn’t communicated well.
Iason P.
September 17, 2018
Yeah, totally. I was mainly approaching this study in the “fitness industry circle”-manner. And if someone gets the wrong idea from an article quoting Brad, on a study by Brad, we cannot have much faith in people receiving proper info, considering the rest of the articles in the field are written by less reputable sources!