Schoenfeld’s new study on volume, muscle strength, and size

Posted on September 10, 2018

(Last Updated On: September 10, 2018)

Brad Schoenfeld came out with a new paper,

Resistance Training Volume Enhances Muscle Hypertrophy

His more layman-friendly write-up

Here I want to share some details and thoughts I haven’t seen discussed, using Lyle McDonald’s review as background.


What did the study do?

Trained subjects performed,

  • 1 set
  • 3 sets
  • 5 sets

Per exercise, per workout, over 3 workouts per week, 8-12 reps per set.

“Using previously established criteria, this translated into a total weekly number of sets per muscle group of 6 and 9 sets for 1 SET, 18 and 27 sets for 3 SET, and 30 and 45 sets for 5 SET in the upper and lower limbs, respectively. The number of sets was greater for the lower-body as compared to the upper-body musculature.”

Another way to view it is, over 3 workouts per week,

  • 1 set group = ~25 reps per muscle group, per workout (~80 reps per week)
  • 3 set group = ~76 reps per muscle group, per workout (~225 reps per week)
  • 5 set group = ~123 reps per muscle group, per workout (~370 reps per week)
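The rep figures above can be roughly reproduced from the quoted weekly set counts. This is a minimal sketch under two assumptions that are not stated in the paper: 10 reps per set (the midpoint of 8-12), and averaging the upper- and lower-limb set counts into one per-muscle-group number. The names `WEEKLY_SETS` and `reps_per_muscle_group` are hypothetical.

```python
WORKOUTS_PER_WEEK = 3
REPS_PER_SET = 10  # assumption: midpoint of the 8-12 rep range

# Weekly sets per muscle group (upper limbs, lower limbs), from the quoted passage.
WEEKLY_SETS = {"1 SET": (6, 9), "3 SET": (18, 27), "5 SET": (30, 45)}


def reps_per_muscle_group(upper_weekly_sets: int, lower_weekly_sets: int) -> tuple[float, float]:
    """Return (reps per workout, reps per week), averaging upper and lower limbs (an assumption)."""
    avg_weekly_sets = (upper_weekly_sets + lower_weekly_sets) / 2
    reps_per_workout = avg_weekly_sets / WORKOUTS_PER_WEEK * REPS_PER_SET
    return reps_per_workout, reps_per_workout * WORKOUTS_PER_WEEK


for group, (upper, lower) in WEEKLY_SETS.items():
    per_workout, per_week = reps_per_muscle_group(upper, lower)
    print(f"{group}: ~{per_workout:.0f} reps per workout, ~{per_week:.0f} reps per week")
```

Under these assumptions the three groups land near 25, 75, and 125 reps per workout, keeping the same 1:3:5 ratio as the set counts.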


What did the study find?

There was no difference in 1 rep max strength between the groups.

There was a difference in muscle size: in general, more volume meant more hypertrophy.

This was over 8 weeks, with 34 subjects.


The critique

One of Lyle’s main critiques, and the one that caught on the most in Schoenfeld’s comment section, was that the subjects performing 5 sets started from a lower baseline. Lyle’s contention is that if they hadn’t, he doubts there would have been much improvement beyond the 3 set group, or ~76 reps per muscle group, per workout, at 3 times per week.


The response to the critique

From Schoenfeld,

“The claim that the highest volume group was “simply catching up” with growth is not based on a scientific approach to the concept. It assumes that you can just look at raw data scores and make such inferences. You can’t. We actually employed a statistical measure called an ANCOVA that adjusted for pre-test means. Thus, differences at baseline were accounted for statistically and did not influence results.”

Alright, so now it’s getting very technical. Before examining ANCOVA, I want to back up.


More obvious concerns

Let me preface this section by saying I’m a big fan of Brad Schoenfeld’s work. But good research is hard, incredibly so when we’re trying to model humans.

This is a study of 34 people. Thirty. Four. Roughly 11 per group.
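Roughly 11 per group matters because of statistical power. As a ballpark illustration, here is a sketch of the approximate power of a simple two-group comparison (two-sided, alpha = 0.05) using the normal approximation, which slightly overstates power at these sample sizes. The effect size d = 0.5 (a "moderate" standardized difference) is a hypothetical choice, not a number from the study, and the real analysis compared three groups with ANCOVA. The function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power_two_sample(d: float, n_per_group: int, z_crit: float = 1.96) -> float:
    """Approximate power of a two-sided, two-sample comparison at alpha = 0.05.

    Normal approximation; ignores the negligible chance of rejecting in the
    wrong direction. d is the standardized effect size (hypothetical here).
    """
    noncentrality = d * math.sqrt(n_per_group / 2.0)
    return normal_cdf(noncentrality - z_crit)


for n in (11, 64):
    print(f"{n} per group, d = 0.5: power ~ {approx_power_two_sample(0.5, n):.2f}")
```

Under these assumptions, 11 per group gives roughly a 1-in-5 chance of detecting a moderate effect, while about 64 per group is needed to reach the conventional 80%.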

25% of the subjects dropped out.

If this were a drug, would a doctor change his prescribing habits?

Reading some people’s responses to this study, you’d think five thousand people were assessed. My god, flip a coin 34 times and you might not know much about its probability.
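To put the coin-flip point in numbers, here is a minimal sketch using the simple normal-approximation (Wald) 95% confidence interval for a coin’s probability of heads, assuming it came up heads exactly half the time. The function name `wald_interval` is hypothetical, and this is a textbook approximation, not anything from the paper.

```python
import math


def wald_interval(heads: int, flips: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for P(heads) via the normal (Wald) approximation."""
    p = heads / flips
    half_width = z * math.sqrt(p * (1 - p) / flips)
    return p - half_width, p + half_width


for n in (34, 5000):
    lo, hi = wald_interval(n // 2, n)
    print(f"{n:>5} flips, half heads: 95% CI {lo:.3f} to {hi:.3f}")
```

With 34 flips the interval runs from roughly 0.33 to 0.67, so the coin could plausibly be anything from noticeably tails-biased to noticeably heads-biased; with 5,000 flips it narrows to about 0.49 to 0.51.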

Furthermore, look at how often statements like this appear:

“Squat 1-RM

We were unable to gauge successful 1RM in one of the subjects from SET1 and one of the subjects from SET3 in the allotted number of trials, and thus had to exclude their data from analysis.”


“Elbow Extensor Thickness

We were unable to achieve satisfactory imaging in one of the subjects from SET3, and thus had to exclude his data from analysis.”


“Mid-Thigh Thickness

We were unable to achieve satisfactory imaging in three of the subjects from SET3 and two of the subjects from SET5, and thus had to exclude their data from analysis”

The sample size wasn’t always 34.


The study was only done for 8 weeks. 8 weeks!


Next, there are multiple instances where the researchers weren’t blinded. At least, the paper doesn’t say they were:

  • “All routines were directly supervised by the research team…”
  • “The lead researcher, a trained ultrasound technician, performed all testing…” (this is how muscle size was measured)
  • “Confirmation of squat depth was obtained by a research assistant…”

Including the research assistants thanked in the study, we’re dealing with 35 people. How many different people were part of carrying out the routines? Are we sure they were all on the same page? Were any biased one way or another?

-> I interned at one of the more well-known private gyms in America and have worked at one of the biggest commercial gyms out there. I’ve never seen any two coaches (never mind dozens) on exactly the same page. I’m not even sure it’s possible.

Again, if a drug study isn’t blinded, what’s your first response?


Another critical point Lyle brought up was the dietary monitoring. What I’d add to his critique is that the subjects were only monitored during the week before the study started and during its final week. There’s a lot of room for change in between.

Did the higher volume group start eating more at some point?


Now we’re getting really nitty-gritty, but how about this one:

“all subjects reported performing multi-set routines prior to the onset of the study and a majority did not regularly train to momentary failure. It is unclear how the novelty of altering these variables affected the respective groups.”

The subjects were college-aged males. In my experience, these guys loooove higher-volume upper-body training, but not lower-body training. So I could see how giving them a break from high-volume upper-body work could make them stronger on the bench press, while pushing lower-body volume could make them stronger on the squat.

That’s what you see in this study.

  • The 5 SET group did get stronger than the 1 SET group in the squat; it just didn’t reach statistical significance
  • Meanwhile, the 1 SET group got stronger than the 5 SET group in the bench press

Did the upper body simply get some time to deload?


My takeaway

Again, I think Brad is doing some very good work. For example, they state they waited 48 hours after training before measuring muscle size, to try to ensure inflammation had calmed down. That’s attention to detail.

And they acknowledge some of these limitations. My only point here is you can’t read a study like this and dramatically change your opinion on the topic. There are too many ifs and buts.

Warren Buffett and Charlie Munger are arguably the most famous investors ever. When they examine a potential new investment, they put it into one of three piles:

  • “Yes”
  • “No”
  • “Too hard”

For instance, if the business is in software, they’ve historically said that’s too hard. Not because they can’t understand the business, but because it’s too tough to know what the business will look like in ten years, given how much software tends to change.

With banks, sometimes it’s too hard because, even with their enormous accounting backgrounds, they can’t figure out the numbers. That makes them suspicious.

Given the “more obvious concerns,” I find it easy to throw this study into the too hard pile. What would happen with a bigger sample size? What would happen with older participants? It’s too hard to project this study onto my clients, which is my benchmark for valuing a study.

At most, if a client is struggling with hypertrophy, in general, I have ammunition to say, “Let’s try more volume.” If strength is the concern, “We don’t need to think about volume as much.” But that’s already been fairly well accepted.

As far as figuring out the numbers…


Mathematics and Statistics and Humans, oh my!

Brad’s statement is what really piqued my interest:

“The claim that the highest volume group was “simply catching up” with growth is not based on a scientific approach to the concept. It assumes that you can just look at raw data scores and make such inferences. You can’t. We actually employed a statistical measure called an ANCOVA that adjusted for pre-test means. Thus, differences at baseline were accounted for statistically and did not influence results.”

That’s a strong statement in the world of statistics.

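I’m not intimately familiar with ANCOVA, though, so I started digging around and came across this passage (DV, IV, and CV stand for dependent variable, independent variable, and covariate):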


Another use of ANCOVA is to adjust for preexisting differences in nonequivalent (intact) groups. This controversial application aims at correcting for initial group differences (prior to group assignment) that exists on DV among several intact groups. In this situation, participants cannot be made equal through random assignment, so CVs are used to adjust scores and make participants more similar than without the CV. However, even with the use of covariates, there are no statistical techniques that can equate unequal groups. Furthermore, the CV may be so intimately related to the IV that removing the variance on the DV associated with the CV would remove considerable variance on the DV, rendering the results meaningless.
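For readers who want to see what "adjusting for pre-test means" looks like mechanically, here is a minimal sketch of a one-way ANCOVA written as an ordinary least squares regression: post-test score on the pre-test score plus dummy variables for the 3 SET and 5 SET groups. The data are simulated and entirely hypothetical (made-up thicknesses, gains, and noise), not the study’s measurements, and this is not claimed to be the paper’s exact model. The function name `ancova_adjusted_effects` is hypothetical. Note the built-in assumption the passage above warns about: one pre-test slope shared by every group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, simulated data: NOT the study's measurements.
n_per_group = 11
groups = np.repeat([0, 1, 2], n_per_group)            # 0 = 1 SET, 1 = 3 SET, 2 = 5 SET
pre = rng.normal(50.0, 5.0, size=groups.size)          # made-up pre-test muscle thickness (mm)
true_gain = np.array([1.0, 2.0, 3.0])[groups]          # assumed true gain for each group
post = pre + true_gain + rng.normal(0.0, 1.5, size=groups.size)


def ancova_adjusted_effects(pre, post, groups, n_groups=3):
    """Fit post = b0 + b_pre * pre + group dummies (reference = group 0) by OLS.

    Assumes a common slope on the pre-test for all groups, so each dummy
    coefficient is that group's difference from group 0 at equal pre-test values.
    """
    dummies = np.column_stack([(groups == g).astype(float) for g in range(1, n_groups)])
    X = np.column_stack([np.ones_like(pre), pre, dummies])
    coef, *_ = np.linalg.lstsq(X, post, rcond=None)
    return coef  # [intercept, slope on pre, group 1 - group 0, group 2 - group 0]


coef = ancova_adjusted_effects(pre, post, groups)
print("slope on pre-test:", round(coef[1], 2))
print("adjusted difference, 3 SET vs 1 SET:", round(coef[2], 2))
print("adjusted difference, 5 SET vs 1 SET:", round(coef[3], 2))
```

The "adjustment" is just that shared slope: each group is slid along one common line to the same pre-test value before comparing. If the groups really differ in how they respond from different starting points, a single slope cannot capture that.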

I then found this paper,

Misunderstanding Analysis of Covariance

It’s a deep dive into how often ANCOVA is misapplied.

I still couldn’t make sense of it. For what it’s worth, I have a minor in mathematics, but my math is no doubt rusty.

I hit up three people. My brother and two former clients. Between the three of them:

  • B.S. computer science
  • B.S. physics
  • B.S. mathematics
  • B.S. mathematics
  • M.S. biostatistics
  • PhD in progress

I exchanged about three thousand words with them.

None could speak definitively about the topic, but all felt the use of ANCOVA was probably fine, and certainly not out of the ordinary. I have to lean toward their opinions, because I haven’t been able to form a well-grounded one of my own. That said, none of them persuaded me to put more weight on this paper.



If after extensive effort you still can’t wrap your head around a paper’s methods section, or you can’t form a strong opinion one way or another, you need to seriously question whether you can put that paper to real-world use. Now, you can say that’s incredibly arrogant. Maybe the person is too ignorant / stupid / lazy / biased to grasp it. That’s fine. I won’t dispute that those are possibilities for anyone, myself included.

But Warren Buffett and Charlie Munger are again a fitting mention. First, a quote, paraphrased:

“We don’t need to have an opinion about every investment, nor is it possible to.”

The majority of investments they see cause them to take no action. That’s how the majority of research you read should go. If you’re changing your practice after every study you read, you’re way off. Most research is not a home run.

Second, back in 2007, they, along with people like David Einhorn (‘nother billionaire), chastised the esoteric mathematical models finance was using to assess risk. It’s a debate I would highly recommend to anyone who deals with risk in any manner. Personal trainers deal with risk every training session, whenever they assess how risky an exercise is.

Actually, back in 1998 this debate was also going on. Some Nobel laureates, along with what many would consider the highest-IQ group ever assembled, possibly in any endeavor, started a firm called Long-Term Capital Management.

Their early returns were outrageously good.

But we all remember what happened in 2008 and 2009. The economy imploded. Five-sigma events (“million-year floods”) were happening daily. Turns out, people like Buffett and Munger were correct. These are guys who largely say formulas are useless, and that if you’re ever doing anything more than algebra with a pencil, you’ve gone wrong.
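To see why frequent five-sigma events point to broken assumptions rather than bad luck, here is a minimal sketch of how rare such a day would be if daily market moves followed a normal distribution, which is exactly the kind of modeling assumption being criticized. The 252 trading days per year is an assumption, and the function name `one_sided_tail` is hypothetical.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: typical number of trading days in a year


def one_sided_tail(sigmas: float) -> float:
    """P(Z > sigmas) for a standard normal variable Z."""
    return 0.5 * math.erfc(sigmas / math.sqrt(2.0))


for k in (3, 5, 6):
    p = one_sided_tail(k)
    years_between = 1.0 / p / TRADING_DAYS_PER_YEAR
    print(f"{k}-sigma daily move: p = {p:.2e}, about once every {years_between:,.0f} years")
```

Under the normal assumption, a one-sided five-sigma day comes around roughly once every 14,000 years of trading, and six sigma is on the order of millions of years. Seeing them repeatedly says the model, not the world, was wrong.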

I have a lot of respect for math, and it can be hard for those who haven’t reached a certain level to grasp what higher-level math is even remotely doing. However, my experience in that world, and one reason I stopped when I reached that level, is that it’s damn near impossible to speak with any real certainty when applying that math to human beings. (Physics? Different ballgame.) The assumptions many mathematical and statistical models make, when applied to humans, are downright comedic. And if you don’t get the assumptions right, the math is pointless.

Sometimes, struggling with high-level math, or with methods you don’t understand, doesn’t reflect a lack of intellect. Sometimes it reflects the method itself: overthinking, being misleading, or having excessive comfort with false precision. Taking on risk feels better when you can quantify it to the tenth decimal point, but that doesn’t mean it’s right.

Here’s what a Nobel Prize and tenth-decimal precision get you in investing, which is all about understanding human behavior:

When Genius Failed: The Rise and Fall of Long-Term Capital Management <- great book!
