The futility of averages in exercise science research (new info on Schoenfeld’s new study)

Posted on September 17, 2018

(Last Updated On: September 17, 2018)

I recently discussed Brad Schoenfeld’s latest study.

I went over the small sample size issue. I would have liked to go more in depth on that, but the paper did not include the results for each individual.

Luckily, James Krieger, a coauthor, has published that data.

Much like Brad, I’m a big fan of James. I’ve been following him on and off for ten years. His nutrition writing in particular has been phenomenal. There were times in college when I read him more than my exercise physiology books, and I was only better off because of it…

But his rationale for why this data wasn’t included in the final publication is seriously lacking: he passes the blame to “reviewers” and says you have to “pick your battles.” At the very least, put a link to an appendix or something. It’s one image. That last alcohol study I reviewed had two such links to thousands of pages.

If exercise science researchers are going to insist on doing these studies with 35 or fewer people, they have to publish these kinds of plots, as they explicitly illustrate the fatal variability of small sample sizes. Not doing so implies the data was less varied than it likely was.

-> No, posting standard deviations is not enough.
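Here’s a quick illustration of why. Below is a minimal Python sketch using two made-up groups of five (hypothetical numbers, not the study’s data). Both groups have the same mean and the same standard deviation, yet in one of them somebody got smaller and in the other nobody did. Only a per-person plot shows that.

```python
import statistics

# Hypothetical changes in muscle thickness (mm) for two made-up groups of five.
# These are illustrative numbers, NOT data from the study.
group_a = [-1, 3, 3, 3, 7]
group_b = [1, 1, 1, 5, 7]

for name, changes in (("A", group_a), ("B", group_b)):
    mean = statistics.mean(changes)
    sd = statistics.stdev(changes)            # sample standard deviation
    lost = sum(1 for c in changes if c < 0)   # people who got smaller
    print(f"group {name}: mean={mean}, sd={sd:.2f}, lost muscle={lost}")
```

Run it and both lines report an identical mean and SD; only the count of people who lost muscle differs.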

I’ve talked about this when it comes to eating response after exercise:

What’s the deal with exercise and your appetite?

After a certain kind of workout, basically 50% of people eat more while the other half eat less. Sooo, how do you make a recommendation, at the individual level, with that knowledge? Do you have any better advice than “Play with it”?

-> This is why some people, legitimately, say “Exercise makes me fat,” while for others the weight starts falling right off.

Let’s look at the muscle size data, and at why an argument can be made that studies like this are borderline uninterpretable for practitioners wanting to use the data “in the field.”

As a reminder, the study found a dose-response relationship between exercise volume and muscle size.

But look at triceps thickness.

Four of the participants in the 1-set group, damn near 50% of the visible participants, did BETTER than five participants in the 3-set group:

And six did better than two in the 5-set group:

Again, this isn’t like one out of a thousand participants. Even with one set vs. five, we’re talking ~66% of the 1-set group doing better than ~20% of the 5-set group. Because we in exercise science refuse to use larger sample sizes, we have to pause and analyze what happens to two people.

To be clear, those two people in the five sets group LOST muscle.
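If you want one number for how much two groups overlap, one standard option is the probability of superiority (also called the common-language effect size): pair every person in one group with every person in the other and count how often the lower-volume person gained more. A minimal Python sketch; the function name and the numbers are hypothetical, not from the study:

```python
from itertools import product

def probability_of_superiority(group_x, group_y):
    """Share of all cross-group pairs (one person from each group) in which
    the person from group_x gained more than the person from group_y.
    Ties count as half a win. 0.5 means neither group tends to do better."""
    pairs = list(product(group_x, group_y))
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x, y in pairs)
    return wins / len(pairs)


# Made-up changes in muscle thickness (mm), NOT the study's data.
one_set = [1, 2, 3]
five_sets = [2, 4]
print(probability_of_superiority(one_set, five_sets))  # 0.25
```

0.5 means neither group tends to beat the other; values near 0 or 1 mean little overlap. With roughly ten people per group, though, this estimate is itself noisy.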

Someone might say “Well, at least you can deduce that 5 sets, or more volume, has a better chance of gaining muscle than 1 set.” Are you positive about that? Devil’s advocate: look at the above again. What about the fact that the people in the 3-set group who lost muscle had a greater average loss than those in the 1-set group? Are we now in the realm of risk/reward analysis? Do we need to assess the triceps differently because of this?
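Treating it as a risk/reward question means looking at more than the average gain: how many people in a group lost muscle, and how much they lost when they did. A minimal Python sketch of that summary, with a hypothetical function name and made-up numbers:

```python
def loss_profile(changes):
    """Return (share of the group that lost muscle,
    average change among only those who lost), or (0.0, 0.0) if nobody lost."""
    losses = [c for c in changes if c < 0]
    if not losses:
        return 0.0, 0.0
    return len(losses) / len(changes), sum(losses) / len(losses)


# Made-up changes (mm) for one hypothetical group, NOT the study's data.
print(loss_profile([-3.0, -1.0, 2.0, 4.0]))  # (0.5, -2.0)
```

Comparing these two numbers across the 1-, 3- and 5-set groups is only possible with per-person data; a table of means and standard deviations can’t give it to you.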

You can pick and choose example after example.

Furthermore, notice the variability within a group. 

In the 5-set group’s biceps thickness, two people gained ~6 mm, nobody else gained more than 4 mm, and most gained 3 mm or less.

Visually (I’m aware this is not the true statistical usage), those top two people in the 5-set group sure look like outliers. Should we include or exclude outliers? When it comes to this topic, are they anomalies, or are they highly genetically suited for high-volume training?
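For a rough idea of what a formal version of “sure look like outliers” might be, one common convention is Tukey’s fences: flag anything more than 1.5 interquartile ranges below the first quartile or above the third. A minimal Python sketch using the standard library; the numbers are made up to match the pattern described above (two people near +6 mm, nobody else above 4 mm, most at 3 mm or less), not taken from the study:

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
    k=1.5 is the conventional choice; quartiles use the 'inclusive' method."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up biceps-thickness changes (mm), NOT the study's data.
five_sets = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 6.0, 6.2]
print(tukey_outliers(five_sets))  # [6.0, 6.2]
```

The rule flags the two high responders, but it can’t tell you whether they are anomalies or people genetically suited to high volume. That’s still a judgment call, and Tukey’s rule is only one convention among several.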

Triceps: one gained ~6.5 mm; another lost over 3 mm!

Look, I know we’re talking millimeter differences here, but +6 mm vs. -3 mm is a 9 mm swing. In fact, just consider the difference between the person who gained 1 mm and the person who gained 6 mm. That’s a 6x difference. A 500% increase. Seriously. Five HUNDRED percent.
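The “6x” and “500%” figures describe the same gap: percent increase subtracts the starting value before dividing, so six times as large is a 500% increase. A quick Python check of the arithmetic, using the gains quoted above:

```python
def fold_change(before, after):
    """How many times larger `after` is than `before`."""
    return after / before

def percent_increase(before, after):
    """Percent by which `after` exceeds `before`."""
    return (after - before) / before * 100

print(fold_change(1.0, 6.0))       # 6.0
print(percent_increase(1.0, 6.0))  # 500.0
print(6.0 - (-3.0))                # 9.0, the swing between +6 mm and -3 mm
```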

This is why assessing and modeling humans can be so impossible. Small sample sizes => Impossible * infinity.
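To put “small sample sizes” in concrete terms, the sketch below (Python, standard library only) simulates many groups drawn from the same population and measures how much the group averages bounce around. The population mean and SD of 2 mm are hypothetical values chosen for illustration, not estimates from the study.

```python
import random
import statistics

def spread_of_group_means(n, reps=2000, true_mean=2.0, true_sd=2.0, seed=0):
    """Simulate `reps` groups of size n drawn from one normal population and
    return the standard deviation of their group means.
    true_mean and true_sd are hypothetical values in mm, not from the study."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(true_mean, true_sd) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)


for n in (10, 100):
    print(n, round(spread_of_group_means(n), 3))
```

Averages of 10 people scatter about √10 ≈ 3.2 times as much as averages of 100, because the standard error of a mean scales as σ/√n. Whatever the true effect is, a 10-person group can land well away from it by chance alone.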

I’m a personal trainer. What am I supposed to do with this information? How do I know whether I have a person for whom 3 sets will lead to gaining or losing muscle? Or if one set is better than three? This study does not remotely help me in that regard. All it does is, maybe, help me speak in extremely broad generalities. That’s fine if I’m talking about a large group, but when I’m directly speaking to an individual?

This study used trained people, i.e., people who had been resistance training for over a year. If I have a trained client, can I automatically tell them more volume will help them get bigger? This is what the conclusion of the study says,

“we show that increases in muscle hypertrophy follow a dose-response relationship, with increasingly greater gains achieved with higher training volumes. Thus, those seeking to maximize muscular growth need to allot a greater amount of weekly time to achieve this goal”

Do you think that’s a fair conclusion now that you’ve seen that some people might get smaller if they do more volume? Because that sure sounds like painting with a broad brush across anybody interested in hypertrophy.

If you have a new client and they tell you muscle size is their goal, do you immediately jump to 5 sets, or do you start at 1? What’s the injury, adherence and burnout risk of a workout with 5x as many sets?

The study was done on college-aged males. The conclusion says “thus, those seeking…” Do you think it’s fair to take a 40-year-old male with two kids, a mortgage, and a 9-5 job, and tell him he needs to do 45 sets a week for his quads, to failure, to maximize hypertrophy? Because he’s one of “those seeking…” And if that is your recommendation, does it have to be “do 45 sets and also have a college lifestyle”? Oh, “and take some steroids so your testosterone levels are what they were in your early 20s.”

And “eat a certain level of protein.”

And “have a trainer next to you watching your technique and keeping you accountable.”

Someone is going to say that’s too pedantic. Too onerous. But that’s science. Generalizing too much, reading too much into sample sizes of 30 (split into three groups!), gets me clients pushing 70 years old half-jokingly asking why we aren’t just doing 13-minute workouts for strength, and linking me to a NYTimes article about this study.

-> The 13-minute strength finding from this study hasn’t been talked about as much. All the powerlifters and Olympic lifters in the world must feel bad they’ve been lifting for more than 13 minutes all these years.

A much more sensible conclusion is,

“Those looking to maximize hypertrophy should start with low volume and increase as results dictate.”
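That “start low and increase as results dictate” approach can be written as a simple decision rule. This is a hypothetical sketch, not a validated protocol: the function name, the progress threshold of 0.5 (in whatever unit you track), the one-set step, and the five-set cap are all assumptions for illustration.

```python
def next_set_count(current_sets, progress, min_progress=0.5, step=1, max_sets=5):
    """Hypothetical 'start low, increase as results dictate' rule.

    current_sets: sets per exercise the client is doing now.
    progress: measured progress over the last training block (hypothetical units).
    While the client is still progressing (progress >= min_progress), volume
    stays the same; otherwise add `step` sets, never going above `max_sets`.
    All thresholds are illustrative assumptions, not values from the study.
    """
    if progress >= min_progress:
        return current_sets
    return min(current_sets + step, max_sets)


# Usage with made-up block-by-block progress for one hypothetical client.
sets = 1
for progress in (1.0, 0.2, 0.1, 0.8):
    sets = next_set_count(sets, progress)
    print(f"progress {progress} -> next block: {sets} set(s)")
```

In real life “results” also have to include recovery, adherence, and injury risk, which a single progress number won’t capture.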

Something I do not like about many of the evidence-driven writers in the exercise science world is how often they fall back on what is and isn’t, by their often bizarre rationalizations, scientific.

-> This study used self-reported food intake to monitor the subjects’ eating. That’s not falsifiable, and so it’s not scientific.

Why are we so confused about how and what to eat? <- an enormous problem in nutrition research

They denounce anyone who does not speak in scientific terms, despite the fact that many researchers’ conclusions are not scientific themselves. I routinely see responses to criticism along the lines of “the critics are unscientific,” and I saw it with this study. They’re arbitrarily picking a level of scientific rigor critics need to attain to be valid. Meanwhile, go ask some people in physics how scientific they think studies like this are.

-> In human research, we’re happy if we get to a p-value of 0.05: one in twenty, which is what this study used (and which was rarely attained). In physics, they shoot for one in 3.5 million.
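Both thresholds come from the normal distribution. p = 0.05 corresponds to about 1.96 standard deviations (two-sided), while particle physics’ “5 sigma” discovery standard is a one-sided tail probability of about 2.9 × 10⁻⁷, roughly 1 in 3.5 million. A short Python check using the standard library’s complementary error function:

```python
import math

def one_sided_p(z):
    """Probability that a standard normal value lands above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_sided_p(z):
    """Probability that a standard normal value lands beyond +z or -z."""
    return math.erfc(z / math.sqrt(2))

print(round(two_sided_p(1.96), 4))  # 0.05, i.e. about 1 in 20
print(round(1 / one_sided_p(5)))    # roughly 3.5 million: the "5 sigma" standard
```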

Or let me put it this way: if you’re a person struggling with hypertrophy, the bane of the evidence-driven person’s life is for you to go on an internet forum, ask around for advice, and fall into the endlessly cited confirmation and availability biases.

You’ll hear some tell you “Do more volume. Look at bodybuilders.” You’ll hear others say “Do less. That worked for me.” The person concludes,

“For a fair amount of people, more volume worked, but there’s definitely enough others telling me less might work too. Based on my experience and circumstances, I think I’ll try X and go from there.”

So the person goes off and experiments. Does a study like this do any more than suggest the same? If it doesn’t, how scientific is it? (How scientific, research-wise, are humans capable of being?) If we end up no better off than with anecdote, what’s the value or utility in continuing this kind of research? Are we adding clarity, or confusion?

