Post-Doc Ergo Propter Hoc: October 2015

Monday 26 October 2015

World Records in Weightlifting and Powerlifting

A few weeks ago, I looked at world records in running and swimming, to see what information I could get out of them and what I could learn about human athletic performance. In this post, I'll look at world records in weightlifting and powerlifting, and see what there is to see. Before I write this, I have a general sense of what the data will look like: I know that bigger people tend to be stronger and men tend to be stronger than women, but I don't yet know what nuances will emerge. The overarching question is: can we learn about an extremely complicated system, a human being, from simple data, like how much they can lift?

When I do this analysis, I make the assumption, as before, that the current world records are close to the pinnacle of human achievement, and that slight improvements in a given record will not change the trends that much. This is not as true for powerlifting as it is for running.

The difference between weightlifting and powerlifting is that powerlifters lift more weight and weightlifting requires more power. Powerlifting involves the squat, bench press, and deadlift, and weightlifting involves the snatch and the clean and jerk, two ways of lifting a weight from the ground to overhead. Weightlifting used to involve the clean and press as well, but it was removed because people started leaning really far back to make the lift easier and it became impossible to judge.

From left to right: Squat, bench press, deadlift, clean and jerk, snatch.

Powerlifting data was taken from the website PowerliftingWatch.com, which is generally well updated. It's important to note that I focus on "raw" records, meaning they don't use supportive equipment, which can vastly increase the amount of weight lifted. I don't particularly care whether they were on drugs or not; I want to know the limits of human achievement. Weightlifting data was taken from Wikipedia, with the caveat that the records were annulled in the 1990s when the weight classes were redistributed, and not all the old ones have been beaten. In the case of the "superheavyweight" category, I use the weight of the record holder rather than of the weight class.

The naive trend we expect is based on the square-cube law: if you make a person bigger by some factor x, their weight will increase as $x^3$ but their strength only as $x^2$, so the increase in strength with respect to weight should be roughly the two-thirds power. In an article I wrote on scaling laws, I showed that men's deadlift records did follow this prediction until the athletes start getting fattier. This ignores many biomechanical effects, for example the benefits or disadvantages of having longer limbs when performing a lift.
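
To spell out the arithmetic behind that two-thirds exponent, here is a short sketch. The symbols $W$ (body weight) and $S$ (strength) are shorthand introduced just for this line; $x$ is the scale factor from the paragraph above.

$W \propto x^{3},\quad S \propto x^{2}\quad\Rightarrow\quad x \propto W^{1/3}\quad\Rightarrow\quad S \propto W^{2/3}$

Under this naive picture, doubling body weight would raise strength by a factor of $2^{2/3}\approx 1.59$ rather than 2.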

Let's first look at the world record data for powerlifting.

Powerlifting world records.

First, we confirm what was expected: bigger athletes are stronger, and men are stronger than women. There is a trend evident in the squat and deadlift that is the same between sexes: the record increases with weight and reaches a plateau. This plateau begins at a transition weight where the athletes stop getting more muscular as they get heavier, and start getting fattier. This occurs at about 242 pounds for men and 165 pounds for women. It's interesting that this plateau trend is not really evident in the bench press data; it is roughly a smooth increase (the men have an outlier at 242 lb). Why are heavy bench pressers immune to fattiness? Perhaps the reduction in bar-travel distance due to a fattier chest helps, perhaps it's due to the existence of bench-only competitions that avoid the fatigue of squatting before benching. I don't know.

Champions in the second-heaviest and heaviest weightlifting categories.

I have often heard that women have comparatively stronger lower bodies than men. Is this supported by the data? If we average the female:male ratios across all coincident weights for the three lifts, they are roughly the same for the squat and the deadlift, about 0.71 ± 0.02, whereas the bench press is a lower 0.65 ± 0.01.
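
A minimal Python sketch of how an averaged ratio with an uncertainty can be computed. Two things here are assumptions: that the ± is the standard error of the mean (it could equally be a standard deviation), and the function name, which is made up for illustration. The numbers are placeholders, not the record data.

```python
import math
import statistics

def mean_and_standard_error(values):
    """Return the mean of values and the standard error of that mean."""
    mean = statistics.mean(values)
    # Sample standard deviation divided by sqrt(N) gives the standard error.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem

# Placeholder lifts for weight classes shared by both sexes (NOT real records).
female_lifts = [300.0, 330.0, 360.0]
male_lifts = [420.0, 470.0, 500.0]

# One female:male ratio per coincident weight class, then average them.
ratios = [f / m for f, m in zip(female_lifts, male_lifts)]
mean_ratio, sem_ratio = mean_and_standard_error(ratios)
print(f"female:male ratio = {mean_ratio:.2f} +/- {sem_ratio:.2f}")
```

Only weight classes that exist for both sexes enter the average, which is what "coincident weights" means here.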

There are two fairly extreme outliers evident in the squat data: the lightest man, Andrzej Stanaszek, is about four feet tall and has considerably different biomechanics; the heaviest woman, April Mathis, is just much better than any other female raw powerlifter. Excluding Andrzej and April, there is a downward trend present in the squat ratio data: large men get better at squatting compared to large women; I believe that this is because the adiposity transition occurs at a lower weight for women.

Squat ratios.

I tried to measure a "legginess quotient" by dividing the squat records by the bench press records, but there isn't much of a trend. The average ratio is 1.27 for women and 1.26 for men.

When I try to compare trends between different lifts across weight classes, or between sexes, I come to the conclusion that the assumption upon which I base this analysis is violated: the "raw" powerlifting records do not serve as a proxy for the pinnacle of human performance, and there is significant person-to-person variability that skews the data. This is in part based on my choice to focus on "raw" records, which typically have fewer competitors. To smooth out the stats, I will look at the records for the raw "total," which is the sum of the three lifts. This at least will allow me to more easily compare apples to apples, because each data point is just one person. It would be good to compare all three lifts for each record total, but the data isn't in a neat little package.

Powerlifting Totals, in linear and logarithmic axes. The superimposed lines are the 2/3 power law. Also I screwed up the colours.

These trends look less scattered than each of the individual lift records. Doing my favourite thing and fitting a power law to the data, excluding the heaviest category, we get a scaling exponent of 0.58 ± 0.05 for men and 0.7 ± 0.1 for women. The naive expectation is 0.66. Another way to look at this is by dividing out by body weight, and looking at how many times their own weight they lifted. This generally decreases, for reasons I mentioned above. We also see how good April Mathis is, in that she rises significantly above this decreasing trend.
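
For anyone curious what "fitting a power law" amounts to, here is a minimal Python sketch: a straight-line least-squares fit in log-log space, where the slope is the scaling exponent. The function name is made up for illustration, and the data points are synthetic (generated to lie exactly on a 2/3 power law), not the actual records. It is one common way to do such a fit, not necessarily the one used for the numbers above.

```python
import math

def fit_power_law(weights, totals):
    """Fit total = prefactor * weight**exponent by linear least squares on logs."""
    xs = [math.log(w) for w in weights]
    ys = [math.log(t) for t in totals]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope of the best-fit line through (log weight, log total).
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    # Intercept of the line, converted back from log space.
    prefactor = math.exp(y_mean - exponent * x_mean)
    return exponent, prefactor

# Synthetic points on an exact 2/3 power law (NOT real record data).
weights = [120.0, 150.0, 180.0, 220.0, 275.0]
totals = [40.0 * w ** (2.0 / 3.0) for w in weights]

exponent, prefactor = fit_power_law(weights, totals)
print(f"exponent = {exponent:.3f}, prefactor = {prefactor:.1f}")
```

With real records the points scatter about the line, and that scatter is what sets the uncertainty on the exponent.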

How many times their body weight they lifted.

When we look at weightlifting, things get a bit clearer. There are no longer fluctuations about the trend due to small numbers or sub-maximal performance. Only the largest athletes fall below the general trend due to fattiness. As a Canadian living in the US, I will now switch from pounds to kilograms as I switch from powerlifting to weightlifting.

Weightlifting record data. The men's snatch champion is a really big dude.

Something interesting is that for weightlifting, the women tend to get stronger-er with weight compared to men. For both the clean and jerk and the snatch, men's records increase with roughly the square-root of weight, whereas for women it's the .65 and .75 power respectively, which I did not expect and can't currently explain. There is a weak downward trend in the sex ratios, but nothing to get too analytical about. Looking at the ratio of snatch to clean and jerk, there is generally no trend with respect to weight, but the ratio is 0.829 ±0.007 for men and 0.801±0.007 for women, a small but statistically significant difference. I do not know the physiological reason behind this, but it strikes me as the opposite of the powerlifting sex ratio trend.

I would say that we did not learn anything too-too interesting from looking at this data. The records are handwavingly near the prediction of the square-cube law, and the adiposity transition is quite visible when that law fails. Probably the most interesting things I learned were that the bench press sex ratio is lower than the squat and deadlift ratios, and the snatch:jerk ratio is higher for men than for women, both by a small but significant amount. Generally, I think the sport of raw powerlifting has not developed enough to make firm conclusions about a lot of these trends.

Monday 19 October 2015

Blog Article on PhysicsForums: Fun With Self-Avoiding Walks

https://www.physicsforums.com/insights/fun-self-avoiding-walks/

This is the story all about how my random walk simulations got flipped, turned upside down.

On an unrelated note, I was doing some molecular dynamics simulations and one of the polymers ended up looking like some kind of undersea explorer.

Thursday 15 October 2015

Mildly Interesting Math Finding: Areas between trivial zeros of the Riemann Zeta function.

One of the reasons I started this blog was to talk about things that I have discovered that are interesting enough to share but not interesting enough to publish. This is one of those things. It's something I found out about the Riemann Zeta function when I was bored and playing around with Maple. I apologize in advance because the LaTeX on this page looks awful.

The Riemann Zeta function, $\zeta(x)$, is the sum of every positive integer to the power of -x:

$\zeta(x)=\sum_{n=1}^{\infty}\frac{1}{n^x}=\frac{1}{1^x}+\frac{1}{2^x}+\frac{1}{3^x}+\cdots$

It is undefined at x=1, at x=2 it evaluates to $\pi^{2}/6$, and generally decreases over the positive numbers. When evaluated for complex numbers, using the necessary analytic continuation of the function, it occasionally equals zero. All known Zeta zeros of complex numbers have a real part of 1/2, and if there are no exceptions to this trend, it has profound implications for the distribution of prime numbers. This is known as the Riemann hypothesis, and its solution is worth a million dollars. Recently, there was a viral video about how $\zeta(-1)=1+2+3+4...=-1/12$, which isn't quite right.

What are less interesting than the non-trivial zeros are the trivial zeros: negative even numbers. Negative even numbers make the Zeta function equal zero because a re-writing of the equation for negative numbers includes a sine function that is zero at every even integer.
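
The re-writing in question is presumably the standard functional equation, given here in its textbook form with $x$ as in the rest of this post:

$\zeta(x)=2^{x}\pi^{x-1}\sin\left(\frac{\pi x}{2}\right)\Gamma(1-x)\,\zeta(1-x)$

For a negative even $x$, the factor $\sin(\pi x/2)$ is zero while $\Gamma(1-x)$ and $\zeta(1-x)$ are both finite, so the whole product vanishes.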

One day I was plotting the Riemann Zeta function over negative numbers for some reason. It looks like this:

The less popular strip of the Riemann Zeta function.

After some initial adjustment, the area subtended by the curve between each of the zeros gets bigger and bigger. In fact, after this graph truncates to the left, it starts getting too big to reasonably show on a graph.

Out of curiosity, I calculated the integrals of the curve between each zero (we'll call them G), and looked at it. This grows extremely rapidly, blowing the exponential function out of the water and surpassing $x^x$ at around the 70th interval.

The numbers are too big for my regular graph-making program to handle.

I have a vague recollection of what I did with these numbers (I think it was 2010?), so I tried to re-create it when I thought of this exercise tonight. Let's look at the ratio of each interval to the one before it.

This is a pretty well-behaved function, and if you look at it on logarithmic axes you can squint a straight line into existence. This implies that the ratio is described by a power law, and if you do a naive power-law fit you get an exponent of like 1.97. That's almost quadratic! Performing a quadratic fit to this ratio data (I'll call the ratio R, to keep it distinct from the integrals G) gives the function:

$R(i)=0.101i^{2}+0.298i-0.036$

There is some uncertainty on the intercept term, but the index coefficients are pretty tight. The R-squared coefficient is 1, it is basically a perfect fit.
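
For anyone who wants to poke at this without Maple, here is a minimal Python sketch of the procedure described above, using only the standard library. Everything about it is an assumption rather than a record of what was originally done:

- The function names are made up for illustration.
- The Zeta function at negative arguments is evaluated through the standard functional equation.
- The first interval is taken to be the one between the zeros at -4 and -2.
- The ratio at index i is taken to be interval i over interval i-1.

Whether it reproduces the coefficients quoted above depends on the indexing and on how many intervals are fitted.

```python
import math

def zeta_positive(s, cutoff=20):
    """zeta(s) for s >= 2: direct sum plus an Euler-Maclaurin tail correction."""
    head = sum(n ** -s for n in range(1, cutoff))
    tail = (cutoff ** (1 - s) / (s - 1)
            + 0.5 * cutoff ** -s
            + s * cutoff ** (-s - 1) / 12)
    return head + tail

def zeta_negative(x):
    """zeta(x) for x <= -1, reflected to 1 - x with the functional equation."""
    return (2 ** x * math.pi ** (x - 1) * math.sin(math.pi * x / 2)
            * math.gamma(1 - x) * zeta_positive(1 - x))

def simpson(f, a, b, steps=200):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3

def interval_areas(count):
    """Signed areas between consecutive trivial zeros: [-4,-2], [-6,-4], ..."""
    return [simpson(zeta_negative, -2.0 * (k + 1), -2.0 * k)
            for k in range(1, count + 1)]

def fit_quadratic(xs, ys):
    """Least-squares a*x^2 + b*x + c via the 3x3 normal equations."""
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return tuple(m[r][3] / m[r][r] for r in range(3))

areas = interval_areas(21)
# Ratio of each interval to the one before it, indexed i = 2, 3, ..., 21.
indices = list(range(2, 22))
ratios = [abs(areas[i - 1] / areas[i - 2]) for i in indices]
a, b, c = fit_quadratic(indices, ratios)
print(f"first area = {areas[0]:.4f}")
print(f"ratio fit: {a:.3f} i^2 + {b:.3f} i + {c:.3f}")
```

The sketch stops at 21 intervals because `math.gamma` overflows once its argument passes about 171; arbitrary-precision arithmetic would be needed to go much further out.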

So, this tells us something:

$\left|\frac{\int_{-2(i+1)}^{-2i}\zeta(x)dx}{\int_{-2i}^{-2(i-1)}\zeta(x)dx}\right|=0.101i^{2}+0.298i-0.036$

Coupled with the initial value for i=1, $G_{1}=.011$ (the area between the zeros at -4 and -2), we can find the magnitude of each interval iteratively.
$\int_{-2(i+1)}^{-2i}\zeta(x)dx=(-1)^{i+1}0.011 \times \prod_{n=2}^{i}\left[0.101n^{2}+0.298n-0.036\right]$

Interestingly, this product has a closed-form solution in terms of gamma functions of the roots of the quadratic polynomial describing the ratio:

$\frac{a^{i}\,\Gamma\left(\frac{2ia+b-\sqrt{b^{2}-4ac}}{2a}\right)\Gamma\left(\frac{2ia+b+\sqrt{b^{2}-4ac}}{2a}\right)}{a^{2}\,\Gamma\left(\frac{4a+b-\sqrt{b^{2}-4ac}}{2a}\right)\Gamma\left(\frac{4a+b+\sqrt{b^{2}-4ac}}{2a}\right)}$

I plugged in the fitting parameters for a, b, and c, but didn't have much success predicting the interval integrals; I could get within a factor of two for some of them. This does however explain the growth of these interval sizes: it's an exponential times the square of a factorial, so it should grow roughly like $x^{x}x!$.

Taking in all this information, I don't really know what this means, whether it's obvious or interesting or if it can be derived or what. It's hard to search for trivial zeros of the Riemann Zeta function because I just get stuff on non-trivial zeros. I don't know if this quadratic ratio behaviour is unique to the Riemann Zeta function, or holds for any sufficiently complex function. I'm treating these numbers as empirical phenomena, and it would be cool to know if there are first-principles behind them. This was something I came across stumbling blindly through math land, so if anyone has any insight I'd be happy to read it.

Friday 2 October 2015

An empirical look at the scaling of world-record running and swimming speeds.

Rather than physics, I'm going to talk about running, and then swimming. I have run a few races in the last year or so, but I am not as knowledgeable about sports physiology as I am about physics.

Out of curiosity, one day I looked up the records for various running races, calculated their average speeds, and plotted them versus distance. Before I attempt to analyse them, let's look at the data.

Record running speed vs. race distance. Linear-log and log-log scales. Red dots are 100 m and marathon.

A few comments about the data. It comes from Wikipedia. Distances range from the 40 yard NFL combine sprint to the 24-hour record. Different races have different standards for timing. Some have official records that are kept, and some are informal. Sometimes the fastest time for a certain distance is actually a subset of a longer race, e.g. running the fastest 30 km as part of a marathon. Somewhat implicit here is the assumption that the current world records are close to the pinnacle of human possibility. Because these are average speeds, they neglect information about speed variation within a race.

If we look at the data, we see four to five different regimes. At least, I do.

Let's look at each of these individually.

1. Acceleration Zone. 0-100 m. For the shortest races, below 100 m, the average speeds are slower because a large portion of the time is accelerating. The shorter the race, the less time is spent at top speed.

2. The Usain Bolt Golden Zone. 100-200 m. This is sort of a transition zone between 1 and 3, where it's long enough to reach top speed and coast at it, but not so long that the runners start to slow down. Usain Bolt holds all of these records at roughly the same speed, his fastest average race being the 150 meters which he did at 37.6 km/h. For a long time, the 200 m record was faster than the 100 m, but Usain Bolt effectively tied them up.

3. Sprint Zone. 200-1000 m. They're trying to go as fast as they can, but it doesn't last forever. As the race gets longer, they get slower at roughly the -1/5 power of distance*. I have a feeling terms like "fast-twitch" and "anaerobic" would come into play here if the physiology were to be discussed.

4. Endurance Zone. 1.5-42 km. The races are long enough that conserving energy becomes more important than going all-out. Speed decreases with roughly the -1/13 (-0.08) power of distance. Marathon champions are slightly more than half as fast as Usain Bolt.

5. Pain Zone/Low Stats Zone. 50+ km. Few venture beyond the marathon. This is the ultimate test of human endurance, what separates us from the gazelles. Speed decreases with roughly the -1/5 power again. This may be a different physiological regime than the Endurance Zone, or it could be convolved with the fact that there are far fewer people running these races so a true champion has not emerged (at the risk of stereotyping, I find it suspicious that no supermarathon record holder is Kenyan), and the races are long enough that people have to take bathroom breaks, lowering the average speed.

I mentioned low statistics. What I mean by that is that there are certain races where the record is clearly not what is humanly possible, but because comparatively fewer people run that distance, the best hasn't been achieved yet. This is responsible for the jumble in between half and full marathon speeds, and you can see an explicit example if you look at the Sprint Zone. 200, 400, 800 m and (less so) 1 km are commonly contested events, while nobody really runs 500 meters. It falls below the trend of human excellence.

500 meter glory is ripe for the taking. Also, this is a log-log plot so a straight line means a power law.

Now let's look at women's records.

Women's records compared to men's, and the ratio in speeds.

Generally, the fastest women are about 12% slower than the fastest men, with some variation. The small-sample effects are more present, especially around the marathon distance. One could try to squint out a trend in the ratio, but it is essentially constant: if you do a power law fit, it is consistent with zero, 0.002 ± 0.003 (it's even closer to zero if you take out ultramarathons). Both the 100 m and the marathon have the same ratio, about 9% faster for men. The best and worst are both ultramarathons (6% and 20%), indicating sample-size effects. My interpretation of this information is that whatever physiological differences separate sprinters and distance runners, they do not differ between men and women.

However, things get different when we look at swimming records. Pool events range from 50 to 1600 meters, and there are longer outdoor events, which are harder to compare because I imagine the environmental conditions start to make a big difference. Looking at the data, it looks similar to running: longer is slower. It gets interesting when we compare men and women.

The two longest races are outdoors and are suspect. The first six are pool races.

As you can see, the relative advantage men have over women decreases as the race gets longer. This is pretty interesting, because it's something that's found in swimming but not running. What is the physiological reason for this? I don't know, but if I had to hypothesize, I'd say that in longer races, more energy is spent on maintaining buoyancy compared to sprints, and women are naturally more buoyant than men, and in longer races this starts to matter and tires men out comparatively more.

Now, let's get silly. My squint-analysis tells me that there is a sprint zone below 400 m. For men the scaling exponent is -.13, while for women it's -.11 (being less negative is consistent with the previous paragraph). For the endurance zone, it's -0.0526 for men and -.0485 for women, with an error of about .003. These numbers are about two-thirds their running counterparts, and I can't explain why, but I'm sure it's interesting. The scaling exponent of the ratio is about -.008 for all the data or -0.014 for only the pool records. If that seems ridiculously small, look at the y-axis of that ratio graph. These numbers are really small, but they are consistently nonzero; the error is roughly .001. With this information, I draw a natural conclusion: women will overtake men in a billion kilometer race.

Extrapolation is never wrong and always justified.

I am not the only person who has had these ideas: a lot of it was discussed in this paper which I didn't consult until writing everything above. They analyse sexual dimorphism in a lot of different sports, and find jumping (especially pole vault) is the most dimorphic...sexually. They also discuss these trends for running, swimming, and speed skating. They also did experiments with pigeons over hundreds of kilometers and found no evidence of sexual dimorphism.

In looking at all this information, I am basically doing what I would do with a set of experimental data where I don't really understand the underlying mechanisms (in this case, how people work). I look at plots, try to split the data into different regions, and look for trends within each region, and then try to figure out what underlying phenomena lead to those trends. Sometimes you can learn stuff this way.

If any physiologists or kinesiologists are reading this and have some ideas about the trends I've discussed, please share them.

*What this means is that in a race 32 times as long as another, the speed would be half. I would recommend reading the intro to my article on animal speeds for an overview of scaling analysis. If you want.
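
As a quick illustration of the footnote, here is a small Python sketch of what a scaling exponent implies. It assumes record speed is exactly proportional to distance raised to the exponent; the function name is made up for illustration.

```python
def speed_ratio(distance_ratio, exponent):
    """Factor by which speed changes when the distance is multiplied by
    distance_ratio, assuming speed is proportional to distance**exponent."""
    return distance_ratio ** exponent

# Sprint Zone exponent of -1/5: a race 32 times as long is run at half the speed.
print(speed_ratio(32, -1 / 5))

# Endurance Zone exponent of -1/13: doubling the distance costs only about 5%.
print(speed_ratio(2, -1 / 13))
```

The same function works for the swimming exponents, or for any other power law in these posts.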