
My understanding is that Intel's quoted "max turbo boost" speed is only guaranteed when a single core is active, and that when all cores are active, the clock speed could conceivably fall to the base clock speed (which is often much lower).

I'm wondering how this actually looks in practice for a modern server Intel CPU. I ran some tests on AWS and I can't detect any clock speed slowdown even up to n=32 active processes, which is surprising and contradicts online references (links at bottom).

I'm on an AWS EC2 box with a 32-core Xeon Platinum 8375C, which advertises a 3.5 GHz max turbo boost and a 2.9 GHz base speed.

When I run `stress --cpu 32 --timeout 20` and measure clock speed via `watch -n 1 "cat /proc/cpuinfo | grep 'MHz'"`, I see speeds extremely close to 3.5 GHz regardless of how many active cores I use.
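For anyone repeating the measurement, the `grep` step can also be done in a short script that summarizes the readings instead of printing 32 lines. This is only a sketch: the function names are made up for illustration, and it assumes a Linux `/proc/cpuinfo` that reports one `cpu MHz` line per logical CPU.

```python
import re
from pathlib import Path
from statistics import mean

# Matches lines such as "cpu MHz\t\t: 3499.998" in /proc/cpuinfo output.
_MHZ_LINE = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)\s*$", re.MULTILINE)


def parse_cpu_mhz(cpuinfo_text):
    """Return the per-CPU clock readings (in MHz) found in cpuinfo text."""
    return [float(m) for m in _MHZ_LINE.findall(cpuinfo_text)]


def summarize(readings):
    """Return (min, mean, max) of the readings, or None if there are none."""
    if not readings:
        return None
    return min(readings), mean(readings), max(readings)


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the current per-CPU readings from a Linux /proc/cpuinfo file."""
    return parse_cpu_mhz(Path(path).read_text())
```

Calling `summarize(read_cpu_mhz())` once per second while `stress` is running makes it easy to see whether the minimum reading ever drops toward the base clock.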

This contradicts all the analysis I can find online: each of the links below has a table showing an all-core frequency significantly lower than the max turbo frequency. The general intuition is that turbo speed decreases as the number of active cores increases. What's going on here? Does AWS have unusually good cooling?

From a practical perspective, when selecting a server CPU, should one pay any attention at all to the base frequency, or simply optimize for the max turbo boost frequency, even for highly parallel workloads?

  • I read somewhere that AWS has custom processors from Intel. Maybe they've done something like use very good cooling so it can work at max speed all the time.
    – Tim
    Commented May 25 at 19:52

1 Answer


No. Clock speed is not as useful for capacity planning as it would seem. Max boost is dynamic and depends on thermal conditions, at least in principle. The number of instructions completed is not strictly a function of clock speed, and the CPU is certainly not the only factor in overall system performance.

stress is unlikely to represent your actual workload. Most useful work involves I/O, so it is not limited by instruction speed.

Test something approximating your actual workload instead: web requests, database queries, job processing, or whatever you actually run. Very likely you will see fewer than one instruction per clock, a symptom of memory being the bottleneck, in which case more cycles won't necessarily help.
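To make "instructions per clock" concrete: it is the number of retired instructions divided by the number of elapsed core cycles, both of which a tool such as `perf stat` can report for a run. A rough sketch of the arithmetic follows; the counter values are hypothetical, not measurements from the machine in the question.

```python
def instructions_per_cycle(instructions, cycles):
    """Return retired instructions divided by elapsed core cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical counter values, chosen only to illustrate the calculation.
sample_instructions = 8_000_000_000
sample_cycles = 10_000_000_000

ipc = instructions_per_cycle(sample_instructions, sample_cycles)
print(f"IPC = {ipc:.2f}")
```

A result below 1 on your real workload suggests the cores spend much of their time waiting rather than executing, which is why a higher clock alone may not buy much.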

Experiment with various sizes of compute instances, optimizing for real work done, and right-size accordingly. Do capacity planning based on generic ratios of GB of memory to CPUs.

Once you have found an instance size you like, if you are still not satisfied, optimize further:

  • Profile applications to see what is on CPU.
  • Find storage bottlenecks where the block device is the limiting factor.
  • Check if other CPU platforms are an option on your cloud of choice, to try something completely different.

Bad analogy, vehicle edition. Max boost CPU clock is like the maximum RPM an engine can reach before it sustains damage. It is good to know, and the engine might be able to hold it for some time in good conditions. But in practice, a freight train's high-level metrics are cargo per unit of fuel and on-time statistics. Beyond the locomotive's engine, how quickly cars are processed through the rail yard also matters to overall system performance. That is vaguely analogous to how slow I/O is in computers.
