William H. Magill on Mon, 9 Jun 2003 13:01:29 -0400



Re: [PLUG] Hardware question



On Monday, June 9, 2003, at 08:07 AM, Paul wrote:
> Forge wrote:
>> Doesn't a dual CPU system require twice the amount of RAM?
> No, not at all.

Actually, the answer is: it depends on the workload.

> Really? So two CPUs share the same RAM? If both are working under the same load, will they split their memory use in half? In other words, if you have 512MB of RAM, will each CPU only have 256MB to work with? (Not that that isn't a good amount of RAM.)

There are, historically, two primary schemes for "multiple" CPU utilization - tightly coupled and loosely coupled. In a loosely coupled environment, a process is assigned to a single CPU and stays there for its entire life. In a tightly coupled scenario, each INSTRUCTION can be executed on the next available CPU. Note that this is a hardware feature, and has nothing to do with threading, which is a software feature.
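
On Linux you can approximate the loosely-coupled model by hand with CPU affinity. Something like the following sketch (assuming a kernel and glibc recent enough to have sched_setaffinity) pins the calling process to CPU 0 for its lifetime:

    /* Sketch: restrict this process to CPU 0, the way a loosely
       coupled system keeps a process on one CPU for its whole life.
       Linux-specific; needs _GNU_SOURCE for the affinity calls. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);                    /* allow only CPU 0 */

        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");     /* pid 0 = this process */
            return 1;
        }

        printf("pinned to CPU 0 for the rest of this process's life\n");
        return 0;
    }

Without the mask, the scheduler is free to move the process between CPUs as it sees fit.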


In FreeBSD, as used by Apple in the Rhapsody days, portions of the kernel code were always executed on one CPU. So if you needed to do I/O, it didn't matter which CPU was executing the program; it had to wait for the I/O CPU to come free before the I/O could be done. I've been told this deficiency has been "fixed," but I don't know. (It's one of the issues in the micro-kernel vs. monolithic-kernel debate.)

Resources are shared based upon the compiler's capabilities and the OS's multi-threading features. Today, if you "roll your own," the optimization for your hardware setup pretty much happens automagically. If you use pre-compiled binaries, "ya pays yer money and ya takes yer chances." Some apps can be "optimized" much more than others by simple virtue of what it is that they do. Others cannot be optimized at all. And with others, it's simply not worth the effort to optimize them.
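
As a concrete (if simplistic) example of the "automagic" case: assuming a compiler that understands OpenMP directives (not all of them do), one pragma is enough to let a loop spread itself across however many CPUs the box happens to have:

    /* Sketch: the pragma asks the compiler/runtime to split the loop
       iterations across the available CPUs.  On a compiler without
       OpenMP support it builds and runs as ordinary serial code. */
    #include <stdio.h>

    #define N 1000000

    static double a[N], b[N], c[N];

    int main(void)
    {
        long i;

        for (i = 0; i < N; i++) {            /* set up some data */
            a[i] = i;
            b[i] = 2.0 * i;
        }

    #pragma omp parallel for                 /* one CPU or eight, same source */
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[N-1] = %g\n", c[N - 1]);
        return 0;
    }

The same source runs unchanged on a single-CPU box; with pre-compiled binaries the question is simply whether whoever built them turned that sort of thing on.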

All this goes back to what made a given "super-computer" super. The reality was that the different so-called super-computers were pretty close to each other in raw crunch power. It was only when you invoked a machine's particular feature and optimized your program to use it that the "super" part of the computer came into play -- some were array processors, some vector processors, etc. The nasty thing was that if you optimized your code for a vector processor, you had to re-optimize for an array processor, which normally meant extensive re-coding.
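
To make the vector-processor case concrete: what a vectorizing compiler wants is an element-by-element loop with no dependency from one iteration to the next, and much of that re-coding work was about turning the second kind of loop below into something shaped like the first (a toy sketch):

    /* Toy example of why code had to be restructured for vector
       hardware.  The first loop has fully independent iterations and
       vectorizes; the second carries each result into the next
       iteration and, as written, cannot be vectorized. */
    void vector_friendly(const double *a, const double *b, double *c, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            c[i] = a[i] + b[i];              /* iterations independent */
    }

    void vector_hostile(double *a, const double *b, int n)
    {
        int i;
        for (i = 1; i < n; i++)
            a[i] = a[i - 1] + b[i];          /* depends on previous result */
    }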

This is not unlike the difference between a Beowulf cluster and a Digital VMS or Tru64 cluster. They are both "clusters," but they have entirely different functionality and concepts behind them. VMS clusters aimed at robustness and zero downtime. The original VMS cluster in Maynard, Mass. ran for something like 30+ years without ever being shut down for any reason. Every single piece of hardware in the cluster was replaced and the OS upgraded many times (including the transition from VAX to Alpha), but the cluster kept running, completely available to the user community 24x7x365x30.

With proprietary Unix systems, additional CPUs provide a roughly linear power increase up to 4 CPUs, at which point you have about 3.5 times the processing power of a single CPU. The hardware and kernel designs "just do it." I don't know how the Linux kernel fits into this equation today. Beyond 4 CPUs, however, the numbers change radically by vendor and chip architecture. (I've seen numbers as low as 4.5X for an 8-CPU box!)
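
Those scaling numbers are basically Amdahl's Law at work (my gloss, not any vendor's spec sheet): whatever fraction of the work stays serial, in the kernel or in the application, puts a hard cap on the total speedup. A quick back-of-the-envelope:

    /* Amdahl's Law: speedup(n) = 1 / ((1 - p) + p/n), where p is the
       fraction of the work that can run in parallel.  With p = 0.955
       (an assumed figure, roughly 4.5% serialized), 4 CPUs come out
       near 3.5x and 8 CPUs fall well short of 8x. */
    #include <stdio.h>

    static double amdahl(double p, int n)
    {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void)
    {
        double p = 0.955;
        int cpus[] = { 1, 2, 4, 8, 16 };
        int i;

        for (i = 0; i < 5; i++)
            printf("%2d CPUs -> %.2fx\n", cpus[i], amdahl(p, cpus[i]));
        return 0;
    }

How big that serial fraction really is on a given box is exactly the part that changes radically by vendor and architecture.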

Also, with proprietary Unix (at least with Tru64 Unix from Digital/Compaq/HP), it is quite simple to allocate a fixed amount of memory to individual CPUs. It's called memory partitioning. However, it is a feature associated with VLM systems - Very Large Memory - which tend to cost big bucks.

With today's hardware designs, basically the Alpha and POWER4, SMP technology is now at the chip level. SPARC, PA-RISC and IA64 would like to be able to do this but cannot. If the rumors are true that IA64 circa 2005 is really Alpha Inside, then it will too.

Once the SMP technology is reduced to the chip level, the associated memory and I/O management and access technologies are also, of necessity, transformed. NUMA - Non-Uniform Memory Access - suddenly becomes a BIG issue. Grace Hopper loved to hold up a 12-inch piece of copper wire while intoning, "This is a nanosecond." So in a VLM system, the time to access physically close memory is less than the time needed to access "distant" memory. The end result is that you have very different memory management problems, and hence models, in serious SMP systems than you find in single-processor boxes.
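
On the Linux side, the analogous knob is libnuma (the library that numactl uses): ask to run on a given node and allocate from that node's memory, so the memory you touch is the "close" memory. A rough sketch of the idea, assuming a NUMA-capable kernel with libnuma available (link with -lnuma):

    /* Sketch: run on NUMA node 0 and take memory from node 0, so the
       memory the process touches is the physically "close" memory.
       Everything here assumes libnuma is present. */
    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        size_t size = 64UL * 1024 * 1024;    /* 64 MB working buffer */
        char *buf;

        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this kernel\n");
            return 1;
        }

        numa_run_on_node(0);                 /* schedule on node 0 */
        buf = numa_alloc_onnode(size, 0);    /* allocate from node 0 */
        if (buf == NULL) {
            fprintf(stderr, "numa_alloc_onnode failed\n");
            return 1;
        }

        buf[0] = 1;                          /* pages land on node 0 */
        numa_free(buf, size);
        return 0;
    }

The point is the same as Grace Hopper's wire: keep the data within a nanosecond or two of the CPU that is going to chew on it.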

The other thing to remember: there is a VERY big difference between the way a multi-user time-sharing system works and a system dedicated to a single application. You can optimize/maximize a single application seven-ways-from-Tuesday to really get maximum performance every time you run it on a given hardware configuration. But that takes work. With a multi-user system, you don't have consistent or predictable resources or demands with which to begin your optimizations. So you make some gross assumptions and let-er-rip, trading the work necessary to fine-tune for maximum performance against the flexibility to handle whatever mix of work shows up.


High Performance Technical Computing, aka Super Computing, is fun, but it is a very different animal from mail and web serving.

T.T.F.N.
William H. Magill
# Beige G3 - Rev A motherboard - 768 Meg
# Flat-panel iMac (2.1) 800MHz - Super Drive - 768 Meg
# PWS433a [Alpha 21164 Rev 7.2 (EV56)- 64 Meg]- Tru64 5.1a
magill@mcgillsociety.org
magill@acm.org
magill@mac.com


_________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.netisland.net/mailman/listinfo/plug-announce
General Discussion -- http://lists.netisland.net/mailman/listinfo/plug