In his wish list for the next version of Mac OS X, published on this site, Will Robertson speculated about the future of the Digital Hub. The fulcrum of his musing was a hope for the consumerization of the technologies behind two obscure Apple products: Xgrid and Xsan. Neither is actually shipping yet, but the second preview release of Xgrid is available for the intrepid and curious to download. Xsan has been announced as under development, has a page in the Apple Server Solutions wing of apple.com, and will be available "later this fall." Most readers won't know what they do, and for good reason: Xgrid and Xsan aren't meant for you. They are solutions to problems consumers never had. Will acknowledges this, but wonders whether Apple might not market some form of their capabilities to the consumer. He envisions a new level of automatic integration in the home network, where computers can pool their processing power, and an iServe seamlessly stores and retrieves data using an iSan.
This future is too rosy. Not that Will's spectacles are unduly tinted; Xsan and Xgrid are significant offerings from Apple. Grid computing and SAN are needful technologies for certain enterprise and research endeavors, but they are of little use in the home or office.
To explain why, we need to explore the nature of the problems solved by grid computing and by Storage Area Networks. I'll start with Xgrid.
What grids do:
Will's article described Xgrid as "a program that builds a cluster of network Macs that share CPU power." This is loosely true, but the statement requires some qualification. A computer in a cluster does not share its processing power with other computers, at least not in the sense that eight 2GHz processors could do the work of one 16GHz CPU. No single job can be shared by more than one CPU. Rather, each computer makes its processor available to the others in case someone has an extra job to throw its way. Imagine a mother with a long To Do list and a passel of children to order around. Ironing is a one-person job; so are cleaning out the oven, mowing the lawn, and changing the baby's diaper (my infant son tries to make this a two-person job). None of these tasks can be shared by two children, but they can be assigned individually. Here we have the first limitation of grids:
Limitation: Distributed computing doesn't help with single, indivisible tasks.
Imagine a further complication: what if one of the items on the list is "Buy detergent," another is "Do laundry," and a third is "Iron Dad's shirts"? Assigning these jobs to different children won't save any time, because the jobs must be performed in sequence. Mom might as well make one child do them all in order. Task distribution is simply not useful in this situation. Distributed computing suffers from the same limitation: sequentially dependent tasks must be performed in order, meaning a single processor will perform just as well as a thousand.
Limitation: The tasks you wish to distribute must be parallelizable. That is, the computations must be divisible into jobs that do not depend on the others' results.
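For the programmers in the audience, the two limitations so far can be sketched in a few lines of Python. This is a toy scheduler of my own devising, not anything Xgrid does internally: the function name `makespan`, the chore names, and the durations (in minutes) are all made up for illustration. It hands each ready task to whichever worker is free first and reports how long the whole list takes.

```python
def makespan(durations, deps, workers):
    """Total elapsed time to finish every task with a greedy scheduler.

    durations: dict mapping task name -> minutes of work
    deps:      dict mapping task name -> tasks that must finish first
    workers:   number of identical workers (children, or CPUs)
    """
    finish = {}                # task -> time at which it completes
    free_at = [0] * workers    # time at which each worker is next idle
    remaining = set(durations)

    def earliest(task):
        # A task cannot start before all of its prerequisites are done.
        return max((finish[d] for d in deps.get(task, ())), default=0)

    while remaining:
        ready = [t for t in remaining
                 if all(d in finish for d in deps.get(t, ()))]
        if not ready:
            raise ValueError("circular dependency between tasks")
        task = min(ready, key=lambda t: (earliest(t), t))
        w = min(range(workers), key=lambda i: free_at[i])
        start = max(free_at[w], earliest(task))
        finish[task] = start + durations[task]
        free_at[w] = finish[task]
        remaining.remove(task)
    return max(finish.values(), default=0)


# Four independent chores, 30 minutes each (hypothetical numbers).
chores = {"ironing": 30, "oven": 30, "lawn": 30, "diaper": 30}

# Three errands that must happen in sequence.
errands = {"buy detergent": 20, "do laundry": 60, "iron shirts": 30}
errand_deps = {"do laundry": ["buy detergent"],
               "iron shirts": ["do laundry"]}

print(makespan(chores, {}, workers=1))               # one child
print(makespan(chores, {}, workers=4))               # four children
print(makespan(errands, errand_deps, workers=1))     # one child
print(makespan(errands, errand_deps, workers=1000))  # a thousand children
```

The independent chores finish four times sooner with four children. The errand chain takes exactly as long with a thousand children as with one, because at any moment only one errand is ready to be started.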
Another wrinkle. Mom's list also contains the following items:
- Unlock the front door (Dad forgot his keys)
- Send postcard to Granny at her new apartment
- Change the thermostat to 72 degrees
- Rearrange knick-knacks the way I like them
In each of these cases, Mom could have a kid perform the task, but that might take longer than actually doing it herself. Unlocking the door and changing the thermostat are trivial tasks sooner done than said. Odds are the kids won't know Granny's new address, so Mom has to spell it out. And rearranging anything to a mother's specification is always best done by the mother.
Limitation: Distributed computing is not useful when it takes longer to specify a job than to do the job.
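The same trade-off can be put in rough numbers. What follows is a back-of-the-envelope cost model, not a measurement of any real grid: I assume the coordinator (Mom) spends a fixed number of seconds describing and handing off each job, one after another, and that the workers then split the jobs evenly. The function names and every figure in the examples are hypothetical.

```python
import math


def estimate_times(n_jobs, work_per_job, overhead_per_job, workers):
    """Return (local_seconds, distributed_seconds) under a toy cost model.

    Assumptions (for illustration only): handing off each job costs the
    coordinator overhead_per_job seconds, paid serially, and the jobs
    are then shared evenly among the workers.
    """
    local = n_jobs * work_per_job
    busiest = math.ceil(n_jobs / workers)  # jobs done by the busiest worker
    distributed = n_jobs * overhead_per_job + busiest * work_per_job
    return local, distributed


def worth_distributing(n_jobs, work_per_job, overhead_per_job, workers):
    """True when farming the jobs out beats doing them all yourself."""
    local, distributed = estimate_times(
        n_jobs, work_per_job, overhead_per_job, workers)
    return distributed < local


# Changing the thermostat: 5 seconds to do, 20 seconds to explain.
print(estimate_times(1, 5, 20, workers=4))
print(worth_distributing(1, 5, 20, workers=4))

# 400 image filters at 30 seconds each, 1 second to hand off each one.
print(estimate_times(400, 30, 1, workers=8))
print(worth_distributing(400, 30, 1, workers=8))
```

Under these assumptions the thermostat job takes five times longer to delegate than to do, while the pile of image filters finishes in under a sixth of the time.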
One last example. The last item on Mom's list is "Bake three layer cake." Layers are parallelizable, right? What happens to the project when Mom assigns one layer to each of three children? Assuming they have enough ingredients and utensils to work in parallel, when does each layer go in the single oven? If all three layers are different sizes, how does she ensure they are ready to be assembled in the right order?
Limitation: When the timing of parallel jobs is important, distributing the jobs to a cluster may not be prudent. Corollary: Real-time media applications do not lend themselves well to distributed computation; you cannot trust that all jobs will be performed in synchronization.
How this affects you:
What consumer software meets these qualifications? GarageBand regularly performs some calculations in parallel (think tracks), but synchronization between parallel jobs cannot be dispensed with. Applications that require real-time responsiveness (think Final Cut or video games) are similarly limited. With a few major exceptions, the work done by home and business applications or by the operating system is done in bits and pieces that take longer to distribute than to complete locally. Moreover, most work done with computers is serially dependent on the input of the user; when you are not telling it new things to do, the computer is not doing much of anything.
This last suggests a simple rule of thumb: Grid computing addresses the problem domain of tasks that require the user to run a command, go away for a LONG while, and then come back to sort through the results. In the real world, this has mostly meant scientific research involving pattern recognition (SETI@home, DNA sequencing, fingerprint or handwriting analysis), modelling and simulating complex physical phenomena, or image rendering of 3D animations (the digital effects shop that handled The Lord of the Rings films, for example, used hundreds of Linux workstations to render the extensive computer-animated sequences on schedule). When the development and use of grid software becomes less onerous, more mundane ends may fit themselves to its means.
The good news about grid technology is that some of Apple's flagship software could potentially benefit: video rendering, media compression, and batch processing are the exceptions alluded to above. Batch processing jobs are embarrassingly parallel and do not depend on synchronization of results; as long as the tasks are sufficiently complex, distributed computation makes sense. Consider a real-world example: applying an effect to a large number of images can take a very long time, even if your image editor provides a batch processing feature. If you want to apply the same effect to all images, you can assign a portion of the images to each CPU available in your cluster. The more workstations available, the fewer images each has to process. Thus Apple could, at least in theory, modify iPhoto to distribute parallel computations.
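The image-batch idea is simple enough to sketch. To be clear about what is assumed here: this is not the Xgrid API and has nothing to do with how iPhoto works. Threads on a single machine stand in for the Macs in a cluster, `apply_effect` is a placeholder for a real image filter, and the file names are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_workers):
    """Split items into n_workers contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def apply_effect(image_name):
    # Placeholder for the expensive part (e.g. a sepia filter).
    return "sepia:" + image_name


def process_chunk(chunk):
    # Each worker applies the same effect to its own share of the images.
    return [apply_effect(name) for name in chunk]


def process_batch(images, n_workers):
    """Hand one chunk to each worker, then stitch the results together."""
    chunks = partition(images, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in the order the chunks were submitted.
        results = list(pool.map(process_chunk, chunks))
    return [name for chunk in results for name in chunk]


photos = ["img%02d.jpg" % i for i in range(10)]
print([len(c) for c in partition(photos, 4)])
print(process_batch(photos, 4)[:3])
```

No chunk needs to know anything about the others, which is why this kind of job distributes so easily. In a real grid, the chunks would travel over the network, so the hand-off cost discussed earlier still applies.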
Grids are also useful in data compression. Good compression algorithms require extensive calculations, which is why it takes iTunes longer to encode CD audio to MP3 or AAC than it does to import the audio as an uncompressed AIFF. iDVD and DVD Studio Pro similarly compress video data as a part of the encoding process that prepares an image for DVD burning. Note, though, that some compression algorithms are more difficult than others to break into portions for distribution. Video rendering, contrariwise, decomposes relatively easily into distributable tasks; you can hand out each frame to a different CPU, or further split every frame into sections. iMovie is a possible beneficiary.
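Here is roughly what "split every frame into sections" might look like as a job list. This is only a sketch: the function name `render_jobs`, the tile layout, and the 720x480 frame size are my own choices for illustration, and no actual rendering happens. Each tuple describes one independent unit of work that could be mailed off to a different CPU.

```python
def render_jobs(n_frames, width, height, tiles_x, tiles_y):
    """List every (frame, x0, y0, x1, y1) tile as a separate render job.

    Tile edges are computed with integer arithmetic so the tiles cover
    each frame exactly, with no gaps and no overlap.
    """
    jobs = []
    for frame in range(n_frames):
        for ty in range(tiles_y):
            for tx in range(tiles_x):
                x0 = width * tx // tiles_x
                x1 = width * (tx + 1) // tiles_x
                y0 = height * ty // tiles_y
                y1 = height * (ty + 1) // tiles_y
                jobs.append((frame, x0, y0, x1, y1))
    return jobs


# Two frames of 720x480 video, each cut into a 2x2 grid of tiles.
jobs = render_jobs(2, 720, 480, tiles_x=2, tiles_y=2)
print(len(jobs))
print(jobs[0])
```

Finer tiles mean more jobs to spread around, but also more hand-offs, so the earlier caveat about jobs that take longer to specify than to perform sets a floor on how small a tile is worth making.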
What Xgrid does that others don't:
Distributed computing is not new; solutions have been available on most platforms for years. Combining ease of use with flexibility, though, is a rare priority. Grid software has tended to be either/or: we have complex tools to fully specify all conceivable configurations, and we have screensavers that draw pretty pictures while doing mysterious things for faraway scientists. Middle ground is hard to find, and for good reason. Conscripting a legion from crumbs of spare processor time demands careful resource management, and so does marshalling the now dispersified* results. Another concern is the potential misuse of the donated processor time; proper security always complicates things.
With Xgrid, it looks as though Apple has made that characteristic leap into the mess which usually ends up with Apple defining (and owning) the middle ground**. The idea is that all tedious manipulations should be automated and hidden, so the user can start a job rolling using a pretty interface without having to know much about the churning of infernal engines below. Xgrid can't automatically gridify existing software, so developers still have to do a fair amount of work to patch into the distributive goodness. But because Xgrid offers a plug-in architecture, any developer with a glut of parallelizable processes can write an Xgrid plug-in to distribute them. Developers can concentrate on the unique features of their own products, because Xgrid takes care of the work common to all distributed computing solutions. Rendezvous discovery rounds up willing participants. Client configuration happens in a System Preferences pane. Security features are included, and are on by default. If you want to do your deeds in the GUI, you can, but if you prefer the flexibility of the command line, Xgrid works that way, too. Xgrid represents an elegant and extensible solution to problems that are very important--but only to those who happen to be afflicted.
-------
In the second part of this article, I'll discuss the problems addressed by storage area networks and by Apple's SAN products, current and future.
In Part Three, having declared Apple's enterprise lineup to be of no use to the consumer, I will make a case for their significance nonetheless to the future of Apple Computer.
-------
* Yes, I made this word up. It had to be done.
** Some aboriginal solutions have already trod this ground (cf. Pooch at the link below), but now the colonists have arrived, and the aborigines may end up on a reservation.
References:
http://developer.apple.com/hardware/hpc/xgrid_intro.html
http://a1664.g.akamai.net/7/1664/51/fba6e53e44ade6/www.apple.com/acg/xgrid/pdf/xgridguide.pdf
http://unu.novajo.ca/simple/archives/000024.html
http://www.daugerresearch.com/pooch/whatis.html