Thursday, November 22, 2012
Supercomputers face growing resilience problems
The problem with checkpointing, according to Fiala, is that as the number of nodes grows, the amount of system overhead needed to do checkpointing grows as well -- and grows at an exponential rate. On a 100,000-node supercomputer, for example, only about 35 percent of the activity will be involved in conducting work. The rest will be taken up by checkpointing and -- should a system fail -- recovery operations, Fiala estimated.
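
To get a feel for why the useful share of work shrinks as nodes are added, here is a rough sketch. It is not Fiala's model: it uses Young's classic first-order approximation for the checkpoint interval, and it assumes node failures are independent, so the whole machine's mean time between failures is the per-node figure divided by the node count. The function names and all input values (node MTBF, checkpoint time, restart time) are made up for illustration, and the approximation is only reasonable while the checkpoint time is small compared with the system MTBF.

```python
import math


def system_mtbf(node_mtbf_hours, nodes):
    """System-wide MTBF, assuming independent and identical node failures."""
    return node_mtbf_hours / nodes


def young_interval(checkpoint_hours, mtbf_hours):
    """Young's first-order checkpoint interval: sqrt(2 * checkpoint cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_hours * mtbf_hours)


def useful_fraction(nodes, node_mtbf_hours, checkpoint_hours, restart_hours):
    """First-order estimate of the share of time spent on real work.

    Waste is modelled as three terms:
      - time spent writing checkpoints:   checkpoint / interval
      - work redone after a failure:      (interval / 2) / MTBF
      - time spent restarting:            restart / MTBF
    """
    mtbf = system_mtbf(node_mtbf_hours, nodes)
    tau = young_interval(checkpoint_hours, mtbf)
    waste = checkpoint_hours / tau + (tau / 2.0 + restart_hours) / mtbf
    # The estimate breaks down when waste exceeds 1, so clamp at zero.
    return max(0.0, 1.0 - waste)


if __name__ == "__main__":
    # Hypothetical inputs: 5-year node MTBF, 3-minute checkpoint, 3-minute restart.
    for n in (1_000, 10_000, 100_000):
        share = useful_fraction(n, 43_800.0, 0.05, 0.05)
        print(f"{n:>7} nodes: useful share of time ~ {share:.2f}")
```

With these toy inputs the useful share falls steadily as the node count rises, which is the trend the article describes. The exact percentages depend entirely on the assumed values.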
___________________________
99% BAD HARDWARE WEEK: Odyssey 2010, Hal says "I am perfectly well." Well, the problem is not only fault tolerance but mainly communication bottlenecks and overheating. Thus only 3% of the peak computing power is actually sustained; the rest is peak on paper! Modern supercomputers use more fiber connections internally than the whole human world!