Quantum Bayesian Networks

June 24, 2010

ExaFLOPS Computing, Is it a Foolhardy Pursuit Headed for a Painful Belly-FLOPS?

Filed under: Uncategorized — rrtucci @ 7:48 pm

HPC industry

Previously (here and here), I discussed the present state of supercomputing (a.k.a. High Performance Computing, or HPC) and its relevance to quantum computing. Let me now say a few words about the future plans of the HPC community. They are attempting to build, hopefully before the end of this decade, machines that use multicore processors to achieve ExaFLOPS speeds (1e18 floating point operations per second). Current technology achieves a mere ~1e15 FLOPS.

IBM, Cray, Intel, and HP are each pouring millions of dollars into R&D of exascale computing. Unfortunately, those same companies have invested approximately zero in quantum computing. They haven’t even funded a no-brainer like an X-prize for quantum computing. The sad truth is that about 99.9% of current research into quantum computing in the USA is being funded by military and spy agencies.

It’s fine and natural for those companies to try to extend what they already know—multicores. What I find troubling is that they are not investing in quantum computing at the same time. This despite the fact that commercially viable exascale computing will be extremely difficult to achieve—as difficult or even more so than quantum computing, IMHO.

Failure in achieving commercially viable exascale computing is a very real possibility. Dreams of parallel computing have flopped monumentally before; for example, with Japan’s Fifth Generation computer project. Even Linus Torvalds, whose common sense is legendary, has expressed some reservations about exascale computing using multicore processors. Even more damning is the 2008 Kogge report, a report commissioned by DARPA and authored by a slew of supercomputing luminaries, which concluded that power consumption, especially the power used to “move data around”, will make multicore exascale computers impractical. Is “deep computing” in deep trouble?

Some people might say, Oh well, even if we fail to achieve commercially viable exascale computing, our efforts in that direction will still produce numerous spinoffs. Well, I think quantum computing has the potential to produce even more spinoffs than exascale computing. Why? Let me put it this way. Exascale computing is like reaching for some extremely high fruit in the tree of multicore processors, because all the lowest-lying fruit have already been picked from that tree. Quantum computing, on the other hand, is like picking low-lying fruit from a tree that has never been picked before.

Here we have quantum computing, most likely a greener (more energy efficient) technology than multicore processors, and one which, if successful, could produce thousands of new jobs for the USA. But are those companies funding it? Nah. Instead, those geriatric companies are trying to build a horse buggy 10 times the usual length, even when everyone tells them that maybe they should also look into internal combustion engines.

It appears that QCs can perform only certain special tasks better than classical computers. Hence, QCs will probably never totally replace classical computers, just complement them. Therefore, I believe in the need for research into exascale computing. And I sincerely hope that the HPC community will succeed in their quest for exascale computing. But let us be perfectly frank. According to many supercomputer experts, there are many, many reasons why (commercially viable) exascale computing may fail. Oh, let me count the ways.

A Toaster by any other name is still a …
Today’s fastest supercomputers have about 2e5 cores. At 10 watts/core, that means they consume 2e6 Watts, as much power as a whole town. Using the same technology with 1e8 cores would consume 1e9 Watts, the power produced by a medium-sized nuclear power plant. This is clearly indefensible. Hence, to achieve commercially viable exascale computing, the current technology must be made dramatically more energy efficient. The exascale computer advocates have set themselves a limit of 1e7 Watts per installation. I personally believe that 1e7 Watts is still too high. Even the current 2e6 Watts per installation is already too high, in my opinion.
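The arithmetic behind those wattage figures is simple enough to sketch. The 10 watts/core rate is the rough average I read off the top500 data (see my June 22 post below), not a vendor spec:

```python
# Back-of-envelope power estimate for a multicore supercomputer,
# assuming a flat ~10 watts per core (a rough empirical average,
# not a vendor spec).
def installation_watts(n_cores: float, watts_per_core: float = 10.0) -> float:
    return n_cores * watts_per_core

print(installation_watts(2e5))  # today's ~2e5 cores: 2e6 W, a whole town
print(installation_watts(1e8))  # exascale's ~1e8 cores: 1e9 W, a nuclear plant
```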

Making supercomputers more energy efficient will require revolutions in chip design, interconnections and cooling. Revolutions will be needed in memory and storage too: “The currently available main memories (DRAM) and disk drives (HDD)… consume way too much power. New technologies are needed.”

And forget about obtaining any dramatic boosts in performance by further miniaturizing chip components. We’ve pretty much reached the end of that road, unless you want to reduce transistors to a few atoms, and then you might as well be doing quantum computing.

Programming Nightmare
Current top500 supercomputers have ~1e5 cores. The planned exascale computers will have ~1e8 cores. With that many cores, one faces the prospect of cores failing constantly. The software running exascale computers will have to detect and correct hardware failures on the fly, or else the computer will only run, between crashes, for short stretches of random length. So radically new “autonomous” software will be required. (Deja vu? QCs may also require error correction! But note that with QCs, one doesn’t correct for hardware failures. One corrects instead for noise-induced errors, which is preferable if you want your hardware to last longer.)
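To see why “constantly” is not an exaggeration, here is a toy reliability estimate. The 5-year per-core MTBF is my own illustrative assumption, not a figure from the Kogge report:

```python
# Toy estimate: if cores fail independently, the machine's mean time
# between failures (MTBF) is roughly the per-core MTBF divided by the
# core count. The 5-year per-core MTBF is an assumed, illustrative value.
HOURS_PER_YEAR = 24 * 365

def system_mtbf_hours(core_mtbf_years: float, n_cores: float) -> float:
    return core_mtbf_years * HOURS_PER_YEAR / n_cores

print(system_mtbf_hours(5.0, 1e5))         # today's ~1e5 cores: ~0.44 hours
print(system_mtbf_hours(5.0, 1e8) * 3600)  # ~1e8 cores: ~1.6 seconds
```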

Writing and debugging parallel programs is already notoriously difficult. Typically, one has to break the program into as many semi-autonomous tasks as possible, and design by hand how all those tasks are going to communicate with each other. Then one has to worry about the possibility that the computer will hang due to network collisions. These problems will be exacerbated by increasing the number of cores from 1e5 to 1e8. (Deja vu? QCs are also hard to program.)

A Prestige Race: The Price Tag Puts It Out of Reach for Almost Everyone
Just one of today’s fastest supercomputers (Jaguar, Nebulae, Roadrunner) costs more than 100 million dollars (and that doesn’t include the additional costs of infrastructure, maintenance, staff and electric bills). One can expect exaFLOPS computers to cost at least that much. Who is going to be able to afford them, except maybe China? European countries and Japan are currently instituting austerity measures. The USA is reeling after 2 wars, the financial disaster of 2008 with the ensuing bailouts and stimulus packages, record unemployment, 48 out of 50 states that can’t balance their budgets, and the Gulf of Mexico Oil Spill. Few private companies or universities can afford to spend $100 million on just one computer, especially considering the fact that most computers quickly become obsolete. Volunteer distributed computing like BOINC shields the server managers from this huge price tag. But with current technology, an exaFLOPS BOINC would require its volunteers to collectively consume 1e9 Watts, which is kind of unconscionable. As I already pointed out here, even now, high performance computing is a race in which only rich countries can participate. India, South America, and Africa are too poor to compete, and even mighty Japan has been falling behind in this race for at least a decade.

Can Only Solve Narrow Class of Problems
Exascale computer software will have to distribute workload approximately evenly among 1e8 cores, or else the computer’s advantage will be nullified. There is only a narrow class of practical problems for which this is feasible. (Deja vu? Quantum computers also excel at solving only a narrow class of problems).

Wladawsky-Berger says in Ref.1,
“One of their most compelling conclusions is that with exascale computing, we are reaching a tipping point in predictive science, an area with potentially major implications across a whole range of new, massively complex problems.”

Hey! The same is true about quantum computing. Quantum computerists also plan to do MCMC (Markov Chain Monte Carlo) to make predictions based on probabilities. Using statistical techniques when faced with large data sets is a no-brainer. But note that computational complexity theory tells us that QCs can do MCMC much faster (in a time-complexity sense) than classical computers (and that means faster than a zillion cores, or a biological computer, which is a classical computer too).


  1. “Challenges to Exascale Computing” (opinion), by Irving Wladawsky-Berger (retired from IBM), iSGTW (International Science Grid This Week), April 7, 2010
  2. “Future Exaflop Supercomputers Will Consume Too Much Power Without New Software”, by Anne-Marie Corley, IEEE Spectrum, June 2010
  3. “Exascale Supercomputers: Can’t Get There From Here?”, by Sam Moore, Tech Talk blog, IEEE Spectrum, October 7, 2008
  4. “Why the Computer Is Doomed”, by Omar El Akkad, The Globe and Mail, January 29, 2010
  5. “Exascale Computing Requires Chips, Power and Money”, by Alexis Madrigal, Wired, February 22, 2008

June 22, 2010

Top500 Supercomputers

Filed under: Uncategorized — rrtucci @ 2:25 am

Warning: This article deals with boring computer statistics that only a computer nerd could love. So if you are a normal person, you better skip this one.

As some devil like Machiavelli (author of “Il Principe”) or Sun Tzu (author of “The Art of War”) would advise, we quantum computerists should study our competition if we want to increase our chances of beating it someday. The main competition for quantum computers is parallel supercomputers. It looks like QCs won’t outperform supercomputers at most tasks, but they may outperform them, and by a very wide margin, at some special tasks, such as MCMC and the simulation of quantum systems.

One way to start learning about the weaknesses of our supercomputer brethren/competitors is to look at the excellent compendium of supercomputer stats published biannually by top500.org. Their latest list is for June 2010 and can be viewed here in html format. (A much more detailed list in Excel format can be obtained here. The Excel one is what I used to generate Figs. 2, 3 and 4.)



The news flash for this semester is that China is nipping at the heels of the USA. Nebulae (Rmax=1.27e15 FLOPS), owned by China, is currently the second fastest supercomputer in the world (among those that submitted stats), after Jaguar (Rmax=1.75e15 FLOPS), owned by the USA. Fig.1, taken from the top500.org website, shows how various countries are faring in the supercomputing sweepstakes. Sadly, India, Africa and South America are nowhere to be seen, and even mighty Japan has been falling behind for more than a decade.

Given such nice data (and already in Excel format to boot!), nerdy scientist that I am, I couldn’t resist making some plots (so easy to do with Excel or R once the data has been entered into a spreadsheet!). So, patient reader, if you are still with me, and you are a weirdo like me, you might enjoy the following 3 beautiful plots that I generated:

    Fig.2 A scatterplot of Speed = Rmax (FLOPS) versus Number of Cores
    Fig.3 A scatterplot of Power (Watts) versus Number of Cores
    Fig.4 A scatterplot of Power/Speed (Joules per Operation) versus Number of Cores

The moral of Fig.2 is that, as one would expect, there is much variation depending on technology, but for our best technology, Speed scales linearly with Number of Cores, at a rate of about 2e14 FLOPS/(1e5 cores) ~ 2e9 FLOPS/core.

In Fig.3, we again see variation due to technology, but for our best technology, Power Consumption scales linearly with Number of Cores, at a rate of about 1e6 Watts/(1e5 cores) ~ 10 Watts/core.

Since both Power and Speed scale linearly with Number of Cores, Power/Speed = Energy per Flop is approximately constant as a function of the Number of Cores. Fig.4 confirms this. For our best technology, we get from Fig.4 about 3e-9 Watts/FLOPS = 3e-9 Joules/Floating Point Op.

For comparison, Boltzmann’s constant k = 1.4e-23 Joules/Kelvin. At room temperature, 25 °C (77 °F, 298 K), 1 kT is equivalent to 4.1e−21 Joules.
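The comparison can be made concrete with a few lines of arithmetic; the 3e-9 Joules/op figure is the one read off Fig.4 for our best technology:

```python
import math

k = 1.38e-23            # Boltzmann's constant, Joules/Kelvin
T = 298.0               # room temperature, Kelvin
energy_per_op = 3e-9    # Joules per floating point op, read off Fig.4

kT = k * T                          # ~4.1e-21 Joules
landauer_limit = kT * math.log(2)   # minimum energy to erase one bit

# our best classical technology is ~12 orders of magnitude above kT
print(energy_per_op / kT)
```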

And what would be the analogous metrics for QCs? Too early to tell, but let me speculate. For QCs, instead of the number of cores, we can use the number of qubits. And instead of the number of floating point ops, we can count the number of CNOTs (see the footnote below). I expect that Power Consumption and CNOTs/sec will both scale linearly with the number of qubits, and therefore alpha = Joules/CNOT will be independent of the number of qubits. alpha will vary depending on which QC hardware type (ion traps, Josephson junctions, etc.) we use. Of course, alpha = Joules/Flop for classical computers and alpha = Joules/CNOT for QCs will differ. The great advantage of QCs over classical computers is that for certain fixed tasks like MCMC, the number of CNOTs required by a QC will go as the square root of the number of floating point operations required by a classical supercomputer.
Footnote: You can always express your computation as a SEO (Sequence of Elementary Operations), where the elementary operations consist of 1-body operations like qubit rotations and 2-body operations like CNOTs. 2-qubit operations take much longer to perform than 1-qubit operations, so neglect the 1-qubit operations and count only CNOTs. Even if 1-qubit operations are not negligibly short, the numbers of 1-qubit and 2-qubit operations are roughly proportional, so counting only 2-body operations is a good enough metric of speed.
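If the square-root speedup claim holds, the payoff is easy to quantify. This is just arithmetic on the claim, not a derivation of it:

```python
import math

# If a classical MCMC computation needs N elementary operations, the
# square-root speedup claimed above means the QC needs ~sqrt(N) CNOTs.
def cnots_needed(classical_ops: float) -> float:
    return math.sqrt(classical_ops)

# one second of exascale work (1e18 ops) shrinks to ~1e9 CNOTs
print(cnots_needed(1e18))
```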

June 19, 2010

Ultra-Secret NSA Computer Farms

Filed under: Uncategorized — rrtucci @ 8:18 am

I’ve previously touched upon the future use of quantum computers by governmental snoops. See, for example, the following blog posts:

In those blog posts I argued that spy agencies will want to use QCs to do MCMC (Markov Chain Monte Carlo). They will not want QCs for doing quantum cryptography, which is snake oil. Nor will they want QCs for breaking RSA codes by means of Shor’s algorithm. Shor’s algorithm is wonderful, but I doubt spy agencies will have many opportunities to use it. It’s quite likely that by the time quantum computers are available, the world will no longer be using RSA, having replaced it by post-quantum cryptography.

In a future blog post, I will argue that QCs will consume much less energy than classical computers to perform certain calculations. If this is true, then another reason for spy agencies to want QCs is to save money on their electric bill. I’m not joking. This is a really important consideration for them. Read on to see why.

Check out this excellent review article:

Who’s in Big Brother’s Database? by James Bamford, The New York Review of Books, Nov 5, 2009,

of the fascinating non-fiction book:

“The Secret Sentry: The Untold History of the National Security Agency”, by Matthew M. Aid.

Excerpts from the review article:

At a million square feet, the mammoth $2 billion structure will be one-third larger than the US Capitol and will use the same amount of energy as every house in Salt Lake City combined.

It’s being built by the ultra-secret National Security Agency …to house trillions of phone calls, e-mail messages, and data trails: Web searches, parking receipts, bookstore visits, and other digital “pocket litter.”

Lacking adequate space and power at its city-sized Fort Meade, Maryland, headquarters, the NSA is also completing work on another data archive, this one in San Antonio, Texas, which will be nearly the size of the Alamodome.

Just how much information will be stored in these windowless cybertemples?

“As the sensors associated with the various surveillance missions improve,” says the report, referring to a variety of technical collection methods, “the data volumes are increasing with a projection that sensor data volume could potentially increase to the level of Yottabytes (10^24 Bytes) by 2015.”

Once vacuumed up and stored in these near-infinite “libraries,” the data are then analyzed by powerful infoweapons, supercomputers running complex algorithmic programs…

But even third-world cryptography can be daunting. During the entire war in Vietnam, writes Aid, the agency was never able to break the high-level encryption systems of either the North Vietnamese or the Vietcong.

In the 1950s, as over 100,000 heavily armed North Korean troops surged across the 38th parallel into South Korea, “the codebreakers were among the last to know” … “the Armed Forces Security Agency (AFSA), the NSA’s predecessor, didn’t even have a Korean-language dictionary.”

Another major surprise came in the 1960s when the Soviet Union was able to move large numbers of personnel, large amounts of equipment, and many ballistic missiles to Cuba without the NSA hearing a peep.

More recently, the NSA was unaware of India’s impending nuclear test in 1998, the 1993 attack on the World Trade Center, the attack on the USS Cole in 2000, and the 1998 bombing of two of America’s East African embassies. The agency first learned of the September 11 attacks on $300 television sets tuned to CNN, not its billion-dollar eavesdropping satellites tuned to al-Qaeda.

Where does all this leave us? Aid concludes that the biggest problem facing the agency is not the fact that it’s drowning in untranslated, indecipherable, and mostly unusable data, problems that the troubled new modernization plan, Turbulence, is supposed to eventually fix. “These problems may, in fact, be the tip of the iceberg,” he writes. Instead, what the agency needs most, Aid says, is more power. But the type of power to which he is referring is the kind that comes from electrical substations, not statutes. “As strange as it may sound,” he writes, “one of the most urgent problems facing NSA is a severe shortage of electrical power.” With supercomputers measured by the acre and estimated $70 million annual electricity bills for its headquarters, the agency has begun browning out, which is the reason for locating its new data centers in Utah and Texas. And as it pleads for more money to construct newer and bigger power generators, Aid notes, Congress is balking.

The issue is critical because at the NSA, electrical power is political power. In its top-secret world, the coin of the realm is the kilowatt. More electrical power ensures bigger data centers.
Rather than give the NSA more money for more power—electrical and political—some have instead suggested just pulling the plug. “NSA can point to things they have obtained that have been useful,” Aid quotes former senior State Department official Herbert Levin, a longtime customer of the agency, “but whether they’re worth the billions that are spent, is a genuine question in my mind.”

June 12, 2010

Best Heavy Duty Quantum Computer Simulators

Filed under: Uncategorized — rrtucci @ 6:39 am

Simulating the behavior of a quantum computer on a classical computer complements QC theory in several important ways:

  • Theoretical analyses of physical systems often contain approximations for which error bounds are unknown or hard to calculate accurately. A numerical simulation can avoid such approximations.
  • Computer programs help us to spot mistakes/omissions in theoretical proofs, algorithms and subtly incorrect preconceptions.
  • Computer programs help us to visualize the implications of theory and to generate specific examples of it.

In addition, QC simulators are useful for developing and debugging QC software of all types, including QC programming languages, quantum error correction algorithms, etc. They can also be used to study how QC algorithms degrade with noise.

Numerous QC simulators that can simulate a small number of qubits (<=16) have been written, and you can find links to them at quantiki. Few of those take advantage of parallel computing, though. This blog post is about those that do.

Two recent efforts to simulate QCs using parallel computers are

  • D-Wave’s AQUA@home The goal of AQUA (Adiabatic QUantum Algorithms) is to simulate the performance of superconducting adiabatic quantum computers. D-Wave has a very nice webpage here explaining their setup. Their software was written by Peter Young (UCSC), Neil Dickson, Firas Hamze, and Kamran Karimi (all of D-Wave), et al.

  • JUGENE QC simulator JUGENE is a supercomputer located in the town of Jülich, Germany. This excellent paper describes JUGENE’s 42 qubit QC simulator written by Hans de Raedt, his son Koen, Kristel Michielsen, et al. You may have already heard about it from this recent press release.

There are two main paths to massively parallel (a.k.a. high performance) computing (i.e., computing that achieves ~1e15 FLOPS): the poor man’s way and the rich man’s way.

Poor people like D-wave 🙂 set up a central server that uses the Internet to send parcels of work to, and receive results from, the PCs of volunteers who allow their PCs to be used during idle times. The central server also compiles all the results. To choreograph this intricate dance of information exchange, the central server and volunteer PCs use open-source software from UC Berkeley called BOINC. BOINC is used by many famous research projects besides AQUA@home; for instance, SETI@home, Einstein@home, etc.

Rich people like the Jülich group use a “big iron” supercomputer such as JUGENE. Here are some stats for that puppy:

JUGENE (an IBM Blue Gene/P)
22nd in the Green500 (Nov 2009) biannual list (another of Jülich’s computers was 1st)

4th in the Top500 (Nov 2009) biannual list (1st in Europe)

peak speed: ~1e15 FLOPS
memory: 1.44e14 bytes
storage: 6.0e15 bytes
power consumption: 2-3e6 Watts for entire installation, 0.35e9 FLOPS/Watt

The Programmer’s Perspective: to write software that runs on a parallel computer, one uses special software libraries to coordinate the many tasks. A task may share all of its memory with other tasks, or none of it, or just part of it (a hybrid). In situations where memory is shared, one can use libraries such as CUDA (as in barracuda) and OpenMP. These are nicely described here and here by a very hungry game-coder named Mark Pope. In situations where memory is not shared, one uses “message passing” via a library called MPI. (For a quick introduction to MPI, see, for example, this.) The code of AQUA@home uses mainly CUDA, whereas the code of JUGENE’s QC simulator uses mainly MPI.
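As a toy illustration of why message passing matters for a state-vector simulator (my own sketch, not how JUGENE’s code actually works): split the 2^n amplitudes evenly over P processes. A 1-qubit gate on qubit q pairs amplitude i with amplitude i XOR 2^q, so gates on high-order qubits pair amplitudes owned by different processes and force inter-process communication.

```python
# Toy model of a distributed state-vector simulator: the 2**n amplitudes
# are split evenly over n_procs processes, so amplitude i lives on
# process i // (2**n_qubits // n_procs). A 1-qubit gate on qubit q
# (q = 0 is the least significant bit) pairs amplitude i with i ^ 2**q.
def owner(i: int, n_qubits: int, n_procs: int) -> int:
    amps_per_proc = 2 ** n_qubits // n_procs
    return i // amps_per_proc

def needs_communication(q: int, n_qubits: int, n_procs: int) -> bool:
    # does any gate-induced amplitude pair straddle two processes?
    return any(
        owner(i, n_qubits, n_procs) != owner(i ^ 2 ** q, n_qubits, n_procs)
        for i in range(2 ** n_qubits)
    )

# 10 qubits over 4 processes: only gates on the 2 highest qubits
# require inter-process messages
print([q for q in range(10) if needs_communication(q, 10, 4)])  # [8, 9]
```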

Recall that 1 complex number = 2 double precision reals = 16 bytes = 16*8 bits, and that n qubits are described by 2^n complex numbers. Hence

qubits    complex numbers    bytes
1         2                  32
2         4                  64
3         8                  128
16        65536              1.05e+6
23        8.39e+6            1.34e+8
32        4.29e+9            6.87e+10
42        4.40e+12           7.03e+13

This explains why the JUGENE simulator is limited to 42 qubits. The reason is not that 42 is the Answer to the Ultimate Question of Life, but that JUGENE’s memory is about 1e14 bytes.
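The table above follows directly from the 16-bytes-per-amplitude rule; as a quick sanity check:

```python
# Bytes needed to store the full state vector of an n-qubit register,
# at 16 bytes (one double-precision complex number) per amplitude.
def state_vector_bytes(n_qubits: int) -> int:
    return 16 * 2 ** n_qubits

for n in (16, 23, 32, 42):
    print(n, state_vector_bytes(n))
# 42 qubits already need ~7e13 bytes, about half of JUGENE's 1.44e14
```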

June 6, 2010

Dumbing Down A Quantum Language

Filed under: Uncategorized — rrtucci @ 10:21 am

In the past few weeks, I’ve been extending QJava, my burgeoning Java class library for writing quantum computing software. I’ve already used QJava to write several applications. Most recently I used it to write “Quibbs”. Quibbs outputs a quantum circuit in a “modern language” that utilizes a sophisticated vocabulary with goodies such as nested loops, U(2) operations with arbitrarily many controls, and quantum multiplexors. What I’ve done in the past few weeks is to write classes that translate this modern language into a much simpler “primitive language” that avoids most of the goodies. Methinks translator applications that dumb down a quantum language will someday be useful, because future QC hardware will only “understand” a primitive language, whereas the QC programmer will prefer writing programs in a modern language. Furthermore, parallelized software for simulating a QC on classical computers is easier to write if its input is a quantum circuit expressed in a primitive language.

Fig.1 Picture File Dictionary

Fig.2 English File Dictionary

Figs. 1 and 2, taken from my paper arXiv:1004.2205, define precisely the vocabulary of the modern language. In these tables, \sigma_X, \sigma_Y and \sigma_Z stand for the Pauli matrices. Also, n = P_1 = |1\rangle\langle 1| and \bar{n} = 1-n = P_0 = |0\rangle\langle 0|, where |0\rangle =(1,0)^{transpose} and |1\rangle =(0,1)^{transpose}. In the modern language, a quantum circuit is specified using two text files: an “English File” and a “Picture File”. For a given circuit, the lines of its English and Picture Files are in 1-1 correspondence. In both of these files, time flows from top to bottom, and each line represents one gate. In the Picture File, the rightmost qubit is labeled 0, the next one to its left is labeled 1, the next one 2, and so on. Click here to see a previous blog post of mine praising the beauty of Picture Files.

The vocabulary of the primitive language is much simpler. It has no loops or multiplexors. It has gates that act on no more than 2 qubits (so no gates with arbitrarily many controls). Okay, we might make an exception and allow a gate that acts on more than 2 qubits. We’ll allow the Toffoli gate, which acts on 3, but only because its name sounds like that of a tasty Italian pastry.

In my software, I go from the modern to the primitive language in several stages:

  1. Eliminate multiplexors This is done by my previously released Java application “Multiplexor Expander”, which can re-express multiplexors either as an exact SEO (SEO=Sequence of Elementary Operations), or as an approximate SEO (in the “oracular approximation”).
  2. Reduce the number of controls of each gate. Here I eliminate all gates with more than 1 control, except for the Toffoli, which has 2. This is done using identities communicated to us from antediluvian times by a famous paper, viz. Phys. Rev. A 52 (1995) 3457, written by Barenco, Bennett, et al.—a cast of thousands.
  3. Eliminate Loops
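As a sanity check on step 2, here is a small numpy sketch (mine, not part of QJava) verifying the best-known identity from that Barenco et al. paper: a doubly-controlled U equals a 5-gate sequence of singly-controlled V’s and CNOTs, where V² = U. Taking U = X turns a Toffoli into gates acting on at most 2 qubits:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
P0 = np.diag([1, 0]).astype(complex)  # |0><0|, called nbar in the text
P1 = np.diag([0, 1]).astype(complex)  # |1><1|, called n in the text

def kron(*ops):
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

def controlled(u, control, target, n=3):
    """Singly-controlled u on an n-qubit register (qubit 0 leftmost)."""
    idle = [P0 if i == control else I2 for i in range(n)]
    fire = [P1 if i == control else (u if i == target else I2) for i in range(n)]
    return kron(*idle) + kron(*fire)

V = 0.5 * np.array([[1 + 1j, 1 - 1j], [1 - 1j, 1 + 1j]])  # V @ V == X
Vdag = V.conj().T

# circuit reads left to right, so matrices multiply right to left:
# CV(1->2), CNOT(0->1), CVdag(1->2), CNOT(0->1), CV(0->2)
seq = (controlled(V, 0, 2) @ controlled(X, 0, 1) @ controlled(Vdag, 1, 2)
       @ controlled(X, 0, 1) @ controlled(V, 1, 2))

# reference Toffoli: flip qubit 2 iff qubits 0 and 1 are both 1
toffoli = kron(P0, I2, I2) + kron(P1, P0, I2) + kron(P1, P1, X)
print(np.allclose(seq, toffoli))  # True
```

The same 5-gate pattern, applied recursively, is what lets one strip away arbitrarily many controls, at the cost of a longer SEO.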
