# Quantum Bayesian Networks

## July 24, 2018

### Are You a Young Male Interested in Quantum Computing? We Recommend a Date with ROSA (Write Once, Simulate Anywhere)

Filed under: Uncategorized — rrtucci @ 12:44 am

Five days ago (7/19), Google released its long-awaited language for quantum computers, called Cirq. Cirq is available at GitHub as open source under the Apache license. I expect that Google’s 72-qubit quantum computer and accompanying cloud service, also long awaited, will be unveiled soon too.

(Yes, I am referring to the same company that on 7/18, one day before Cirq was released, was fined \$5B by the European Union for favoring Google’s search engine on Android devices. Google is also gradually making closed source and proprietary all the new R&D for the key apps in the Android ecosystem, and it ruthlessly excommunicates anyone who tries to fork the Android repo to produce a serious competitor to Android, as well as any company that uses an Android fork in any of its products. Google, please say it ain’t so!… and say you won’t try to destroy Qubiter, my qc language and simulator, a microscopic competitor to Cirq.)

Qubiter is available at GitHub as open source under the BSD license.

So as not to be destroyed by the bad hombres at Google, a mere five days after the release of Cirq, I have given Qubiter amazing new superpowers. Qubiter now has the ability to translate Qubiter qasm to Google Cirq, IBM qasm, and Rigetti PyQuil. I equate these superpowers to the ability to go out on dates with an Italian bombshell actress called ROSA. ROSA is an acronym for

Write Once, Simulate Anywhere (ROSA)

Let me explain further. In the Qubiter language, you can use as an operation any one-qubit rotation, or a swap of two qubits, with any number of controls attached to it. Qubiter has tools (this Jupyter notebook shows how to use those tools) that expand such multiply controlled operations into simpler “qasm” containing only single-qubit rotations and CNOTs. If you want to run that Qubiter qasm on IBM’s, Rigetti’s, or Google’s hardware, Qubiter can also translate its qasm to IBM qasm, Rigetti PyQuil, or Google Cirq. The notebook below shows how to do this translation:
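Qubiter’s own expander tools are demonstrated in the notebook linked above. As a standalone illustration (plain numpy, not Qubiter’s API) of the kind of circuit identity such an expansion rests on, here is a check that a singly controlled Rz rotation equals two CNOTs plus two single-qubit Rz rotations:

```python
import numpy as np

def rz(theta):
    # single-qubit rotation about the z axis
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

I2 = np.eye(2)
# CNOT with control = qubit 0, target = qubit 1 (basis order |00>,|01>,|10>,|11>)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def controlled_rz(theta):
    # apply Rz to the target only when the control qubit is |1>
    out = np.eye(4, dtype=complex)
    out[2:, 2:] = rz(theta)
    return out

theta = 0.7
# standard identity: C-Rz(theta) = (I (x) Rz(theta/2)) . CNOT . (I (x) Rz(-theta/2)) . CNOT
expanded = (np.kron(I2, rz(theta / 2)) @ CNOT
            @ np.kron(I2, rz(-theta / 2)) @ CNOT)
assert np.allclose(expanded, controlled_rz(theta))
```

When the control is |0>, the two CNOTs do nothing and the two opposite Rz rotations cancel; when it is |1>, the CNOT conjugation flips the sign of the first rotation, so the two halves add up to Rz(theta).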

https://github.com/artiste-qb-net/qubiter/blob/master/jupyter-notebooks/translating-qubiter-english-file-to-AnyQasm.ipynb

So, the previous notebook in effect shows you how to go on a date with beautiful Miss ROSA. Hurry up and call her before she is all booked up. Signorina ROSA also enjoys befriending other females interested in quantum computing.

## July 4, 2018

### Why doesn’t the BBVI (Black Box Variational Inference) algorithm use back propagation?

Filed under: Uncategorized — rrtucci @ 1:36 pm

Quantum Edward uses the BBVI training algorithm. Back propagation, popularized by Rumelhart, Hinton, and Williams, seems to be a fundamental part of most ANN (Artificial Neural Network) training algorithms, where it is used to compute the gradients that determine how each parameter, and hence the cost function, changes during each iteration. Hence, I was very baffled, even skeptical, upon first encountering the BBVI algorithm, because it does not use back prop. The purpose of this blog post is to shed light on how BBVI can get away with this.

Before I start, let me explain what the terms “hidden (or latent) variable” and “hidden parameter” mean to AI researchers. Hidden variables are the opposite of “observed variables”. In Dustin Tran’s tutorials for Edward, he often represents observed variables by $x$ and hidden variables by $z$. I will use $\theta$ instead of $z$, so $z=\theta$ below. The data consists of many samples of the observed variable $x$. The goal is to find a probability distribution for the hidden variables $\theta$. A hidden parameter is a special type of hidden variable. In the language of Bayesian networks, a hidden parameter corresponds to a root node (one without any parents) whose node probability distribution is a Kronecker delta function, so, in effect, the node only ever achieves one of its possible states.

Next, we compare algos that use back prop to the BBVI algo, assuming the simplest case of a single hidden parameter $\theta$ (normally, there is more than one hidden parameter). We will assume $\theta\in [0, 1]$. In quantum neural nets, the hidden parameters are angles by which qubits are rotated. Such angles range over a closed interval, for example, $[0, 2\pi]$. After normalization of the angles, their ranges can be assumed, without loss of generality, to be $[0, 1]$.

CASE1: Algorithms that use back prop.

Suppose $\theta \in [0, 1],\;\;\eta > 0.$ Consider a cost function $C$ and a model function $M$ such that

$C(\theta) = C(M(\theta)).$

If we define the change $d\theta$ in $\theta$ by

$d\theta = -\eta \frac{dC}{d\theta}= -\eta \frac{dC}{dM} \frac{dM}{d\theta},$

then the corresponding change in the cost is

$d C = d\theta \frac{dC}{d\theta} = -\eta \left( \frac{dC}{d\theta}\right)^2.$

This change in the cost is negative, which is what one wants if one wants to minimize the cost.
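To make CASE1 concrete, here is a tiny numeric sketch. The particular $M$ and $C$ below are illustrative toy choices, not from any specific model:

```python
eta = 0.1

def M(t):       # toy model function
    return t ** 2

def C_of_M(m):  # toy cost as a function of the model output
    return (m - 0.25) ** 2

def cost(t):
    return C_of_M(M(t))

theta = 0.9
# back-prop chain rule: dC/dtheta = (dC/dM) * (dM/dtheta)
dC_dM = 2 * (M(theta) - 0.25)
dM_dtheta = 2 * theta
dtheta = -eta * dC_dM * dM_dtheta

# to first order dC = -eta * (dC/dtheta)**2 < 0, and indeed one step lowers the cost
assert cost(theta + dtheta) < cost(theta)
```

Note that this required the derivative of $M$ itself (the `dM_dtheta` line), which is exactly what back prop computes layer by layer and what BBVI manages to avoid.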

CASE2: BBVI algo

Suppose $\theta \in [0, 1],\;\;\eta > 0,\;\; \lambda > 0.$ Consider a reward function $R$ (for BBVI, $R$ = ELBO), a model function $M$, and a distance function $dist(x, y)\geq 0$ such that

$R(\lambda) = R\left[\sum_\theta dist[M(\theta), P(\theta|\lambda)]\right].$

In the last expression, $P(\theta|\lambda)$ is a conditional probability distribution. More specifically, let us assume that $P(\theta|\lambda)$ is the Beta distribution. Check out its Wikipedia article

https://en.wikipedia.org/wiki/Beta_distribution

The Beta distribution depends on two positive parameters $\alpha, \beta$ (that is why it is called the Beta distribution). $\alpha, \beta$ are often called concentrations. Below, we will use the notation

$c_1 = \alpha > 0,$

$c_2 = \beta > 0,$

$\lambda = (c_1, c_2).$

Using this notation,

$P(\theta|\lambda) = {\rm Beta}(\theta; c_1, c_2).$

According to the Wikipedia article for the Beta distribution, the mean value of $\theta$ is given in terms of its 2 concentrations by the simple expression

$\langle\theta\rangle = \frac{c_1}{c_1 + c_2}.$

The variance of $\theta$ is given by a fairly simple expression of $c_1$ and $c_2$ too. Look it up in the Wikipedia article for the Beta distribution, if interested.
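As a quick sanity check of those two formulas (pure Python stdlib, not part of Quantum Edward), one can sample from the Beta distribution and compare the empirical moments with the closed forms:

```python
import random

# Monte Carlo check of the Beta mean and variance formulas
c1, c2 = 2.0, 5.0
rng = random.Random(0)
samples = [rng.betavariate(c1, c2) for _ in range(200_000)]

emp_mean = sum(samples) / len(samples)
emp_var = sum((s - emp_mean) ** 2 for s in samples) / len(samples)

mean = c1 / (c1 + c2)                            # <theta> = c1/(c1+c2)
var = c1 * c2 / ((c1 + c2) ** 2 * (c1 + c2 + 1)) # variance, per the Wikipedia article

assert abs(emp_mean - mean) < 1e-2
assert abs(emp_var - var) < 1e-2
```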

If we define the change $dc_j$ in the two concentrations by

$dc_j = \eta \frac{\partial R}{\partial c_j}$

for $j=1,2$, then the change in the reward function $R$ will be

$dR = \sum_{j=1,2} dc_j \frac{\partial R}{\partial c_j}= \eta \sum_{j=1,2} \left(\frac{\partial R}{\partial c_j}\right)^2.$

This change in the reward is positive, which is what one wants if one wants to maximize the reward.

Comparison of CASE1 and CASE2

In CASE1, we need to calculate the derivative of the model $M$ with respect to the hidden parameter $\theta$:

$\frac{d}{d\theta}M(\theta).$

In CASE2, we do not need to calculate any derivatives at all of the model $M$. (That is why it’s called a Black Box algo). We do have to calculate the derivative of $P(\theta|\lambda)$ with respect to $c_1$ and $c_2$, but that can be done a priori since $P(\theta|\lambda)$ is known a priori to be the Beta distribution:

$\frac{d}{dc_j}\sum_\theta dist[M(\theta), P(\theta|\lambda)]= \sum_\theta \frac{d dist}{dP(\theta|\lambda)} \frac{dP(\theta|\lambda)}{dc_j}$

So, in conclusion, in CASE1 we try to find the value of $\theta$ directly. In CASE2, we try to find the parameters $c_1$ and $c_2$ that describe the distribution of $\theta$'s. For an estimate of $\theta$, just use the $\langle \theta \rangle$ given above.
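The "black box" property can also be seen in code. BBVI estimates gradients with the score-function (log-derivative) trick, $\nabla_\lambda E_{P(\theta|\lambda)}[f(\theta)] = E[f(\theta)\,\nabla_\lambda \log P(\theta|\lambda)]$, so only the log-density of the Beta distribution ever gets differentiated. Below is a minimal sketch for the $c_1$ component; the function `f` stands in for the model-dependent term, and the finite-difference `digamma` helper is my own illustrative shortcut, not a library call:

```python
import math
import random

def digamma(x, h=1e-5):
    # psi(x) via a central difference of log-gamma (plenty accurate here)
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def score_grad_c1(f, c1, c2, n=100_000, seed=0):
    """Monte Carlo estimate of d/dc1 E_{theta ~ Beta(c1,c2)}[f(theta)],
    treating f as a black box: no derivatives of f are ever taken."""
    rng = random.Random(seed)
    # d/dc1 log Beta(theta; c1, c2) = log(theta) - psi(c1) + psi(c1 + c2)
    const = digamma(c1 + c2) - digamma(c1)
    total = 0.0
    for _ in range(n):
        th = rng.betavariate(c1, c2)
        total += f(th) * (math.log(th) + const)
    return total / n

# With the black box f(theta) = theta, the exact answer is
# d/dc1 [c1/(c1+c2)] = c2/(c1+c2)**2 = 0.12 at c1=2, c2=3.
g = score_grad_c1(lambda th: th, 2.0, 3.0)
assert abs(g - 0.12) < 0.02
```

In real BBVI, `f` would involve the ELBO integrand evaluated at the sampled $\theta$'s, but the structure is the same: sample from $P(\theta|\lambda)$, weight by the score, average.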

## July 1, 2018

### Is Quantum Computing Startup Xanadu, Backed by MIT Prof. Seth Lloyd, Pursuing an Impossible Dream?

Filed under: Uncategorized — rrtucci @ 5:44 pm

As all Python programmers soon learn, if you ever have a question about Python, it’s almost certain that someone has asked that question before at Stack Overflow, and that someone has provided a good answer to it there. The same company that brings us Stack Overflow now also brings us Quantum Computing Stack Exchange (its beta started 3 months ago). I’ve answered a few questions there already. Here is the first question I asked:

https://quantumcomputing.stackexchange.com/questions/2414/is-probabilitistic-universal-fault-tolerant-quantum-computation-possible-with

### Quantum Edward, Quantum Computing Software for Medical Diagnosis and GAN (Generative Adversarial Networks)

Filed under: Uncategorized — rrtucci @ 7:28 am

Quantum Edward at this point is just a small library of Python tools for doing classical supervised learning with Quantum Neural Networks (QNNs). The basic idea behind QEdward is pretty simple: in a conventional ANN (Artificial Neural Net), one has layers of activation functions. What if we replace each of those layers by a quantum gate, or a sequence of quantum gates, and call the whole thing a quantum circuit? The replacement quantum gates are selected in a very natural way based on the chain rule of probabilities. We take that idea and run with it.

As the initial author of Quantum Edward, I am often asked to justify its existence by giving some possible use cases. After all, I work for a startup company artiste-qb.net, so the effort spent on Quantum Edward will not be justified in the eyes of our investors if it is a pure academic exercise with no real-world uses. So let me propose two potential uses.

(1) Medical Diagnosis

It is interesting that the Bayesian variational inference method that Quantum Edward currently uses was first applied in 1999 by Michael Jordan (the UC Berkeley professor with the same name as the famous basketball player) to do medical diagnosis using Bayesian networks. So the use of B nets for medical diagnosis has been in the plans of b net fans for at least 20 years.

More recently, my friends Johann Marquez (COO of Connexa) and Tao Yin (CTO of artiste-qb.net) have pointed out to me the following very exciting news article:

It took 2 years to train the Babylon Health AI, but the investment has begun to pay off. Currently, their AI diagnoses a disease correctly 82% of the time (and that should improve as it continues to learn from each case it considers), while human doctors are correct only 72% of the time on average. Babylon provides an AI chatbot in combination with a remote workforce of 250 work-from-home human doctors.

Excerpts:

The startup’s charismatic founder, Ali Parsa, has called it a world first and a major step towards his ambitious goal of putting accessible healthcare in the hands of everyone on the planet.

Parsa’s most important customer till now has been Britain’s state-run NHS, which since last year has allowed 26,000 citizens in London to switch from its physical GP clinics to Babylon’s service instead. Another 20,000 are on a waiting list to join.

Parsa isn’t shy about his transatlantic ambitions: “I think the U.S. will be our biggest market shortly,” he adds.

Will quantum computers (using quantum AI like Quantum Edward) ever be able to do medical diagnosis more effectively than classical computers? It’s an open question, but I have high hopes that they will.

(2) Generative Adversarial Networks (GAN)

GANs (Wikipedia link) have been much in the news ever since they were invented just 4 years ago, thanks to their ability to generate amazingly realistic data with very little human aid. For instance, they can generate pictures of human faces that humans have a hard time distinguishing from the real thing, and they can generate 360-degree views of rooms from only a few fixed-perspective photos of the room.

Dustin Tran’s Edward (on which Quantum Edward is based) implements inference algorithms of two types, variational and Monte Carlo. With Edward, one can build classical neural networks that do classification via the so-called Black Box Variational Inference (BBVI) algorithm. Can BBVI also be used to do GAN classically? Yes! Check out the following 4-month-old paper:

Graphical Generative Adversarial Networks, by Chongxuan Li, Max Welling, Jun Zhu, Bo Zhang https://arxiv.org/abs/1804.03429 (see footnote)

Can this be generalized to quantum mechanics, i.e., can one use BBVI to do classification and GAN on a quantum computer? Probably yes. Quantum Edward already does classification. It should be possible to extend the techniques already in use in Quantum Edward so as to do GAN too. After all, a GAN is just two neural nets, either classical or quantum, competing against each other.

(footnote) It is interesting to note that three of the four authors of this exciting GAN paper work at Tsinghua Univ. in Beijing. Their leader is Prof. Jun Zhu (PhD from Tsinghua Univ., post-doc for 4 years at Carnegie Mellon), a rising star in the AI and Bayesian networks community. He is the main architect of the software ZhuSuan, which is available at GitHub under the MIT license. It is a nice alternative to Dustin Tran’s Edward. Like Edward, it implements Bayesian networks and hierarchical models on top of TensorFlow. The above GAN paper and the ZhuSuan software illustrate how advanced China is in AI.
