The Knock Nevis, sister ship to the Grover's Algorithm

My wife had been imploring me all day long that I go to the supermarket to buy some vittles, whatever those are. As I drove the old Ford to market, I suddenly realized that…

Now it was up to me, and only me, to maneuver this fully loaded supertanker, the Grover’s Algorithm, to a safe fixed point, or else. The or-else was too horrible to contemplate. It was nighttime. According to radar, the Grover’s Algorithm was steaming inexorably towards some shoals, 5 miles away from a pristine Alaskan bay. Its rusty, creaky cargo holds had been filled to the brim just 2 days before with a particularly toxic grade of Prudhole crude oil. The captain and co-captain were passed out, drunk to oblivion. All other crew members were asleep in their bunk beds, blissfully unaware of the impending disaster. Even if I could wake them up in time, they would be of little use to me, as none had any knowledge of how to steer the ship. Lesser men in a similar situation would have been paralyzed by fear. Luckily, the cool analytical mind and sangfroid of the Walter-Mitty-Tucci’s prevailed in me. As the stars shone brightly above on that crisp summer night, I headed alone towards the bridge, while I tried to remember all that I had learned, from reading blogs and Wikipedia, about steering a ULCC (Ultra Large Crude Carrier)(pronounced HULK by some). Suddenly an idea clicked in my mind: what was needed was a fixed-point adaptive approach to maneuvering the Grover’s Algorithm. I applied such a scheme, conceived in utter desperation, and, praise the Lord a million times, it worked! The trajectory of the massive supertanker began to budge, ever so gradually, towards the safe fixed point I had assigned to it. No one but me would ever come to know what I had accomplished that lonely night. When the captain had regained his senses, he would assume, as was his wont, that nothing momentous had happened while he was in drunken stupor, and I would be sent back to swabbing decks…

A horn toot behind me by a rude driver awoke me from my deep reverie. I parked my car in the supermarket parking lot, with the centimeter precision one would expect from a seasoned supertanker captain. As soon as I got home, and my wife had been appeased, I began putting to paper the following account of the desperate scheme which I conceived on that lonely, frightful night when I became a temporary captain of the Grover’s Algorithm.

Check out

“An Adaptive, Fixed-Point Version of Grover’s Algorithm”, by R.R. Tucci,

arxiv:1001.5200
LOTS OF PICTURES!

SOFTWARE INCLUDED!

(batteries not included, but none are needed. Okay, I confess, the software is pretty trivial, just some Octave m-files:

but that’s

more software than you’ll ever get with any paper by a “quantum complexity theorist”. (By the way, if you didn’t know, Octave is a free, open source attempt to clone, at least partially, Matlab. It’s a lifesaver for people like me who are very poor and can’t afford to buy Matlab. Octave m-files supposedly work in the Matlab environment with zero or few modifications.)

### Like this:

Like Loading...

*Related*

Hi,

I am a psychology student working on Quantum recently. I tried to implement the paper named ” Quantum Reinforcement Leaning” by Daoyi Dong, Chunlin Chen, Hanxiong Li, Tzyh-Jong Tarn. They use Grover search for choosing one action among 4 actions in an undefined environment. I try to implement the Matlab code, however, my code does not work. Assume you have 4 actions with equal probabilities, once the Grover amplify the amplitudes, it amplify one probability to “one” and so there would be no choice of searching other actions, thus my code stick to one action till termination criteria. I dont understand how Grover helps. I have 4 actions in one state with probabilities of 0.25 for each, then one actions randomly selected,like 3 and so without considering whether it is good or not, the Grover amplify the probability from 0.25 to 1. and so next time the agent go to this state it definitely choose action 3 for ever! so what is Grover “Search” mean?

Would you please help me?

Thanks

Pegah

Comment by Pegah — July 30, 2011 @ 12:15 am

Hi Pegah,

I looked briefly at the paper that you mention. In my opinion, it uses VERY ugly and confusing notation sometimes.

You don’t say whose MatLab code your are using.

The problem you are describing, that the action sticks on one value forever, is something that has nothing to do with the quantum (Grover part) of their algorithm.

Most of their algorithm is really classical (i.e non quantum). In particular, how they update their function V(s) is non-quantum, although at the end of section C on page 5 they promise that they will do the updating of V(s) in a quantum fashion in future work. They say :”How to realize some specific functions of the algorithm using quantum gates in detail is our future work.” In other words, they don’t know how to do it.

In the gridworld example, there are 4 actions: move up, down, right, left. If the program sticks on one action, say “move right” for a while, then it eventually reaches a wall of gridworld, at which point the function V(s) should be updated, by a non-quantum algorithm, so that the next time going right again is forbidden.

What I would do myself is to first try to write a computer program that works, that does the traditional (i.e., non-quantum) reinforced learning for gridworld. Once I had accomplished this first goal, then I would dare contemplate how I could replace a part of that algorithm with a quantum version of it.

Comment by rrtucci — July 30, 2011 @ 1:40 pm

Thank you very much for your quick response. Indeed I am really frustrated with the Grover and your feedback is good.

Actually I did write the code for traditional reinforcement learning and it works. In the method they used, “epsilon- greedy”, actions were selected randomly based on initial state values each time and if there is a reward for that action it updates the Values and use these new values for next episode. So I reach the first goal you mentioned, however the quantum algorithm is vague. In order to amplify the amplitudes in the Grover by “L” they use L = int(k(r +V (s))); It means that parallel to Quantum algorithm, the traditional algorithm must work and update the values and then give them to Quantum so it can be used in Grover ( like a feedback for Quantum which determines how much amplification must be done in the Grover).

The problem is in Quantum, my code selects an action randomly in a state, (say action go down in starting point which is not good ) and gives it to Grover as an solution. So Grover algorithm amplifies the amplitude of selecting this action to nearly one without considering that this action might not be the best. Thus in next episode, while the agent is in starting state, the action with amplitude of nearly one selected and would not search for other actions and so it stays there till termination criteria.

Here is a code:

function [ProbValue] = QiAmp(NewState, State, Reward, Action, QValue, ProbValue, Param)

L = floor(Param.c * (Reward + Param.Discount * max(QValue(:, NewState))) );

% L is determined by parallel Qvalue updating in traditional RL

A = 1.*(( 1 : size(Param.NumAction, 2) )’ == Action);

% A is vector with zeros in every row except the row j corresponding to

% action Action

Ua = eye(size(Param.NumAction, 2)) – 2 * (A * A’);

Ub = 2 * (ProbValue(:,State) * ProbValue(:, State)’) – eye(size(Param.NumAction, 2));

Ug = Ub * Ua;

ProbValue(:,State) = (Ug ^ L)* ProbValue(:,State);

ProbValue(:,State) = ProbValue(:,State) / sqrt(sum((abs(ProbValue(:, State))).^2));

end

Comment by Pegah — July 30, 2011 @ 4:59 pm

Hi Pegah,

You’ve piqued my interest. Let’s continue the conversation by email.

Comment by rrtucci — July 31, 2011 @ 3:10 pm