Quantum Bayesian Networks

August 13, 2021

Multi-Armed Bandits of the most nefarious ML kind

Filed under: Uncategorized — rrtucci @ 12:37 am

[Image: octopus bandit]

I just finished a new chapter entitled “Multi-armed bandits” (MABs) for my free, open-source book “Bayesuvius” (432 pgs) about Bayesian Networks and Causal Inference.

There are numerous pedagogical articles about MABs on the internet, and also chapters in books. Most of these include computer code. In fact, my chapter is based on the wonderful chapter on MABs in the book “Foundations of Reinforcement Learning with Applications in Finance”, by Ashwin Rao and Tikhon Jelvis. So you might think another such article (without code to boot) doesn’t amount to a hill of beans in this crazy world. That might very well be true. But I think I did something slightly new: I expressed and explained everything about MABs using Bayesian Networks (B nets), something which no one seems to have done before. As I’ve said many times before in this blog, I believe B nets are a foundational formalism, and most of Machine Learning can be expressed very precisely, intuitively and graphically using them.
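To give a flavor of what I mean, here is a minimal sketch (in Python; the node names, variable names and numbers are my own inventions, not taken from the chapter) of the simplest B net view of a MAB: a two-node net with an action node a (which arm the gambler pulls) and a reward node x (win or lose), connected by the arrow a → x.

```python
import random

# Two-node B net for a Bernoulli MAB: action node `a`, reward node `x`.
# The conditional probability table P(x=1 | a) below is hypothetical;
# it encodes each machine's odds of paying out.
P_win_given_arm = {0: 0.3, 1: 0.5, 2: 0.7}

def sample_x(a):
    """Sample the reward node x given the value of the action node a."""
    return 1 if random.random() < P_win_given_arm[a] else 0

# Pulling arm 2 many times wins about 70% of the time.
print(sum(sample_x(2) for _ in range(10_000)) / 10_000)
```

Everything a MAB algorithm does can be phrased as repeatedly choosing a value for the node a and sampling the node x, while trying to learn the table P(x|a).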

The term “one-armed bandit” is a humorous name for what is also called a slot machine. A slot machine is a gambling device with a slot into which you put coins or tokens for the privilege of pulling down a lever (arm) on one side of the device. This action generates a random combination of three shapes, which may or may not, depending on their combination, entitle the player to a money award. Multi-armed bandit (MAB) is the name given to the optimization problem that considers an agent (gambler) playing multiple one-armed bandits, each with possibly different odds of winning. The optimization problem is to find an efficient schedule whereby the gambler converges on the machine with the highest odds of winning.
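As a toy illustration (a Python sketch; the SlotMachine class and the specific odds are invented for the example), here is the setup together with the most naive possible schedule, which motivates looking for something smarter:

```python
import random

class SlotMachine:
    """A one-armed bandit: pulling the lever pays out 1 with probability p."""
    def __init__(self, p):
        self.p = p

    def pull(self):
        return 1 if random.random() < self.p else 0

# A hypothetical casino with three machines; the gambler doesn't know the odds.
machines = [SlotMachine(0.2), SlotMachine(0.5), SlotMachine(0.6)]

# Naive schedule: pull every arm the same number of times, then declare
# the arm with the best empirical win rate the winner. This finds the
# right machine, but wastes many pulls on bad arms; a good MAB algorithm
# spends most of its pulls on the arms that look most promising.
estimates = [sum(m.pull() for _ in range(1000)) / 1000 for m in machines]
print("estimated odds:", estimates)
print("best arm:", estimates.index(max(estimates)))
```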

MABs are often used in marketing as an alternative to A/B testing. The two methods yield different information, but they overlap in that both can discover consumer preferences.

The MAB problem is an optimization problem (i.e., finding the maximum of a reward function or the minimum of a cost function). As with any minimization problem, an algorithm to solve it runs the danger of converging to a local minimum that isn’t the global (i.e., the overall) minimum. This danger can be diminished by doing both exploration and exploitation. Algorithms that do no exploration, only exploitation, are said to be greedy, and they are at the highest risk of converging to a non-global minimum.
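A standard way to mix the two is the epsilon-greedy algorithm: with a small probability epsilon the agent explores a random arm, and otherwise it exploits the arm with the best estimated odds so far. The sketch below (my own minimal Python, with invented odds; setting epsilon to zero recovers the purely greedy algorithm) shows how a greedy agent can lock onto a bad arm while even a little exploration finds the best one:

```python
import random

# Hypothetical win odds for three one-armed bandits (hidden from the agent).
TRUE_ODDS = [0.2, 0.5, 0.6]

def pull(a):
    """Pull arm a; reward is 1 with the arm's (hidden) win probability."""
    return 1 if random.random() < TRUE_ODDS[a] else 0

def epsilon_greedy(n_pulls=10_000, epsilon=0.1):
    """With probability epsilon pull a random arm (exploration);
    otherwise pull the arm with the best estimate so far (exploitation)."""
    counts = [0] * len(TRUE_ODDS)    # number of pulls per arm
    values = [0.0] * len(TRUE_ODDS)  # running mean reward per arm
    for _ in range(n_pulls):
        if random.random() < epsilon:
            a = random.randrange(len(TRUE_ODDS))                     # explore
        else:
            a = max(range(len(TRUE_ODDS)), key=lambda i: values[i])  # exploit
        r = pull(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]  # incremental mean update
    return values

print(epsilon_greedy())             # estimates concentrate near [0.2, 0.5, 0.6]
print(epsilon_greedy(epsilon=0.0))  # greedy: can lock onto a suboptimal arm
```

With epsilon = 0 the agent never revises its first lucky guess, which is exactly the greedy trap described above; even epsilon = 0.1 is enough to keep sampling all the arms and converge on the best one.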
