Quantum Bayesian Networks

November 30, 2020

My Free Open Source Book “Bayesuvius” on Bayesian Networks and Causal Inference

Filed under: Uncategorized — rrtucci @ 3:08 pm

THIS BOOK IS CONTINUOUSLY BEING IMPROVED AND EXPANDED. MAKE SURE YOU HAVE THE LATEST VERSION FROM GITHUB.

See also “Famous uses of Bayesian Networks”

June 27, 2020

My Pinned Tweet at Twitter

Filed under: Uncategorized — rrtucci @ 9:28 pm

This is the pinned Tweet on my company’s (www.ar-tiste.xyz) Twitter account

September 14, 2021

Stanford championing shaky Foundation Models, a dead end street for AI

Filed under: Uncategorized — rrtucci @ 4:27 pm

sand-house

Check out this excellent article that was just published. I highly recommend it:

Has AI found a new foundation? by Gary Marcus and Ernest Davis (The Gradient, 11 Sept 2021)

A month ago, Stanford University published a massive 212-page report, authored by 149 scientists from Stanford University, coining the term “Foundation Models” and announcing the opening of a new institute dedicated to them. The above article explains why these so-called Foundation Models (FMs) are a very poor foundation for AI, and why dedicating so many resources to them is therefore extremely foolish and counter-productive for the AI field.

The article makes many great points that I agree with wholeheartedly. It particularly shines with scary/hilarious stories of catastrophic FM failures. It points out that FMs are (1) very limited in what they can do, (2) dangerously erratic and untrustworthy even when doing their forte, and (3) very ill-defined; their definition seems to be: “something that looks like Google’s BERT”. (1), (2) and (3) are not what you want in a foundation. It reminds me of a famous quote from the classic movie Animal House: “Fat, drunk and stupid is no way to go through life, son.” I’d give that advice to BERT. The article has a desiderata list of what a good AI Foundation should have. FMs fail most of the items in that list.

Let me add a few of my own criticisms of FM.

The Stanford monstrosity paper takes the stand that FMs are “risky” but fixable. I disagree. They are not fixable; their flaws are too deeply entrenched. As an advocate of Bayesian Networks and Causal Inference (CI), I see FMs as a dead-end street. FMs like BERT prove that our current machines are really good at curve fitting, better than humans. But we already knew that. FMs are model-free, so they are incapable of doing CI. That is why I call them a dead-end street. I believe CI is a necessary part of any human-like AI.

The Stanford paper acknowledges that FMs are risky, but it fails to point out one of the biggest risks of FMs. By pouring so many resources into FMs, Stanford is promoting mono-AI; i.e., mono-culture and group think in AI. Stanford is sending a message that FMs are the only, or the main game in town. Do we want there to be only one game in town, and that that game be a dead-end street?

FMs are controlled by rich, monopolistic companies such as Google, and Big Science such as Stanford’s new FM institute. Do we want to have a few rich companies be the gate keepers (and main financial beneficiaries) of the only game in town?

FMs seem awfully expensive to use. Do we want to make the only game in town be one that only rich corporations and universities can afford to participate in?

 

September 10, 2021

I just sold the Brooklyn Bridge to my parents for $20M

Filed under: Uncategorized — rrtucci @ 1:29 am

brooklyn-bridge-drone

September 6, 2021

Bayesian Networks and GIT

Filed under: Uncategorized — rrtucci @ 10:15 pm

Did you realize that GIT is a Bayesian Network (bnet) generator? In a GIT-bnet, each node carries the NEW CHANGES to your document: a merging of all the changes carried by the incoming arrows. As in all bnets, the arrows connecting the nodes of a GIT-bnet describe a partial time ordering of the nodes. In a previous blog post, I discussed how all bnets reflect to some extent the passage of time. It is possible to define bnets in which some nodes stand for subjective qualities that don’t have a well-defined time associated with them (people in the social sciences define bnets with such subjective nodes all the time). GIT-bnets, however, have no such nodes; each node in a GIT-bnet has a time stamp. Note also that GIT-bnets are deterministic bnets; the probability tables (aka TPMs, Transition Probability Matrices) associated with the nodes of a GIT-bnet are deterministic, but that is perfectly acceptable. Any node of a bnet can be deterministic. A deterministic TPM is just a special type of probability distribution. Artificial Neural Nets are also deterministic bnets.

If you’ve come this far reading my ruminations, here is your reward: A video game designed to teach you how to use GIT.
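To make the "partial time ordering" remark concrete, here is a minimal sketch in Python. The commit names (c1, c2, c3, m4) and the little history are hypothetical, invented only for illustration. Each commit is mapped to its parent commits, and a topological sort recovers an ordering that is compatible with the arrows.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical history: c1 is the root, c2 and c3 branch off it,
# and m4 is a merge commit with two incoming arrows.
# Each key maps a commit to the set of its parents (predecessors).
commit_parents = {
    "c1": set(),
    "c2": {"c1"},
    "c3": {"c1"},
    "m4": {"c2", "c3"},
}

# static_order() yields every node after all of its predecessors,
# i.e. one total order compatible with the partial (time) order.
order = list(TopologicalSorter(commit_parents).static_order())
print(order)
```

The order of c2 and c3 relative to each other is not fixed by the arrows, which is exactly what "partial" ordering means here.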

https://ohmygit.org/

oh-my-git

September 5, 2021

Godzilla-KingKong-Doge

Filed under: Uncategorized — rrtucci @ 8:57 pm

godzilla-kk-doge-nn-ci

This meme was generated with this online meme generator

August 29, 2021

Is Rubin’s Potential Outcomes theory well-defined?

Filed under: Uncategorized — rrtucci @ 1:49 am

wrong-book

stats-done-wrong

Today I wrote a short (5 page) letter arguing that Rubin’s PO theory is not well-defined, inconsistent. This is obviously a controversial topic, heresy, and there is a high probability that I am wrong, but there is 100% probability that I will learn a lot from the feedback that I will get for it. Here is the paper. I haven’t submitted it to any journal or even to arxiv yet. Feedback is very welcome.

ADDENDUM:

I wrote the little “paper” accompanying this blog post because I am writing a book about Bayesian networks and causal inference, and I was confused about this issue, and I wanted to present my ideas clearly to my peers so that they could un-confuse me.

The little paper had its intended effect and I got some great feedback on Twitter from Judea Pearl, Victor Chernozhukov and Matheus Facure Alves. They rightfully pointed out that Rubinologists assume graph G_1 in my paper, BUT without the arrow D–>[Y(0),Y(1)]. I was convinced that arrow was being used by Rubinologists, but no. Without that arrow, SUTVA and CIA can hold simultaneously.

I’m good now. I now understand why SUTVA and CIA can both hold simultaneously. The little “paper” was wrong to claim that they can’t hold simultaneously. I’ve extensively revised my book to reflect my new understanding. I’m constantly improving my book. It’s a labor of love 🙂

August 23, 2021

Explaining Shapley Explainability

Filed under: Uncategorized — rrtucci @ 5:26 pm

titanic-shap-plot

“AI” is an ill-defined term. So is the term “explainability”. So the term Explainable AI (XAI) is doubly ill-defined. In 2018, the European Union codified the need for XAI, because of an individual’s “right to explanation”, into a law called the General Data Protection Regulation (GDPR). This EU law was a strong motivation for Neural Net (NN) and boosted decision tree practitioners to come up with a way to enhance their machine learning algorithms so that these comply with that law. Shapley explainability (SX, pronounced SEX) is one of the most popular methods for doing XAI.

So what does SX do? It ranks, for each individual of a population, the features (for example, race) of a dataset, in the order of how influential those features were in arriving at the decision the classifier made for that individual.
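Here is a minimal sketch of the ranking idea, assuming a tiny hypothetical "model". The function toy_value and the feature names age, sex and fare are made up for illustration; toy_value stands in for "the model's output for one individual when only the features in the coalition are known". The code computes exact Shapley values by averaging each feature's marginal contribution over all coalitions of the other features. This brute force approach is only feasible for a handful of features, which hints at why SX is expensive.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, features):
    """Exact Shapley value of each feature for the set function `value`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Standard Shapley weight |S|! (n-|S|-1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_value(coalition):
    """Hypothetical model output given only the features in `coalition`."""
    score = 0.0
    if "age" in coalition:
        score += 1.0
    if "fare" in coalition:
        score += 2.0
    if "age" in coalition and "sex" in coalition:
        score += 3.0  # interaction term, shared between age and sex
    return score

phi = shapley_values(toy_value, ["age", "sex", "fare"])
ranking = sorted(phi, key=phi.get, reverse=True)
print(phi, ranking)
```

The Shapley values add up to the model output for the full feature set, and the interaction term is split evenly between the two features that create it.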

Personally, I don’t find SX that great. For my money, Bayesian Networks (bnets) are much better than SX. 🙂 SX “explains”, a posteriori, the outcome of a model, whereas bnets reveal the a priori process whereby that outcome was reached. Thus, SX can tell you that a model is racist, but it can’t suggest how to fix it. On the other hand, if a bnet is acting racist, you don’t have to throw it away. It can be fixed. Another weakness of SX is that it is quite expensive computationally. Bnets have explainability built into them. For bnets, explainability is not an additional, posterior and quite onerous, calculation. That is why I like to call bnets the gold standard of XAI.

The nodes of an unboosted decision tree (dtree) have meaningful labels, just like a bnet, so dtrees are explainable in the same way bnets are, but bnets are causal whereas dtrees are acausal; i.e., the nodes of a dtree are ordered without any causal motivation.

Even though I don’t find SX that appealing as a solution to XAI, I recently found a great blog post (see below) that explains SX using bnets. Because of that cool (or hot?) connection of bnets to SX, I just wrote a chapter on SX for my book Bayesuvius. Chapter 57 of Bayesuvius is called “Shapley Explainability”. Remember that the chapters of Bayesuvius are ordered alphabetically by title.

August 18, 2021

Self-publishing a scientific or technical book on Amazon, and the 2 decade old quest to convert LaTeX to HTML and ePub

Filed under: Uncategorized — rrtucci @ 1:03 am

bayesuvius-cover.-small

During the past year, I wrote a 433 page technical/scientific book entitled “Bayesuvius”. Its book cover(1) is shown above. Bayesuvius was written in LaTeX, a language that is very familiar to anybody that has written a scientific or engineering paper or thesis. The LaTeX software produces a beautiful PDF. This blog post describes my saga to convert that PDF to EPUB format, in order to publish it as an ebook on Amazon.

Amazon, Barnes & Noble, Apple and several other companies allow one to self-publish a book, in either paper or ebook form or both, for free, but they charge a sales-commission  fee, ka-ching,  for every book sold through their website. In my case, I am not publishing in paper form, only in ebook form, and I am giving the ebook away for free, so the commission fee is of no concern to me.

I’ve chosen to make Bayesuvius a free open-source book. What I mean by this is that all the LaTeX files and a pdf compilation of it are available at github, under a Creative Commons license. If you too want to self-publish a science/technical book as an ebook, even if you don’t plan to give it away for free like me, you will still have to jump the hurdle PDF->EPUB.

If your book is already in EPUB format, self-publishing on Amazon is a very easy process. You create an account here, upload your book, answer a few questions, and presto, it shows up on their website a few days later.  Millions of authors have already self-published in Amazon. Fiction writers often self-publish ebooks on Amazon and other places, and give them away for free, in order to develop a reputation and a following.

However, if your book is in PDF format and contains equations, Amazon will do an atrocious job (the result is illegible) converting it into the common ebook formats (EPUB for generic ebooks, AZW3 for Kindle/Amazon ebooks). And Barnes&Noble doesn’t even accept PDF submissions. Neither of them accepts LaTeX (most scientific publishers do).

Since my book was in PDF form, this presented a seemingly insurmountable obstacle to self-publishing. I spent a whole weekend, to no avail, trying to convert my book from LaTeX or PDF to EPUB.

latex2epub

I tried converting the LaTeX or PDF directly to EPUB, or indirectly to EPUB via an intermediate step (HTML, DOCX, RTF). I read every article I could find on Stack Overflow about converting LaTeX or PDF to HTML. HTML seemed like a promising intermediate step because EPUB is based on HTML (more precisely, the latest EPUB, v3, is based on HTML5 and is quite powerful).

I tried dozens of online document conversion sites, and when that didn’t produce a good result, I downloaded and tried the following conversion software packages. I was desperate.

  • latex2html
  • tex4ht/make4ht
  • pandoc
  • wkhtmltopdf
  • prince
  • calibre
  • poppler-utils/pdftohtml

All yielded terrible results for my book. I’m sure those software libraries do a great job for simple pdf or latex documents, but my book is a tough customer, because it has hundreds of equations and hundreds of figures generated by complicated LaTeX packages.

Then, finally, somewhere around 2 in the morning at the end of this 2 day ordeal, the dark clouds over me lifted and my problem was solved in an instant. I found, ta-tan,

pdf2htmlEX (https://github.com/pdf2htmlEX/pdf2htmlEX)

and its experimental adjunct

pdf2epubEX (https://github.com/dodeeric/pdf2epubEX)

Check out also this github issue that alerted me to the existence of pdf2epubEX.

pdf2htmlEX and pdf2epubEX both work lightning fast. They are able to convert my whole book in less than a minute.

pdf2htmlEX produced this excellent html of my book.

pdf2epubEX also worked very fast on my book, but the resulting epub is a bit misaligned. I’ve reported the issue to its author dodeeric. I have no doubt that dodeeric and his co-workers at pdf2epubEX will be able to fix this minor misalignment issue soon. They are a brilliant bunch. They have far outwitted and outperformed the Amazon and Barnes&Noble programmers. Those 2 companies are fools not to be funding the pdf2htmlEX/pdf2epubEX organization so that it can continue developing these 2 awesome open-source tools.

I see pdf2htmlEX/pdf2epubEX as the culmination of a quest lasting at least 2 decades, to convert LaTeX to HTML (L2H). The reason it took so long is that we had to wait until web-browsers matured sufficiently before conversion tools could do a decent job at L2H conversion. The earliest attempts at L2H conversion, such as the latex2html software, were based on the original HTML. The original HTML was quite limited in its capabilities, by today’s standards. With it, it was impossible to generate a faithful reproduction of a LaTeX document in a webpage, unless you took a jpeg of each page. latex2html tried to convert all equations into jpeg or gif images (later attempts used MathJax) and disregarded the source’s fonts, positions of objects on the page, dimensions, etc. But once web-browsers acquired the super-powers of JavaScript and HTML5, all those stylistic details could be reproduced faithfully on a webpage. This table from the pdf2htmlEX website illustrates well the correlation between the evolution of web-browsers and the evolution of L2H conversion tools.

ADDENDUM (Aug 20, 2021):

dodeeric informed me that the epub being generated by pdf2epubEX is correct. The problem is that most current ereaders, including Google’s ereader, don’t render “fixed layout” epub files correctly. I was able to find just one free ereader that does it properly. It’s called “PocketBook”.

dodeeric posted a very helpful message, in this github issue, alerting me to the existence of an Amazon format called “print replica” which can be read by the Amazon Kindle ebook reader. “It’s a format which mainly “wraps” the PDF file into another file containing metadata.” See How to prepare a print replica file from a PDF file with the Kindle Create tool

Following dodeeric’s kind advice, I downloaded Kindle Create, created a replica of my pdf, submitted it to Kindle Direct Publishing. All went through without hitting any snags. My book showed up in the Amazon website the same day. https://www.amazon.com/dp/B09D8Q4CNC/


(1) If you are curious, I created this book cover online, free of charge, using the canva.com website. I have no affiliation with canva.com. This is the first time I have used their services. They were just one of the many hits that I got when I googled “design book cover online”.

August 13, 2021

Multi-Armed Bandits of the most nefarious ML kind

Filed under: Uncategorized — rrtucci @ 12:37 am

octupus-bandit

I  just finished a new chapter entitled “Multi-armed bandits” (MABs) for my free, open source book “Bayesuvius” (432 pgs) about Bayesian Networks and Causal Inference.

There are numerous pedagogical articles about MABs on the internet, and also chapters in books. Most of these include computer code. In fact, my chapter is based on the wonderful chapter on MABs in the book “Foundations of Reinforcement Learning with Applications in Finance“, by Ashwin Rao and Tikhon Jelvis. So you might think another such article (without code to boot) doesn’t amount to a hill of beans in this crazy world. That might very well be true. But I think I did something slightly new. I expressed and explained everything about MABs using Bayesian Networks (B nets)—something which no one seems to have done before. As I’ve said many times before in this blog, I believe B nets are a fundamental definition, and most of Machine Learning can be expressed very precisely, intuitively and graphically using B nets.

The term “one-armed-bandit” is a humorous term for what is also called a slot machine. A slot machine is a gambling device which has a slot into which you put coins or tokens for the privilege of being allowed to pull down a lever (arm) on one side of the device. This action generates a random combination of three shapes, which may or may not, depending on their combination, entitle the player to a money award. Multi-armed bandit (MAB) is the name given to the optimization problem that considers an agent (gambler) that is playing multiple one-armed-bandits, each with possibly different odds of winning. The optimization problem is to determine an efficient schedule whereby the gambler can converge on the device with the highest odds of winning.

MABs are often used in marketing as an alternative to A/B testing. These 2 methods yield different information but overlap in that they both can discover consumer preferences.

The MAB problem is an optimization problem (i.e., finding the maximum of a reward function or the minimum of a cost function). As with any minimization problem, an algorithm to solve it runs the danger of converging to a local minimum that isn’t the global (i.e., the overall) minimum. This danger can be diminished by doing both exploration and exploitation. Algorithms that do no exploration, only exploitation, are said to be greedy, and they are at the highest risk of converging to a non-global minimum.
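Here is a minimal sketch of one way to mix exploration and exploitation, the epsilon-greedy rule. This is a standard textbook strategy, not necessarily one of the algorithms treated in my chapter, and the win probabilities and parameter values below are made up for illustration. With probability epsilon the gambler pulls a random arm (exploration); otherwise he pulls the arm with the best estimated payoff so far (exploitation). Setting epsilon=0 gives the purely greedy algorithm.

```python
import random

def epsilon_greedy(win_probs, n_pulls, epsilon, seed=0):
    """Play `n_pulls` rounds on Bernoulli arms with hidden odds `win_probs`."""
    rng = random.Random(seed)
    n_arms = len(win_probs)
    counts = [0] * n_arms    # number of pulls per arm
    means = [0.0] * n_arms   # running estimate of each arm's win rate
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means

# Hypothetical bandit with three arms; the last arm has the best odds.
counts, means = epsilon_greedy([0.2, 0.5, 0.8], n_pulls=5000, epsilon=0.1)
print(counts, means)
```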

August 1, 2021

Meta-learners for estimating treatment effects

Filed under: Uncategorized — rrtucci @ 5:40 am

MULLER-SWARBRICK

I just finished a new chapter entitled “Meta-learners for estimating ATE” for my free, open source book “Bayesuvius” (410 pgs) about Bayesian Networks and Causal Inference. I based my new chapter on the last 2 chapters of a wonderful book by economist Matheus Facure Alves  which I described in a previous blog post. My new chapter has no computer code, only equations. Matheus’s 2 chapters, on the other hand, include beautiful Python Code. Matheus is a wizard with Pandas, numpy, scikit-learn and matplotlib. I’ve learned a tremendous amount about those libraries  by reading his code. Apart from Matheus’s book, another place where one can find Python code pertaining to meta-learners for estimating ATE is Uber’s CausalML.

One of the main endeavors of Causal Inference (CI) is to calculate the Average Treatment Effect (ATE). ATE is defined as the average of Y^\sigma (1) - Y^\sigma (0), where Y^\sigma(1) is the outcome for individual \sigma if he/she was given a treatment, and Y^\sigma(0) is the outcome if he/she was not given the treatment. Since individual \sigma can either receive the treatment or not, but not both, one of the two Y’s is always counterfactual.

Economists are huge fans of Linear Regression (LR), and traditionally calculate ATE using LR. But in recent times, they have begun to calculate ATE using Machine Learning (ML) instead. My new chapter describes various methods that economists and others have devised for calculating ATE with ML. These methods are called meta-learners because they involve multiple ML or LR steps.
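Here is a minimal sketch of one such meta-learner, usually called the T-learner, on simulated data. Everything about the simulation (the single feature x, the coefficients, the noise level, the randomized treatment) is an assumption made for illustration. I also use plain one-variable LR as the base learner to keep the example self-contained; in practice one would plug in an ML model. The idea: fit one model on the treated, another on the untreated, predict both outcomes for everybody, and average the difference.

```python
import random
from statistics import mean

rng = random.Random(0)
rows = []
for _ in range(2000):
    x = rng.random()                        # one feature
    d = 1 if rng.random() < 0.5 else 0      # randomized treatment
    y0 = 1.0 + 2.0 * x + rng.gauss(0.0, 0.1)   # outcome if untreated
    y1 = y0 + 1.0 + x                          # outcome if treated
    rows.append((x, d, y0, y1))

# Known only because this is a simulation: one Y is always counterfactual.
true_ate = mean(y1 - y0 for _, _, y0, y1 in rows)

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Each model sees only the OBSERVED outcome of its own group.
treated = [(x, y1) for x, d, _, y1 in rows if d == 1]
control = [(x, y0) for x, d, y0, _ in rows if d == 0]
a1, b1 = fit_line([x for x, _ in treated], [y for _, y in treated])
a0, b0 = fit_line([x for x, _ in control], [y for _, y in control])

# Predict both potential outcomes for everyone and average the difference.
ate_estimate = mean((a1 + b1 * x) - (a0 + b0 * x) for x, _, _, _ in rows)
print(true_ate, ate_estimate)
```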

Using ML to calculate ATE captures non-linear trends whose exclusion might sometimes lead to a poor result. On the other hand, ML is more expensive computationally than LR, and it introduces the danger of overfitting, a danger which is much less of a concern with LR.

July 16, 2021

Linus Torvalds’s opinion about the people that work in quantum computing

Filed under: Uncategorized — rrtucci @ 10:46 am

Bayesian networks (aka Causal Models, DAGs) and the Passage of Time

Filed under: Uncategorized — rrtucci @ 7:23 am

bird_AKKamper

Time flies.

In this blog post, I will try to answer the following questions:

  1. What is Causality, really, and how do Bayesian Networks (aka Causal Models, DAGs) encode it? (Henceforth in this blog post, we will use the terms Bayesian Network (bnet), causal model and DAG as synonyms. I’ve explained why this is justified in a previous blog post.)
  2. Give a simple yet convincing explanation for why a dataset does not fully specify a causal model. (I gave a more technical explanation of this in a previous blog post.)
  3. Give a simple non-rigorous method for deciding, given an undirected graph, in what direction to point the links of that undirected graph so as to get a DAG.

(1) For me, Causality is a time-induced ordering between two events, the transmission of information from the earlier of the two events to the later one, and the physical response of the later event to the reception of that information. The nodes of a bnet represent random variables. Some of those random variables are clearly events (i.e., they occur at a definite time). For example, let D=0 if a patient is not given a drug, D=1 if he/she is given it. D occurs at a definite time. But other random variables represent qualities which do not occur at a definite time. For example, G=gender=male,female. G does not occur at a definite time. But even in the case of a quality like G, its value is first decided at birth, so one can ascribe to G a particular, albeit fuzzy, time interval during which it is decided. If M=0(single), 1(married), then we can assign to M the day of the marriage. The time intervals assigned to G and to M are both somewhat ambiguous, but still, most people would say that G occurs before M (if a marriage occurs at all). Saying the opposite, that M occurs before G, seems pretty hard to understand. If two nodes A and B of a bnet have time intervals ascribed to them such that the time interval of A does not clearly occur before or after the time interval of B, then let’s call those events contemporaneous and not draw any arrows from A to B or vice versa.

(2) Now that we understand that the arrows in a bnet really do encode the direction of time, it becomes clear why a dataset does not fully specify a bnet. By a dataset (think of a dataframe in Pandas or R), I mean an array of numbers where the columns refer to features and the rows refer to individuals in a population. The column labels of the dataset become the node names of the bnet. Nowhere in a dataset is there any indication of the time ordering of the features. Hence, it’s impossible to create, from a dataset alone, a bnet, because bnets do carry such time-ordering information.

(3) Now that we understand that a bnet’s arrows are encoding roughly the passage of time, it becomes possible to glean from this insight a simple method, which, although not very rigorous, is really helpful to me. I will illustrate said method with the famous “Asia” bnet shown below. In this bnet, all nodes have two  possible values, 0 and 1.

Given a dataset for this bnet, one can calculate the correlation between every 2 features of the dataset. The feature names become the node names, and links are drawn between any 2 nodes whose correlation is greater than some threshold value. This gives an undirected graph that can be obtained from the bnet above by erasing the directions of the arrows. So how can we guess the directions of the arrows? Well, one uses a little bit of “expert knowledge” to conclude that

time(Visited Asia) < time(Tuberculosis) < time(Or) < time(X-Ray, Dyspnea)

Also

time(smokes) < time(LungCancer, Bronchitis) < time(Or) < time(Dyspnea)

If time(A) < time(B), then A–>B. Like I said before, the times we ascribe to these events are somewhat fuzzy and open to debate, so this algorithm is far from being rigorous. But often, saying that  time(A)<time(B) makes much more sense than saying that time(B)<time(A). When in doubt about the best direction to give to an arrow of an undirected graph, I recommend calculating a Goodness of Causal Fit metric which makes ample use of “do” operator experimentation.
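Here is a minimal sketch of the rule "if time(A) < time(B), then A–>B" applied to the Asia example. The integer time ranks are my own fuzzy guesses, chosen only to be consistent with the inequalities above, and the node spellings are simplified. Pairs with equal ranks are treated as contemporaneous and get no arrow.

```python
def orient_edges(undirected_edges, time_rank):
    """Point each link from the earlier node to the later node."""
    arrows, contemporaneous = [], []
    for a, b in undirected_edges:
        if time_rank[a] < time_rank[b]:
            arrows.append((a, b))
        elif time_rank[b] < time_rank[a]:
            arrows.append((b, a))
        else:
            contemporaneous.append((a, b))  # no clear order: draw no arrow
    return arrows, contemporaneous

# Undirected skeleton of the Asia bnet (link endpoints in arbitrary order).
links = [
    ("Tuberculosis", "VisitedAsia"),
    ("LungCancer", "Smokes"),
    ("Smokes", "Bronchitis"),
    ("Or", "Tuberculosis"),
    ("Or", "LungCancer"),
    ("XRay", "Or"),
    ("Dyspnea", "Or"),
    ("Dyspnea", "Bronchitis"),
]

# Hypothetical time ranks supplied by "expert knowledge".
rank = {
    "VisitedAsia": 0, "Smokes": 0,
    "Tuberculosis": 1, "LungCancer": 1, "Bronchitis": 1,
    "Or": 2,
    "XRay": 3, "Dyspnea": 3,
}

arrows, contemporaneous = orient_edges(links, rank)
for parent, child in arrows:
    print(parent, "-->", child)
```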

ADDENDUM: Prof. Judea Pearl and others have alerted  me to Granger Causality (GC). The critics of GC point out that it assumes erroneously, much like I do, that if event A precedes B and the two events are correlated, A must cause B. I agree. A rooster can crow before sunrise because he has an alarm clock that wakes him up 30 minutes before sunrise. I still think those cases are uncommon and seem to involve other intermediate events. Most roosters crow in response to the stimulus of the sunrise light.  The moral is that time ordering and correlation are necessary but not sufficient conditions for causality. To establish causality with more certainty, one also needs a pinch of prior expert knowledge, or one must gain that expert knowledge through “do” operator experimentation.

My answer to question (2) should have said: a dataset cannot fully specify a bnet because it lacks time ordering info. A dataset also cannot do the harder task of specifying a bnet that is a good causal fit to the problem, because it lacks time ordering info AND prior expert knowledge AND expert knowledge gained from posterior “do” operator experimentation.

July 8, 2021

First Page of Feynman’s “Statistical Mechanics” book

Filed under: Uncategorized — rrtucci @ 8:42 pm

Long ago, the first time I read this passage, it was an epiphany moment for me. I thought: Wow! So that’s all there is to thermo? I still find this passage to be a stunningly beautiful and highly effective way to begin a book on statistical mechanics (from Feynman’s “Statistical Mechanics” book; the highlight is my own):

CHAPTER 1

INTRODUCTION TO STATISTICAL MECHANICS

1.1 THE PARTITION FUNCTION

The key principle of statistical mechanics is as follows:

If a system in equilibrium can be in one of N states, then the probability of the system having energy E_n is (1/Q) e^{-\frac{E_n}{kT}}, where

Q = \sum_{n=1}^{N}e^{-\frac{E_n}{kT}},

k= Boltzmann’s constant, T= temperature. Q is called the partition function.

If we take |i\rangle as a state with energy E_i, and A as a quantum mechanical operator for a physical observable, then the expected value of the observable is

\langle A \rangle = \frac{1}{Q} \sum_i \langle i|A|i \rangle e^{-\frac{E_i}{kT}}.

This fundamental law is the summit of statistical mechanics, and the entire subject is either the slide-down from this summit, as the principle is applied to various cases, or the climb-up to where the fundamental law is derived and the concepts of thermal equilibrium and temperature T clarified. We will begin by embarking on the climb.
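Here is a minimal numerical sketch of the two formulas in the passage, for a made-up two-level system whose excited state sits exactly kT above the ground state. The function names and the example system are my own, not Feynman's. The observable is assumed to be given by its diagonal matrix elements \langle i|A|i \rangle in the energy basis.

```python
from math import exp

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant in J/K

def boltzmann_probabilities(energies, temperature):
    """Return ([P_n], Q) with P_n = exp(-E_n / kT) / Q."""
    weights = [exp(-e / (BOLTZMANN_K * temperature)) for e in energies]
    q = sum(weights)  # the partition function Q
    return [w / q for w in weights], q

def expected_value(diagonal_elements, probabilities):
    """<A> = (1/Q) sum_i <i|A|i> exp(-E_i / kT)."""
    return sum(a * p for a, p in zip(diagonal_elements, probabilities))

# Hypothetical two-level system at T = 300 K with energies 0 and kT.
temperature = 300.0
energies = [0.0, BOLTZMANN_K * temperature]
probs, q = boltzmann_probabilities(energies, temperature)

# Taking A to be the energy itself gives the average energy.
mean_energy = expected_value(energies, probs)
print(q, probs, mean_energy)
```

For this system Q = 1 + 1/e, so the ground state is occupied with probability 1/(1 + 1/e), about 0.73.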

June 30, 2021

Microsoft’s CausalCity

Filed under: Uncategorized — rrtucci @ 4:03 pm

Microsoft has just released the first public version of CausalCity
Reminds me of CausalWorld

Both try to combine Reinforcement Learning and Causal Inference. 

June 28, 2021

DAGs versus Bayesian Networks, You say tomato, I say tomato

Filed under: Uncategorized — rrtucci @ 8:49 pm

tha-gravy

Pass da Gravy!

A Bayesian Network is a DAG + probability tables. One can easily compute the probability tables from DAG + Dataset. Therefore,

You say DAG+Dataset, I say Bayesian Network.
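Here is a minimal sketch of the step "DAG + Dataset gives probability tables", using simple frequency counts. The two-node DAG D–>Y and the eight data rows are made up for illustration, and real software would add smoothing for parent configurations that never appear in the data.

```python
from collections import Counter

def estimate_cpts(parents, rows):
    """Estimate P(node | parents) for every node by counting.

    parents: dict mapping each node name to a tuple of its parent names.
    rows:    list of dicts, one per individual, mapping node name to value.
    Returns: dict node -> {(parent_values, node_value): probability}.
    """
    cpts = {}
    for node, pa in parents.items():
        joint = Counter()     # counts of (parent values, node value)
        marginal = Counter()  # counts of parent values alone
        for row in rows:
            pa_vals = tuple(row[p] for p in pa)
            joint[(pa_vals, row[node])] += 1
            marginal[pa_vals] += 1
        cpts[node] = {
            key: count / marginal[key[0]] for key, count in joint.items()
        }
    return cpts

# Hypothetical DAG: D (drug given) --> Y (patient recovered).
dag = {"D": (), "Y": ("D",)}

# Hypothetical dataset: 4 untreated and 4 treated patients.
data = (
    [{"D": 0, "Y": 0}] * 3 + [{"D": 0, "Y": 1}] * 1
    + [{"D": 1, "Y": 0}] * 1 + [{"D": 1, "Y": 1}] * 3
)

cpts = estimate_cpts(dag, data)
print(cpts["D"])
print(cpts["Y"])
```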

The use of the terms “causal model” and “DAG”, as an alternative to the term “Bayesian Network”, seems to have become more popular in the last decade among economists, AI researchers and even Judea Pearl himself. It seems some people think “causal models” and “DAGs” are revolutionary, whereas Bayesian Networks are a concept that was tried 25 years ago and has been replaced since then by stuff that works better. But any time you have a Dataset, which is almost always true in practice in Economics and AI, a DAG implies a Bayesian Network and vice versa.

Bayesian Networks are a graphical representation of the chain rule for conditional probabilities. Just like Calculus, they will never go out of fashion. They are not a “heuristic algorithm” like XGBoost or Neural Nets. They are a very simple, intuitive, basic and general definition. I would say that the definition of a Bayesian Network is as important to Probability Theory as the definition of a Group is to Abstract Algebra. Groups are never going to go out of fashion and neither are B nets.

An Artificial Neural Net can be defined as a Bayesian Network with a layered structure, and such that all its nodes are deterministic(1). A decision tree is not exactly a Bayesian Network, but it can be trivially replaced by an equivalent B net that has the same tree structure (for more details about this equivalence, see the chapter on decision trees in my book Bayesuvius). In fact, as I show in my book Bayesuvius, most methods in AI can be understood in terms of B nets, just like many theorems in Abstract Algebra can be understood in terms of groups.
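For readers who want to see the chain rule written out, here it is in LaTeX. The notation pa(x_i), for the values of the parents of node i in the DAG, is my choice for this sketch. The first line holds for any joint distribution and any ordering of the variables; the second line is what a bnet asserts once the nodes are listed in an order compatible with the arrows.

```latex
P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})

P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa(x_i))
```

For example, for the 3 node bnet A–>B–>C, this reads P(a,b,c) = P(a) P(b|a) P(c|b).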

(1) NNs are DAGs, but they contain a lot of extra, spurious nodes with no causal motivation. So I like to say that NNs are acausal DAGs.

June 22, 2021

Andrew Ng blames the Data

Filed under: Uncategorized — rrtucci @ 6:23 pm
andrew-ng-titanic
Check out this fascinating article from Forbes:
Andrew Ng Launches A Campaign For Data-Centric AI (by Gil Press, Forbes)

excerpts:

Data is eating the world so Andrew Ng wants to make sure we radically improve its quality. “Data is food for AI,” says Ng, and he is launching a campaign to shift the focus of AI practitioners from model/algorithm development to the quality of the data they use to train the models.

In the dominant model-centric approach to AI, according to Ng, you collect all the data you can collect and develop a model good enough to deal with the noise in the data. The established process calls for holding the data fixed and iteratively improving the model until the desired results are achieved. In the nascent data-centric approach to AI, “consistency of data is paramount,” says Ng. To get to the right results, you hold the model or code fixed and iteratively improve the quality of the data.
“The model and the code for many applications are basically a solved problem,” says Ng.
“A data-centric approach,” says Ng, “allows people in manufacturing, hospitals, farms, to customize the data, making it more feasible for someone without technical training in AI to feed it into an open-source model.” That will help uncover many new opportunities for AI to make an impact in traditional environments with small data sets and no AI expertise. “What I see across the world is lots of these 1 to 5 million dollars projects that aren’t been worked on,” says Ng.

I think Andrew Ng is missing the big picture. Improving the data with human curation is not how you deal with the fragility of artificial NNs. That is a shortsighted band-aid fix. Human curated data does not scale well and is prone to selection bias. Plus it’s very time consuming and expensive to collect, so companies which collect it almost always make it proprietary instead of open source. When Yahoo’s original human curated hierarchical list was failing to scale up, I’m sure some people blamed faulty data. But improving the data did not fix all the problems or even the worst problems. What did fix them was to overhaul the algorithm itself, from a human curated hierarchical list to a modern search engine.

Contrary to what Ng asserts, I don’t think models are “a solved problem”.

A huge problem with artificial NNs is that they are agnostic about a causal model and that makes them fragile; i.e.,  small quirks and imperfections in the data can cause the behavior of an artificial NN to change dramatically. Nature encountered this problem at the dawn of life, and found a solution to it. The human brain clearly uses causal models. https://qbnets.wordpress.com/2021/05/09/right-brain-dag-modeling-left-brain-curve-fitting/
