silicon valley stories: April 2020

Thursday, April 30, 2020

Book Summary : Antifragility by Nassim Nicolas Taleb

Antifragile by Nassim Nicholas Taleb

Things that can gain from disorder

Between Democles and Hydra

Story of the "Sword of Damocles" - with great fortune and power also comes great danger. You cannot rise and rule, without facing continuous danger. Someone will be working towards toppling you. Like the sword, danger will be silent, inexorable and discontinuous. It will fall abruptly after long periods of quiet , perhaps at the very moment, one has gotten used to it and forgotten about its existence. Black swans will be out there to get you as you now have much more to lose, a cost of success.
Hydra - a serpent like creature with numerous heads. Each time a head is cut off, two grow back up. Hydra represents antifragility.
apophatic - what cannot be explicitly said or directly be described in our current vocabulary

Overcompensation and Over-reaction everywhere

The excess energy released from over-reaction to setbacks is what innovates
Sophistication is born out of hunger
How to win a horse race : It is said that the best horses lose when they compete with the slower ones, and win against better rivals. Undercompensation from the absence of a stressor, inverse hormesis, absence of challenge, degrades the best of the best.
It is a well known trick that if you need to get something urgently done, give the task to the busiest person in office. Most humans manage to squander their free time as free time makes them dysfunctional, lazy and unmotivated. The busier they get, the more active they are at other tasks.
Mechanism of overcompensation makes us concentrate better in the modicum of background noise
Redundancy is ambiguous because it feels like a waste if nothing unusual happens. But something unusual happens usually.
A system that overcompensates is necessarily in overshooting mode, building extra capacity and strength in anticipation of the worst outcome and in response to information about the possibility of a hazard.
Lucretis problem - a fool who believes that the tallest mountain the world will be equal to the tallest one he has observed. Analysts take the worst historical recession, the worst war, the worst historical move in interest rates, or the worst point in unemployment as the exact estimate of the worst future outcome.
Fukushima nuclear reactor which experience catastrophic failure in 2011 during the tsunami was built to withstand the worst past historical earthquake. Alan greenspan during his apology to the congress said "It never happened before". Assuming worst harm is possible.
Books and ideas are antifragile and get a lot of nourishment from attacks
Some jobs and professions are fragile to reputational harm, something that in the age of the internet cannot be controlled - these jobs arent worth having. You do want to control your reputation, you wont be able to do it by controlling information flow. Focus on altering your exposure, put yourself in a situation to benefit from antifragility of information. - He demonstrates that the authors profession benefits from the antifragility of information.
With a few exceptions, those who dress outrageously are robust or even antifragile in reputation. Those who dress in suits are fragile to information about them.

The cat and the washing machine

Causal opacity : In complex systems, it is hard to see the arrow from cause to consequence, making much of the conventional method of analysis, in addition to standard logic inapplicable.

What kills me makes others stronger

Antifragility for one is fragility for someone else. Fail for others to succeed, one day you might get a thank you note.
The fragility of every startup is necessary for the economy to be antifragile, and thats what makes entrepreneurship work : fragility of the individual entrepreneurs and their necessarily high failure rate.
Individual stocks may be fragile and that is what makes index funds antifragile
There is tension between nature and individual units. Organisms need to die for nature to be antifragile - nature is opportunistic, ruthless and selfish and takes advantage of stressors, randomness, uncertainty and disorder.
Systems subject to randomness - and unpredictability build a mechanism beyond the robust to opportunistically reinvent themselves each generation, with a continuous change of population and species.
Evolution benefits from randomness in two ways : randomness in mutations and randomness in the environment - both act in similar ways to change the traits of next generations
Nature is antifragile upto a point - if a calamity kills life on earth, completely, the fittest will not survive.
Someone who has made several errors, though not the same error twice, is more reliable than someone who has not made any
For the economy to be antifragile and to undergo evolution, every single individual business must necessarily be fragile.
Economy as a collective wants them to not survive, rather to take a lot of imprudent risks themselves and be blinded by the odds. Their respective industries improve from failure to failure. The want local overconfidence and not global overconfidence. Their failure should not impact others.
Government bailout is a form of transferring fragility from the collective to the unfit.
Nietzsche's "what doesnt kill me makes me stronger" -> "what did not kill me did not make me stronger, but spared me because I am stronger than others; but it killed others and now the average population is stronger because the weak are gone" -> this is transfer of antifragility from the individual to the system
Nature wants the aggregate to survive, not every species
Every species wants the organisms to be fragile so that evolutionary selection can take place
Heroism and the respect it commands, is a form of compensation by society for those who take risks for others.

Modernity and the denial of antifragility

Other recommended books by the same author

Wednesday, April 29, 2020

Timeline of 2008 - Great Financial Crisis

Investors are having a hard time trying to put the stock market and the economic performance of 2020 into perspective. Hence, I dug up the timeline of the 2008 financial crisis, with stock market performance at key points to help put things into perspective.

There were early signs that the housing industry had something brewing in 2007. Here is a rough timeline of events in 2007.

Feb 2007 : Homesales peaked
Feb 2007 : Greenspan warns of recession, but FED ignores it
March 2007 : The subprime lending industry is hammered and hedge funds and investment banks feel the pain
March 2007 : Stocks plummet on subprime woes, Dow plummets 242 points
June 2007 : Homesales forecast revised down
Aug 2007: Fed lowers rate to 4.75%
Oct 2007 : Homesales prices further tumbled in October
Nov 2007 : Treasury creates 75billion superfund to help institutions having exposure to Mortgage backed securities
Dec 2007 : Lowering the Fed rate wasnt enough. Banks were too afraid to lend. Fed announces TAF, no bank wanted to get caught with bad debt on their books
Dec 2007 : Foreclosure rates double, 75% more foreclosures in 2007.
Dec 2007 : By this time Fed lowered rates two more times and rate was 4.25%

All the above commentary, as scary as it looks, are all from 2007. There were clear warnings that the crisis was cooking, but no clear measures were taken.

S&P500 was at 1410 on Jan 12, 2007. The peak was at 1552 on July 13, 2007. It was at 1411 on Jan 4, 2008, effectively closing 2007 flat. It was at 1425 on May 16, 2008. The 2008 story follows to put the May 16, 2008 price into perspective.

January 2008 - Feb lowered rate to 3.5%, 57% more foreclosures than 12 months earlier, January existing home sales fell to its lowest level in 10 years(slipped 23.4%)
Feb 2008 : Economic stimulus act of 2008 worth $152 billion, Feb home sales fell 24%, foreclosures up 60%
Mar 2008 : Fed trying to buy time
Apr - June 2008 : Fed lowers rates and buys more toxic debt
July 2008 : IndyMac bank fails, talks of Fannie Mae and Freddy Mac bailout as these two hold more than half of 12 trillion US mortgages, Congress approves housing and recovery act
Sep 2008 : Fannie and Freddie are nationalized, Lehman brothers bankruptcy trigger global panic. This was the point when the world realized that an unregulated industry like investment banking couldnt function without government intervention, money market funds broke the buck, Fed insures money market accounts, Washington mutual bank goes bankrupt
Oct 2008 : Emergency economic stabilization act of 700 billion, 1.7 trillion commercial loans program, stock markets sell off despite co-ordinated central bank action
December 2008 : TARP, Big 3 bailout, zero interest rates

More detailed overview of the 2008 timeline available here

S&P500 was at 1292 on August 12, 2008, 1255 on Sep 19 and reached 899 on Oct 10, 2008. Market bottom was 756 on Mar 13, which was still 5 months away.

In February 2009, Congress came with another 787billion package called the American recovery and reinvestment act. The Dow dropped to its lowest in March and unemployment reached its peak 10% in Oct 2009. As you can see, by the time the worst unemployment numbers came in, stock market was already of the lows.

Friday, April 10, 2020

Personalization & Recommender Systems Papers - Deep Learning

Deep Learning Recommendation Model for personalization and recommendation systems - Facebook

This paper describes how to use both dense and sparse features
Open source pytorch and caffe2 implementations

Neural Collaborative Filtering - NCF - Google

Implementation of NCF

Wide and deep learning for recommender systems - Google
Deep Interest Evolution Network for Click through rate prediction - Alibaba
Semantic Product Search from Amazon
PyTorch-BigGraph - a large scale Graph Embedding System
Recommending what video to watch next - a multi task ranking system - Youtube
Sampling bias corrected neural modelling for large corpus item recommendations - Youtube
Wavenet - a generative model for raw audio - Google, Deepmind
Dynamic routing between capsules
Read about Facebook's novel CNN approach for language translation that achieves state-of-the-art accuracy at nine times the speed of RNN models.
Explainable Deep Learning - a field guide for the uninitiated

Thursday, April 9, 2020

Pytorch Kaggle Kernels

"I hear, and I forget. I see, and I remember. I do, and I understand." - Chinese proverb

This post contains a series of notebook from kaggle competitions that I encourage ML enthusiasts to read and implement. Like the Chinese proverb states above, implementing some of these kernels and getting hands dirty are key to understanding some of the key principles of neural networks.

Learning Computer Vision with PyTorch

LeNet-5

Andrew Ng explainer

AlexNet
VGG16
Resnet

Andrew Ng explainer

Network in networks
Inception Network
Transfer Learning
Object Detection
Object Localization
Landmark Detection
YOLO algorithm
Face Recognition
Face Verification
Style Transfer
Capsule Networks

I will populate this post with videos, notes and implementations of the above.

Wednesday, April 8, 2020

Step by step guide to become a Machine Learning Expert during Coronavirus Lockdown

This is a step by step guide to become a deep learning expert during the Coronavirus Lockdown. There is lot of doom and gloom outside and we are all stuck at home. I wish best of health to all my readers. On the positive side, we can use this time to gain expertise in any field of choice. The goal of this blog post is to show you how to gain in depth ML Expertise. I am also personally following this plan to brush myself up. My stretch goal is also to explore other areas, especially biotechnology and how AI could enable the development of the field for the next 10 years. However, details about that would be the content of another post. This post will primarily focus on Deep Learning and will cater to anyone who has some experience in programming and wants to get into the field.

Before we get into the learning plan, here is a motivation video of how the first 20 hours of learning goes

Step 0 : Learn the classical ML course by Andrew Ng

This is the recommended course for anyone who wants to learn ML. This doesnt need you to know calculus and Andrew Ng's style will make the learning fun and intuitive

Step 1 : Learn the basics of deep learning from Andrew Ng

Take the Deeplearning.ai course. Here are the video lectures from youtube with course notes links. Classic Andrew Ng style, will help you build intuition and understanding of neural networks and various complex concepts.

Neural Networks and Deep Learning 👉April 7th
Improving deep neural networks : hyperparameter tuning, regularization and optimization 👉April 8th

Notes for this course available here

Step 2 : Learn PyTorch

I would recommend this after you have understood the Andrew Ng lectures. PyTorch was developed at Facebook and is a powerful tool for deep learning practitioners.

Step 3 : Learn from Jeremy Howard of fast ai

This is a very practical course for all engineers who want to get into deep learning.

Step 4 : Participate and practice in Kaggle

If you have reached here, you have gained great knowledge already. Time to put things into practice and help the world during this coronavirus pandemic. Take part in these kaggle competitions and many more

Step 5 : Miscellaneous reading list to supplement/continue learning during/post lockdown

Improving deep neural networks : hyperparameter tuning, regularization and optimization - Andrew Ng course

Bias - Variance

High Bias - doesnt do well on training set. Fixes are

Bigger network
Training Longer
More complicated NN architecture

High Variance - doesnt do well on test set, does well on training set. Performance doesnt generalize very well

More data
Data augmentation
Regularization

Classical bias/variance tradeoff doesnt exist

Getting a bigger network almost always reduces your bias without hurting your variance
Getting more data almost always reduces variance

Regularization fixes high variance

L2 norm - With a high lambda, a lot of the hidden units will get zeroed out making the NN behave more like simpler LR
Inverted Dropout

Inputs are randomly eliminated in dropout, so cant rely on one feature. Need to spread out the weights
Different keep_probabilities can be used for different layers. Lower keep_prob where there are more inputs will mean more regularization
Since (1-keep_prob) percent of the units are missing, the inversion step ensures that the expected values are still the same
Dropout was started in the CV field which has a lot of inputs. It doesnt mean that dropouts should be used everywhere blindly

Early stopping

Speed up training

Data normalization - if all your features are on a similar scale it will help your learning algo to converge faster
Vanishing/Exploding gradients - if all your features become too small, gradient descent will take a long time
Weight Initialization

Xavier Initialization
This could also be used as a hyperparameter that will help you train your NN much more quickly

Optimization Algorithms

They enable you to train your algorithms much faster

Mini-batch gradient descent

Gradient descent reads the entire training set and then processes it. What if m = 5mn or 50 mn
It would be much better if the processing could start earlier. Mini-batches = baby training sets having 1000 examples each
If mini batch size = m, we have batch gradient descent
If mini batch size = 1, we have stochastic gradient descent. This is very noisy

Exponentially weighted moving averages

Gradient Descent with momentum

Normal GD will take lot of steps and slowly oscillate towards the minimum
On the vertical axis, the learning rate to be slower as the positive and negative iterations will average out
On the horizontal axis, the learning rate to be faster as it gains momentum

RMS Prop and Adam generalizes well across several architectures. Most algos cant better gradient descent
RMS Prop
Adam Optimization Algorithms

It takes momentum and RMS Prop and putting them together

Learning rate decay

In the beginning iterations, you can take larger steps and as you approach the minima you can start taking smaller steps
Otherwise, your algorithm may not really converge and can keep wondering around the minimum

Hyperparameters

Listed in order of importance

Learning rate (Alpha)
Momentum (Beta)
Number of hidden units
mini-batch size
Number of layers
Learning rate decay
Beta1, Beta2, Epsilon - adam algorithm

Tuning Process

In classical ML, grid search was used to choose hyperparams. In DL, it is hard to know in advance which of the hyperparams are most effective for your problem. Hence you should choose the hyperparams in random and not use grid search. For example, if you have 2 hyperparams, you may train 25 models but you are just trying 5 different values of the most important hyper param if you use grid search
Coarse to fine sampling scheme
Use the learning rate on the log scale and then randomly sample from the log scale

silicon valley stories

Pages