Thursday, April 30, 2020

Book Summary: Antifragile by Nassim Nicholas Taleb


Antifragile by Nassim Nicholas Taleb

Things That Gain from Disorder

Between Damocles and Hydra

  1. Story of the "Sword of Damocles" - with great fortune and power comes great danger. You cannot rise and rule without facing continuous danger; someone will always be working towards toppling you. Like the sword, the danger will be silent, inexorable and discontinuous. It will fall abruptly after long periods of quiet, perhaps at the very moment one has gotten used to it and forgotten about its existence. Black swans will be out there to get you, as you now have much more to lose - a cost of success.
  2. Hydra - a serpent-like creature with numerous heads. Each time a head is cut off, two grow back. The Hydra represents antifragility.
  3. Apophatic - what cannot be explicitly said or directly described in our current vocabulary

Overcompensation and Over-reaction everywhere

  1. The excess energy released from over-reaction to setbacks is what innovates
  2. Sophistication is born out of hunger
  3. How to win a horse race : It is said that the best horses lose when they compete with the slower ones, and win against better rivals. Undercompensation from the absence of a stressor, inverse hormesis, absence of challenge, degrades the best of the best.
  4. It is a well-known trick that if you need to get something done urgently, give the task to the busiest person in the office. Most humans manage to squander their free time, as free time makes them dysfunctional, lazy and unmotivated. The busier they get, the more active they become at other tasks.
  5. The mechanism of overcompensation makes us concentrate better with a modicum of background noise
  6. Redundancy is ambiguous because it feels like a waste if nothing unusual happens. But something unusual happens usually. 
  7. A system that overcompensates is necessarily in overshooting mode, building extra capacity and strength in anticipation of the worst outcome and in response to information about the possibility of a hazard.
  8. The Lucretius problem - a fool believes that the tallest mountain in the world will be equal to the tallest one he has observed. Analysts take the worst historical recession, the worst war, the worst historical move in interest rates, or the worst point in unemployment as the exact estimate of the worst future outcome. 
  9. The Fukushima nuclear reactor, which experienced catastrophic failure during the 2011 tsunami, was built to withstand the worst past historical earthquake; Alan Greenspan, in his apology to Congress, fell back on "it never happened before". Instead, assume that harm worse than anything seen before is possible. 
  10. Books and ideas are antifragile and get a lot of nourishment from attacks
  11. Some jobs and professions are fragile to reputational harm, something that in the age of the internet cannot be controlled - these jobs aren't worth having. If you want to control your reputation, you won't be able to do it by controlling information flow; instead, focus on altering your exposure and put yourself in a situation where you benefit from the antifragility of information. Taleb notes that an author's profession benefits from the antifragility of information. 
  12. With a few exceptions, those who dress outrageously are robust or even antifragile in reputation. Those who dress in suits are fragile to information about them. 

The cat and the washing machine

  1. Causal opacity: in complex systems it is hard to see the arrow from cause to consequence, which makes much of conventional analysis, along with standard logic, inapplicable. 

What kills me makes others stronger

  1. Antifragility for one is fragility for someone else. Some fail so that others can succeed; one day you might get a thank-you note.
  2. The fragility of every startup is necessary for the economy to be antifragile, and that's what makes entrepreneurship work: the fragility of individual entrepreneurs and their necessarily high failure rate. 
  3. Individual stocks may be fragile and that is what makes index funds antifragile
  4. There is tension between nature and individual units. Organisms need to die for nature to be antifragile - nature is opportunistic, ruthless and selfish and takes advantage of stressors, randomness, uncertainty and disorder. 
  5. Systems subject to randomness and unpredictability build a mechanism beyond the robust to opportunistically reinvent themselves each generation, with a continuous change of population and species. 
  6. Evolution benefits from randomness in two ways : randomness in mutations and randomness in the environment - both act in similar ways to change the traits of next generations
  7. Nature is antifragile up to a point - if a calamity completely kills life on earth, the fittest will not survive. 
  8. Someone who has made several errors, though not the same error twice, is more reliable than someone who has not made any
  9. For the economy to be antifragile and to undergo evolution, every single individual business must necessarily be fragile. 
  10. The economy as a collective wants individual businesses not to survive, but rather to take a lot of imprudent risks and be blinded by the odds; their respective industries improve from failure to failure. It wants local overconfidence, not global overconfidence, and individual failures should not impact others. 
  11. A government bailout is a form of transferring fragility from the unfit to the collective.
  12. Nietzsche's "what doesn't kill me makes me stronger" -> "what did not kill me did not make me stronger, but spared me because I am stronger than others; but it killed others, and now the average population is stronger because the weak are gone" -> this is a transfer of antifragility from the individual to the system
  13. Nature wants the aggregate to survive, not every species
  14. Every species wants the organisms to be fragile so that evolutionary selection can take place
  15. Heroism, and the respect it commands, is a form of compensation by society for those who take risks for others. 

Modernity and the denial of antifragility 




Other recommended books by the same author

Wednesday, April 29, 2020

Timeline of 2008 - Great Financial Crisis

Investors are having a hard time putting the stock market and the economic performance of 2020 into perspective. Hence, I dug up the timeline of the 2008 financial crisis, with stock market levels at key points, to help frame the current situation.

There were early signs in 2007 that something was brewing in the housing industry. Here is a rough timeline of events in 2007.
All of the above commentary, as scary as it looks, is from 2007. There were clear warnings that a crisis was building, but no clear measures were taken. 

The S&P 500 was at 1410 on Jan 12, 2007. The peak was 1552 on July 13, 2007. It was at 1411 on Jan 4, 2008, effectively closing 2007 flat, and at 1425 on May 16, 2008. The 2008 story that follows puts the May 16, 2008 price into perspective. 


More detailed overview of the 2008 timeline available here

The S&P 500 was at 1292 on August 12, 2008, 1255 on Sep 19, and reached 899 on Oct 10, 2008. The market bottom of 756 on Mar 13, 2009 was still 5 months away.

In February 2009, Congress came up with another $787 billion package called the American Recovery and Reinvestment Act. The Dow dropped to its lowest level in March, and unemployment reached its peak of 10% in October 2009. As you can see, by the time the worst unemployment numbers came in, the stock market was already off its lows.



Thursday, April 9, 2020

Pytorch Kaggle Kernels

"I hear, and I forget. I see, and I remember. I do, and I understand." - Chinese proverb

This post contains a series of notebooks from Kaggle competitions that I encourage ML enthusiasts to read and implement. As the Chinese proverb above states, implementing some of these kernels and getting your hands dirty is key to understanding the core principles of neural networks.

Learning Computer Vision with PyTorch


I will populate this post with videos, notes and implementations of the above. 

Wednesday, April 8, 2020

Step by step guide to become a Machine Learning Expert during Coronavirus Lockdown

This is a step-by-step guide to becoming a deep learning expert during the coronavirus lockdown. There is a lot of doom and gloom outside and we are all stuck at home; I wish the best of health to all my readers. On the positive side, we can use this time to gain expertise in a field of our choice. The goal of this blog post is to show you how to gain in-depth ML expertise, and I am personally following this plan to brush up. My stretch goal is to explore other areas, especially biotechnology and how AI could enable the development of that field over the next 10 years, but the details of that will be the content of another post. This post focuses primarily on deep learning and caters to anyone who has some programming experience and wants to get into the field.

Before we get into the learning plan, here is a motivational video on how the first 20 hours of learning go.

Step 0 : Learn the classical ML course by Andrew Ng

This is the recommended course for anyone who wants to learn ML. It doesn't require you to know calculus, and Andrew Ng's style makes the learning fun and intuitive.

Step 1 : Learn the basics of deep learning from Andrew Ng

Take the deeplearning.ai course. Here are the video lectures from YouTube, with links to the course notes. Classic Andrew Ng style; it will help you build intuition and an understanding of neural networks and various complex concepts.
  1. Neural Networks and Deep Learning 👉 April 7th
  2. Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization 👉 April 8th
  3. Structuring Machine Learning Projects
  4. Convolutional Neural Networks 👉 April 9th
  5. Sequence Models 👉 April 10-13th

Step 2 : Learn PyTorch

I would recommend this after you have understood the Andrew Ng lectures. PyTorch was developed at Facebook and is a powerful tool for deep learning practitioners. 
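
To get a feel for the framework, here is a minimal sketch (my own illustration, not taken from any particular course) of defining and training a tiny network on made-up data:

import torch
import torch.nn as nn

# A tiny fully connected network trained on random data, just to show the workflow.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

X = torch.randn(64, 10)   # 64 made-up examples with 10 features each
y = torch.randn(64, 1)    # made-up targets

for epoch in range(100):
    optimizer.zero_grad()           # clear gradients from the previous step
    loss = loss_fn(model(X), y)     # forward pass and loss
    loss.backward()                 # backpropagation
    optimizer.step()                # gradient descent update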

Step 3 : Learn from Jeremy Howard of fast ai

This is a very practical course for all engineers who want to get into deep learning. 

Step 4 : Participate and practice in Kaggle

If you have reached this point, you have already gained a great deal of knowledge. Time to put things into practice and help the world during this coronavirus pandemic. Take part in these Kaggle competitions and many more.

Step 5 : Miscellaneous reading list to supplement/continue learning during/post lockdown

Improving deep neural networks : hyperparameter tuning, regularization and optimization - Andrew Ng course

Bias - Variance

      • High Bias - doesn't do well on the training set. Fixes are:
        • Bigger network
        • Training Longer
        • More complicated NN architecture
      • High Variance - does well on the training set but not on the dev/test set; performance doesn't generalize well (a small diagnostic sketch follows this list). Fixes are:
        • More data 
        • Data augmentation
        • Regularization
      • The classical bias/variance tradeoff is much less of a constraint in deep learning
        • Getting a bigger network almost always reduces your bias without hurting your variance (as long as you regularize)
        • Getting more data almost always reduces variance without hurting bias
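
A small sketch of that diagnostic logic (the function, error values and thresholds are made up purely for illustration; in practice the errors come from evaluating on your training and dev sets):

def diagnose(train_error, dev_error, bayes_error=0.0):
    # Rough bias/variance diagnostic; the 0.05 thresholds are arbitrary.
    bias = train_error - bayes_error      # gap to the best achievable error
    variance = dev_error - train_error    # gap between training and dev performance
    if bias > 0.05:
        print("High bias: try a bigger network or training longer.")
    if variance > 0.05:
        print("High variance: try more data, augmentation, or regularization.")

diagnose(train_error=0.15, dev_error=0.16)   # mostly a bias problem
diagnose(train_error=0.01, dev_error=0.11)   # mostly a variance problem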

Regularization fixes high variance

      • L2 norm - with a high lambda, the weights are pushed close to zero, roughly zeroing out the impact of many hidden units and making the NN behave more like a simpler, almost-linear model (closer to logistic regression)
      • Inverted Dropout
        • Units are randomly eliminated by dropout, so the network can't rely on any one feature and has to spread out its weights
        • Different keep_prob values can be used for different layers; a lower keep_prob on larger layers means more regularization
        • Since a (1 - keep_prob) fraction of the units is missing, the inversion step (dividing by keep_prob) keeps the expected value of the activations the same (see the numpy sketch after this list)
        • Dropout became popular in computer vision, where there are a lot of inputs; that doesn't mean it should be used everywhere blindly
      • Early stopping
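
A minimal numpy sketch of inverted dropout applied to one layer's activations (the layer size and keep_prob value are illustrative):

import numpy as np

keep_prob = 0.8                            # probability of keeping a unit
a = np.random.randn(50, 64)                # activations of one layer: 50 units, 64 examples

# Inverted dropout: randomly zero out units, then scale up the survivors
# so the expected value of the activations stays the same.
d = np.random.rand(*a.shape) < keep_prob   # dropout mask
a = a * d                                  # eliminate roughly (1 - keep_prob) of the units
a = a / keep_prob                          # the "inversion" step
# At test time, no dropout is applied and no scaling is needed.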

Speed up training

      • Data normalization - if all your features are on a similar scale, gradient descent converges faster
      • Vanishing/exploding gradients - in very deep networks, activations and gradients can shrink or grow exponentially with depth, which makes training very slow
      • Weight Initialization 
        • Xavier initialization scales the initial weights by the number of inputs to each layer, keeping activations from vanishing or exploding
        • The initialization scheme can also be treated as a hyperparameter; a good choice helps you train your NN much more quickly (see the sketch below)
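
A short numpy sketch of both ideas (the data and layer sizes are made up; the scaling factors are the Xavier and He variants mentioned in the course):

import numpy as np

# Feature normalization: zero mean and unit variance per feature, so gradient
# descent converges faster on a more symmetric cost surface.
X = np.random.rand(1000, 20) * 100                    # made-up data on very different scales
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Variance-scaled weight initialization: scale by the number of inputs to the
# layer so activations neither vanish nor explode as depth grows.
n_in, n_out = 20, 50                                  # illustrative layer sizes
W_xavier = np.random.randn(n_out, n_in) * np.sqrt(1.0 / n_in)   # Xavier (tanh)
W_he     = np.random.randn(n_out, n_in) * np.sqrt(2.0 / n_in)   # He (ReLU)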

Optimization Algorithms

These enable you to train your networks much faster.
  • Mini-batch gradient descent
    • Batch gradient descent processes the entire training set before taking a single step. What if m = 5 million or 50 million?
    • It would be much better if progress could start earlier. Mini-batches are "baby" training sets of, say, 1,000 examples each
    • If mini-batch size = m, we have batch gradient descent
    • If mini-batch size = 1, we have stochastic gradient descent, which is very noisy
  • Exponentially weighted moving averages
    • Gradient Descent with momentum
      • Plain GD takes a lot of steps and slowly oscillates toward the minimum
      • On the vertical axis, learning slows down because the positive and negative oscillations average out
      • On the horizontal axis, learning speeds up as the updates gain momentum
    • RMSProp and Adam generalize well across many architectures; most other optimization algorithms fail to consistently beat gradient descent
    • RMSProp
    • Adam optimization algorithm
      • It combines momentum and RMSProp (a sketch of the update follows this list)
    • Learning rate decay
      • In the beginning iterations you can take larger steps, and as you approach the minimum you can start taking smaller steps
      • Otherwise, your algorithm may not really converge and can keep wandering around the minimum
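
A minimal numpy sketch of the update rules above for a single parameter matrix (my own illustration; the default hyperparameter values are the common ones from the course, and dW is assumed to be a mini-batch gradient):

import numpy as np

def adam_update(W, dW, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam step: momentum (v) plus RMSProp (s), with bias correction.
    v = beta1 * v + (1 - beta1) * dW              # exponentially weighted average of gradients
    s = beta2 * s + (1 - beta2) * dW ** 2         # exponentially weighted average of squared gradients
    v_hat = v / (1 - beta1 ** t)                  # bias correction for early iterations t = 1, 2, ...
    s_hat = s / (1 - beta2 ** t)
    W = W - lr * v_hat / (np.sqrt(s_hat) + eps)   # parameter update
    return W, v, s

def decayed_lr(lr0, epoch, decay_rate=1.0):
    # Learning rate decay: larger steps early on, smaller steps near the minimum.
    return lr0 / (1 + decay_rate * epoch)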

Hyperparameters

Listed in order of importance
  • Learning rate (Alpha)
  • Momentum (Beta)
  • Number of hidden units
  • mini-batch size
  • Number of layers
  • Learning rate decay
  • Beta1, Beta2, Epsilon - Adam algorithm

Tuning Process

  • In classical ML, grid search was used to choose hyperparameters. In DL, it is hard to know in advance which hyperparameters matter most for your problem, so sample them at random rather than on a grid. For example, with 2 hyperparameters and a budget of 25 models, grid search only tries 5 distinct values of the most important hyperparameter, while random search tries 25.
  • Coarse to fine sampling scheme
  • Sample the learning rate uniformly on a log scale rather than a linear scale (see the sketch below)
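
A minimal sketch of sampling the learning rate on a log scale (the range and count are illustrative):

import numpy as np

# Sample learning rates uniformly on a log scale between 1e-4 and 1, so each
# order of magnitude gets roughly equal attention.
r = np.random.uniform(-4, 0, size=10)   # random exponents in [-4, 0]
learning_rates = 10 ** r
print(np.sort(learning_rates))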

Books I am reading