Wednesday, June 24, 2020

The goose that laid Golden Eggs - H1B ban

All of us have read Aesop's fables, including the popular one: the goose that laid the golden eggs. You don't want to kill that goose, do you? Well, it would be extremely short-sighted to.

Silicon Valley and tech stocks have been the golden goose for the S&P 500 and the American economy. America won in the last three decades because it won in technology. American stocks outperformed international equities because of the dominance of the top tech companies. The list of illustrious H1B alumni is long: Sundar Pichai (CEO, Google), Satya Nadella (CEO, Microsoft), Elon Musk (Founder and CEO, Tesla), and many more. By welcoming immigrants with open arms, America has always been the top destination for top talent worldwide. The H1B ban sends the wrong message to future talent, and it sets a precedent of sending top trained professionals from Silicon Valley back to their home countries like India and China. One country's loss is another country's gain.

I enjoyed my 10 years in the Valley and am happy to take my learnings and capital elsewhere. I will continue to build. Thank you for the business. God bless you, America.




Apple's top 5 cities

Ever wonder what Apple's most important markets are?

Look no further than the first five cities Apple decided to launch cycling support for in Apple Maps:
New York City, Shanghai, Los Angeles, San Francisco, and Beijing.

Notice how global powerhouse cities like London and Tokyo didn't make the cut. Neither did any city in the Netherlands, the biking capital of the world, which has more bikes than people.

You don't have to look at any financial statements to know that Apple's revenue from China is second only to America. Products speak for themselves.

Monday, June 22, 2020

H1Bs who created jobs

1. Sundar Pichai - CEO of Google

2. Satya Nadella - CEO of Microsoft

3. Andrew Ng - Co-founder of Coursera

4. Jyoti Bansal - Founder and CEO, AppDynamics

5. Clement Delangue - Founder and CEO, Hugging Face



Saturday, June 20, 2020

Amazon Principal Engineer Review

At Amazon, if a feature spans multiple systems and teams, it usually requires a review from a Principal Engineer (a PE review).

In one of the internal talks, a Principal Engineer (PE) shared a simple technique they employ to drive the PE review: repeatedly ask "why" like a five-year-old.

For example, the conversation in a PE review can go something like this:

Engineer: We're creating a system to duplicate an item in different marketplaces.
PE: Why?
Engineer: To allow customers to buy the same item in different marketplaces.
PE: Why does it have to be duplicated for that to happen?
Engineer: Because each marketplace has a separate catalog.
PE: Why is it separate?
And so on...

This technique forces the engineers to dig deeper into their own design and question the foundations on which it is built.

It is possible we're treating the symptoms of an issue instead of addressing the root cause and by repeatedly asking "why", we may expose some assumptions we've made or issues we did not consider in our initial design.

The good thing about this technique is that you don't have to be a Principal Engineer to use it. Even fresh out of college, we can ask "why" in a design meeting. If it doesn't uncover anything new, it will at least help us understand the system more deeply.

“Millions saw the apple fall, but Newton was the one who asked why.” - Bernard M. Baruch

Sunday, June 7, 2020

Bias Variance Tradeoffs - Classic

  • Getting more training examples fixes high variance but not high bias.
  • Using fewer features fixes high variance but not high bias.
  • Adding features fixes high bias but not high variance.
  • Adding polynomial and interaction features fixes high bias but not high variance.
  • When using gradient descent, decreasing lambda can fix high bias and increasing lambda can fix high variance (lambda is the regularization parameter); see the sketch after this list.
  • When using neural networks, small networks are more prone to under-fitting and big networks are more prone to over-fitting. Cross-validating over the network size is a way to choose among the alternatives.
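
To make the lambda bullet concrete, here is a minimal sketch (not from the original notes) that sweeps the regularization strength and compares training error against cross-validation error. The use of scikit-learn, a synthetic dataset, and a Ridge model are my assumptions for illustration only.

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data; in practice use your own training set.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=0)

for lam in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    cv_err = mean_squared_error(y_cv, model.predict(X_cv))
    # High variance: training error much lower than CV error (lambda too small).
    # High bias: both errors high and close together (lambda too large).
    print(f"lambda={lam:>6}: train MSE={train_err:10.1f}  cv MSE={cv_err:10.1f}")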


Error Metrics for Skewed Classes

It is sometimes difficult to tell whether a reduction in error is actually an improvement of the algorithm.
  • For example: in predicting a cancer diagnosis where 0.5% of the examples have cancer, suppose our learning algorithm has a 1% error. However, if we were to simply classify every single example as 0, then our error would drop to 0.5% even though we did not improve the algorithm.
This usually happens with skewed classes; that is, when our class is very rare in the entire data set.
Or, to say it another way, when we have a lot more examples from one class than from the other.
For this we can use Precision/Recall.
  • Predicted: 1, Actual: 1 --- True positive
  • Predicted: 0, Actual: 0 --- True negative
  • Predicted: 0, Actual: 1 --- False negative
  • Predicted: 1, Actual: 0 --- False positive
Precision: of all patients for whom we predicted y=1, what fraction actually has cancer?
\dfrac{\text{True Positives}}{\text{Total number of predicted positives}} = \dfrac{\text{True Positives}}{\text{True Positives}+\text{False positives}}
Recall: Of all the patients that actually have cancer, what fraction did we correctly detect as having cancer?
\dfrac{\text{True Positives}}{\text{Total number of actual positives}}= \dfrac{\text{True Positives}}{\text{True Positives}+\text{False negatives}}
These two metrics give us a better sense of how our classifier is doing. We want both precision and recall to be high.
In the example at the beginning of the section, if we classify all patients as 0, then our recall will be \dfrac{0}{0 + f} = 0 (where f is the number of false negatives), so despite having a lower error percentage, we can quickly see it has worse recall.
\text{Accuracy} = \dfrac{\text{True Positives} + \text{True Negatives}}{\text{Total population}}
Note 1: if an algorithm predicts only negatives, as it does in one of the exercises, precision is not defined because it would require dividing by 0; the F1 score is then not defined either.
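
As a minimal sketch of the definitions above (plain Python, with made-up counts for illustration), including the undefined-precision case from Note 1:

def precision_recall_accuracy(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision is undefined when no positives are predicted (division by zero, as in Note 1).
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    # Recall is undefined only if there are no actual positives in the data.
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return precision, recall, accuracy

# Skewed-class example: of 1000 patients, 5 (0.5%) have cancer and we predict 0 for everyone.
print(precision_recall_accuracy(tp=0, fp=0, fn=5, tn=995))
# -> (None, 0.0, 0.995): 99.5% accuracy, zero recall, undefined precision.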

Trading Off Precision and Recall

We might want a more confident prediction of the two classes when using logistic regression. One way is to increase our threshold:
  • Predict 1 if: h_\theta(x) \geq 0.7
  • Predict 0 if: h_\theta(x) < 0.7
This way, we only predict cancer if we estimate the patient has at least a 70% chance of having it.
Doing this, we will have higher precision but lower recall (refer to the definitions in the previous section).
In the opposite example, we can lower our threshold:
  • Predict 1 if: h_\theta(x) \geq 0.3
  • Predict 0 if: h_\theta(x) < 0.3
That way, we get a very safe prediction in the sense that we rarely miss an actual case of cancer. This will give higher recall but lower precision.
The greater the threshold, the greater the precision and the lower the recall.
The lower the threshold, the greater the recall and the lower the precision.
In order to turn these two metrics into one single number, we can take the F value.
One way is to take the average:
\dfrac{P+R}{2}
This does not work well. If we predict y=1 for every example, the very high recall (1.0) brings the average up even though precision is very low, so a degenerate classifier can end up with a higher average than a genuinely better one.
A better way is to compute the F Score (or F1 score):
\text{F Score} = 2\dfrac{PR}{P + R}
In order for the F Score to be large, both precision and recall must be large.
We want to choose the threshold by evaluating precision and recall on the cross-validation set, so as not to bias our test set.
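
Here is a minimal sketch of that threshold sweep (the scores, labels, and the 0.3/0.5/0.7 thresholds are made up for illustration), picking the threshold with the best F1 on a validation set:

import numpy as np

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])          # validation labels
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.90,     # h_theta(x) for each example
                   0.20, 0.60, 0.55, 0.05])

best_threshold, best_f1 = None, -1.0
for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    # Higher thresholds trade recall for precision; F1 balances the two.
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
    if f1 > best_f1:
        best_threshold, best_f1 = threshold, f1

print("Best threshold by F1 on the validation set:", best_threshold)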

