
Monday, February 20, 2023

Microsoft vs Google : Strategy wars

With the launch of ChatGPT and Microsoft CEO Satya Nadella's open declaration of war on Google, we are witnessing one of the most exciting duels in recent tech history. Let's analyze the relative positioning of the two companies and the odds of who will come out on top in this epic Satya vs. Sundar contest. Behind it lies a fascinating tale of the careers of these two Indian American executives, who rose through the ranks to head these behemoths in Redmond and Mountain View.

Declaration of war

Microsoft has been making steady progress in the deep learning space through its investment in OpenAI, the company behind DALL-E 2 and ChatGPT. While that is a regular part of innovation, and other companies like Meta have also been making steady progress in this field, Microsoft went a step further by integrating the technology into Bing and openly declaring war. Here is what Satya said:

Google is the 800-pound gorilla in the room. This new Bing will make Google come out and want to show they can dance, and I want people to know that we made them dance.

With that, let us revisit how things got here.

Google and its search monopoly

While Google is often credited for its technology, it retains its market share and search monopoly because of its product strategy. The average user has little incentive to go into settings and change the default search engine. Google has successfully gated the entry points to search: Android on mobile, Chrome on desktop, and roughly $20B paid to Apple to remain the default search option on Apple devices. No wonder Sundar Pichai (and not some engineer) became CEO of Google: he was the product lead (read: gatekeeper) for two of these products (read: gates), Chrome and Android. This virtually sealed Google's monopoly status in search.

Monopoly and culture

What happened after Google became the monopoly it is today is exactly what a monopoly needs to do to stay a monopoly: lower its profits so that it is not perceived as a monopoly. What better way to do that, while also stifling competition, than raising competitors' costs by hiring developers at premium prices, which increases the cost and reduces the supply of engineers? While the strategy was sound and paid off over the last decade, the ad recession of 2022-23 has exposed its flaws. Overhiring engineers, paying exorbitant RSUs, having no real need to deliver anything at all, and delusions of exceptionalism have led to a level of entitlement and lack of self-awareness among engineers unseen in Silicon Valley in a while. There are two high-level problems with this:

[Financial Mismanagement] As pointed out by the investors/hedge funds TCI and Altimeter:

Rapid headcount growth has led to reckless empire building: managers reporting to managers reporting to managers, bloated org structures, title inflation, and redundant levels. Basically, Wall Street investors are paying for the sushi bar in Mountain View.
The median compensation at Alphabet was 67% higher than at Microsoft and 153% higher than at the 20 largest technology companies, and there is no justification for this enormous disparity.

[Cultural Trainwreck] Google engineers lost the ability to ship because for the last 10 years they didn't really need to. As pointed out in "The Maze Is in the Mouse", Google has four cultural problems:

The way I see it, Google has four core cultural problems. They are all the natural consequences of having a money-printing machine called “Ads” that has kept growing relentlessly every year, hiding all other sins.
(1) no mission, (2) no urgency, (3) delusions of exceptionalism, (4) mismanagement.

Challenge from Microsoft: a surprise?

In the meantime, Satya Nadella has been playing 4D chess. Nadella ran Bing before he was elevated to CEO in Redmond, so Bing vs. Google is close to his heart. While Sundar has been enjoying sushi in Mountain View financed by monopoly taxes, Nadella has been plotting Bing's revival one step at a time. The key milestones were investing in OpenAI, integrating its models into Azure, and then launching Bing + Edge + ChatGPT in a bid to reinvent search.

Microsoft's CEO not only made the product announcements in Redmond but also openly declared war on Google Search with the ChatGPT + Bing integration:

“There is such margin in search, which for us is incremental. For Google it’s not, they have to defend it all,” he added, referring to the competition against Google as “asymmetric”.

Microsoft says for every point of share gain in the search advertising market, it’s a $2 billion revenue opportunity.

There are several upsides to this strategic play for Microsoft.

Microsoft Strategic Upsides

Asymmetric battle: Google has all of this to defend, while any incremental market share Microsoft wins has a huge revenue upside, as Amy Hood (Microsoft's CFO) pointed out.
Microsoft doesn't need to gain any market share at all to make Google lose. If it can change customer behavior so that people expect search results to be a mix of legacy search (10 links to click) and conversational AI through ChatGPT (say, 10-20% of queries), that alone will be a big win. Conversational AI queries won't be monetized, and this change in the query mix means Google would also have to serve unmonetized queries through Bard in order to stay competitive. Even if Google maintains its market share, this will put further pressure on its margins and thus exacerbate the financial mismanagement and cultural trainwreck issues highlighted above. This point is very important: it is not a matter of which AI is better, but of how user behavior and expectations change with this new form of search. Any deviation will hurt Google.
Here is some rough math to illustrate the point:

Google search queries: ~300k queries per second
Revenue: ~$160B in 2022, or ~1.6 cents per query
Cost: ~$20B paid to Apple, 24% services margin, roughly 1.06 cents per query
So Google has roughly 0.5 cents of margin per query (1.6 - 1.06 cents) that could go toward the inference costs of an LLM
Deploying current ChatGPT into every search done by Google would require 512,820.51 A100 HGX servers with a total of 4,102,568 A100 GPUs. The total cost of these servers and networking exceeds $100 billion of Capex alone
Essentially, roughly $30B of $GOOGL profit could evaporate overnight.
It looks like Microsoft knows how to flip a monopoly, if not beat it.
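The rough math above can be reproduced as a back-of-envelope sketch. It uses the post's own inputs (300k queries per second, $160B of 2022 revenue, 1.06 cents of cost per query); `llm_cost_per_query_usd` is a hypothetical parameter, and the 0.36-cent value passed in below is purely an illustrative assumption, not a figure from this post.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def search_unit_economics(qps, annual_revenue_usd, cost_per_query_usd,
                          llm_cost_per_query_usd):
    """Back-of-envelope per-query economics; every input is a rough estimate."""
    queries_per_year = qps * SECONDS_PER_YEAR
    revenue_per_query = annual_revenue_usd / queries_per_year
    margin_per_query = revenue_per_query - cost_per_query_usd
    # Extra yearly cost if every query also paid for one LLM inference call.
    annual_llm_cost = llm_cost_per_query_usd * queries_per_year
    return {
        "queries_per_year": queries_per_year,
        "revenue_per_query_cents": revenue_per_query * 100,
        "margin_per_query_cents": margin_per_query * 100,
        "annual_llm_cost_billion_usd": annual_llm_cost / 1e9,
    }


est = search_unit_economics(
    qps=300_000,
    annual_revenue_usd=160e9,
    cost_per_query_usd=0.0106,
    llm_cost_per_query_usd=0.0036,  # hypothetical LLM inference cost per query
)
for key, value in est.items():
    print(f"{key}: {value:,.2f}")
```

With these inputs, revenue comes out at about 1.69 cents and margin at about 0.63 cents per query (the post rounds revenue down to 1.6 cents, which gives roughly 0.5 cents), and the hypothetical LLM cost adds about $34B a year, the same order of magnitude as the ~$30B figure.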

Flipping the search monopoly also benefits Microsoft because it reduces competition for Azure: Google could no longer funnel its monopoly riches into money-losing Google Cloud investments.
What Google is facing is the classic innovator's dilemma:

how large incumbent companies lose market share by listening to their customers and providing what appears to be the highest-value products, but new companies that serve low-value customers with poorly developed technology can improve that technology incrementally until it is good enough to quickly take market share from established business.

ChatGPT is free marketing for Azure AI services, which host ChatGPT, thus increasing cloud adoption.
Satya Nadella looks like a mastermind wartime CEO disguised as a peacetime CEO.

Google Strategic Upside

Yes, you read that right: Google has an upside here too. While Google bungled its latest Bard announcement and the picture looks bleak right now, the biggest upside is that this competition from Microsoft could support Google's argument that it is not a monopoly in its latest Department of Justice lawsuit.
Google has invested in cloud hardware, and its TPUs could attract more investment in the future to compete with Nvidia GPUs. So the battle for search could be won or lost on the hardware front, which could enable significant value capture and also change the winners and losers of search.

The next year will be an interesting battleground for these two tech companies, and it will be fascinating to see how the personal histories, successes, and failures of two Indian American CEOs shape how they carve out the tech future of their companies.

Sunday, June 7, 2020

Bias Variance Tradeoffs - Classic

More training examples fix high variance but not high bias.
Fewer features fix high variance but not high bias.
Additional features fix high bias but not high variance.
Adding polynomial and interaction features fixes high bias but not high variance.
With a regularized model (for example, one trained with gradient descent), decreasing lambda can fix high bias and increasing lambda can fix high variance (lambda is the regularization parameter).
With neural networks, small networks are more prone to underfitting and big networks are more prone to overfitting. Using cross-validation to compare network sizes is a way to choose among the alternatives.
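A minimal sketch, assuming NumPy is available, of how lambda moves training and cross-validation error. It fits a deliberately high-degree polynomial regression to hypothetical toy data (a noisy sine curve) using the closed-form regularized normal equation instead of gradient descent; both minimize the same regularized cost, so lambda has the same effect. The helper names are illustrative.

```python
import numpy as np


def poly_features(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit_regularized(X, y, lam):
    """Closed-form regularized least squares; the intercept column is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def mse(X, y, w):
    """Unregularized mean squared error, used to report train/CV error."""
    return float(np.mean((X @ w - y) ** 2))


# Hypothetical toy data: a noisy sine curve split into training and CV sets.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 40)
y = np.sin(3 * x) + rng.normal(0.0, 0.2, x.size)
X_train, y_train = poly_features(x[:30], 8), y[:30]
X_cv, y_cv = poly_features(x[30:], 8), y[30:]

results = []  # (lambda, training error, cross-validation error)
for lam in [1e-4, 1e-3, 1e-2, 1e-1, 1.0, 10.0]:
    w = fit_regularized(X_train, y_train, lam)
    results.append((lam, mse(X_train, y_train, w), mse(X_cv, y_cv, w)))

for lam, train_err, cv_err in results:
    print(f"lambda={lam:g}: train={train_err:.4f}, cv={cv_err:.4f}")

best_lam = min(results, key=lambda r: r[2])[0]
print("lambda chosen by cross-validation:", best_lam)
```

Training error can only go up as lambda grows; it is the cross-validation error that shows where more regularization stops reducing variance and starts adding bias. The exact numbers depend on the random draw.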

Error Metrics for Skewed Classes

It is sometimes difficult to tell whether a reduction in error is actually an improvement of the algorithm.

For example: in predicting cancer diagnoses where 0.5% of the examples have cancer, we find that our learning algorithm has a 1% error. However, if we simply classified every single example as 0, our error would drop to 0.5% even though we did not improve the algorithm.

This usually happens with skewed classes; that is, when our class is very rare in the entire data set.

Or, to say it another way, when we have a lot more examples from one class than from the other.

For this we can use Precision/Recall.

Predicted: 1, Actual: 1 --- True positive
Predicted: 0, Actual: 0 --- True negative
Predicted: 0, Actual: 1 --- False negative
Predicted: 1, Actual: 0 --- False positive

Precision: of all patients for whom we predicted y=1, what fraction actually have cancer?

\dfrac{\text{True Positives}}{\text{Total number of predicted positives}} = \dfrac{\text{True Positives}}{\text{True Positives}+\text{False positives}}

Recall: Of all the patients that actually have cancer, what fraction did we correctly detect as having cancer?

\dfrac{\text{True Positives}}{\text{Total number of actual positives}}= \dfrac{\text{True Positives}}{\text{True Positives}+\text{False negatives}}

These two metrics give us a better sense of how our classifier is doing. We want both precision and recall to be high.

In the example at the beginning of the section, if we classify all patients as 0, then our recall will be

\dfrac{0}{0 + \text{False negatives}} = 0

so despite having a lower error percentage, we can quickly see that it has worse recall.

Accuracy:

\dfrac{\text{True Positives} + \text{True Negatives}}{\text{Total population}}

Note 1: if an algorithm predicts only negatives, as it does in one of the exercises, precision is undefined because it would require dividing by 0. The F1 score is then undefined too.
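To make these definitions concrete, here is a minimal pure-Python sketch that counts true/false positives and negatives and computes precision, recall, accuracy, and the F1 score (defined in the next section). Returning `None` for an undefined metric is a choice made for this sketch; libraries handle the divide-by-zero case in different ways. The 1,000-patient data set is hypothetical, chosen to mirror the 0.5% cancer example.

```python
from typing import Dict, Optional, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN for binary labels (1 = positive class, e.g. cancer)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 0 and actual == 0:
            counts["tn"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        else:  # predicted 0, actual 1
            counts["fn"] += 1
    return counts


def safe_div(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when den is 0 (the metric is undefined)."""
    return num / den if den else None


def classification_metrics(y_true, y_pred) -> Dict[str, Optional[float]]:
    c = confusion_counts(y_true, y_pred)
    precision = safe_div(c["tp"], c["tp"] + c["fp"])
    recall = safe_div(c["tp"], c["tp"] + c["fn"])
    accuracy = (c["tp"] + c["tn"]) / len(y_true)
    f1 = None
    if precision is not None and recall is not None and precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical data set: 1000 patients, 5 of whom (0.5%) have cancer.
y_true = [1] * 5 + [0] * 995
always_zero = [0] * 1000
print(classification_metrics(y_true, always_zero))
```

For the always-predict-0 classifier this prints `{'precision': None, 'recall': 0.0, 'accuracy': 0.995, 'f1': None}`: 99.5% accuracy, yet zero recall and undefined precision.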

Trading Off Precision and Recall

We might want a confident prediction of two classes using logistic regression. One way is to increase our threshold:

Predict 1 if: $h_\theta(x) \geq 0.7$
Predict 0 if: $h_\theta(x) < 0.7$

This way, we only predict cancer if the model estimates at least a 70% chance that the patient has it.

Doing this, we will have higher precision but lower recall (refer to the definitions in the previous section).

Conversely, we can lower our threshold:

Predict 1 if: $h_\theta(x) \geq 0.3$
Predict 0 if: $h_\theta(x) < 0.3$

That way, we get a very cautious prediction that misses fewer actual cancer cases. This will cause higher recall but lower precision.

In general, the higher the threshold, the higher the precision and the lower the recall.

The lower the threshold, the higher the recall and the lower the precision.
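A small sketch of this tradeoff: the scores below are hypothetical model outputs h_θ(x) for ten patients, and `precision_recall_at` is an illustrative helper that applies the "predict 1 if h_θ(x) ≥ threshold" rule and then computes both metrics.

```python
def precision_recall_at(scores, labels, threshold):
    """Predict 1 when h_theta(x) >= threshold, then compute precision and recall."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else None  # undefined if nothing predicted 1
    recall = tp / (tp + fn) if tp + fn else None     # undefined if no actual positives
    return precision, recall


# Hypothetical model outputs h_theta(x) and true labels (1 = cancer).
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

for t in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

On this data, raising the threshold from 0.3 to 0.5 to 0.7 moves precision from 0.71 to 0.80 to 1.00 while recall falls from 1.00 to 0.80 to 0.60.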

To turn these two metrics into a single number, we need some way of combining them.

One way is to take the average:

\dfrac{P+R}{2}

This does not work well. If we predict y=1 only very rarely, the resulting high precision will bring the average up despite near-zero recall. If we predict all examples as y=1, the very high recall will bring the average up despite very low precision.

A better way is to compute the F Score (or F1 score):

\text{F Score} = 2\dfrac{PR}{P + R}

In order for the F Score to be large, both precision and recall must be large.
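A quick sketch comparing the two ways of combining precision and recall. The three (precision, recall) pairs are hypothetical; the third mimics a classifier that predicts y=1 for everything (perfect recall, tiny precision). Treating F1 as 0 when both metrics are 0 is a common convention, assumed here.

```python
def average_score(p, r):
    """Plain arithmetic mean of precision and recall."""
    return (p + r) / 2


def f1_score(p, r):
    """Harmonic mean of precision and recall (0 if both are 0, by convention)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical (precision, recall) pairs for three classifiers.
candidates = {
    "algorithm_1": (0.5, 0.4),
    "algorithm_2": (0.7, 0.1),
    "algorithm_3_predicts_all_ones": (0.02, 1.0),
}

for name, (p, r) in candidates.items():
    print(f"{name}: average={average_score(p, r):.3f}, F1={f1_score(p, r):.3f}")

best_by_average = max(candidates, key=lambda n: average_score(*candidates[n]))
best_by_f1 = max(candidates, key=lambda n: f1_score(*candidates[n]))
print("best by average:", best_by_average)
print("best by F1:", best_by_f1)
```

The average rewards the degenerate all-ones classifier (0.51), while F1 prefers the balanced algorithm_1 (about 0.444) and scores the all-ones classifier at only about 0.039.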

We should compute precision and recall (and choose the threshold) on the cross-validation set so as not to bias our test set.
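One way to act on this, sketched under assumptions: evaluate a few candidate thresholds on a hypothetical cross-validation set and keep the one with the highest F1 score. Selecting by F1 is one reasonable criterion, not the only one, and `f1_at_threshold` is an illustrative helper.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 on a labelled set when predicting 1 for h_theta(x) >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # precision/recall are 0 or undefined; rank this threshold last
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical cross-validation set: model outputs h_theta(x) and true labels.
cv_scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
cv_labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

candidate_thresholds = [0.3, 0.5, 0.7]
best_threshold = max(candidate_thresholds,
                     key=lambda t: f1_at_threshold(cv_scores, cv_labels, t))
print("threshold chosen on the CV set:", best_threshold)
# The test set is touched only afterwards, once, to report final performance.
```

Here the CV F1 scores are about 0.833, 0.8, and 0.75, so 0.3 is chosen; the test set plays no part in that choice.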

References

1. Andrew Ng : course notes

Thursday, May 28, 2020

Attention Explained

Screenshot of Andrew Ng's explanation of the attention model from the deeplearning.ai course

The problem with regular encoder-decoder architectures arises with long sentences: the encoder RNN has to compress the whole input into a single fixed-length vector, and RNNs don't do well at this for long sequences. For example, when translating a long sentence, humans probably don't read the whole sentence and then translate it. The human mind probably reads parts of the sentence and then processes the translation for that part. This leads us to attention models: while generating each output word, the model weighs the input words differently.
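As a rough illustration of "weighing the inputs differently", here is a minimal pure-Python sketch of one attention step: score each encoder state against the current decoder state, turn the scores into weights with a softmax, and take the weighted sum of encoder states as the context vector. The dot-product scoring is a simplifying assumption swapped in here (the deeplearning.ai course computes the scores with a small neural network), and the vectors are hypothetical.

```python
import math


def softmax(xs):
    """Softmax with max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one output step.

    Assumption: scores are plain dot products between the decoder state
    and each encoder state, not the small scoring network from the course.
    """
    scores = [sum(d * h for d, h in zip(decoder_state, h_t)) for h_t in encoder_states]
    alphas = softmax(scores)  # non-negative, sum to 1 over input positions
    dim = len(encoder_states[0])
    context = [sum(a * h_t[i] for a, h_t in zip(alphas, encoder_states))
               for i in range(dim)]
    return alphas, context


# Hypothetical 2-d encoder states for a 4-word input sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
decoder_state = [2.0, 0.0]
alphas, context = attend(decoder_state, encoder_states)
print([round(a, 3) for a in alphas])
print([round(c, 3) for c in context])
```

The first and third input positions, which align with the decoder state, each receive about 44% of the weight, while the other two get about 6% each, so the context vector is dominated by the relevant inputs.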