Difference between the fit() and transform() methods:

Hi everyone. Today we'll talk briefly about some of the most common doubts in Regression in Machine Learning.


This is the most common doubt people have, so let me explain it with the help of an example.

During data preprocessing in Machine Learning, we sometimes need to scale the data to get more accurate results.

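Scaling can be done in 2 ways: Standardisation, which subtracts the feature's mean from each value and divides by the feature's standard deviation, or Normalisation, which subtracts the feature's minimum from each value and divides by the range (maximum minus minimum).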
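The formula images from the original post are not available, so here are the two standard textbook forms as a sketch. The symbol names (x_stand, x_norm) are my own labels, chosen to match the "xstand" used later in this post.

```latex
% Standardisation: centre on the mean, divide by the standard deviation
x_{\text{stand}} = \frac{x - \operatorname{mean}(x)}{\operatorname{std}(x)}

% Normalisation (min-max): shift by the minimum, divide by the range
x_{\text{norm}} = \frac{x - \min(x)}{\max(x) - \min(x)}
```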

Now, to scale the data with either of these methods, say Standardisation, the steps are: create a scaler sc1, call sc1.fit_transform(X_train) on the training set, and call sc1.transform(X_test) on the test set.
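The original code for these steps is not shown here, so the snippet below is only a minimal sketch of what they probably looked like, assuming scikit-learn's StandardScaler. The toy arrays are made up for illustration; only the names sc1, X_train and X_test come from this post.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; the dataset used in the original post is not shown.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

sc1 = StandardScaler()
X_train_scaled = sc1.fit_transform(X_train)  # learn mean/std from X_train, then scale X_train
X_test_scaled = sc1.transform(X_test)        # reuse the X_train mean/std to scale X_test
```

After this, every column of X_train_scaled has mean 0, while X_test_scaled is expressed in the same units as the scaled training set.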

The most common doubt here is: why did we apply the fit_transform method on X_train and only the transform method on X_test? It's a good doubt, and it should arise in your mind if you are a good student.

Now, coming to the answer to your question:

1. fit() Method: fit() calculates the mean and standard deviation of each feature, which we need in order to calculate xstand in the Standardisation formula. MARK MY WORDS: it only computes these values for each feature and trains the scaler sc1 on them. It does not change the data.

2. transform() Method: transform() applies the Standardisation formula to every value, using the mean and standard deviation that fit() computed, and returns the matrix of features X_train with the standardised values. HOPE YOU GOT MY POINT.

So, in brief, the fit method computes the values and the transform method applies them to bring all the features onto the same scale.
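To see the two steps separately, here is a small sketch, again assuming scikit-learn's StandardScaler and a made-up single-feature array. After fit(), the learned values are stored on the scaler as mean_ and scale_, and transform() gives the same result as writing the formula by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature training data, for illustration only.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])

sc1 = StandardScaler()
sc1.fit(X_train)      # only computes statistics; X_train itself is unchanged
mean = sc1.mean_      # per-feature mean learned by fit()
std = sc1.scale_      # per-feature standard deviation learned by fit()

scaled = sc1.transform(X_train)   # applies (x - mean) / std
manual = (X_train - mean) / std   # the same formula written by hand
same = np.allclose(scaled, manual)
```

fit_transform(X_train) is simply these two calls done in one go.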


Then why did we apply fit_transform on X_train (the training set) and only transform on X_test? Because we want to train sc1 on the training set, we fit it on X_train. We do not need the fit method on X_test because we want to use the same scaler sc1 to scale the features of the test set too. So we basically don't need to fit the scaler again on the values of X_test (I mean, we could do that, but it definitely wouldn't make any sense, as we want the results on the same scale).

To dig deeper: X_test will be the input to the predict function, and since we trained the model on X_train, we also want X_test on the scale of X_train, so we just transform it with the same scaler.
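A small sketch of what goes wrong if you fit again on the test set, assuming scikit-learn's StandardScaler and made-up numbers. The same raw value ends up with a different scaled value depending on which set the scaler was fitted on, so the model would see test inputs on a different scale from the one it was trained on.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data, for illustration only.
X_train = np.array([[0.0], [10.0], [20.0]])
X_test = np.array([[10.0], [30.0]])

sc1 = StandardScaler().fit(X_train)
consistent = sc1.transform(X_test)              # scaled with X_train's mean and std

refit = StandardScaler().fit_transform(X_test)  # scaled with X_test's own mean and std

# The raw value 10.0 equals the training mean, so sc1 maps it to 0.
# Refitting on X_test maps the same 10.0 to -1, because X_test's mean is 20.
```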