Pre-processing Steps

    1. Cleaning data
    2. Filling the missing values
    3. Feature Engineering
    4. Converting categories to numbers

1. Cleaning Data

Check categorical variables: Item_Fat_Content, Item_Type, Outlet_Size, Outlet_Location_Type, Outlet_Type

Item_Visibility

In our EDA we observed that Item_Visibility has a minimum value of 0. This makes no sense, so let's treat those zeros as missing values and impute them with the column mean.
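A minimal sketch of this imputation (the column name comes from the text; the toy values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Item_Visibility": [0.05, 0.0, 0.10, 0.0, 0.15]})

# Treat the implausible zeros as missing, then fill with the column mean
df["Item_Visibility"] = df["Item_Visibility"].replace(0, np.nan)
df["Item_Visibility"] = df["Item_Visibility"].fillna(df["Item_Visibility"].mean())
```

Note that the mean is computed after the zeros are converted to NaN, so it is the mean of the plausible (non-zero) values only.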

2. Filling the missing values

Categorical Data: Mode

Outlet_Size: Fill missing values with the "mode" by Outlet_Type

Continuous Data: Mean

Item_Weight
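Both fills can be sketched as follows (column names from the text; the sample rows are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Outlet_Type": ["Grocery Store", "Grocery Store", "Supermarket Type1",
                    "Supermarket Type1", "Supermarket Type1"],
    "Outlet_Size": ["Small", None, "Medium", "Medium", None],
    "Item_Weight": [9.3, np.nan, 12.5, 17.5, np.nan],
})

# Categorical: fill missing Outlet_Size with the mode of its Outlet_Type group
df["Outlet_Size"] = df.groupby("Outlet_Type")["Outlet_Size"].transform(
    lambda s: s.fillna(s.mode().iloc[0])
)

# Continuous: fill missing Item_Weight with the overall mean
df["Item_Weight"] = df["Item_Weight"].fillna(df["Item_Weight"].mean())
```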

3. Feature Engineering

4. Converting categories to numbers
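A common way to convert the categorical columns listed earlier is one-hot encoding, where each category becomes its own 0/1 column; a sketch with two of the columns (toy values):

```python
import pandas as pd

df = pd.DataFrame({
    "Item_Fat_Content": ["Low Fat", "Regular", "Low Fat"],
    "Outlet_Location_Type": ["Tier 1", "Tier 3", "Tier 1"],
})

# One-hot encode: each category value becomes an indicator column
encoded = pd.get_dummies(df, columns=["Item_Fat_Content", "Outlet_Location_Type"])
```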

Split the dataset into train and test and save as preprocessed
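This step might look like the following (the split ratio and output file names are assumptions, not from the source):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"feature": range(10), "target": range(10)})

# Hold out 20% of the rows as the test set; random_state makes it reproducible
train, test = train_test_split(df, test_size=0.2, random_state=42)

# Save the preprocessed splits to disk (file names are hypothetical)
train.to_csv("train_preprocessed.csv", index=False)
test.to_csv("test_preprocessed.csv", index=False)
```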

Building a model

    1. Load and check the dataset
    2. Creating training and validation set
    3. Models

1. Load and check the dataset

2. Creating training and validation set

In order to check how well the model will perform on unseen data, we'll be creating a small validation set out of this training set.

For simplicity, we use the terms test data and validation data interchangeably here. In practice we do not have the actual labels of the test data, so we separate validation data from the training data in order to evaluate our algorithm on data it has not seen before.
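Carving the validation set out of the training set can be sketched like this (the 20% validation fraction is an assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # toy feature matrix
y = np.arange(10)                 # toy targets

# Split off 20% of the training data as a validation set
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=1
)
```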

To center and scale the data (make it have zero mean and unit standard deviation), you subtract the mean and then divide the result by the standard deviation: x′ = (x − μ) / σ.

You do this on the training set. Then you apply the same transformation to your test set (e.g. in cross-validation), or to newly obtained examples before prediction, using the exact same two parameters μ and σ that you computed on the training set.

Hence, every sklearn transformer's fit() just calculates the parameters (e.g. μ and σ in the case of StandardScaler) and saves them as internal object state. Afterwards, you can call its transform() method to apply the transformation to any particular set of examples.

fit_transform() joins these two steps and is used for the initial fitting of parameters on the training set X, while also returning the transformed X′. Internally, the transformer object simply calls fit() and then transform() on the same data.
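The fit/transform contract described above, in a minimal sketch (toy one-feature data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0]])
X_val = np.array([[2.0], [4.0]])

scaler = StandardScaler()
# fit_transform() learns mu and sigma from the training data and scales it
X_train_scaled = scaler.fit_transform(X_train)
# transform() reuses the SAME mu and sigma on unseen data
X_val_scaled = scaler.transform(X_val)
```

Because μ and σ come from the training set only, a validation value equal to the training mean (here 2.0) maps exactly to 0.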

Models

NN Model with Keras

Since Keras uses TensorFlow as its backend, we also check TensorFlow's version.

Tuning NN

4. Compiling the model (defining loss function, optimizer)

5. Training the model

6. Evaluating model performance on validation set
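Steps 4–6 above can be sketched with Keras as follows; the layer sizes, epoch count, and toy data are illustrative assumptions, not the document's actual configuration:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)  # check the TensorFlow backend version

# Toy regression data standing in for the preprocessed features
rng = np.random.default_rng(0)
X_train = rng.random((80, 10), dtype=np.float32)
y_train = rng.random(80, dtype=np.float32)
X_val = rng.random((20, 10), dtype=np.float32)
y_val = rng.random(20, dtype=np.float32)

# A small feed-forward network (layer sizes are illustrative)
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
# 4. Compile: define the loss function and optimizer
model.compile(optimizer="adam", loss="mse")
# 5. Train
history = model.fit(X_train, y_train, epochs=2, batch_size=16, verbose=0)
# 6. Evaluate performance on the validation set
val_loss = model.evaluate(X_val, y_val, verbose=0)
```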

Comparing model performances

Run selected model