End To End Machine Learning Project Using Python

Hello Learners…

Welcome to my blog…

Table Of Contents

  • Introduction
  • End To End Machine Learning Project Using Python
    • Create a GitHub Repository
    • Clone The GitHub Repository To Local System
    • Create Python Virtual Environment For The Project
    • Add A .gitignore file
    • GitHub Sync With Our Local System
    • Add setup.py and requirements.txt
    • Defining The Structure Of The Machine Learning Project
      • Components
        • Data Ingestion
        • Data Transformation
        • Model Training
      • Pipeline
        • Training Pipeline
        • Predict Pipeline
  • Summary
  • References

Introduction

In this post, we build an end-to-end machine learning project using Python. The result is a project structure that we can reuse for any machine learning project.

End To End Machine Learning Project Using Python

Requirements

  • Knowledge Of Python Programming
  • Modular Coding
    • Modular programming is a software design technique that emphasizes separating the functionality of a program into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality.
  • Machine Learning Algorithms

Now we create an end-to-end machine learning project, covering everything needed to build one from scratch. We divide the project into parts and implement each part in Python.

So let's start…

Create a GitHub Repository

We write the code step by step in Python, and we use GitHub to maintain it. When we work in a team, every member works on the same project at the same time, changing the code and committing their work, so the first step is to create a GitHub repository.

Go to GitHub and create an account if you don’t have one.

After that, log in to your account.

After logging in, we see the GitHub home page.

To create a new repository, click on Repositories.

Then click on New.

Enter a repository name, select Add a README file, and create the repository.

This is our repository.

Clone The GitHub Repository To Local System

Now we clone the GitHub repository to our local system.

Open a terminal or command prompt in the directory where you want the repository, and run the command below.

git clone https://github.com/galaxyofai/ml_project_structure.git

Press Enter.

The repository is now cloned to our local system.

It is ready to use for implementation.

Create Python Virtual Environment For The Project

It is good practice to create a Python virtual environment for every project. All the packages and libraries we install for this project go into that environment.

Open the cloned folder in whichever code editor you are comfortable with.

Here we use VS Code, with the cloned folder open.

Now open the terminal in VS Code.

The terminal opens with its path set to our cloned GitHub repository folder.

To create a Python virtual environment, run the command below.

python3 -m venv venv

This creates a folder named venv.

Now activate the Python virtual environment (Linux).

source venv/bin/activate

Please refer to this if you have any difficulty creating a Python virtual environment.
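
To confirm that the environment is active, a quick check can be run from Python. This is only a minimal sketch: the helper name in_virtualenv is our own and not part of any library. It relies on the fact that inside a venv, sys.prefix points at the environment folder while sys.base_prefix points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix is the environment folder, while
    # sys.base_prefix still refers to the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

If the second line prints False after activation, the terminal is most likely still using the system interpreter.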

Add A .gitignore File

Now we add a .gitignore file.

A .gitignore file lists the files and folders that we don’t want to push to the GitHub repository.

We also make some changes in the README.md file.

Here we add venv to .gitignore so that the virtual environment directory is not pushed to GitHub.

GitHub Sync With Our Local System

Now we push our changes to the GitHub repository. First, run the command below.

git add .

If Git reports dubious ownership of the repository folder (this can happen, for example, when the folder is on a drive owned by another user), mark the directory as safe by running the command below.

git config --global --add safe.directory /mnt/B0DC86C9DC8688F4/Galaxy_Of_AI/github/ml_project_structure

NOTE: Replace the path with your own directory path.

Now run the command below again.

git add .

We can see the changes using the command below.

git status

Now commit the changes. The ‘-m’ option adds a commit message.

git commit -m "add .gitignore and change in readme file"

We are ready to push our changes to the GitHub repository. Run the command below.

git push

We can see the changes in our GitHub account.

Here we can see the .gitignore and README.md files.

The Python virtual environment and the GitHub repository are now ready to use. From now on, whenever we finish a task, we push our code to GitHub.

Add setup.py and requirements.txt

Now we add the setup.py and requirements.txt files.

The requirements.txt file lists all the packages and libraries the project requires.

The setup.py file turns our machine learning application into a package, so anyone can install it in their own project and use it. We can also publish the full package on PyPI.

You can refer to this to learn more about the setup.py file.

Now we write the code in the setup.py file.

from setuptools import find_packages, setup

setup(
    name="mlproject",
    version="0.0.1",
    author="galaxy_of_ai",
    author_email="contact@galaxyofai.com",
    packages=find_packages(),
    install_requires=["pandas", "numpy", "seaborn"],
)

find_packages() finds every folder that contains an __init__.py file and treats it as a package.

Now we create a folder named src and add an __init__.py file to it. We write all of our project code in that folder.

So far we list the required packages directly in setup.py. A real project needs many packages, so adding them all to setup.py by hand is not a good approach.

Instead, we create a function in setup.py that reads requirements.txt and passes all the packages listed there to install_requires.

We also want setup.py to run whenever we install the packages from requirements.txt. To do that, we add ‘-e .’ to requirements.txt, which automatically triggers setup.py. Because ‘-e .’ is not a package name, the function removes it from the list before the list reaches install_requires.

Updated setup.py file.

from typing import List

from setuptools import find_packages, setup

# The "-e ." line in requirements.txt triggers setup.py; it is not a package name.
HYPHEN_E_DOT = "-e ."


def get_requirements(file_path: str) -> List[str]:
    """Return the list of requirements read from the given file."""
    with open(file_path) as file_obj:
        # Strip newlines and surrounding whitespace, and skip blank lines.
        requirements = [line.strip() for line in file_obj if line.strip()]

    # "-e ." must not be passed to install_requires.
    if HYPHEN_E_DOT in requirements:
        requirements.remove(HYPHEN_E_DOT)

    return requirements


setup(
    name="mlproject",
    version="0.0.1",
    author="galaxy_of_ai",
    author_email="contact@galaxyofai.com",
    packages=find_packages(),
    install_requires=get_requirements("requirements.txt"),
)
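
As a quick sanity check of the idea, the sketch below writes a small sample requirements file to a temporary folder and parses it in the same spirit. The function name parse_requirements, the constant EDITABLE_FLAG, and the sample package list are illustrative assumptions; the only point is that ‘-e .’ is dropped before the list would reach install_requires.

```python
import tempfile
from pathlib import Path
from typing import List

EDITABLE_FLAG = "-e ."  # line that tells pip to install the current folder


def parse_requirements(file_path: str) -> List[str]:
    """Read package names from a requirements file, skipping '-e .'."""
    lines = Path(file_path).read_text().splitlines()
    # Drop surrounding whitespace and blank lines, then filter out the flag.
    cleaned = [line.strip() for line in lines if line.strip()]
    return [line for line in cleaned if line != EDITABLE_FLAG]


def demo() -> List[str]:
    """Parse a throwaway sample file (hypothetical contents)."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "requirements.txt"
        sample.write_text("pandas\nnumpy\nseaborn\n-e .\n")
        return parse_requirements(str(sample))


if __name__ == "__main__":
    print(demo())
```

Running the script prints only the three package names, which is what setup() expects to receive.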

Now run the below command.

pip install -r requirements.txt

All the packages are now installed.

This generates a folder named mlproject.egg-info, which holds the details of our project.

The details there reflect the values you provided in setup.py.

Upload All Changes To The GitHub Repository

Now we add all the changes to our GitHub repository by running the commands below.

git add .
git status
git commit -m "mlproject info added"

Now we push the code by using the below command.

git push

This uploads all the local changes we have made to the GitHub repository.

Defining The Structure Of The Machine Learning Project

Now we add some folders to the src folder.

This is our current folder structure.

Components

First, we create a folder named components. Components are all the modules that we are going to create for different processes of our project.

Data Ingestion

The first component is data_ingestion.py, in which we read data from different sources such as databases, CSV files, or an AWS S3 bucket.

In data_ingestion.py we also write the code that splits the data into a train set and a test set.
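
A minimal sketch of what data_ingestion.py could contain is given here. It uses only the standard library so it runs without extra packages; in practice pandas and scikit-learn's train_test_split are common choices. The function names, the 80/20 split, the fixed seed, and the sample rows are all assumptions made for illustration.

```python
import csv
import io
import random
from typing import Dict, List, TextIO, Tuple

Row = Dict[str, str]


def read_rows(source: TextIO) -> List[Row]:
    """Read CSV text from a file-like object into a list of dictionaries."""
    return list(csv.DictReader(source))


def split_train_test(
    rows: List[Row], test_size: float = 0.2, seed: int = 42
) -> Tuple[List[Row], List[Row]]:
    """Shuffle the rows and split them into train and test sets."""
    shuffled = rows[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical in-memory data standing in for a real CSV file.
    sample = io.StringIO("hours,score\n1,35\n2,42\n3,50\n4,61\n5,68\n")
    train, test = split_train_test(read_rows(sample))
    print(len(train), "train rows;", len(test), "test rows")
```

Fixing the seed keeps the split reproducible, so the same rows end up in the test set on every run.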

Data Transformation

After reading the data, we will probably validate and transform it. For that we create data_transformation.py, in which we write the code that converts categorical features into numerical features, for example with one-hot encoding or label encoding.
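
To make the two encodings concrete, here is a small pure-Python sketch. The function names and the sample column are hypothetical, and categories are ordered alphabetically as a simplifying assumption; scikit-learn's LabelEncoder and OneHotEncoder are the usual tools in a real project.

```python
from typing import Dict, List


def label_encode(values: List[str]) -> List[int]:
    """Map each category to an integer, in sorted order of the categories."""
    mapping: Dict[str, int] = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def one_hot_encode(values: List[str]) -> List[List[int]]:
    """Turn each category into a 0/1 vector with one column per category."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values]


if __name__ == "__main__":
    colors = ["red", "green", "blue", "green"]  # hypothetical feature column
    print(label_encode(colors))    # [2, 1, 0, 1]
    print(one_hot_encode(colors))  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding yields one integer column, while one-hot encoding yields one column per category and avoids implying an order between categories.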

Model Training

After that, we create model_trainer.py for training the models. In this file we write the code that trains different models and evaluates them, for example with a confusion matrix for classification problems and the R2 score for regression problems.
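
The two evaluation measures mentioned above can be sketched in plain Python as follows. The function names and the sample values are assumptions for illustration; scikit-learn offers r2_score and confusion_matrix in sklearn.metrics for real use.

```python
from collections import Counter
from typing import Dict, List, Tuple


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant, otherwise SS_tot is zero.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def confusion_counts(
    y_true: List[int], y_pred: List[int]
) -> Dict[Tuple[int, int], int]:
    """Count (actual, predicted) pairs; together they form the confusion matrix."""
    return dict(Counter(zip(y_true, y_pred)))


if __name__ == "__main__":
    # Hypothetical regression targets and predictions.
    print(r2_score([3.0, 5.0, 7.0], [2.5, 5.0, 7.5]))  # 0.9375
    # Hypothetical binary labels: keys are (actual, predicted).
    print(confusion_counts([1, 0, 1, 1], [1, 0, 0, 1]))
```

An R2 score of 1.0 means perfect predictions, and a score of 0.0 means the model does no better than always predicting the mean.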

These are all the files in the components folder. They are empty for now; we write the code in them as we go ahead.

Pipeline

Now we create another folder named pipeline in the src folder.

Training Pipeline

To create the training pipeline, we create a file named train_pipeline.py, which holds all the code related to training the models.

Predict Pipeline

Now we create another file named predict_pipeline.py, in which we write the code related to making predictions with our models.

Other Important Files For The Machine Learning Project

  • Logger
    • We create logger.py for logging in our project.
  • Exception Handling
    • We create exception.py for exception handling in our project.
  • Other Functionalities
    • We create utils.py for other Python functions that the project needs.
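
As a rough idea of what logger.py and exception.py might hold, the sketch below sets up a console logger with the standard logging module and defines a custom exception that records where the original error was raised. The names get_logger and ProjectError, the log format, and the message layout are assumptions, not a fixed design.

```python
import logging
import sys


def get_logger(name: str = "mlproject") -> logging.Logger:
    """Return a logger that writes timestamped messages to the console."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


class ProjectError(Exception):
    """Project-level exception that records where the original error occurred."""

    def __init__(self, message: str, cause: Exception):
        tb = cause.__traceback__
        # Walk to the innermost frame, where the error was actually raised.
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        location = f"line {tb.tb_lineno}" if tb is not None else "unknown line"
        super().__init__(f"{message} ({type(cause).__name__} at {location}: {cause})")


if __name__ == "__main__":
    log = get_logger()
    try:
        1 / 0
    except ZeroDivisionError as err:
        log.error(ProjectError("Demo step failed", err))
```

Wrapping low-level errors this way keeps log messages uniform across the components and pipelines.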

This is the directory and file structure of our machine learning project. Now we start writing code in each file, one by one.
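
For reference, the whole layout can also be created with a short script. This is a sketch under a few assumptions: logger.py, exception.py and utils.py are placed directly under src, and the components and pipeline folders each get an __init__.py so that find_packages() treats them as packages. The name create_structure is our own.

```python
import tempfile
from pathlib import Path
from typing import List

# Files named in this post; the location of logger.py, exception.py and
# utils.py, and the extra __init__.py files, are assumptions.
PROJECT_FILES = [
    "src/__init__.py",
    "src/components/__init__.py",
    "src/components/data_ingestion.py",
    "src/components/data_transformation.py",
    "src/components/model_trainer.py",
    "src/pipeline/__init__.py",
    "src/pipeline/train_pipeline.py",
    "src/pipeline/predict_pipeline.py",
    "src/logger.py",
    "src/exception.py",
    "src/utils.py",
]


def create_structure(root: Path) -> List[Path]:
    """Create the empty folders and files under root; existing files are kept."""
    created = []
    for relative in PROJECT_FILES:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway folder; pass Path(".") to scaffold the real project.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_structure(Path(tmp)):
            print(path.relative_to(tmp))
```

Because touch() is called with exist_ok=True, running the script again does not overwrite files that already contain code.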

Be With Us…

Continue Updating…

Summary

This is the basic machine learning or data science project structure, which we can use with any dataset to implement a project.

You can download the full code from my GitHub.

References