Hello Learners…
Welcome to my blog…
Table Of Contents
- Introduction
- End To End Machine Learning Project Using Python
- Create a GitHub Repository
- Clone The GitHub Repository To Local System
- Create Python Virtual Environment For The Project
- Add A .gitignore file
- GitHub Sync With Our Local System
- Add setup.py and requirements.txt
- Defining The Structure Of The Machine Learning Project
- Components
- Data Ingestion
- Data Transformation
- Model Training
- Pipeline
- Training Pipeline
- Predict Pipeline
- Other Important Files For The Machine Learning Project
- Summary
- References
Introduction
In this post, we create an end-to-end machine learning project using Python. The result is a project structure that we can reuse in any machine learning project.
End To End Machine Learning Project Using Python
Requirements
- Knowledge Of Python Programming
- Modular Coding
- Modular programming is a software design technique that emphasizes separating the functionality of a program into independent, interchangeable modules, such that each contains everything necessary to execute only one aspect of the desired functionality.
- Machine Learning Algorithms
Now we create an end-to-end machine learning project. Here we do everything required to build a machine learning project from scratch. We divide the project into parts and implement them using the Python programming language.
So let's start…
Create a GitHub Repository
We write the code step by step in Python, and we use GitHub to maintain it. When we work in a team, every member works on the same project at the same time, changing the code and committing their work, so the first step is to create a GitHub repository.
Go to GitHub and create an account if you don't already have one.
Then log in to that account.
After logging in, we see the GitHub home page.
To create a new repository, click on Repositories.
Then click on New.
Give the repository a name, select Add a README file, and create the repository.
This is our repository.
Clone The GitHub Repository To Local System
Now we clone our GitHub repository to our local system.
Open a terminal or command prompt in the directory where you want to clone the repository and run the command below.
git clone https://github.com/galaxyofai/ml_project_structure.git
Then press Enter.
The repository is now cloned to our local system.
It is ready to use for the implementation.
Create Python Virtual Environment For The Project
It is good practice to create a Python virtual environment for every project. All the packages and libraries we install for this project are installed into that environment.
Open the cloned folder in whichever code editor you are comfortable with.
Here we use VS Code.
Now open the terminal in VS Code.
The terminal's working directory is our cloned GitHub repository folder.
To create a Python virtual environment, run the command below.
python3 -m venv venv
This creates a folder named venv.
Now activate the Python virtual environment (Linux).
source venv/bin/activate
Please refer to this if you run into any difficulties creating a Python virtual environment.
Add A .gitignore File
Now we add a .gitignore file.
A .gitignore file lists the files and folders that we don't want to push to the GitHub repository.
We also make some changes to the README.md file.
Here we add venv to .gitignore so that the virtual environment directory is not pushed to GitHub.
GitHub Sync With Our Local System
Now we push our changes to the GitHub repository. First, run the command below.
git add .
If Git reports a "dubious ownership" error when we run the above command for the first time, we have to mark our directory as safe in the global Git configuration. To do that, run the command below.
git config --global --add safe.directory /mnt/B0DC86C9DC8688F4/Galaxy_Of_AI/github/ml_project_structure
NOTE: Replace the path with your own directory path.
Now run the command below again.
git add .
We can see the changes using the command below.
git status
Now commit the changes. The '-m' option adds the commit message.
git commit -m "add .gitignore and change in readme file"
We are ready to push our changes to the GitHub repository. To do that, run the command below.
git push
We can now see the changes in our GitHub account.
Here we can see the .gitignore and README.md files.
The Python virtual environment and the GitHub repository are ready to use. From now on, whenever we finish a task, we push our code to GitHub.
Add setup.py and requirements.txt
Now we add the setup.py file and the requirements.txt file.
All the packages and libraries the project requires are listed in requirements.txt.
The setup.py file builds our machine learning application as a package, so anyone can install it in their own project and use it. We can also publish the full package on PyPI.
You can refer to this to learn more about the setup.py file.
Now we write the code in the setup.py file.
from setuptools import find_packages, setup

setup(
    name="mlproject",
    version="0.0.1",
    author="galaxy_of_ai",
    author_email="contact@galaxyofai.com",
    packages=find_packages(),
    install_requires=["pandas", "numpy", "seaborn"],
)
find_packages() finds every folder that contains an __init__.py file and treats it as a package.
Now we create a folder named src and add an __init__.py file to it. We write our entire project code in that folder.
So far we have listed the required packages directly in setup.py, but a project needs many packages, so listing them all in setup.py is not a good approach.
Instead, we create a function in setup.py that reads requirements.txt and passes all the packages listed there to setup().
When we install the packages using requirements.txt, we also want setup.py to run. To do that, we add '-e .' to our requirements.txt file, which makes pip install the current project in editable mode and so triggers setup.py automatically.
Updated setup.py file:
from setuptools import find_packages, setup

# "-e ." in requirements.txt installs this project itself; it is not a package name.
HYPHEN_E_DOT = "-e ."


def get_requirements(file_path: str) -> list:
    """
    Return the list of requirements read from the given file.
    """
    with open(file_path) as file_obj:
        # Strip the newline from each line and skip blank lines.
        requirements = [req.strip() for req in file_obj if req.strip()]
    if HYPHEN_E_DOT in requirements:
        requirements.remove(HYPHEN_E_DOT)
    return requirements


setup(
    name="mlproject",
    version="0.0.1",
    author="galaxy_of_ai",
    author_email="contact@galaxyofai.com",
    packages=find_packages(),
    install_requires=get_requirements("requirements.txt"),
)
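To check what get_requirements returns, here is a small self-contained sketch that mirrors the logic of the function above. The sample file contents and the temporary folder are assumptions made only for this demo; in the real project the function reads the requirements.txt next to setup.py.

```python
import tempfile
from pathlib import Path

HYPHEN_E_DOT = "-e ."


def get_requirements(file_path: str) -> list:
    """Return the package names listed in a requirements file."""
    with open(file_path) as file_obj:
        # Strip newlines and surrounding spaces, and skip blank lines.
        requirements = [line.strip() for line in file_obj if line.strip()]
    # "-e ." tells pip to install this project itself; it is not a package name.
    if HYPHEN_E_DOT in requirements:
        requirements.remove(HYPHEN_E_DOT)
    return requirements


# Hypothetical sample file, written to a temporary folder for the demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "requirements.txt"
    sample.write_text("pandas\nnumpy\nseaborn\n-e .\n")
    parsed = get_requirements(str(sample))

print(parsed)  # ['pandas', 'numpy', 'seaborn']
```

The '-e .' entry is removed because install_requires expects package names only.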
Now run the below command.
pip install -r requirements.txt
All the packages are now installed.
This generates a folder named mlproject.egg-info, in which we can see all the details about our project.
The details you see there depend on the values you provided in setup.py.
Upload All Changes To The GitHub Repository
Now we add all changes to our GitHub Repository. For that run the below commands.
git add .
git status
git commit -m "mlproject info added"
Now we push the code by using the below command.
git push
This uploads all the local changes we have made to the GitHub repository.
Defining The Structure Of The Machine Learning Project
Now we add some folders to the src folder.
This is our current folder structure.
Components
First, we create a folder named components. Components are all the modules that we are going to create for different processes of our project.
Data Ingestion
The first component we create is data_ingestion.py, in which we read data from different sources such as databases, CSV files, or an AWS S3 bucket.
In data_ingestion.py we also write the code that splits the data into a training set and a test set.
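As a rough idea of what data ingestion does, here is a minimal sketch that reads CSV data and splits it into train and test sets using only the standard library. The function names, the in-memory sample data, and the 80/20 split are assumptions for illustration; the actual data_ingestion.py may well use libraries such as pandas and scikit-learn instead.

```python
import csv
import io
import random


def read_rows(csv_file) -> list:
    """Read every row of a CSV file object into a list of dictionaries."""
    return list(csv.DictReader(csv_file))


def train_test_split_rows(rows: list, test_size: float = 0.2, seed: int = 42):
    """Shuffle a copy of the rows and split it into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical in-memory CSV standing in for a real data source.
raw_csv = io.StringIO(
    "hours,score\n" + "\n".join(f"{h},{h * 10}" for h in range(1, 11))
)
rows = read_rows(raw_csv)
train_rows, test_rows = train_test_split_rows(rows)
print(len(train_rows), len(test_rows))  # 8 2
```

Fixing the seed keeps the split reproducible, so the same rows land in the test set on every run.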
Data Transformation
After reading the data, we will probably need data validation and data transformation. For that, we create a file named data_transformation.py, in which we write the code that converts categorical features into numerical features, for example with one-hot encoding or label encoding.
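To make the two encodings concrete, here is a small pure-Python sketch. The helper names and the sample colors feature are hypothetical; a real data_transformation.py would more likely rely on an existing library for this.

```python
def label_encode(values: list) -> tuple:
    """Map each category to an integer, following the sorted order of categories."""
    mapping = {category: index for index, category in enumerate(sorted(set(values)))}
    return [mapping[value] for value in values], mapping


def one_hot_encode(values: list) -> tuple:
    """Turn each category into a 0/1 vector with one position per category."""
    categories = sorted(set(values))
    encoded = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return encoded, categories


colors = ["red", "green", "blue", "green"]  # hypothetical categorical feature

labels, mapping = label_encode(colors)
print(labels)   # [2, 1, 0, 1]
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}

vectors, categories = one_hot_encode(colors)
print(vectors)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding produces one integer per value, while one-hot encoding produces one column per category, which avoids implying an order between categories.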
Model Training
After that, we create model_trainer.py for training the models. In this file, we write the code that trains different models and evaluates them, for example with a confusion matrix for classification problems or the R2 score for regression problems.
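The two evaluation metrics mentioned here can be sketched in a few lines of plain Python. The function names and the sample values are assumptions for illustration, and the R2 sketch assumes the true values are not all identical; in practice a library such as scikit-learn provides these metrics.

```python
def r2_score(y_true: list, y_pred: list) -> float:
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def confusion_matrix(y_true: list, y_pred: list, labels: list) -> list:
    """Rows are true labels, columns are predicted labels, in the order of `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical regression example.
score = r2_score([1, 2, 3, 4], [1, 2, 3, 5])
print(round(score, 3))  # 0.8

# Hypothetical classification example.
matrix = confusion_matrix(
    ["cat", "dog", "cat", "dog"],
    ["cat", "cat", "cat", "dog"],
    labels=["cat", "dog"],
)
print(matrix)  # [[2, 0], [1, 1]]
```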
These are all the files in the components folder. They are empty for now; we write code in them as we go ahead.
Pipeline
Now we create another folder named pipeline in the src folder.
Training Pipeline
To create the training pipeline, we create a file named train_pipeline.py, in which we write all the code related to training the models.
Predict Pipeline
Next, we create another file named predict_pipeline.py, in which we write the code related to making predictions with our models.
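To show how the two pipelines could tie the components together, here is a toy sketch. Every function in it is a hypothetical stand-in (the real components are still empty files at this point), and the "model" is just a single slope fitted by least squares so that the example stays runnable.

```python
def ingest() -> list:
    """Stand-in for data ingestion: return (feature, target) pairs."""
    return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def transform(rows: list) -> tuple:
    """Stand-in for data transformation: split pairs into features and targets."""
    features = [x for x, _ in rows]
    targets = [y for _, y in rows]
    return features, targets


def train(features: list, targets: list) -> float:
    """Stand-in for model training: fit y = slope * x by least squares."""
    return sum(x * y for x, y in zip(features, targets)) / sum(x * x for x in features)


def run_train_pipeline() -> float:
    """Training pipeline: ingestion, then transformation, then training."""
    rows = ingest()
    features, targets = transform(rows)
    return train(features, targets)


def run_predict_pipeline(model: float, new_features: list) -> list:
    """Predict pipeline: apply the trained model to new inputs."""
    return [model * x for x in new_features]


model = run_train_pipeline()
predictions = run_predict_pipeline(model, [5.0])
print(predictions)  # [10.0]
```

The point of the split is that training runs the whole chain once, while prediction only needs the trained model and new inputs.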
Other Important Files For The Machine Learning Project
- Logger
- We create logger.py for logging in our project.
- Exception Handling
- We create exception.py for exception handling in our project.
- For Other Functionalities
- We create utils.py for other Python helper functions that the project needs.
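As a rough idea of what logger.py and exception.py might hold, here is a minimal sketch built on the standard logging module. The logger name, the log format, the CustomException class, and the safe_divide helper are all assumptions for illustration, not the project's final code.

```python
import logging

# Hypothetical logger setup; this sketch logs to the console only.
logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s] %(lineno)d %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("mlproject")


class CustomException(Exception):
    """Exception that records where the original error was raised."""

    def __init__(self, error: Exception):
        tb = error.__traceback__
        if tb is not None:
            location = f"{tb.tb_frame.f_code.co_filename}, line {tb.tb_lineno}"
        else:
            location = "unknown location"
        super().__init__(f"{type(error).__name__} at {location}: {error}")


def safe_divide(a: float, b: float) -> float:
    """Example helper that logs the failure and re-raises it as CustomException."""
    try:
        return a / b
    except ZeroDivisionError as error:
        logger.error("Division failed: %s", error)
        raise CustomException(error) from error


try:
    safe_divide(1, 0)
except CustomException as exc:
    message = str(exc)
    print(message)
```

Wrapping errors this way keeps the file name and line number in the message, which helps when a failure happens deep inside a pipeline.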
This is the directory and file structure of our machine learning project. Now we are going to start coding in all the files one by one.
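If you prefer to create these folders and empty files with a script instead of by hand, a sketch like the following could do it. The placement of logger.py, exception.py, and utils.py directly under src, and the extra __init__.py files in components and pipeline, are assumptions; the demo runs in a temporary folder so that it does not touch a real project.

```python
import tempfile
from pathlib import Path

# Assumed layout, based on the files described in this post.
PROJECT_FILES = [
    "src/__init__.py",
    "src/components/__init__.py",
    "src/components/data_ingestion.py",
    "src/components/data_transformation.py",
    "src/components/model_trainer.py",
    "src/pipeline/__init__.py",
    "src/pipeline/train_pipeline.py",
    "src/pipeline/predict_pipeline.py",
    "src/logger.py",
    "src/exception.py",
    "src/utils.py",
]


def create_structure(root: Path) -> list:
    """Create every folder and empty file of the layout under `root`."""
    created = []
    for relative in PROJECT_FILES:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # leaves existing files untouched
        created.append(path)
    return created


# Demo in a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    created = create_structure(Path(tmp_dir))
    all_exist = all(path.is_file() for path in created)
    names = sorted(path.relative_to(tmp_dir).as_posix() for path in created)

print("\n".join(names))
```

To use it in the cloned repository, call create_structure(Path(".")) from the project root.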
Be With Us…
Continue Updating…
Summary
This is a basic machine learning or data science project structure that we can use with any dataset to implement a project.
You can download the full code from my GitHub.