Hello Learners…
Welcome to the blog…
Table Of Contents
- Introduction
- Introducing Gemma: Google’s New Open-Source LLM Model
- Fine Tuning Gemma LLM
- Access Gemma Model With HuggingFace
- Running the Gemma model on a CPU
- Running the Gemma model on a single / multi GPU
- Summary
- References
Introduction
In this post, we introduce Gemma, Google’s new open-source LLM, and show how to access it with Hugging Face and run it on a CPU and on GPUs.
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
Introducing Gemma: Google’s New Open-Source LLM Model
Gemma is developed by Google DeepMind and other teams across Google. It is inspired by Gemini, and the name reflects the Latin gemma, meaning “precious stone.”
Alongside the model weights, Google is also releasing tools to support developer innovation, foster collaboration, and guide responsible use of Gemma models.
Fine Tuning Gemma LLM
Large Language Models (LLMs) like Gemma have been shown to be effective at a variety of NLP tasks. An LLM is first pre-trained on a large corpus of text in a self-supervised fashion. Pre-training helps LLMs learn general-purpose knowledge, such as statistical relationships between words.
You can fine-tune an LLM with domain-specific data to execute downstream tasks like sentiment analysis.
LLMs are extremely large (on the order of billions of parameters), and most applications do not require full fine-tuning, since fine-tuning datasets are typically much smaller than the pre-training datasets.
Low Rank Adaptation (LoRA) is a fine-tuning technique that greatly reduces the number of trainable parameters for downstream tasks by freezing the weights of the model and inserting a small number of new weights into it. This makes training with LoRA much faster and more memory-efficient, and it produces much smaller model weights (a few hundred MBs), all while maintaining the quality of the model outputs. A minimal sketch of applying LoRA to Gemma is shown below.
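As an illustration, here is a minimal sketch of how LoRA adapters could be attached to Gemma using Hugging Face’s peft library. The rank, alpha, and target module names below are illustrative assumptions, not values from this post.

# pip install peft transformers accelerate
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it", torch_dtype=torch.bfloat16, device_map="auto"
)

# LoRA configuration: rank, scaling, and target projections are illustrative choices
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the LoRA updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA weights are trainable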
Access Gemma Model With HuggingFace
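Gemma is distributed as a gated model on the Hugging Face Hub, so before downloading the weights you need to accept the license on the model page and authenticate with your Hugging Face access token. A minimal setup could look like this:

# pip install -U transformers accelerate huggingface_hub
from huggingface_hub import login
login()  # paste your Hugging Face access token when prompted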
Running the Gemma model on a CPU
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it")
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
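By default, generate() returns a fairly short continuation. If you want a longer output, you can set the generation length explicitly; the value 128 below is just an illustrative choice:

outputs = model.generate(**input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))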
Running the Gemma model on a single / multi GPU
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto")
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
Gemma model on a GPU using different precisions
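Loading the weights in half precision (float16 or bfloat16) roughly halves the memory footprint of the 7B model compared with float32, which makes it easier to fit on a single GPU.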
- Using torch.float16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto", torch_dtype=torch.float16)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
- Using torch.bfloat16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto", torch_dtype=torch.bfloat16)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))