Hello Learners…
Welcome to the blog…
Table Of Contents
- Introduction
- Introducing Gemma: Google’s New Open-Source LLM Model
- Fine Tuning Gemma LLM
- Access Gemma Model With HuggingFace
- Running the Gemma model on a CPU
- Running the Gemma model on a single / multi GPU
- Summary
- References
Introduction
In this post, we introduce Gemma, Google’s new open-source LLM, and show how to access it with Hugging Face and run it on a CPU and on GPUs.
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
Introducing Gemma: Google’s New Open-Source LLM Model
Gemma is developed by Google DeepMind and other teams across Google. It is inspired by Gemini, and the name reflects the Latin gemma, meaning “precious stone.”
Alongside the model weights, Google is also releasing tools to support developer innovation, foster collaboration, and guide responsible use of Gemma models.
Fine Tuning Gemma LLM
Large Language Models (LLMs) like Gemma have been shown to be effective at a variety of NLP tasks. An LLM is first pre-trained on a large corpus of text in a self-supervised fashion. Pre-training helps LLMs learn general-purpose knowledge, such as statistical relationships between words.
You can fine-tune an LLM with domain-specific data to execute downstream tasks like sentiment analysis.
LLMs are extremely large (on the order of billions of parameters), and most applications do not require full fine-tuning, since fine-tuning datasets are typically much smaller than the pre-training datasets.
Low Rank Adaptation (LoRA) is a fine-tuning technique that greatly reduces the number of trainable parameters for downstream tasks by freezing the weights of the model and inserting a small number of new weights into it. This makes training with LoRA much faster and more memory-efficient, and it produces much smaller model weights (a few hundred MBs), all while maintaining the quality of the model outputs. A minimal sketch of applying LoRA to Gemma is shown below.
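As an illustration, here is a minimal sketch of how LoRA adapters could be attached to Gemma using Hugging Face’s peft library. The rank, alpha, and target module names below are illustrative assumptions, not values from this post.

# pip install peft transformers accelerate
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it", torch_dtype=torch.bfloat16, device_map="auto"
)

# LoRA configuration: rank, scaling, and target projections are illustrative choices
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the LoRA updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA weights are trainable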
Access Gemma Model With HuggingFace
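Gemma is distributed as a gated model on the Hugging Face Hub, so before downloading the weights you need to accept the license on the model page and authenticate with your Hugging Face access token. A minimal setup could look like this:

# pip install -U transformers accelerate huggingface_hub
from huggingface_hub import login
login()  # paste your Hugging Face access token when prompted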
Running the Gemma model on a CPU
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it")
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
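By default, generate() returns a fairly short continuation. If you want a longer output, you can set the generation length explicitly; the value 128 below is just an illustrative choice:

outputs = model.generate(**input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))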
Running the Gemma model on a single / multi GPU
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto")
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
Gemma model on a GPU using different precisions
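Loading the weights in half precision (float16 or bfloat16) roughly halves the memory footprint of the 7B model compared with float32, which makes it easier to fit on a single GPU.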
- Using torch.float16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto", torch_dtype=torch.float16)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
- Using torch.bfloat16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto", torch_dtype=torch.bfloat16)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))