Hello Learners…
Welcome to the blog…
Table Of Contents
- Introduction
- DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence
- Model Downloads DeepSeek-Coder-V2
- Chat Website Of DeepSeek-Coder-V2
- API Platform For DeepSeek-Coder-V2
- How To Run DeepSeek-Coder-V2 Locally
- Code Completion In DeepSeek-Coder-V2
- Code Insertion In DeepSeek-Coder-V2
- Chat Completion with DeepSeek-Coder-V2
- Inference with vLLM (recommended) Of DeepSeek-Coder-V2
- Summary
- References
Introduction
In this post, we discuss DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence.
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that rivals the performance of GPT4-Turbo in code-specific tasks.
DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2, with substantially enhanced capabilities in coding and mathematical reasoning.
This enhancement is achieved through continued pre-training on an additional 6 trillion tokens drawn from a high-quality, multi-source corpus.
As a result, DeepSeek-Coder-V2 excels in code-related tasks, reasoning, and general language tasks. Additionally, it supports a wider range of programming languages, increasing from 86 to 338, and extends the context length from 16K to 128K.
In standard benchmark evaluations, DeepSeek-Coder-V2 outperforms closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks. For the full list of supported programming languages, refer to the paper.
DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence
Model Downloads DeepSeek-Coder-V2
DeepSeek-Coder-V2 is available with two parameter configurations: 16B and 236B, based on the DeepSeekMoE framework.
Despite their large total parameter counts, the models activate only 2.4B and 21B parameters per token, respectively, making them highly efficient. Both base and instruct variants are publicly available.
| Model | #Total Params | #Active Params | Context Length | Download |
|---|---|---|---|---|
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128K | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128K | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Base | 236B | 21B | 128K | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 128K | 🤗 HuggingFace |
Chat Website Of DeepSeek-Coder-V2
We can interact with DeepSeek-Coder-V2 on DeepSeek’s official website: coder.deepseek.com.
API Platform For DeepSeek-Coder-V2
DeepSeek also provides an OpenAI-Compatible API on the DeepSeek Platform: platform.deepseek.com.
Sign up to receive millions of free tokens or opt for a pay-as-you-go model at an unbeatable price.
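Because the API is OpenAI-compatible, existing OpenAI client code can be pointed at it with little more than a base-URL change. Below is a minimal sketch using the openai Python client; the base_url and the model name "deepseek-coder" are assumptions based on DeepSeek's platform conventions, so check the platform documentation for the current values.
from openai import OpenAI

# Minimal sketch of calling the OpenAI-compatible DeepSeek API.
# The base_url and model name below are assumptions -- verify them against
# the DeepSeek Platform documentation before use.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued on platform.deepseek.com
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-coder",               # assumed model identifier
    messages=[{"role": "user", "content": "write a quick sort algorithm in python."}],
    temperature=0.3,
    max_tokens=512,
)
print(response.choices[0].message.content)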
How To Run DeepSeek-Coder-V2 Locally
Running the full DeepSeek-Coder-V2 (236B) in BF16 format for inference requires 80GB*8 GPUs; the DeepSeek-Coder-V2-Lite (16B) model used in the examples below is far lighter and fits on a single high-memory GPU. Here are some examples of how to use the model with Huggingface’s Transformers library.
Inference with Huggingface’s Transformers
Code Completion In DeepSeek-Coder-V2
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the base (non-instruct) model for raw code completion
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Prompt with a comment and let the model complete the code
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Code Insertion In DeepSeek-Coder-V2
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

# Fill-in-the-middle (FIM) prompt: the model generates the code that belongs at <｜fim▁hole｜>
input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
# Print only the newly generated (inserted) code, not the prompt
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
Chat Completion with DeepSeek-Coder-V2
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use the Instruct model for chat-style interaction
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

messages = [
    {'role': 'user', 'content': "write a quick sort algorithm in python."}
]
# apply_chat_template formats the conversation with the model's special tokens
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens (the assistant's reply)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
We can find the complete chat template within the tokenizer_config.json
located in the Huggingface model repository. An example chat template is as follows:
<｜begin▁of▁sentence｜>User: {user_message_1}
Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}
Assistant:
We can also add an optional system message:
<｜begin▁of▁sentence｜>{system_message}
User: {user_message_1}
Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}
Assistant:
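Rather than formatting this template string by hand, we can pass the system message through apply_chat_template and let the tokenizer insert the special tokens. The sketch below assumes the same Lite-Instruct setup as the chat completion example above; the system prompt text itself is only illustrative.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()

messages = [
    # Optional system message (illustrative text, not from the original post)
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "write a quick sort algorithm in python."},
]

# apply_chat_template places {system_message} and the special tokens for us
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))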
Inference with vLLM (recommended) Of DeepSeek-Coder-V2
To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: vLLM Pull Request.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Context window and tensor-parallel degree (1 GPU here)
max_model_len, tp_size = 8192, 1
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "write a quick sort algorithm in python."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

# Tokenize each conversation with the chat template, then batch-generate with vLLM
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
Summary
DeepSeek-Coder-V2 is poised to revolutionize the field of code intelligence, offering robust performance and a wide range of functionalities.
Whether you are a developer seeking advanced code completion tools or an enterprise looking for reliable API access, DeepSeek-Coder-V2 has you covered. Explore the possibilities today by downloading the models or interacting with them through the chat website and API platform.