DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence

Hello Learners…

Welcome to the blog…

Table Of Contents

  • Introduction
  • DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence
  • Model Downloads DeepSeek-Coder-V2
  • Chat Website Of DeepSeek-Coder-V2
  • API Platform For DeepSeek-Coder-V2
  • How To Run DeepSeek-Coder-V2 Locally
    • Code Completion In DeepSeek-Coder-V2
    • Code Insertion In DeepSeek-Coder-V2
    • Chat Completion with DeepSeek-Coder-V2
  • Inference with vLLM (recommended) Of DeepSeek-Coder-V2
  • Summary
  • References

Introduction

In this post, we discuss DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence.

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that rivals the performance of GPT4-Turbo in code-specific tasks.

DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2, substantially enhancing its capabilities in coding and mathematical reasoning.

This enhancement is achieved through continued pre-training on an additional 6 trillion tokens drawn from a high-quality, multi-source corpus.

As a result, DeepSeek-Coder-V2 excels in code-related tasks, reasoning, and general language tasks. Additionally, it supports a wider range of programming languages, increasing from 86 to 338, and extends the context length from 16K to 128K.

In standard benchmark evaluations, DeepSeek-Coder-V2 outperforms closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. For a detailed list of supported programming languages, refer to their comprehensive paper.

DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence

Model Downloads DeepSeek-Coder-V2

DeepSeek-Coder-V2 is available with two parameter configurations: 16B and 236B, based on the DeepSeekMoE framework.

Despite their large total parameter counts, the models activate only 2.4B and 21B parameters per token, respectively, making them highly efficient at inference. Both Base and Instruct variants are publicly available.

Model                               #Total Params   #Active Params   Context Length   Download
DeepSeek-Coder-V2-Lite-Base         16B             2.4B             128K             🤗 HuggingFace
DeepSeek-Coder-V2-Lite-Instruct     16B             2.4B             128K             🤗 HuggingFace
DeepSeek-Coder-V2-Base              236B            21B              128K             🤗 HuggingFace
DeepSeek-Coder-V2-Instruct          236B            21B              128K             🤗 HuggingFace
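
If we prefer to fetch the weights ahead of time, the huggingface_hub library can download any of these repositories; the snippet below is a small sketch (the local_dir path is just an illustration):

from huggingface_hub import snapshot_download

# Sketch: pre-download the Lite-Instruct weights from HuggingFace.
# The local_dir path is illustrative; any of the four repositories above works the same way.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    local_dir="./DeepSeek-Coder-V2-Lite-Instruct",
)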

Chat Website Of DeepSeek-Coder-V2

We can interact with DeepSeek-Coder-V2 on DeepSeek’s official website: coder.deepseek.com.

API Platform For DeepSeek-Coder-V2

DeepSeek also provides an OpenAI-Compatible API on the DeepSeek Platform: platform.deepseek.com.

Sign up to receive millions of free tokens or opt for a pay-as-you-go model at an unbeatable price.
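
Since the API is OpenAI-compatible, it can be called with the standard openai Python client. The sketch below is my own; the base URL, model name, and DEEPSEEK_API_KEY environment variable are assumptions that should be checked against the platform documentation:

import os
from openai import OpenAI

# Sketch: calling DeepSeek's OpenAI-compatible endpoint.
# base_url and model are assumed values; verify them on platform.deepseek.com.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # placeholder for your API key
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-coder",  # assumed model identifier
    messages=[{"role": "user", "content": "write a quick sort algorithm in python."}],
)
print(response.choices[0].message.content)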

How To Run DeepSeek-Coder-V2 Locally

To run DeepSeek-Coder-V2 locally in BF16 format, the full 236B models require 80GB*8 GPUs; the 16B Lite models used in the examples below fit on a single high-memory GPU, since their BF16 weights alone take roughly 32GB. Here are some examples of how to use the models with Huggingface’s Transformers library.

Inference with Huggingface’s Transformers

Code Completion In DeepSeek-Coder-V2

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
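
Note that max_length caps the combined length of prompt and completion; if we only want to bound the newly generated tokens, max_new_tokens can be used instead (a minor variant, not part of the original example):

# Variant: limit only the generated continuation rather than prompt + continuation.
outputs = model.generate(**inputs, max_new_tokens=128)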

Code Insertion In DeepSeek-Coder-V2

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = """<ļ½œfimā–beginļ½œ>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
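
For reference, the only code missing at the <｜fim▁hole｜> marker above is the loop header, so a correct infill would be equivalent to the fragment below (my reading of the surrounding code, not guaranteed model output):

    # Expected infill for the hole: iterate over the elements after the pivot.
    for i in range(1, len(arr)):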

Chat Completion with DeepSeek-Coder-V2

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
messages = [
    {'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
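
The chat example above loads the full BF16 weights onto the GPU. If memory is tight, the Lite-Instruct model can also be loaded with 4-bit quantization via bitsandbytes; this is a sketch on my part (bitsandbytes must be installed, and quantized output quality may differ), not part of the official instructions:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Sketch: load the Lite-Instruct model in 4-bit to reduce GPU memory (assumes bitsandbytes is installed).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
)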

We can find the complete chat template within the tokenizer_config.json located in the Huggingface model repository. An example chat template is as follows:

<｜begin▁of▁sentence｜>User: {user_message_1}

Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

Assistant:

We can also add an optional system message:

<｜begin▁of▁sentence｜>{system_message}

User: {user_message_1}

Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

Assistant:
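
Rather than assembling these strings by hand, apply_chat_template will render the template for us; a short sketch (the system message text is only an illustration):

from transformers import AutoTokenizer

# Sketch: render the chat template to a plain string instead of token ids.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},  # illustrative system message
    {"role": "user", "content": "write a quick sort algorithm in python."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)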

Inference with vLLM (recommended) Of DeepSeek-Coder-V2

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: vLLM Pull Request.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 1
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "write a quick sort algorithm in python."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
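
The same engine can be reused for a follow-up turn by appending the assistant’s reply to the message history and re-applying the chat template; the sketch below builds on the variables defined above (the follow-up question is just an example):

# Sketch: one extra conversational turn, reusing llm, tokenizer, and sampling_params from above.
history = messages_list[1] + [
    {"role": "assistant", "content": generated_text[1]},  # the model's quick sort answer
    {"role": "user", "content": "Now add type hints to that function."},  # illustrative follow-up
]
follow_up_ids = [tokenizer.apply_chat_template(history, add_generation_prompt=True)]
follow_up = llm.generate(prompt_token_ids=follow_up_ids, sampling_params=sampling_params)
print(follow_up[0].outputs[0].text)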

Summary

DeepSeek-Coder-V2 is poised to revolutionize the field of code intelligence, offering robust performance and a wide range of functionalities.

Whether you are a developer seeking advanced code completion tools or an enterprise looking for reliable API access, DeepSeek-Coder-V2 has you covered. Explore the possibilities today by downloading the models or interacting with them through the chat website and API platform.

References

  • DeepSeek-Coder-V2 GitHub repository: https://github.com/deepseek-ai/DeepSeek-Coder-V2
  • DeepSeek-Coder-V2 models on Hugging Face: https://huggingface.co/deepseek-ai
  • DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence (technical report)
