DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence

Hello Learners…

Welcome to the blog…

Table Of Contents

  • Introduction
  • DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence
  • Model Downloads DeepSeek-Coder-V2
  • Chat Website Of DeepSeek-Coder-V2
  • API Platform For DeepSeek-Coder-V2
  • How To Run DeepSeek-Coder-V2 Locally
    • Code Completion In DeepSeek-Coder-V2
    • Code Insertion In DeepSeek-Coder-V2
    • Chat Completion with DeepSeek-Coder-V2
  • Inference with vLLM (recommended) Of DeepSeek-Coder-V2
  • Summary
  • References

Introduction

In this post, we discuss DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence.

DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that rivals the performance of GPT4-Turbo in code-specific tasks.

DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2, substantially enhancing its capabilities in coding and mathematical reasoning.

This enhancement is achieved through continued pre-training on an additional 6 trillion tokens drawn from a high-quality, multi-source corpus.

As a result, DeepSeek-Coder-V2 excels in code-related tasks, reasoning, and general language tasks. Additionally, it supports a wider range of programming languages, increasing from 86 to 338, and extends the context length from 16K to 128K.

In standard benchmark evaluations, DeepSeek-Coder-V2 outperforms closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. For a detailed list of supported programming languages, refer to their comprehensive paper.

DeepSeek-Coder-V2: Advancing Open-Source Code Intelligence

Model Downloads DeepSeek-Coder-V2

DeepSeek-Coder-V2 is available with two parameter configurations: 16B and 236B, based on the DeepSeekMoE framework.

Despite their large total parameter counts, the models activate only 2.4B and 21B parameters per token, respectively, making them highly efficient at inference. Both Base and Instruct variants are publicly available.

Model                               #Total Params   #Active Params   Context Length   Download
DeepSeek-Coder-V2-Lite-Base         16B             2.4B             128K             🤗 HuggingFace
DeepSeek-Coder-V2-Lite-Instruct     16B             2.4B             128K             🤗 HuggingFace
DeepSeek-Coder-V2-Base              236B            21B              128K             🤗 HuggingFace
DeepSeek-Coder-V2-Instruct          236B            21B              128K             🤗 HuggingFace
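
If we prefer to fetch the weights ahead of time, the huggingface_hub library can download any of these repositories; the snippet below is a small sketch (the local_dir path is just an illustration):

from huggingface_hub import snapshot_download

# Sketch: pre-download the Lite-Instruct weights from HuggingFace.
# The local_dir path is illustrative; any of the four repositories above works the same way.
snapshot_download(
    repo_id="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    local_dir="./DeepSeek-Coder-V2-Lite-Instruct",
)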

Chat Website Of DeepSeek-Coder-V2

We can interact with DeepSeek-Coder-V2 on DeepSeek’s official website: coder.deepseek.com.

API Platform For DeepSeek-Coder-V2

DeepSeek also provides an OpenAI-Compatible API on the DeepSeek Platform: platform.deepseek.com.

Sign up to receive millions of free tokens or opt for a pay-as-you-go model at an unbeatable price.
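
Since the API is OpenAI-compatible, it can be called with the standard openai Python client. The sketch below is my own; the base URL, model name, and DEEPSEEK_API_KEY environment variable are assumptions that should be checked against the platform documentation:

import os
from openai import OpenAI

# Sketch: calling DeepSeek's OpenAI-compatible endpoint.
# base_url and model are assumed values; verify them on platform.deepseek.com.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # placeholder for your API key
    base_url="https://api.deepseek.com",
)
response = client.chat.completions.create(
    model="deepseek-coder",  # assumed model identifier
    messages=[{"role": "user", "content": "write a quick sort algorithm in python."}],
)
print(response.choices[0].message.content)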

How To Run DeepSeek-Coder-V2 Locally

To run DeepSeek-Coder-V2 locally in BF16 format, the full 236B models require 80GB*8 GPUs; the 16B Lite models used in the examples below fit on a single high-memory GPU, since their BF16 weights alone take roughly 32GB. Here are some examples of how to use the models with Huggingface’s Transformers library.

Inference with Huggingface’s Transformers

Code Completion In DeepSeek-Coder-V2

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
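
Note that max_length caps the combined length of prompt and completion; if we only want to bound the newly generated tokens, max_new_tokens can be used instead (a minor variant, not part of the original example):

# Variant: limit only the generated continuation rather than prompt + continuation.
outputs = model.generate(**inputs, max_new_tokens=128)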

Code Insertion In DeepSeek-Coder-V2

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = """<ļ½œfimā–beginļ½œ>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
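
For reference, the only code missing at the <｜fim▁hole｜> marker above is the loop header, so a correct infill would be equivalent to the fragment below (my reading of the surrounding code, not guaranteed model output):

    # Expected infill for the hole: iterate over the elements after the pivot.
    for i in range(1, len(arr)):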

Chat Completion with DeepSeek-Coder-V2

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
messages = [
    {'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
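
The chat example above loads the full BF16 weights onto the GPU. If memory is tight, the Lite-Instruct model can also be loaded with 4-bit quantization via bitsandbytes; this is a sketch on my part (bitsandbytes must be installed, and quantized output quality may differ), not part of the official instructions:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Sketch: load the Lite-Instruct model in 4-bit to reduce GPU memory (assumes bitsandbytes is installed).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
)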

We can find the complete chat template within the tokenizer_config.json located in the Huggingface model repository. An example chat template is as follows:

<｜begin▁of▁sentence｜>User: {user_message_1}

Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

Assistant:

We can also add an optional system message:

<｜begin▁of▁sentence｜>{system_message}

User: {user_message_1}

Assistant: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

Assistant:
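
Rather than assembling these strings by hand, apply_chat_template will render the template for us; a short sketch (the system message text is only an illustration):

from transformers import AutoTokenizer

# Sketch: render the chat template to a plain string instead of token ids.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},  # illustrative system message
    {"role": "user", "content": "write a quick sort algorithm in python."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)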

Inference with vLLM (recommended) Of DeepSeek-Coder-V2

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: vLLM Pull Request.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 1
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "write a quick sort algorithm in python."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
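
The same engine can be reused for a follow-up turn by appending the assistant’s reply to the message history and re-applying the chat template; the sketch below builds on the variables defined above (the follow-up question is just an example):

# Sketch: one extra conversational turn, reusing llm, tokenizer, and sampling_params from above.
history = messages_list[1] + [
    {"role": "assistant", "content": generated_text[1]},  # the model's quick sort answer
    {"role": "user", "content": "Now add type hints to that function."},  # illustrative follow-up
]
follow_up_ids = [tokenizer.apply_chat_template(history, add_generation_prompt=True)]
follow_up = llm.generate(prompt_token_ids=follow_up_ids, sampling_params=sampling_params)
print(follow_up[0].outputs[0].text)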

Summary

DeepSeek-Coder-V2 is poised to revolutionize the field of code intelligence, offering robust performance and a wide range of functionalities.

Whether you are a developer seeking advanced code completion tools or an enterprise looking for reliable API access, DeepSeek-Coder-V2 has you covered. Explore the possibilities today by downloading the models or interacting with them through the chat website and API platform.

References

  • DeepSeek-Coder-V2 GitHub repository: https://github.com/deepseek-ai/DeepSeek-Coder-V2
  • DeepSeek-Coder-V2 models on Hugging Face: https://huggingface.co/deepseek-ai
  • DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence (technical report)
