ModelScope Community Model Digest (6.14–6.21)

 

🙋 ModelScope community updates this issue:

📟 1,154 models: Kimi-Dev-72B, MiniMax-M1, Lingshu-7B, and more;

📁 185 datasets: EQ-bench_ca, thaimos-tts-annotation, ReasonMed, and more;

🎨 63 innovative apps: MiniMax-M1, Nanonets-OCR-s, AscendMira (your personal makeup magic mirror), and more;

📄 9 articles:

  • Efficient inference of the MiniCPM4 model series with OpenVINO™

  • Nanonets-OCR-s goes open source! SoTA for converting complex documents to Markdown, reshaping complex document workflows

  • The 2025 ModelScope Developer Conference is here!

  • MiniMax-M1 open-sourced: a hybrid MoE reasoning model with a million-token context window!

  • ModelScope June 2025 release report

  • “Journey to the West” meets “Wanxiang”, champion | Shadow-puppet Journey to the West LoRA creation share

  • “Journey to the West” meets “Wanxiang”, runner-up | Wukong Biography aesthetic-enhancement LoRA creation share

  • “Journey to the West” meets “Wanxiang”, third place | Ink-wash smoke Journey to the West LoRA creation share

  • “Journey to the West” meets “Wanxiang”, third place | Cyber Wukong Journey to the West LoRA creation share

 

01

 

Model Recommendations

 

MiniMax-M1

MiniMax-M1, recently open-sourced by MiniMax, is the world's first open-source large-scale hybrid-architecture reasoning model. It supports million-token context inputs and reasoning outputs of up to 80K tokens, with 456 billion total parameters of which 45.9 billion are activated per token. It performs strongly on complex tasks such as long-context understanding, software engineering, and tool use, is highly cost-effective, and was trained efficiently with the novel reinforcement learning algorithm CISPO.

 

Model link:

https://modelscope.cn/models/MiniMax/MiniMax-M1-80k

 

Example code:

The following shows how to run inference on MiniMax-M1-40k with ms-swift. Before running inference, make sure the environment is set up:

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
 
Using transformers as the inference backend:

GPU memory usage: 8 × 80 GiB

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'

from transformers import QuantoConfig
from swift.llm import PtEngine, RequestConfig, InferRequest

quantization_config = QuantoConfig(weights='int8')
messages = [{
    'role': 'system',
    'content': 'You are a helpful assistant.'
}, {
    'role': 'user',
    'content': 'who are you?'
}]
engine = PtEngine('MiniMax/MiniMax-M1-40k', quantization_config=quantization_config)
infer_request = InferRequest(messages=messages)
request_config = RequestConfig(max_tokens=128, temperature=0)

resp = engine.infer([infer_request], request_config=request_config)
response = resp[0].choices[0].message.content
print(f'response: {response}')
"""
<think>
Okay, the user asked "who are you?" I need to respond in a way that's helpful and clear. Let me start by introducing myself as an AI assistant. I should mention that I'm here to help with information, answer questions, and assist with tasks. Maybe keep it friendly and open-ended so they know they can ask for more details if needed. Let me make sure the response is concise but informative.
</think>
I'm an AI assistant designed to help with information, answer questions, and assist with various tasks. Feel free to ask me anything, and I'll do my best to help! 😊
"""
 
For more hands-on inference tutorials, see:

MiniMax-M1 open-sourced: a hybrid MoE reasoning model with a million-token context window!

 

Kimi-Dev-72B

Kimi-Dev-72B is a large coding model newly open-sourced by Kimi (Moonshot AI), built for software engineering tasks. Optimized through large-scale reinforcement learning, it can automatically repair bugs in real code repositories and verify the fixes against test suites, and it sets a new open-source record of 60.4% on SWE-bench Verified.

 

Model link:

https://modelscope.cn/models/moonshotai/Kimi-Dev-72B

 
Example code:
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "moonshotai/Kimi-Dev-72B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

 

Nanonets-OCR-s

Nanonets-OCR-s is a powerful OCR model fine-tuned from Qwen2.5-VL-3B that runs within 9 GB of GPU memory. Using intelligent content recognition and semantic tagging, it turns messy documents into the clean, structured, context-rich Markdown that modern AI applications need. Its capabilities go far beyond traditional text extraction, and it is currently the SoTA model for image-to-Markdown conversion.

 

Model link:
https://modelscope.cn/models/nanonets/Nanonets-OCR-s

 

Example code:

Inference with transformers

Download the model

modelscope download --model nanonets/Nanonets-OCR-s --local_dir nanonets/Nanonets-OCR-s
 

Inference script

from PIL import Image
from transformers import AutoTokenizer, AutoProcessor, AutoModelForImageTextToText

model_path = "nanonets/Nanonets-OCR-s"

model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2"
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)


def ocr_page_with_nanonets_s(image_path, model, processor, max_new_tokens=4096):
    prompt = """Extract the text from the above document as if you were reading it naturally. Return the tables in html format. Return the equations in LaTeX representation. If there is an image in the document and image caption is not present, add a small description of the image inside the <img></img> tag; otherwise, add the image caption inside <img></img>. Watermarks should be wrapped in brackets. Ex: <watermark>OFFICIAL COPY</watermark>. Page numbers should be wrapped in brackets. Ex: <page_number>14</page_number> or <page_number>9/22</page_number>. Prefer using ☐ and ☑ for check boxes."""
    image = Image.open(image_path)
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image", "image": f"file://{image_path}"},
            {"type": "text", "text": prompt},
        ]},
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt")
    inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs.input_ids, output_ids)]
    output_text = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    return output_text[0]


image_path = "/path/to/your/document.jpg"
result = ocr_page_with_nanonets_s(image_path, model, processor, max_new_tokens=15000)
print(result)
 

Lingshu series

Lingshu is a medical-domain multimodal large language model open-sourced by Alibaba's DAMO Academy, released in 7B and 32B parameter versions. It achieves SOTA performance on most medical multimodal/text QA and report-generation tasks, providing efficient support for medical text processing, clinical decision support, and medical knowledge Q&A.

Lingshu-32B outperforms GPT-4.1 and Claude Sonnet 4 on most multimodal QA and report-generation tasks. Lingshu supports more than 12 medical imaging modalities, including X-ray, CT, MRI, microscopy, ultrasound, histopathology, dermoscopy, fundus, OCT, digital photography, endoscopy, and PET.

 

Model links:

Lingshu-7B

https://modelscope.cn/models/lingshu-medical-mllm/Lingshu-7B

 

Lingshu-32B

https://modelscope.cn/models/lingshu-medical-mllm/Lingshu-32B

 
Example code:

Using transformers

import torch
from modelscope import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "lingshu-medical-mllm/Lingshu-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("lingshu-medical-mllm/Lingshu-7B")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "example.png",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

 

Using vLLM

 
from vllm import LLM, SamplingParams
from qwen_vl_utils import process_vision_info
from PIL import Image
from modelscope import AutoProcessor

processor = AutoProcessor.from_pretrained("lingshu-medical-mllm/Lingshu-7B")
llm = LLM(
    model="lingshu-medical-mllm/Lingshu-7B",
    limit_mm_per_prompt={"image": 4},
    tensor_parallel_size=2,
    enforce_eager=True,
    trust_remote_code=True,
)
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=1,
    repetition_penalty=1,
    max_tokens=1024,
    stop_token_ids=[],
)

text = "What does the image show?"
image_path = "example.png"
image = Image.open(image_path)
message = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text}
        ]
    }
]
prompt = processor.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
)
image_inputs, video_inputs = process_vision_info(message)
mm_data = {}
mm_data["image"] = image_inputs

processed_input = {
    "prompt": prompt,
    "multi_modal_data": mm_data,
}

outputs = llm.generate([processed_input], sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
 

 

 
02

 

Dataset Recommendations

 

 

EQ-bench_ca

EQ-bench_ca is a dataset created by the BSC-LT team for evaluating model performance on causal-understanding tasks, focusing on testing a model's ability to reason about causal logic.

Dataset link:

https://modelscope.cn/datasets/BSC-LT/EQ-bench_ca

 

thaimos-tts-annotation

thaimos-tts-annotation is a dataset for Thai text-to-speech (TTS) that contains annotation information for Thai speech, intended to support the development and optimization of Thai TTS models.

Dataset link:

https://modelscope.cn/datasets/scb10x/thaimos-tts-annotation

 

ReasonMed
ReasonMed is the largest open-source medical reasoning dataset to date, with 370,000 high-quality question-answer examples, each accompanied by a multi-step chain-of-thought (CoT) rationale and a concise summary. They were distilled from 1.75 million initial reasoning paths generated by three competing large language models (Qwen-2.5-72B, DeepSeek-R1-Distill-Llama-70B, and HuatuoGPT-o1-70B) using a rigorous multi-agent verification and refinement pipeline.

Dataset link:

https://modelscope.cn/datasets/AI-ModelScope/ReasonMed
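 

Example code:

The datasets above can be pulled directly from the ModelScope hub. Below is a minimal loading sketch for ReasonMed using the ModelScope MsDataset API; the split name and the field inspection are illustrative assumptions, so check the dataset card for the actual schema. The same pattern applies to EQ-bench_ca and thaimos-tts-annotation.

# Minimal sketch: pull ReasonMed from the ModelScope hub with MsDataset.
# NOTE: the split name below is an assumption -- verify it on the dataset card.
from modelscope.msdatasets import MsDataset

ds = MsDataset.load('ReasonMed', namespace='AI-ModelScope', split='train')

sample = next(iter(ds))   # one QA example with its CoT rationale and summary
print(sample.keys())      # inspect the actual field names before building a pipeline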

 

03

 

Studios

 

MiniMax-M1

MiniMax-M1 is an online demo of the hybrid MoE reasoning model with a million-token context window, where users can try out its long-text processing and complex-task reasoning capabilities.

 

Demo link:

https://modelscope.cn/studios/MiniMax/MiniMax-M1

 

 

Nanonets-OCR-s

Nanonets-OCR-s is an online demo offering optical character recognition (OCR): users can upload images for text recognition and extraction.

 

Demo link:

https://modelscope.cn/studios/nanonets/Nanonets-ocr-s

 

 

 

04

 

Featured Community Articles

 


 


 

 

👇 Follow the ModelScope WeChat official account
for more technical updates~