ModelScope Community Model Roundup (6.14-6.21)
Author: ModelScope Community | Updated: 2025-06-23 10:35:29

🙋 ModelScope community updates this week:
📟 1,154 models: Kimi-Dev-72B, MiniMax-M1, Lingshu-7B, and more;
📁 185 datasets: EQ-bench_ca, thaimos-tts-annotation, ReasonMed, and more;
🎨 63 creative applications: MiniMax-M1, Nanonets-OCR-s, AscendMira (your personal beauty magic mirror), and more;
📄 9 articles:
- Efficient inference of the MiniCPM4 series with OpenVINO™
- Nanonets-OCR-s open-sourced! SoTA for converting complex documents to Markdown, reshaping complex document workflows
- The 2025 ModelScope Developer Conference is coming!
- MiniMax-M1 open-sourced: a hybrid MoE reasoning model with a million-token context window!
- ModelScope June 2025 release report
- "Journey to the West" x Wanxiang contest, champion | Shadow-puppet Journey to the West LoRA creation showcase
- "Journey to the West" x Wanxiang contest, runner-up | Wukong Zhuan aesthetic-enhancement LoRA creation showcase
- "Journey to the West" x Wanxiang contest, third place | Ink-wash smoke Journey to the West LoRA creation showcase
- "Journey to the West" x Wanxiang contest, third place | Cyber Wukong Journey to the West LoRA creation showcase
MiniMax-M1
MiniMax-M1 is MiniMax's recently open-sourced model, billed as the world's first open-source large-scale hybrid-architecture reasoning model. It supports million-token context input and up to 80K tokens of reasoning output, with 456 billion total parameters and roughly 45.9 billion parameters activated per token. It performs strongly on complex tasks such as long-context understanding, software engineering, and tool use, is highly cost-effective, and was trained efficiently with the novel CISPO reinforcement-learning algorithm.
Model link:
https://modelscope.cn/models/MiniMax/MiniMax-M1-80k
Sample code:
The following demonstrates inference on MiniMax-M1-40k with ms-swift. Before running inference, make sure the environment is set up:
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
Using transformers as the inference backend:
GPU memory usage: 8 × 80 GiB
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'
from transformers import QuantoConfig
from swift.llm import PtEngine, RequestConfig, InferRequest
# Quantize the weights to int8 with Quanto; the memory figure above (8 x 80 GiB) assumes this setting
quantization_config = QuantoConfig(weights='int8')
messages = [{
    'role': 'system',
    'content': 'You are a helpful assistant.'
}, {
    'role': 'user',
    'content': 'who are you?'
}]
# PtEngine runs inference with the transformers (PyTorch) backend
engine = PtEngine('MiniMax/MiniMax-M1-40k', quantization_config=quantization_config)
infer_request = InferRequest(messages=messages)
request_config = RequestConfig(max_tokens=128, temperature=0)
resp = engine.infer([infer_request], request_config=request_config)
response = resp[0].choices[0].message.content
print(f'response: {response}')
"""
<think>
Okay, the user asked "who are you?" I need to respond in a way that's helpful and clear. Let me start by introducing myself as an AI assistant. I should mention that I'm here to help with information, answer questions, and assist with tasks. Maybe keep it friendly and open-ended so they know they can ask for more details if needed. Let me make sure the response is concise but informative.
</think>
I'm an AI assistant designed to help with information, answer questions, and assist with various tasks. Feel free to ask me anything, and I'll do my best to help! 😊
"""
For more hands-on inference tutorials, see:
MiniMax-M1 open-sourced: a hybrid MoE reasoning model with a million-token context window!
Kimi-Dev-72B
Model link:
https://modelscope.cn/models/moonshotai/Kimi-Dev-72B
from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "moonshotai/Kimi-Dev-72B"
# Load the model weights from ModelScope and shard them across the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
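For a 72B model, waiting for the full completion can take a while. As an optional addition (not part of the original example), transformers' TextStreamer prints tokens to stdout as they are generated:
from transformers import TextStreamer
# Stream decoded tokens to stdout, skipping the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **model_inputs,
    max_new_tokens=512,
    streamer=streamer,
)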
Nanonets-OCR-s
Nanonets-OCR-s is a powerful OCR model fine-tuned from Qwen2.5-VL-3B that runs in as little as 9 GB of VRAM. Through intelligent content recognition and semantic tagging, it converts messy documents into the clean, structured, context-rich Markdown that modern AI applications need. Its capabilities go far beyond traditional text extraction; it is currently the SoTA model for image-to-Markdown conversion.
Inference with transformers
Download the model
modelscope download --model nanonets/Nanonets-OCR-s --local_dir nanonets/Nanonets-OCR-s
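Alternatively, the weights can be fetched from Python with the ModelScope SDK. A small sketch; the local_dir argument is assumed to be supported by your modelscope version, otherwise omit it and use the returned cache path:
from modelscope import snapshot_download
# Download the model weights and return the local directory that contains them
model_dir = snapshot_download('nanonets/Nanonets-OCR-s', local_dir='nanonets/Nanonets-OCR-s')
print(model_dir)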
Inference script
from PIL import Image
from transformers import AutoTokenizer, AutoProcessor, AutoModelForImageTextToText
model_path = "nanonets/Nanonets-OCR-s"
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    attn_implementation="flash_attention_2"
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path)
processor = AutoProcessor.from_pretrained(model_path)
def ocr_page_with_nanonets_s(image_path, model, processor, max_new_tokens=4096):
    prompt = """Extract the text from the above document as if you were reading it naturally. Return the tables in html format. Return the equations in LaTeX representation. If there is an image in the document and image caption is not present, add a small description of the image inside the <img></img> tag; otherwise, add the image caption inside <img></img>. Watermarks should be wrapped in brackets. Ex: <watermark>OFFICIAL COPY</watermark>. Page numbers should be wrapped in brackets. Ex: <page_number>14</page_number> or <page_number>9/22</page_number>. Prefer using ☐ and ☑ for check boxes."""
    image = Image.open(image_path)
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image", "image": f"file://{image_path}"},
            {"type": "text", "text": prompt},
        ]},
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], padding=True, return_tensors="pt")
    inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Keep only the newly generated tokens (drop the prompt portion)
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(inputs.input_ids, output_ids)]
    output_text = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    return output_text[0]
image_path = "/path/to/your/document.jpg"
result = ocr_page_with_nanonets_s(image_path, model, processor, max_new_tokens=15000)
print(result)
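To convert a multi-page document, the same function can be applied page by page. A usage sketch, assuming the pages have already been exported as individual images into a pages/ folder (a hypothetical path):
import os
# OCR every page image in the folder and join the results into one Markdown document
page_dir = "pages"
pages = sorted(f for f in os.listdir(page_dir) if f.lower().endswith((".jpg", ".png")))
markdown_pages = [
    ocr_page_with_nanonets_s(os.path.join(page_dir, name), model, processor, max_new_tokens=15000)
    for name in pages
]
print("\n\n".join(markdown_pages))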
Lingshu series
Lingshu-32B outperforms GPT-4.1 and Claude Sonnet 4 on most multimodal QA and report-generation tasks. Lingshu supports more than 12 medical imaging modalities, including X-ray, CT, MRI, microscopy, ultrasound, histopathology, dermoscopy, fundus, OCT, digital photography, endoscopy, and PET.
Lingshu-7B
https://modelscope.cn/models/lingshu-medical-mllm/Lingshu-7B
Lingshu-32B
https://modelscope.cn/models/lingshu-medical-mllm/Lingshu-32B
Using transformers
import torch
from modelscope import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "lingshu-medical-mllm/Lingshu-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("lingshu-medical-mllm/Lingshu-7B")
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "example.png",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
Using vLLM
from vllm import LLM, SamplingParams
from qwen_vl_utils import process_vision_info
from PIL import Image
from modelscope import AutoProcessor
processor = AutoProcessor.from_pretrained("lingshu-medical-mllm/Lingshu-7B")
# Two GPUs are used for tensor parallelism; up to 4 images are allowed per prompt
llm = LLM(
    model="lingshu-medical-mllm/Lingshu-7B",
    limit_mm_per_prompt={"image": 4},
    tensor_parallel_size=2,
    enforce_eager=True,
    trust_remote_code=True,
)
sampling_params = SamplingParams(
    temperature=0.7,
    top_p=1,
    repetition_penalty=1,
    max_tokens=1024,
    stop_token_ids=[],
)
text = "What does the image show?"
image_path = "example.png"
image = Image.open(image_path)
message = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": text},
        ],
    }
]
prompt = processor.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
)
image_inputs, video_inputs = process_vision_info(message)
mm_data = {"image": image_inputs}
processed_input = {
    "prompt": prompt,
    "multi_modal_data": mm_data,
}
outputs = llm.generate([processed_input], sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
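Because the engine above was created with limit_mm_per_prompt={"image": 4}, several images can be sent in one request, e.g. to compare two studies side by side. A sketch reusing the same objects; the image file names are placeholders:
# Compare two images in a single prompt (file names are placeholders)
images = [Image.open(p) for p in ["scan_before.png", "scan_after.png"]]
multi_message = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": images[0]},
            {"type": "image", "image": images[1]},
            {"type": "text", "text": "Compare the two images and describe any differences."},
        ],
    }
]
multi_prompt = processor.apply_chat_template(multi_message, tokenize=False, add_generation_prompt=True)
multi_image_inputs, _ = process_vision_info(multi_message)
multi_outputs = llm.generate(
    [{"prompt": multi_prompt, "multi_modal_data": {"image": multi_image_inputs}}],
    sampling_params=sampling_params,
)
print(multi_outputs[0].outputs[0].text)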
EQ-bench_ca
EQ-bench_ca is a Catalan-language adaptation of the EQ-Bench benchmark released by the BSC-LT team, used to evaluate models' emotional-understanding and reasoning abilities.
Dataset link:
https://modelscope.cn/datasets/BSC-LT/EQ-bench_ca
thaimos-tts-annotation
thaimos-tts-annotation is a dataset for Thai text-to-speech (TTS), containing annotations for Thai speech and intended to support the development and optimization of Thai speech-synthesis models.
Dataset link:
https://modelscope.cn/datasets/scb10x/thaimos-tts-annotation
ReasonMed
Dataset link:
https://modelscope.cn/datasets/AI-ModelScope/ReasonMed
MiniMax-M1
Demo link:
https://modelscope.cn/studios/MiniMax/MiniMax-M1
Nanonets-OCR-s
Demo link:
https://modelscope.cn/studios/nanonets/Nanonets-ocr-s