MLinfo | Machine Learning and AI Paper Summaries

transformers — 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

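🤗 Transformers is a model-definition framework for text, vision, audio, and multimodal models, usable for both inference and training.

Tags: Deep learning, Transformer, Classification, Text, Audio

Use case: Machine learning model definition
Difficulty: Easy
Cost: High

Tags: Reinforcement learning, Policy gradient (PPO / A3C), Classification, Text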

paperless-ngx — A community-supported supercharged document management system: scan, index and archive all your documents

paperless-ngx is a community-supported, supercharged document management system that can scan, index, and archive documents.

Use case: Document management
Difficulty: Easy
Cost: Low

diffusers — 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

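A library of diffusion models for image, video, and audio generation.

Tags: Generative AI, Diffusion model, Generation, Image, Text

Use case: Image, video, and audio generation
Difficulty: Easy
Cost: High

Tags: Computer vision, Object detection, Classification, Segmentation, Image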

label-studio — Label Studio is a multi-type data labeling and annotation tool with standardized output format

A tool for data labeling and annotation.

Use case: Data labeling tool
Difficulty: Easy
Cost: Low

cs249r_book — Machine Learning Systems

A book on the theory and implementation of machine learning systems.

Tags: Deep learning, Text

Use case: Machine learning systems
Difficulty: Easy
Cost: Medium

Medical_Image_Analysis — Foundation models based medical image analysis

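Medical image analysis is a research field that extracts information from image data to support medical diagnosis and treatment. This work proposes a new approach to medical image analysis based on foundation models.

Tags: Natural language processing, Large language model, Generation, Image, Text

Use case: Medical image analysis
Difficulty: Easy
Cost: High

Tags: Natural language processing, Large language model, Text, Audio, Multimodal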

screenpipe — YC (S26) | Record your screen 24/7 and plug into your agents. Local, private, secure. Connect to OpenClaw, Hermes agent and 100+ apps

A tool for capturing user activity on screen and building automated agents on top of it.

Use case: Building automated agents
Difficulty: Easy
Cost: High

Meshroom — Node-based Visual Programming Toolbox

A node-based visual programming toolbox.

Tags: Computer vision, 3D / point cloud, Image, Text, 3D

Use case: Visual programming tool
Difficulty: Easy
Cost: High

unsloth — Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.

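Unsloth Studio is a web UI for training and running open models. It is used to test and train open models such as Gemma 4 and Qwen3.5.

Tags: Natural language processing, Large language model, Text, Audio

Use case: Training and running open models
Difficulty: Easy
Cost: High

Tags: Deep learning, Transformer, Image, Text, Multimodal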

sglang — SGLang is a high-performance serving framework for large language models and multimodal models.

SGLang is a high-performance serving framework for large language models.

Use case: Serving large language models
Difficulty: Easy
Cost: High

Tags: Natural language processing, Large language model, Text, Multimodal

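ai-agent-book — Open-source main repository for the book 《深入理解 AI Agent：设计原理与工程实践》 (Understanding AI Agents in Depth: Design Principles and Engineering Practice) by 李博杰 (Li Bojie): full book text, compiled PDF, and per-chapter companion code

The main open-source repository for the book, containing the full text, a compiled PDF, and companion code organized by chapter.

Use case: Studying AI agent design principles and engineering practice
Difficulty: Easy
Cost: High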

Sana — SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

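This work introduces SANA, a high-resolution image synthesis model that generates high-quality, high-resolution images at low computational cost.

Tags: Deep learning, Transformer, Generation, Image, Text

Use case: High-resolution image synthesis
Difficulty: Easy
Cost: High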

haystack — Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

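An open-source AI orchestration framework for designing the pipelines and agent workflows needed to build LLM applications.

Tags: Deep learning, Transformer, Generation, Summarization, Text

Use case: Building LLM applications
Difficulty: Easy
Cost: High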

FunASR — Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenAI-compatible/MCP serving.

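An open-source speech recognition toolkit covering training, inference, streaming ASR, VAD, punctuation, and speaker diarization pipelines, with OpenAI-compatible and MCP serving.

Tags: Deep learning, Transformer, Classification, Detection, Text

Use case: Speech recognition
Difficulty: Easy
Cost: High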

DocsGPT — Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

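A private AI platform for agents, assistants, and enterprise search, with a built-in agent builder, deep research, document analysis, multi-model support, and API connectivity for agents.

Tags: Deep learning, Transformer, Text

Use case: Private AI agents and enterprise search
Difficulty: Easy
Cost: Medium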

rasa — 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

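rasa is an open-source machine learning framework for automating text- and voice-based conversations. It provides a broad set of features, including natural language understanding (NLU), dialogue management, and connectors to Slack, Facebook, and other channels.

Tags: Natural language processing, Text

Use case: Building chatbots
Difficulty: Easy
Cost: Medium

Tags: Suited to tabular data, Deep learning, Transformer, Classification, Detection, Image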

presidio — An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

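presidio is an open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. It supports NLP, pattern matching, and customizable pipelines.

Use case: Protecting data privacy
Difficulty: Easy
Cost: Low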

awesome-llm-unlearning — A resource repository for machine unlearning in large language models

This repository collects resources on machine unlearning for large language models.

Use case: Machine unlearning for large language models
Difficulty: Easy
Cost: High

FinGPT — FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

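This repository provides open-source financial large language models, with the trained model released on HuggingFace.

Use case: Financial large language models
Difficulty: Easy
Cost: High

Tags: Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Text, Video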

Causal-Forcing — [ICML 2026] Official codebase for "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation" & Causal Forcing++

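This paper presents "Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation".

Use case: High-quality video generation
Difficulty: Easy
Cost: High

Tags: Suited to tabular data, Natural language processing, Large language model, Image, Text, Tabular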

unstructured — Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise grade Platform product for production grade workflows, partitioning, enrichments, chunking and embedding.

An open-source ETL solution for converting documents into structured data.

Use case: Structuring documents
Difficulty: Easy
Cost: High

txtai — 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

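A framework for working with LLMs, covering semantic search, LLM orchestration, and related workflows.

Tags: Deep learning, Transformer, Generation, Text

Use case: Semantic search
Difficulty: Easy
Cost: High

GitHub repository available, 2026-07-21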

TextBlob — Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

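A library for text processing tasks such as sentiment analysis and word tokenization.

Tags: Natural language processing, Text, Audio

Use case: Text analysis
Difficulty: Easy
Cost: Medium

GitHub repository available, 2026-07-20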

Open-dLLM — Open diffusion language model for code generation — releasing pretraining, evaluation, inference, and checkpoints.

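Open-dLLM releases an open diffusion language model for code generation, including pretraining, evaluation, inference, and checkpoints.

Tags: Natural language processing, Large language model, Generation, Text

Use case: Code generation
Difficulty: Easy
Cost: High

GitHub repository available, 2026-07-19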

testtimescaling.github.io — "what, how, where, and how well? a survey on test-time scaling in large language models" repository

Repository for a survey on test-time scaling in large language models.

Use case: Test-time scaling in large language models
Difficulty: Easy
Cost: High

GitHub repository available, 2026-07-18

maths-cs-ai-compendium — Become a cracked AI/ML Research Engineer

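This repository presents ways to build the skills and knowledge needed to become an AI/ML research engineer.

Tags: Computer vision, Multimodal, Text, Audio

Use case: Developing AI/ML research engineers
Difficulty: Easy
Cost: High

Tags: Natural language processing, Large language model, Generation, Text, Multimodal

GitHub repository available, 2026-07-17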

generative-ai — Comprehensive resources on Generative AI, including a detailed roadmap, projects, use cases, interview preparation, and coding preparation.

A list of resources on generative AI.

Use case: Generative AI
Difficulty: Easy
Cost: High

GitHub repository available, 2026-07-15

vowpal_wabbit — Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

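Vowpal Wabbit is a machine learning system that advances the field with techniques such as online learning, hashing, allreduce, and reductions, which lets it be applied to a wide range of problems.

Tags: Reinforcement learning, Text

Use case: Running powerful machine learning algorithms to solve complex problems
Difficulty: Easy
Cost: Medium

GitHub repository available, 2026-07-14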

Awesome-Embodied-Robotics-and-Agent — This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥

A curated repository of research combining embodied AI or robots with large language models.

Use case: Embodied AI and robotics research
Difficulty: Easy
Cost: High

GitHub repository available, 2026-07-14

LakonLab — Official implementation of AsymFlow, pi-Flow, GMFlow

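LakonLab is an open-source project providing the official implementations of AsymFlow, pi-Flow, and GMFlow, which are flow-based generative models.

Tags: Deep learning, Model compression / quantization, Generation, Image, Text

Use case: Implementing flow-based generative models
Difficulty: Easy
Cost: Medium

GitHub repository available, 2026-07-14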

memvid — Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.

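MemVid is a serverless, single-file memory layer that gives AI agents instant retrieval and long-term memory.

Tags: Natural language processing, Large language model, Generation, Text, Video

Use case: Managing memory for AI agents
Difficulty: Easy
Cost: High

GitHub repository available, 2026-07-13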

Matcha-TTS — [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching

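Matcha-TTS is a fast TTS architecture based on conditional flow matching that takes speaker characteristics into account.

Tags: Generative AI, Diffusion model, Text, Audio

Use case: TTS architecture design
Difficulty: Easy
Cost: High

GitHub repository available, 2026-07-13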

Irodori-TTS — A Flow Matching-based Text-to-Speech Model with Emoji-driven Style Control

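A flow matching-based text-to-speech model that converts text to speech, with emoji-driven control over the speaking style.

Tags: Generative AI, Diffusion model, Generation, Text, Audio

Use case: Text-to-speech conversion
Difficulty: Easy
Cost: High

GitHub repository available, 2026-07-11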

awesome-nlp — :book: A curated list of resources dedicated to Natural Language Processing (NLP)

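This repository collects resources on natural language processing (NLP).

Tags: Natural language processing, Text

Use case: Curated NLP resources
Difficulty: Easy
Cost: Medium

GitHub repository available, 2026-07-08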

VoxCPM — VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

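A tokenizer-free TTS framework for multilingual speech generation, creative voice design, and true-to-life voice cloning.

Tags: Generative AI, Speech / music generation, Generation, Text, Audio

Use case: Multilingual speech generation
Difficulty: Easy
Cost: Medium

GitHub repository available, 2026-07-07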

enchanted — Enchanted is iOS and macOS app for chatting with private self hosted language models such as Llama2, Mistral or Vicuna using Ollama.

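Enchanted is an iOS and macOS app for chatting with privately self-hosted language models such as Llama2, Mistral, or Vicuna.

Use case: iOS and macOS app for chatting with self-hosted language models
Difficulty: Easy
Cost: High

GitHub repository available, 2026-07-02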

langextract — A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

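A Python library that uses LLMs to extract structured information from unstructured text.

Tags: Natural language processing, Large language model, Image, Text

Use case: Information extraction from natural-language text
Difficulty: Easy
Cost: High

GitHub repository available, 2026-06-30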

mxcp — Model eXecution + Context Protocol: Enterprise-Grade Data-to-AI Infrastructure

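MXCP (Model eXecution + Context Protocol) is data-to-AI infrastructure aimed at solving business problems. It simplifies the transformation of data for use by AI.

Use case: Building data-to-AI infrastructure to improve business operations
Difficulty: Easy
Cost: High

GitHub repository available, 2026-06-30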

ComfyUI-LTXVideo — LTX-Video Support for ComfyUI

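Adds LTX-Video support to ComfyUI.

Tags: Generative AI, Diffusion model, Generation, Image, Text

Use case: LTX-Video support in ComfyUI
Difficulty: Easy
Cost: High

Tags: Quality prediction / anomaly detection, Deep learning, Model compression / quantization, Generation, Image, Text

GitHub repository available, 2026-06-25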

ml-mdm — Train high-quality text-to-image diffusion models in a data & compute efficient manner

Use case: Generation
Difficulty: Easy
Cost: High