MLinfo | Machine Learning & AI Paper Summaries

Tags: Computer vision, Object detection, Classification, Detection, Segmentation

ultralytics — Ultralytics YOLO26, YOLO11, YOLOv8 — object detection, instance segmentation, semantic segmentation, image classification, pose estimation, object tracking

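ultralytics is a library built on YOLO (You Only Look Once) models. It provides high-accuracy object detection along with the other tasks listed above, such as segmentation, classification, pose estimation, and tracking.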

Use case: Object detection
Difficulty: Easy
Cost: Low

Tags: Computer vision, Object detection, Classification, Detection, Segmentation

yolov5 — Ultralytics YOLOv5 in PyTorch for object detection, instance segmentation, classification, training, and export.

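A PyTorch implementation of the YOLOv5 object detection model that supports training and exporting trained models to other formats (for example ONNX or TensorRT) for deployment.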

Use case: Object detection
Difficulty: Easy
Cost: High
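YOLO-style detectors such as YOLOv5 typically emit many overlapping candidate boxes and remove duplicates with non-maximum suppression (NMS). The sketch below is a plain-Python greedy NMS for illustration only, not YOLOv5's own code; the `(x1, y1, x2, y2)` box format and the 0.45 IoU threshold are assumptions chosen for the example.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.45) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining candidates whose IoU with the kept box exceeds the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # -> [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is suppressed; the third box does not overlap and survives.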

Tags: Computer vision, Object detection, Classification, Segmentation, Image

label-studio — Label Studio is a multi-type data labeling and annotation tool with standardized output format

A data labeling and annotation tool that supports multiple data types and exports labels in a standardized output format.

Use case: Data labeling tool
Difficulty: Easy
Cost: Low
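A common step after labeling is converting exported boxes into the format a detector trains on. The sketch below assumes Label Studio rectangle results store `x`, `y` (top-left corner), `width`, and `height` as percentages (0 to 100) of the image size, with no rotation, and converts them to the normalized, center-based YOLO text format. The helper names are hypothetical and are not part of Label Studio.

```python
from typing import Tuple


def ls_rect_to_yolo(x: float, y: float, width: float, height: float) -> Tuple[float, float, float, float]:
    """Convert a percentage-based top-left box to a normalized center-based box.

    Assumption: inputs are percentages (0-100) of image size, top-left origin, no rotation.
    Returns (x_center, y_center, w, h), each in the 0-1 range used by YOLO label files.
    """
    w = width / 100.0
    h = height / 100.0
    x_center = x / 100.0 + w / 2.0  # shift from left edge to box center
    y_center = y / 100.0 + h / 2.0  # shift from top edge to box center
    return x_center, y_center, w, h


def to_yolo_line(class_id: int, x: float, y: float, width: float, height: float) -> str:
    """Format one box as a YOLO label line: 'class x_center y_center w h'."""
    xc, yc, w, h = ls_rect_to_yolo(x, y, width, height)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A box starting at 10% / 20% of the image and spanning 30% x 40% of it.
    print(to_yolo_line(0, 10.0, 20.0, 30.0, 40.0))  # -> 0 0.250000 0.400000 0.300000 0.400000
```

Because both representations are relative to image size, the conversion needs no pixel dimensions; a rotated box would need extra handling.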

Tags: Quality prediction/anomaly detection, Computer vision, Segmentation, Classification, Detection, Image

cvat — Computer Vision Annotation Tool (CVAT) is a leading platform for building high-quality visual datasets for vision AI. It offers open-source, cloud, and enterprise products, as well as labeling services, for image, video, and 3D annotation with AI-assisted labeling, quality assurance, team collaboration, analytics, and developer APIs.

CVAT is an industry-standard data engine for machine learning. Teams of all sizes use it, and it handles datasets of varying scale.

Use case: Data labeling and management
Difficulty: Easy
Cost: High

Tags: Quality prediction/anomaly detection, Machine learning, Supervised learning, Classification, Detection, Image

fiftyone — Refine high-quality datasets and visual AI models

FiftyOne is a library for curating datasets and visualizing AI models. It helps raise dataset quality and makes model behavior easier to inspect.

Use case: Dataset curation and AI model visualization
Difficulty: Easy
Cost: Low

FunASR — Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenAI-compatible/MCP serving.

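FunASR is an open-source speech recognition toolkit covering training, inference, streaming ASR, voice activity detection (VAD), punctuation restoration, and speaker diarization.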

Tags: Deep learning, Transformer, Classification, Detection, Text

Use case: Speech recognition
Difficulty: Easy
Cost: High

Tags: Suited to tabular data, Deep learning, Transformer, Classification, Detection, Image

presidio — An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

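presidio is an open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) in text, images, and structured data. It supports NLP, pattern matching, and customizable pipelines.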

Use case: Protecting data privacy
Difficulty: Easy
Cost: Low
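presidio separates finding PII from anonymizing it. The sketch below illustrates only the pattern-matching side of that idea with the standard `re` module: it detects spans, then replaces each one with a `<ENTITY_TYPE>` placeholder. The patterns, entity labels, and functions are hypothetical and are not presidio's API. presidio itself combines recognizers like these with NLP models and configurable operators.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, deliberately simple patterns; real PII detection needs far more robust rules.
PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3,4}-\d{4}\b"),
}


def find_pii(text: str) -> List[Tuple[int, int, str]]:
    """Detection step: return (start, end, entity_type) spans, sorted by start offset."""
    spans = []
    for entity, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), entity))
    return sorted(spans)


def anonymize(text: str) -> str:
    """Anonymization step: replace each detected span with a <ENTITY_TYPE> placeholder."""
    out, last = [], 0
    for start, end, entity in find_pii(text):
        if start < last:  # skip a span that overlaps one already replaced
            continue
        out.append(text[last:start])
        out.append(f"<{entity}>")
        last = end
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    print(anonymize("Contact alice@example.com or 090-1234-5678."))
    # -> Contact <EMAIL_ADDRESS> or <PHONE_NUMBER>.
```

Keeping detection and replacement as separate functions makes it possible to swap in a different masking strategy (hashing, partial masking) without touching the detectors.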

3D-Aware VLMs with Implicit and Explicit Geometries

This work proposes VLM-IE3D (Vision-Language Models with Implicit and Explicit 3D geometry), a new approach to 3D spatial understanding. VLM-IE3

Tags: Computer vision, 3D/point clouds, Detection, Image, Text

Use case: Developing 3D spatial understanding techniques
Difficulty: Hard
Cost: High

When Are Reasoning-Based Guardrails Not Efficient? ResponseGuard: A Fast Vision-Language Guard for Real-Time Moderation

A vision-language AI assistant returns its answer as a stream of generated tokens. Therefore, a safety guard t

Tags: Deep learning, Model compression/quantization, Detection, Image, Text

Use case: Detection
Difficulty: Hard
Cost: High

HGeo-TopoMap: Boosting Topological Mapping with Hierarchical Geometric Priors

Topological maps are key outputs of autonomous driving perception systems, delivering essential road informati

Tags: Deep learning, Transformer, Detection

Use case: Detection
Difficulty: Easy
Cost: Low

Explainable Deepfake Detection Challenge

Deepfake detection is moving beyond binary classification decisions toward systems that can also explain the v

Tags: Explainability, Computer vision, Image classification, Classification, Detection, Generation

Use case: Classification
Difficulty: Easy
Cost: Low

Tags: Easy to try on CPU, Machine learning, Supervised learning, Classification, Detection, Regression

Source: GitHub | Code on GitHub | 2026-07-23

pycaret — Open-source, low-code AutoML platform for Python. PyCaret 4.0: sklearn-native engine + React control plane.

pycaret is an open-source, low-code AutoML platform for Python. Version 4.0 features a scikit-learn-native engine and a React-based control plane.

Use case: AutoML platform
Difficulty: Easy
Cost: Low

Tags: Sensors/time series, Deep learning, Model compression/quantization, Detection, Segmentation, Embeddings

Source: arXiv | Code on GitHub | 2026-07-22

Not All Patches are Equal: Sampling Matters for Visible-Infrared Pre-Training

Visible-infrared (VIS-IR) alignment is a key pre-training task for robust multi-sensor perception. Most existi

Use case: Detection
Difficulty: Hard
Cost: High

Tags: Quality prediction/anomaly detection, Deep learning, Model compression/quantization, Detection, Segmentation, Video

Source: arXiv | Code on GitHub | 2026-07-22

Efficient Tracking and Understanding Object Transformations

Tracking objects through state transformations is essential for understanding real-world dynamics. However, ex

Use case: Tracking objects through state transformations
Difficulty: Hard
Cost: High

Tags: Natural language processing, Prompt engineering, Detection, Image, Text

Source: arXiv | Code on GitHub | 2026-07-22

ReferTrack: Referring Then Tracking for Embodied Visual Tracking

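ReferTrack is an embodied visual tracking system that follows a target vehicle specified in natural language. It first identifies the referred vehicle and then predicts its motion in order to track it.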

Use case: Making a vehicle follow a target vehicle
Difficulty: Hard
Cost: High

Tags: Computer vision, Object detection, Classification, Detection, Segmentation

Source: GitHub | Code on GitHub | 2026-07-22

supervision — We write your reusable computer vision tools. 💜

supervision provides reusable computer vision utilities that users can build on to create their own computer vision tools.

Use case: Building custom computer vision tools
Difficulty: Easy
Cost: High

Source: GitHub | Code on GitHub | 2026-07-22

insightface — State-of-the-art 2D and 3D Face Analysis Project

A foundational project for 2D and 3D face analysis that applies state-of-the-art techniques.

Tags: Computer vision, 3D/point clouds, Classification, Detection, 3D

Use case: Face authentication
Difficulty: Easy
Cost: High

Source: arXiv | Code on GitHub | 2026-07-21

Detect Early, Escalate Rarely: Anytime Detection of AI-Generated Video from the Compressed Bitstream

Detectors for AI-generated video are evaluated offline. A clip is decoded to pixels and scored once, increasin

Tags: Easy to try on CPU, Deep learning, CNN, Detection, Image, Text

Use case: Detection
Difficulty: Hard
Cost: High

Tags: Sensors/time series, Computer vision, Video recognition, Detection, Generation, Image

Source: arXiv | Code on GitHub | 2026-07-21

NGPS: GPS-Denied Aerial Geo-Localization and 2.5D Reconstruction via Deep Satellite Image Matching and Multi-Rate Sensor Fusion

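This work proposes NGPS (Next-Generation Positioning System), a framework for localizing high-altitude aerial vehicles without satellite positioning signals. NGPS estimates position without using GPS signals. N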

Use case: GPS-free localization for high-altitude flight
Difficulty: Hard
Cost: High

Tags: Sensors/time series, Quality prediction/anomaly detection, Natural language processing, Fine-tuning, Detection, Image

Source: arXiv | Code on GitHub | 2026-07-20

Polar Coordinate-based Differential Evolution for Moving Target Search Using Vision Sensor on Unmanned Aerial Vehicles

In search and rescue operations, there is a period known as the "golden time" during which the probability of

Use case: Detection
Difficulty: Easy
Cost: Medium

Source: Hugging Face | Code on GitHub | On Hugging Face | 2026-07-20

Differentiable Logic Gate Networks for Low-Latency EEG Classification on Edge Devices

Real-time EEG classification on edge devices is bottlenecked by the floating-point arithmetic of conventional

Tags: Easy to try on CPU, Reinforcement learning, Multi-agent, Classification, Detection

Use case: Classification
Difficulty: Easy
Cost: Low

Source: Hugging Face | On Hugging Face | 2026-07-20

FlowMimic: Mask-free Visual Editing and Generation with Pixel-pair Warped Flow Field for Online Video Editing Data Generation and Modality Mimicry

In line with the prevailing direction of vision research, we explore the integration of both generation and ed

Tags: Quality prediction/anomaly detection, Natural language processing, Large language models, Detection, Generation, Segmentation

Use case: Detection
Difficulty: Easy
Cost: High

Source: Hugging Face | On Hugging Face | 2026-07-20

Token-Level Off-Policy Learning for Faithful Generation Under Distribution Shift

We propose Token-Level Off-Policy Labeling (TOPL), an off-policy training paradigm that reframes post-training

Tags: Explainability, Natural language processing, Fine-tuning, Classification, Generation, Anomaly detection

Use case: Classification
Difficulty: Easy
Cost: High

Source: Hugging Face | On Hugging Face | 2026-07-20

Self-State Attacks on Self-Hosted AI Agents: How Far Can OS Defenses Go?

Self-hosted AI agents read and write their own memory and configuration files to function. An agent may get co

Tags: Deep learning, Transformer, Detection

Use case: Detection
Difficulty: Easy
Cost: Medium

Source: Hugging Face | On Hugging Face | 2026-07-19

TimeLens2: Generalist Video Temporal Grounding with Multimodal LLMs

Video multimodal large language models (MLLMs) can describe what happens in a video, but rarely identify when

Tags: Natural language processing, Large language models, Detection, Text, Video

Use case: Detection
Difficulty: Easy
Cost: High

Source: Hugging Face | On Hugging Face | 2026-07-16

Trajectory-aware Cross-view Geo-localization with Sequential Observations

Cross-view geo-localization matches ground-level observations against geo-tagged satellite imagery. Recent met

Tags: Quality prediction/anomaly detection, Deep learning, Model compression/quantization, Detection, Image, Text

Use case: Detection
Difficulty: Easy
Cost: High

Source: Hugging Face | Code on GitHub | On Hugging Face | 2026-07-13

RAGU: A Multi-Step GraphRAG Engine with a Compact Domain-Adapted LLM

Graph retrieval-augmented generation (GraphRAG) enhances large language models with structured knowledge, yet

Tags: Natural language processing, Large language models, Detection, Generation, Summarization

Use case: Detection
Difficulty: Easy
Cost: High

Source: Hugging Face | On Hugging Face | 2026-07-10

REBASE: Reference-Background Subspace Elimination for Training-Free In-Context Segmentation

Training-free in-context segmentation enables new object categories to be introduced at inference time from a

Tags: Quality prediction/anomaly detection, Natural language processing, Prompt engineering, Detection, Segmentation, Image

Use case: Detection
Difficulty: Easy
Cost: High

Tags: Deep learning, Transformer, Classification, Detection, Segmentation

Source: GitHub | Code on GitHub | 2026-07-10

pytorch-grad-cam — Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.

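This library provides advanced AI explainability and visualization for computer vision. It supports CNNs, Vision Transformers, classification, object detection, segmentation, image similarity, and various other computer vision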

Use case: AI explainability and visualization
Difficulty: Easy
Cost: Low
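pytorch-grad-cam implements Grad-CAM among several related methods. Grad-CAM weights each feature map A^k of a chosen layer by the spatial mean of the class-score gradient with respect to that map, sums the weighted maps, and applies a ReLU. The sketch below shows only that arithmetic in plain Python on toy nested lists. It is not the library's API, and capturing real activations and gradients from a network (for example via hooks on a target layer) is out of scope here.

```python
from typing import List

Matrix = List[List[float]]  # one H x W feature map


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Grad-CAM heatmap from per-channel activations A^k and gradients dy/dA^k.

    alpha_k = spatial mean of the gradient for channel k
    heatmap = ReLU(sum_k alpha_k * A^k)
    """
    h, w = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * w for _ in range(h)]
    for act, grad in zip(activations, gradients):
        alpha = sum(sum(row) for row in grad) / (h * w)  # importance weight of channel k
        for i in range(h):
            for j in range(w):
                heatmap[i][j] += alpha * act[i][j]
    # ReLU keeps only regions that increase the target class score.
    return [[max(0.0, v) for v in row] for row in heatmap]


if __name__ == "__main__":
    acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-0.5, -0.5], [-0.5, -0.5]]]
    print(grad_cam(acts, grads))  # -> [[1.0, 0.0], [0.0, 1.0]]
```

In practice the heatmap is usually normalized to [0, 1] and upsampled to the input image size before being overlaid; both steps are omitted to keep the core formula visible.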

Source: Hugging Face | Code on GitHub | On Hugging Face | 2026-07-05

Benchmarking Sensor Robustness in Plasma Diagnostic Models: A Systematic Evaluation on TokaMark

Plasma diagnostic models for tokamak fusion devices are almost universally evaluated on clean, complete sensor

Tags: Suited to tabular data, Easy to try on CPU, Sensors/time series, Deep learning, Transformer, Detection

Use case: Detection
Difficulty: Easy
Cost: Medium

Source: GitHub | Code on GitHub | 2026-06-25

face_recognition — The world's simplest facial recognition api for Python and the command line

The face_recognition library provides a simple face recognition API for Python and the command line. It is built on dlib's face recognition model, which keeps face recognition easy to use.

Tags: Machine learning, Supervised learning, Classification, Detection

Use case: Building facial recognition systems
Difficulty: Easy
Cost: Low
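face_recognition represents each face as a 128-dimensional encoding and decides whether two faces match by the Euclidean distance between encodings. The sketch below mirrors that comparison in plain Python: a known face counts as a match when its distance is at or below `tolerance`. The 0.6 default reflects the library's documented default tolerance as we understand it. The function names echo the library's `face_distance` and `compare_faces`, but this is a stand-alone illustration using toy 3-dimensional vectors, not the library itself.

```python
import math
from typing import List, Sequence


def face_distance(known: List[Sequence[float]], candidate: Sequence[float]) -> List[float]:
    """Euclidean distance between a candidate encoding and each known encoding."""
    return [math.dist(k, candidate) for k in known]


def compare_faces(known: List[Sequence[float]], candidate: Sequence[float],
                  tolerance: float = 0.6) -> List[bool]:
    """True where the distance is at or below the tolerance (a lower tolerance is stricter)."""
    return [d <= tolerance for d in face_distance(known, candidate)]


if __name__ == "__main__":
    known_encodings = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # toy stand-ins for 128-D encodings
    unknown = [0.1, 0.2, 0.2]
    print(compare_faces(known_encodings, unknown))  # -> [True, False]
```

Lowering `tolerance` reduces false matches at the cost of missing some true ones, so the threshold is a trade-off to tune per application.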