screenpipe — YC (S26) | AI that knows what you've seen, said, or heard. Records everything you do, say, hear 24/7, local, private, secure
A tool for recognizing user activity and building autonomous agents.
- Use case
- Building autonomous agents
- Difficulty
- Easy
- Cost
- High
Search results for "text"
190 results
A tool for recognizing user activity and building autonomous agents.
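🤗 Transformers is a framework that supports complex model definitions across text, vision, and audio, and can be used for both inference and training.
paperless-ngx is a community-supported, supercharged document management system that can scan, index, and archive documents.
A library of diffusion models, usable for image, video, and audio generation.
A tool for data labeling and annotation.
A book on the theory and implementation of machine learning systems.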
This repository collects resources on unlearning for large models.
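A node-based visual programming tool.
Unsloth Studio is a web UI for training and running open models. It is used to help test and train open models such as Gemma4 and Qwen3.5.
SGLang is a high-performance serving framework for large language models.
SANA is a research project introducing the high-resolution image generation model SANA, which produces high-quality, high-resolution images at low computational cost.
Introduces model support for long video generation.
An open-source AI orchestration framework. It lets you design the pipelines and agent workflows needed to build LLM applications.
This repository provides tokenizer optimizations.
Learns representations from electrophysiological signals to support the development of brain-computer interfaces.
An open-source ETL solution for structuring documents.
A framework for working with LLMs, covering semantic search, LLM orchestration, and more.
A library for text analysis, including sentiment analysis and word segmentation.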
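We developed a system that creates precise constructions, such as mechanical designs and technical drawings, from natural language. This system, to create precise constructions that satisfy Geometric Constraints, Constraint DSL (D
Reduction techniques that cut tokens, layers, heads, dimensions, and attention patterns to speed up LLM inference have grown into a broad paradigm. The degree of acceleration actually achieved by an implementation of a reduction method, hardware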
Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with
Clinical early warning systems built on electronic health records, in which clinical observations are recorded
This paper proposes standardizing numeric formats, which makes interpreting and manipulating numbers more efficient.
Existing sparse attention and KV cache compression methods for long-context LLM inference typically apply fixe
Scene Graphs (SGs) provide structured representations of visual scenes by modeling objects and their pairwise
Proposes TRL-Bench, a benchmark for representation models of learnable tabular signals that makes it easier to evaluate models trained under different paradigms.
Dynamic origin-destination (OD) flow generation seeks to synthesize realistic mobility dynamics from temporal
This study proposes FAME, a model that combines multiple time series forecasts while accounting for the characteristics of each individual series, which enables more accurate predictions.
Ensuring the reliability of Large Language Models (LLMs) under distribution drift requires inference-time adap
Court simulation bridges legal education and judicial practice, yet human-based simulations are costly and dif
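Digitizing dictionaries of low-resource and extinct languages is important, but digitizing multimodal dictionaries has been difficult until now. This study uses recent vision-language models to make dictionary digitization easier, and the characters in the dictionary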
Large language model agents increasingly rely on skills: reusable procedural documents encoding workflows, too
As large language models (LLMs) are increasingly applied to real-world legal tasks, evaluating the reliability
Large language models (LLMs) sometimes exhibit language confusion when generating non-English text. Existing a
Reasoning Vision-Language Models (VLMs) achieve strong performance on complex multimodal tasks, but reliable r
Clinical ultrasound images often contain artificial markers, such as measurement calipers and text, to assist
Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with r
Building a foundation for turning data into AI can solve business problems. This work proposes MXCP (Model eXecution + Context Protocol), which simplifies data transformation, and AI app
A framework for state-of-the-art tokenizer-free TTS, covering multilingual speech generation, creative voice design, and true-to-life cloning.
Despite the success of image generation from text descriptions, it still faces challenges that are difficult t
Simulation plays a key role in automated robotics research supported by large language models (LLMs). However,
Mathematical reasoning has long served as a stringent test of machine intelligence; over the past decade, it h
Recently, large time series models (LTSMs) have gained increasing attention due to their similarities to large
Symbolic music evaluation for large language models remains fragmented across representations, datasets, and m
Chain-of-thought (CoT) reasoning has proven effective for enhancing problem-solving in large language models.
Multi-contrast brain MRI provides complementary soft-tissue characteristics that aid in the screening and diagn
On-policy distillation (OPD) has become a central post-training tool for large language models (LLMs), providi
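presidio is an open-source framework for detecting, redacting, masking, and anonymizing sensitive data in text, images, and structured data. It supports natural language processing, pattern matching, and customizable pipelines.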
LLM agents increasingly rely on external inference conditions: prompts, tools, memory, SOPs, skills, and harne
Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet the
Diffusion language models (DLMs) offer substantial speed advantages through parallel decoding, but the lack of
Current open-weight large language models (LLMs) are prone to malicious finetuning attacks, which could compro
Human evaluation plays a critical role in assessing the quality of generated text. However, the reliability an
MRI preprocessing defines the input distribution seen by brain MRI foundation models, yet it is usually treate
Designing 3D metamaterial microstructures that meet the intended functions remains a major challenge, as it ty
Recent agent frameworks such as Claude Code, Codex, and OpenClaw are strong at tool use and orchestration, but
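A survey repository on test-time adaptation of large language models.
In classification problems, labels are often unavailable, which makes conventional learning algorithms struggle; with a method called In-Context Multiple Instance Learning, in low-label settings, efficiently
This repository collects resources on natural language processing (NLP).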
Adapting large language models (LLMs) to clinical workflows often requires costly fine-tuning or manual prompt
This paper proposes a method, structured as a framework, for deploying VLA models on edge hardware. The method uses edge hardware to V
Repository-level coding benchmarks such as SWE-bench have driven a rapid surge in the capabilities of coding a
On-policy distillation (OPD) is increasingly used to improve large language model reasoning, but its training
We present SigmaScale, a method for learning auxiliary scaling matrices S to aid truncated Singular Value Deco
Large language models exhibit impressive zero-shot capabilities across a wide range of downstream tasks. Howev
We introduce MMAE, a Massive Multitask Audio Editing benchmark, serving as the first comprehensive evaluation
Despite being a pivotal frontier, interactive world modeling remains underexplored in terms of the versatile c
Video understanding is being rapidly transformed by multimodal large language models (MLLMs), as research move
We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that model
Retrieval for search agents is still inherited from non-agentic information retrieval: a retriever ranks the c
Despite advances in 3D scene understanding, existing 3D Large Multimodal Models operate in offline settings, r
Confidence-based loss weighting is usually avoided in generative models because it accelerates errors when the
Developers increasingly use AI tools such as ChatGPT, Copilot, and Claude in everyday software workflows, but
Hard-negative source selection for dense retrieval is usually decided only after fine-tuning and downstream ev
Harmony is a compact symbolic layer where mathematical pitch relations, acoustic consonance, and musical conve
This paper, Causal-Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive
TorchKM is an open-source library for kernel machines, including support vector machines, kernel logistic regr
This study proposes a Distributed Conversational Framework for human-robot collaboration.
Agent systems increasingly use textual skills to encode reusable task procedures, but injecting these skills i
Linear activation steering has gained popularity as a simple and empirically effective way to control language
Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before a smaller reade
Latent visual reasoning (LVR) inserts supervised latent tokens between perception and answer generation in vis
Evaluating LLM mediators remains challenging, as mediation unfolds as a real-time trajectory shaped by disputa
Object insertion aims to seamlessly composite a reference object into a specified region of a background image
Self-evolving agents require adaptation after deployment, but existing approaches assume a usable learning lo
Persistent AI assistants, such as OpenClaw, accumulate large collections of related memories over long-term in
We introduce UnpredictaBench, an evaluation that tests the ability of large language models (LLMs) to capture
While Vision-Language Models (VLMs) have shown strong visual reasoning capabilities, their spatial reasoning a
Causal graphs provide a high-level language for making mechanisms transparent. Recent work uses Large Language
Despite the rapid progress of Vision-Language Models (VLMs), the field lacks benchmarks that rigorously diagno
We study the transformation of autoregressive models (ARLMs) into diffusion language models (DLMs). Rather tha
In real-world applications, models are expected to perform reliably across diverse settings. Yet, many existin
Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing
Role-playing language agents (RPLAs) should play characters whose values and behavior evolve as the story prog
Planning for real-world problems by language models often involves both world and user constraints, which may
Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs i
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by under
Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery
Large language models can reproduce training data, but existing memorization evaluations mostly measure whethe
Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predo
Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the
Video event prediction (VEP) requires models to infer unobserved future states from partial video evidence. Ex
Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, r
Large language models are increasingly used to simulate social media users and infer how individuals may respo
Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit
Performs text-to-speech conversion using Emotion-driven Style Control, making it possible to turn emotional text into emotionally expressive speech.
Diffusion models have demonstrated strong performance in time series modeling due to their ability to progress
Muon improves training efficiency over Adam in large language-model training by about two times, but the local
Large language models are increasingly evaluated by other models, raising a natural question: can a model pred
Vision language models (VLMs) excel at many tasks but still struggle with spatial reasoning when critical info
Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on
We introduce VideoKR, the first large-scale training corpus specifically designed to strengthen knowledge- and
Experience internalization converts contextual experience from past interactions into reusable parametric capa
We study the personal camera roll visual question answering setting. In this setting, a conversational AI assi
System prompt optimization improves agent behavior without modifying the underlying model, yielding human-read
Processing video in vision-language models is expensive: each frame occupies hundreds of tokens, and inference
Learning representations of CAD models is a largely open problem. While 3D representation learning has flouris
Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and
Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forw
Autoregressive mesh generation has gained attention by tokenizing meshes into sequences and training models in
Training Data Attribution (TDA) seeks to trace a model's predictions back to its training data. The gold stand
Large language models (LLMs) are increasingly proposed as clinical agents, yet static, single-turn benchmarks
Instruction-guided speech editing requires a model to modify specified speech attributes while preserving unre
Text-to-image models rely on text prompts as their primary interface to human intent. Prompts are encoded by a
Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge
Selection is a core operation in interactive image editing. To be practical, a user should be able to specify
Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet
Recent progress in Large Language Model (LLM) agents has enabled promising advances in automated data science.
Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet
Multimodal agents in robotics, AR, and autonomous driving must reason about places and layouts from continuous
On-policy self-distillation, where a language model conditions on privileged context to supervise its own gene
High-quality pretraining data is a central ingredient in modern language models, but German-language resources
Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and utilize infor
Wide-baseline matching (WBM) requires integrating geometric understanding, viewpoint changes, fine-grained per
We present AAD-1, an Asymmetric Adversarial Distillation framework for One-step autoregressive image-to-video
Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs
Existing benchmarks for MLLM-generated web artifacts assess interaction through local evidence and miss the re
Structured financial audit verification is difficult for language-model agents because correctness depends on
Computer-use agents extend language models from text generation to sustained interaction with files, terminals
Large language model (LLM) agents are evolving from request-response assistants into long-running software act
Graph Language Models (GLMs) have become a promising direction for adapting Large Language Models (LLMs) to gr
Training and scaling Large Language Models demand enormous computational resources, motivating both efficient
Large language models improve final-answer accuracy through extended chain-of-thought reasoning, but often spe
Agentic language model systems alternate between two structurally distinct step types: structured tool calls (
Large language models (LLMs) have recently been adopted as synthetic agents for public opinion simulation, off
Financial AI agents often fail for a simple reason: they make users carry the complexity. A user must repeated
Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existin
Agentic LLMs with web search change the threat model for text anonymization: weak contextual cues can become c
We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, i
On-Policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward more s
Abundant procedural knowledge on the Web holds great potential for helping agents solve long-horizon tasks. Ho
This repository provides a library for running Reinforcement Learning on Lecture Learning Models.
Large language models are increasingly deployed as coding agents, shifting safety from individual responses to
The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the
Open-dLLM releases an open diffusion language model for code generation, publishing its pretraining, evaluation, inference, and checkpoints.
Reinforcement learning with verifiable rewards has rapidly advanced reasoning in vision-language models. Howe
Agentic search systems iteratively interact with retrieval models to answer complex queries. Despite substanti
AI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genui
AI benchmarks have well-documented limitations, with prior work examining contamination, saturation, and const
Large Language Models exhibit paradoxical fragility in fundamental arithmetic, implying a disconnect between i
Multimodal Large Language Models (MLLMs) have demonstrated significant achievements in general visual question
Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as the cornerstone for shaping the
Speech translation systems increasingly span speech-to-text translation (S2TT), speech-to-speech translation (
Humans can effortlessly perceive spatial layouts, form cognitive representations, reason about spatial relatio
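This repository collects resources on the nature of LLMs such as ChatGPT, GPT-3, and FlanT5, and on in-context learning and prompt engineering.
Poker is a classic AI problem. However, reaching strong expert-level play has required lengthy training and solving. With LLMs, training and solvers become unnecessary, and playing poker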
Memory-augmented LLM agents tackle complex long-horizon tasks by recursively summarizing interaction trajector
Music recommendation systems typically treat songs as opaque tokens, relying on collaborative interaction hist
We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision b
Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all ex
Language models can use verifiable rewards to improve at a wide variety of reasoning tasks. However, both para
Efficient inference is critical for long-context language models, where attention computation and KV-cache acc
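FlowEdit, the official implementation of an improved approach to inference models for image editing.
MemVid proposes a serverless, single-file memory layer that gives AI agents instant retrieval and long-term memory.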
We present DEI: Diversity in Evolutionary Inference, a distributed Quality-Diversity (QD) search framework tha
Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple eva
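Matcha-TTS is a fast TTS architecture based on conditional flow matching that takes speaker characteristics into account.
Custom Diffusion, presented at CVPR 2023, is a diffusion model that lets you customize the text-to-image generation process. Because you can set requirements when generating images from text, the flexibility of image generation
PaddleNLP is a powerful library that makes classification models and language models easy to use, and it ships with an excellent collection of models called the model zoo.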
Vision-Language Models (VLMs) are increasingly deployed in embodied environments, where they need to produce nume
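rasa is an open-source machine learning framework for automating text- and voice-based conversations. It offers a wide range of features, including natural language understanding (NLU), dialogue management, and connectors to Slack, Facebook, and other channels.
A Python library that uses LLMs for information extraction in natural language processing.
We proposed a system that uses LLMs (large language models) to optimize text parameters. A single system could handle a variety of tasks (single-task, multi-task, unseen inputs, and so on). In addition, the system optim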
💫 Industrial-strength Natural Language Processing (NLP) in Python
VidCom2 provides plug-and-play inference acceleration for Video Large Language Models by improving video compression.
Becoming a cracked AI/ML Research Engineer presents methods for building the skills and knowledge of AI/ML researchers.
We present ARES-LSHADE, a memetic differential-evolution variant submitted to the GECCO 2026 competition on LL
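CoupleEvo proposes an automated heuristic design approach, powered by large language models, for coupled optimization problems. It presents three evolutionary coordination strategies.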
Speech-based large language models are typically constrained to spoken replies, which limits their user-facing
Diffusion-based image editing has achieved strong visual fidelity under natural language instructions, yet mos