🧪

11ku7-ai

Research Team Based in India

Exploring AI-assisted development tools, authentication systems, and code analysis technologies.

🇮🇳 India 🤖 AI Research 💻 Open Source

Current Project

🚀

11ku7-ai-nodecoder

AI Coding Tool For Hobbyists & Researchers

An autonomous AI developer with Agent Mode, MCP integration, GitHub support, and 22+ customizable themes. Generate, edit, and deploy code directly from your terminal.

View Project →

Learnings & Outcomes

🛠️ Technology Stack

Node.js

Runtime

Express.js

Web Framework

MongoDB

Database

JWT

Authentication

Socket.IO

Real-time Comm

Neo-Blessed

Terminal UI

Marked

Markdown Parser

Highlight.js

Syntax Highlighting

AI Providers: OpenRouter, Gemini, OpenAI, Anthropic, Grok, Groq, Ollama

🔐 Device Locking Mechanism

We implemented a secure device-locking system that ties contribution logins to specific devices, preventing unauthorized sharing while maintaining a smooth user experience.

How It Works:

  1. Device Fingerprint: Generate unique device identifier using hardware characteristics
  2. Login Identifier: Store device-username binding in MongoDB collection
  3. Token Validation: Each request validates both JWT token and device fingerprint
  4. Single Device Lock: One active login per user - switching devices requires logout
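The four steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: the traits that go into the fingerprint, the `DeviceLock` class, and the in-memory `Map` standing in for the MongoDB collection are all assumptions made for the example.

```javascript
const crypto = require("node:crypto");
const os = require("node:os");

// Hash a set of device traits into a stable identifier.
// Keys are sorted so the same traits always produce the same hash.
function deviceFingerprint(traits) {
  const canonical = JSON.stringify(
    Object.keys(traits).sort().map((key) => [key, String(traits[key])])
  );
  return crypto.createHash("sha256").update(canonical).digest("hex");
}

// Collect traits from the local machine (an illustrative choice of fields).
function localTraits() {
  const cpus = os.cpus();
  return {
    platform: os.platform(),
    arch: os.arch(),
    hostname: os.hostname(),
    cpuModel: cpus.length > 0 ? cpus[0].model : "unknown",
  };
}

// In-memory stand-in for the username -> fingerprint collection (hypothetical).
class DeviceLock {
  constructor() {
    this.bindings = new Map();
  }
  // Returns true if the login is allowed, false if another device holds the lock.
  login(username, fingerprint) {
    const bound = this.bindings.get(username);
    if (bound !== undefined && bound !== fingerprint) return false;
    this.bindings.set(username, fingerprint);
    return true;
  }
  // Per-request check, intended to run alongside JWT validation.
  isAuthorized(username, fingerprint) {
    return this.bindings.get(username) === fingerprint;
  }
  // Logging out releases the lock so another device can log in.
  logout(username) {
    this.bindings.delete(username);
  }
}
```

Calling `deviceFingerprint(localTraits())` yields the identifier for the current machine; a second device stays locked out until `logout` releases the binding.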

🎫 JWT Authentication

Implemented secure stateless authentication using JSON Web Tokens with automatic expiry handling and session management.

Token Features

  • 30-day expiry for contribution login access
  • 2-hour expiry for community keys
  • Automatic background validation
  • Secure key storage in .env

Security Measures

  • Bcrypt password hashing
  • Rate limiting (express-rate-limit)
  • CORS protection
  • Server-side session revocation
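To make the two expiry windows concrete, here is a minimal HS256 sign/verify sketch built only on Node's `crypto` module. The function names and the `TTL` table are hypothetical, and the sketch skips header validation; a production system would more likely rely on a maintained JWT library.

```javascript
const crypto = require("node:crypto");

const b64url = (value) => Buffer.from(value).toString("base64url");

// Expiry windows in seconds, taken from the token features listed above.
const TTL = { contribution: 30 * 24 * 60 * 60, community: 2 * 60 * 60 };

// Sign a payload as a compact HS256 JWT that expires after ttlSeconds.
function signToken(payload, secret, ttlSeconds, nowSeconds = Math.floor(Date.now() / 1000)) {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify({ ...payload, iat: nowSeconds, exp: nowSeconds + ttlSeconds }));
  const signature = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${signature}`;
}

// Return the payload if the signature matches and the token is unexpired, else null.
function verifyToken(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  const expected = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(signature, "base64url");
  // Constant-time comparison; lengths must match before timingSafeEqual is called.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  return payload.exp > nowSeconds ? payload : null;
}
```

A community key would be issued with `signToken(payload, secret, TTL.community)` and a contribution login with `TTL.contribution`. Server-side revocation would need an extra lookup on top of this check, since a signed token alone cannot be recalled.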

🔮 Holograph - Universal Code Analysis Engine

Our most innovative creation - the Holographic Lexer provides language-agnostic code analysis with intelligent symbol tracking and type inference.

Core Components

  • UniversalLexer: Polyglot tokenizer
  • FlowGraph: Symbol graph builder
  • Diagnostics: Unused/undefined detection
  • Linker: Cross-file resolution

Supported Languages

  • JavaScript / TypeScript
  • Python
  • Java / C / C++
  • Go / Rust / Ruby

Key Innovations:

  • 📊 Heap-based Symbol Tracking: Every definition gets a unique heap ID for refactoring
  • 🔗 Type Inference: Automatically detects types from constructors, assignments, and return values
  • 📝 JSDoc Parsing: Extracts @param, @returns, @type annotations
  • 🎯 Scope Boundary Detection: Tracks function/class boundaries for context-aware analysis
  • 🐍 Indentation Languages: Special handling for Python/YAML via virtual brace injection
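Virtual brace injection can be illustrated with a small indentation tracker. This sketch is an assumption about the general technique, not Holograph's code: it emits a `{` marker when indentation deepens and a `}` marker when it returns, so a brace-oriented scope tracker can process indentation-based sources. It ignores tabs, line continuations, and multi-line strings.

```javascript
// Convert indentation-delimited blocks into explicit "{" / "}" marker lines.
function injectVirtualBraces(source) {
  const out = [];
  const indents = [0]; // stack of active indentation widths
  for (const rawLine of source.split("\n")) {
    if (rawLine.trim() === "") continue; // blank lines do not change block depth
    const indent = rawLine.length - rawLine.trimStart().length;
    if (indent > indents[indents.length - 1]) {
      indents.push(indent);
      out.push("{"); // deeper indentation opens a virtual block
    }
    while (indent < indents[indents.length - 1]) {
      indents.pop();
      out.push("}"); // shallower indentation closes one block per level
    }
    out.push(rawLine.trim());
  }
  while (indents.length > 1) {
    indents.pop();
    out.push("}"); // close any blocks still open at end of file
  }
  return out;
}
```

For a Python function with a nested `if`, the output interleaves the original statements with one `{` per indent and one `}` per dedent, which is the shape a brace-based scope detector expects.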

🧠 ReAct Agent Architecture

The core of our Agent Mode - a ReAct (Reasoning + Acting) loop that enables autonomous task completion with intelligent decision-making.

Agent Loop Cycle:

  1. Observe: Gather context from project structure, memory, and external sources
  2. Think: Generate reasoning about the current state and next action
  3. Act: Execute a tool (file edit, shell command, search, etc.)
  4. Reflect: Evaluate result and update memory for next iteration
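The cycle above can be sketched as a plain loop. Everything here is illustrative: `runAgent`, the shape of the `think` callback, and the scripted policy are assumptions, and a real agent would call a model asynchronously where this sketch calls a synchronous function.

```javascript
// Run an observe -> think -> act -> reflect loop until the policy reports
// completion or the step budget is exhausted.
function runAgent({ goal, tools, think, maxSteps = 500 }) {
  const memory = []; // record of completed steps, fed back into each observation
  for (let step = 1; step <= maxSteps; step++) {
    // Observe: gather the current context.
    const observation = { goal, step, history: memory.slice() };
    // Think: decide on the next action (or decide to stop).
    const decision = think(observation);
    if (decision.done) return { status: "done", answer: decision.answer, memory };
    // Act: execute the chosen tool, reporting unknown tools as a result.
    const tool = tools[decision.tool];
    const result = tool ? tool(decision.args) : `unknown tool: ${decision.tool}`;
    // Reflect: store the outcome so the next iteration can use it.
    memory.push({ step, tool: decision.tool, args: decision.args, result });
  }
  return { status: "step_limit", memory };
}

// Usage with a scripted policy standing in for the model.
const tools = { add: ({ a, b }) => a + b };
const think = ({ history }) =>
  history.length === 0
    ? { tool: "add", args: { a: 2, b: 3 } }
    : { done: true, answer: history[history.length - 1].result };
const outcome = runAgent({ goal: "add 2 and 3", tools, think });
```

The `maxSteps` default mirrors the 500-step ceiling mentioned for Auto Mode; the loop returns a `step_limit` status instead of running indefinitely.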

Key Features

  • Up to 500 autonomous steps in Auto Mode
  • Persistent session memory across turns
  • Efficiency protocol to prevent redundant actions
  • Scrutiny system for human-like decision making

Memory System

  • Hot memory: Active session cache
  • Persistent memory: Cross-session archival
  • Memory hydration from past turns
  • Relevant file detection from history

🔌 MCP (Model Context Protocol) Integration

Extend the agent's capabilities by connecting to external MCP servers for specialized tools and knowledge bases.

MCPClientManager

  • Multi-server connection management
  • StdioClientTransport for local servers
  • Auto server selection based on query
  • Generic tool invocation via schema

Capabilities

  • Memory persistence (store/retrieve)
  • External API access
  • Custom tool definitions
  • JSON schema argument building

Config: ~/.nodecoder/mcp-servers.json | SDK: @modelcontextprotocol/sdk

📄 PDF Vision Pipeline

A unique approach to PDF understanding - convert PDF pages to high-quality images and leverage vision models for analysis.

Pipeline Flow:

  1. PDF Attachment: User attaches PDF via /browse command
  2. Page Conversion: pdftoppm renders pages at 300 DPI with anti-aliasing
  3. Vision Analysis: Agent uses read_pdf_page_visual tool to request specific pages
  4. AI Interpretation: Vision model extracts text, diagrams, tables from image

Technical Details

  • Uses poppler-utils (pdftoppm)
  • 300 DPI high-quality rendering
  • Anti-aliasing enabled (-aa yes)
  • Supports multi-page PDFs
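As a rough illustration of the page conversion step, the helper below builds a pdftoppm argument list for a single page at 300 DPI with anti-aliasing on. The function name and the choice of PNG output are assumptions; the flags themselves (`-png`, `-r`, `-aa`, `-f`, `-l`) are standard pdftoppm options.

```javascript
// Build the argument list for rendering one PDF page to PNG with pdftoppm.
function pdftoppmArgs(pdfPath, outputPrefix, page, dpi = 300) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  return [
    "-png",              // write PNG files instead of PPM
    "-r", String(dpi),   // resolution in DPI
    "-aa", "yes",        // font anti-aliasing
    "-f", String(page),  // first page to render
    "-l", String(page),  // last page to render
    pdfPath,
    outputPrefix,        // pdftoppm appends the page number and extension
  ];
}
```

The list can be handed to `child_process.execFile("pdftoppm", args, callback)`, which avoids shell quoting problems with unusual file names.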

Why Vision over OCR?

  • Preserves layout and structure
  • Understands diagrams and charts
  • Handles complex formatting
  • No OCR library dependencies

🔍 Web Search & Browser Integration

One important learning was that live retrieval works better when the agent uses a real browser execution path instead of relying on static search text alone. The current implementation uses built-in Chrome/Chromium browser automation to search the live web, open result pages, and read page text directly.

What We Built

  • Built-in live web search
  • Chrome/Chromium browser automation via CDP
  • Search result page parsing and filtering
  • Direct page-text extraction for follow-up reading
  • Works as an agent tool, not just a UI feature

What We Learned

  • Fresh information needs browser-grounded retrieval
  • Separating search and page-reading improves agent reasoning
  • Real browser automation is more robust for dynamic sites
  • Direct content extraction reduces hallucinated summaries
  • Web access becomes more useful when tied into tool workflows

Core flow: web_search → open result pages → browser_read_page-style page text extraction

⏰ Cron Job Scheduling

Universal task scheduler that allows the agent to execute autonomous tasks on a recurring basis. Supports natural language scheduling and persistent background execution.

Smart Scheduling

  • Natural language to Cron conversion
  • "every 5 minutes", "every day at 1pm"
  • Support for complex Cron expressions
  • Common schedule presets built-in
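The two example phrases above map onto five-field cron expressions. A minimal pattern-based converter might look like this; the function name and the narrow set of supported phrasings are assumptions, and the project's actual parser is not shown in the source.

```javascript
// Convert a small set of English schedule phrases to 5-field cron expressions.
// Returns null when the phrase is not recognised.
function toCron(phrase) {
  const text = phrase.trim().toLowerCase();

  // "every N minutes" -> "*/N * * * *"
  let m = text.match(/^every (\d+) minutes?$/);
  if (m) {
    const n = Number(m[1]);
    return n >= 1 && n <= 59 ? `*/${n} * * * *` : null;
  }

  // "every day at 1pm" or "every day at 9:30 am" -> "M H * * *"
  m = text.match(/^every day at (\d{1,2})(?::(\d{2}))?\s*(am|pm)$/);
  if (m) {
    let hour = Number(m[1]);
    const minute = m[2] ? Number(m[2]) : 0;
    if (hour < 1 || hour > 12 || minute > 59) return null;
    if (m[3] === "pm" && hour !== 12) hour += 12; // 1pm -> 13
    if (m[3] === "am" && hour === 12) hour = 0;   // 12am -> 0
    return `${minute} ${hour} * * *`;
  }

  return null;
}
```

Returning `null` for unrecognised input leaves room for a fallback, such as treating the text as a raw cron expression or asking the model to translate it.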

Robust Persistence

  • JSON-based job persistence
  • Concurrency guards (no overlap)
  • Job toggling (Enable/Disable)
  • Automatic recovery on restart

Cron tools: manage_cron_job (add, list, remove, toggle)

🏢 Google Workspace Integration

Seamless productivity suite integration allowing the agent to interact with your workspace data securely through authenticated API calls.

📧 Gmail

  • Send/Read emails
  • Query-based searching
  • Snippet extraction

📅 Calendar

  • Read upcoming events
  • Create new events
  • Attendee management

📂 Drive & Docs

  • Search/Read Drive files
  • Create/Edit Google Docs
  • Batch formatting updates

📊 Sheets

  • Read/Write cell ranges
  • Create new spreadsheets
  • Advanced batch updates

GWS tools: send_gmail, read_emails, read_calendar, create_calendar_event, search_drive, read_drive_file, read_sheet, write_sheet...

🛠️ Core Agent Tools

One of the biggest implementation lessons was that tool architecture needs clear layering. In practice, custom agents run on a configurable essential tool layer, while authenticated server agents unlock a larger premium tool surface from the registry.

Custom Agent Essential Tools

These are the built-in essential tools exposed through the local essential-tools layer and configurable inside nodecoder-agents.js.

  • execute_shell
  • load_file_full
  • web_search
  • browser_read_page
  • general_chat
  • provide_code_analysis
  • read_pdf_page_visual
  • manage_cron_job
  • send_gmail
  • read_emails
  • read_calendar
  • create_calendar_event
  • search_drive
  • read_drive_file
  • read_sheet
  • write_sheet
  • create_sheet
  • batch_update_sheet
  • create_doc
  • read_doc
  • insert_text_doc
  • batch_update_doc
  • computer_screenshot
  • computer_mouse_move
  • computer_mouse_click
  • computer_keyboard_type
  • computer_keyboard_press

What This Taught Us

  • Custom agents are not limited to prompts - they also inherit a configurable base tool policy
  • Essential tools cover execution, analysis, search, PDF vision, scheduling, Workspace, and computer control
  • The agent can run server-free while still keeping a rich built-in capability set
  • Tool layering makes it easier to reason about local vs hosted agent behavior

Server Agent Tools: Core Editing

  • download_asset
  • scan_vulnerabilities
  • get_file_dependencies
  • find_symbol_definition
  • run_diagnostics
  • generate_edit_plan
  • apply_change_interactive
  • apply_multi_file_change

Server Agent Tools: GitHub

  • github_create_repo
  • github_list_repos
  • github_delete_repos
  • github_explore_repo
  • github_load_file_full
  • github_apply_change_interactive
  • github_write_file
  • github_read_file
  • github_create_issue
  • github_list_issues
  • github_read_issue
  • github_comment_issue
  • github_close_issue
  • github_create_branch
  • github_create_pr
  • github_sync_changes

Registry Learning

  • Server-agent tools are layered on top of the essential tool base
  • The registry separates local essentials from authenticated premium tools
  • Custom agents can stay offline, while server agents gain the broader editing and GitHub surface
  • This split made the architecture easier to extend without breaking local-first flows
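The layering described here can be pictured as a set union gated by mode and authentication. The sketch below is hypothetical throughout: `resolveTools`, the mode names, and the option shape are invented for illustration and do not reflect the registry's real interface.

```javascript
// Resolve the tool names available to an agent.
// essential: built-in local tools; server: tools unlocked by authentication;
// custom: locally defined tools; disabled: essentials switched off in config.
function resolveTools({ essential, server = [], custom = [], disabled = [], mode, authenticated = false }) {
  // The essential layer is always the base, minus anything disabled locally.
  const names = new Set(essential.filter((name) => !disabled.includes(name)));
  // Local custom tools apply to custom agents and to the combined mode.
  if (mode === "custom" || mode === "both") custom.forEach((name) => names.add(name));
  // Server tools are layered on top only for authenticated sessions.
  if ((mode === "server" || mode === "both") && authenticated) {
    server.forEach((name) => names.add(name));
  }
  return [...names].sort();
}
```

Because the essential layer is computed first and never depends on the server, an offline custom agent and an authenticated server agent share the same base and differ only in what is added on top.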

🖥️ Cross-Platform PTY Terminal

A dual-implementation persistent terminal system that provides true pseudo-terminal (PTY) capabilities across platforms, enabling the agent to run interactive shell sessions.

🐧 Linux/macOS (Python PTY)

  • Uses Python's pty module for Unix PTY
  • Spawns shell via pty.spawn()
  • Full terminal emulation (bash/zsh)
  • Signal forwarding (Ctrl+C → SIGINT)
  • Works on Termux (Android) natively

🪟 Windows (node-pty)

  • Uses node-pty native addon
  • Spawns PowerShell with no profile loading
  • ConPTY integration for true Windows PTY
  • Custom prompt suppression for clean output
  • ANSI escape code stripping

Key Architecture Decisions:

  • 🔄 Persistent Sessions: Terminal stays alive across multiple agent tool calls, preserving environment variables and working directory
  • 📊 Output Buffering: Ring buffer collects output with configurable timeouts and completion detection via exit code markers
  • 🧹 Smart Cleanup: ANSI escape sequences and control characters stripped from output before feeding back to AI
  • ⏱️ Timeout Protection: Configurable command timeouts (30s default) with automatic process interruption
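Two of these decisions, exit-code markers and ANSI cleanup, are easy to show in isolation. The sketch below assumes a POSIX shell (`$?`), a marker made only of letters, digits, and underscores, and hypothetical helper names; it does not implement the ring buffer or the timeout logic.

```javascript
// Remove ANSI escape sequences and stray control characters from terminal output.
function stripAnsi(text) {
  return text
    .replace(/\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g, "")  // OSC sequences ended by BEL or ST
    .replace(/\x1b\[[0-?]*[ -\/]*[@-~]/g, "")           // CSI sequences such as colours
    .replace(/[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/g, "");  // other controls; keeps \t, \n, \r
}

// Wrap a command so the shell prints a unique marker plus the exit code when it finishes.
function wrapCommand(command, marker) {
  return `${command}; echo "${marker}:$?"`;
}

// Scan buffered output for the marker. Returns null while the command is still
// running, otherwise the exit code and the cleaned output that preceded the marker.
function parseCompletion(buffer, marker) {
  const match = buffer.match(new RegExp(`${marker}:(\\d+)`));
  if (!match) return null;
  return {
    exitCode: Number(match[1]),
    output: stripAnsi(buffer.slice(0, match.index)).trimEnd(),
  };
}
```

Requiring digits after the marker means the echoed command line itself (which contains `$?` rather than a number) is not mistaken for completion.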

🌐 Headless WebUI Mode

A full web-based interface that runs without any terminal TUI, enabling remote access from browsers and mobile devices via Socket.IO real-time communication.

WebUI Features

  • IDE-style layout with file explorer
  • Monaco editor for file editing
  • Real-time chat with markdown rendering
  • Status bar with all TUI controls
  • Mobile-responsive design

Architecture

  • Express + Socket.IO server
  • Automatic setup wizard for auth
  • Gateway token for remote security
  • Cloudflare tunnel support built-in
  • TUI ↔ WebUI shared state bridge

🔬 Diff Validation Engine

A multi-pass validation system that catches AI code generation errors before applying changes, using holographic analysis and structural validation.

Validation Passes

  • Structural syntax validation
  • Holographic scope analysis
  • Search/Replace match verification
  • Context-aware error recovery

Error Prevention

  • Catches duplicate inserts
  • Detects missing context lines
  • Validates bracket/brace balance
  • Auto-retry with AI guidance
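Two of these checks, search/replace match verification and bracket balance, can be sketched in a few lines. The helper names are hypothetical and the balance check is deliberately naive: it does not skip brackets inside strings or comments, which a real validator would need to handle.

```javascript
// Count non-overlapping occurrences of needle in haystack.
function countOccurrences(haystack, needle) {
  if (needle === "") return 0;
  let count = 0;
  for (let i = haystack.indexOf(needle); i !== -1; i = haystack.indexOf(needle, i + needle.length)) {
    count++;
  }
  return count;
}

// Naive bracket balance check over (), [] and {}.
function bracketsBalanced(code) {
  const closers = { ")": "(", "]": "[", "}": "{" };
  const stack = [];
  for (const ch of code) {
    if (ch === "(" || ch === "[" || ch === "{") stack.push(ch);
    else if (ch in closers && stack.pop() !== closers[ch]) return false;
  }
  return stack.length === 0;
}

// Validate one search/replace edit; return the new content or a list of errors.
function validateEdit(original, search, replace) {
  const matches = countOccurrences(original, search);
  if (matches === 0) return { ok: false, errors: ["search block not found"] };
  if (matches > 1) return { ok: false, errors: ["search block is ambiguous"] };
  const index = original.indexOf(search);
  // Splice by index rather than String.replace so "$" in the replacement is literal.
  const updated = original.slice(0, index) + replace + original.slice(index + search.length);
  if (bracketsBalanced(original) && !bracketsBalanced(updated)) {
    return { ok: false, errors: ["edit unbalances brackets"] };
  }
  return { ok: true, updated };
}
```

The error strings are the kind of concrete feedback that could be sent back to the model for an automatic retry.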

🛡️ Review Council

Another major learning was that autonomous coding becomes safer when code generation and code review are separated into independent AI roles. Review Council acts as a second-pass audit layer that inspects proposed diffs before changes are applied.

Independent Audit Role

  • Reviews generated diffs before apply
  • Checks scope against the actual user goal
  • Flags unrelated edits and accidental deletions
  • Verifies plan completeness for the target file

Why It Matters

  • Reduces self-bias from the generating agent
  • Catches silent scope creep before it lands
  • Produces concrete fix guidance for retries
  • Turns review into a first-class agent capability

☁️ Server Agent

We learned that centrally delivered prompts and tool definitions make the hosted agent easier to evolve across versions. Server Agent mode loads authenticated, version-matched runtime behavior from the backend instead of depending only on local configuration.

Runtime Delivered by Backend

  • Version-matched prompts
  • Hosted tool definitions
  • Essential tool policy from server
  • Session-based authenticated recovery

Learning Outcome

  • Central control improves compatibility
  • Remote updates reduce client drift
  • Hosted definitions simplify premium rollout
  • Authentication and runtime design can stay connected

🛠️ Custom Agent Architecture

This became one of the strongest architecture learnings in the project. We moved beyond a fixed assistant and made the agent itself programmable through nodecoder-agents.js. A custom agent can redefine prompts, strategies, essential tool policy, and custom tools, allowing fully offline and domain-specific agent runtimes.

What a Custom Agent Can Control

  • Custom system prompt template
  • Workflow strategies: local, hybrid, remote, skills
  • Essential tool enable/disable settings
  • Custom tool definitions and implementations
  • Server-free mode via server_tools: false

Implementation Power

  • Inline tool handlers inside the config file
  • External tool implementations via blueprint files
  • Custom domain runtimes, not just coding presets
  • Can reshape how the orchestrator behaves
  • Works offline with local prompts and tools only

Why It Is More Powerful Than Skills

  • 🧩 Skills extend the workflow by adding instructions and resources for an existing agent runtime.
  • 🛠️ Custom agents redesign the runtime by changing prompts, strategy selection, tool policy, and tool availability.
  • 🚀 Result: skills make the agent better at a task, but custom agents let you decide what kind of agent it is in the first place.

🔀 Both Mode

One useful architectural lesson was that users do not want to choose between hosted reliability and local freedom. Both Mode keeps server-delivered tools active while also re-enabling local custom tools and custom strategies from the local agent configuration.

What Gets Combined

  • Authenticated server toolset
  • Local custom tools
  • Local custom strategy prompts
  • Shared essential tool layer

Learning Outcome

  • Hybrid agent architecture is practical
  • Centralized and local extension can coexist
  • Power users keep flexibility without losing hosted capabilities
  • Tool merging matters as much as prompt merging

🧩 Agent Skill System

The skill system taught us how to package reusable workflow knowledge cleanly. Instead of loose prompt fragments, skills are structured around SKILL.md files with metadata, instructions, and optional resource directories that are exposed to the agent on demand.

How It Works

  • SKILL.md metadata + prompt instructions
  • Auto-discovered from configured skills directories
  • Enabled or disabled per session
  • Injected into the agent as the primary workflow guide

Resource Model

  • On-demand scripts/ for executable helpers
  • On-demand references/ docs for reading
  • Optional assets/ for templates or data
  • Progressive disclosure keeps base context lean

⚙️ Daemon Mode

We also learned how to adapt the same agent into a no-TUI automation runtime. Daemon Mode runs over standard input and output, making it suitable for background services, server execution, and scripted one-shot tasks.

Supported Flows

  • Interactive REPL operation
  • Single-run --prompt execution
  • Persistent sessions
  • Background-friendly standard I/O integration

Learning Outcome

  • One agent core can support multiple interfaces
  • TUI logic can be separated from execution logic
  • Automation and interactivity can share the same session model
  • Developer tools can evolve into service runtimes

🖱️ Computer Use on Windows & WSL

Computer use support showed us that GUI automation becomes much more reliable when the agent is visually grounded before acting. The desktop control layer supports Windows and WSL2 with real screenshot capture, mouse control, typing, and hotkeys.

Capabilities

  • Physical screen capture
  • Mouse move and click control
  • Keyboard typing and shortcut pressing
  • PowerShell-backed execution on Windows/WSL

Reliability Design

  • Coordinate grounding grid drawn on screenshots
  • Visual-first action planning
  • DPI-aware screenshot capture
  • Less coordinate hallucination from the model

📱 Mobile Use via Termux + ADB

Mobile automation extended the same action model into Android. Through Termux and ADB, the agent can inspect the live screen, tap, long-press, type text, send navigation events, and switch apps for mobile-first automation flows.

Android Control Layer

  • ADB screenshot capture
  • Tap and long-press input
  • Keyboard text input
  • Navigation and recent-app key events
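Each item in the control layer corresponds to a standard `adb` invocation. The builder below is a hypothetical sketch (the `adb` object and its method names are invented here) that returns argument arrays rather than running anything; the underlying `screencap`, `input tap`, `input swipe`, `input text`, and `input keyevent` commands are standard Android shell tools.

```javascript
// Build argument lists for common adb actions (to be passed to the adb binary).
const adb = {
  // Capture the screen as PNG bytes on stdout.
  screenshot: () => ["exec-out", "screencap", "-p"],
  // Tap at pixel coordinates.
  tap: (x, y) => ["shell", "input", "tap", String(x), String(y)],
  // A swipe that starts and ends at the same point acts as a long press.
  longPress: (x, y, ms = 800) =>
    ["shell", "input", "swipe", String(x), String(y), String(x), String(y), String(ms)],
  // `input text` expects spaces encoded as %s; other shell metacharacters are not escaped here.
  typeText: (text) => ["shell", "input", "text", text.replace(/ /g, "%s")],
  // Send a key event by name, e.g. HOME, BACK, APP_SWITCH.
  key: (name) => ["shell", "input", "keyevent", `KEYCODE_${name}`],
};
```

Passing these arrays to `child_process.execFile("adb", args)` keeps the desktop-style action model (screenshot, click, type, key press) while swapping in mobile primitives.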

What We Learned

  • Desktop-style agent tools can map onto mobile primitives
  • ADB provides a practical control bridge for Android
  • Visual grounding still matters on small screens
  • Termux makes mobile automation developer-friendly

📱 Telegram Bot Integration

Full Telegram bot interface that mirrors the terminal experience, enabling mobile-first AI coding from any device with Telegram installed.

Features

  • Complete command system (/agent, /chat, /cwd)
  • Inline keyboard buttons for consent
  • File upload/download support
  • Real-time typing indicators

Technical

  • node-telegram-bot-api integration
  • Markdown/HTML message formatting
  • Rate limiting and owner-only access
  • Seamless state sharing with TUI

🔗 Remote Access & Tunneling

Built-in support for accessing the agent remotely via secure tunnels, enabling development from any location without complex network configuration.

  • 🛡️ Gateway Tokens: Secure token-based authentication for remote WebUI connections
  • ☁️ Cloudflare Tunnel: Auto-provisions cloudflared quick tunnels for public HTTPS URLs
  • 🌐 Zero-Config Remote: Setting two environment variables (REMOTE=true TUNNEL=true) enables full remote setup
  • 🔒 Auth Middleware: Socket.IO middleware validates tokens before allowing connection
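A token-checking middleware in the Socket.IO style takes `(socket, next)` and either calls `next()` or `next(error)`. The sketch below assumes the client sends the gateway token in `socket.handshake.auth.token`; the `gatewayAuth` name and the `GATEWAY_TOKEN` variable in the usage note are hypothetical.

```javascript
const crypto = require("node:crypto");

// Build a middleware that rejects connections lacking the expected gateway token.
function gatewayAuth(expectedToken) {
  // Hash both sides so the constant-time comparison always sees equal lengths.
  const expected = crypto.createHash("sha256").update(expectedToken).digest();
  return (socket, next) => {
    const auth = socket.handshake && socket.handshake.auth;
    const token = auth && typeof auth.token === "string" ? auth.token : "";
    const given = crypto.createHash("sha256").update(token).digest();
    if (crypto.timingSafeEqual(given, expected)) next();
    else next(new Error("unauthorized"));
  };
}
```

With a Socket.IO server instance `io`, this would be registered as `io.use(gatewayAuth(process.env.GATEWAY_TOKEN))`, so the check runs before any connection handler.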

🎨 Theming System

22+ built-in themes with a modular theme engine that controls colors across the entire TUI and WebUI simultaneously.

Theme Categories

  • Game-inspired (Cyberpunk, Zelda, Mario)
  • IDE-style (Monokai, Dracula, Solarized)
  • Nature-themed (Ocean, Forest, Sunset)
  • Custom user-defined themes

Implementation

  • Theme JSON with accent/bg/border colors
  • Runtime theme switching via /theme
  • Logo color auto-adaptation
  • WebUI CSS variable synchronization
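One plausible way to keep the WebUI in step with a theme is to turn the theme's colour fields into CSS custom properties. The sketch below is an assumption: the `--theme-` prefix, the function name, and the sample colours are illustrative, and only the accent/bg/border field names come from the list above.

```javascript
// Turn a flat theme object into CSS custom property declarations.
function themeToCssVariables(theme) {
  return Object.entries(theme)
    .map(([key, value]) => `--theme-${key}: ${value};`)
    .join("\n");
}

// Example theme with illustrative colour values.
const exampleTheme = { accent: "#ff79c6", bg: "#282a36", border: "#6272a4" };
const exampleCss = `:root {\n${themeToCssVariables(exampleTheme)}\n}`;
```

Regenerating this block whenever `/theme` switches themes would let the browser restyle without a page reload, while the TUI reads the same object directly.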

💰 Community Funding Model

One of the most important product learnings was that our funding model should directly unlock the most powerful hosted runtime in the system: the Server Agent. The community-funded model is not about locking AI models behind a paywall - it exists to fund infrastructure and, in return, unlock premium server-side prompts, agent strategies, and tool definitions that significantly expand what the agent can do.

What Funding Unlocks

  • Server Agent access: authenticated runtime behavior delivered from the backend
  • Version-matched prompts and strategies: premium hosted agent logic tied to the current client version
  • Expanded server tool registry: structured editing, holographic analysis, terminal helpers, and deep GitHub operations
  • Managed essential tool policy: backend can deliver essential tool configuration together with server runtime definitions
  • Longer-running hosted access model: contribution-backed infrastructure keeps the premium runtime sustainable

🔑 Community Keys

  • Shared unlock path for the whole community
  • Typically grants short-duration access to the Server Agent runtime
  • Unlocked when community contribution goals are reached
  • Makes premium functionality temporarily available to everyone
  • Best example of the “community contribution unlocks access for all” model

🗝️ Contribution Login

  • Personal login path tied to a contributed account
  • Much longer-duration access than the community key path
  • Device-locked security
  • Unlocks the premium Server Agent tool and strategy layer
  • Directly funds hosting, API delivery, and continued development

Why This Model Matters

  • Model access remains bring-your-own-provider - not gated by us
  • Funding is tied to hosted agent infrastructure, not vendor lock-in
  • Community support directly unlocks better agent behavior and tooling
  • The premium layer is about capability depth, not basic access to AI

Premium Functionality Unlocked

  • Holographic analysis tools like symbol definitions and diagnostics
  • Structured editing flows like generate_edit_plan and interactive diff application
  • Advanced GitHub operations including repo exploration, issue flows, and PR creation
  • Hosted runtime prompts and strategies that make the Server Agent more capable than the local essentials alone