🧪

11ku7-ai

Research Team Based in India

Exploring AI-assisted development tools, authentication systems, and code analysis technologies.

🇮🇳 India 🤖 AI Research 💻 Open Source

Current Project

11ku7-ai-nodecoder

AI Coding Tool For Hobbyists & Researchers

An autonomous AI developer with Agent Mode, MCP integration, GitHub support, and 22+ customizable themes. Generate, edit, and deploy code directly from your terminal.

Learnings & Outcomes

🛠️ Technology Stack

Node.js

Runtime

Express.js

Web Framework

MongoDB

Database

JWT

Authentication

Socket.IO

Real-time Comm

Neo-Blessed

Terminal UI

Marked

Markdown Parser

Highlight.js

Syntax Highlighting

AI Providers: OpenRouter, Gemini, OpenAI, Anthropic, Grok, Groq, Ollama

🔐 Device Locking Mechanism

We implemented a secure device-locking system that ties contribution logins to specific devices, preventing unauthorized sharing while maintaining a smooth user experience.

How It Works:

Device Fingerprint: Generate a unique device identifier from hardware characteristics
Login Identifier: Store the device-username binding in a MongoDB collection
Token Validation: Each request validates both the JWT and the device fingerprint
Single Device Lock: One active login per user; switching devices requires logging out first
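
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the trait list, the function names (deviceFingerprint, login, logout, validateRequest), and the in-memory Map standing in for the MongoDB collection are all assumptions.

```javascript
const crypto = require("node:crypto");
const os = require("node:os");

// Hypothetical fingerprint: SHA-256 over a few host traits, joined in sorted key order.
function deviceFingerprint(traits = {
  platform: os.platform(),
  arch: os.arch(),
  hostname: os.hostname(),
  cpu: os.cpus()[0]?.model ?? "unknown",
}) {
  const canonical = Object.keys(traits).sort().map((k) => `${k}=${traits[k]}`).join("|");
  return crypto.createHash("sha256").update(canonical).digest("hex");
}

// In-memory stand-in for the collection of username -> fingerprint bindings.
const bindings = new Map();

// Bind a user to a device; refuse if the user is already bound to a different one.
function login(username, fingerprint) {
  const bound = bindings.get(username);
  if (bound && bound !== fingerprint) {
    return { ok: false, reason: "locked_to_other_device" };
  }
  bindings.set(username, fingerprint);
  return { ok: true };
}

// Logging out releases the binding so another device can log in.
function logout(username) {
  bindings.delete(username);
}

// Per-request check; a real server would run this next to JWT verification.
function validateRequest(username, fingerprint) {
  return bindings.get(username) === fingerprint;
}
```

Hashing the traits means the server only ever stores a digest, and the single-entry-per-user map is what enforces the "logout before switching devices" rule.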

🎫 JWT Authentication

Implemented secure stateless authentication using JSON Web Tokens with automatic expiry handling and session management.

Token Features

30-day expiry for contribution login access
2-hour expiry for community keys
Automatic background validation
Secure key storage in .env

Security Measures

Bcrypt password hashing
Rate limiting (express-rate-limit)
CORS protection
Server-side session revocation
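
As a rough sketch of how the two expiry windows could be issued and checked, the example below signs and verifies HS256 tokens with Node's built-in crypto module. It is an illustration only: the project may well use a JWT library instead, and the helper names, the TTL table, and the JWT_SECRET variable are assumptions.

```javascript
const crypto = require("node:crypto");

// Expiry windows from the list above, in seconds.
const TTL = { contribution: 30 * 24 * 60 * 60, community: 2 * 60 * 60 };

const b64url = (text) => Buffer.from(text).toString("base64url");

// Build header.payload.signature with iat/exp claims added to the payload.
function signToken(payload, secret, ttlSeconds, now = Math.floor(Date.now() / 1000)) {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify({ ...payload, iat: now, exp: now + ttlSeconds }));
  const sig = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${sig}`;
}

// Return the claims if the signature matches and the token is unexpired, else null.
function verifyToken(token, secret, now = Math.floor(Date.now() / 1000)) {
  const [header, body, sig] = token.split(".");
  if (!header || !body || !sig) return null;
  const expected = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(sig, "base64url");
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  return claims.exp > now ? claims : null;
}

// Example (JWT_SECRET is a hypothetical .env entry):
// const token = signToken({ sub: "alice" }, process.env.JWT_SECRET, TTL.contribution);
```

Because a signed token stays valid until exp, server-side revocation still needs a lookup (for example a revoked-session list) in addition to this check.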

🔮 Holograph - Universal Code Analysis Engine

Our most innovative creation: the Holographic Lexer, which provides language-agnostic code analysis with symbol tracking and type inference.

Core Components

UniversalLexer: Polyglot tokenizer
FlowGraph: Symbol graph builder
Diagnostics: Unused/undefined detection
Linker: Cross-file resolution

Supported Languages

JavaScript / TypeScript
Python
Java / C / C++
Go / Rust / Ruby

Key Innovations:

📊 Heap-based Symbol Tracking: Every definition gets a unique heap ID for refactoring
🔗 Type Inference: Automatically detects types from constructors, assignments, and return values
📍 JSDoc Parsing: Extracts @param, @returns, @type annotations
🎯 Scope Boundary Detection: Tracks function/class boundaries for context-aware analysis
🐍 Indentation Languages: Special handling for Python/YAML via virtual brace injection
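
To make "virtual brace injection" concrete, here is one way the idea could work: indentation increases emit a virtual "{" and decreases emit a virtual "}", so a brace-oriented scope tracker can treat Python-style blocks like any other language. The function name and the flat token list are assumptions, and this sketch ignores tabs, comments, and line continuations.

```javascript
// Turn indentation changes into virtual "{" and "}" tokens between lines.
function injectVirtualBraces(source) {
  const out = [];
  const stack = [0]; // indentation widths of the currently open blocks
  for (const raw of source.split("\n")) {
    if (raw.trim() === "") continue; // blank lines do not change block structure
    const indent = raw.length - raw.trimStart().length;
    if (indent > stack[stack.length - 1]) {
      stack.push(indent);
      out.push("{"); // deeper indentation opens a block
    }
    while (indent < stack[stack.length - 1]) {
      stack.pop();
      out.push("}"); // shallower indentation closes one block per level
    }
    out.push(raw.trim());
  }
  while (stack.length > 1) {
    stack.pop();
    out.push("}"); // close whatever is still open at end of file
  }
  return out;
}
```

For the input "def f(x):" followed by an indented "return x", the result is the four entries "def f(x):", "{", "return x", "}".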

🧠 ReAct Agent Architecture

The core of our Agent Mode - a ReAct (Reasoning + Acting) loop that enables autonomous task completion with intelligent decision-making.

Agent Loop Cycle:

Observe: Gather context from project structure, memory, and external sources
Think: Generate reasoning about the current state and next action
Act: Execute a tool (file edit, shell command, search, etc.)
Reflect: Evaluate result and update memory for next iteration
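
The cycle above maps naturally onto a bounded loop. The sketch below is a simplified illustration under stated assumptions: runAgent, the think callback (standing in for the model call), the tools map, and the shape of the decision object are hypothetical names, not the project's API.

```javascript
// Minimal ReAct-style loop: observe, think, act, reflect, up to maxSteps.
async function runAgent({ goal, think, tools, maxSteps = 500 }) {
  const memory = []; // per-session record of what was tried and what came back
  for (let step = 1; step <= maxSteps; step++) {
    // Observe: assemble the context the model sees this turn.
    const observation = { goal, history: memory.slice(-5) };
    // Think: the model returns either a tool call or a final answer.
    const decision = await think(observation);
    if (decision.done) return { answer: decision.answer, steps: step, memory };
    // Act: run the chosen tool, turning failures into results the model can read.
    const tool = tools[decision.tool];
    let result;
    try {
      result = tool ? await tool(decision.args) : `unknown tool: ${decision.tool}`;
    } catch (err) {
      result = `error: ${err.message}`;
    }
    // Reflect: store the outcome so the next iteration can build on it.
    memory.push({ step, tool: decision.tool, args: decision.args, result });
  }
  return { answer: null, steps: maxSteps, memory }; // step budget exhausted
}
```

Feeding tool errors back as ordinary results, rather than throwing, lets the next Think step decide how to recover.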

Key Features

Up to 500 autonomous steps in Auto Mode
Persistent session memory across turns
Efficiency protocol to prevent redundant actions
Scrutiny system for human-like decision making

Memory System

Hot memory: Active session cache
Persistent memory: Cross-session archival
Memory hydration from past turns
Relevant file detection from history

🔌 MCP (Model Context Protocol) Integration

Extend the agent's capabilities by connecting to external MCP servers for specialized tools and knowledge bases.

MCPClientManager

Multi-server connection management
StdioClientTransport for local servers
Auto server selection based on query
Generic tool invocation via schema

Capabilities

Memory persistence (store/retrieve)
External API access
Custom tool definitions
JSON schema argument building

Config: ~/.nodecoder/mcp-servers.json | SDK: @modelcontextprotocol/sdk

📄 PDF Vision Pipeline

A unique approach to PDF understanding - convert PDF pages to high-quality images and leverage vision models for analysis.

Pipeline Flow:

PDF Attachment: User attaches PDF via /browse command
Page Conversion: pdftoppm renders pages at 300 DPI with anti-aliasing
Vision Analysis: Agent uses read_pdf_page_visual tool to request specific pages
AI Interpretation: Vision model extracts text, diagrams, tables from image

Technical Details

Uses poppler-utils (pdftoppm)
300 DPI high-quality rendering
Anti-aliasing enabled (-aa yes)
Supports multi-page PDFs
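
A single-page render with the settings listed above could be invoked roughly like this. The flags (-r, -aa, -png, -f, -l) are standard pdftoppm options; the function names and the one-page-per-call approach are assumptions for illustration.

```javascript
const { execFile } = require("node:child_process");

// Build the pdftoppm argument list for rendering exactly one page as PNG.
function pdftoppmArgs(pdfPath, page, outPrefix) {
  return [
    "-r", "300",        // 300 DPI
    "-aa", "yes",       // anti-aliasing on
    "-png",             // PNG output
    "-f", String(page), // first page to render
    "-l", String(page), // last page to render (same page)
    pdfPath,
    outPrefix,          // pdftoppm appends the page number and extension
  ];
}

// Run pdftoppm; requires poppler-utils to be installed on the host.
function renderPage(pdfPath, page, outPrefix) {
  return new Promise((resolve, reject) => {
    execFile("pdftoppm", pdftoppmArgs(pdfPath, page, outPrefix), (err) =>
      err ? reject(err) : resolve(outPrefix)
    );
  });
}
```

Rendering one page per call keeps each vision request small and lets the agent ask only for the pages it needs.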

Why Vision over OCR?

Preserves layout and structure
Understands diagrams and charts
Handles complex formatting
No OCR library dependencies

🔍 Web Search & Browser Integration

One important learning was that live retrieval works better when the agent uses a real browser execution path instead of relying on static search text alone. The current implementation uses built-in Chrome/Chromium browser automation to search the live web, open result pages, and read page text directly.

What We Built

Built-in live web search
Chrome/Chromium browser automation via CDP
Search result page parsing and filtering
Direct page-text extraction for follow-up reading
Works as an agent tool, not just a UI feature

What We Learned

Fresh information needs browser-grounded retrieval
Separating search and page-reading improves agent reasoning
Real browser automation is more robust for dynamic sites
Direct content extraction reduces hallucinated summaries
Web access becomes more useful when tied into tool workflows

Core flow: web_search → open result pages → browser_read_page-style page-text extraction

⏰ Cron Job Scheduling

Universal task scheduler that allows the agent to execute autonomous tasks on a recurring basis. Supports natural language scheduling and persistent background execution.

Smart Scheduling

Natural language to Cron conversion
"every 5 minutes", "every day at 1pm"
Support for complex Cron expressions
Common schedule presets built-in

Robust Persistence

JSON-based job persistence
Concurrency guards (no overlap)
Job toggling (Enable/Disable)
Automatic recovery on restart

Cron tools: manage_cron_job (add, list, remove, toggle)
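
The natural-language conversion can be illustrated with a small pattern-based translator. This is a sketch covering only a few phrasings, including the two quoted above; the function name toCron and the supported patterns are assumptions, and the real converter likely handles far more.

```javascript
// Translate a handful of English schedule phrases into 5-field cron expressions.
// Returns null when the phrase is not recognised.
function toCron(text) {
  const t = text.trim().toLowerCase();
  let m;
  if ((m = t.match(/^every (\d+) minutes?$/))) return `*/${m[1]} * * * *`;
  if ((m = t.match(/^every (\d+) hours?$/))) return `0 */${m[1]} * * *`;
  if ((m = t.match(/^every day at (\d{1,2})(?::(\d{2}))?\s*(am|pm)?$/))) {
    let hour = Number(m[1]);
    const minute = Number(m[2] ?? 0);
    if (m[3] === "pm" && hour < 12) hour += 12; // 1pm -> 13
    if (m[3] === "am" && hour === 12) hour = 0; // 12am -> 0
    if (hour > 23 || minute > 59) return null;  // reject impossible times
    return `${minute} ${hour} * * *`;
  }
  return null;
}

console.log(toCron("every 5 minutes"));  // */5 * * * *
console.log(toCron("every day at 1pm")); // 0 13 * * *
```

Returning null for unrecognised input gives the agent a clear signal to fall back to asking for an explicit cron expression.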

🏢 Google Workspace Integration

Seamless productivity suite integration allowing the agent to interact with your workspace data securely through authenticated API calls.

📧 Gmail

Send/Read emails
Query-based searching
Snippet extraction

📅 Calendar

Read upcoming events
Create new events
Attendee management

📂 Drive & Docs

Search/Read Drive files
Create/Edit Google Docs
Batch formatting updates

📊 Sheets

Read/Write cell ranges
Create new spreadsheets
Advanced batch updates

GWS tools: send_gmail, read_emails, read_calendar, create_calendar_event, search_drive, read_drive_file, read_sheet, write_sheet...

🛠️ Core Agent Tools

One of the biggest implementation lessons was that tool architecture needs clear layering. In practice, custom agents run on a configurable essential tool layer, while authenticated server agents unlock a larger premium tool surface from the registry.

Custom Agent Essential Tools

These are the built-in essential tools exposed through the local essential-tools layer and configurable inside nodecoder-agents.js.

execute_shell
load_file_full
web_search
browser_read_page
general_chat
provide_code_analysis
read_pdf_page_visual
manage_cron_job
send_gmail
read_emails
read_calendar
create_calendar_event
search_drive
read_drive_file
read_sheet
write_sheet
create_sheet
batch_update_sheet
create_doc
read_doc
insert_text_doc
batch_update_doc
computer_screenshot
computer_mouse_move
computer_mouse_click
computer_keyboard_type
computer_keyboard_press

What This Taught Us

Custom agents are not limited to prompts - they also inherit a configurable base tool policy
Essential tools cover execution, analysis, search, PDF vision, scheduling, Workspace, and computer control
The agent can run server-free while still keeping a rich built-in capability set
Tool layering makes it easier to reason about local vs hosted agent behavior

Server Agent Tools: Core Editing

download_asset
scan_vulnerabilities
get_file_dependencies
find_symbol_definition
run_diagnostics
generate_edit_plan
apply_change_interactive
apply_multi_file_change

Server Agent Tools: GitHub

github_create_repo
github_list_repos
github_delete_repos
github_explore_repo
github_load_file_full
github_apply_change_interactive
github_write_file
github_read_file
github_create_issue
github_list_issues
github_read_issue
github_comment_issue
github_close_issue
github_create_branch
github_create_pr
github_sync_changes

Registry Learning

Server-agent tools are layered on top of the essential tool base
The registry separates local essentials from authenticated premium tools
Custom agents can stay offline, while server agents gain the broader editing and GitHub surface
This split made the architecture easier to extend without breaking local-first flows

🖥️ Cross-Platform PTY Terminal

A dual-implementation persistent terminal system that provides true pseudo-terminal (PTY) capabilities across platforms, enabling the agent to run interactive shell sessions.

🐧 Linux/macOS (Python PTY)

Uses Python's pty module for Unix PTY
Spawns shell via pty.spawn()
Full terminal emulation (bash/zsh)
Signal forwarding (Ctrl+C → SIGINT)
Works on Termux (Android) natively

🪟 Windows (node-pty)

Uses node-pty native addon
Spawns PowerShell with no profile loading
ConPTY integration for true Windows PTY
Custom prompt suppression for clean output
ANSI escape code stripping

Key Architecture Decisions:

🔄 Persistent Sessions: Terminal stays alive across multiple agent tool calls, preserving environment variables and working directory
📊 Output Buffering: Ring buffer collects output with configurable timeouts and completion detection via exit code markers
🧹 Smart Cleanup: ANSI escape sequences and control characters stripped from output before feeding back to AI
⏱️ Timeout Protection: Configurable command timeouts (30s default) with automatic process interruption
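
Completion detection via exit code markers and the ANSI cleanup step could look roughly like the sketch below. The marker format, the function names, and the POSIX-shell wrapping are assumptions; a PowerShell session would read $LASTEXITCODE instead of $?.

```javascript
// CSI sequences (colours, cursor moves) and OSC sequences (window titles).
const ANSI = /\x1b\[[0-?]*[ -\/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)/g;

// Remove escape sequences, then remaining control characters except tab and newline.
function stripAnsi(text) {
  return text.replace(ANSI, "").replace(/[\x00-\x08\x0b-\x1f\x7f]/g, "");
}

// Append an echo that prints the marker plus the command's exit code (POSIX shells).
function wrapCommand(command, marker) {
  return `${command}; echo "${marker}:$?"`;
}

// Scan buffered PTY output for "<marker>:<digits>". The echoed command line itself
// contains "<marker>:$?", which does not match because "$?" is not a digit.
// The marker is assumed to contain only letters, digits, and underscores.
function parseOutput(buffer, marker) {
  const clean = stripAnsi(buffer);
  const m = clean.match(new RegExp(`${marker}:(\\d+)`));
  if (!m) return { done: false, output: clean };
  return {
    done: true,
    exitCode: Number(m[1]),
    output: clean.slice(0, m.index).trimEnd(),
  };
}
```

A caller would keep appending PTY data to the buffer and re-running parseOutput until done is true or the command timeout fires.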

🌐 Headless WebUI Mode

A full web-based interface that runs without any terminal TUI, enabling remote access from browsers and mobile devices via Socket.IO real-time communication.

WebUI Features

IDE-style layout with file explorer
Monaco editor for file editing
Real-time chat with markdown rendering
Status bar with all TUI controls
Mobile-responsive design

Architecture

Express + Socket.IO server
Automatic setup wizard for auth
Gateway token for remote security
Cloudflare tunnel support built-in
TUI ↔ WebUI shared state bridge

🔬 Diff Validation Engine

A multi-pass validation system that catches AI code generation errors before applying changes, using holographic analysis and structural validation.

Validation Passes

Structural syntax validation
Holographic scope analysis
Search/Replace match verification
Context-aware error recovery

Error Prevention

Catches duplicate inserts
Detects missing context lines
Validates bracket/brace balance
Auto-retry with AI guidance
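
Two of the passes above, search/replace match verification and bracket balance, can be sketched in a few lines. The function names and error strings are hypothetical, and the balance check here is deliberately naive: it does not skip brackets inside strings or comments, which a real validator would need to do.

```javascript
// Count non-overlapping occurrences of needle in haystack.
function countOccurrences(haystack, needle) {
  if (needle === "") return 0;
  return haystack.split(needle).length - 1;
}

// Naive bracket/brace/paren balance check using a stack.
function bracketsBalanced(code) {
  const pairs = { ")": "(", "]": "[", "}": "{" };
  const stack = [];
  for (const ch of code) {
    if ("([{".includes(ch)) stack.push(ch);
    else if (pairs[ch] && stack.pop() !== pairs[ch]) return false;
  }
  return stack.length === 0;
}

// Apply one search/replace edit only if the search block matches exactly once
// and the edit does not unbalance a previously balanced file.
function applySearchReplace(original, search, replace) {
  const hits = countOccurrences(original, search);
  if (hits === 0) return { ok: false, error: "search block not found" };
  if (hits > 1) return { ok: false, error: "search block is ambiguous" };
  const updated = original.replace(search, () => replace); // callback avoids "$" patterns
  if (bracketsBalanced(original) && !bracketsBalanced(updated)) {
    return { ok: false, error: "edit unbalances brackets" };
  }
  return { ok: true, updated };
}
```

The error strings are the kind of concrete feedback that can be handed back to the model for an automatic retry.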

🛡️ Review Council

Another major learning was that autonomous coding becomes safer when code generation and code review are separated into independent AI roles. Review Council acts as a second-pass audit layer that inspects proposed diffs before changes are applied.

Independent Audit Role

Reviews generated diffs before apply
Checks scope against the actual user goal
Flags unrelated edits and accidental deletions
Verifies plan completeness for the target file

Why It Matters

Reduces self-bias from the generating agent
Catches silent scope creep before it lands
Produces concrete fix guidance for retries
Turns review into a first-class agent capability

☁️ Server Agent

We learned that centrally delivered prompts and tool definitions make the hosted agent easier to evolve across versions. Server Agent mode loads authenticated, version-matched runtime behavior from the backend instead of depending only on local configuration.

Runtime Delivered by Backend

Version-matched prompts
Hosted tool definitions
Essential tool policy from server
Session-based authenticated recovery

Learning Outcome

Central control improves compatibility
Remote updates reduce client drift
Hosted definitions simplify premium rollout
Authentication and runtime design can stay connected

🛠️ Custom Agent Architecture

This became one of the strongest architecture learnings in the project. We moved beyond a fixed assistant and made the agent itself programmable through nodecoder-agents.js. A custom agent can redefine prompts, strategies, essential tool policy, and custom tools, allowing fully offline and domain-specific agent runtimes.

What a Custom Agent Can Control

Custom system prompt template
Workflow strategies: local, hybrid, remote, skills
Essential tool enable/disable settings
Custom tool definitions and implementations
Server-free mode via server_tools: false

Implementation Power

Inline tool handlers inside the config file
External tool implementations via blueprint files
Custom domain runtimes, not just coding presets
Can reshape how the orchestrator behaves
Works offline with local prompts and tools only

Why It Is More Powerful Than Skills

🧩 Skills extend the workflow by adding instructions and resources for an existing agent runtime.
🛠️ Custom agents redesign the runtime by changing prompts, strategy selection, tool policy, and tool availability.
🚀 Result: skills make the agent better at a task, but custom agents let you decide what kind of agent it is in the first place.

🔀 Both Mode

One useful architectural lesson was that users do not want to choose between hosted reliability and local freedom. Both Mode keeps server-delivered tools active while also re-enabling local custom tools and custom strategies from the local agent configuration.

What Gets Combined

Authenticated server toolset
Local custom tools
Local custom strategy prompts
Shared essential tool layer

Learning Outcome

Hybrid agent architecture is practical
Centralized and local extension can coexist
Power users keep flexibility without losing hosted capabilities
Tool merging matters as much as prompt merging
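
One way to picture the tool merging is a layered registry where each layer is applied in turn. Everything in this sketch is an assumption made for illustration: the mergeTools name, the layer labels, and in particular the precedence rule (local custom tools override server tools, which override essentials on a name clash).

```javascript
// Merge tool layers into one registry; later layers win on name collisions.
function mergeTools({ essential = {}, server = {}, custom = {} }) {
  const merged = new Map();
  const layers = [
    ["essential", essential],
    ["server", server],
    ["custom", custom],
  ];
  for (const [layer, tools] of layers) {
    for (const [name, handler] of Object.entries(tools)) {
      merged.set(name, { layer, handler }); // remember which layer supplied the tool
    }
  }
  return merged;
}
```

Recording the source layer alongside each handler makes it possible to report where a tool came from when local and hosted definitions overlap.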

🧩 Agent Skill System

The skill system taught us how to package reusable workflow knowledge cleanly. Instead of loose prompt fragments, skills are structured around SKILL.md files with metadata, instructions, and optional resource directories that are exposed to the agent on demand.

How It Works

SKILL.md metadata + prompt instructions
Auto-discovered from configured skills directories
Enabled or disabled per session
Injected into the agent as the primary workflow guide

Resource Model

On-demand scripts/ for executable helpers
On-demand references/ docs for reading
Optional assets/ for templates or data
Progressive disclosure keeps base context lean

⚙️ Daemon Mode

We also learned how to adapt the same agent into a no-TUI automation runtime. Daemon Mode runs over standard input and output, making it suitable for background services, server execution, and scripted one-shot tasks.

Supported Flows

Interactive REPL operation
Single-run --prompt execution
Persistent sessions
Background-friendly standard I/O integration

Learning Outcome

One agent core can support multiple interfaces
TUI logic can be separated from execution logic
Automation and interactivity can share the same session model
Developer tools can evolve into service runtimes

🖱️ Computer Use on Windows & WSL

Computer use support showed us that GUI automation becomes much more reliable when the agent is visually grounded before acting. The desktop control layer supports Windows and WSL2 with real screenshot capture, mouse control, typing, and hotkeys.

Capabilities

Physical screen capture
Mouse move and click control
Keyboard typing and shortcut pressing
PowerShell-backed execution on Windows/WSL

Reliability Design

Coordinate grounding grid drawn on screenshots
Visual-first action planning
DPI-aware screenshot capture
Less coordinate hallucination from the model

📱 Mobile Use via Termux + ADB

Mobile automation extended the same action model into Android. Through Termux and ADB, the agent can inspect the live screen, tap, long-press, type text, send navigation events, and switch apps for mobile-first automation flows.

Android Control Layer

ADB screenshot capture
Tap and long-press input
Keyboard text input
Navigation and recent-app key events
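
The control layer above maps onto a small set of standard adb commands. The sketch below only builds argument lists; the action object shape and the adbArgs name are assumptions, while the underlying commands (screencap, input tap, input swipe, input text, input keyevent) are regular Android shell tools.

```javascript
// Translate a high-level action into arguments for the adb executable.
function adbArgs(action) {
  switch (action.type) {
    case "screenshot":
      return ["exec-out", "screencap", "-p"]; // PNG bytes on stdout
    case "tap":
      return ["shell", "input", "tap", String(action.x), String(action.y)];
    case "long_press": {
      // A swipe that starts and ends at the same point acts as a long press.
      const x = String(action.x);
      const y = String(action.y);
      return ["shell", "input", "swipe", x, y, x, y, String(action.durationMs ?? 800)];
    }
    case "text":
      // "input text" expects spaces encoded as %s; other shell metacharacters
      // would need additional escaping that this sketch does not attempt.
      return ["shell", "input", "text", action.text.replace(/ /g, "%s")];
    case "key":
      return ["shell", "input", "keyevent", action.key]; // e.g. KEYCODE_APP_SWITCH
    default:
      throw new Error(`unsupported action: ${action.type}`);
  }
}
```

The resulting array can be passed to child_process.execFile("adb", ...) so that no shell quoting is involved on the host side.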

What We Learned

Desktop-style agent tools can map onto mobile primitives
ADB provides a practical control bridge for Android
Visual grounding still matters on small screens
Termux makes mobile automation developer-friendly

📱 Telegram Bot Integration

Full Telegram bot interface that mirrors the terminal experience, enabling mobile-first AI coding from any device with Telegram installed.

Features

Complete command system (/agent, /chat, /cwd)
Inline keyboard buttons for consent
File upload/download support
Real-time typing indicators

Technical

node-telegram-bot-api integration
Markdown/HTML message formatting
Rate limiting and owner-only access
Seamless state sharing with TUI

🔗 Remote Access & Tunneling

Built-in support for accessing the agent remotely via secure tunnels, enabling development from any location without complex network configuration.

🛡️ Gateway Tokens: Secure token-based authentication for remote WebUI connections
☁️ Cloudflare Tunnel: Auto-provisions cloudflared quick tunnels for public HTTPS URLs
🌍 Zero-Config Remote: Two environment variables (REMOTE=true TUNNEL=true) enable the full remote setup
🔒 Auth Middleware: Socket.IO middleware validates tokens before allowing connection
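
The auth middleware idea can be sketched as a function with the (socket, next) signature that Socket.IO's io.use() expects, reading the token from socket.handshake.auth. The gatewayAuth and tokensMatch names and the GATEWAY_TOKEN variable are assumptions, and no Socket.IO server is created here.

```javascript
const crypto = require("node:crypto");

// Compare tokens in constant time by hashing both to equal-length digests first.
function tokensMatch(given, expected) {
  const a = crypto.createHash("sha256").update(String(given ?? "")).digest();
  const b = crypto.createHash("sha256").update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}

// Build a middleware that rejects connections without the expected gateway token.
function gatewayAuth(expectedToken) {
  return (socket, next) => {
    const token = socket.handshake?.auth?.token;
    if (token && tokensMatch(token, expectedToken)) return next();
    next(new Error("unauthorized")); // the client receives a connect_error event
  };
}

// Usage with an existing Socket.IO server instance `io`:
//   io.use(gatewayAuth(process.env.GATEWAY_TOKEN));
```

Because the check runs in middleware, an unauthenticated client never reaches any event handler.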

🎨 Theming System

22+ built-in themes with a modular theme engine that controls colors across the entire TUI and WebUI simultaneously.

Theme Categories

Game-inspired (Cyberpunk, Zelda, Mario)
IDE-style (Monokai, Dracula, Solarized)
Nature-themed (Ocean, Forest, Sunset)
Custom user-defined themes

Implementation

Theme JSON with accent/bg/border colors
Runtime theme switching via /theme
Logo color auto-adaptation
WebUI CSS variable synchronization
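
As an illustration of the CSS variable synchronization, a theme object can be turned into a :root rule that the WebUI applies whenever the theme changes. The theme shape (a colors object with accent, bg, and border keys) and the function name are assumptions based on the list above, not the project's actual schema.

```javascript
// Convert a theme's colour map into a CSS :root block of custom properties.
function themeToCssVariables(theme) {
  const lines = Object.entries(theme.colors).map(
    ([name, value]) => `  --${name}: ${value};`
  );
  return `:root {\n${lines.join("\n")}\n}`;
}

// Hypothetical theme entry using the Dracula palette.
const dracula = {
  name: "Dracula",
  colors: { accent: "#bd93f9", bg: "#282a36", border: "#44475a" },
};

console.log(themeToCssVariables(dracula));
```

Keeping colours in one JSON object means the TUI and the WebUI can both read the same source, with only the output format differing.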

💰 Community Funding Model

One of the most important product learnings was that our funding model should directly unlock the most powerful hosted runtime in the system: the Server Agent. The community-funded model is not about locking AI models behind a paywall - it exists to fund infrastructure and, in return, unlock premium server-side prompts, agent strategies, and tool definitions that significantly expand what the agent can do.

What Funding Unlocks

Server Agent access: authenticated runtime behavior delivered from the backend
Version-matched prompts and strategies: premium hosted agent logic tied to the current client version
Expanded server tool registry: structured editing, holographic analysis, terminal helpers, and deep GitHub operations
Managed essential tool policy: backend can deliver essential tool configuration together with server runtime definitions
Longer-running hosted access model: contribution-backed infrastructure keeps the premium runtime sustainable

🔑 Community Keys

Shared unlock path for the whole community
Grants short-duration access to the Server Agent runtime (community keys expire after 2 hours)
Unlocked when community contribution goals are reached
Makes premium functionality temporarily available to everyone
Best example of the “community contribution unlocks access for all” model

🗝️ Contribution Login

Personal login path tied to a contributed account
Much longer-duration access than the community key path (30-day expiry versus 2 hours)
Device-locked security
Unlocks the premium Server Agent tool and strategy layer
Directly funds hosting, API delivery, and continued development

Why This Model Matters

Model access remains bring-your-own-provider - not gated by us
Funding is tied to hosted agent infrastructure, not vendor lock-in
Community support directly unlocks better agent behavior and tooling
The premium layer is about capability depth, not basic access to AI

Premium Functionality Unlocked

Holographic analysis tools like symbol definitions and diagnostics
Structured editing flows like generate_edit_plan and interactive diff application
Advanced GitHub operations including repo exploration, issue flows, and PR creation
Hosted runtime prompts and strategies that make the Server Agent more capable than the local essentials alone