11ku7-ai
Research Team Based in India
Exploring AI-assisted development tools, authentication systems, and code analysis technologies.
Current Project
11ku7-ai-nodecoder
AI Coding Tool For Hobbyists & Researchers
An autonomous AI developer with Agent Mode, MCP integration, GitHub support, and 22+ customizable themes. Generate, edit, and deploy code directly from your terminal.
Learnings & Outcomes
Technology Stack
- Node.js: Runtime
- Express.js: Web framework
- MongoDB: Database
- JWT: Authentication
- Socket.IO: Real-time communication
- Neo-Blessed: Terminal UI
- Marked: Markdown parser
- Highlight.js: Syntax highlighting
AI Providers: OpenRouter, Gemini, OpenAI, Anthropic, Grok, Groq, Ollama
Device Locking Mechanism
We implemented a secure device-locking system that ties contribution logins to specific devices, preventing unauthorized sharing while maintaining a smooth user experience.
How It Works:
- Device Fingerprint: Generates a unique device identifier from hardware characteristics
- Login Identifier: Stores the device-username binding in a MongoDB collection
- Token Validation: Each request validates both the JWT and the device fingerprint
- Single Device Lock: One active login per user; switching devices requires logging out first
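The binding rules above can be sketched in TypeScript. This is a minimal illustration, not the project's code. It assumes an in-memory `Map` stands in for the MongoDB collection, that the fingerprint string has already been derived from hardware characteristics, and that JWT verification happens elsewhere. `DeviceLockStore`, `login`, `logout`, and `validate` are hypothetical names.

```typescript
// Hypothetical sketch: a Map stands in for the MongoDB collection that
// stores one device-username binding per user.
type LoginResult = { ok: true } | { ok: false; reason: string };

class DeviceLockStore {
  private bindings = new Map<string, string>(); // username -> device fingerprint

  // Bind the user to this device, unless another device already holds the lock.
  login(username: string, fingerprint: string): LoginResult {
    const bound = this.bindings.get(username);
    if (bound !== undefined && bound !== fingerprint) {
      return { ok: false, reason: "active login on another device; log out first" };
    }
    this.bindings.set(username, fingerprint);
    return { ok: true };
  }

  // Only the bound device can release the lock.
  logout(username: string, fingerprint: string): boolean {
    if (this.bindings.get(username) !== fingerprint) return false;
    this.bindings.delete(username);
    return true;
  }

  // Per-request check: the JWT result (verified elsewhere) AND the device must match.
  validate(username: string, fingerprint: string, tokenValid: boolean): boolean {
    return tokenValid && this.bindings.get(username) === fingerprint;
  }
}
```

Keying the lock by username is what enforces one active login per user. A second device is refused until the first one logs out.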
JWT Authentication
Implemented secure stateless authentication using JSON Web Tokens with automatic expiry handling and session management.
Token Features
- 30-day expiry for contribution login access
- 2-hour expiry for community keys
- Automatic background validation
- Secure key storage in .env
Security Measures
- Bcrypt password hashing
- Rate limiting (express-rate-limit)
- CORS protection
- Server-side session revocation
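The sketch below shows expiry and revocation checks on decoded token claims, using the two lifetimes listed above. It leaves out signing and signature verification, which a JWT library would handle with the secret from .env. The `kind` claim, `issueClaims`, and `isUsable` are hypothetical. `sub`, `jti`, `iat`, and `exp` are standard JWT claim names.

```typescript
// Hypothetical sketch of expiry and revocation checks on decoded JWT claims.
// Signing/verification is omitted; a JWT library would do that with the
// secret kept in .env.
type TokenKind = "contribution" | "community";

interface Claims {
  sub: string;     // subject: user or key id
  kind: TokenKind; // hypothetical custom claim
  jti: string;     // token id, used for server-side revocation
  iat: number;     // issued at, seconds since the epoch
  exp: number;     // expiry, seconds since the epoch
}

// Lifetimes from the text: 30 days and 2 hours.
const LIFETIME_SECONDS: Record<TokenKind, number> = {
  contribution: 30 * 24 * 60 * 60,
  community: 2 * 60 * 60,
};

function issueClaims(sub: string, kind: TokenKind, jti: string, nowSec: number): Claims {
  return { sub, kind, jti, iat: nowSec, exp: nowSec + LIFETIME_SECONDS[kind] };
}

// Usable only while unexpired and not revoked on the server.
function isUsable(claims: Claims, nowSec: number, revoked: Set<string>): boolean {
  return nowSec < claims.exp && !revoked.has(claims.jti);
}
```

A revocation set keyed by `jti` is what lets an otherwise stateless token be cut off before it expires.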
Holograph - Universal Code Analysis Engine
Our most innovative creation - the Holographic Lexer provides language-agnostic code analysis with intelligent symbol tracking and type inference.
Core Components
- UniversalLexer: Polyglot tokenizer
- FlowGraph: Symbol graph builder
- Diagnostics: Unused/undefined detection
- Linker: Cross-file resolution
Supported Languages
- JavaScript / TypeScript
- Python
- Java / C / C++
- Go / Rust / Ruby
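The following deliberately small sketch makes the component list concrete. It shows heap-ID symbol tracking with unused/undefined diagnostics for a JavaScript subset. It is an assumption-laden stand-in for the UniversalLexer and Diagnostics components, not their implementation. It runs in a single pass, ignores scopes, strings, and function parameters, and treats `console` as the only known global. `analyze` and `SymbolEntry` are hypothetical names.

```typescript
// Hypothetical single-pass sketch: heap IDs for definitions plus
// unused/undefined diagnostics for a tiny JavaScript subset.
// Scopes, strings, and function parameters are ignored.
interface SymbolEntry { heapId: number; name: string; uses: number }

const KEYWORDS = new Set(["let", "const", "var", "function", "return", "if", "else"]);
const DEFINERS = new Set(["let", "const", "var", "function"]);

function analyze(source: string, globals: Set<string> = new Set(["console"])) {
  // Identifiers, or any other single non-space character.
  const tokens = source.match(/[A-Za-z_$][\w$]*|\S/g) ?? [];
  const table = new Map<string, SymbolEntry>();
  const undefinedRefs: string[] = [];
  let nextHeapId = 1;

  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    if (!/^[A-Za-z_$]/.test(tok) || KEYWORDS.has(tok)) continue;
    const prev = i > 0 ? tokens[i - 1] : "";
    if (prev === ".") continue; // property access, not a variable
    if (DEFINERS.has(prev)) {
      table.set(tok, { heapId: nextHeapId++, name: tok, uses: 0 }); // new definition
    } else if (table.has(tok)) {
      table.get(tok)!.uses++;
    } else if (!globals.has(tok)) {
      undefinedRefs.push(tok);
    }
  }
  const symbols = Array.from(table.values());
  const unused = symbols.filter((s) => s.uses === 0).map((s) => s.name);
  return { symbols, unused, undefinedRefs };
}
```

For `const a = 1; const b = a + c; const d = 2; console.log(b);` this reports `d` as unused, `c` as undefined, and assigns heap IDs 1, 2, 3 to `a`, `b`, `d`.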
Key Innovations:
- Heap-based Symbol Tracking: Every definition gets a unique heap ID for refactoring
- Type Inference: Automatically detects types from constructors, assignments, and return values
- JSDoc Parsing: Extracts @param, @returns, @type annotations
- Scope Boundary Detection: Tracks function/class boundaries for context-aware analysis
- Indentation Languages: Special handling for Python/YAML via virtual brace injection
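Virtual brace injection can be illustrated with a short sketch. Indentation increases and decreases in Python-like source become `{` and `}` markers, so the same brace-based scope tracking can run on it. The sketch assumes space-only indentation and skips blank and comment lines. It does not handle tabs, line continuations, or multi-line strings. `injectVirtualBraces` is a hypothetical name.

```typescript
// Hypothetical sketch: turn indentation changes into virtual "{" / "}" lines.
function injectVirtualBraces(source: string): string[] {
  const out: string[] = [];
  const stack: number[] = [0]; // currently open indentation levels
  for (const raw of source.split("\n")) {
    const trimmed = raw.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // blank or comment
    const indent = raw.length - raw.replace(/^ +/, "").length; // leading spaces
    if (indent > stack[stack.length - 1]) {
      stack.push(indent); // deeper block opens a virtual scope
      out.push("{");
    }
    while (indent < stack[stack.length - 1]) {
      stack.pop(); // each dedent closes one virtual scope
      out.push("}");
    }
    out.push(trimmed);
  }
  while (stack.length > 1) {
    stack.pop(); // close scopes still open at end of file
    out.push("}");
  }
  return out;
}
```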
ReAct Agent Architecture
The core of our Agent Mode - a ReAct (Reasoning + Acting) loop that enables autonomous task completion with intelligent decision-making.
Agent Loop Cycle:
- Observe: Gather context from project structure, memory, and external sources
- Think: Generate reasoning about the current state and next action
- Act: Execute a tool (file edit, shell command, search, etc.)
- Reflect: Evaluate result and update memory for next iteration
Key Features
- Up to 500 autonomous steps in Auto Mode
- Persistent session memory across turns
- Efficiency protocol to prevent redundant actions
- Scrutiny system for human-like decision making
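The Observe, Think, Act, Reflect cycle with a step cap and an efficiency guard can be sketched as follows. The model call and tools are injected as plain functions. Reflect is reduced to appending the outcome to memory. The guard simply refuses exact repeats of an earlier action. All names are hypothetical, and the real efficiency protocol may use different rules.

```typescript
// Hypothetical ReAct-style loop; `think` and the tools stand in for model calls.
interface Action { tool: string; args: string }
type Decision = { kind: "act"; action: Action } | { kind: "finish"; answer: string };

interface AgentDeps {
  observe: (memory: string[]) => string;
  think: (observation: string, memory: string[]) => Decision;
  tools: Record<string, (args: string) => string>;
}

function runAgent(deps: AgentDeps, maxSteps = 500): { answer: string; steps: number } {
  const memory: string[] = [];
  const seen = new Set<string>();
  for (let step = 1; step <= maxSteps; step++) {
    const observation = deps.observe(memory);         // Observe
    const decision = deps.think(observation, memory); // Think
    if (decision.kind === "finish") return { answer: decision.answer, steps: step };
    const key = `${decision.action.tool}:${decision.action.args}`;
    if (seen.has(key)) {
      // Efficiency guard: skip an exact repeat and record why.
      memory.push(`skipped duplicate action ${key}`);
      continue;
    }
    seen.add(key);
    const tool = deps.tools[decision.action.tool];
    const result = tool ? tool(decision.action.args) : `unknown tool ${decision.action.tool}`; // Act
    memory.push(`${key} -> ${result}`); // Reflect: keep the outcome for the next step
  }
  return { answer: "step limit reached", steps: maxSteps };
}
```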
Memory System
- Hot memory: Active session cache
- Persistent memory: Cross-session archival
- Memory hydration from past turns
- Relevant file detection from history
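Here is a minimal sketch of the two memory tiers. It assumes a bounded hot cache whose evicted turns move to a persistent archive; loading and saving the archive to disk is omitted. Relevant-file detection is approximated by counting how often each file appears across turns. Hydration returns archived turns that touched the requested files. `AgentMemory` and `Turn` are hypothetical.

```typescript
// Hypothetical two-tier memory: bounded hot cache + persistent archive.
interface Turn { text: string; files: string[] }

class AgentMemory {
  private hot: Turn[] = [];
  constructor(private readonly hotLimit: number, private readonly archive: Turn[] = []) {}

  remember(turn: Turn): void {
    this.hot.push(turn);
    while (this.hot.length > this.hotLimit) {
      this.archive.push(this.hot.shift()!); // evict oldest into the persistent tier
    }
  }

  // Relevant-file detection: files touched most often across all turns.
  relevantFiles(topN: number): string[] {
    const counts = new Map<string, number>();
    for (const t of this.archive.concat(this.hot)) {
      for (const f of t.files) counts.set(f, (counts.get(f) ?? 0) + 1);
    }
    return Array.from(counts.entries())
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .slice(0, topN)
      .map(([f]) => f);
  }

  // Hydration: bring back archived turns that touched any of the given files.
  hydrate(files: string[]): Turn[] {
    return this.archive.filter((t) => t.files.some((f) => files.indexOf(f) >= 0));
  }

  get hotSize(): number {
    return this.hot.length;
  }
}
```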
MCP (Model Context Protocol) Integration
Extend the agent's capabilities by connecting to external MCP servers for specialized tools and knowledge bases.
MCPClientManager
- Multi-server connection management
- StdioClientTransport for local servers
- Auto server selection based on query
- Generic tool invocation via schema
Capabilities
- Memory persistence (store/retrieve)
- External API access
- Custom tool definitions
- JSON schema argument building
Config: ~/.nodecoder/mcp-servers.json | SDK: @modelcontextprotocol/sdk
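The sketch below shows one way to approach automatic server selection and schema-based argument building. It rests on two assumptions. Servers are scored by keyword overlap between the query and their tools' names and descriptions. Arguments are coerced to the property types declared in each tool's JSON schema. The connection and transport code from @modelcontextprotocol/sdk is omitted. `selectServer` and `buildArgs` are hypothetical helpers, and the actual MCPClientManager heuristics may differ.

```typescript
// Hypothetical sketch: keyword-overlap server selection and schema-driven
// argument building. Transport/connection code is omitted.
interface ToolSchema {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required?: string[];
  };
}
interface ServerInfo { name: string; tools: ToolSchema[] }

const words = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Score each server by how many words in its tool names/descriptions occur in the query.
function selectServer(query: string, servers: ServerInfo[]): ServerInfo | undefined {
  const queryWords = new Set(words(query));
  let best: ServerInfo | undefined;
  let bestScore = 0;
  for (const server of servers) {
    const text = server.tools.map((t) => `${t.name} ${t.description}`).join(" ");
    const score = words(text).filter((w) => queryWords.has(w)).length;
    if (score > bestScore) {
      best = server;
      bestScore = score;
    }
  }
  return best;
}

// Coerce candidate string values to the schema's property types and report
// required properties that are still missing.
function buildArgs(tool: ToolSchema, values: Record<string, string>) {
  const props = tool.inputSchema.properties;
  const args: Record<string, unknown> = {};
  for (const key of Object.keys(props)) {
    if (!(key in values)) continue;
    const raw = values[key];
    const type = props[key].type;
    args[key] = type === "number" ? Number(raw) : type === "boolean" ? raw === "true" : raw;
  }
  const missing = (tool.inputSchema.required ?? []).filter((k) => !(k in args));
  return { args, missing };
}
```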
PDF Vision Pipeline
A unique approach to PDF understanding - convert PDF pages to high-quality images and leverage vision models for analysis.
Pipeline Flow:
- PDF Attachment: User attaches PDF via /browse command
- Page Conversion: pdftoppm renders pages at 300 DPI with anti-aliasing
- Vision Analysis: Agent uses read_pdf_page_visual tool to request specific pages
- AI Interpretation: Vision model extracts text, diagrams, tables from image
Technical Details
- Uses poppler-utils (pdftoppm)
- 300 DPI high-quality rendering
- Anti-aliasing enabled (-aa yes)
- Supports multi-page PDFs
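The page-conversion step amounts to one pdftoppm call per requested page. The flags below are real pdftoppm options:
- `-f` and `-l` for the page range
- `-r` for resolution
- `-aa yes` for font anti-aliasing
- `-png` for PNG output
- `-singlefile` to omit the page-number suffix

The helper name and output-prefix convention are assumptions, and spawning the process is left out.

```typescript
// Builds the pdftoppm argument list for a single page. All flags are real
// poppler-utils options; the function itself is a hypothetical helper.
function pdftoppmArgs(pdfPath: string, page: number, outPrefix: string, dpi = 300): string[] {
  if (!Number.isInteger(page) || page < 1) throw new Error(`invalid page ${page}`);
  return [
    "-f", String(page), "-l", String(page), // first and last page: just this one
    "-r", String(dpi),                      // resolution in DPI (300 by default)
    "-aa", "yes",                           // anti-aliasing for fonts
    "-png",                                 // PNG output
    "-singlefile",                          // write `${outPrefix}.png`, no page suffix
    pdfPath,
    outPrefix,
  ];
}
```

You can pass the resulting array to Node's `child_process.execFile("pdftoppm", args, callback)`. The PNG written to `${outPrefix}.png` is then what the vision model receives.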
Why Vision over OCR?
- Preserves layout and structure
- Understands diagrams and charts
- Handles complex formatting
- No OCR library dependencies
Web Search & Browser Integration
One important learning was that live retrieval works better when the agent uses a real browser execution path instead of relying on static search text alone. The current implementation uses built-in Chrome/Chromium browser automation to search the live web, open result pages, and read page text directly.
What We Built
- Built-in live web search
- Chrome/Chromium browser automation via CDP
- Search result page parsing and filtering
- Direct page-text extraction for follow-up reading
- Works as an agent tool, not just a UI feature
What We Learned
- Fresh information needs browser-grounded retrieval
- Separating search and page-reading improves agent reasoning
- Real browser automation is more robust for dynamic sites
- Direct content extraction reduces hallucinated summaries
- Web access becomes more useful when tied into tool workflows
Core flow: web_search → open result pages → browser_read_page-style page text extraction
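The separation between searching and reading can be sketched as two independent steps. The first filters raw result links; the second reduces a page to plain text. This is a simplified illustration with hypothetical names. It does not show CDP-driven browser automation, and in practice the live page's rendered text would be preferable to regex-cleaned HTML.

```typescript
// Hypothetical sketch of the two separate steps: result filtering and page-to-text.
interface SearchResult { title: string; url: string }

// Keep http(s) links only, dedupe by URL without fragment/trailing slash, cap the count.
function filterResults(results: SearchResult[], limit = 5): SearchResult[] {
  const seen = new Set<string>();
  const out: SearchResult[] = [];
  for (const r of results) {
    if (!/^https?:\/\//i.test(r.url)) continue;
    const key = r.url.replace(/#.*$/, "").replace(/\/$/, "");
    if (seen.has(key)) continue;
    seen.add(key);
    out.push(r);
    if (out.length >= limit) break;
  }
  return out;
}

// Reduce HTML to readable text for the model.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop scripts and CSS
    .replace(/<[^>]+>/g, " ")                             // drop remaining tags
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")                               // decode &amp; last
    .replace(/\s+/g, " ")
    .trim();
}
```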
Cron Job Scheduling
Universal task scheduler that allows the agent to execute autonomous tasks on a recurring basis. Supports natural language scheduling and persistent background execution.
Smart Scheduling
- Natural language to Cron conversion
- "every 5 minutes", "every day at 1pm"
- Support for complex Cron expressions
- Common schedule presets built-in
Robust Persistence
- JSON-based job persistence
- Concurrency guards (no overlap)
- Job toggling (Enable/Disable)
- Automatic recovery on restart
Cron tools: manage_cron_job (add, list, remove, toggle)
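Here is a sketch of natural-language conversion for the phrasings quoted above, plus a non-overlap guard. It targets standard five-field cron syntax: minute, hour, day of month, month, day of week. Input that already looks like a five-field expression passes through unchanged, and anything else is rejected. `toCron` and `NonOverlappingJob` are hypothetical, and JSON persistence is omitted.

```typescript
// Hypothetical sketch: a few natural-language phrasings -> five-field cron
// (minute hour day-of-month month day-of-week).
function toCron(phrase: string): string {
  const p = phrase.trim().toLowerCase();
  let m = p.match(/^every (\d+) minutes?$/);
  if (m) return `*/${m[1]} * * * *`;
  m = p.match(/^every (\d+) hours?$/);
  if (m) return `0 */${m[1]} * * *`;
  m = p.match(/^every day at (\d{1,2})(?::(\d{2}))?\s*(am|pm)?$/);
  if (m) {
    let hour = Number(m[1]);
    const minute = m[2] ? Number(m[2]) : 0;
    if (m[3] === "pm" && hour < 12) hour += 12;
    if (m[3] === "am" && hour === 12) hour = 0;
    if (hour > 23 || minute > 59) throw new Error(`invalid time in "${phrase}"`);
    return `${minute} ${hour} * * *`;
  }
  if (/^(\S+\s+){4}\S+$/.test(p)) return p; // already a five-field expression
  throw new Error(`unrecognized schedule "${phrase}"`);
}

// Concurrency guard: a tick is skipped while the previous run is still active.
class NonOverlappingJob {
  private running = false;
  constructor(private readonly task: () => Promise<void>) {}

  async tick(): Promise<boolean> {
    if (this.running) return false; // previous run still in progress
    this.running = true;
    try {
      await this.task();
    } finally {
      this.running = false;
    }
    return true;
  }
}
```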
Google Workspace Integration
Seamless productivity suite integration allowing the agent to interact with your workspace data securely through authenticated API calls.
Gmail
- Send/Read emails
- Query-based searching
- Snippet extraction
Calendar
- Read upcoming events
- Create new events
- Attendee management
Drive & Docs
- Search/Read Drive files
- Create/Edit Google Docs
- Batch formatting updates
Sheets
- Read/Write cell ranges
- Create new spreadsheets
- Advanced batch updates
GWS tools: send_gmail, read_emails, read_calendar, create_calendar_event, search_drive, read_drive_file, read_sheet, write_sheet...
Core Agent Tools
One of the biggest implementation lessons was that tool architecture needs clear layering. In practice, custom agents run on a configurable essential tool layer, while authenticated server agents unlock a larger premium tool surface from the registry.
Custom Agent Essential Tools
These are the built-in essential tools exposed through the local essential-tools layer
and configurable inside
nodecoder-agents.js.
- execute_shell
- load_file_full
- web_search
- browser_read_page
- general_chat
- provide_code_analysis
- read_pdf_page_visual
- manage_cron_job
- send_gmail
- read_emails
- read_calendar
- create_calendar_event
- search_drive
- read_drive_file
- read_sheet
- write_sheet
- create_sheet
- batch_update_sheet
- create_doc
- read_doc
- insert_text_doc
- batch_update_doc
- computer_screenshot
- computer_mouse_move
- computer_mouse_click
- computer_keyboard_type
- computer_keyboard_press
What This Taught Us
- Custom agents are not limited to prompts - they also inherit a configurable base tool policy
- Essential tools cover execution, analysis, search, PDF vision, scheduling, Workspace, and computer control
- The agent can run server-free while still keeping a rich built-in capability set
- Tool layering makes it easier to reason about local vs hosted agent behavior
Server Agent Tools: Core Editing
- download_asset
- scan_vulnerabilities
- get_file_dependencies
- find_symbol_definition
- run_diagnostics
- generate_edit_plan
- apply_change_interactive
- apply_multi_file_change
Server Agent Tools: GitHub
- github_create_repo
- github_list_repos
- github_delete_repos
- github_explore_repo
- github_load_file_full
- github_apply_change_interactive
- github_write_file
- github_read_file
- github_create_issue
- github_list_issues
- github_read_issue
- github_comment_issue
- github_close_issue
- github_create_branch
- github_create_pr
- github_sync_changes
Registry Learning
- Server-agent tools are layered on top of the essential tool base
- The registry separates local essentials from authenticated premium tools
- Custom agents can stay offline, while server agents gain the broader editing and GitHub surface
- This split made the architecture easier to extend without breaking local-first flows
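The layering can be expressed as a small resolution function. Essential tools are filtered by the agent's enable/disable policy. Server tools are appended only for an authenticated session whose config has not set `server_tools: false`. Only `server_tools` comes from the text; the `essentialTools` map and `resolveTools` are hypothetical.

```typescript
// Hypothetical sketch of tool layering. Only `server_tools` comes from the
// text; the `essentialTools` toggle map is an assumed config shape.
interface AgentConfig {
  essentialTools?: Record<string, boolean>; // tool name -> enabled (default: enabled)
  server_tools?: boolean;
}

function resolveTools(
  essentials: string[],
  serverTools: string[],
  config: AgentConfig,
  authenticated: boolean,
): string[] {
  const policy = config.essentialTools ?? {};
  const enabled = essentials.filter((name) => policy[name] !== false);
  // Premium server tools require authentication and no local opt-out.
  const useServer = authenticated && config.server_tools !== false;
  if (!useServer) return enabled;
  return enabled.concat(serverTools.filter((name) => enabled.indexOf(name) < 0));
}
```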
Cross-Platform PTY Terminal
A dual-implementation persistent terminal system that provides true pseudo-terminal (PTY) capabilities across platforms, enabling the agent to run interactive shell sessions.
Linux/macOS (Python PTY)
- Uses Python's pty module for Unix PTY
- Spawns the shell via pty.spawn()
- Full terminal emulation (bash/zsh)
- Signal forwarding (Ctrl+C → SIGINT)
- Works on Termux (Android) natively
Windows (node-pty)
- Uses node-pty native addon
- Spawns PowerShell with no profile loading
- ConPTY integration for true Windows PTY
- Custom prompt suppression for clean output
- ANSI escape code stripping
Key Architecture Decisions:
- Persistent Sessions: Terminal stays alive across multiple agent tool calls, preserving environment variables and working directory
- Output Buffering: Ring buffer collects output with configurable timeouts and completion detection via exit code markers
- Smart Cleanup: ANSI escape sequences and control characters stripped from output before feeding back to AI
- Timeout Protection: Configurable command timeouts (30s default) with automatic process interruption
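Completion detection via exit-code markers and ANSI cleanup can be sketched as follows. The sketch assumes a POSIX shell; PowerShell would need a different status variable. It also assumes a marker made only of letters, digits, and underscores, so the marker is safe inside a regular expression. The ring buffer and timeout handling are omitted, and the PTY's echo of the typed command is left in the output. `wrapCommand` and `parseCompletion` are hypothetical.

```typescript
// Hypothetical sketch for a POSIX shell session. The marker must contain only
// letters, digits, and underscores because it is embedded in a RegExp.
const ANSI_PATTERN = /\x1b\[[0-9;?]*[A-Za-z]|\x1b\][^\x07]*\x07/g; // CSI and OSC sequences

// Append an echo of the marker and the exit status ($?) to the command.
function wrapCommand(cmd: string, marker: string): string {
  return `${cmd}; echo "${marker}:$?"\n`;
}

interface Completion { done: boolean; exitCode?: number; output: string }

// Strip ANSI codes and carriage returns, then look for a line "<marker>:<status>".
function parseCompletion(buffer: string, marker: string): Completion {
  const clean = buffer.replace(ANSI_PATTERN, "").replace(/\r/g, "");
  const m = clean.match(new RegExp(`^${marker}:(\\d+)$`, "m"));
  if (!m || m.index === undefined) return { done: false, output: clean };
  return { done: true, exitCode: Number(m[1]), output: clean.slice(0, m.index).replace(/\s+$/, "") };
}
```

Anchoring the marker pattern to the start of a line matters here. The echoed command line itself contains the marker text, but it begins with the command, so it is not mistaken for completion.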
Headless WebUI Mode
A full web-based interface that runs without any terminal TUI, enabling remote access from browsers and mobile devices via Socket.IO real-time communication.
WebUI Features
- IDE-style layout with file explorer
- Monaco editor for file editing
- Real-time chat with markdown rendering
- Status bar with all TUI controls
- Mobile-responsive design
Architecture
- Express + Socket.IO server
- Automatic setup wizard for auth
- Gateway token for remote security
- Cloudflare tunnel support built-in
- TUI/WebUI shared state bridge
Diff Validation Engine
A multi-pass validation system that catches AI code generation errors before applying changes, using holographic analysis and structural validation.
Validation Passes
- Structural syntax validation
- Holographic scope analysis
- Search/Replace match verification
- Context-aware error recovery
Error Prevention
- Catches duplicate inserts
- Detects missing context lines
- Validates bracket/brace balance
- Auto-retry with AI guidance
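Three of the listed checks are easy to sketch for a single search/replace edit:
- The search block must match exactly once.
- An insert whose full replacement text already exists is flagged as a duplicate.
- The edit must not break bracket balance.

This is a simplified stand-in, not the holographic validator. Brackets inside strings and comments are not special-cased, and the duplicate check is only a heuristic.

```typescript
// Hypothetical simplified validator for one search/replace edit.
interface Edit { search: string; replace: string }

function countOccurrences(haystack: string, needle: string): number {
  if (needle === "") return 0;
  let count = 0;
  for (let i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + 1)) count++;
  return count;
}

// True when (), [], {} are properly nested and closed.
function bracketsBalanced(text: string): boolean {
  const openerFor: Record<string, string> = { ")": "(", "]": "[", "}": "{" };
  const stack: string[] = [];
  for (const ch of text.split("")) {
    if (ch === "(" || ch === "[" || ch === "{") stack.push(ch);
    else if (ch in openerFor && stack.pop() !== openerFor[ch]) return false;
  }
  return stack.length === 0;
}

function validateEdit(file: string, edit: Edit): string[] {
  const errors: string[] = [];
  const matches = countOccurrences(file, edit.search);
  if (matches !== 1) errors.push(`search block matched ${matches} times (expected 1)`);
  // Heuristic duplicate-insert check: an insert whose full text already exists.
  if (edit.replace.length > edit.search.length && file.indexOf(edit.replace) >= 0) {
    errors.push("replacement already present (duplicate insert)");
  }
  if (matches === 1) {
    const updated = file.replace(edit.search, () => edit.replace); // function avoids `$` patterns
    if (bracketsBalanced(file) && !bracketsBalanced(updated)) errors.push("edit breaks bracket balance");
  }
  return errors;
}
```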
Review Council
Another major learning was that autonomous coding becomes safer when code generation and code review are separated into independent AI roles. Review Council acts as a second-pass audit layer that inspects proposed diffs before changes are applied.
Independent Audit Role
- Reviews generated diffs before apply
- Checks scope against the actual user goal
- Flags unrelated edits and accidental deletions
- Verifies plan completeness for the target file
Why It Matters
- Reduces self-bias from the generating agent
- Catches silent scope creep before it lands
- Produces concrete fix guidance for retries
- Turns review into a first-class agent capability
Server Agent
We learned that centrally delivered prompts and tool definitions make the hosted agent easier to evolve across versions. Server Agent mode loads authenticated, version-matched runtime behavior from the backend instead of depending only on local configuration.
Runtime Delivered by Backend
- Version-matched prompts
- Hosted tool definitions
- Essential tool policy from server
- Session-based authenticated recovery
Learning Outcome
- Central control improves compatibility
- Remote updates reduce client drift
- Hosted definitions simplify premium rollout
- Authentication and runtime design can stay connected
Custom Agent Architecture
This became one of the strongest architecture learnings in the project. We moved beyond a fixed assistant and made the agent itself programmable through nodecoder-agents.js. A custom agent can redefine prompts, strategies, essential tool policy, and custom tools, allowing fully offline and domain-specific agent runtimes.
What a Custom Agent Can Control
- Custom system prompt template
- Workflow strategies: local, hybrid, remote, skills
- Essential tool enable/disable settings
- Custom tool definitions and implementations
- Server-free mode via server_tools: false
Implementation Power
- Inline tool handlers inside the config file
- External tool implementations via blueprint files
- Custom domain runtimes, not just coding presets
- Can reshape how the orchestrator behaves
- Works offline with local prompts and tools only
Why It Is More Powerful Than Skills
- Skills extend the workflow by adding instructions and resources for an existing agent runtime.
- Custom agents redesign the runtime by changing prompts, strategy selection, tool policy, and tool availability.
- Result: skills make the agent better at a task, but custom agents let you decide what kind of agent it is in the first place.
Both Mode
One useful architectural lesson was that users do not want to choose between hosted reliability and local freedom. Both Mode keeps server-delivered tools active while also re-enabling local custom tools and custom strategies from the local agent configuration.
What Gets Combined
- Authenticated server toolset
- Local custom tools
- Local custom strategy prompts
- Shared essential tool layer
Learning Outcome
- Hybrid agent architecture is practical
- Centralized and local extension can coexist
- Power users keep flexibility without losing hosted capabilities
- Tool merging matters as much as prompt merging
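Tool merging in Both Mode can be sketched as a name-keyed union of the essential, server, and local layers. How the real implementation resolves name collisions is not documented here. This sketch assumes earlier layers win and reports shadowed local tools; `mergeBoth` is a hypothetical name.

```typescript
// Hypothetical Both Mode merge. Collision rule (earlier layer wins, later
// duplicates are reported as shadowed) is an assumption.
interface ToolDef { name: string; source: "essential" | "server" | "local" }

function mergeBoth(essential: ToolDef[], server: ToolDef[], local: ToolDef[]) {
  const byName = new Map<string, ToolDef>();
  const shadowed: string[] = [];
  for (const layer of [essential, server, local]) {
    for (const tool of layer) {
      if (byName.has(tool.name)) shadowed.push(`${tool.source}:${tool.name}`);
      else byName.set(tool.name, tool);
    }
  }
  return { tools: Array.from(byName.values()), shadowed };
}
```

Reporting shadowed tools, rather than silently dropping them, is one way to make a merge conflict visible to the user who wrote the local tool.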
Agent Skill System
The skill system taught us how to package reusable workflow knowledge cleanly. Instead of loose prompt fragments, skills are structured around SKILL.md files with metadata, instructions, and optional resource directories that are exposed to the agent on demand.
How It Works
- SKILL.md metadata + prompt instructions
- Auto-discovered from configured skills directories
- Enabled or disabled per session
- Injected into the agent as the primary workflow guide
Resource Model
- On-demand scripts/ for executable helpers
- On-demand references/ docs for reading
- Optional assets/ for templates or data
- Progressive disclosure keeps base context lean
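Here is a minimal SKILL.md reader. It assumes the metadata is a block of flat `key: value` lines between `---` fences at the top of the file, and that `name` and `description` are typical fields. Nested YAML is not supported. `parseSkill` is hypothetical. Resource directories would be listed separately and read only on demand.

```typescript
// Hypothetical SKILL.md reader: flat "key: value" metadata between "---"
// fences, followed by the instruction body.
interface Skill { meta: Record<string, string>; instructions: string }

function parseSkill(markdown: string): Skill {
  const lines = markdown.replace(/\r\n/g, "\n").split("\n");
  const meta: Record<string, string> = {};
  if (lines[0] !== "---") return { meta, instructions: markdown.trim() }; // no metadata block
  const end = lines.indexOf("---", 1);
  if (end < 0) throw new Error("unterminated metadata block");
  for (const line of lines.slice(1, end)) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, instructions: lines.slice(end + 1).join("\n").trim() };
}
```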
Daemon Mode
We also learned how to adapt the same agent into a no-TUI automation runtime. Daemon Mode runs over standard input and output, making it suitable for background services, server execution, and scripted one-shot tasks.
Supported Flows
- Interactive REPL operation
- Single-run --prompt execution
- Persistent sessions
- Background-friendly standard I/O integration
Learning Outcome
- One agent core can support multiple interfaces
- TUI logic can be separated from execution logic
- Automation and interactivity can share the same session model
- Developer tools can evolve into service runtimes
Computer Use on Windows & WSL
Computer use support showed us that GUI automation becomes much more reliable when the agent is visually grounded before acting. The desktop control layer supports Windows and WSL2 with real screenshot capture, mouse control, typing, and hotkeys.
Capabilities
- Physical screen capture
- Mouse move and click control
- Keyboard typing and shortcut pressing
- PowerShell-backed execution on Windows/WSL
Reliability Design
- Coordinate grounding grid drawn on screenshots
- Visual-first action planning
- DPI-aware screenshot capture
- Less coordinate hallucination from the model
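The grounding grid lets the model answer with a cell label instead of raw pixel coordinates. The sketch below maps a label back to a click point. It assumes columns are lettered A, B, C..., rows are numbered from 1, and `scale` is the DPI factor between the screenshot the model saw and the physical screen. `cellToScreen` and `Grid` are hypothetical.

```typescript
// Hypothetical mapping from a grid cell label (e.g. "C4") to a click point.
interface Grid {
  cols: number;        // columns labeled A, B, C, ...
  rows: number;        // rows labeled 1, 2, 3, ...
  imageWidth: number;  // screenshot size in pixels as shown to the model
  imageHeight: number;
  scale: number;       // physical pixels per screenshot pixel (DPI factor)
}

function cellToScreen(label: string, g: Grid): { x: number; y: number } {
  const m = /^([A-Z])(\d+)$/.exec(label.trim().toUpperCase());
  if (!m) throw new Error(`bad cell label ${label}`);
  const col = m[1].charCodeAt(0) - 65; // "A" -> 0
  const row = Number(m[2]) - 1;        // "1" -> 0
  if (col >= g.cols || row < 0 || row >= g.rows) throw new Error(`cell ${label} outside grid`);
  // Aim for the center of the cell, then convert to physical pixels.
  return {
    x: Math.round((col + 0.5) * (g.imageWidth / g.cols) * g.scale),
    y: Math.round((row + 0.5) * (g.imageHeight / g.rows) * g.scale),
  };
}
```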
Mobile Use via Termux + ADB
Mobile automation extended the same action model into Android. Through Termux and ADB, the agent can inspect the live screen, tap, long-press, type text, send navigation events, and switch apps for mobile-first automation flows.
Android Control Layer
- ADB screenshot capture
- Tap and long-press input
- Keyboard text input
- Navigation and recent-app key events
What We Learned
- Desktop-style agent tools can map onto mobile primitives
- ADB provides a practical control bridge for Android
- Visual grounding still matters on small screens
- Termux makes mobile automation developer-friendly
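The Android primitives map onto standard `adb` commands, and this sketch only builds their argument lists. The following are real:
- `exec-out screencap -p`
- `input tap`, `input swipe`, and `input text`
- `input keyevent` with `KEYCODE_BACK`, `KEYCODE_HOME`, and `KEYCODE_APP_SWITCH`

Emulating long-press as a zero-distance swipe with a duration, and the 800 ms default, are assumptions. `adbArgs` and `MobileAction` are hypothetical.

```typescript
// Hypothetical builder for adb argument lists; spawning adb is omitted.
type MobileAction =
  | { kind: "screenshot" }
  | { kind: "tap"; x: number; y: number }
  | { kind: "longPress"; x: number; y: number; ms?: number }
  | { kind: "type"; text: string }
  | { kind: "key"; key: "BACK" | "HOME" | "APP_SWITCH" };

function adbArgs(a: MobileAction): string[] {
  switch (a.kind) {
    case "screenshot":
      return ["exec-out", "screencap", "-p"]; // PNG bytes on stdout
    case "tap":
      return ["shell", "input", "tap", String(a.x), String(a.y)];
    case "longPress": // zero-distance swipe held for `ms` milliseconds
      return ["shell", "input", "swipe", String(a.x), String(a.y), String(a.x), String(a.y), String(a.ms ?? 800)];
    case "type":
      // `input text` reads %s as a space; other shell metacharacters would
      // still need escaping in a real implementation.
      return ["shell", "input", "text", a.text.replace(/ /g, "%s")];
    case "key":
      return ["shell", "input", "keyevent", `KEYCODE_${a.key}`];
  }
}
```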
Telegram Bot Integration
Full Telegram bot interface that mirrors the terminal experience, enabling mobile-first AI coding from any device with Telegram installed.
Features
- Complete command system (/agent, /chat, /cwd)
- Inline keyboard buttons for consent
- File upload/download support
- Real-time typing indicators
Technical
- node-telegram-bot-api integration
- Markdown/HTML message formatting
- Rate limiting and owner-only access
- Seamless state sharing with TUI
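The consent prompt can be sketched using the `reply_markup.inline_keyboard` shape from the Telegram Bot API. That object can be passed in the options argument of node-telegram-bot-api's `sendMessage`. The `consent:<id>:yes|no` callback format and both helper names are assumptions. Telegram caps `callback_data` at 64 bytes, so the sketch stores only a short ASCII request id in it.

```typescript
// Hypothetical consent prompt for node-telegram-bot-api's sendMessage options.
interface InlineButton { text: string; callback_data: string }
interface ConsentOptions { reply_markup: { inline_keyboard: InlineButton[][] } }

function consentKeyboard(requestId: string): ConsentOptions {
  const approve = `consent:${requestId}:yes`;
  // Telegram limits callback_data to 64 bytes; this check assumes ASCII ids.
  if (approve.length > 64) throw new Error("callback_data too long");
  return {
    reply_markup: {
      inline_keyboard: [[
        { text: "Approve", callback_data: approve },
        { text: "Deny", callback_data: `consent:${requestId}:no` },
      ]],
    },
  };
}

// Decode the callback_data of a pressed button; null for unrelated callbacks.
function parseConsent(data: string): { requestId: string; approved: boolean } | null {
  const m = /^consent:([^:]+):(yes|no)$/.exec(data);
  return m ? { requestId: m[1], approved: m[2] === "yes" } : null;
}
```

In a `bot.on("callback_query", ...)` handler, `parseConsent(query.data)` would decode the choice. Comparing `query.from.id` with a configured owner id (a hypothetical setting) is one way to keep the bot owner-only.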
Remote Access & Tunneling
Built-in support for accessing the agent remotely via secure tunnels, enabling development from any location without complex network configuration.
- Gateway Tokens: Secure token-based authentication for remote WebUI connections
- Cloudflare Tunnel: Auto-provisions cloudflared quick tunnels for public HTTPS URLs
- Zero-Config Remote: Two environment variables (REMOTE=true TUNNEL=true) enable full remote setup
- Auth Middleware: Socket.IO middleware validates tokens before allowing a connection
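The gateway check can be shaped like a Socket.IO middleware, which is registered with `io.use((socket, next) => ...)` and reads client credentials from `socket.handshake.auth`. The `token` field name and `gatewayAuth` are assumptions. The comparison avoids exiting on the first mismatching character. In Node, `crypto.timingSafeEqual` on equal-length buffers is the stronger choice.

```typescript
// Hypothetical gateway-token middleware in the shape Socket.IO expects.
interface HandshakeLike { handshake: { auth: Record<string, unknown> } }
type Next = (err?: Error) => void;

// Compares every character instead of returning at the first mismatch.
function safeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

function gatewayAuth(expectedToken: string) {
  return (socket: HandshakeLike, next: Next): void => {
    const token = socket.handshake.auth["token"];
    if (typeof token === "string" && safeEqual(token, expectedToken)) next();
    else next(new Error("unauthorized")); // rejects the handshake
  };
}
```

When registered as `io.use(gatewayAuth(token))`, a rejected handshake surfaces on the client as a `connect_error` event.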
Theming System
22+ built-in themes with a modular theme engine that controls colors across the entire TUI and WebUI simultaneously.
Theme Categories
- Game-inspired (Cyberpunk, Zelda, Mario)
- IDE-style (Monokai, Dracula, Solarized)
- Nature-themed (Ocean, Forest, Sunset)
- Custom user-defined themes
Implementation
- Theme JSON with accent/bg/border colors
- Runtime theme switching via /theme
- Logo color auto-adaptation
- WebUI CSS variable synchronization
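WebUI synchronization can be sketched as turning a theme's color map into CSS custom properties on `:root`. The theme shape, the camelCase to kebab-case naming rule, and `themeToCss` are assumptions. Only 3- and 6-digit hex colors are accepted in this sketch.

```typescript
// Hypothetical theme-to-CSS conversion for WebUI variable synchronization.
interface Theme { name: string; colors: Record<string, string> }

function themeToCss(theme: Theme): string {
  const decls = Object.keys(theme.colors).map((key) => {
    const value = theme.colors[key];
    if (!/^#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/.test(value)) {
      throw new Error(`invalid color for ${key}: ${value}`);
    }
    const prop = key.replace(/[A-Z]/g, (c) => `-${c.toLowerCase()}`); // borderFocus -> border-focus
    return `  --${prop}: ${value};`;
  });
  return `:root {\n${decls.join("\n")}\n}`;
}
```

Because the TUI and WebUI would read the same theme object, switching with /theme only needs to regenerate this block and swap it into the page.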
Community Funding Model
One of the most important product learnings was that our funding model should directly unlock the most powerful hosted runtime in the system: the Server Agent. The community-funded model is not about locking AI models behind a paywall - it exists to fund infrastructure and, in return, unlock premium server-side prompts, agent strategies, and tool definitions that significantly expand what the agent can do.
What Funding Unlocks
- Server Agent access: authenticated runtime behavior delivered from the backend
- Version-matched prompts and strategies: premium hosted agent logic tied to the current client version
- Expanded server tool registry: structured editing, holographic analysis, terminal helpers, and deep GitHub operations
- Managed essential tool policy: backend can deliver essential tool configuration together with server runtime definitions
- Longer-running hosted access model: contribution-backed infrastructure keeps the premium runtime sustainable
Community Keys
- Shared unlock path for the whole community
- Typically grants short-duration access to the Server Agent runtime
- Unlocked when community contribution goals are reached
- Makes premium functionality temporarily available to everyone
- Best example of the "community contribution unlocks access for all" model
Contribution Login
- Personal login path tied to a contributed account
- Much longer-duration access than the community key path
- Device-locked security
- Unlocks the premium Server Agent tool and strategy layer
- Directly funds hosting, API delivery, and continued development
Why This Model Matters
- Model access remains bring-your-own-provider - not gated by us
- Funding is tied to hosted agent infrastructure, not vendor lock-in
- Community support directly unlocks better agent behavior and tooling
- The premium layer is about capability depth, not basic access to AI
Premium Functionality Unlocked
- Holographic analysis tools like symbol definitions and diagnostics
- Structured editing flows like generate_edit_plan and interactive diff application
- Advanced GitHub operations including repo exploration, issue flows, and PR creation
- Hosted runtime prompts and strategies that make the Server Agent more capable than the local essentials alone