Skip to content

vapering/SRE-Agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

10 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ€– Deep SRE Agent & Flash Sale Mall

δΈ­ζ–‡ζ–‡ζ‘£

Deep SRE Agent is a cutting-edge intelligent SRE (Site Reliability Engineering) experimental platform designed to explore the application of LLMs (Large Language Models) in the field of SRE.

This project builds a complete microservices-based e-commerce system (Flash Sale Mall) and equips it with an intelligent operations agent (Deep SRE Agent) based on the deepagents framework.

The Agent can act like a human SRE engineer, proactively inspecting the system, analyzing logs, querying metrics, diagnosing databases, and even performing root cause analysis through natural language.


πŸŽ₯ Demo

demo01.mp4
demo02.mp4

If the video does not play, please check packages/ directly.


πŸ—οΈ Overall Architecture

The project adopts a layered architecture design, from bottom to top: Target Business System, Observability Infrastructure, MCP Adapter Layer, and Intelligent Agent Layer.

graph TD
    subgraph "πŸ€– Intelligent Agent Layer"
        UI[Deep Agents UI :3300] --> Agent[Deep SRE Agent :2024]
        Agent --> |Orchestrate| Wiki[Wiki Agent]
        Agent --> |Orchestrate| Log[Log Agent]
        Agent --> |Orchestrate| Metric[Prometheus Agent]
        Agent --> |Orchestrate| DB[MySQL Agent]
    end

    subgraph "πŸ”Œ MCP Adapter Layer"
        Wiki --> |SSE/HTTP| WikiMCP[DeepWiki MCP]
        Log --> |SSE/HTTP| LokiMCP[Loki MCP :7080]
        Metric --> |HTTP| PromMCP[Prometheus MCP :18090]
        DB --> |HTTP| DBHub[DBHub/MySQL MCP :18081]
    end

    subgraph "πŸ“Š Observability Infrastructure"
        PromMCP --> Prometheus[Prometheus :9090]
        LokiMCP --> Loki[Loki :3100]
        Prometheus --> Grafana[Grafana :3000]
        Promtail --> Loki
    end

    subgraph "πŸ›’ Target System (Flash Sale Mall)"
        Frontend[Frontend :5173] --> Backend[Backend :3001]
        Backend --> MySQL[MySQL :3306]
        Backend --> Redis[Redis :6379]
        Backend --> Kafka[Kafka :9092]
        Backend -.-> |Logs| Promtail
        Backend -.-> |Metrics| Prometheus
        Backend -.-> |Traces| OAP
    end
Loading

Core Components

  1. Flash Sale Mall (Target System)

    • A high-concurrency flash sale mall based on Spring Boot 3 + React 18.
    • Integrated full-link monitoring: Micrometer (Metrics), Logback (Logs), tempo (Traces).
    • See: README_flashMall.md
  2. Observability Stack

    • Prometheus: Metric storage and querying.
    • Loki: Log aggregation and retrieval.
    • tempo: Distributed tracing.
    • Grafana: Unified monitoring dashboard.
  3. MCP Layer (Model Context Protocol Layer)

    • Acts as a standard bridge between LLM and infrastructure.
    • Prometheus MCP: Allows Agent to execute PromQL.
    • Loki MCP: Allows Agent to use LogQL to query logs.
    • DBHub: Allows Agent to execute SQL to query data.
    • tempo MCP: Allows Agent to query topology and traces.
  4. Sub-Agent Layer

    • Prometheus Agent: Focuses on metric query and analysis, generating PromQL and interpreting monitoring data.
    • Log Agent: Focuses on log retrieval, using LogQL to filter error stacks and exceptions.
    • MySQL Agent: Focuses on database diagnosis, executing SQL to query business data or slow queries.
    • Wiki Agent: Focuses on knowledge base retrieval, providing system architecture documents and SRE runbook support.
  5. Deep SRE Agent (Main Intelligent Agent)

    • A Multi-Agent system orchestrator based on LangGraph.
    • Responsible for receiving user instructions, decomposing tasks, scheduling sub-agents, and summarizing reasoning results.

πŸš€ Quick Start

1. Prerequisites

  • Docker & Docker Compose: Core dependency, used to start all services.
  • API Key: Requires OPENAI or other compatible LLM API Key.

2. Configure Agent

Copy the environment variable template and fill in your API Key:

cp deep_sre_agent/.env.example deep_sre_agent/.env.dev
# Edit .env.dev and fill in keys, etc.

3. One-Click Start

Use Docker Compose to bring up the entire environment (including Mall, Monitoring, MCP Services, Agent, and UI):

docker compose up -d --build

Note: The first startup requires downloading multiple images and building the Agent environment, which may take tens of minutes.

4. Access the System

Service Name URL / Port Description
Deep Agents UI http://localhost:3300 Agent Entry, chat with SRE Agent here
Flash Sale Mall http://localhost:5173 Mall Frontend, test flash sales here
LangGraph API http://localhost:2024 Agent Backend API (for UI)
Backend API http://localhost:3001 Mall Backend API
Grafana http://localhost:3000 Monitoring Dashboard (Account: admin / admin123)
Prometheus http://localhost:9090 Native Metric Query Interface

πŸ’» Development Guide

SRE Agent Development

Agent code is located in the deep_sre_agent/ directory.

  • Architecture: Uses LangGraph to orchestrate multi-agent collaboration.
  • Debugging:
    • Recommended to use Jupyter Notebook (research_agent.ipynb) for interactive debugging.
    • Or run langgraph dev locally to start the API server.
  • Extension: Create a new Agent directory under deep_sre_agent/ and write mcp_client.py to connect to new MCP services.

Mall Business Development

Business code is located in backend-spring/ (Backend) and src/ (Frontend).

  • Backend: Spring Boot 3.3, Java 21.
  • Frontend: React 18, Vite, TailwindCSS.
  • Local Run: Refer to the development guide in README_flashMall.md.

πŸ”Œ Service Port Mapping

Container Service Port Usage
deep-agents-ui 3300 Agent Chat Interface (Next.js)
deep-sre-agent 2024 Agent Core Logic (LangGraph API)
flashsale-frontend 5173 Mall Frontend (Nginx/Vite)
flashsale-backend 3001 Mall Backend (Spring Boot)
flashsale-grafana 3000 Monitoring Visualization
flashsale-prometheus 9090 Metric Storage
flashsale-loki 3100 Log Storage
flashsale-mysql 3306 Business Database
flashsale-redis 6379 Cache & Rate Limiting
flashsale-kafka 9092 Message Queue
prometheus-mcp 18090 Prometheus MCP Adapter
dbhub (MySQL MCP) 18081 SQL Execution Adapter
loki-mcp 7080 Loki MCP Adapter

🀝 Contribution & License

Issues and PRs are welcome!


Deep SRE Agent - Make operations smarter, make systems more reliable.

About

Deep SRE Agent is a cutting-edge intelligent SRE (Site Reliability Engineering) experimental platform designed to explore the application of LLMs (Large Language Models) in the field of SRE.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors