Logging and Monitoring Guide
This guide introduces how to implement comprehensive logging and performance monitoring in Silly applications, building a complete observability solution.
Introduction
Observability is a core element of modern application operations, consisting of three pillars:
- Logging: Recording discrete events and error messages
- Metrics: Recording aggregatable numerical data reflecting system state
- Tracing: Tracking the complete lifecycle of requests through the system
The Silly framework provides built-in support for all three aspects:
- silly.logger: Hierarchical logging system with log rotation support
- silly.metrics.prometheus: Prometheus metrics collection and export
- silly.tracespawn/traceset: Distributed trace ID generation and propagation
Logging System
Basic Usage
silly.logger provides four log levels: DEBUG, INFO, WARN, and ERROR:
local logger = require "silly.logger"
-- Set log level (only output INFO and above)
logger.setlevel(logger.INFO)
-- Basic log output
logger.debug("Debug information") -- Won't be output
logger.info("Server started") -- Will be output
logger.warn("Connection timeout, retrying") -- Will be output
logger.error("Database connection failed") -- Will be outputLog Format
The framework automatically adds the following information to each log entry:
2025-10-21 09:37:27 0001e3d700010000 I cluster/node1.lua:30 [node1] Received HTTP GET /test
Log format explanation:
- 2025-10-21 09:37:27 - Timestamp
- 0001e3d700010000 - Trace ID (printed automatically; no need to add it explicitly in business code)
- I - Log level (D=DEBUG, I=INFO, W=WARN, E=ERROR)
- cluster/node1.lua:30 - File name and line number
- [node1] Received HTTP GET /test - Log message
Automatic Trace ID Printing
The framework automatically prints the current coroutine's Trace ID before each log entry. Business code does not need to explicitly include the Trace ID in log messages. This allows all logs from the same request to be correlated via the Trace ID.
-- ❌ Wrong: Don't explicitly print trace ID
local trace_id = silly.tracepropagate()
logger.info("[" .. trace_id .. "] Processing request")
-- ✅ Correct: Framework automatically prints trace ID
logger.info("Processing request")Choosing Log Levels
Choose appropriate log levels based on different scenarios:
| Level | Use Case | Examples |
|---|---|---|
| DEBUG | Development debugging, troubleshooting | Variable values, function calls, detailed request info |
| INFO | Normal business flow | Service start/stop, user login, order creation |
| WARN | Potential issues, degraded operations | Retry limit exceeded, cache miss, missing config |
| ERROR | Errors and exceptions | Database connection failure, request processing failure |
local logger = require "silly.logger"
-- Production environment: use INFO level
logger.setlevel(logger.INFO)
-- Debug mode: use DEBUG level
logger.setlevel(logger.DEBUG)
-- Check current level
if logger.getlevel() <= logger.DEBUG then
-- Only perform expensive serialization in DEBUG mode
local json = require "json"
logger.debug("Request details:", json.encode(request))
end
Formatted Logging
Use formatted log functions (*f series) to improve log readability:
local logger = require "silly.logger"
-- Use string.format style
logger.infof("User [%s] completed %d operations in %d seconds",
username, duration, count)
logger.errorf("Order #%d processing failed: %s (error code: %d)",
order_id, error_msg, error_code)
-- Format parameters
logger.debugf("%.2f%% of requests completed within %dms",
percentage, latency_ms)
Structured Logging
For easier log analysis, use structured log format:
local logger = require "silly.logger"
local json = require "silly.encoding.json"
-- Define log helper function
local function log_request(method, path, status, duration)
local log_entry = {
timestamp = os.time(),
level = "INFO",
event = "http_request",
method = method,
path = path,
status = status,
duration_ms = duration,
}
logger.info(json.encode(log_entry))
end
-- Usage
log_request("GET", "/api/users", 200, 15.3)
-- Output: {"timestamp":1703001234,"level":"INFO","event":"http_request",...}Log Rotation
Silly supports log rotation via signals to prevent unlimited log file growth:
-- Specify log file at startup
-- ./silly main.lua --logpath=/var/log/myapp.log
Shell script to perform log rotation:
#!/bin/bash
# rotate-logs.sh
LOG_FILE="/var/log/myapp.log"
APP_PID=$(cat /var/run/myapp.pid)
# 1. Rename current log file
mv "$LOG_FILE" "$LOG_FILE.$(date +%Y%m%d-%H%M%S)"
# 2. Send SIGUSR1 signal to make Silly reopen the log file
kill -USR1 "$APP_PID"
# 3. Compress old logs (optional)
gzip "$LOG_FILE".*
# 4. Clean up logs older than 7 days (optional)
find /var/log -name "myapp.log.*" -mtime +7 -delete
Configure crontab for periodic execution:
# Execute log rotation at 2 AM daily
0 2 * * * /path/to/rotate-logs.sh
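Alternatively, if logrotate(8) is available, the same rotation can be expressed declaratively. This is a sketch assuming the log and PID file paths used above; adjust to your deployment:
# /etc/logrotate.d/myapp -- sketch; paths are examples
/var/log/myapp.log {
    daily
    rotate 7
    compress
    missingok
    postrotate
        # Make Silly reopen the log file, same as rotate-logs.sh above
        kill -USR1 "$(cat /var/run/myapp.pid)"
    endscript
}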
Dynamic Log Level Adjustment
In production environments, adjust log levels dynamically via signals to avoid restarting the service:
local logger = require "silly.logger"
local signal = require "silly.signal"
-- Initialize to INFO level
logger.setlevel(logger.INFO)
-- Toggle DEBUG mode via SIGUSR2 signal
signal("SIGUSR2", function()
if logger.getlevel() == logger.DEBUG then
logger.setlevel(logger.INFO)
logger.info("Log level switched to INFO")
else
logger.setlevel(logger.DEBUG)
logger.info("Log level switched to DEBUG")
end
end)
Switch log level:
# Switch to DEBUG mode
kill -USR2 <pid>
# Execute again to switch back to INFO mode
kill -USR2 <pid>
Performance Monitoring
Prometheus Integration
Silly has a complete built-in Prometheus metrics system supporting the Counter, Gauge, and Histogram metric types.
Creating /metrics Endpoint
The most basic monitoring integration is exposing a /metrics endpoint for Prometheus to scrape:
local http = require "silly.net.http"
local prometheus = require "silly.metrics.prometheus"
-- Start HTTP server
local server = http.listen {
addr = "0.0.0.0:8080",
handler = function(stream)
if stream.path == "/metrics" then
-- Collect all metrics and return in Prometheus format
local metrics = prometheus.gather()
stream:respond(200, {
["content-type"] = "text/plain; version=0.0.4; charset=utf-8",
})
stream:closewrite(metrics)
else
-- Business logic
stream:respond(200, {["content-type"] = "text/plain"})
stream:closewrite("Hello World")
end
end
}
Built-in Metrics
prometheus.gather() automatically collects the following built-in metrics:
Silly Runtime Metrics:
- silly_worker_backlog: Number of pending messages in the worker queue
- silly_timer_pending: Number of pending timers
- silly_tasks_runnable: Number of runnable tasks
- silly_tcp_connections: Number of active TCP connections
- silly_network_sent_bytes_total: Total bytes sent over the network
- silly_network_received_bytes_total: Total bytes received from the network
Process Resource Metrics:
- process_cpu_seconds_user: User-mode CPU time (seconds)
- process_cpu_seconds_system: Kernel-mode CPU time (seconds)
- process_resident_memory_bytes: Resident memory size (bytes)
- process_heap_bytes: Heap memory size (bytes)
jemalloc Metrics (if compiled with MALLOC=jemalloc):
- Detailed memory allocation statistics
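For reference, prometheus.gather() returns these metrics in the standard Prometheus text exposition format. An illustrative excerpt (the HELP strings and values here are invented for illustration):
# HELP silly_tcp_connections Number of active TCP connections
# TYPE silly_tcp_connections gauge
silly_tcp_connections 42
# HELP silly_network_sent_bytes_total Total bytes sent over the network
# TYPE silly_network_sent_bytes_total counter
silly_network_sent_bytes_total 1048576
# HELP process_resident_memory_bytes Resident memory size (bytes)
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 52428800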
Custom Metrics
Create custom metrics based on business needs:
Counter: Cumulative Counter
Counter can only increase, suitable for counting total requests, error counts, and other cumulative values:
local prometheus = require "silly.metrics.prometheus"
-- Create Counter
local http_requests_total = prometheus.counter(
"http_requests_total",
"Total HTTP requests",
{"method", "path", "status"}
)
-- Record requests
http_requests_total:labels("GET", "/api/users", "200"):inc()
http_requests_total:labels("POST", "/api/users", "201"):inc()
http_requests_total:labels("GET", "/api/users", "500"):inc()Gauge: Gauge
A Gauge can increase or decrease, suitable for instantaneous values such as the current number of connections or queue depth:
local prometheus = require "silly.metrics.prometheus"
-- Create Gauge
local active_connections = prometheus.gauge(
"active_connections",
"Current active connections"
)
local queue_depth = prometheus.gauge(
"queue_depth",
"Queue depth",
{"queue_name"}
)
-- Usage
active_connections:inc() -- Increase by 1
active_connections:dec() -- Decrease by 1
active_connections:set(42) -- Set to 42
active_connections:add(10) -- Add 10
active_connections:sub(5) -- Subtract 5
queue_depth:labels("jobs"):set(128)Histogram: Histogram
A Histogram records the distribution of observed values, suitable for latency, response size, and other scenarios requiring percentile analysis:
local prometheus = require "silly.metrics.prometheus"
-- Create Histogram (default buckets)
local request_duration = prometheus.histogram(
"http_request_duration_seconds",
"HTTP request duration (seconds)"
)
-- Custom bucket boundaries
local response_size = prometheus.histogram(
"http_response_size_bytes",
"HTTP response size (bytes)",
{"method"},
{100, 500, 1000, 5000, 10000, 50000, 100000}
)
-- Record observations
local start = os.clock() -- note: os.clock() measures CPU time, not wall-clock time
-- ... process request ...
local duration = os.clock() - start
request_duration:observe(duration)
response_size:labels("GET"):observe(1234)Complete Monitoring Example
Complete Monitoring Example
An HTTP service example with complete monitoring:
local silly = require "silly"
local http = require "silly.net.http"
local logger = require "silly.logger"
local prometheus = require "silly.metrics.prometheus"
-- Define metrics
local http_requests_total = prometheus.counter(
"myapp_http_requests_total",
"Total HTTP requests",
{"method", "path", "status"}
)
local http_requests_in_flight = prometheus.gauge(
"myapp_http_requests_in_flight",
"Number of HTTP requests in flight"
)
local http_request_duration = prometheus.histogram(
"myapp_http_request_duration_seconds",
"HTTP request duration (seconds)",
{"method", "path"},
{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0}
)
-- HTTP handler function
local function handle_request(stream)
local start = os.clock()
http_requests_in_flight:inc()
-- Handle different paths
local status_code = 200
local response_body = ""
if stream.path == "/metrics" then
-- Prometheus metrics endpoint
local metrics = prometheus.gather()
stream:respond(200, {
["content-type"] = "text/plain; version=0.0.4",
})
stream:closewrite(metrics)
elseif stream.path == "/api/users" then
-- Business API
logger.info("Handling user API request:", stream.method)
response_body = '{"users": []}'
stream:respond(200, {["content-type"] = "application/json"})
stream:closewrite(response_body)
else
-- 404
status_code = 404
response_body = "Not Found"
stream:respond(404, {["content-type"] = "text/plain"})
stream:closewrite(response_body)
end
-- Record metrics
local duration = os.clock() - start
http_requests_in_flight:dec()
http_request_duration:labels(stream.method, stream.path):observe(duration)
http_requests_total:labels(stream.method, stream.path, tostring(status_code)):inc()
-- Record log
logger.infof("%s %s %d %.3fs",
stream.method, stream.path, status_code, duration)
end
-- Start service
local server = http.listen {
addr = "0.0.0.0:8080",
handler = function(stream)
local ok, err = silly.pcall(handle_request, stream)
if not ok then
logger.error("Request handling failed:", err)
stream:respond(500, {["content-type"] = "text/plain"})
stream:closewrite("Internal Server Error")
end
end
}
logger.info("Server started on 0.0.0.0:8080")
logger.info("Prometheus metrics: http://localhost:8080/metrics")Grafana Visualization
Configure Prometheus to scrape metrics from the Silly application:
# prometheus.yml
scrape_configs:
- job_name: 'silly-app'
scrape_interval: 15s
static_configs:
- targets: ['localhost:8080']
labels:
app: 'my-silly-app'
env: 'production'
Create a Grafana dashboard with these common queries:
# QPS (queries per second)
rate(myapp_http_requests_total[1m])
# QPS grouped by status code
sum by (status) (rate(myapp_http_requests_total[1m]))
# P95 latency
histogram_quantile(0.95, rate(myapp_http_request_duration_seconds_bucket[5m]))
# Error rate
rate(myapp_http_requests_total{status=~"5.."}[1m])
/
rate(myapp_http_requests_total[1m])
# Current active connections
myapp_http_requests_in_flight
# Memory usage
process_resident_memory_bytes
# CPU usage rate
(rate(process_cpu_seconds_user[1m]) + rate(process_cpu_seconds_system[1m])) * 100
Request Tracing
Trace ID Generation
Silly provides a distributed trace ID system where each coroutine has an independent trace ID:
local silly = require "silly"
local task = require "silly.task"
local logger = require "silly.logger"
task.fork(function()
-- Create a new trace ID (if the current coroutine doesn't have one);
-- tracespawn returns the previous trace ID so it can be restored
local old_trace_id = silly.tracespawn()
logger.info("Start processing request")
logger.info("Request processing completed")
-- Restore the previous trace ID
silly.traceset(old_trace_id)
end)
Cross-Service Tracing
In microservice architecture, trace IDs need to be propagated to downstream services:
local silly = require "silly"
local http = require "silly.net.http"
local logger = require "silly.logger"
local httpc = http.newclient()
-- Service A: Initiate HTTP request
local function call_service_b()
-- Generate trace ID for propagation
local trace_id = silly.tracepropagate()
logger.info("Calling service B")
-- Pass trace ID via HTTP Header
local response = httpc:request {
method = "POST",
url = "http://service-b:8080/api/process",
headers = {
["X-Trace-Id"] = tostring(trace_id),
},
body = '{"data": "value"}',
}
return response
end
-- Service B: Receive request and use incoming trace ID
local server = http.listen {
addr = "0.0.0.0:8080",
handler = function(stream)
-- Extract and set trace ID
local trace_id = tonumber(stream.headers["x-trace-id"])
if trace_id then
silly.traceset(trace_id)
else
silly.tracespawn()
trace_id = silly.tracepropagate()
end
logger.info("Service B received request")
-- Process business logic
stream:respond(200, {["content-type"] = "application/json"})
stream:closewrite('{"status": "ok"}')
end
}
Automatic RPC Tracing
When making RPC calls using silly.net.cluster, trace IDs are automatically propagated:
local cluster = require "silly.net.cluster"
local logger = require "silly.logger"
-- Create cluster service
cluster.serve {
marshal = ...,
unmarshal = ...,
call = function(peer, cmd, body)
-- trace ID is automatically set by cluster, logger will use it automatically
logger.info("RPC call:", cmd)
-- Handle RPC request
return handle_rpc(body, cmd)
end,
close = function(peer, errno)
logger.info("RPC connection closed, errno:", errno)
end,
}
-- Initiate RPC call (trace ID automatically propagated)
local peer = cluster.connect("127.0.0.1:8080")
local result = cluster.call(peer, "get_user", {user_id = 123})
Log Correlation
Integrate trace ID into logs to achieve complete request tracking:
local silly = require "silly"
local logger = require "silly.logger"
local json = require "silly.encoding.json"
-- Structured log helper function
-- (the logger automatically prefixes each line with the current trace ID,
-- so the JSON body does not need to carry it explicitly)
local function log_with_trace(level, event, data)
local log_entry = {
timestamp = os.time(),
level = level,
event = event,
}
-- Merge data
for k, v in pairs(data or {}) do
log_entry[k] = v
end
local log_str = json.encode(log_entry)
if level == "ERROR" then
logger.error(log_str)
elseif level == "WARN" then
logger.warn(log_str)
elseif level == "DEBUG" then
logger.debug(log_str)
else
logger.info(log_str)
end
end
-- Usage
log_with_trace("INFO", "user_login", {
user_id = 12345,
ip = "192.168.1.100",
})
log_with_trace("ERROR", "database_error", {
query = "SELECT * FROM users",
error = "connection timeout",
})
In log collection systems (like ELK), you can query the complete log chain of a request via the trace ID.
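Since the logger prepends the trace ID to every line (see Log Format above), each emitted line combines the framework prefix with the JSON body. An illustrative example (file name and values are invented):
2025-10-21 09:37:27 0001e3d700010000 I app/auth.lua:42 {"timestamp":1760000000,"level":"INFO","event":"user_login","user_id":12345,"ip":"192.168.1.100"}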
Alert Configuration
Prometheus Alert Rules
Configure alert rules in Prometheus:
# alerts.yml
groups:
- name: silly_app_alerts
interval: 30s
rules:
# High error rate
- alert: HighErrorRate
expr: |
rate(myapp_http_requests_total{status=~"5.."}[5m])
/
rate(myapp_http_requests_total[5m]) > 0.05
for: 5m
labels:
severity: warning
annotations:
summary: "High error rate: {{ $value | humanizePercentage }}"
description: "Application {{ $labels.app }} error rate exceeds 5%"
# High P95 latency
- alert: HighLatency
expr: |
histogram_quantile(0.95,
rate(myapp_http_request_duration_seconds_bucket[5m])
) > 1.0
for: 5m
labels:
severity: warning
annotations:
summary: "High P95 latency: {{ $value }}s"
description: "Application {{ $labels.app }} P95 latency exceeds 1 second"
# High memory usage
- alert: HighMemoryUsage
expr: process_resident_memory_bytes > 1073741824 # 1GB
for: 10m
labels:
severity: critical
annotations:
summary: "High memory usage: {{ $value | humanize1024 }}"
description: "Application {{ $labels.app }} memory usage exceeds 1GB"
# Worker queue backlog
- alert: WorkerBacklog
expr: silly_worker_backlog > 1000
for: 5m
labels:
severity: warning
annotations:
summary: "Worker queue backlog: {{ $value }} messages"
description: "Application {{ $labels.app }} has severe worker queue backlog"
# Service down
- alert: ServiceDown
expr: up{job="silly-app"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Service unavailable"
description: "Application {{ $labels.app }} is unreachable"Alert Channels
Configure Alertmanager to send alerts:
# alertmanager.yml
route:
group_by: ['alertname', 'app']
group_wait: 10s
group_interval: 10s
repeat_interval: 1h
receiver: 'default'
routes:
# Critical alerts notify immediately
- match:
severity: critical
receiver: 'pager'
continue: true
# Warning-level alerts send to email
- match:
severity: warning
receiver: 'email'
receivers:
# Default receiver
- name: 'default'
webhook_configs:
- url: 'http://webhook-service:8080/alerts'
# Email notification
- name: 'email'
email_configs:
- to: 'ops@example.com'
from: 'alertmanager@example.com'
smarthost: 'smtp.example.com:587'
auth_username: 'alertmanager@example.com'
auth_password: 'password'
# Emergency pager
- name: 'pager'
webhook_configs:
- url: 'http://pagerduty-integration:8080/alert'
In-Application Alerts
You can also implement simple alert logic within the application:
local silly = require "silly"
local task = require "silly.task"
local time = require "silly.time"
local logger = require "silly.logger"
local prometheus = require "silly.metrics.prometheus"
-- Define alert thresholds
local ALERT_CONFIG = {
error_rate_threshold = 0.05, -- 5% error rate
latency_p95_threshold = 1.0, -- 1 second
memory_threshold = 1073741824, -- 1GB
}
-- Alert state
local alert_state = {
error_rate_fired = false,
latency_fired = false,
memory_fired = false,
}
-- Send alert
local function send_alert(alert_name, message)
logger.errorf("[ALERT] %s: %s", alert_name, message)
-- You can integrate alert channels here, such as HTTP callbacks, email, etc.
-- (hypothetical call; requires silly.net.http and silly.encoding.json)
-- http.post("http://alert-service/webhook", {}, json.encode({
--     alert = alert_name,
--     message = message,
--     timestamp = os.time(),
-- }))
end
-- Periodically check metrics
local function check_alerts()
-- This is a simplified example, should actually calculate from Prometheus metrics
local error_rate = 0.06 -- Example value
local latency_p95 = 1.2 -- Example value
local memory_usage = 1200000000 -- Example value
-- Check error rate
if error_rate > ALERT_CONFIG.error_rate_threshold then
if not alert_state.error_rate_fired then
send_alert("HighErrorRate",
string.format("Error rate %.2f%% exceeds threshold %.2f%%",
error_rate * 100,
ALERT_CONFIG.error_rate_threshold * 100))
alert_state.error_rate_fired = true
end
else
alert_state.error_rate_fired = false
end
-- Check latency
if latency_p95 > ALERT_CONFIG.latency_p95_threshold then
if not alert_state.latency_fired then
send_alert("HighLatency",
string.format("P95 latency %.2fs exceeds threshold %.2fs",
latency_p95,
ALERT_CONFIG.latency_p95_threshold))
alert_state.latency_fired = true
end
else
alert_state.latency_fired = false
end
-- Check memory
if memory_usage > ALERT_CONFIG.memory_threshold then
if not alert_state.memory_fired then
send_alert("HighMemoryUsage",
string.format("Memory usage %d MB exceeds threshold %d MB",
memory_usage / 1024 / 1024,
ALERT_CONFIG.memory_threshold / 1024 / 1024))
alert_state.memory_fired = true
end
else
alert_state.memory_fired = false
end
end
-- Check every 60 seconds
task.fork(function()
while true do
time.sleep(60000)
check_alerts()
end
end)
Complete Example: Production-Grade HTTP Service
A production-grade HTTP service example with complete logging, monitoring, and tracing:
local silly = require "silly"
local http = require "silly.net.http"
local logger = require "silly.logger"
local signal = require "silly.signal"
local time = require "silly.time"
local prometheus = require "silly.metrics.prometheus"
local json = require "silly.encoding.json"
-- ========== Logging Configuration ==========
logger.setlevel(logger.INFO)
-- Dynamically adjust log level
signal("SIGUSR2", function()
if logger.getlevel() == logger.DEBUG then
logger.setlevel(logger.INFO)
logger.info("Log level switched to INFO")
else
logger.setlevel(logger.DEBUG)
logger.info("Log level switched to DEBUG")
end
end)
-- ========== Monitoring Metrics ==========
-- Request metrics
local http_requests_total = prometheus.counter(
"api_http_requests_total",
"Total HTTP requests",
{"method", "path", "status"}
)
local http_requests_in_flight = prometheus.gauge(
"api_http_requests_in_flight",
"Number of HTTP requests in flight"
)
local http_request_duration = prometheus.histogram(
"api_http_request_duration_seconds",
"HTTP request duration (seconds)",
{"method", "path"},
{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0}
)
local http_request_size = prometheus.histogram(
"api_http_request_size_bytes",
"HTTP request size (bytes)",
nil,
{100, 1000, 10000, 100000, 1000000}
)
local http_response_size = prometheus.histogram(
"api_http_response_size_bytes",
"HTTP response size (bytes)",
nil,
{100, 1000, 10000, 100000, 1000000}
)
-- Business metrics
local user_operations = prometheus.counter(
"api_user_operations_total",
"Total user operations",
{"operation", "status"}
)
-- ========== Structured Logging ==========
local function log_request(trace_id, method, path, status, duration, req_size, resp_size)
local log_entry = {
timestamp = os.time(),
trace_id = trace_id,
level = "INFO",
event = "http_request",
method = method,
path = path,
status = status,
duration_ms = duration * 1000,
request_size_bytes = req_size,
response_size_bytes = resp_size,
}
logger.info(json.encode(log_entry))
end
-- ========== Business Handlers ==========
local function handle_user_get(stream)
logger.debug("Getting user list")
-- Simulate database query
time.sleep(10)
local response = json.encode({
users = {
{id = 1, name = "Alice"},
{id = 2, name = "Bob"},
}
})
user_operations:labels("get_users", "success"):inc()
return 200, response
end
local function handle_user_post(stream)
logger.debug("Creating user")
-- Simulate database insert
time.sleep(20)
local response = json.encode({
id = 3,
name = "Charlie",
status = "created",
})
user_operations:labels("create_user", "success"):inc()
return 201, response
end
-- ========== HTTP Handler ==========
local function handle_request(stream)
local start = os.clock()
http_requests_in_flight:inc()
-- Get or create trace ID
local trace_id = tonumber(stream.headers["x-trace-id"])
if trace_id then
silly.traceset(trace_id)
else
silly.tracespawn()
trace_id = silly.tracepropagate() -- Get current trace ID for response header
end
-- Record request size
local req_size = tonumber(stream.headers["content-length"]) or 0
http_request_size:observe(req_size)
-- Route handling
local status_code = 200
local response_body = ""
if stream.path == "/metrics" then
-- Prometheus metrics endpoint
local metrics = prometheus.gather()
stream:respond(200, {
["content-type"] = "text/plain; version=0.0.4",
})
stream:closewrite(metrics)
status_code = 200
response_body = metrics
elseif stream.path == "/api/users" then
-- User API
if stream.method == "GET" then
status_code, response_body = handle_user_get(stream)
elseif stream.method == "POST" then
status_code, response_body = handle_user_post(stream)
else
status_code = 405
response_body = "Method Not Allowed"
end
stream:respond(status_code, {
["content-type"] = "application/json",
["x-trace-id"] = tostring(trace_id),
})
stream:closewrite(response_body)
elseif stream.path == "/health" then
-- Health check
status_code = 200
response_body = json.encode({status = "healthy"})
stream:respond(status_code, {["content-type"] = "application/json"})
stream:closewrite(response_body)
else
-- 404
status_code = 404
response_body = json.encode({error = "Not Found"})
stream:respond(status_code, {["content-type"] = "application/json"})
stream:closewrite(response_body)
end
-- Record metrics
local duration = os.clock() - start
http_requests_in_flight:dec()
http_response_size:observe(#response_body)
http_request_duration:labels(stream.method, stream.path):observe(duration)
http_requests_total:labels(stream.method, stream.path, tostring(status_code)):inc()
-- Record log
log_request(trace_id, stream.method, stream.path, status_code,
duration, req_size, #response_body)
end
-- ========== Start Service ==========
local server = http.listen {
addr = "0.0.0.0:8080",
handler = function(stream)
local ok, err = silly.pcall(handle_request, stream)
if not ok then
silly.tracespawn() -- Create new trace ID
logger.error("Request handling failed:", err)
stream:respond(500, {["content-type"] = "application/json"})
stream:closewrite(json.encode({error = "Internal Server Error"}))
http_requests_total:labels(stream.method, stream.path, "500"):inc()
end
end
}
logger.info("========================================")
logger.info("Application started successfully")
logger.infof("API Service: http://localhost:8080/api/users")
logger.infof("Health Check: http://localhost:8080/health")
logger.infof("Monitoring Metrics: http://localhost:8080/metrics")
logger.info("========================================")
logger.info("Signal Controls:")
logger.info(" kill -USR1 <pid> # Reopen log file")
logger.info(" kill -USR2 <pid> # Toggle log level (INFO <-> DEBUG)")
logger.info("========================================")Monitoring Dashboard Configuration
Grafana Dashboard JSON
Create a Grafana dashboard to visualize monitoring data from Silly applications:
{
"dashboard": {
"title": "Silly Application Monitoring",
"panels": [
{
"title": "QPS",
"targets": [
{
"expr": "sum(rate(api_http_requests_total[1m]))"
}
]
},
{
"title": "Error Rate",
"targets": [
{
"expr": "sum(rate(api_http_requests_total{status=~\"5..\"}[1m])) / sum(rate(api_http_requests_total[1m]))"
}
]
},
{
"title": "P95 Latency",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(api_http_request_duration_seconds_bucket[5m]))"
}
]
},
{
"title": "Active Connections",
"targets": [
{
"expr": "silly_tcp_connections"
}
]
},
{
"title": "Memory Usage",
"targets": [
{
"expr": "process_resident_memory_bytes"
}
]
},
{
"title": "Worker Queue Depth",
"targets": [
{
"expr": "silly_worker_backlog"
}
]
}
]
}
}
Best Practices
Logging Best Practices
- Use log levels appropriately: Avoid using DEBUG level in production, which generates excessive logs
- Structured logging: Use JSON format for easier log collection and analysis
- Avoid sensitive information: Don't log passwords, tokens, or other sensitive data
- Control log volume: For high-frequency operations, consider sampling logs (see the sketch after this list)
- Regular rotation: Avoid unlimited log file growth
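For the log-volume point above, a minimal counter-based sampling sketch; the 1-in-100 rate and function name are arbitrary examples:
local logger = require "silly.logger"

-- Log roughly 1 out of every SAMPLE_RATE calls on a hot path
local SAMPLE_RATE = 100
local sample_counter = 0

local function sampled_debug(fmt, ...)
	sample_counter = sample_counter + 1
	if sample_counter % SAMPLE_RATE == 0 then
		logger.debugf("[sampled 1/" .. SAMPLE_RATE .. "] " .. fmt, ...)
	end
end

-- Called on every packet, but only logs occasionally
sampled_debug("packet received, size=%d", 512)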
Monitoring Best Practices
- Metric naming: Follow Prometheus naming conventions (snake_case, with unit suffixes)
- Avoid high cardinality: Don't use unbounded values such as user IDs as labels (see the sketch after this list)
- Choose metric types appropriately:
- Counter: Cumulative values (total requests)
- Gauge: Instantaneous values (current connections)
- Histogram: Distribution (latency)
- Set reasonable bucket boundaries: Choose Histogram buckets based on actual data distribution
- Monitor key business metrics: Monitor not only system metrics but also business metrics
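For the high-cardinality point above, a minimal sketch contrasting unbounded and bounded label sets; the metric and function names are hypothetical:
local logger = require "silly.logger"
local prometheus = require "silly.metrics.prometheus"

-- BAD: one time series per user; cardinality grows without bound
-- local logins = prometheus.counter("logins_total", "Logins", {"user_id"})

-- GOOD: labels come from small fixed sets; the user ID goes into the log line
local logins = prometheus.counter(
	"logins_total",
	"Total login attempts",
	{"method", "status"} -- e.g. password/oauth x ok/failed
)

local function record_login(user_id, method, ok)
	local status = ok and "ok" or "failed"
	logins:labels(method, status):inc()
	logger.infof("user %d login via %s: %s", user_id, method, status)
end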
Tracing Best Practices
- Always propagate trace ID: Pass trace ID in cross-service calls
- Log correlation: Integrate trace ID into logs for easier troubleshooting
- Retain sufficient information: Include trace ID in logs, metrics, and error reports
See Also
- silly.logger - Logger API Reference
- silly.metrics.prometheus - Prometheus Metrics API Reference
- silly - Core Module
- silly.signal - Signal Handling