RW health
Cluster-wide signals — apply to every database hosted on this RisingWave cluster.
No CDC sources reported by rw_catalog.rw_sources.
p95 barrier latency. A barrier is a checkpoint marker that flows through every actor in the streaming graph; the p95 is the round-trip time that 95% of barriers complete within (only the slowest 5% take longer). Source: Prom histogram meta_barrier_duration_seconds_bucket. Healthy on this cluster is typically <500ms.
Barrier latency p95
N/A
prometheus not wired
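For readers who want to see how a p95 can be derived from cumulative histogram buckets such as meta_barrier_duration_seconds_bucket, here is a minimal Python sketch. It uses linear interpolation inside the bucket that contains the target rank, in the style of Prometheus histogram_quantile. The function name and the sample bucket values are hypothetical, and this is not necessarily how the dashboard computes the figure.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound_seconds, count) pairs.

    `buckets` mirrors Prometheus `le` buckets: each count includes every
    observation at or below the upper bound. Values are illustrative only.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate into the +Inf bucket; report the last finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical bucket counts: 100 barriers observed in total.
sample = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95_seconds = histogram_quantile(0.95, sample)
```

With these made-up counts the estimate lands between 0.5 s and 1.0 s, which would sit above the <500ms healthy range quoted above.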
p99 barrier latency. Sustained >1000ms = backpressure or slow sinks (warn). >3000ms = streaming graph isn't keeping up (fail). Check skewed actors, slow downstream sinks, or compactor lag. Source: Prom meta_barrier_duration_seconds_bucket.
Barrier latency p99
N/A
prometheus not wired
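The warn and fail thresholds described for p99 barrier latency can be sketched as a small classifier. This is an illustrative sketch only: the function name and the "n/a" return value are assumptions, and it evaluates a single reading, so it does not model the "sustained" condition.

```python
def classify(value, warn_above, fail_above):
    """Map a reading to 'ok', 'warn', 'fail', or 'n/a' (no data).

    Hypothetical helper: thresholds are exclusive, so a value exactly at
    the threshold is still reported at the lower severity.
    """
    if value is None:
        return "n/a"  # e.g. Prometheus is not wired
    if value > fail_above:
        return "fail"
    if value > warn_above:
        return "warn"
    return "ok"


# p99 barrier latency in milliseconds: warn above 1000, fail above 3000.
status = classify(1500.0, warn_above=1000.0, fail_above=3000.0)
```

The same shape would fit the other thresholded cards on this page, for example `classify(l0_files, 20, 50)` for Compaction L0 or `classify(lag_seconds, 5, 30)` for CDC max lag.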
Total streaming actors across the cluster. Actors are the unit of parallelism — every operator splits into N actors equal to its parallelism. Hard limit: 400 actors per compute-node CPU core (over that, the scheduler starts dropping barriers). Source: rw_catalog.rw_actors.
Total actors
0
0/core
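The per-core figure shown on this card is simple arithmetic, sketched below in Python. The function names are hypothetical; the limit of 400 comes from the tooltip text, and the 16-core figure matches the single COMPUTE node listed in the worker table.

```python
def actors_per_core(total_actors, compute_cores):
    """Average number of streaming actors per compute-node CPU core."""
    if compute_cores <= 0:
        raise ValueError("compute_cores must be positive")
    return total_actors / compute_cores


def exceeds_actor_limit(total_actors, compute_cores, limit_per_core=400):
    """True when the per-core density is above the stated limit (assumed exclusive)."""
    return actors_per_core(total_actors, compute_cores) > limit_per_core


# One compute node with 16 cores and no actors yet, as on this page.
density = actors_per_core(0, 16)
```

Under these assumptions a 16-core compute node would reach the limit at 16 × 400 = 6400 actors.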
In-flight CREATE/ALTER DDL count. Non-zero during an active migration (expand or contract phase). Stays at 0 between deploys; spikes here mirror the progress visible on the migration detail page. Source: rw_catalog.rw_ddl_progress.
Pending DDL jobs
0
Hummock storage write throughput (bytes/sec) — how fast the compute nodes are flushing SST files to object storage. Spikes during large MV backfills. Source: Prom object_store_write_bytes.
Storage write
0 Bytes/s
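A bytes-per-second figure like this one is typically derived from two or more samples of a cumulative byte counter. The Python sketch below assumes object_store_write_bytes behaves as a monotonically increasing counter that can reset to zero on restart; that behaviour, the function name, and the sample values are assumptions, not confirmed details of this dashboard.

```python
def write_rate(samples):
    """Average bytes/sec over a window of (timestamp_seconds, counter_bytes) samples.

    Samples must be ordered by time. A drop in the counter is treated as a
    reset, and the post-reset value is counted as the increase since the reset.
    """
    if len(samples) < 2:
        return None  # not enough data to compute a rate
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


# Hypothetical samples taken 10 seconds apart.
rate = write_rate([(0, 0), (10, 1000), (20, 3000)])
```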
Total Hummock storage size (object store footprint) across every MV / table / sink. Grows monotonically until compaction reclaims space. Source: rw_catalog.rw_table_stats + Prom storage_current_version_object_size.
Storage size
N/A
hummock — prometheus not wired
L0 SST files not yet compacted into higher LSM levels. Above 20 = compactor falling behind writes (warn); above 50 = compactor stuck (fail). High L0 makes reads scan more files → query latency creeps up. Source: Prom storage_level_total_file_size.
Compaction L0
N/A
prometheus not wired
Worst-case lag across all CDC source tables — time between a row being committed upstream (postgres / mysql) and arriving in RisingWave. p99, max across tables. Warn >5s; fail >30s. Source: Prom source_cdc_event_lag_duration_milliseconds (histogram, per table_name label).
CDC max lag
N/A
metric not exposed by this RW
Total streaming jobs (materialized views + sinks + sources + CDC tables) currently registered. "running" counts active actors; "paused" are sinks/MVs the operator stopped explicitly. Source: rw_catalog.rw_streaming_jobs.
Streaming jobs
0
0 running
| Node | Type | State | Parallelism | CPU | Memory |
| --- | --- | --- | --- | --- | --- |
| 10.244.41.50:6660 | COMPACTOR | RUNNING | 48 | 16 cores | 4.0 GiB |
| risingwave-alinma-compute-0.risingwave-alinma-compute-headless.alinma-rw.svc:5688 | COMPUTE | RUNNING | 16 | 16 cores | 16.0 GiB |
| 10.244.41.49:4567 | FRONTEND | RUNNING | — | 16 cores | 4.0 GiB |
| risingwave-alinma-meta-0.risingwave-alinma-meta-headless.alinma-rw.svc:5690 | META | RUNNING | — | 16 cores | 4.0 GiB |
Cluster metrics combine rw_catalog reads (workers, actors, jobs) and Prometheus signals (barriers, storage, compaction — N/A until the Prom adapter lands).