Engagement · Non-profit · Bronxville, NY

Christ Church Bronxville. UDM Pro under load.

A year of intermittent UniFi console disconnections, traced to two compounding root causes — a legacy firmware backup file pushing the OS partition to 99% and an OOM killer reaping the unifi-core process under a four-application load. Resolved in two on-site sessions, with every change logged and signed off by the client.

Client: Christ Church Bronxville
Sector: Non-profit
Location: Bronxville, NY
Engagement: Forensic diagnostic + remediation
Performed by: ShiftCTRL
Device under review: UniFi Dream Machine Pro

§ 01 · Executive summary

Two compounding faults under a four-app load.

The UDM Pro was running all four UniFi applications (Network, Protect, Access, and Talk) on a device with 4 GB of RAM. The reported symptom was intermittent UniFi console disconnections under normal operating load.

Root cause analysis surfaced two compounding faults: a legacy firmware backup file at ~897 MB had pushed the OS partition to 99% capacity, and the Linux Out-of-Memory killer was issuing SIGKILL to the unifi-core process when available memory dropped to ~66 MB. Both faults were resolved on-site. With explicit client authorization, secondary remediation steps were also completed: VoIP log files purged, memory-snapshot logs cleared, the unifi-talk service restarted to release open file descriptors, and optional services tuned in coordination with the client to recover ~233 MB of RAM.

The device is now stable. The capacity constraint underneath it is structural, not configurational; the recommendations section below outlines the path from a no-cost log-rotation cron job to a single-device hardware refresh.

§ 02 · System overview

What was running, where.

| Parameter | Value | Notes |
| --- | --- | --- |
| Device | UniFi Dream Machine Pro | 4 GB RAM |
| Active applications | Network · Protect · Access · Talk | All four |
| WAN configuration | Verizon (WAN1) + Optimum (WAN2) | Load-balanced |
| Secondary UDM Pro | 1× idle | See § 06, Rec 02 |

§ 03 · Findings

Four findings. Two primary, one secondary, one structural.

FINDING 01

OS partition at 99% capacity · PRIMARY

The root filesystem (/) sat at 99% capacity, driven by a ~897 MB legacy firmware backup file left behind by a prior firmware upgrade. When a Linux filesystem fills, writes to it fail with ENOSPC: services can no longer write process state, PID files, or runtime sockets there, which manifests as sudden console disconnections.
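
A minimal sketch of how a file like this can be located, assuming shell access to the device and the standard find, du, sort, and head utilities. The function name largest_files is hypothetical; it is not the command used during the engagement.

```shell
# Hypothetical helper: list the N largest regular files under a directory.
# -xdev keeps find on the starting filesystem, so other mounts are skipped.
largest_files() {
    dir=$1
    count=${2:-10}
    # du -k prints "<size in KiB><TAB><path>"; sort numerically, largest first.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null |
        sort -rn |
        head -n "$count"
}
```

For example, largest_files / 10 would list the ten largest files on the root filesystem, and df -h / shows how full that filesystem is.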

FINDING 02

OOM kills on unifi-core · PRIMARY

With ~66 MB of available RAM, the Linux OOM killer was sending SIGKILL (9) to unifi-core — the central management process — producing the observed console drops. Per-app footprints under load: Network ~780 MB, Protect ~124 MB, Access ~50 MB, Talk ~100 MB, plus optional security services at ~233 MB. On a 4 GB device, this leaves no headroom under normal operating conditions.
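
A minimal sketch for confirming this kind of kill from the kernel log, assuming the log is readable with dmesg and that the kernel reports the victim under the name unifi-core (the log shows the task's short command name, which may differ on a given firmware). The function name oom_kills_for is hypothetical. The pattern relies on the "Killed process <pid> (<name>)" text that Linux prints when the OOM killer acts.

```shell
# Hypothetical helper: read kernel log lines on stdin and keep only those
# that report a kill of the named process.
# The name is placed inside a regular expression, so pass a plain name.
oom_kills_for() {
    grep -E "Killed process [0-9]+ \($1\)"
}
```

Usage would look like dmesg | oom_kills_for unifi-core, with | wc -l appended to count the events.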

FINDING 03

FreeSwitch logs accumulating without rotation · SECONDARY

The /var/log partition had grown to 89%, principally from FreeSwitch (UniFi Talk) log files. UniFi does not ship a built-in log rotation setting for the Talk application. A second behavior compounds the issue: deleting log files while unifi-talk holds open file descriptors does not reclaim disk blocks until the service restarts.
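
The open-descriptor behavior can be observed directly on Linux: /proc/<pid>/fd links to a removed file read as "<path> (deleted)". A minimal sketch, assuming a Linux /proc filesystem and enough privilege to read other processes' descriptors; the function name deleted_open_files is hypothetical. Where lsof is installed, lsof +L1 reports similar information.

```shell
# Hypothetical helper: list open file descriptors that still point at
# deleted files whose original path starts with the given prefix.
deleted_open_files() {
    prefix=${1:-/}
    for fd in /proc/[0-9]*/fd/*; do
        # Unreadable or already-closed descriptors are skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$prefix"*" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}
```

Any line printed for /var/log/freeswitch would indicate blocks that stay allocated until the owning process closes the file or restarts.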

FINDING 04

Hardware capacity constraint · STRUCTURAL

The UDM Pro’s 4 GB of RAM is operating at its practical limit with all four applications active. Even after software optimizations, the UniFi Network Java application alone consumes ~780 MB RSS, with optional security services adding ~233 MB on top. This is a structural hardware limitation, not a configuration error. The recommendations section addresses it in three steps: no-cost, no-additional-cost, and capital.
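
Per-process RSS figures like these can be re-measured at any time. A minimal sketch, assuming a Linux /proc filesystem; the function name top_rss is hypothetical, and the RSS column of ps aux carries the same values.

```shell
# Hypothetical helper: print the N largest processes by resident set size.
top_rss() {
    for status in /proc/[0-9]*/status; do
        # VmRSS is reported in KiB; kernel threads have no VmRSS line
        # and therefore print nothing.
        awk '/^Name:/  { sub(/^Name:[ \t]*/, ""); name = $0 }
             /^VmRSS:/ { printf "%d\t%s\n", $2, name }' "$status" 2>/dev/null
    done |
        sort -rn |
        head -n "${1:-5}" |
        # Convert KiB to whole MB (truncated) for display.
        awk -F '\t' '{ printf "%d MB\t%s\n", $1 / 1024, $2 }'
}
```

Calling top_rss 5 prints the five largest resident processes, one per line, largest first.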

§ 04 · Actions taken

What we did, in two sessions.

  • SESSION 01 · INITIAL REMEDIATION

    • Identified and removed a ~897 MB legacy firmware backup file from the OS partition. Usage dropped from 99% to ~45%.
    • Cleared stale system log files in non-rotating directories.
    • Reviewed running processes and memory allocation (ps aux, free -h).
    • Identified the FreeSwitch log accumulation as secondary disk pressure and documented remediation commands for client review prior to execution.
  • SESSION 02 · REMEDIATION (authorized by Nelson, CCB)

    • Deleted FreeSwitch VoIP log files (rm -rf /var/log/freeswitch/*), authorized by client.
    • Deleted memory-snapshot logs (rm -rf /var/log/mem_snapshot/*), authorized by client.
    • Restarted the unifi-talk service to release open file handles. Log partition usage fell from 89% to 22% (711 MB free).
    • Tuned optional services in coordination with the client to relieve memory pressure. ~233 MB of RAM recovered; available RAM rose from ~66 MB to ~368 MB.
§ 05 · Before / after

Measured at the close of session 02.

| Metric | Before | After |
| --- | --- | --- |
| Console disconnections | Frequent / OOM kills | Resolved |
| OS partition usage | 99% (legacy backup, ~897 MB) | ~45% (backup removed) |
| Log partition usage | 89% (VoIP logs) | 22% (711 MB free) |
| Available RAM | ~66 MB (critical) | ~368 MB (stable) |

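
A minimal sketch of how the partition and memory rows could be re-measured later, assuming a Linux device with df, awk, and a /proc/meminfo that exposes MemAvailable. The function names fs_use_pct and mem_available_mb are hypothetical; they are not the tooling used during the sessions.

```shell
# Hypothetical helper: usage percentage (digits only) of the filesystem
# that holds the given path. df -P keeps each filesystem on one line.
fs_use_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: available memory in whole MB, from the MemAvailable
# field (reported in KiB) of /proc/meminfo.
mem_available_mb() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}
```

Running fs_use_pct /, fs_use_pct /var/log, and mem_available_mb yields three numbers that line up with the OS partition, log partition, and available RAM rows.
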
§ 06 · Recommendations

From no-cost to capital. In that order.

REC 01 · LOG ROTATION · No cost

Configure FreeSwitch log rotation.

UniFi Talk does not include a built-in log-rotation setting. Without intervention, FreeSwitch logs will return the partition to a critical state. A weekly cron job that purges logs older than 7 to 14 days prevents this. Setup needs only SSH access and a brief maintenance window.
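
A minimal sketch of such a purge, assuming the logs live in /var/log/freeswitch (the directory cleared in session 02) and that the device's find supports -delete. The function name purge_old_logs, the script path in the cron line, and the 14-day default are hypothetical choices for illustration.

```shell
# Hypothetical helper: delete regular files under a log directory whose
# last modification is more than N whole days old (find rounds age down).
purge_old_logs() {
    dir=$1
    days=${2:-14}
    # Refuse to run if the directory is missing.
    [ -d "$dir" ] || return 1
    find "$dir" -type f -mtime +"$days" -delete
}

# Example weekly cron entry (Sundays at 03:00), assuming this function is
# saved in a script at a hypothetical path that calls it on the log directory:
#   0 3 * * 0 /usr/local/sbin/purge-freeswitch-logs.sh
```

Because only files untouched for longer than the threshold are removed, the log currently being written is left alone, which sidesteps the open-descriptor issue from Finding 03. Whether a cron entry persists across UniFi OS updates should be verified on the device.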

REC 02 · HA FAILOVER · No additional cost

Deploy the second UDM Pro in HA failover mode.

The client has a second UDM Pro currently idle. UniFi does not support application clustering across UDM Pros, so this unit cannot distribute RAM load. It can, however, run as a hot standby:

  • The secondary takes over automatically if the primary fails.
  • Failover requires no user-side action.
  • Both units stay synchronized to the primary configuration.

Important: HA failover is a business-continuity measure, not a performance fix. RAM pressure on the primary is unaffected.

REC 03 · OFFLOAD PROTECT · Hardware refresh

Move Protect to a dedicated UNVR.

UniFi Protect is architecturally designed to run on dedicated NVR hardware. Moving it to a UNVR frees ~124 MB of RAM on the primary, dedicates storage and processing to camera feeds, and restores headroom for the full application stack. Camera configurations migrate with minimal disruption.

REC 04 · UPGRADE PRIMARY · Hardware refresh

UDM Pro Max as primary, 8 GB RAM.

The cleanest single-device resolution. The Max runs all four applications at full capacity; the existing UniFi backup restores directly. The retired UDM Pro becomes the HA standby (Rec 02), making the two investments complementary.

Best-outcome path: UDM Pro Max as primary + existing UDM Pro as HA failover — full RAM headroom and full redundancy across the application stack.

Note · Restore full capacity once hardware lands

Memory-relief tuning was a stop-gap, not the destination. Once Rec 03 or Rec 04 lands and headroom is restored, all optional services should be brought back online at full capacity.