// NETWORK · DIAGNOSTIC + REMEDIATION

Christ Church Bronxville. UDM Pro under load.

A year of intermittent UniFi console disconnections, traced to two compounding root causes — a legacy firmware backup file pushing the OS partition to 99% and an OOM killer reaping the unifi-core process under a four-application load. Resolved in two on-site sessions with a written, client-authorized exit report.

Client: Christ Church Bronxville
Sector: Faith / community
Location: Bronxville, NY
Engagement: Forensic diagnostic + remediation
Performed by: ShiftCTRL
Device under review: UniFi Dream Machine Pro

§ 01 · Executive summary

Two compounding faults under a four-app load.

The UDM Pro was running all four UniFi applications (Network, Protect, Access, and Talk) on 4 GB of RAM. The reported symptom was intermittent UniFi console disconnections under normal operating load.

Root cause analysis surfaced two compounding faults: a legacy firmware backup file at ~897 MB had pushed the OS partition to 99% capacity, and the Linux Out-of-Memory killer was issuing SIGKILL to the unifi-core process when available memory dropped to ~66 MB. Both faults were resolved on-site. With explicit client authorization, secondary remediation steps were also completed: VoIP log files purged, memory-snapshot logs cleared, the unifi-talk service restarted to release open file descriptors, and optional services tuned in coordination with the client to recover ~233 MB of RAM.

The device is now stable. The capacity constraint underneath it is structural, not configurational; the recommendations section below outlines the path from a no-cost log-rotation cron job to a single-device hardware refresh.

§ 02 · System overview

What was running, where.

| Parameter | Value | Notes |
|---|---|---|
| Device | UniFi Dream Machine Pro | 4 GB RAM |
| Active applications | Network · Protect · Access · Talk | All four |
| WAN configuration | Verizon (WAN1) + Optimum (WAN2) | Load-balanced |
| Secondary UDM Pro | 1× idle | See § 06, Rec 02 |

§ 03 · Findings

Four findings. Two primary, one secondary, one structural.

// FINDING 01

OS partition at 99% capacity · PRIMARY

The root filesystem (/) sat at 99% capacity, driven by a ~897 MB legacy firmware backup file left behind by a prior firmware upgrade. When the root filesystem fills, services can no longer write state files, PID files, or anything else stored on it. Failed writes of this kind are consistent with the sudden console disconnections observed.
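
A check along the following lines can confirm how full a filesystem is and surface its largest entries. This is a minimal sketch, assuming SSH access and a shell with `df`, `du`, `awk`, and `sort` available; the function names `fs_usage_pct` and `largest_entries` are illustrative and not part of UniFi OS.

```shell
#!/bin/sh
# Print the usage percentage (digits only) of the filesystem holding "$1".
fs_usage_pct() {
    # -P forces the portable one-line-per-filesystem format;
    # column 5 of the second line is the capacity figure.
    df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# List the largest files and directories under "$1" (sizes in KB),
# staying on the same filesystem (-x). "$2" limits the row count.
largest_entries() {
    du -axk "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Typical use on the device:
#   fs_usage_pct /
#   largest_entries / 10
```

The `-x` flag matters here: without it, `du` would descend into other mounted partitions and obscure what is actually filling the root filesystem.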

// FINDING 02

OOM kills on unifi-core · PRIMARY

With ~66 MB of available RAM, the Linux OOM killer was sending SIGKILL (9) to unifi-core, the central management process, producing the observed console drops. Per-app footprints under load: Network ~780 MB, Protect ~124 MB, Access ~50 MB, Talk ~100 MB, plus optional security services at ~233 MB. On a 4 GB device, this leaves no headroom under normal operating conditions.
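
OOM kills of this kind leave a trace in the kernel log, and available memory can be read from `/proc/meminfo`. The sketch below assumes a Linux kernel that logs a "Killed process" line for each OOM victim and exposes `MemAvailable`; the helper names are illustrative, and reading the kernel log may require root.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "PID NAME" for every process
# terminated by the OOM killer (matches the "Killed process" line).
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Print MemAvailable from /proc/meminfo in whole megabytes.
mem_available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

# Typical use on the device:
#   dmesg | oom_victims
#   mem_available_mb
```

`MemAvailable` is generally a better indicator than the "free" figure, since it accounts for cache the kernel can reclaim.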

// FINDING 03

FreeSwitch logs accumulating without rotation · SECONDARY

The /var/log partition had grown to 89%, principally from FreeSwitch (UniFi Talk) log files. UniFi does not ship a built-in log rotation setting for the Talk application. A second behavior compounds the issue: deleting log files while unifi-talk holds open file descriptors does not reclaim disk blocks until the service restarts.
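
The open-descriptor behavior can be observed directly on Linux, where `/proc/<pid>/fd` shows deleted files that a process still holds open. The following is a minimal sketch; `deleted_open_files` is a hypothetical helper, and the PID of the process behind unifi-talk has to be looked up separately.

```shell
#!/bin/sh
# List file descriptors of process "$1" that still point at deleted files.
# Linux-specific: the symlinks in /proc/<pid>/fd end in " (deleted)"
# once the underlying file has been unlinked.
deleted_open_files() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") echo "$fd -> $target" ;;
        esac
    done
}

# Typical use (replace <pid> with the PID of the logging process):
#   deleted_open_files <pid>
```

Any entry printed here represents disk blocks that stay allocated until the process closes the descriptor or exits, which is why the restart is needed after the delete.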

// FINDING 04

Hardware capacity constraint · STRUCTURAL

The UDM Pro's 4 GB of RAM is operating at its practical limit with all four applications active. Even after software optimizations, the UniFi Network Java application alone consumes ~780 MB RSS, with optional security services adding ~233 MB on top. This is a structural hardware limitation, not a configuration error. The recommendations section addresses it across three cost tiers: no-cost, no-additional-cost, and capital.
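
As a quick arithmetic check, the itemized footprints from Finding 02 can be totaled in the shell. The figures are the approximate values reported above; the variable names are illustrative, and the remainder of the 4 GB (kernel, OS services, caches) is not itemized here.

```shell
#!/bin/sh
# Approximate per-application footprints in MB, as reported in Finding 02.
network=780
protect=124
access=50
talk=100
security=233

# Total of the itemized footprints, and the total with the
# optional security services turned off.
itemized=$((network + protect + access + talk + security))
without_security=$((itemized - security))

echo "itemized total: ${itemized} MB"
echo "without optional services: ${without_security} MB"
```

The two totals come to 1287 MB and 1054 MB respectively.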

§ 04 · Actions taken

What we did, in two sessions.

  • SESSION 01 — INITIAL REMEDIATION

    • Identified and removed a ~897 MB legacy firmware backup file from the OS partition. Usage dropped from 99% to ~45%.
    • Cleared stale system log files in non-rotating directories.
    • Reviewed running processes and memory allocation (ps aux, free -h).
    • Identified the FreeSwitch log accumulation as secondary disk pressure and documented remediation commands for client review prior to execution.
  • SESSION 02 — REMEDIATION // authorized by Nelson, CCB

    • Deleted FreeSwitch VoIP log files (rm -rf /var/log/freeswitch/*), authorized by client.
    • Deleted memory-snapshot logs (rm -rf /var/log/mem_snapshot/*), authorized by client.
    • Restarted the unifi-talk service to release open file handles. Log partition fell from 89% to 22% (711 MB free).
    • Tuned optional services in coordination with the client to relieve memory pressure. ~233 MB RAM recovered. Available RAM rose from ~66 MB to ~368 MB.
§ 05 · Before / after

Measured at the close of session 02.

| Metric | Before | After |
|---|---|---|
| Console disconnections | Frequent / OOM kills | Resolved |
| OS partition usage | 99% (legacy backup, ~897 MB) | ~45%, backup removed |
| Log partition usage | 89% (VoIP logs) | 22%, 711 MB free |
| Available RAM | ~66 MB (critical) | ~368 MB (stable) |

§ 06 · Recommendations

From no-cost to capital. In that order.

REC 01 · LOG ROTATION · No cost

Configure FreeSwitch log rotation.

UniFi Talk does not include a built-in log-rotation setting. Without intervention, FreeSwitch logs will return the partition to a critical state. A weekly cron job that purges logs older than 7 to 14 days prevents this. Setup needs only a brief maintenance window and SSH access.
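
A purge job of this kind could look like the sketch below. It is a minimal illustration under stated assumptions: the log directory is the one named in the session notes, while the function name, script path, schedule, and 14-day retention are hypothetical choices to be agreed with the client.

```shell
#!/bin/sh
# Delete regular files under "$1" whose last modification is more than
# "$2" full days old (find's -mtime +N semantics).
purge_old_logs() {
    dir=$1
    days=$2
    # Refuse to run against an empty or missing path.
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}

# Hypothetical weekly crontab entry (Sundays at 03:00) calling a script
# that wraps the function above:
#   0 3 * * 0 /root/purge-freeswitch-logs.sh
#
# Inside that script:
#   purge_old_logs /var/log/freeswitch 14
```

An age-based purge only helps when logs are split into separate files over time; a single file that is written continuously always has a fresh modification time and would never match.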

REC 02 · HA FAILOVER · No additional cost

Deploy the second UDM Pro in HA failover mode.

The client has a second UDM Pro currently idle. UniFi does not support application clustering across UDM Pros, so this unit cannot distribute RAM load. It can, however, run as a hot standby:

  • The secondary takes over automatically if the primary fails.
  • Failover is seamless under user load.
  • Both units stay synchronized to the primary configuration.

Important: HA failover is a business-continuity measure, not a performance fix. RAM pressure on the primary is unaffected.

REC 03 · OFFLOAD PROTECT · Hardware refresh

Move Protect to a dedicated UNVR.

UniFi Protect is architecturally designed to run on dedicated NVR hardware. Moving it to a UNVR frees ~124 MB of RAM on the primary, dedicates storage and processing to camera feeds, and restores headroom for the full application stack. Camera configurations migrate with minimal disruption.

REC 04 · UPGRADE PRIMARY · Hardware refresh

UDM Pro Max as primary, 8 GB RAM.

The cleanest single-device resolution. The Max runs all four applications at full capacity; the existing UniFi backup restores directly. The retired UDM Pro becomes the HA standby (Rec 02), making the two investments complementary.

Best-outcome path: UDM Pro Max as primary + existing UDM Pro as HA failover, giving full RAM headroom and full redundancy across the application stack.

// NOTE · RESTORE FULL CAPACITY ONCE HARDWARE LANDS

Memory-relief tuning was a stop-gap, not the destination. Once Rec 03 or Rec 04 lands and headroom is restored, all optional services should be brought back online at full capacity.

// DISCLAIMER · INDEPENDENT-AGENCY TERMS

This case study is presented for informational purposes only and illustrates the professional services performed by ShiftCTRL in collaboration with the named client. By accessing or referencing this case study, you acknowledge and agree to the following terms.

  1. Independent agency status. ShiftCTRL operates as an independent engineering firm providing professional services including consulting, software development, system design, implementation, and optimization.
  2. Ownership of intellectual property. All intellectual property associated with the systems described, including patents, trademarks, copyrights, trade secrets, and other proprietary rights, is the sole property of the respective client. ShiftCTRL does not claim ownership over any intellectual property developed during its engagements.
  3. Client responsibility. The client retains full ownership, responsibility, and control over their technology, systems, and data. ShiftCTRL's role is strictly limited to providing professional services as outlined in the contractual agreement.
  4. No endorsement or affiliation. This case study does not imply formal endorsement, partnership, or affiliation between ShiftCTRL and the client unless explicitly stated in writing by both parties.
  5. Confidentiality and non-disclosure. ShiftCTRL upholds the privacy and confidentiality agreements in place with its clients. All information presented adheres to those obligations; sensitive details are omitted or anonymized where required.
  6. Limitation of liability. ShiftCTRL is not liable for any direct, indirect, incidental, consequential, or special damages arising from the use or interpretation of this case study. Outcomes are specific to the client's circumstances and are not warranties of similar results for other engagements.
  7. Scope of responsibility. ShiftCTRL's responsibilities for the operation, maintenance, performance, security, or compliance of the systems described are defined by the contractual agreement established with the client and do not extend beyond its terms.