Container Fabric is New Relic’s in-house container orchestration and runtime platform. It runs on CoreOS Container Linux in Docker containers that are hosted on Dell hardware. However, CoreOS does not have a traditional package installer, is not a supported Dell platform, and only truly supports running applications inside Linux containers. Nevertheless, the Container Fabric team needed to be able to monitor and manage the underlying Dell hardware.

The solution? Well, as it turns out, containers are actually a very useful way to solve this issue.

Dell’s Linux software packages are designed to run on very specific distributions, which must have certain patch levels installed. These tools also tend to get updated only long after new OS versions or patches are released, so if your systems are at the most current release, you may experience unexpected bugs. Instead of using a configuration management tool to install the software on supported systems, or just living without these tools on unsupported systems, we decided to use containers to ensure that we can run this critical software on any system that can run a Linux-based container.

In this post, I’ll share how we run these Dell tools in containers so we can properly manage the underlying hardware hosts on which our container platform runs.

The software components we need

To start, we identified the Dell software tools that we would like to use. We decided on the following:

  • OpenManage Server Administrator (OMSA): OpenManage is the Dell suite of tools that allows users and systems to do in-depth inventory reporting and configuration on Dell hardware.
  • Dell iDRAC Service Module (iSM): The iSM is a Dell service that runs on Linux and communicates with the embedded Integrated Dell Remote Access Controller (iDRAC). Among other things, this allows Linux to update important information in the iDRAC, including the system’s true hostname.
  • Dell System Update (DSU): DSU is a tool that allows firmware updates to be applied to Dell-supported hardware from within Linux.

Each container includes the most recent releases of these Dell tools. Beyond configuring the Dell hardware and reporting on its inventory, we also want to be able to monitor it for faults. With this in mind, we included the command-line utility check_openmanage in each container.

If you want to try these out on your own systems, you can find a repository for these containers on GitHub. Directions for launching each of these containers are included in the README files.

Using the OpenManage container

We use the OpenManage container as a simple command-line tool to perform such tasks as inspecting the chassis information of a Dell host, clearing a host’s Embedded System Management (ESM) log, and checking the overall health of a system.

For example, once the OMSA container is up, we can run the following commands:

  • To check chassis details:
    # docker exec omsa omreport chassis info
    
    Chassis Information
    
    Index                                    : 0
    Chassis Name                             : Main System Chassis
    Host Name                                : dell-host.example.net
    iDRAC8 Version                           : 2.30.30.30 (Build 50)
    Lifecycle Controller Version             : 2.30.30.30
    Chassis Model                            : PowerEdge R430
    Chassis Lock                             : Present
    Chassis Service Tag                      : 225XHC2
    Express Service Code                     : 0000924034
    Chassis Asset Tag                        : Unknown
    Flash chassis identify LED state         : Off
    Flash chassis identify LED timeout value : 300
  • To check the overall health of the system:
    # docker exec omsa check_openmanage
    
    OK - System: 'PowerEdge R430', SN: '225XHC2', 128 GB ram (8 dimms), 1 logical drives, 2 physical drives
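Because check_openmanage follows the Nagios plugin convention of signalling its result through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN), and docker exec passes that exit code back to the caller, the health check is easy to script. The following is a minimal sketch, not part of the published containers. The function names and the CHECK_CMD override are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Translate a Nagios-style plugin exit code into a status word.
nagios_status() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run check_openmanage inside the "omsa" container and pass its result on.
# CHECK_CMD is a hypothetical override so the function can be tried
# on a machine without Docker or Dell hardware.
check_hardware() {
    output=$(${CHECK_CMD:-docker exec omsa check_openmanage} 2>&1)
    code=$?
    printf '%s\n' "$output"
    if [ "$code" -ne 0 ]; then
        # Anything other than OK is reported on stderr for the caller to act on.
        echo "hardware check status: $(nagios_status "$code")" >&2
    fi
    return "$code"
}
```

A scheduled job could call check_hardware and alert whenever it returns a non-zero status.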

Using the Dell iDRAC Service Module (iSM)

When we run this container, the iDRAC is updated with useful OS-level information, including the correct hostname for the Dell machine.

(Note: The iSM requires that the usb-storage kernel module be loaded on the host, since this is how these tools interact with the iDRAC.)

For example:

# sudo modprobe usb_storage
# docker run --privileged -d -P -v /var/log:/var/log --restart=always --uts=host --net=host --name=ism dell-host.example.net/container-fabric/dell-ism24:latest

[Figure: iSM server information]

(Note: The Linux distribution information returned by the iSM matches the container OS, CentOS 7, and not the host OS, CoreOS.)
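Since the container cannot talk to the iDRAC unless usb_storage is loaded, it can help to check for the module before starting the container. The sketch below is one possible guard, assuming the module is built as a loadable module and therefore shows up in /proc/modules. The function names are hypothetical, and the optional second argument exists only so the check can be tried against a sample file.

```shell
#!/bin/sh
# Return 0 if the named kernel module appears in a /proc/modules-style listing.
# The kernel lists modules with underscores, so dashes are normalised first.
module_loaded() {
    mod=$(printf '%s' "$1" | tr '-' '_')
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Load usb_storage only when it is not already present.
ensure_usb_storage() {
    module_loaded usb_storage || sudo modprobe usb_storage
}
```

Calling ensure_usb_storage before docker run keeps the start-up step idempotent. A kernel with the driver built in would not list it in /proc/modules, so this check would need adjusting there.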

Using the Dell System Update (DSU) tool

You can run DSU both interactively and non-interactively. Running the container with its defaults produces a non-interactive report of the hardware inventory on the system using the dsu --inventory command. Many DSU commands take a long time to complete, as they have to crawl most of the system hardware.

(Note: DSU also requires that the usb-storage kernel module be loaded on the host.)

For example:

# sudo modprobe usb_storage
# docker run --rm -ti --privileged -P --name=dell-dsu dell-host.example.net/container-fabric/dell-dsu:latest

Verifying catalog installation ...
Installing catalog from repository ...
Fetching dsucatalog ...
Reading the catalog ...
Installing inventory collector ...
Fetching invcol_WF06C_LN64_16.12.200.896_A00 ...
Verifying inventory collector installation ...
Getting System Inventory ...

1. BIOS  ( Version : 2.0.1 )

2. CPLD  ( Version : 1.0.3 )

3. PERC H730 Mini Controller 0 Firmware  ( Version : 25.4.0.0015 )

4. Firmware for  - Disk 0 in Backplane 1 of PERC H730 Mini Controller 0    ( Version : DK04 )

5.  iDRAC  ( Version : 2.30.30.30 )

6. 13G SEP Firmware, BayID: 1  ( Version : 2.23 )
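The inventory report above is written for people, but its numbered lines are regular enough to convert into tab-separated component and version pairs for feeding into other tooling. The following is a rough sketch that assumes the output keeps exactly the "N. name ( Version : x )" shape shown above. The function name parse_dsu_inventory is hypothetical.

```shell
#!/bin/sh
# Read DSU inventory output on stdin and print "component<TAB>version" lines.
# Assumes each inventory line looks like: "1. BIOS  ( Version : 2.0.1 )".
parse_dsu_inventory() {
    awk '
        /^[0-9]+\. .*\( Version : .* \)/ {
            line = $0
            sub(/^[0-9]+\.[ \t]*/, "", line)         # drop the leading "N." index
            version = line
            sub(/^.*\( Version : /, "", version)      # keep the text after "( Version : "
            sub(/[ \t]*\)[ \t]*$/, "", version)       # drop the closing " )"
            sub(/[ \t]*\( Version : .*$/, "", line)   # keep only the component name
            printf "%s\t%s\n", line, version
        }
    '
}
```

The container output could be piped straight into this function, for example by appending "| parse_dsu_inventory" to a docker run command that is started without the -ti flags.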
 
The true utility of DSU lies in its ability to detect and apply required firmware updates to the system. You can easily do this by running DSU without any options. The example below updates only the PERC firmware, but the DSU can also update all firmware if desired.

# sudo modprobe usb_storage
# docker run --rm -ti --privileged -P --name=dsu dell-host.example.net/container-fabric/dell-dsu:latest dsu

Verifying catalog installation ...
Installing catalog from repository ...
Fetching dsucatalog ...
Reading the catalog ...
Installing inventory collector ...
Fetching invcol_WF06C_LN64_16.12.200.896_A00 ...
Verifying inventory collector installation ...
Getting System Inventory ...
Determining Applicable Updates ...

|-----------Dell System Updates-----------|
[ ] represents 'not selected'
[*] represents 'selected'
[-] represents 'Component already at repository version (can be selected only if -e option is used)'
Choose:  q - Quit without update, c to Commit, <number> - To Select/Deselect, a - Select All, n - Select None

[-]1 13G SEP Firmware, BayID: 1
 Current Version : 2.23 same as : 2.23

[ ]2  iDRAC
 Current Version : 2.30.30.30 Upgrade to : 2.41.40.40

[ ]3 Firmware for  - Disk 0 in Backplane 1 of PERC H730 Mini Controller 0
 Current Version : DK04 Upgrade to : DK05

[ ]4 BIOS
 Current Version : 2.0.1 Upgrade to : 2.3.4

[*]5 PERC H730 Mini Controller 0 Firmware
 Current Version : 25.4.0.0015 Upgrade to : 25.5.0.0018

Enter your choice : c
Fetching SAS-RAID_Firmware_2H45F_LN_25.5.0.0018_A08 ...
Installing SAS-RAID_Firmware_2H45F_LN_25.5.0.0018_A08 ...
Collecting inventory...
..
Running validation...
...
The system should be restarted for the update to take effect.

Done! Please run 'dsu --inventory' to check the inventory

Please reboot the system for update(s) to take effect

# shutdown -r now
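If a saved copy of the update listing is available, the components that have a newer version waiting can be pulled out with a few lines of awk. This is only a sketch that assumes the listing format shown above, where a "[ ]N name" line is followed by a "Current Version : x Upgrade to : y" line. The function name pending_upgrades is hypothetical.

```shell
#!/bin/sh
# Read a DSU update listing on stdin and print
# "component<TAB>current<TAB>available" for every component with an upgrade.
pending_upgrades() {
    awk '
        /^\[[ *-]\][0-9]+/ {
            # Entry header such as "[ ]2  iDRAC": remember the component name.
            name = $0
            sub(/^\[[ *-]\][0-9]+[ \t]*/, "", name)
            next
        }
        / Upgrade to : / {
            current = $0
            target = $0
            sub(/^.*Current Version : /, "", current)
            sub(/ Upgrade to : .*$/, "", current)
            sub(/^.* Upgrade to : /, "", target)
            printf "%s\t%s\t%s\n", name, current, target
        }
    '
}
```

Components that are already at the repository version, which the listing marks with "same as", produce no output.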

Containers make Linux software more portable

Containers are a great way to package software that has very specific and deep dependency trees that make it hard to install and manage natively. In Container Fabric, we’re able to package, deploy, and—most important—use these hardware-monitoring tools as needed in a flexible way, even though they can’t be run natively on CoreOS Container Linux and aren’t supported by Dell in that environment.

Hopefully, this approach will help others with their hardware monitoring needs, and remind you that containers can make almost all Linux software much more portable than even the application’s original authors might have intended.

Sean is a Lead Site Reliability Engineer at New Relic. He is a long-time system administrator and operations engineer who has lived in places ranging from Alaska to Pakistan.
