RAID disk monitoring with postfix and mailgun
The redundancy of RAID buys you time between disk failure and server failure. But a default RAID setup will happily function with a failed disk, until the next disk fails and your data is lost.
Email notifications need to be configured manually so you can intervene (replace a hard drive) after a disk failure.
At GPXZ we use RAID heavily to work with large datasets. This is how we handle monitoring of RAID arrays, using Mailgun to send email alerts.
These instructions use Ubuntu 20.04: different Linux distributions may have config files in different locations.
Overview
mdadm and smartd can’t send email directly: they instead pass emails to mail relay software running on your server. This software can send emails directly, but ensuring reliable email delivery is non-trivial, and it’s extra important to have reliable delivery for critical alerts. The postfix mail relay software can accept emails from mdadm/smartd and pass them off to an email API service such as mailgun.
So our setup will look like this:
mdadm/smartd → postfix → mailgun
Postfix email relay
Before getting started, log into the mailgun web UI. Under the SMTP tab of your domain’s settings you’ll see your mailgun SMTP domain (like smtp.mailgun.org) and your SMTP login (like [email protected]).
First, install postfix:
sudo apt install postfix
There are some options to select during installation:
- Choose Satellite System as the mailer type.
- Use your server’s $HOSTNAME as the mail name.
- Use your Mailgun SMTP server as the relay host (e.g., smtp.mailgun.org).
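If you set up several servers, you may prefer to answer these installer questions ahead of time instead of clicking through the prompts. Below is a minimal sketch of that. postfix_debconf_selections is a hypothetical helper. The debconf question names are an assumption based on Ubuntu’s postfix package, so on a machine that already has postfix you can confirm them with sudo debconf-show postfix.

```shell
#!/usr/bin/env bash
# Sketch: print debconf answers for the three postfix installer questions so the
# install can run without prompts. postfix_debconf_selections is a hypothetical
# helper; the question names are assumed to match Ubuntu's postfix package.
postfix_debconf_selections() {
  local mailname=$1 relayhost=$2
  printf 'postfix postfix/main_mailer_type select Satellite system\n'
  printf 'postfix postfix/mailname string %s\n' "$mailname"
  printf 'postfix postfix/relayhost string %s\n' "$relayhost"
}

# Example:
#   postfix_debconf_selections "$HOSTNAME" smtp.mailgun.org | sudo debconf-set-selections
#   sudo DEBIAN_FRONTEND=noninteractive apt-get install -y postfix
```

Preseeding only affects a fresh install. If postfix is already installed, sudo dpkg-reconfigure postfix brings the same questions back.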
Create the file /etc/postfix/sasl_passwd to store your mailgun credentials:
sudo nano /etc/postfix/sasl_passwd
with the following contents:
{mailgun smtp domain} {mailgun smtp login}:{mailgun smtp password}
If you’ve never used SMTP before you may have to reset your SMTP password. This won’t impact your mailgun API key or web login password.
The file might look like this:
smtp.mailgun.org [email protected]:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxx-xxxxxxxx
Lock down the permissions of the credentials file then load it into postfix.
sudo chmod 600 /etc/postfix/sasl_passwd
sudo postmap /etc/postfix/sasl_passwd
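If you’d rather script the credentials step than use an editor, here is a sketch. write_sasl_passwd is a hypothetical helper, and MAILGUN_SMTP_LOGIN in the example is a placeholder variable. The helper creates the file with owner-only permissions from the start, so the password is never briefly world-readable. Reading the password with read -s also keeps it out of your shell history.

```shell
#!/usr/bin/env bash
# Sketch: write the postfix SASL credentials file in one step.
# write_sasl_passwd is a hypothetical helper.
# Arguments: output file, SMTP host key, SMTP login, SMTP password.
write_sasl_passwd() {
  local file=$1 host=$2 login=$3 password=$4
  (
    # New files created in this subshell get mode 600 (owner read/write only).
    umask 077
    # umask only applies to newly created files, so tighten an existing file first.
    if [ -e "$file" ]; then chmod 600 "$file"; fi
    printf '%s %s:%s\n' "$host" "$login" "$password" > "$file"
  )
}

# Example (in a root shell; MAILGUN_SMTP_LOGIN is a placeholder variable):
#   read -rsp 'Mailgun SMTP password: ' pw; echo
#   write_sasl_passwd /etc/postfix/sasl_passwd smtp.mailgun.org "$MAILGUN_SMTP_LOGIN" "$pw"
#   postmap /etc/postfix/sasl_passwd
```

Afterwards, sudo postmap -q smtp.mailgun.org hash:/etc/postfix/sasl_passwd should print your login:password pair, which confirms the lookup table was built.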
Next, configure domain mapping by creating /etc/postfix/generic:
sudo nano /etc/postfix/generic
with a line mapping your domain to your mailgun SMTP server:
@example.com no-reply@[smtp.mailgun.org]:587
then load it into postfix:
sudo postmap /etc/postfix/generic
Finally, configure postfix by adding these lines to /etc/postfix/main.cf (or editing the corresponding lines for any settings that already exist). Replace smtp.mailgun.org with your mailgun SMTP domain.
sudo nano /etc/postfix/main.cf
relayhost = [smtp.mailgun.org]:587
mydestination = localhost.localdomain, localhost
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_sasl_tls_security_options = noanonymous
smtp_sasl_mechanism_filter = AUTH LOGIN
smtp_tls_note_starttls_offer = yes
smtp_generic_maps = hash:/etc/postfix/generic
Restart postfix to load the new config:
sudo systemctl restart postfix
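On a fresh server, you can apply the same main.cf settings without an editor by using postconf -e. This updates a parameter that is already in main.cf, or appends it if it is missing. The sketch below is one way to do this. apply_postfix_settings is a hypothetical helper that uses the values from the article. The optional POSTCONF variable is also an assumption, added only so you can point the function at a stub command for a dry run.

```shell
#!/usr/bin/env bash
# Sketch: apply the main.cf settings with "postconf -e" instead of a text editor.
# apply_postfix_settings is a hypothetical helper. The optional POSTCONF variable
# lets you substitute a stub command for a dry run; by default it runs postconf.
apply_postfix_settings() {
  local relay=${1:-smtp.mailgun.org}
  local postconf_cmd=${POSTCONF:-postconf}
  local settings=(
    "relayhost = [$relay]:587"
    "mydestination = localhost.localdomain, localhost"
    "smtp_sasl_auth_enable = yes"
    "smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd"
    "smtp_sasl_security_options = noanonymous"
    "smtp_sasl_tls_security_options = noanonymous"
    "smtp_sasl_mechanism_filter = AUTH LOGIN"
    "smtp_tls_note_starttls_offer = yes"
    "smtp_generic_maps = hash:/etc/postfix/generic"
  )
  local setting
  for setting in "${settings[@]}"; do
    # postconf -e rewrites the named parameter in main.cf (or appends it).
    $postconf_cmd -e "$setting" || return 1
  done
}

# Example (in a root shell):
#   apply_postfix_settings smtp.mailgun.org && postfix check && systemctl restart postfix
```

Running postfix check before the restart catches some common problems, such as wrong file ownership or permissions.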
You should be all set up to send mail from your server! To test that it’s working, send a test email to [email protected] with the mail command (if the command isn’t found, install it with sudo apt install mailutils):
echo "Test message from postfix" | mail -s "Test message" [email protected]
If you don’t get an email within a few seconds, something’s broken! Check your spam folder, the Mailgun UI, and the logs in /var/log/mail*.
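If the test email doesn’t arrive, the mail log usually explains why. Postfix logs each delivery attempt with a status= field, such as sent, deferred or bounced, followed by the remote server’s reply in parentheses. Here is a small sketch that shows the most recent attempt. last_delivery_line is a hypothetical helper, and it assumes rsyslog writes postfix’s log to /var/log/mail.log, as on a default Ubuntu install.

```shell
#!/usr/bin/env bash
# Sketch: print the most recent delivery attempt recorded in the mail log.
# last_delivery_line is a hypothetical helper; /var/log/mail.log is the assumed
# default location (override with the first argument).
last_delivery_line() {
  local log=${1:-/var/log/mail.log}
  # Delivery lines carry a status= field (sent, deferred, bounced, ...);
  # keep only the last one so the newest attempt is shown.
  grep 'status=' "$log" | tail -n 1
}

# Example (from a shell that can read the log, e.g. a root shell):
#   last_delivery_line
```

A status=deferred or status=bounced line names the relay postfix tried and includes its reason, such as a rejected login or a connection timeout.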
mdadm
Now that we can send email from our server, the next step is to tell our disk monitoring software to use it. We’ll start with mdadm, which monitors the health of your RAID array.
Edit the mdadm config:
sudo nano /etc/mdadm/mdadm.conf
and add or modify the MAILADDR setting so it holds the address alerts should be sent to:
MAILADDR [email protected]
then do a quick test with
sudo mdadm --monitor --test --oneshot /dev/md0
(where /dev/md0 is your RAID array). You should get an email in your [email protected] inbox.
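If you provision several servers, you can make the MAILADDR change from a script instead. Here is a sketch. set_mailaddr is a hypothetical helper that replaces an existing uncommented MAILADDR line, or appends one if the file has none. It assumes the address contains no |, & or backslash characters, because sed would interpret those.

```shell
#!/usr/bin/env bash
# Sketch: set MAILADDR in mdadm.conf without opening an editor.
# set_mailaddr is a hypothetical helper. Assumes the address has no '|', '&'
# or backslash characters, which sed would treat specially.
set_mailaddr() {
  local conf=$1 addr=$2
  if grep -q '^MAILADDR' "$conf"; then
    # Replace the existing (uncommented) MAILADDR line in place.
    sed -i "s|^MAILADDR.*|MAILADDR $addr|" "$conf"
  else
    # No MAILADDR yet: append one at the end of the file.
    printf 'MAILADDR %s\n' "$addr" >> "$conf"
  fi
}

# Example (in a root shell):
#   set_mailaddr /etc/mdadm/mdadm.conf alerts@example.com
```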
That’s all you need to do to have mdadm send email notifications of any errors found during a check. However, it’s not uncommon for mdadm to be misconfigured and not perform checks at all! So it’s worth confirming that regular checks are scheduled. Unfortunately this depends on your OS, but on Ubuntu 22.04 you can check that a timer is listed by
sudo systemctl list-timers mdcheck_start
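Besides the scheduled timer, you can start a check by hand to confirm that checks actually run on your array. The md driver exposes this through sysfs: writing check to /sys/block/md0/md/sync_action starts a scrub of md0. The sketch below wraps that step. start_md_check and the MD_SYSFS override are hypothetical names. MD_SYSFS exists only so the function can be pointed at a test directory instead of the real /sys/block.

```shell
#!/usr/bin/env bash
# Sketch: start a consistency check ("scrub") of an md array by hand.
# start_md_check and MD_SYSFS are hypothetical; MD_SYSFS defaults to /sys/block.
start_md_check() {
  local md=${1:-md0}
  local action="${MD_SYSFS:-/sys/block}/$md/md/sync_action"
  local current
  current=$(cat "$action") || return 1
  # Only start a check when the array reports idle, so a running
  # resync or check is left alone.
  if [ "$current" != "idle" ]; then
    echo "$md is busy: $current" >&2
    return 1
  fi
  echo check > "$action"
}

# Example (in a root shell):
#   start_md_check md0 && cat /proc/mdstat
```

Progress shows up in /proc/mdstat. Once the check finishes, /sys/block/md0/md/mismatch_cnt holds the number of sectors found to be inconsistent.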
smartd
While mdadm will tell you when a disk has failed, there may be advance warning of this in the disk’s SMART statistics. smartd is a service that can monitor these statistics and alert you when they indicate a problem.
You may need to install smartmontools first if the smartd command isn’t found:
sudo apt install smartmontools
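With smartmontools installed, you can also spot-check a single disk by hand before relying on smartd. The sketch below extracts the overall verdict from smartctl -H output. smart_health is a hypothetical helper, and /dev/sda is an assumed device name. The helper only matches the health line that ATA drives print, so other device types may produce no output.

```shell
#!/usr/bin/env bash
# Sketch: extract the overall health verdict from `smartctl -H` output.
# smart_health is a hypothetical helper that reads smartctl's text on stdin.
# It only matches the wording ATA drives use; other device types may word
# their health line differently and then nothing is printed.
smart_health() {
  sed -n 's/^SMART overall-health self-assessment test result: *//p'
}

# Example (device name assumed):
#   sudo smartctl -H /dev/sda | smart_health
```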
Modify the configuration file:
sudo nano /etc/smartd.conf
The file contains lots of commented example configurations, plus potentially an uncommented line beginning with DEVICESCAN. smartd can only handle a single DEVICESCAN directive, so comment out any existing ones, then add
DEVICESCAN -o on -H -l error -l selftest -t -M test -m [email protected]
These options do the following:
- -o on: Enable SMART automatic offline testing (ATA drives only).
- -H: Check the disk’s overall SMART health status for pre-failure conditions.
- -l error -l selftest: Check for new errors as well as failed self-test results.
- -t: Track changes in SMART attributes.
- -m [email protected]: Send email alerts to this address.
- -M test: Send a test email when smartd is started.
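You can also script the DEVICESCAN edit. Here is a sketch. set_devicescan is a hypothetical helper that comments out every active DEVICESCAN line, including indented ones, and keeps a .bak copy of the original file. It then appends the new directive.

```shell
#!/usr/bin/env bash
# Sketch: comment out active DEVICESCAN lines in smartd.conf, then append a new
# one. set_devicescan is a hypothetical helper; sed keeps a .bak copy.
set_devicescan() {
  local conf=$1 directive=$2
  # Prefix '#' to any line whose first non-blank word is DEVICESCAN.
  sed -i.bak 's/^[[:space:]]*DEVICESCAN/#&/' "$conf"
  printf '%s\n' "$directive" >> "$conf"
}

# Example (in a root shell):
#   set_devicescan /etc/smartd.conf \
#     'DEVICESCAN -o on -H -l error -l selftest -t -M test -m alerts@example.com'
```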
To test this setup, restart the smartd service:
sudo systemctl restart smartd
You should get one email for each disk. You can leave the configuration as-is, or remove -M test from /etc/smartd.conf to get email alerts only for errors (not for service restarts).