Common Linux Troubleshooting Techniques for Diagnostics
Linux is a rock-solid operating system, no doubt about it. That is one reason so much of the world's cloud infrastructure runs on Linux. As solid as the operating system may be, though, problems still crop up from time to time.
Luckily, Linux has excellent diagnostic tools built right into it. On the network front, you can troubleshoot both remote and local connectivity problems right from the terminal by using text commands.
Alternatively, you can use a plethora of graphical applications to track down network issues and get them resolved. The same goes for getting a Linux server that won't boot back into service: there are tools to repair the boot process. Below are common tools that can help you resolve many Linux issues related to networking, hard disks, and other hardware.
In this article, we look at some common troubleshooting techniques and diagnostic applications that keep things running smoothly, as they relate to the Linux+ certification and many other Linux-based certifications.
We start with the basic network commands you must understand to write the Linux+ exam and to troubleshoot Linux systems in general.
ping
Any basic network troubleshooting starts with the ping command, no matter which operating system or platform you use. To use it in Linux, open a terminal and run the command with a target IP address. Here is how to use it if the target IP address is 192.168.1.1:
ping 192.168.1.1
Ping in Linux runs indefinitely, so you have to press CTRL+C to stop it. If you would like ping to behave more like it does in Windows, where it stops after four requests by default, run it like this:
ping -c 4 192.168.1.1
The -c switch sets the count, so -c 4 tells ping to send four echo requests and then stop.
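If you run ping from a script, a minimal sketch like the one below captures both signals ping gives you: its exit status and the packet-loss figure on its summary line. It assumes the iputils version of ping found on most distributions, which exits with status 0 when at least one reply arrives; the function names check_host and ping_loss are hypothetical, not standard commands.

```shell
#!/bin/sh
# ping_loss (hypothetical helper): read ping output on stdin and print the
# packet-loss percentage from the summary line, e.g.
# "4 packets transmitted, 3 received, 25% packet loss, time 3004ms" -> 25
ping_loss() {
    awk '/packet loss/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /%$/) { sub(/%/, "", $i); print $i; exit }
    }'
}

# check_host (hypothetical helper): send four echo requests (-c 4), wait at
# most two seconds for each reply (-W 2), and report the result.
# Returns ping's own exit status (0 = at least one reply with iputils ping).
check_host() {
    out=$(ping -c 4 -W 2 "$1" 2>&1)
    status=$?
    loss=$(printf '%s\n' "$out" | ping_loss)
    if [ "$status" -eq 0 ]; then
        echo "$1 reachable (${loss:-?}% loss)"
    else
        echo "$1 unreachable (${loss:-?}% loss)"
    fi
    return "$status"
}
```

For example, check_host 192.168.1.1 prints one line summarizing the test, and its exit status can drive an if statement in a larger script.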
If ping reports that the network is unreachable, and other machines can't reach your computer or server either, you are probably dealing with a local network issue.
ifconfig and ip
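On many modern Linux distributions, ifconfig (part of the older net-tools package) is deprecated and may not be installed by default; the ip command from iproute2 replaces it. We'll run through both below to show how to get the same results with each.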
If you are sure your network cable is connected and that you have a physical connection to your network switch or router, check the network configuration of your local device. Luckily, there is an easy way to do this from the command line. Open a terminal and type the following:
ifconfig
You will get output listing all your active network interfaces. On a wired connection, you should see an interface such as eth0 or eth1, depending on your setup and how many network cards you have; for a wireless card, look for wlan0. Many newer distributions use predictable interface names instead, such as enp3s0 for wired or wlp2s0 for wireless adapters.
In our example, we are looking at eth0, and our IP address is 192.168.1.111. We removed the MAC address from this output; on your system, you will find it on the line that starts with ether:
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.1.111 netmask 255.255.255.0 broadcast 192.168.1.255
inet6 fe80::3eb7:dd41:2b6d:1e28 prefixlen 64 scopeid 0x20<link>
ether ############ txqueuelen 1000 (Ethernet)
RX packets 6163515 bytes 6369685703 (5.9 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2803165 bytes 438330018 (418.0 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
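The RX and TX lines are worth a second look when a link is flaky: rising counters deserve investigation, and errors in particular often point to a physical-layer problem such as a bad cable or port. As a rough sketch, the hypothetical nonzero_errors function below reads ifconfig output and prints only the counters that are not zero, assuming the net-tools output layout shown above.

```shell
#!/bin/sh
# nonzero_errors (hypothetical helper): read ifconfig output on stdin and
# print any nonzero counter from the "RX errors ..." and "TX errors ..." lines.
# Prints nothing when every counter is zero.
nonzero_errors() {
    awk '($1 == "RX" || $1 == "TX") && $2 == "errors" {
        # After RX/TX, fields come in name/value pairs:
        # errors 0 dropped 0 overruns 0 ...
        for (i = 2; i < NF; i += 2)
            if ($(i + 1) + 0 > 0)
                print $1, $i, $(i + 1)
    }'
}
```

Usage: ifconfig eth0 | nonzero_errors. Running it a few minutes apart shows whether the counters are still growing.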
The ip command is available on newer Linux systems and offers a lot of information about your network settings. Below are some examples of how to use it.
Try it out by typing:
ip addr
The result looks like this:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:1a:2b:3c:4d:5e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.111/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
       valid_lft 86398sec preferred_lft 86398sec
    inet6 fe80::1234:5678:9abc:def0/64 scope link
       valid_lft forever preferred_lft forever
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 01:2b:3c:4d:5e:6f brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.110/24 brd 192.168.1.255 scope global dynamic wlan0
       valid_lft 86398sec preferred_lft 86398sec
    inet6 fe80::1b2c:3d4e:5f6g:7h8i/64 scope link
       valid_lft forever preferred_lft forever
This will display all of the details for all your adapters. If you want specific details on a network interface, then you can try the following:
ip addr show eth0
The output looks like this:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:1a:2b:3c:4d:5e brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.111/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
       valid_lft 86398sec preferred_lft 86398sec
    inet6 fe80::1234:5678:9abc:def0/64 scope link
       valid_lft forever preferred_lft forever
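When you only need the address itself, for example in a script, ip's -o (one line per address) and -4 (IPv4 only) options make the output easy to parse. This sketch assumes the iproute2 one-line format, where the address and prefix length are the fourth field; ipv4_addrs and ipv4_of are hypothetical helper names.

```shell
#!/bin/sh
# ipv4_addrs (hypothetical helper): read `ip -o -4 addr show` output on stdin
# and print each IPv4 address with its prefix length, e.g. 192.168.1.111/24.
ipv4_addrs() {
    awk '$3 == "inet" { print $4 }'
}

# ipv4_of (hypothetical helper): show the IPv4 address(es) of one interface.
ipv4_of() {
    ip -o -4 addr show dev "$1" | ipv4_addrs
}
```

On the example system above, ipv4_of eth0 would be expected to print 192.168.1.111/24.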
traceroute
If you have a connection to your router but can't reach the internet or a computer on another segment of your network, use the traceroute command. A common technique when troubleshooting internet issues is to trace the route to one of Google's public DNS servers, such as 8.8.8.8. Google's DNS service has excellent uptime, so the chances of that target being offline are slim. From the terminal, type the following:
traceroute 8.8.8.8
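The output lists each hop your data passes through on its way to the target. If the connection is blocked or failing along the way, the trace typically stops getting replies (showing * * *) after the last working hop, which points you to the device where the problem starts. Some routers are configured not to answer traceroute probes, so an isolated silent hop is not necessarily a fault.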
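To spot where a trace goes quiet, you can look for hops where every probe came back as an asterisk, meaning no reply arrived before the timeout. The sketch below assumes the default Linux traceroute layout with three probes per hop; silent_hops and last_responding_hop are hypothetical names.

```shell
#!/bin/sh
# silent_hops (hypothetical helper): read traceroute output on stdin and
# print the number of every hop where all three probes timed out ("* * *").
silent_hops() {
    awk 'NR > 1 && $2 == "*" && $3 == "*" && $4 == "*" { print $1 }'
}

# last_responding_hop (hypothetical helper): print the last hop line that
# got at least one reply. NR > 1 skips the "traceroute to ..." header.
last_responding_hop() {
    awk 'NR > 1 && !($2 == "*" && $3 == "*" && $4 == "*") { line = $0 }
         END { if (line != "") print line }'
}
```

For example, save a trace with traceroute -n 8.8.8.8 > trace.txt (the -n option skips reverse DNS lookups, which keeps the trace fast), then run last_responding_hop < trace.txt to see the final device that answered.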
Boot Issue Troubleshooting
Not all Linux boot issues are catastrophic, which is a good thing! Sometimes you just want to find out what an error message points to or why a service fails on boot: anything that makes the system run less than optimally.
In most cases, you can find useful information about the boot process in the /var/log/boot.log file on your computer or server, if your distribution writes one. Some log files require root access, so you may need to put sudo in front of these examples; sudo asks for your own password, and your account must be allowed to use it. If you do not have access to a graphical desktop, you can read the file by typing the following at the command line:
cat /var/log/boot.log
This prints the contents of the log file using the concatenate command (cat). To see only the oldest or newest entries quickly, use either of the following:
head shows the first ten lines of the file (the oldest entries):
head /var/log/boot.log
tail shows the last ten lines of the file (the newest entries):
tail /var/log/boot.log
The /var/log/messages file can also contain helpful hints about why your system is having boot issues (Debian and Ubuntu write similar messages to /var/log/syslog instead). If your system uses systemd, as most current distributions do, and you can get to a command line, run the journalctl command.
It shows the logs collected by systemd's journal and can help you pinpoint the issue plaguing your system.
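To narrow a long log down quickly, you can filter it for failure keywords and keep only the most recent matches. The sketch below does that for a plain-text log such as /var/log/boot.log, and falls back to the systemd journal when no file is given, using journalctl's -b (current boot) and -p err (error priority and above) options. The function name log_problems and the keyword list are assumptions, not a standard tool.

```shell
#!/bin/sh
# log_problems [FILE] [COUNT]  (hypothetical helper)
# With FILE: print the last COUNT lines (default 10) that mention
# "fail" or "error", ignoring case.
# Without FILE (or with ""): print this boot's journal entries of priority
# "err" or worse; this needs systemd's journalctl and often root access.
log_problems() {
    count=${2:-10}
    if [ -n "$1" ]; then
        grep -iE 'fail|error' "$1" | tail -n "$count"
    else
        journalctl -b -p err --no-pager | tail -n "$count"
    fi
}
```

For example, log_problems /var/log/boot.log 20 shows the last 20 matching lines. To look at the boot before the current one, journalctl -b -1 works when the journal is stored persistently.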
One of the advantages of Linux for administrators is that it stores so much information about the current state of the machine it is running on. This is valuable when you run into issues preventing the system from booting up.
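Even better, most of the log files are stored as plain text, so even if the system won't boot to a shell, you can still do some investigative digging by booting from a live USB or rescue environment, mounting the system's disk, and reading its logs. (The systemd journal is binary, but journalctl -D can read a journal directory from a mounted disk.)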
File System Troubleshooting
Sometimes, you might experience a hard disk failure on your Linux machine. That is stressful on any system, and on Linux it has a reputation for being especially tough to deal with, particularly if you are unfamiliar with the terminal and have no access to a graphical interface.
However, there are many tools we can use to track down the culprit, attempt filesystem repairs, and make configuration changes to storage volumes that aren't working.
The most commonly used repair tool for hard drive issues is fsck (File System ChecK). With it, you can check a filesystem's integrity and repair the errors it finds. To scan a drive with fsck, you first need to identify its device name and where it is mounted, because fsck should only be run on an unmounted filesystem. To do this, we will use the df command:
df -h
This command will output the current mount points of your system in a human-readable format. Here is an example of the output:
Filesystem Size Used Avail Use% Mounted on
udev 3.8G 0 3.8G 0% /dev
tmpfs 784M 29M 755M 4% /run
/dev/sda1 218G 56G 151G 27% /
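Because fsck works on a device rather than a directory, it helps to look up which device backs a given mount point. This sketch parses df -P output (the POSIX format, which keeps each filesystem on one line) and also flags filesystems that are nearly full; the function names and the 90% default threshold are assumptions, and it does not handle mount points that contain spaces.

```shell
#!/bin/sh
# device_for_mount MOUNTPOINT (hypothetical helper): read `df -P` output on
# stdin and print the device (column 1) mounted at MOUNTPOINT (column 6).
device_for_mount() {
    awk -v m="$1" 'NR > 1 && $6 == m { print $1 }'
}

# nearly_full [PERCENT] (hypothetical helper): read `df -P` output on stdin
# and print the mount point and use percentage of every filesystem at or
# above PERCENT (default 90).
nearly_full() {
    awk -v max="${1:-90}" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= max) print $6, $5
    }'
}
```

On the system above, df -P | device_for_mount / would print /dev/sda1.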
From this output, I can see that the first partition on my primary drive (/dev/sda1) is mounted as the root filesystem (/). For more information about it, I can use the parted utility (as root) to print its details:
sudo parted /dev/sda1 print
This produces output like the following:
Model: Unknown (unknown)
Disk /dev/sda1: 238GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Number Start End Size File system Flags
1 0.00B 238GB 238GB ext4
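From this output, you can see the partition number, its size, and its filesystem type (here, ext4). Because the command was pointed at a single partition, parted reports the partition table as loop; running it against the whole disk (for example, sudo parted /dev/sda print) lists every partition on that disk instead. All this information can help identify problematic drives when you have issues with your file system.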
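Running fsck against a mounted filesystem can damage it, so a cautious workflow checks that the device is unmounted first and starts with a read-only pass. The sketch below assumes the device path appears in /proc/mounts exactly as you type it (device-mapper and UUID paths may be listed under a different name), and it uses fsck -n, which asks supporting checkers such as e2fsck to report problems without repairing them. The function names and the MOUNTS override are hypothetical, added so the logic can be tested.

```shell
#!/bin/sh
# is_mounted DEVICE (hypothetical helper): succeed if DEVICE is listed as a
# mounted source. MOUNTS is a hypothetical override for the mount table file;
# it defaults to the kernel's /proc/mounts.
is_mounted() {
    awk -v d="$1" '$1 == d { found = 1 } END { exit !found }' "${MOUNTS:-/proc/mounts}"
}

# safe_fsck DEVICE (hypothetical helper): refuse to touch a mounted device;
# otherwise run a read-only check (-n) that reports problems without fixing them.
safe_fsck() {
    if is_mounted "$1"; then
        echo "refusing: $1 is mounted; unmount it first" >&2
        return 1
    fi
    fsck -n "$1"
}
```

Run it from a root shell after unmounting the device. If the read-only pass reports errors, rerun fsck without -n (for example, fsck -y) on the still-unmounted device to let it repair them.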
Permission issues are another common problem on Linux systems. Commands like chmod, chown, and chgrp change file permissions and ownership to resolve "permission denied" errors, and sudo grants temporary elevated privileges for running restricted system tasks.
chmod sets file and directory permissions for the owner (u), group (g), or others (o). For example:
chmod u+x file.txt
This adds execute permission for the file's owner, allowing the owner to run it as a program or script.
chgrp changes the group ownership of a file or directory. For example:
chgrp admins file.txt
This assigns file.txt to the admins group, so members of that group get the file's group permissions (for example, they can read it if the group has read permission).
sudo temporarily elevates privileges to root (or another user) to run a command. For example:
sudo apt update
This runs apt update as the superuser to refresh the package index on Debian-based systems, which requires root privileges.
Mastering these four commands allows fine-grained administration of Linux permissions and helps you fix blocked access or permission errors that can stop services or other system processes. Well-tuned access controls keep a Linux system running smoothly and securely.
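When you hit a permission denied error, it usually helps to look at the file's mode, owner, and group before changing anything. The sketch below uses GNU stat's -c format option (%A for the permission string, %U and %G for owner and group names) to print that summary, then applies the chmod u+x change described above. perm_summary and demo_permissions are hypothetical helper names, and the chown and chgrp lines are left as comments because they normally need root; alice is an example user name.

```shell
#!/bin/sh
# perm_summary FILE... (hypothetical helper): print mode, owner:group, and
# name for each file, e.g. "-rw-r----- alice:admins file.txt".
# Requires a stat that supports -c (GNU coreutils on most distributions).
perm_summary() {
    stat -c '%A %U:%G %n' "$@"
}

# demo_permissions (hypothetical): walk through the changes on a scratch file.
demo_permissions() {
    f=$(mktemp)
    chmod 640 "$f"          # owner rw, group r, others nothing
    perm_summary "$f"
    chmod u+x "$f"          # owner may now run it as a program or script
    perm_summary "$f"
    # These normally require root, so they are shown but not run:
    #   sudo chown alice "$f"     # change the owner (alice is an example name)
    #   sudo chgrp admins "$f"    # change the group
    rm -f "$f"
}
```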
Diagnosing Hardware Failures
Unstable memory, failing drives, and faulty CPUs cause stability headaches on any system, but Linux offers numerous built-in commands for detailed hardware health checks to pinpoint issues quickly.
lshw extracts a complete inventory of system components, their specifications, and device status (run it with sudo, because without root it reports only partial information).
It displays firmware versions, storage capacity, bus speed, and more.
lsblk lists all connected block devices, such as physical disks and their partitions.
Its output covers every storage device currently attached, including formatted partitions.
lscpu reports the processor model name, architecture, socket count, and NUMA (Non-Uniform Memory Access) configuration.
This is useful for spotting CPU bottlenecks or checking whether a machine meets a workload's CPU requirements.
free shows memory usage statistics that are critical for diagnosing shortages.
Running free -h gives human-readable output showing total, used, free, cached, and swap memory.
When troubleshooting, you can also use live monitoring tools like htop, and check the system logs regularly for hardware errors (for example, with dmesg or journalctl -k, which show kernel messages).
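free gets its numbers from /proc/meminfo, which you can also read directly in a script. As a small sketch, the hypothetical mem_available_pct function below computes what share of RAM the kernel estimates is still available for new workloads, using the MemTotal and MemAvailable fields (MemAvailable exists on kernels from 3.14 onward). A value that stays very low while swap use climbs suggests a genuine memory shortage.

```shell
#!/bin/sh
# mem_available_pct [FILE] (hypothetical helper): print MemAvailable as a
# whole-number percentage of MemTotal. FILE defaults to /proc/meminfo.
mem_available_pct() {
    awk '/^MemTotal:/ { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}
```

Called with no argument, it reads the live /proc/meminfo of the machine it runs on.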
Network problems often boil down to DNS resolution failures. Commands like dig, host, and nslookup let you query DNS servers directly to isolate the culprit, and good old ping shows whether a name resolves at all before it sends any packets. The cause may be unstable nameservers, incorrect entries in the /etc/hosts file, or bad DNS settings assigned by DHCP.
dig is a versatile DNS lookup tool that returns detailed DNS records, such as IP addresses, mail exchangers, TTLs, and more.
host is a simple DNS lookup utility for finding the IP address associated with a domain.
nslookup lets you query name servers to diagnose DNS issues and map hostnames to IP addresses (and back), helping you correct connectivity problems.
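On a typical setup, where /etc/nsswitch.conf lists files before dns, entries in /etc/hosts are consulted before DNS, so a stale line there can make a name resolve to the wrong address even when the DNS servers are fine. dig and nslookup query DNS servers directly and skip /etc/hosts, which is why their answers can differ from what ping uses. The hypothetical hosts_entry function below prints any address a hosts-format file pins to a given name, so you can compare it with the DNS answer.

```shell
#!/bin/sh
# hosts_entry NAME [FILE] (hypothetical helper): print the address of every
# line in a hosts-format file (default /etc/hosts) that lists NAME as a
# hostname or alias. Anything after "#" is treated as a comment.
hosts_entry() {
    awk -v n="$1" '{ sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; next } }' "${2:-/etc/hosts}"
}
```

For example, compare hosts_entry example.com with dig +short example.com; if they disagree, most applications on such a setup will use the hosts file answer.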
Tracking Down Performance Problems
Sometimes Linux systems run slowly even when you have enough RAM and disk space. In these cases, don't forget traditional performance troubleshooting! Monitoring utilization with top, htop, and glances helps you locate stressed components, while iostat, vmstat, netstat, and perf measure resource saturation. From there, you can identify and kill runaway processes, upgrade overloaded components, and tune resource-hungry daemons to restore speed.
Here are some commands to help find issues that are affecting performance on your system:
top: Displays a dynamic real-time view of active processes, sorting by CPU, memory, and other usage metrics. It shows you a constantly updated task manager with the most resource-hungry apps.
htop: Similar to top, but more interactive: it lets you scroll, sort processes, and kill them directly, which makes it easy to identify processes that are hogging resources.
glances: Presents a system health overview using graphs and gauges for CPU, memory, disk, network, and processes. It can help you spot potential bottlenecks at a glance.
iostat: Reports per-disk input/output statistics such as read/write throughput, transfers per second, and utilization. You can use it to identify slow disks dragging down overall performance.
vmstat: Reports virtual memory usage, paging, swapping, I/O, and CPU activity. It can reveal memory pressure that forces the system to page to disk and slows it down.
netstat: Displays network connections, traffic statistics, and listening ports. It is useful for connectivity and bandwidth troubleshooting. On newer systems, ss from the iproute2 package provides the same socket information.
perf: An advanced profiler that samples CPU and memory behavior by process or function. It can help you pinpoint optimization hot spots in code or applications if a specific program on a system is performing poorly.
Using these tools provides you with a ton of information about optimizing your Linux system. You can identify processes that are hogging resources, find out if your system is under-specced for its workload, fine-tune how resources are being used, and optimize settings to get the most out of your hardware.
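As one example of turning these tools into a quick check, vmstat's si and so columns show memory swapped in from and out to disk per second, and sustained nonzero values mean the system is short of RAM for its workload. The sketch assumes the procps vmstat layout, where si and so are the seventh and eighth fields, and that you run something like vmstat 1 5; the first line of figures is an average since boot, so it is skipped. swap_activity is a hypothetical name.

```shell
#!/bin/sh
# swap_activity (hypothetical helper): read `vmstat INTERVAL COUNT` output on
# stdin, skip the two header lines and the first (since-boot) sample, and
# print the average si and so values of the remaining samples.
swap_activity() {
    awk 'NR > 3 { si += $7; so += $8; n++ }
         END { if (n > 0) printf "si=%d so=%d\n", si / n, so / n }'
}
```

Usage: vmstat 1 5 | swap_activity.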
Using Troubleshooting Techniques to Solve Real Problems
Let's look at a fictional scenario where you must use some troubleshooting commands to deal with an all-too-real problem. The steps below have been simplified to show the basic steps you could follow when dealing with a bad system volume.
Imagine walking into work on a Tuesday morning to find frantic users flooding the help desk. The corporate file share that holds all departmental files and home folders has vanished without warning!
You log into the storage server and, sure enough, the /srv/shares volume that hosts these critical directories is no longer mounted. Before panicking, you try to remount the volume:
mount /dev/disk/by-uuid/aaaaaa-bbbb-cccc-dddd-eeeeeeee /srv/shares
Unfortunately, you are greeted by input/output errors from the underlying storage.
As part of your troubleshooting, you make sure the volume is fully unmounted so nothing else can write to it:
umount /srv/shares
Next, you probe the filesystem superblock.
It's not good news: the flags reveal filesystem corruption. Next, you try to recover the data by running fsck on the unmounted device.
You spend 15 tense minutes while it attempts repairs before it exits with an unfriendly verdict: the data cannot be recovered.
Luckily, you have been backing up this volume's data, and since there hasn't been much activity on the share since the nightly backup, little data is lost. You restore the files from last night's snapshots to a fresh volume.
Next, you redirect users and services and have the department shares operating again in under an hour. Crisis averted!
Since your troubleshooting steps confirmed a hard drive issue, you notify the hardware team and ask them to replace the damaged storage volume.
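The superblock check in this scenario can be scripted for ext2/3/4 filesystems with dumpe2fs -h, which prints only the superblock. Its "Filesystem state" line reads clean on a healthy filesystem and typically shows "not clean" or mentions errors when a problem has been flagged. The sketch below extracts that line; fs_state and needs_check are hypothetical helpers, and other filesystems use different tools (for XFS, xfs_repair -n performs a read-only check).

```shell
#!/bin/sh
# fs_state (hypothetical helper): read `dumpe2fs -h DEVICE` output on stdin
# and print the value of the "Filesystem state:" line, e.g. "clean" or
# "clean with errors".
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# needs_check (hypothetical helper): succeed (exit 0) if the state is
# anything other than "clean", including a missing state line.
needs_check() {
    [ "$(fs_state)" != "clean" ]
}
```

Usage, with the placeholder UUID from the scenario: sudo dumpe2fs -h /dev/disk/by-uuid/aaaaaa-bbbb-cccc-dddd-eeeeeeee 2>/dev/null | fs_state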
As we have seen, plenty of great troubleshooting tools are available in Linux, but this is merely scratching the surface regarding what can be done with this operating system. The fact that Linux is free to use is just the cherry on top.
If you are preparing to write your Linux+ exam or any other Linux certification, then there are plenty of lessons that you can learn from just playing with the command line terminal while you go through the practical exercises that you need to understand for the exam. This will help you understand the Linux OS better and reduce the time it takes you to prepare for your Linux+ exam.