By Nick Matveev
7 Steps to Resolving Common Virtualization Issues
VMware is an expansive platform with a tremendous amount of capability, which is typically great, except when it comes to troubleshooting. When things go wrong, there are many different things you could try, so what matters most is finding a good starting point.
Let's look at a simple step-by-step troubleshooting method that I use to address slow-performing applications in vCenter environments.
Here's a little backstory. I work for a government office that does Computer Assisted Mass Appraisal (CAMA for short). Our entire infrastructure is VMware based. The software runs on the Windows Server operating system. The most critical piece of the CAMA system is the SQL database, which constantly gets queried and rewritten throughout the day.
For that reason, when the new CAMA system was installed and subsequently started performing very poorly, all eyes were instantly fixated on the VM that runs the SQL database. This CAMA system is a very expensive piece of software, so a team of specialists from the vendor came to assess the situation. Eventually, we found out that the performance issues were caused not by the VM but by problems with the SQL database structure.
The steps outlined below are the exact steps we took to troubleshoot the poorly performing SQL database. The optimization did improve the performance of the poorly written SQL queries, even though it was not the root cause. I went on to use these steps countless times to troubleshoot other problems within our environment.
1. Reboot the Machine
In my 10 years as a helpdesk tech, I have seen more problems solved by rebooting the machine than by everything else put together.
When you perform a reboot, the RAM gets completely wiped. This is the easiest, and sometimes the only, way to get out of infinite loops, eliminate redundant data, clean up memory leaks, and clear various error states.
2. vMotion the Virtual Machine
If you are managing your VMware environment with vCenter, there is a good chance that you have clustered your resources. That is a good idea for many reasons. One of them is that migrating your VM to another host with vMotion is a great troubleshooting step for a lot of problems. You are giving a different set of hardware the opportunity to perform the same function, which can rule out a problem with the original host's hardware.
Much like a reboot, this is another way to clear out various error states, both in the operating system and in the VMware environment. There is also a chance that the host you are currently running on is bogged down by other VMs, and moving the VM will free up more resources for it.
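To make the "bogged down host" idea concrete, here is a minimal Python sketch of how you might pick a migration target from utilization numbers you have already collected. The `Host` class, the host names, and the percentages are all hypothetical; nothing here talks to vCenter, and in a real cluster DRS or the vCenter performance charts would give you this information.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_used_pct: float  # percent of host CPU in use (hypothetical sample)
    mem_used_pct: float  # percent of host RAM in use (hypothetical sample)


def least_loaded_host(hosts, current):
    """Return the host, other than `current`, with the lowest combined load."""
    candidates = [h for h in hosts if h.name != current]
    if not candidates:
        return None  # nowhere else to migrate to
    return min(candidates, key=lambda h: h.cpu_used_pct + h.mem_used_pct)


# Made-up utilization figures for a three-host cluster.
hosts = [
    Host("esxi-01", 85.0, 90.0),
    Host("esxi-02", 40.0, 55.0),
    Host("esxi-03", 60.0, 30.0),
]

target = least_loaded_host(hosts, current="esxi-01")
print(target.name)
```

Adding CPU and memory percentages together is a deliberately crude score. It is only meant to show the reasoning: compare the hosts first, then move the VM to the one with the most headroom.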
3. Get Rid of Unnecessary Snapshots
VMware snapshots are a fantastic technology that you should be using to safeguard yourself from unexpected errors. I have worked many different cases with VMware support staff, and in many situations they suggest taking a snapshot of the VM before applying changes. This does, however, come at a cost in performance.
Each time the storage is accessed, it must first try to find the information within the snapshot before it tries the main storage. When you add more than one snapshot, you further degrade performance by forcing your storage to cycle through all of them. The rule of thumb is that the more snapshots you have, the slower your VM will run. This is particularly troublesome for VMs that do not have pre-allocated (thin-provisioned) storage, which is the default mode of operation in many modern Storage Area Networks.
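The lookup cost described above can be illustrated with a toy model in Python. This is only a sketch of the idea, not how VMware actually implements snapshot delta disks: each snapshot is a dictionary of the blocks written since it was taken, and a read checks the newest snapshot first before falling back to the base disk. All names and data are made up.

```python
def read_block(block_id, snapshots, base_disk):
    """Look up a block, newest snapshot first, then the base disk.

    Returns (data, lookups), where lookups is the number of layers checked.
    """
    lookups = 0
    for delta in reversed(snapshots):  # the newest snapshot is last in the list
        lookups += 1
        if block_id in delta:
            return delta[block_id], lookups
    lookups += 1  # finally check the base disk
    return base_disk[block_id], lookups


base_disk = {0: "boot", 1: "data-v1", 2: "logs-v1"}
snapshots = [
    {1: "data-v2"},  # oldest snapshot delta
    {2: "logs-v2"},
    {},              # newest snapshot delta, nothing written yet
]

print(read_block(0, snapshots, base_disk))  # ('boot', 4): three deltas, then the base disk
print(read_block(0, [], base_disk))         # ('boot', 1): no snapshots, one lookup
```

In this model, a block that was never rewritten is the worst case: every snapshot layer is checked before the base disk is reached, which is why deleting stale snapshots shortens the path.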
4. Update the VM Version and VMware Tools
VMware Tools is a set of drivers and utilities that helps the ESXi host communicate with the operating system of the VM. If you are running an older version, there is a chance that you are not getting the best performance out of your VM. The VM version (virtual hardware) upgrade accomplishes a similar task: it helps the VM make use of the new functionality included with newer ESXi and vCenter versions. The improvements are often indirect and mostly come from better utilization of technology already present on your server.
5. Change the Network Adapter Driver
For many Windows guests, the default network adapter in vSphere 6.7 is E1000E. If your VM was created on an older version of ESXi or vCenter, there is a chance that it is using E1000 or VMXNET. In my experience, switching to E1000E has improved performance on several occasions. It can also fix a host of other nagging issues, such as losing the network connection after a reboot or a vMotion.
6. Check the Performance Monitor Inside the vCenter Web App
If the VM you are running is a Windows Server, the natural first place to look during a performance issue is the Performance Monitor that comes with Windows. The vCenter web application has its own performance monitoring tool for each VM and host. It can identify many problems that the Windows Performance Monitor simply cannot see.
For example, memory ballooning that leads to memory swapping is a common problem that the guest operating system has no way of detecting.
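As a rough illustration of what to look for, the Python sketch below scans a series of memory samples and flags intervals where the hypervisor was ballooning or swapping the VM's memory. It assumes you have copied the balloon and swapped values out of the vCenter performance charts by hand; the field names `ballooned_kb` and `swapped_kb`, the timestamps, and the numbers are all hypothetical.

```python
def memory_pressure_flags(samples, balloon_kb_limit=0, swapped_kb_limit=0):
    """Return (time, reason) pairs for samples that show host memory pressure."""
    flags = []
    for s in samples:
        if s["swapped_kb"] > swapped_kb_limit:
            # Swapping is the more serious symptom, so report it first.
            flags.append((s["time"], "swapping"))
        elif s["ballooned_kb"] > balloon_kb_limit:
            flags.append((s["time"], "ballooning"))
    return flags


# Made-up samples for one VM, five minutes apart.
samples = [
    {"time": "09:00", "ballooned_kb": 0, "swapped_kb": 0},
    {"time": "09:05", "ballooned_kb": 524288, "swapped_kb": 0},
    {"time": "09:10", "ballooned_kb": 1048576, "swapped_kb": 204800},
]

for time, reason in memory_pressure_flags(samples):
    print(time, reason)
```

Any nonzero value is treated as a flag by default. That is a conservative assumption, and you may want higher limits if brief ballooning is normal in your environment.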
7. Add More Resources if Possible
Unfortunately, most people will completely skip the aforementioned steps 1-6 and try to add more resources as their first step toward solving performance issues. Truth be told, there will be a lot of cases where this is the solution. The problem with that approach is that you often end up overprovisioning your VMs without proper utilization.
Before you add any resources, you should look at the current utilization and make sure that adding more will not put a strain on the rest of your environment. As a rule of thumb, add RAM first. If that does not work, you can try adding more CPU cores. If your environment has both SSD and SATA drives, it is a good idea to try migrating the VM's storage to the SSD-backed datastores to see if that helps resolve the issue.
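The "check before you add" advice can be expressed as a simple headroom calculation. The Python sketch below is an assumption-laden example: the function name, the `max_commit_ratio` parameter, and every number are hypothetical, and it only considers configured memory, not actual usage.

```python
def can_add_memory(host_total_gb, allocated_gb, extra_gb, max_commit_ratio=1.0):
    """Return True if adding extra_gb keeps total allocation within the limit.

    max_commit_ratio of 1.0 means no memory overcommitment is allowed.
    """
    return sum(allocated_gb) + extra_gb <= host_total_gb * max_commit_ratio


# Made-up host with 256 GB of RAM and four VMs already configured.
host_total_gb = 256
allocated_gb = [64, 32, 96, 32]  # 224 GB configured in total

print(can_add_memory(host_total_gb, allocated_gb, extra_gb=16))  # True: 240 GB fits
print(can_add_memory(host_total_gb, allocated_gb, extra_gb=48))  # False: 272 GB does not
```

If the answer is False, the extra RAM for one VM would come at the expense of the others on the same host, which is the strain on the environment that this step warns about.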
The Basics: Check Software and Databases for Logical Issues
This is the step that ultimately showed us what the real culprit was in the situation I described earlier. We were using a SQL database that had a large amount of redundant data and tables that did not always have cohesive fields. This created a tremendous amount of overhead for the large number of queries that the software was running. We had migrated from a different environment, and our data had been converted from one software solution to another. The migration was not done properly.
The Basics: Check Event Viewer for Any Warnings or Errors
Another good troubleshooting step is to check the Windows Event Viewer for problems. The "Application" and "System" logs are the first places to look. Unfortunately, there are a lot of benign errors and warnings in those logs. You will need to do a bit of research to see what they mean and whether they could be affecting your software.
Specific software and Windows Server functions also have their own subsections within the Event Viewer.
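One way to cut through the noise is to count the errors and warnings by source and event ID, skipping the ones you have already researched and decided are benign. The Python sketch below assumes you have exported the log entries into a list of dictionaries. The field names, sources, and event IDs are all invented for illustration and do not correspond to real Windows events.

```python
from collections import Counter


def summarize_events(events, benign_ids=frozenset()):
    """Count Error and Warning events by (source, event_id), most frequent first."""
    counts = Counter()
    for e in events:
        if e["level"] not in ("Error", "Warning"):
            continue  # ignore informational entries
        if e["event_id"] in benign_ids:
            continue  # already researched and judged harmless
        counts[(e["source"], e["event_id"])] += 1
    return counts.most_common()


# Hypothetical exported log entries.
events = [
    {"level": "Error", "source": "ExampleApp", "event_id": 1001},
    {"level": "Warning", "source": "ExampleSvc", "event_id": 2002},
    {"level": "Error", "source": "ExampleApp", "event_id": 1001},
    {"level": "Information", "source": "ExampleApp", "event_id": 3003},
    {"level": "Warning", "source": "ExampleNoise", "event_id": 4004},
]

for (source, event_id), count in summarize_events(events, benign_ids={4004}):
    print(source, event_id, count)
```

The events that repeat most often float to the top, which gives you a short list to research first.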
Every troubleshooting session usually boils down to finding pieces of a puzzle. The more issues you can rule out, the more likely you are to find the root of the problem. There are many great resources for locating information on just about any VMware problem imaginable.
The rule of thumb is, "If you are running into a glitch, there are many others who have already run into this and fixed the problem". Search for solutions using community sites where VMware users congregate. Become a member yourself and contribute when you see a user who is experiencing a problem you know how to fix. Best of luck on your VMware journey.