Why VPNs Suck and How to Fix Them
It is the middle of the day and you are working on a difficult issue that requires your full attention. A user contacts you because they cannot get on the VPN. Maybe it's a password issue, or they are at a location that blocks VPNs. Perhaps you got lucky and Tier 1 fielded it, only for your luck to run out when they escalated it to you.
VPN-related issues are never at the top of our to-do lists because VPNs should work, at least until they don't. Here's a look at why VPNs are frustrating at times — and how you can help them work better.
Lack of Protocol Diversity in VPN Implementation
A good Point-to-Site (P2S) or user VPN implementation gives end users a couple of connectivity methods, even if the options are transparent and automatically negotiated. Lacking these options can cause problems. IPsec sometimes has issues with hotel internet services, particularly when they use lower-end or home routers. If you do not have an alternate option, your end users simply will not be able to connect. It does not make IT look good when an end user has to check out of a hotel and find one with Internet that is more compatible with your VPN.
As an example, Palo Alto GlobalProtect and Cisco AnyConnect support both IPsec and SSL VPN. OpenVPN also offers both UDP and TCP options and selects between them transparently by default. The legacy Cisco VPN Client primarily used IPsec but allowed a TCP option.
IPsec is still one of the preferred protocols, particularly for Site-to-Site (S2S) but also for P2S. It operates lower in the OSI model, so it has less overhead. It is a standardized protocol, so it interoperates between vendors. That said, it often has difficulties with NAT and DS-Lite (Dual-Stack Lite, an IPv6 transition mechanism). Those issues range from poor performance to not working at all.
On the other hand, SSL VPNs tend to work better with NAT since stateful inspection devices like firewalls understand the traffic and have for some time. They perceive the traffic as being the same as any other TLS traffic and typically do not interfere.
The Fix: Have a Few Options
Ensure your solution provides a few options of VPN protocols when applicable. In the days of the legacy Cisco VPN client, this involved opening up a TCP port and manually configuring a connection in the client. Today, many VPN solutions automatically negotiate a couple of protocols and it is just a matter of ensuring the appropriate ports are opened.
Many organizations have wide-open ACLs to their VPN termination point, which should cover any necessary ports. If you do restrict access to that IP, ensure that the right ports and protocols are allowed to connect. For IPsec, allow at least UDP/500 (IKE), UDP/4500 (NAT traversal, needed when clients sit behind NAT), IP protocol 50 (ESP) and, if you use it, IP protocol 51 (AH). IP protocol numbers are often confused with TCP/UDP port numbers: ESP is protocol 50, not port 50.
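One way to keep port numbers and IP protocol numbers apart is to write the required rules down as data and compare an ACL against them. The following is a minimal sketch, not a vendor tool: the `Rule` type and `missing_rules` helper are hypothetical names, and the rule list assumes standard IPsec (UDP/500 for IKE, UDP/4500 for NAT traversal, IP protocols 50 and 51) rather than any particular product's requirements.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Rule:
    """One allowed flow: an IANA IP protocol number plus an optional port."""
    ip_protocol: int            # 6 = TCP, 17 = UDP, 50 = ESP, 51 = AH
    port: Optional[int] = None  # only meaningful for TCP and UDP


# Assumed baseline for IPsec; check your vendor's documentation.
IPSEC_REQUIRED = {
    "IKE": Rule(ip_protocol=17, port=500),
    "NAT-T": Rule(ip_protocol=17, port=4500),
    "ESP": Rule(ip_protocol=50),
    "AH": Rule(ip_protocol=51),
}


def missing_rules(allowed: Iterable[Rule]) -> List[str]:
    """Return the names of required IPsec flows that the ACL does not allow."""
    allowed_set = set(allowed)
    return [name for name, rule in IPSEC_REQUIRED.items() if rule not in allowed_set]


# A common mistake: opening UDP ports 50 and 51 instead of IP protocols 50 and 51.
mistaken_acl = [Rule(17, 500), Rule(17, 4500), Rule(17, 50), Rule(17, 51)]
print(missing_rules(mistaken_acl))  # ['ESP', 'AH']
```

The mistaken ACL looks complete at a glance, but ESP and AH are still blocked because they are not UDP traffic at all.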
Read your vendor's documentation on the ports and protocols it uses. Also, manually test each allowed protocol to ensure it works. If your VPN solution automatically fails over between protocols, you may need a packet capture (for example, with Wireshark) to validate which one a connection actually used.
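For the TCP-based options, a quick reachability check can be scripted. This is a minimal sketch using only the Python standard library; the function name `tcp_port_open` is hypothetical. It only proves that a TCP handshake completes. It says nothing about UDP/500, UDP/4500 or ESP, which is why a packet capture is still needed for IPsec.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A call such as `tcp_port_open("vpn.example.com", 443)` would test a TLS-based VPN listener, assuming it is on TCP/443; the hostname and the port are placeholders for whatever your vendor documents.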
Packet Sizing and Encapsulation over VPNs
One set of issues a network administrator sometimes runs into on new VPN deployments relates to Maximum Transmission Unit (MTU) and Maximum Segment Size (MSS). By default the MTU and MSS are usually set appropriately and can accommodate VPN tunnels. The MTU is the largest packet an interface will carry and is typically 1500 bytes on Ethernet. The MSS is the largest TCP payload per segment, so it excludes the IP and TCP headers, which are each 20 bytes for IPv4 (without options). To match a 1500-byte MTU, the MSS would traditionally be set to 1460 bytes.
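The arithmetic behind the 1460 figure is simple enough to write out. A minimal sketch, assuming IPv4 and TCP headers with no options (the function name `mss_for_mtu` is hypothetical):

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


print(mss_for_mtu(1500))  # 1460
```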
Because we are encapsulating IP packets into an IPsec packet, they could exceed that 1500 byte limit due to the extra IPsec header. To account for the overhead and various scenarios, Cisco ASA defaults to an MSS of 1380. Other vendors allow you to set this per interface such as a VPN interface, or are smart enough to do this dynamically to IPsec traffic or other encapsulated traffic.
When this value is set too low, such as 1280 for IPv4 traffic, more packets are sent. For example, 1300 bytes of TCP payload now takes two packets, whereas an MSS of 1380 would have kept it in one. If you set it too high, packets that exceed the MTU once encapsulated will be fragmented or dropped. Researching the MTU and MSS best practices for the vendor and platform you use is always recommended when implementing a VPN solution or taking over administration of one.
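The cost of an MSS that is too low can be counted directly. This is a small sketch of the packet-count arithmetic only (the function name `segments_needed` is hypothetical); it ignores retransmissions and header overhead per packet.

```python
def segments_needed(payload_bytes: int, mss: int) -> int:
    """Number of TCP segments needed to carry payload_bytes at a given MSS."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return (payload_bytes + mss - 1) // mss  # integer ceiling division


print(segments_needed(1300, 1280))       # 2
print(segments_needed(1300, 1380))       # 1
print(segments_needed(1_000_000, 1280))  # 782
print(segments_needed(1_000_000, 1380))  # 725
```

The single 1300-byte case doubles, and a 1 MB transfer needs roughly 8% more packets at an MSS of 1280 than at 1380.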
The Fix: Use Best Practices
Start with the best practices for your VPN or firewall product. Cisco ASA defaults to a TCP MSS of 1380 for IPv4, which is usually the best option assuming a 1500-byte MTU. Palo Alto and other newer-generation solutions may do this automatically or may just require the tunnel interface MTU to be set to 1460.
What we want to avoid is incorrectly disabling this setting or setting it to an extremely low number such as 1280. At the same time, ensure it is not set too high, such as 1460, when the tunnel needs room for overhead and the device does not automatically clamp the MSS.
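A rough sanity check for a configured MSS might look like the sketch below. Everything in it is an assumption for illustration: `check_mss` is a hypothetical helper, the 80-byte `tunnel_overhead` is simply what a 1380 default implies on a 1500-byte MTU (1500 - 40 - 1380), and the 50-byte `slack` is an arbitrary threshold for calling a value needlessly low. Real overhead depends on the cipher, mode and encapsulation, so use your vendor's figures.

```python
IPV4_TCP_HEADER_BYTES = 40  # 20-byte IPv4 header + 20-byte TCP header, no options


def check_mss(mss: int, mtu: int = 1500, tunnel_overhead: int = 80, slack: int = 50) -> str:
    """Classify a configured MSS as 'too high', 'too low' or 'ok'."""
    largest_safe = mtu - IPV4_TCP_HEADER_BYTES - tunnel_overhead
    if mss > largest_safe:
        return "too high"  # encapsulated packets may exceed the MTU
    if mss < largest_safe - slack:
        return "too low"   # wastes capacity by sending more, smaller packets
    return "ok"


for value in (1280, 1380, 1460):
    print(value, check_mss(value))
# 1280 too low
# 1380 ok
# 1460 too high
```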
VPNs vs Business Design Decisions
Some of the biggest VPN issues stem from business decisions. One example is where to put VPN termination points, which is often tied to budgetary constraints. If you have a single data center, the primary location typically chooses itself.
Do you have a Disaster Recovery (DR) site? Is there a VPN termination point there, or was there no budget for it? Do you have users in another country who must VPN into a termination point halfway across the globe in your data center?
The Fix: Point of Presence
Ideally, we want to provide users with a VPN termination point that is geographically close to them, sometimes called a Point of Presence (POP), because retries and latency can plague VPN protocols and the underlying encapsulated traffic. A POP does require some sort of backhaul to the data center, usually via private circuit. Traffic can be backhauled via IPsec S2S, but measures should be taken, such as ensuring the POP is well peered with the other end of the S2S tunnel. The ability to fail over between carriers when peering or other latency issues arise is recommended.
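When deciding which POP a group of users should be pointed at, measured latency is a more reliable guide than a map. The sketch below assumes you have already collected round-trip samples in milliseconds from a user location to each candidate; the function name `pick_pop` and the POP names are hypothetical. It uses the median so that one slow sample does not skew the choice.

```python
from statistics import median
from typing import Dict, List


def pick_pop(samples_ms: Dict[str, List[float]]) -> str:
    """Return the POP with the lowest median round-trip time."""
    # Ignore POPs that have no measurements yet.
    measured = {pop: rtts for pop, rtts in samples_ms.items() if rtts}
    if not measured:
        raise ValueError("no latency samples available")
    return min(measured, key=lambda pop: median(measured[pop]))


# Illustrative numbers only, not real measurements.
samples = {
    "pop-us-east": [180.0, 175.0, 190.0],
    "pop-eu-west": [25.0, 30.0, 400.0],  # one outlier from a retry
}
print(pick_pop(samples))  # pop-eu-west
```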
If you have a cold DR site, try to make the VPN solution active so that users can use it as a backup. This allows maintenance on the primary VPN solution with users still able to connect when necessary. It also provides a great test bed for applying updates first.
VPN Training and Self-Servicing
Password issues should not exist, right? There is nothing worse than a constant influx of support requests that turn out to be password issues, and password expirations compound that. End users often get vague messages that look like a network issue but turn out to be a bad password, an expired password or a locked-out account.
They cannot tell the difference because of the limited feedback the VPN client gives them. Many times the user has been provisioned a laptop with the VPN client already on it. Other times, they are simply sent brief instructions on how to install and use it, and nothing more.
The Fix: Documentation
Documentation should be provided for the user to help them troubleshoot their issue. Power users and road warriors are usually fairly receptive to documentation because their schedules typically do not give them much time to set aside to work with IT support. Such documentation should include how to install/reinstall and reconfigure the VPN software.
Good documentation also includes some common errors and how to correct them. When working offline with users via email or chat, reference your documentation to help make them aware of it. When working directly with them, summarize and point to snippets or quotes of it instead of just sending them a link and telling them to read it. This goes a long way to making them aware of the documentation and that it is helpful. They are more likely to read it first the next time.
Implement a self-service portal. Self-service portals are an ideal way to help mitigate password issues. They can take a proactive approach to expiring passwords by warning the user ahead of time. Ensure they have enough notice to do something about it though. Ideally, this would be more than seven days in advance to account for users on vacation when the notices go out.
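The proactive warning is essentially a date comparison. Here is a minimal sketch, assuming you can export each user's password expiration date from your directory; the function name `users_to_warn`, the usernames and the 14-day default are all assumptions chosen to sit comfortably above the seven-day minimum suggested above.

```python
from datetime import date, timedelta
from typing import Dict, List


def users_to_warn(expirations: Dict[str, date], today: date, notice_days: int = 14) -> List[str]:
    """Return users whose passwords expire within the notice window."""
    cutoff = today + timedelta(days=notice_days)
    # Already-expired accounts are excluded; they need a reset, not a warning.
    return sorted(user for user, expires in expirations.items() if today <= expires <= cutoff)


# Illustrative data only.
expirations = {
    "alice": date(2024, 3, 10),  # inside the window
    "bob": date(2024, 4, 30),    # far in the future
    "carol": date(2024, 2, 20),  # already expired
}
print(users_to_warn(expirations, today=date(2024, 3, 1)))  # ['alice']
```

Running this daily, rather than once at the start of the window, gives users returning from vacation more than one chance to see the notice.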
Often, VPNs work fine, but if your environment has recurring issues, consider some of the above cases and solutions. Sometimes, it is just a perception and end-user training issue. Other times, there are minor configuration issues that need to be tweaked. The more streamlined your VPN services are, the more time you will have to focus on more mission-critical tasks.