The objective of stress testing is to:
determine whether a system can recover from an excessive demand on its resources;
determine whether a system can continue operating with decreasing resources.
Excessive demand can occur when too many users access the Citrix server, and this can lead to user sessions freezing. The server may also stop responding; after a while it may recover when some resources become available again, or it may need rebooting.
From a stress testing perspective, it could be argued that the point at which the server stops responding under increased load is the stress point of the server. However, the server may start responding again when resources become available, so is this the stress limit? Or is it when the server doesn't recover and needs a reboot? What if it does need a reboot and works fine once rebooted?
Stress testing isn't just about hitting the maximum and watching the system break completely; it's about ensuring that the limit reached can be recovered from.
"Well, we put 80 users on the server and it crashed. We had to rebuild the server because it wouldn't reboot. So 80 users is the stress level for the server."
Such a sentiment doesn't describe stress testing; instead it advocates testing to destruction. When I've done stress testing, I've taken server loads up to limits where users would have their sessions freeze as the server itself became unresponsive. Eventually the server recovered as some user sessions died, freeing up resources.
This is the stress limit of the server. Pushing the load higher is just testing to destruction, for example to the point where the server has to be rebooted to recover.
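The distinction between a recoverable limit and testing to destruction can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the LoadStep record, the stress_limit function and the 60 and 70 user figures are hypothetical and do not come from any Citrix tool or real test run. It takes the stress limit to be the highest load observed before the first step the server could not recover from without a reboot.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadStep:
    """One step of a ramp-up test (hypothetical record, for illustration only)."""
    users: int                 # concurrent user sessions applied in this step
    became_unresponsive: bool  # did sessions freeze or the server stop responding?
    recovered_unaided: bool    # did the server come back without a reboot?


def stress_limit(steps: List[LoadStep]) -> Optional[int]:
    """Return the highest load seen before the first unrecoverable step.

    Returns None if even the lowest load step was unrecoverable.
    """
    limit = None
    for step in sorted(steps, key=lambda s: s.users):
        if step.became_unresponsive and not step.recovered_unaided:
            break  # anything from here on is testing to destruction
        limit = step.users
    return limit


# Made-up observations: the server froze at 70 users but recovered on its own,
# and needed intervention at 80 users.
observations = [
    LoadStep(users=60, became_unresponsive=False, recovered_unaided=True),
    LoadStep(users=70, became_unresponsive=True, recovered_unaided=True),
    LoadStep(users=80, became_unresponsive=True, recovered_unaided=False),
]
print(stress_limit(observations))  # 70
```

With these made-up observations the stress limit is 70 users, not the 80 at which the server had to be rebooted or rebuilt.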
I use the following analogy to illustrate stress testing. A particular SUV (Sport Utility Vehicle) underwent a series of tests, one of which was used to determine its maximum speed. The SUV manufacturer was aware that there could be a point where the engine just wouldn't be able to cope and could explode.
The stress limit wouldn't be when the engine exploded, as there would be no way to recover from this: a blown engine would need to be rebuilt or replaced. The stress limit would therefore be when the engine was nearing its top speed, just before it blew up, since at this speed the engine is still working but under incredible strain.
During testing they found that the tyres exploded at 145 mph and the vehicle became undriveable, as it had no tyres. The tyre blowout was also incredibly dangerous, and an inexperienced driver could easily have lost control and had an accident.
Because the SUV manufacturer was now aware that the limit of their SUV was the point at which its tyres exploded, not the engine, they decided the stress limit for this vehicle was just below 145 mph, possibly 144 mph. But to be on the safe side, speed inhibitors were put in place at a lower speed of 135 mph, so the vehicle could never exceed 135 mph and therefore never reach the speed at which the tyres exploded.
The SUV manufacturer built a safety margin into their SUV design, which was derived from stress testing. The same needs to be done for Citrix environments: a limit should be set on user loads, and that limit can only be determined by stress testing.
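As a rough sketch of the same idea applied to user loads, the cap can be set a fixed percentage below the measured stress limit, much as the speed inhibitor sits below the speed at which the tyres fail. The session_cap function, the 10 percent margin and the 70 user figure are my own assumptions for illustration; the right margin depends on the environment and is not something this example can decide.

```python
def session_cap(stress_limit: int, margin_percent: int) -> int:
    """Return a user-session cap set margin_percent below the stress limit.

    Integer arithmetic is used so the result always rounds down,
    keeping the cap on the safe side of the limit.
    """
    if stress_limit < 0:
        raise ValueError("stress_limit must not be negative")
    if not 0 <= margin_percent <= 100:
        raise ValueError("margin_percent must be between 0 and 100")
    return stress_limit * (100 - margin_percent) // 100


# Hypothetical figures: a measured stress limit of 70 users, 10 percent margin.
print(session_cap(70, 10))  # 63
```

The cap produced this way is only as good as the stress test behind it, which is why the limit has to be measured rather than guessed.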
When I'm asked why we need to do stress testing, I advise that the main reason is to ensure you know your boundaries, that is, the load you can work to without compromising the requirement for a usable system.
With thin client technology you put all your eggs in one basket, where the basket is the server and the eggs are the users. So you need to be pretty sure the basket doesn't break and smash all your eggs.
The most important part of stress testing is to simulate the extreme conditions that could cause the system to become unresponsive, with a view to ensuring that the system can recover gracefully from them.
If a system cannot recover from the maximum demand on its resources, then measures need to be taken to ensure that this maximum cannot be reached.
It's important to understand that Citrix stress testing is not about breaking a system but about recoverability.