The first and the second articles of this series compare some performance details between JSF and Wicket. This third article will wrap it up with an updated comparison between JSF (MyFaces and Mojarra), Wicket, Tapestry, Spring MVC and Grails 2. It references some of the tests from the first two articles.

Choosing a web framework is a challenging task. There are several aspects to take into consideration, like usability, scalability, availability of documentation and much more. There is no "perfect" web framework out there, but instead, there are different web frameworks that try to achieve the best balance between what each one considers more or less important. For example:

  • Some web frameworks like JSF, Wicket or Tapestry provide built-in Ajax support, aiming to minimize or avoid writing extra JavaScript code. On the other hand, frameworks like Spring MVC or Grails 2 promote writing Ajax responses using Java or Groovy on the server side, and then they use those responses on the client side with JavaScript frameworks like jQuery.
  • There are different ways to deal with the state problem. With JSF or Wicket it is possible to associate state with components: you can create stateful and stateless components. Tapestry, on the other hand, is also a component-based framework, but all its components are stateless. Action-based web frameworks like Spring MVC and Grails 2 do not have such concepts, so all their pages can be considered stateless too.

Each of these design choices involves a trade-off from the performance perspective:

  • Built-in Ajax support imposes an overhead from the performance perspective but it makes it easier and faster to write rich client applications.
  • Automatic state management has a cost, because the page state has to be saved and restored when necessary, but it makes it easier to write reusable and dynamic components because the developer does not need to worry about storing the state in the session manually.

Web frameworks like JSF, Wicket or Tapestry are more directly comparable, because all of them provide the same feature set (built-in Ajax support, validation, reusable components, and so on) in their own fashion. The comparison becomes more difficult with action-based web frameworks like Spring MVC and Grails 2, because the fundamental concepts behind their design are simply different. Still, a reference for how each framework performs is important, in the same way that the price or expiration date of groceries helps people decide which food to buy. There are different types of apples, but an apple will never be a peach, even if both are delicious.

A look at MyFaces improvements

To begin, it is worth reviewing what has happened with MyFaces Core from the performance perspective over the last few years (2011-2013). To make this clear, the following experiment was run.

CONFIGURATION

Processor: AMD Phenom x4
Speed: 2300 MHz
Server: Apache Tomcat 7.0.37
JVM version: oracle jdk1.6.0_30
JVM Options: -Xms128m -Xmx128m -server

WARMUP

loop.count: 200
thread.count: 40
include.logout: 1
include.delete: 1
booking.count: 1
include.ajax: 1
thread.delay: 20
thread.deviation: 10
rampup time: 7s
Http Request Client: Httpclient4

EXPERIMENT

loop.count: 100
thread.count: 40
include.logout: 1
include.delete: 1
booking.count: 1
include.ajax: 1
thread.delay: 0
thread.deviation: 0
rampup time: 7s
Http Request Client: Java

The experiment creates 40 threads and each one of them runs 100 loops of the operations described in the test in the first article (get login, post login, ajax post search, get view hotel, post book hotel, ajax post cc number, ajax post cc name, post booking details, post confirm booking, get cancel booking, logout). There is no delay between loops.
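The loop structure just described (N threads, each running the same scripted iteration M times with no delay) can be sketched as a tiny Java load driver. This is a hypothetical illustration, not the tool that produced the numbers in this article: the class name LoadDriver is an assumption, and the Runnable stands in for one full iteration of HTTP requests. A real test plan also needs HTTP session handling, ramp-up and a warmup phase. The sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical load driver: threadCount workers each run the same scripted
 * iteration loopCount times, with no delay between loops.
 */
final class LoadDriver {

    /** Per-iteration latencies in milliseconds plus overall throughput. */
    static final class Result {
        final List<Double> latenciesMs;
        final double iterationsPerSecond;

        Result(List<Double> latenciesMs, double iterationsPerSecond) {
            this.latenciesMs = latenciesMs;
            this.iterationsPerSecond = iterationsPerSecond;
        }
    }

    static Result run(int threadCount, int loopCount, Runnable iteration) {
        ConcurrentLinkedQueue<Double> samples = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threadCount; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < loopCount; i++) {
                    long begin = System.nanoTime();
                    iteration.run(); // stands in for login, search, book, confirm, cancel, logout
                    samples.add((System.nanoTime() - begin) / 1_000_000.0);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1_000_000_000.0;
        List<Double> latencies = new ArrayList<>(samples);
        return new Result(Collections.unmodifiableList(latencies), latencies.size() / elapsedSeconds);
    }
}
```

For example, LoadDriver.run(40, 100, iteration) mirrors the 40 threads and 100 loops used here; latenciesMs can feed a percentile calculation, and iterationsPerSecond plays the role of the throughput metric shown in the graphs.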


On the left: 90% line of Response Time for Booking Application. On the right: Throughput (Iterations per Second) for Booking Application

It is clear that MyFaces has improved significantly over time. Please note that other web framework comparisons published elsewhere used earlier versions, and MyFaces has improved a lot since those comparisons were done.

It is also worth mentioning the impact of the following performance flags on MyFaces:

  • org.apache.myfaces.CACHE_EL_EXPRESSIONS  --> always
  • org.apache.myfaces.CHECK_ID_PRODUCTION_MODE  --> false
  • org.apache.myfaces.SUPPORT_JSP_AND_FACES_EL --> false
  • org.apache.myfaces.VIEW_UNIQUE_IDS_CACHE_ENABLED --> true
  • Enable html compression in faces-config.xml using oam-compress-spaces.

To see the difference, the same test was run with all flags enabled and without them.


On the left: 90% line of Response Time for Booking Application. On the right: Throughput (Iterations per Second) for Booking Application

Speed Benchmark

In the following experiments, we review the speed of each web framework under the same conditions. It is important to remember that the point is not which framework has the best response time. It is more interesting to check how well each framework deals with effects like concurrency, and to get an idea of the overhead of using a web framework compared with the fastest possible solution in Java.

Code Speed with Load

This is the same experiment as in the first article, only this time with all the frameworks:

CONFIGURATION

Processor: AMD Phenom x4
Speed: 2300 MHz
Server: Apache Tomcat 7.0.37
JVM version: oracle jdk1.6.0_30
JVM Options: -Xms128m -Xmx128m -server

WARMUP

loop.count: 200
thread.count: 40
include.logout: 1
include.delete: 1
booking.count: 1
include.ajax: 1
thread.delay: 20
thread.deviation: 10
rampup time: 7s
Http Request Client: Httpclient4

EXPERIMENT

loop.count: 100
thread.count: 40
include.logout: 1
include.delete: 1
booking.count: 1
include.ajax: 1
thread.delay: 0
thread.deviation: 0
rampup time: 7s
Http Request Client: Java

These are the results:


90% line of Response Time for Booking Application

The graph shows that the application written with plain Servlet-JSP has the lowest 90% line response time. The reason is that the Servlet-JSP solution does not use ThreadLocal variables, synchronized blocks or volatile variables (memory barriers). In other words, Servlet-JSP has the lowest 90% line because it is not subject to the concurrency effect.
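Because the discussion relies on the median and the 90% line rather than the average, a small sketch may help to show what these numbers measure. It uses the nearest-rank definition of a percentile (the smallest sample such that at least p percent of all samples are less than or equal to it); the load-testing tool may interpolate slightly differently, and the class name ResponseTimeStats is hypothetical.

```java
import java.util.Arrays;

/** Summary statistics over response-time samples in milliseconds. */
final class ResponseTimeStats {

    /**
     * Nearest-rank percentile: the smallest sample such that at least p percent
     * of all samples are less than or equal to it. p = 90 gives the "90% line".
     */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    /** Arithmetic mean of the samples. */
    static double average(long[] samplesMs) {
        return Arrays.stream(samplesMs).average()
                .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }
}
```

For the samples {2, 2, 3, 2, 4, 2, 3, 2, 2, 40} ms, percentile(samples, 50) is 2, percentile(samples, 90) is 4 and average(samples) is 6.2: a single slow outlier pulls the average up, while the median and the 90% line stay low and describe how the bulk of the responses is dispersed.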

A measure of throughput helps to understand the consequences of the concurrency effect. If one iteration corresponds to running all these operations (get login, post login, ajax post search, get view hotel, post book hotel, ajax post cc number, ajax post cc name, post booking details, post confirm booking, get cancel booking, logout) just once, we get the following graph (higher is better):


Throughput (Iterations per Second) for Booking Application

Here are the details of the graph:

Web framework | Average response time [ms] | Median [ms] | 90% line [ms] | Throughput [iterations per second] | Average response time per iteration using Little's Law [ms]
Servlet-JSP | 2 | 2 | 4 | 211.15 | 4.74
Spring MVC-JSP | 8 | 4 | 20 | 196.67 | 5.08
MyFaces 2.1.11 | 14 | 6 | 36 | 158.05 | 6.33
Wicket 1.4.22 (session storage) | 15 | 5 | 29 | 142.37 | 7.02
Spring MVC-Thymeleaf | 21 | 9 | 57 | 119.8 | 8.35
Mojarra 2.1.20 | 28 | 11 | 79 | 98.36 | 10.17
Tapestry | 32 | 16 | 85 | 96.05 | 10.41
Wicket 6.6.0 (session storage) | 29 | 10 | 76 | 88.04 | 11.36
Grails 2.2.1 (GORM) | 62 | 41 | 140 | 50.82 | 19.68
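The values in the last column can be reproduced from the throughput column: each one equals 1000 ms divided by the throughput X, which is Little's Law W = L / λ with λ = X and L = 1. This reading is inferred from the numbers (it holds for every row) rather than stated with the original measurements.

```latex
% X = throughput in iterations per second, W = average time per iteration
% Little's Law W = L / \lambda, assuming L = 1 and \lambda = X
W = \frac{1}{X}\ \text{s} = \frac{1000}{X}\ \text{ms}
\\
\text{Servlet-JSP: } \frac{1000}{211.15} \approx 4.74\ \text{ms}, \quad
\text{MyFaces 2.1.11: } \frac{1000}{158.05} \approx 6.33\ \text{ms}, \quad
\text{Grails 2.2.1: } \frac{1000}{50.82} \approx 19.68\ \text{ms}
```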

 

The difference in throughput between Spring MVC-JSP (196.67) and plain Servlet-JSP (211.15) is small, which means the concurrency effect does not significantly affect the system's ability to process load, but it does affect the response times.

One way to see this is to imagine that Servlet-JSP has fewer points where control can switch from one thread to another, so once a task starts, its thread does not lose control until the task finishes. If these switches happen very quickly, the net effect is roughly the same throughput but a wider dispersion of response times. Note that the difference between the 90% lines of Servlet-JSP and Spring MVC-JSP (4 ms vs 20 ms) is not proportional to the difference between their medians (2 ms vs 4 ms).

Even though Wicket 1.4 has a better 90% line response time, MyFaces is able to handle more requests per second. Again, the reason is the concurrency effect: the way the Wicket application is written helps to avoid ThreadLocal calls. But in terms of scalability, a higher throughput with a good response time is preferable.
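The ThreadLocal calls mentioned here typically come from context lookups: in JSF, FacesContext.getCurrentInstance() finds the current request's context through a ThreadLocal. The following sketch shows that general pattern with a hypothetical RequestContext class; it is not the MyFaces or Wicket code, only an illustration of the per-request set/get/remove cycle that such lookups involve.

```java
/**
 * Hypothetical per-request context holder, similar in spirit to how
 * FacesContext.getCurrentInstance() locates the current request's context.
 */
final class RequestContext {
    private static final ThreadLocal<RequestContext> CURRENT = new ThreadLocal<>();

    private final String requestId;

    private RequestContext(String requestId) {
        this.requestId = requestId;
    }

    String requestId() {
        return requestId;
    }

    /** Every lookup is a ThreadLocal read on the calling thread. */
    static RequestContext current() {
        return CURRENT.get();
    }

    /** Runs a request handler with a context bound to the current thread. */
    static void handle(String requestId, Runnable handler) {
        CURRENT.set(new RequestContext(requestId));
        try {
            handler.run();
        } finally {
            // Always unbind: pooled server threads would otherwise leak the context
            CURRENT.remove();
        }
    }
}
```

Every call to current() is a ThreadLocal read, and handle() adds a set and a remove per request. Individually these operations are cheap, but they are the kind of extra work that the plain Servlet-JSP version does not do.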

Spring MVC is a lot faster than Grails. With Grails, the developer uses Groovy as the language, Groovy Server Pages (GSP) and Grails Object Relational Mapping (GORM), but conceptually it is still an action-based web framework like Spring MVC, so from the developer's point of view, the only fundamental difference is that the Grails syntax is more compact.

There is a big difference between using JSP and Thymeleaf as the view layer for Spring MVC. Spring MVC-JSP will always be the fastest combination, because a compiled JSP is just a servlet that writes to the response. Any other template engine will give lower performance, but pages will be easier for developers to write and maintain.
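To make the point about compiled JSPs concrete, here is a heavily simplified, hand-written approximation of what a generated JSP servlet boils down to for a one-line template. Real generated code (for example from Tomcat's Jasper compiler) also sets up the page context, buffering and EL evaluation; the HotelItemTemplate class and its template are hypothetical. The point is that template text turns into plain write calls, with no template model to interpret at request time.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

/*
 * Simplified idea of the servlet code generated for the hypothetical template
 *   <li>${hotel.name} (${hotel.city})</li>
 * Template text becomes literal writes; each expression becomes a write of its value.
 */
final class HotelItemTemplate {

    static void render(Writer out, String name, String city) throws IOException {
        out.write("<li>");
        out.write(name);      // ${hotel.name} (plain EL output is not HTML-escaped)
        out.write(" (");
        out.write(city);      // ${hotel.city}
        out.write(")</li>\n");
    }

    /** Convenience wrapper that renders into a String. */
    static String renderToString(String name, String city) {
        StringWriter out = new StringWriter();
        try {
            render(out, name, city);
        } catch (IOException e) {
            throw new IllegalStateException("StringWriter should not throw", e);
        }
        return out.toString();
    }
}
```

A template engine that builds and walks its own representation of the page at request time does more work per response than these direct writes, which is the overhead the paragraph above refers to.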

In the group of component-based web frameworks, MyFaces does the best job, with a throughput only about 20% lower than Spring MVC-JSP (158.05 vs 196.67 iterations per second), even though JSF includes a full template engine, Facelets. Also, Spring MVC-JSP is quite different from JSF. The additional concepts JSF provides, like the lifecycle, the stateful component tree with partial state saving, reuse through Facelets templates and composite components, or built-in Ajax support, really pay off, because the overhead they impose on the application is minimal compared with the benefits they provide.

Tapestry, Wicket and Mojarra have more or less the same throughput. The interesting part is that Tapestry provides a static component tree, which means that once created it cannot be modified at runtime, following the principle "static structure, dynamic behavior." This means it is the developers' responsibility to code the "dynamic behavior" and store the associated state in the beans. In theory it should perform better, but it does not, because in practice other factors affect performance even more.

Unfortunately, based on this information, it is not possible to conclude anything about the effect of a static component tree on performance, or about the relative weight of a stateful component tree compared with a stateless one. An in-depth test would be needed to uncover the truth.

JSF Stateless Views and View Pooling

Over the last years, MyFaces Core has improved its partial state saving algorithm and its Facelets engine, making the component tree lighter and improving memory management. It therefore makes sense to take MyFaces Core as a base and implement two prototypes:

  • MyFaces 2.1.11 + Stateless: partially implement the JSF 2.2 stateless mode by adding a "transient" attribute to the f:view tag; if the attribute is set, the state is not calculated at all and null is returned.
  • MyFaces 2.1.11 + View Pool: instead of creating a new view on each request, reuse views from a pool. A "view pool" is a space that stores views that have already been discarded, so they can be used in upcoming requests. The idea is to reuse the existing state saving logic and implement an algorithm that can reuse a view once it has been discarded. Note that this assumes some important changes that conflict with the JSF spec, and the improvement requires some additional parameters to be set.

See the results below:


90% line of Response Time for Booking Application

Here is another way of looking at it.


Throughput (Iterations per Second) for Booking Application

Keeping all views stateless doesn't make a big difference. The reason is that MyFaces can calculate and restore the state very quickly, thanks to its partial state saving algorithm. The view pool idea looks better than making the views stateless, because it reduces the number of objects that need to be created on each request. But note that the Java garbage collector is still fast enough that the difference between the base implementation and the prototype is only about 8%.
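The two prototypes can be sketched in a few lines to show where the savings come from. This is not MyFaces code: View, ViewPrototypes and ViewPool are hypothetical names, the state map stands in for partial state saving, and the hard part of a real view pool (returning a used component tree to a clean state) is reduced to a single reset() call.

```java
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Supplier;

/** Hypothetical stand-in for a JSF component tree (not the MyFaces UIViewRoot). */
final class View {
    final String viewId;
    final boolean transientView;                        // like <f:view transient="true">
    final Map<String, Object> componentState = new HashMap<>();

    View(String viewId, boolean transientView) {
        this.viewId = viewId;
        this.transientView = transientView;
    }

    /** Drops per-request state so the tree can serve another request. */
    void reset() {
        componentState.clear();
    }
}

final class ViewPrototypes {

    /** Stateless prototype: transient views skip state calculation entirely. */
    static Map<String, Object> saveState(View view) {
        if (view.transientView) {
            return null;                                // nothing to keep between requests
        }
        return new HashMap<>(view.componentState);     // stand-in for partial state saving
    }

    /** View pool prototype: hand discarded trees to later requests instead of rebuilding. */
    static final class ViewPool {
        private final Map<String, Deque<View>> idle = new ConcurrentHashMap<>();
        private final int maxPerView;

        ViewPool(int maxPerView) {
            this.maxPerView = maxPerView;
        }

        View acquire(String viewId, Supplier<View> builder) {
            Deque<View> views = idle.get(viewId);
            View pooled = (views == null) ? null : views.pollFirst();
            return (pooled != null) ? pooled : builder.get(); // build only on a pool miss
        }

        void release(View view) {
            view.reset();
            Deque<View> views = idle.computeIfAbsent(view.viewId, id -> new ConcurrentLinkedDeque<>());
            // Approximate bound: size() and offerFirst() are not atomic together
            if (views.size() < maxPerView) {
                views.offerFirst(view);
            }
        }
    }
}
```

The stateless variant only skips building the saved state, while the pool skips building the tree itself on a pool hit, which is consistent with the observation that the pool reduces the number of objects created per request.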

Code Speed with increasing load

Now let's see what happens if we run the tests repeatedly, increasing the number of threads each time while keeping enough memory available.

CONFIGURATION

Processor: AMD Phenom x4
Speed: 2300 MHz
Server: Apache Tomcat 7.0.37
JVM version: oracle jdk1.6.0_37 for linux
JVM Options: -Xms512m -Xmx512m -server

WARMUP

loop.count: 200
thread.count: 100
include.logout: 1
include.delete: 1
booking.count: 1
include.ajax: 1
thread.delay: 20
thread.deviation: 10
rampup time: 20s
Http Request Client: Httpclient4

EXPERIMENT

loop.count: 100
thread.count: 10, 20, 30, 40, 80, 120, 160, 200
include.logout: 1
include.delete: 1
booking.count: 1
include.ajax: 1
thread.delay: 0
thread.deviation: 0
rampup time: 1, 2, 3, 4, 6, 8, 12, 16, 20s
Http Request Client: Httpclient4

Each time the tests are run, the following steps happen:

  • Start the Tomcat server
  • Run the warmup
  • Run the experiment with 10 threads and a 1 second ramp-up time, save the result and reset.
  • Run the experiment with 20 threads and a 2 second ramp-up time, save the result and reset.
  • ...
  • Stop the Tomcat server
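The ramp-up times can be read as a schedule for starting threads. Assuming the load tool spreads thread starts evenly over the ramp-up period (as JMeter thread groups do), thread i starts i × R / N after the first one, where N is the thread count and R the ramp-up time. The RampUp class below is a hypothetical helper that computes those offsets.

```java
/** Start offsets for N threads spread evenly over a ramp-up period (hypothetical helper). */
final class RampUp {

    static long[] startOffsetsMillis(int threadCount, long rampUpMillis) {
        if (threadCount <= 0) {
            throw new IllegalArgumentException("threadCount must be positive");
        }
        long[] offsets = new long[threadCount];
        for (int i = 0; i < threadCount; i++) {
            offsets[i] = i * rampUpMillis / threadCount; // thread i starts i*R/N after the first
        }
        return offsets;
    }
}
```

With 10 threads over 1 second and 20 threads over 2 seconds, both steps start a new thread every 100 ms, so the additional threads arrive at the same rate as the load grows.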

Here are the results for the 90% line:


90% line of Response Time vs Number of Threads for Booking Application 100 loop and 512m


Throughput (Iterations per Second) vs Thread Number for Booking Application 100 loop and 512m

The graphs show that the conclusions in parts A, B and C are still valid even if the number of threads is increased.

The most interesting fact is the drop in throughput for Spring MVC. To the user, it looks as if the server suddenly stops for a short time and then resumes the pending operations, several times over. This suggests that the garbage collector takes control, releases memory and then continues. As time passes, the server gets slower and slower. Note that the 90% line response time is not affected, but the throughput is significantly impacted.

Unfortunately, this is evidence of an old, well-known problem related to how Spring works. It is usually reported as "java.lang.OutOfMemoryError: PermGen space" in many different situations, and the common fix is to "increase the memory" or "restart the server every X amount of time". The problem is that Spring continuously creates proxy classes, filling up the permanent generation memory space.

In Java 6, it is theoretically possible to avoid the OutOfMemoryError by adding these JVM flags:

-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled

With these flags, the garbage collector can also examine the permanent generation space and remove unused proxy classes from memory, but according to the tests that were done, this does not make any real difference from the performance perspective (throughput for 200 threads is 122.79 with the flags against 115.6 without them).

It is easy to imagine why this happens. The memory fills up with hundreds of thousands of small proxy classes and objects, and when no more memory is available, the server has to do a titanic job to determine whether any instances of a proxy class are still in use. At first the garbage collector frees small chunks of memory, but memory fragmentation soon becomes evident, so the garbage collector has to do much more work. The GC algorithm was never designed for this scenario. In the end, the CPU spends more time cleaning up memory than doing the real work.
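One way to check whether a server is in this situation is to sample the JVM's standard management beans (java.lang.management) while the load test runs: a rising share of time spent in garbage collection, together with a loaded-class count that keeps growing under constant load, points to the problem described here. The JvmPressureSnapshot class is a hypothetical helper; it has to run inside the server JVM (for example from a diagnostic servlet or a scheduled task), or the same beans can be read remotely over JMX.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Cumulative GC and class-loading counters of the current JVM at one moment (hypothetical helper). */
final class JvmPressureSnapshot {
    final long wallMillis;      // System.currentTimeMillis() at snapshot time
    final long gcCount;         // collections summed over all collectors
    final long gcTimeMillis;    // approximate accumulated collection time
    final long loadedClasses;   // classes currently loaded
    final long unloadedClasses; // classes unloaded since JVM start

    private JvmPressureSnapshot(long wallMillis, long gcCount, long gcTimeMillis,
                                long loadedClasses, long unloadedClasses) {
        this.wallMillis = wallMillis;
        this.gcCount = gcCount;
        this.gcTimeMillis = gcTimeMillis;
        this.loadedClasses = loadedClasses;
        this.unloadedClasses = unloadedClasses;
    }

    static JvmPressureSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector
            count += Math.max(0L, gc.getCollectionCount());
            time += Math.max(0L, gc.getCollectionTime());
        }
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return new JvmPressureSnapshot(System.currentTimeMillis(), count, time,
                classes.getLoadedClassCount(), classes.getUnloadedClassCount());
    }

    /** Approximate fraction of wall-clock time spent in GC since an earlier snapshot. */
    double gcShareSince(JvmPressureSnapshot earlier) {
        long wall = Math.max(1L, wallMillis - earlier.wallMillis); // avoid division by zero
        return (gcTimeMillis - earlier.gcTimeMillis) / (double) wall;
    }

    /** Net change in loaded classes; steady growth under constant load hints at class buildup. */
    long classGrowthSince(JvmPressureSnapshot earlier) {
        return loadedClasses - earlier.loadedClasses;
    }
}
```

Taking a snapshot before and after each step of the increasing-load experiment would show whether throughput drops line up with GC time and class growth, and unloadedClasses shows whether class unloading actually removes anything. The reported collection time is only approximate, especially for concurrent collectors.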

This proxy class buildup is a very serious problem. If you think that using Spring MVC-JSP will give you better performance, think twice, because MyFaces with a CDI implementation like OpenWebBeans will do a better job. Note, however, that the problem in this case is not the use of proxies as such; it lies in an implementation that creates multiple proxy classes for each request.
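The difference between creating proxy objects and generating proxy classes can be illustrated with the JDK's own java.lang.reflect.Proxy: it defines one proxy class per class loader and interface list and returns that same class for later requests, so creating many proxy instances does not add new classes. This does not show how Spring creates its proxies (Spring can also generate CGLIB subclasses); HotelService and ProxyReuseDemo are hypothetical names used only for this example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

/** Hypothetical service interface used only for this illustration. */
interface HotelService {
    String findHotel(String id);
}

final class ProxyReuseDemo {

    /** Creates a new proxy instance; the generated proxy class itself is reused by the JDK. */
    static HotelService newProxy(String prefix) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("findHotel")) {
                return prefix + args[0];
            }
            // equals, hashCode and toString are not supported in this sketch
            throw new UnsupportedOperationException(method.getName());
        };
        return (HotelService) Proxy.newProxyInstance(
                HotelService.class.getClassLoader(),
                new Class<?>[] {HotelService.class},
                handler);
    }
}
```

Two calls to newProxy return different instances that share one class, so creating proxies this way on each request allocates ordinary heap objects rather than new class metadata in the permanent generation.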

Memory Benchmark

It is important to keep memory allocation to a minimum because allocating memory takes time and each call to the garbage collector (GC) is expensive and in the end slows down the web server.

The following experiment was done with a profiler attached:

CONFIGURATION

Processor: AMD Phenom x4
Speed: 2300 MHz
Server: Apache Tomcat 7.0.37
JVM version: oracle jdk1.6.0_30
JVM Options: -Xms128m -Xmx128m -server

WARMUP

loop.count: 20
thread.count: 5
include.logout: 1
include.delete: 1
booking.count: 1
include.ajax: 1
thread.delay: 0
thread.deviation: 0
rampup time: 1s

EXPERIMENT

loop.count: 20
thread.count: 5
include.logout: 1
include.delete: 1
booking.count: 1
include.ajax: 1
thread.delay: 0
thread.deviation: 0
rampup time: 1s

Here are the results:


Memory allocated in kilobytes for a fixed load in Booking Application

There is a correlation between memory usage and code speed, but note that in some cases, like Wicket 6.6.0, it does not necessarily hold. The reason is that the garbage collector can reclaim memory quickly when it is only used locally (short-lived objects). Remember that, in theory, it is possible to improve performance by increasing memory usage, and that is completely reasonable.

Here are the results for the number of objects allocated:


Objects allocated in thousands for a fixed load in Booking Application

In MyFaces, the difference between stateless and normal mode is minimal, because the partial state saving algorithm is very good at identifying the stateful part, and the size of that part is small. The view pool creates about 20% fewer objects, which means the size of the views is relatively small compared with the memory used by the framework in other tasks (managed beans, iterators, string manipulation, rendering ...).

Servlet-JSP uses the least possible amount of memory, but the surprising part is that MyFaces uses 32% less memory than Spring MVC-JSP, and in the best case, with the view pool, 46% less. That means that in some cases, under high load and with little memory, MyFaces will probably perform better than Spring MVC-JSP due to the garbage collector effect.

Session Size

Regarding the session size, it is important to keep in mind that:

  • If the session size is big and sessions are stored only in memory, the available memory can be exhausted, which limits the number of concurrent users of your system.
  • If the session size is big and a persistence solution is used to store session information in a central place or to share it between servers (in a cluster, for example), more CPU is used, because it takes longer to serialize, deserialize and transmit the necessary information across servers.
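To get a feel for the serialization cost mentioned in the second point, the size of a session attribute under standard Java serialization can be measured with ObjectOutputStream. This is a rough sketch: the serialized size is not the same as the retained heap size a profiler reports, and a session store may use its own format. SessionSizeEstimator is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Rough estimate of what a session attribute costs to persist or replicate (hypothetical helper). */
final class SessionSizeEstimator {

    /** Number of bytes produced by standard Java serialization of the attribute. */
    static int serializedSize(Serializable attribute) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attribute);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size(); // stream is flushed on close
    }
}
```

Summing serializedSize over a session's attributes gives a rough idea of how many bytes have to be written and transmitted each time the whole session is persisted or replicated.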

In the end, what's important is to make rational use of this resource. For example, if the session size is about 20 KB and the memory available for sessions is 128 MB, there will be space for 6400 concurrent clients, but if the session size is about 1 MB, there will be space for only 128 concurrent clients. However, if the CPU resources available on the web server can only handle 100 concurrent clients, there will not be any difference.
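The arithmetic in this example can be written down as a small helper. It uses decimal units as the text does (128 MB = 128,000 KB); with binary units the first figure would be about 6553 instead of 6400. SessionCapacity is a hypothetical name.

```java
/** Back-of-the-envelope session capacity, in decimal units as in the text (1 MB = 1000 KB). */
final class SessionCapacity {

    /** Sessions that fit into the memory reserved for them. */
    static long memoryBoundUsers(long sessionMemoryKb, long sessionSizeKb) {
        if (sessionSizeKb <= 0) {
            throw new IllegalArgumentException("sessionSizeKb must be positive");
        }
        return sessionMemoryKb / sessionSizeKb;
    }

    /** Concurrent users actually supported: the tighter of the memory and CPU limits. */
    static long effectiveUsers(long sessionMemoryKb, long sessionSizeKb, long cpuBoundUsers) {
        return Math.min(memoryBoundUsers(sessionMemoryKb, sessionSizeKb), cpuBoundUsers);
    }
}
```

memoryBoundUsers(128_000, 20) gives 6400 and memoryBoundUsers(128_000, 1_000) gives 128, but with a CPU limit of 100 users effectiveUsers returns 100 in both cases: the tighter limit decides.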

The YourKit profiler shows the retained size of an object, that is, the amount of memory that will be freed by the garbage collector when the object is collected. This feature is ideal for measuring the session size: just look at the retained size of org.apache.catalina.session.StandardSession. Here are the results:


Retained size in memory per session

Curiously, web frameworks like Tapestry or Grails have bigger sessions than MyFaces or Mojarra. But what's even more important is that the sizes are of the same order of magnitude. In the Grails case, Spring Web Flow is used, which requires storing the information related to the flow. In the Tapestry case, flash scope variables are used at some points, because the component tree is stateless. In JSF, the component tree is used to store the same information.

The fact that MyFaces has similar values to Tapestry and Spring MVC means its partial state saving algorithm really pays off. There is no need to worry about the session with JSF anymore.

Conclusion

From the performance perspective, it is important to maintain a good balance between different factors like response time, throughput, memory use and so on. MyFaces is an example of how a component based web framework can be implemented properly, providing useful features and maintaining a good performance. The reason why this is possible is that the JSF specification has been carefully designed to balance each one of these aspects.

Thinking that stateless means faster and stateful means slower does not make sense. The performance comparison shows that a stateless web framework can be slower than a stateful one. In the end, it all depends on the implementation. In the JSF case, for example, the code that calculates the state can be implemented in a way that imposes only a small overhead.

An action-based web framework does not necessarily perform faster than a component-based web framework, because the template engine, depending on how it is built, may or may not impose a significant overhead. In the JSF case, an important synergy between Facelets and JSF components has been achieved, leading to better performance.

In any case, web frameworks are tools for building web applications. All developers want to achieve both high scalability and low maintenance costs. For example, an application written using JDBC only will be faster than one using an object-relational mapping tool like Hibernate or OpenJPA, but with JDBC it will be harder and more expensive for developers to maintain the same application. The same dilemma applies to web frameworks.

Finally, a web application is composed of many different layers, so its final performance is determined by the interactions between these layers. It is good to remember that overall performance is limited by the slowest element in the chain; in that sense, choosing a fast web framework may be less important than tuning your business and database layers.