JDBCBench, a Java Database Case Study
We all know that writing high-quality database applications isn’t easy. There are many pitfalls along the way and stumbling into one can mean impaired application performance and scalability. But it’s not only how you write applications that matters.
To get the best performance and scalability, you need to use a database management system that has features such as transaction management and parallelism. A good system should also give you access to prepared statements and reusable statements.
This article describes how we took two database management systems, Mimer SQL and MySQL, and used the open source benchmark JDBCBench, which is based on the TPC-B benchmark, to explore how both programming methods and the choice of database management system can radically affect your application's performance.
What We Started Out With
We ran all of the tests on a 500 MHz Pentium with 128 MB RAM running Windows 98. All figures are in transactions per second.
Note! MySQL/MyISAM does not support transactions. By transaction, in this context, we mean the group of statements that form a TPC-B transaction.
A Sneak Preview of Results
The graphic below compares Mimer SQL with MySQL with transactions (InnoDB), both when we used transactions to encapsulate our database requests and when we embedded the requests in a stored procedure.
As you can see, the use of transactions and stored procedures enables Mimer SQL to scale up to about 600 transactions per second on this particular machine.
But MySQL doesn’t support precompiled SQL statements or stored procedures. Neither is its database kernel designed for transactions, all of which makes it harder for MySQL-based applications to scale.
About the Benchmarks
The TPC-B benchmark implements a bank application. The application keeps track of customers, their bank accounts and their deposits and withdrawals. Besides the account balance, the application must also keep track of the teller balance and the branch office balance. If a customer visits a local bank office and withdraws 1000 Euros from teller no. 1 in that office, the application must deduct 1000 Euros from the customer's account balance, teller no. 1's balance and the branch office balance.
The database consists of four tables, ACCOUNT, TELLER, BRANCH, and HISTORY. The history table logs all deposits and withdrawals.
Each withdrawal and deposit forms one transaction which involves updating the account balance, teller balance, branch balance, writing a log entry to the HISTORY table and querying to determine the resulting customer account balance. In the benchmark, this transaction is performed over and over again.
JDBCBench, available here, was written to simulate a number of clients. Each client uses its own thread, but they all share the same connection to the database. This solution does not perform well as it doesn’t make use of parallelism in a database server. Each operation issued by the client over the single connection is taken care of by a single request thread.
Before we even started making programming improvements, we rewrote JDBCBench to use one connection per thread. This is a more normal situation, where several clients are connected to the server at the same time and make queries simultaneously. If you are interested in the one-connection case, take a look at the 1-user-result in the tables and charts that are presented further down.
There is one thing worth mentioning here. Mimer SQL is designed and built to provide a high level of parallelism. The server work is performed by a number of request threads and a number of background threads; which thread does what depends on which operations are running. The number of threads is configurable, but for most systems the default setting is good enough. In a one-user scenario, we have a single database client somewhere issuing database statements. At the lowest level, these statements are received at the server by one of the request threads.
As you will see in the tables that follow, server throughput in most cases increases significantly when the workload is spread over several connections. There may even be situations where it is worthwhile for an application that wants to execute several operations in parallel to open an additional connection to the server.
Let's Get Started!
After modifying JDBCBench to use multiple connections we ran it. You can find the source code here.
We can see in the graph above that MySQL without transactions (MySQL/MyISAM) outperforms both MySQL with transactions (MySQL/InnoDB) and Mimer SQL, particularly when few users are connected. When the load increases, the difference becomes smaller, and at about 30 users it appears to be an even match between the MySQL versions.
Mimer SQL has problems in this test since this version of JDBCBench never closes its allocated objects. Mimer SQL is designed for tight coupling between client and server, enabling better scalability, but Mimer SQL needs to know when a client is done with a particular object.
Close Resources – and that’s an order ;-)
A common mistake Java programmers make is to rely entirely on garbage collection to close statements and result sets. Even though garbage collection will eventually work, performance and scalability are improved if your application explicitly closes objects when it’s done with them.
The main problem is that each client object consumes resources on the server. A large scrollable result set could use megabytes of memory on the server. It is extremely beneficial for the server to know as soon as possible that it can release such resources. If client objects aren’t closed, they will be closed automatically at the next garbage collect, but that might not happen for a long time. It all depends on the memory allocation strategy of the Java Virtual Machine (JVM). A JVM might not collect the garbage until it runs out of memory, and that could take forever.
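A minimal sketch of the discipline this implies, in modern Java. The `countRows` method is an illustrative example of ours, not benchmark code; try-with-resources guarantees the close calls even when an exception is thrown, and a `closeQuietly` helper covers code paths that manage resources manually:

```java
import java.sql.*;

class ResourceDiscipline {
    // Both the Statement and the ResultSet are closed as soon as the block
    // exits, so the server can release their resources immediately instead
    // of waiting for an eventual garbage collection on the client.
    static int countRows(Connection con, String table) throws SQLException {
        try (Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getInt(1);
        }
    }

    // For manually managed resources: close without masking a prior error.
    static void closeQuietly(AutoCloseable resource) {
        if (resource != null) {
            try {
                resource.close();
            } catch (Exception ignored) {
                // best effort -- the resource may already be closed
            }
        }
    }
}
```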
So, the first improvement we decided to make was to close all client objects when done with them. That resulted in the following:
OK, you have to look really hard to see any improvement. Basically, this graph is identical to the previous one except that Mimer SQL and MySQL/InnoDB perform about 5% better. MySQL without transactions appears not to change at all.
Cut Back on Statement Compilations
In JDBCBench, each SQL statement sent to the server is compiled. Compilations are costly, particularly if the SQL statements are complex and there are many possible optimization choices. Sometimes, compiling and optimizing the query is more time-consuming than running the query itself. It is therefore important, and vital to server throughput, to minimize the number of statement compilations.
JDBCBench makes extensive use of java.sql.Statement objects. The Statement object class is the easiest and simplest way of executing SQL statements, but also the most expensive. For each statement, the application builds an SQL statement string which is executed once and then forgotten. The string concatenation process is in itself expensive.
The equivalent approach in a Java program would be to generate Java source code dynamically, compile it, execute it, and then forget it. For example, to implement a function that calculates the formula f(x,y)=x*y+1, we could generate a class hard-coded for f(2,3), compile it, and execute it. No one would do that. Instead, we implement a class that takes input parameters. That class only needs to be compiled once and can be executed many times, since the parameters are supplied at runtime.
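The two classes the text refers to are missing from this copy of the article; a minimal sketch of the idea (class and method names are our own):

```java
// Hypothetical "generated" class: the inputs are hard-coded, so a new
// class would have to be generated and compiled for every x and y.
// This is the analogue of building a fresh SQL string for every query.
class CalcF2And3 {
    static int execute() {
        return 2 * 3 + 1;   // only ever computes f(2, 3)
    }
}

// The parameterized class is compiled once and executed many times;
// the arguments are supplied at runtime, like SQL parameter markers.
class CalcF {
    static int execute(int x, int y) {
        return x * y + 1;
    }
}
```

`CalcF.execute(2, 3)` returns 7 without any further compilation, and so does every later call with different arguments.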
Now that we understand the benefits of avoiding compilations, we want to do the same thing in SQL. Suppose we want to run a simple query against the personnel table STAFF. (This table and these examples are not part of the benchmark.) We have an id number and we want to know the name of the staff member it belongs to.

A novice database programmer might do everything over again each time the queryName method is called: allocate a new Statement object, build a new SQL string by concatenation, and compile and execute the statement. We want a better way.

The improved program compiles the statement once, through the constructor, though the statement could just as well be prepared in any other method. The MyClass.queryName method is then called once for each execution, with the parameters supplied at runtime. And, as we learned earlier, we shouldn't rely on garbage collection, so we close our database objects explicitly.
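The two listings discussed above are missing from this copy; the following is a hedged sketch, assuming a STAFF table with ID and NAME columns (the class name MyClass and method queryName come from the text):

```java
import java.sql.*;

// Naive version: everything is redone on every call -- a new Statement is
// allocated, a new SQL string is concatenated, and the server must compile
// the statement from scratch each time.
class NaiveQuery {
    static String queryName(Connection con, int id) throws SQLException {
        try (Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT NAME FROM STAFF WHERE ID = " + id)) {
            return rs.next() ? rs.getString(1) : null;
        }
    }
}

// Improved version: the statement is prepared once, in the constructor,
// and only the parameter value changes between executions.
class MyClass {
    static final String SQL = "SELECT NAME FROM STAFF WHERE ID = ?";
    private final PreparedStatement stmt;

    MyClass(Connection con) throws SQLException {
        stmt = con.prepareStatement(SQL);   // compiled once
    }

    String queryName(int id) throws SQLException {
        stmt.setInt(1, id);                 // parameter supplied at runtime
        try (ResultSet rs = stmt.executeQuery()) {
            return rs.next() ? rs.getString(1) : null;
        }
    }

    void close() throws SQLException {
        stmt.close();                       // don't rely on garbage collection
    }
}
```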
Reuse those Statements!
Nowadays, Web applications establish a lot of short-lived connections to the database, often through a connection pool. The nature of a Web application is to connect to the database, query it, generate a Web page and then disconnect. The Web application cannot keep prepared statements open between requests.
This poses a big problem as the Web application has to allocate new PreparedStatement objects at every request. The application must construct the SQL string and compile the statement at every request and we haven't achieved much. The application's performance suffers because of too many compilations.
To improve performance, Mimer SQL will, whenever possible, reuse compiled statements and query plans. This happens when the Mimer SQL compiler preprocessor determines that a particular statement has been compiled before. Statements can be reused even when they are issued on different connections, which is good for Web applications which often issue the same query over and over again from different connections.
To maximize the benefits of this Mimer SQL feature, your application should issue identical SQL statements. Remember, each time a statement that has never been compiled before is created, performance suffers. Every statement with hard-coded parameters is regarded as a different statement.
Imagine a Java program that uses Statement objects to insert employees into the STAFF table described above. The program concatenates a new SQL string over and over again, one for each row. (We are using the names of Swedish prime ministers here.) The inserts result in a number of statement compilations which are all considered different, so we can't reuse any of them.
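The sample program is missing from this copy; a sketch of what it looks like (the column layout of STAFF is an assumption, and the helper method is ours):

```java
import java.sql.*;

// Each iteration concatenates a brand-new SQL string with hard-coded
// values, so the server sees a different statement every time and none
// of the compilations can be reused.
class StatementLoader {
    static String insertSql(int id, String name) {
        return "INSERT INTO STAFF (ID, NAME) VALUES (" + id + ", '" + name + "')";
    }

    static void load(Connection con, String[] names) throws SQLException {
        try (Statement stmt = con.createStatement()) {
            for (int i = 0; i < names.length; i++) {
                stmt.executeUpdate(insertSql(i + 1, names[i]));  // recompiled every time
            }
        }
    }
}
```

For example, loading "Göran Persson", "Ingvar Carlsson" and "Carl Bildt" compiles three distinct statements, one per hard-coded row.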
The improved program, on the other hand, uses PreparedStatement objects and parameter markers. The statement is prepared once before execution, and the parameters are supplied at runtime. Some systems may only compile this statement once, when the application is executed for the first time; when other users run the same application later, the compilation is reused.
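The improved listing is likewise missing from this copy; a sketch:

```java
import java.sql.*;

// The statement text contains parameter markers and never changes, so it
// is prepared (compiled) once and then executed once per row.
class PreparedLoader {
    static final String SQL = "INSERT INTO STAFF (ID, NAME) VALUES (?, ?)";

    static void load(Connection con, String[] names) throws SQLException {
        try (PreparedStatement stmt = con.prepareStatement(SQL)) {  // compiled once
            for (int i = 0; i < names.length; i++) {
                stmt.setInt(1, i + 1);          // parameters supplied at runtime
                stmt.setString(2, names[i]);
                stmt.executeUpdate();
            }
        }
    }
}
```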
The improvement at this stage is to replace the java.sql.Statement objects with java.sql.PreparedStatement objects. A PreparedStatement object is supplied with an SQL statement at creation, and that statement never changes. The SQL statements use parameter markers so that compilations can be reused. The statement is compiled once, at object instantiation, and executed many times.
If the application has already been executed recently, the database server doesn’t even have to compile the statement since it can be reused.
Now take a look at performance:
Here we can see that Mimer SQL has doubled its performance for a single user and more than quadrupled its performance for multiple users. By avoiding compilations, we have managed to increase throughput and shorten response times.
Note also that MySQL does not benefit from precompiled statements. As a matter of fact, no matter how the application submits its SQL queries, the statements will always be recompiled. In many cases, this means that the same work is done over and over again and the compiled result is forgotten every time.
One area where statement compilations also matter is the load sequence, so let’s take a look at that. When we replaced the generated SQL statements with prepared statements, load performance improved significantly. The chart below shows rows inserted per second:
The blue bars indicate the insert performance we obtained by using PreparedStatement objects. The red bars indicate Statement object performance.
One thing we see is that Mimer SQL increases its performance using PreparedStatement objects, but MySQL performs worse using these more advanced objects. This is because Mimer SQL tries hard to reuse statements, and as these inserts use the same statement over and over again, we benefit from this Mimer SQL feature.
MySQL on the other hand, keeps recompiling statements. As a matter of fact, the only feature MySQL PreparedStatement objects offer is to replace parameter markers with parameter data in the SQL string. This is what causes MySQL to perform worse using PreparedStatement objects.
Transactions – the Magic Word
Mimer SQL supports transactions, and transactions give you an edge. First and foremost, transactions enable you to group operations that logically belong to each other and execute them atomically. Transactions also let you change your mind and revoke changes: you can roll back the transaction. And whenever you have successfully committed a transaction, you can be (practically) certain that that piece of information will never be lost.
To make sure transactions aren’t lost once committed, each transaction is saved to disk before control is returned to the application. In this particular example, we have four updates and a select that logically belong to each other in each step.
JDBCBench runs in auto-commit mode, which means that all statements are committed and saved to disk, before the next statement is executed. This can be compared with the manual-commit approach, where the four updates and the select form one transaction whose database changes are secured on disk in one single disk write. Since disk operations are by far the most time consuming task in a database system, we can, in theory, experience a four-fold improvement in performance by committing each transaction manually.
If we consider the load phase, we can see an even greater improvement in performance. Since the original benchmark ran in auto-commit mode, every insert statement was immediately committed. A better approach is to turn auto-commit off, and perform a manual-commit at about every 100th insert. This way we save 100 rows to disk at a time, rather than saving each row to disk separately.
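The load loop change can be sketched as follows, using the STAFF example table from earlier (the benchmark itself loads the ACCOUNT, TELLER and BRANCH tables the same way):

```java
import java.sql.*;

class BatchedLoad {
    static final int COMMIT_INTERVAL = 100;

    static boolean shouldCommit(int rowNumber) {
        return rowNumber % COMMIT_INTERVAL == 0;
    }

    // With auto-commit off, roughly 100 inserted rows reach the disk in a
    // single write, instead of one disk write per row.
    static void load(Connection con, String[] names) throws SQLException {
        con.setAutoCommit(false);
        try (PreparedStatement stmt = con.prepareStatement(
                "INSERT INTO STAFF (ID, NAME) VALUES (?, ?)")) {
            for (int i = 0; i < names.length; i++) {
                stmt.setInt(1, i + 1);
                stmt.setString(2, names[i]);
                stmt.executeUpdate();
                if (shouldCommit(i + 1)) {
                    con.commit();           // flush a full batch of 100 rows
                }
            }
            con.commit();                   // flush the final partial batch
        }
    }
}
```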
So we changed the benchmark to use transactions. We switched auto-commit off and issued a manual-commit after each TPC-B step. After these modifications, the program performs even better than before.
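A sketch of one TPC-B step with manual commit. The table and column names follow the TPC-B schema and the exact SQL in JDBCBench may differ; we also assume the caller has already bound the parameter values on the prepared statements:

```java
import java.sql.*;

class TpcbStep {
    // Illustrative statement text; the other four statements are analogous.
    static final String UPDATE_ACCOUNT =
        "UPDATE ACCOUNT SET ABALANCE = ABALANCE + ? WHERE AID = ?";

    // One TPC-B step: three balance updates, one history insert and one
    // balance query, committed together in a single disk write.
    static int run(Connection con,
                   PreparedStatement updAccount, PreparedStatement updTeller,
                   PreparedStatement updBranch, PreparedStatement insHistory,
                   PreparedStatement qryBalance) throws SQLException {
        con.setAutoCommit(false);
        try {
            updAccount.executeUpdate();
            updTeller.executeUpdate();
            updBranch.executeUpdate();
            insHistory.executeUpdate();
            int balance;
            try (ResultSet rs = qryBalance.executeQuery()) {
                rs.next();
                balance = rs.getInt(1);
            }
            con.commit();        // the whole step is secured atomically
            return balance;
        } catch (SQLException e) {
            con.rollback();      // never leave the database inconsistent
            throw e;
        }
    }
}
```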
An added bonus is that the application will never leave the database in an inconsistent condition no matter what happens to the environment (power failure, disk crash, etc).
The main difference compared with the previous graph is that Mimer SQL in transaction mode increased performance by another 50 transactions per second. This is because Mimer SQL now knows which statements logically belong to each other and that they can be written to disk simultaneously.
It is also interesting to note that Mimer SQL with null databanks, i.e. no transactions, is actually performing less well now. This is because manual-commit mode involves an extra server communication per transaction. The client has to explicitly tell the database server to commit. This extra network call generates some overhead, and since the disk-write strategy was already optimized in the previous test there were no improvements to be made.
Because MySQL/MyISAM doesn't support manual-commit mode we used the figures from the previous test for comparison purposes.
We also changed the load phase to use manual commits at every 100th insert. The performance improvements can be seen below; this table shows rows loaded per second.
Since we now commit manually at every 100th insert, we don't flush the inserted data to disk at every statement. This is very beneficial for Mimer SQL and, as we can see, it improves load performance about 20 times. The MySQL/InnoDB server also benefits, but only about four times. A possible reason is that MySQL recompiles the statement once for every inserted row.
Inserting data into null databanks is slightly slower since manual-commit mode involves another server communication, which we talked about earlier.
MySQL/MyISAM does not support transactions and that bar is provided for comparison only.
The Finishing Touch - Procedures
We can refine JDBCBench even further. All the previous programs made a separate server call for each statement. This means each statement talks to the server using a communication call of its own. Each communication includes some overhead, both at the client and the server, as well as the transfer time for the message to get through. If the network is slow or carries a lot of traffic, communication times can be very long.
One solution to this problem is to embed several statements in a procedure. This application is well suited to that approach, since each transaction updates several tables from the same application data.
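In JDBC, calling such a procedure looks like the following. The procedure name TPCB and its parameter list are assumptions, since the actual procedure source isn't shown here:

```java
import java.sql.*;

class ProcedureStep {
    static final String CALL = "{CALL TPCB(?, ?, ?, ?)}";

    // One server round trip replaces five: the procedure performs the three
    // balance updates, the history insert and the balance query inside the
    // server, and returns the resulting account balance.
    static int run(Connection con, int account, int teller,
                   int branch, int delta) throws SQLException {
        try (CallableStatement stmt = con.prepareCall(CALL)) {
            stmt.setInt(1, account);
            stmt.setInt(2, teller);
            stmt.setInt(3, branch);
            stmt.setInt(4, delta);
            try (ResultSet rs = stmt.executeQuery()) {
                rs.next();
                return rs.getInt(1);   // new account balance
            }
        }
    }
}
```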
Finally, we changed the benchmark to use procedures. We removed the separate update, select and insert statements and created a procedure which is called once for each TPC-B step. Now we are talking about serious performance:
Using this version of the benchmark we are able to triple server throughput compared to the previous test. Stored procedures minimize server communications and server overhead.
Note! The figures for MySQL are from previous test runs and are only for comparison purposes.
If you want to call yourself an expert Java database programmer, you must take the following into consideration when designing and implementing a database application:

- Close statements, result sets and other client objects explicitly as soon as you are done with them; don't rely on garbage collection.
- Use PreparedStatement objects with parameter markers instead of concatenating SQL strings, so statements are compiled once and executed many times.
- Issue identical SQL statements wherever possible, so the database server can reuse compiled statements across connections.
- Group statements that logically belong together into transactions instead of running in auto-commit mode.
- Embed multi-statement operations in stored procedures to minimize server communications.
To see what this means in practice, just take another look at what we achieved: