Friday, February 18, 2011

EJB transactions: going deeper

EJB technology is something received with mixed feelings. Some people love it (like me), other people can't seem to find the benefit of it. Yet others are frustrated by the fact that they can't seem to get the hang of it.

The trap with EJBs, especially since version 3.0 of the specification, is that you are given a false sense of security. Read any article from the time EJB 3.0 was released. Pick up any book. You'll get the same message: EJB 3.X is sooooooo easy! Really, you basically don't have to do anything, just slap some annotations on there and you're good to go!

In other words: you are meant to believe that the technology will do the work for you.

Of course if you have more than half a brain you'll just instantly dismiss such claims and you rightfully should. The technology most certainly does not do the work for you. But, if you know how to use it, it CAN help you to make your job a whole lot easier. But you are still the captain of the ship and you cannot and should not let go of that wheel until you're safe in the harbor.

In this article I would like to go a little bit beyond the average article you can find on the net and instead incorporate a few clues you'll only find in obscure forums after you have gotten yet another vague error message from your application server. I want to talk about EJB transaction management, the core of EJB technology: not only how it works, but also how you really apply it.

Note that in this article I'll deal only with container managed transactions. For completeness I'll discuss the material from start to finish, but this is not a tutorial on how to write EJBs or how to use JPA. I expect you to know at least the foundations of both.


What is a managed transaction?

First off, let's get something out of the way: an EJB (or to be more precise: container) managed transaction is not a database transaction. Part of an EJB transaction might be a database transaction (or multiple database transactions!), but it goes far beyond the datasource: an EJB transaction models, or is supposed to model, an actual transaction that can take place in the real world. After all, when you build an enterprise system, you are trying to solve a real world problem.

Take a money transaction. That is not only changing some numbers around in one or more databases. There is also administration, notification, confirmation and validation going on.

What if the money transaction fails? Then we enter a failure path, which will among other things include restoring the system to its original unbroken state and more notification (a call to systems management and the client for example). The person overlooking the transfer may have been already on the line to confirm the transfer to a client; the call will be broken off to be able to enter the failure path.

Many steps that can either succeed or fail. Sometimes failure is acceptable, other times the transaction step needs to be delayed, but most of the time when something breaks you'll want to undo the damage already done.

Of course we deal in software here and our software isn't going to call anybody. But it does have the capacity to notify through the all-powerful JMS. Sending and processing messages can be part of a managed transaction. Most if not all services delivered by the JEE platform can, including sending emails.


Commit and rollback

Back to database terms. When all steps in a transaction succeed (transfer, confirm, notify, etc.), you'll want to commit it. After committing the transaction, it is permanent and you can't easily undo it anymore. Generally upon commit the transaction is over, and will be cleaned up.

When things go sour, you will want to roll back the transaction. This means that any mutation that was part of the active transaction will be undone. The system must go back to its original state, as if nothing went wrong. Of course something did go wrong, but your failure handling routines should be able to cope with that.

Since the transactions are managed by the container, in general you also want the container to manage when a transaction is committed and rolled back. The rule is quite easy.

- when your code succeeds, commit
- when your code throws an exception (that bubbles out of the EJB method), roll back

But of course we don't want the server to have full control. You can control whether a rollback is automatic or not; by default it only happens when the exception is one of the following:

- a runtime exception (this includes EJBException, which is itself a runtime exception)
- any other exception marked with the annotation @ApplicationException(rollback=true)

With the annotation you can fully control whether an automatic rollback should happen on your own exceptions. You could declare a runtime exception for example, but not have it roll back the transaction.

@ApplicationException(rollback=false)
public class MyException extends RuntimeException {

  ...
}
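
To make the rollback rule a bit more tangible, here is a minimal plain-Java sketch of the decision the container makes when an exception leaves a business method. It is only an illustration: AppException is a hypothetical stand-in for @ApplicationException so that the code runs without an application server, and the exception classes and RollbackRules are made up for the occasion. The real container logic is more involved.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical stand-in for javax.ejb.ApplicationException, so that this
// sketch compiles with nothing but the JDK.
@Retention(RetentionPolicy.RUNTIME)
@interface AppException {
  boolean rollback() default false;
}

// Unchecked, but explicitly told NOT to roll back.
@AppException(rollback = false)
class HarmlessException extends RuntimeException {}

// Checked, but explicitly told to roll back.
@AppException(rollback = true)
class FatalBusinessException extends Exception {}

class RollbackRules {

  // Mimics the decision described above; this is not real container code.
  static boolean rollsBack(Throwable t) {
    AppException marker = t.getClass().getAnnotation(AppException.class);
    if (marker != null) {
      return marker.rollback();            // the annotation decides
    }
    return t instanceof RuntimeException;  // unannotated: only unchecked ones
  }

  public static void main(String[] args) {
    System.out.println(rollsBack(new IllegalStateException()));   // true
    System.out.println(rollsBack(new java.io.IOException()));     // false
    System.out.println(rollsBack(new HarmlessException()));       // false
    System.out.println(rollsBack(new FatalBusinessException()));  // true
  }
}
```

The point to take away: an unannotated checked exception does not trigger a rollback by itself, which surprises a lot of people the first time.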


The first EJB


Let's begin with a little practical knowledge now to let the theory sink in. It is all fine and well that the container can manage transactions for us, but how and when does it happen?

@Local
public interface MyFirstEjb {

  public void helloWorld();
}

@Stateless
public class MyFirstEjbBean implements MyFirstEjb {

  public void helloWorld(){
    
    System.out.println("Hello world!");
  }
}


Here we have a stateless EJB with a single business method. Note that this is my specific naming convention for EJBs, it may not match your own. Do what you feel is best.

So where is the transaction? Right here:

public void helloWorld(){ // transaction starts here
    
    System.out.println("Hello world!");
  } // transaction ends here

It's as simple as that. It is after all a container managed transaction; you don't have to do anything to create one. It is "just there". Later on we'll see how you can influence this.

Adding a persistence unit

Of course most of the time you'll be doing transaction stuff that incorporates datastore mutations. Part of the EJB3 specification is JPA, which has a mode in which the JPA transactions can be managed by the container also. You'll only have to declare your persistence unit like this in the META-INF/persistence.xml:

<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence
    http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd" version="1.0">

    <persistence-unit name="your-pu" transaction-type="JTA">
        <jta-data-source>java:/YourDS</jta-data-source>
        <!-- JPA provider configuration here -->
    </persistence-unit>
</persistence>


Transaction type "JTA" means "Java Transaction API", which is an API to standardize the way transactions are managed. The nice thing about JTA is that transactions can cross over different technologies; they all use the container managed JTA based transaction.

Now that you have your JTA persistence unit set up, you can gain access to it through a simple annotation:

public class MyFirstEjbBean implements MyFirstEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;

  ...
}

Voila, a container managed entity manager. This means that when you enter a business method in this EJB that will have a managed transaction, you are guaranteed that the entity manager will also have an active transaction and that this transaction will be either committed or rolled back based on the success rate of your business method.

public class MyFirstEjbBean implements MyFirstEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;

  public void saveCustomer(String name, String address, String city){

    Customer customer = new Customer(name, address, city);
    em.persist(customer);

    // nah, I don't care what you think. All our customers are named bill
    customer.setName("bill");
  }
}

A not too realistic example that requires little imagination. Customer is a JPA entity and it has some basic customer properties.

I chose this example to show JPA transaction management at work. First the customer is persisted. At that point the customer is managed by JPA, so according to the rules of JPA any changes we make to the entity are automatically persisted when you either flush or commit the active transaction.

Thus we can change the name without any need for a call to any EntityManager related persist call. The container will commit the transaction for us, and thus the name change is automatically committed for you. In fact the container is smart enough even to know when to flush changes to the database halfway through the EJB call, should this be necessary to make further mutations work. Neat huh?

Similarly, when you fetch entities through JPA these will also become managed entities for the duration of our transaction.

public Employer findBoss(String name){
    String q = "select b from Employer b where b.name=:name";
    Query qo = em.createQuery(q).setParameter("name", name);

    try{
      return (Employer) qo.getSingleResult();
    } catch(NoResultException nre){
      return null;
    }
  }

  public void saveEmployee(String name, String address, String city, String bossname){

    Employee bill = new Employee(name, address, city);
    em.persist(bill);

    Employer boss = findBoss(bossname);
    bill.setEmployer(boss);
  }


As you should know, relational mappings in JPA will only work properly if you are putting managed entities in there. Because you are doing it in an EJB call, the boss instance will be managed - this also works for the findBoss() method as it is called within the scope of the saveEmployee() call and thus it simply shares the transaction of saveEmployee(). How and why will be explained later, there is a trap here I will make you aware of once we've covered a little more ground.


Transaction attributes

So far you've seen the default behavior of the EJB 3 specification.

- the EJB has container managed transactions
- each business method has an active transaction

The fun thing about transactions is that they can span across multiple EJBs. Or not! That's basically up to you. For the next part, let's define unrealistic example number three.

@Stateless
public class MyFirstEjbBean implements MyFirstEjb {

  @EJB
  private MySecondEjb other;

  public void firstEjb(){

    other.secondEjb();
  }
}


@Stateless
public class MySecondEjbBean implements MySecondEjb {

  public void secondEjb(){

  }
}

I chose these names so that it stays clear what is being called.

Alright then. As said each business method, unless specified otherwise, has a managed transaction. This happens because by default business methods are assigned the transaction attribute REQUIRED. You could also enforce it yourself, like this:

@TransactionAttribute(TransactionAttributeType.REQUIRED)
  public void firstEjb(){
     ...
  }

You don't have to be paranoid however. The default values are declared in the EJB specification itself; every server that is EJB 3.X compliant must follow the same rules. So you can safely leave out the TransactionAttribute annotation in most cases.

Attribute REQUIRED basically means "you must have a transaction, so if there isn't one already, create one". This is important when we follow the code. Let's say we call MyFirstEjb.firstEjb(). A transaction is created for us. In this business method we make a call to MySecondEjb.secondEjb(). Because we do this through the business interface (other) we inject into MyFirstEjb, the secondEjb() call is a container managed invocation. In other words: an EJB invocation, not a local method invocation.

This is an important distinction, because it determines what will happen to our transaction. We are now in secondEjb() and the container has some work to do. Will a new transaction be created for secondEjb()?

The answer is: no. Because REQUIRED says "create a new transaction if none exists already". But one does exist already, the transaction created in firstEjb(). secondEjb() will now adopt the transaction of firstEjb()!

What does that say to us? Let's revisit an earlier example.

public class MyFirstEjbBean implements MyFirstEjb {

  @EJB
  private MySecondEjb other;

  public void saveEmployee(String name, String address, String city, String bossname){

    Employee bill = new Employee(name, address, city);
    em.persist(bill);

    Employer boss = other.findBoss(bossname);
    bill.setEmployer(boss);
  }

}

@Stateless
public class MySecondEjbBean implements MySecondEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;

  public Employer findBoss(String name){
    String q = "select b from Employer b where b.name=:name";
    Query qo = em.createQuery(q).setParameter("name", name);

    try{
      return (Employer) qo.getSingleResult();
    } catch(NoResultException nre){
      return null;
    }
  }
}


This is the employer/employee example, adapted to be split among two different EJBs. The fact of the matter is that the code will still work; the boss may be fetched through JPA in MySecondEjb and returned to MyFirstEjb, but both EJB methods share the same transaction and thus the entity returned to saveEmployee() is still managed at that point.

At this point you might shrug and think "is that so special?". It's easy, isn't it, to just neglect all the magic that is being done for you. All you are doing now is invoking some bean methods and passing around some objects. But in the background the container is keeping an eye on things to make sure your transactional state is in order; it will even do that across JVMs in case of a remote EJB call.

And not a single line of actual database transaction related code!

Transaction attributes: going deeper

REQUIRED is not the only choice you have. Let's have them all.


NOT_SUPPORTED
This means that the EJB call will have no transaction at all.

In the case of secondEjb() in the example above, this means the transaction created in firstEjb() is suspended, and reactivated as soon as secondEjb() finishes its work.

NOT_SUPPORTED may seem like baggage at first, but it serves a few purposes.

- documentation. The annotation instantly tells you that the method does nothing transactional. Or should do nothing transactional, an important message to other people that may need to touch the code.
- resources. There is always a cost in managing a transaction, so if the container doesn't have to, give it a break.
- decreased whoops factor. Let's face it, you are going to make mistakes. By being precise with the transaction attributes you'll catch transaction mistakes far sooner in your development cycle, because code that breaks the rules will fail right away.

NEVER
NEVER is more drastic than NOT_SUPPORTED. If a transaction is active when this EJB is called, the container will throw an exception.

If you are dealing with a complex transaction management setup, NEVER can be a useful tool to catch programming mistakes early on. There is a gotcha however; NEVER would imply that during the runtime of the EJB method there will actually never be any kind of transaction. This is not entirely true however; when you make a call to another EJB, that EJB may safely create its own isolated transaction. Be aware of that, as if you make lots of EJB calls then NEVER may actually become a performance hog because of many mini-transactions being created, possibly without your knowledge.

SUPPORTS
This will make the container lazy. "If a transaction exists then fine, I'll adopt it. If none exist then I'm not going to make the effort to create one."

SUPPORTS is not particularly useful for any specific solution; if there is a real reason to use it you probably can (and likely should) redesign your code so it isn't needed anymore. In fact you should take care using it, as it can lead to transaction issues that are really nasty to pinpoint. I see it as a "don't care" type of deal: you have a method that does not need an active transaction. You could mark it as NOT_SUPPORTED, but then the container will put a running transaction to sleep. When you put SUPPORTS, the transaction will remain active, leading to less bookkeeping overhead for the container. If you have some sort of utility EJB method that gets called a lot from an EJB context, it can be a small optimization to give it the SUPPORTS transaction attribute instead of NOT_SUPPORTED. And that would be my recommendation for its use: stick to NOT_SUPPORTED, but when you do performance optimizations identify methods that could benefit from SUPPORTS and only then apply it.

REQUIRES_NEW
The most interesting of the bunch. REQUIRES_NEW will always create a new transaction, even if one already exists. It may look like you'll be working with parallel or nested transactions, but it isn't as complicated as all that: the outer transaction is put to sleep until the inner EJB call finishes, at which point the inner transaction is wrapped up (committed or rolled back on its own, independently of the outer one). So there will still be only one active transaction at a time. Note that the inner transaction does not share the managed entities of the outer transaction, they are completely isolated.

This is then also a source of many programming mistakes. Because you create a new transaction, any entities managed inside it will become detached again when the EJB call finishes, even if you return an instance to the outer EJB method and its transaction! You'd have to do a find() on the entity in the outer EJB call to make it managed again.

MANDATORY
The opposite of NEVER; when the EJB is called there must be an active transaction already. Within container based transaction management you will be basically saying "call this business method only from another container managed resource such as an EJB or MDB". You may not realize it yet, but MANDATORY is a wickedly powerful tool that can help you to make your transactional code so much more robust.

For example, when I have a DAO class I like to mark storage DAO methods that accept (managed) entities as a parameter as MANDATORY. This way I don't have to add any code that makes the parameter entities managed before I slap them in entities I want to persist, I just dumbly assume that they already are because they come from a transacted environment. If they are not: well that is your own fault, but 99/100 times entities will actually already be managed at that point in time.
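
If you want to see the six attributes side by side, the following toy model may help. It is a sketch under heavy assumptions, not how a real container is built: ToyContainer, Attr and call() are names I made up, a "transaction" is nothing more than a label, and IllegalStateException stands in for the exceptions a real server would throw for MANDATORY and NEVER.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of how a container picks the transaction for each EJB call.
// A "transaction" is just a name here; nothing is really committed.
class ToyContainer {

  enum Attr { REQUIRED, REQUIRES_NEW, SUPPORTS, NOT_SUPPORTED, MANDATORY, NEVER }

  private String current;  // active transaction, or null when there is none
  private int counter;     // used to name new transactions T1, T2, ...

  String current() {
    return current;
  }

  // Simulates a call through a business interface with the given attribute.
  void call(Attr attr, Runnable body) {
    String callerTx = current;
    switch (attr) {
      case REQUIRED:
        if (callerTx == null) current = "T" + (++counter);
        break;
      case REQUIRES_NEW:
        current = "T" + (++counter);   // caller's transaction is suspended
        break;
      case SUPPORTS:
        break;                         // adopt whatever is (or is not) there
      case NOT_SUPPORTED:
        current = null;                // caller's transaction is suspended
        break;
      case MANDATORY:
        if (callerTx == null) throw new IllegalStateException("no transaction to adopt");
        break;
      case NEVER:
        if (callerTx != null) throw new IllegalStateException("transaction not allowed");
        break;
    }
    try {
      body.run();
    } finally {
      current = callerTx;              // resume the caller's situation
    }
  }

  // Walks through a few nested calls and records which transaction each sees.
  static List<String> trace() {
    ToyContainer c = new ToyContainer();
    List<String> seen = new ArrayList<>();
    c.call(Attr.REQUIRED, () -> {
      seen.add("outer=" + c.current());
      c.call(Attr.REQUIRED, () -> seen.add("required=" + c.current()));
      c.call(Attr.REQUIRES_NEW, () -> seen.add("requiresNew=" + c.current()));
      c.call(Attr.NOT_SUPPORTED, () -> seen.add("notSupported=" + c.current()));
      seen.add("outerAgain=" + c.current());
    });
    return seen;
  }

  public static void main(String[] args) {
    // prints [outer=T1, required=T1, requiresNew=T2, notSupported=null, outerAgain=T1]
    System.out.println(trace());
  }
}
```

Notice how the outer transaction T1 is back in charge after every inner call, no matter what the inner call did with it.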


Let's set an example, shall we? To properly demonstrate this, we'll have to make our code fail on purpose.

public class MyFirstEjbBean implements MyFirstEjb {

  @EJB
  private MySecondEjb ejb2;

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;


  public void createEmployee(String name, String address, String city, String bossname){

    Employee steve = new Employee(name, address, city);
    em.persist(steve);

    Employer bill = ejb2.passAlongBoss(bossname);
    steve.setEmployer(bill);
  }
}

EJB number one is responsible for creating our employee.


@Stateless
public class MySecondEjbBean implements MySecondEjb {
  
  @EJB
  private MyThirdEjb ejb3;

  @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
  public Employer passAlongBoss(String name){
    return ejb3.findBoss(name);
  }
}

Our second EJB is a middle-man; our first EJB demands to know the boss for our employee; our second EJB supplies it by asking it of our third EJB.

@Stateless
public class MyThirdEjbBean implements MyThirdEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;

  @TransactionAttribute(TransactionAttributeType.MANDATORY)
  public Employer findBoss(String name){
    String q = "select b from Employer b where b.name=:name";
    Query qo = em.createQuery(q).setParameter("name", name);

    try{
      return (Employer) qo.getSingleResult();
    } catch(NoResultException nre){
      return null;
    }
  }
}

And our third EJB delivers. Now let's put the transaction attributes in line for a moment:

createEmployee | REQUIRED | transaction T1 created
passAlongBoss | REQUIRES_NEW | transaction T2 created
findBoss | MANDATORY | transaction T2 adopted

This code will fail. Can you spot where?

It will fail as soon as createEmployee() finishes and transaction T1 is committed. Because even though we are so clever, we have messed with the transactions here. Let's see what happens at the entity level.

createEmployee | Employee created | managed in T1
passAlongBoss | Employer received from findBoss() | managed in T2
findBoss | Employer fetched | managed in adopted T2

Our Employer entity is passed all the way from findBoss() to passAlongBoss() to createEmployee(). The trouble is that it is managed in T2, not in T1. So as soon as passAlongBoss() ends, T2 is wrapped up and the Employer entity becomes detached. The end result: you are setting a detached entity reference into the managed Employee entity, resulting in the JPA provider not being able to persist that change.


The local method trap

With the knowledge of transaction attributes fresh in your mind, let me throw a common mistake at you.

public class MyFirstEjbBean implements MyFirstEjb {

  public void businessMethod1(){
    
    businessMethod2();
  }

  @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
  public void businessMethod2(){
    // do some stuff
  }
}

Two business methods in the same EJB. Assuming we call businessMethod1(), how many transactions do you think will be created in total?

Answer: 1.

What? businessMethod2() is supposed to get its own personal transaction because of REQUIRES_NEW right? You'd be right... if businessMethod2() would be invoked through an EJB interface. But it is not. From the perspective of the container, businessMethod2() is simply a local method call inside businessMethod1() and is not instrumented accordingly. Remember: EJBs are at the core still plain old Java classes.

The easy solution would be to use two EJBs in this case, as has been demonstrated earlier. I hope that makes sense to you, because it is only too easy to blame the technology for these kinds of personal oversights. The fact that it works this way is logical, not a design flaw.
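
You can reproduce the trap without any application server. The sketch below uses a plain JDK dynamic proxy as a stand-in for the wrapper a container puts around your bean; Business, BusinessBean and LocalCallTrap are made-up names, and a real container does far more than record method names. It only shows which calls the "container" gets to see.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface Business {
  void businessMethod1();
  void businessMethod2();
}

class BusinessBean implements Business {
  public void businessMethod1() {
    businessMethod2();  // plain call on "this": the proxy never sees it
  }

  public void businessMethod2() {
    // do some stuff
  }
}

class LocalCallTrap {

  // Records every call that passes through the proxy, which is the place
  // where a container would look at the transaction attribute.
  static List<String> interceptedCalls() {
    List<String> intercepted = new ArrayList<>();
    Business target = new BusinessBean();
    InvocationHandler handler = (proxy, method, args) -> {
      intercepted.add(method.getName());
      return method.invoke(target, args);
    };
    Business ejb = (Business) Proxy.newProxyInstance(
        Business.class.getClassLoader(), new Class<?>[] { Business.class }, handler);
    ejb.businessMethod1();
    return intercepted;
  }

  public static void main(String[] args) {
    System.out.println(interceptedCalls());  // prints [businessMethod1]
  }
}
```

Only businessMethod1() shows up, so only its transaction attribute is ever looked at. Besides splitting the code over two EJBs, a bean can also ask the container for a reference to itself (for example through SessionContext.getBusinessObject()) and call businessMethod2() on that reference, which sends the call back through the container.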


Multiple persistence units

You know how it is with manuals, articles and tutorials. Everything is fine when the basics are covered. But then you go out into the real world and you want to actually apply the material. Then you take the next step, pushing the technology beyond the scope of the manual, the articles and the tutorials because the more difficult problems are always forgotten by the authors as if you will never face them. But of course you do, programming is not easy.

The interesting topic when it comes to transactions is when you go beyond one persistence unit into multiple persistence units, that may target different databases. JTA can certainly handle that, and thus so can EJB technology. It doesn't just work out of the box though.

First of all, you need a very specific type of datasource: an XA datasource. To keep it short and simple: an XA datasource is specifically designed to take part in a "two phase" transaction, or a transaction that targets multiple resources. Most established database implementations support XA datasources through their drivers.

With the XA datasources in place, JTA is all set up to deal with you throwing multiple persistence units at it. What you shouldn't do however is to try and force two persistence units onto the same EJB. Instead, separate them.

@Stateless
public class MyFirstEjbBean implements MyFirstEjb {

  @PersistenceContext(unitName="first-pu")
  private EntityManager em;

  public void doSomething(){

  }
}

@Stateless
public class MySecondEjbBean implements MySecondEjb {
  
  @PersistenceContext(unitName="second-pu")
  private EntityManager em;

  public void doSomething(){

  }
}


So far so good, two EJBs with two different persistence units. What will work with XA datasources and fail without it is this:

@Stateless
public class MyThirdEjbBean implements MyThirdEjb {
  
  @EJB
  private MyFirstEjb first;

  @EJB
  private MySecondEjb second;


  public void reallyDoSomething(){
    first.doSomething();
    second.doSomething();
  }
}

Notice how in one business method call the transactions are mixed and mashed. But with XA datasources and JTA, this can work.

Just note that as soon as you get into the realm of XA datasources and distributed transactions, when something blows up you'll probably get the most vague exceptions you'll ever encounter with exceptions yelling TWO PHASE COMMIT FAILURE ABORT errors at you. Shrug it off and don't be intimidated however, the truth is usually as simple as a query borking somewhere and the root cause will be hidden somewhere in the logged stacktraces. But it may seem like the container is on the brink of destruction when you first have to deal with this.


Dealing with exceptions

Let's say that an exception occurs in an EJB method. How would you deal with that?

@Stateless
public class MySecondEjbBean implements MySecondEjb {

  public void suicideMission(){
    throw new IllegalStateException("Blowing up!");
  }
}

suicideMission() throws a RuntimeException when called which is not handled, so its transaction will be rolled back by the container. This means that when called from another EJB, it will share the transaction of that EJB (if any) and that transaction will be marked for rollback.

@Stateless
public class MyFirstEjbBean implements MyFirstEjb {

  @EJB
  private MySecondEjb ejb2;

  public void doingSomething(){
    ejb2.suicideMission(); // bang
  }

}

Now let's say you choose to deal with that and you want to do some error handling by storing the error in a database.


public void doingSomething(){
    try{
      ejb2.suicideMission(); // bang
    } catch(Throwable t){
      ErrorLog log = new ErrorLog(t.getMessage());
      em.persist(log); // bang number 2
    }
  }

Sorry amigo, this isn't going to work. You'll be notified by the container that the transaction is marked for rollback. You cannot do any more mutations on it - it makes no sense anyway because all your new changes will be rolled back!

To incorporate error handling in your service layer, you'll have to make clever use of transaction boundaries using the attributes. For example you could give suicideMission() its own private transaction with REQUIRES_NEW to blow up so the transaction of doingSomething() remains intact (as long as you deal with the exception), or you could let the error handling be managed by another EJB method that has its own private transaction. When you do the latter, remember the local method call trap discussed earlier.

@Stateless
public class MySecondEjbBean implements MySecondEjb {

  @PersistenceContext(unitName="your-pu")
  private EntityManager em;


  public void suicideMission(){
    throw new IllegalStateException("Blowing up!");
  }

  @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
  public void handleError(Throwable t){
     ErrorLog log = new ErrorLog(t.getMessage());
     em.persist(log); // okay, done in a private transaction
  }
}


public void doingSomething(){

    try{
      ejb2.suicideMission(); // bang
    } catch(Throwable t){
      ejb2.handleError(t); // no bang
    }
  }


Of course you can always let the exception bubble all the way back to the root caller, which will most likely be a piece of code that does not have a container managed transaction like a servlet or a JSF backing bean. This is usually the best strategy to follow. But the overall message is: Be careful with exception handling in the service layer, there are many surprises if you don't keep in mind what the state of a transaction is.


EJB tech: making assumptions is okay!

So far I've been dealing with databases but that is not what makes EJB technology so interesting. As said before basically any JEE service is part of the transaction, including something as trivial as sending an email (as an example).

An email is sent out through an external mail server. Because sending the email is part of the transaction, it should be able to be rolled back right? So how does the container deal with that?

The answer is really simple: the email is sent out when the transaction is committed. The same for sending a JMS message by the way.

It is an incredibly simplistic yet effective way of making it possible to rollback such actions. And what it gives us is very dumb code that just assumes that there is no such thing as an error. Imagine a piece of code that does not use EJB technology.

public void someMethodNotAnEjb(String newuser, List<UserPermission> permissions) throws SomeOtherException {

  User user = null;
  try{
    user = saveUser();
  } catch(SomeException se){
    // deal with exception and quit
    throw new SomeOtherException("Registering user failed!", se);
  }

  // okay at this point the user was created, lets send out an email to the 
  // user to notify him of this fact
  sendRegistrationEmail(user);

  // oh yeah, need to bind permissions to user
  for(UserPermission pm : permissions){
    addPermissionToUser(user, pm); // bang, duplicate permission or something
  }
}

private void addPermissionToUser(User user, UserPermission pm) {
  throw new IllegalStateException("Cause an error to demonstrate what happens.");
}

This piece of code is not in any EJB. Because of the strange ordering of statements, the email will be sent out before the permissions are bound to the user. It just so happens that for whatever reason the addPermissionToUser() fails, maybe because of a duplicate permission or whatever. This means that user registration state is left in limbo and worse: the user got an email as if everything went just fine.

Now let's do it the EJB way.

public void someMethodIsAnEjb(String newuser, List<UserPermission> permissions){

  User user = saveUser();

  // okay at this point the user was created, lets send out an email to the 
  // user to notify him of this fact
  sendRegistrationEmail(user);

  // oh yeah, need to bind permissions to user
  for(UserPermission pm : permissions){
    addPermissionToUser(user, pm); // bang
  }
}

Nearly the same code, but what happens is vastly different. All this executes in the same transaction and a runtime exception is caused, triggering a rollback. This means that the creation of any permissions and the user record is rolled back but the most important thing... the email is not sent!

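NOTE: this assumes the message really goes out through a transaction-aware resource of your container. JMS is one. Plain JavaMail generally is not transaction-aware, so check your server; a common approach is to put the mail on a JMS queue and let a message-driven bean do the actual sending after the commit. And no sneaky backdoor code to send the email of course.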

What a joy right? No nasty error checking needed to be sure that things succeed or not, just dumb code that assumes that things cannot go wrong and the container basically takes care of the error management for us. Now THAT is why I love EJB technology.
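
The "send on commit" idea itself is simple enough to sketch in a few lines of plain Java. This is only a model of the behavior, with made-up names (DeferredSends, register()); it says nothing about how your server actually implements it. Outgoing messages are parked until the transaction commits and thrown away on rollback.

```java
import java.util.ArrayList;
import java.util.List;

// Toy transaction that parks outgoing messages until commit.
class DeferredSends {

  private final List<String> pending = new ArrayList<>();
  private final List<String> delivered = new ArrayList<>();

  void send(String message) {
    pending.add(message);        // only queued, nothing leaves yet
  }

  void commit() {
    delivered.addAll(pending);   // now the messages really go out
    pending.clear();
  }

  void rollback() {
    pending.clear();             // as if they were never sent
  }

  List<String> delivered() {
    return delivered;
  }

  // Mimics the registration example: send the mail first, then (maybe) fail.
  static List<String> register(boolean failOnPermissions) {
    DeferredSends tx = new DeferredSends();
    try {
      tx.send("welcome mail");
      if (failOnPermissions) {
        throw new IllegalStateException("duplicate permission");
      }
      tx.commit();
    } catch (RuntimeException e) {
      tx.rollback();
    }
    return tx.delivered();
  }

  public static void main(String[] args) {
    System.out.println(register(false));  // prints [welcome mail]
    System.out.println(register(true));   // prints []
  }
}
```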


Long running tasks

When working with EJB technology you may have run into a problem: the dreaded transaction timeout. Like any other transaction, an EJB transaction will have a certain timeout bound to it; if a transaction takes longer than the timeout, the transaction is aborted.

Sometimes you will have EJB code that will be long running however. When processing data in files of several gigabytes on an external SFTP server you may have a runtime of several hours for example. Heavy database stuff may also be a culprit.

There are of course ways to cope with that, and it requires precise transaction management. Here are a few tricks.

Batching
A good way to deal with large volumes of data is to process it in batches. This also helps to keep the transaction size small. Imagine using JPA to store 1 million entities for example; even if you create the entities on the fly, they will all go into the persistence store, which will likely give you memory issues. Not only that, but the database transaction would become huge with such volumes of data.

So instead of doing the entire set at once, split it into smaller batches of say 10000. You would create two EJBs to handle this: EJB1 has a "master" method in which NO transaction is active (TransactionAttributeType.NEVER); this method will deal with the large volume of data and split up the transactional part into smaller batches. Each batch is then passed on to EJB2, which has a "support" method that does create a transaction.

Using this setup a new transaction is created per batch, and each of those transactions is short-lived and small in scope. Voila, no timeout and no memory issues.
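
The splitting that the "master" method has to do can be sketched in plain Java. This is only an illustration under assumptions: BatchSplitter and forEachBatch are made-up names, not part of any EJB API, and the Consumer stands in for the call to the "support" method of EJB2. In the real setup that call has to go through the injected EJB reference, so that the container gets the chance to start a transaction for every batch. The sketch uses Java 8 syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of any EJB API.
class BatchSplitter {

    // Hands the items to batchHandler in chunks of at most batchSize
    // and returns the number of batches that were handed over.
    static <T> int forEachBatch(List<T> items, int batchSize, Consumer<List<T>> batchHandler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the slice so the handler gets a list of its own
            // instead of a view on the full data set.
            batchHandler.accept(new ArrayList<>(items.subList(from, to)));
            batches++;
        }
        return batches;
    }
}
```

In EJB1 the handler would be something along the lines of batch -> ejb2.storeBatch(batch), where ejb2 is the injected reference and storeBatch is a hypothetical business method of EJB2 that runs in its own transaction.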


Use bean-managed transactions
I don't call this a real 'solution' as the two-EJB setup I described earlier can make this work without a problem using container managed transactions. But the fact of the matter is that bean managed transactions offer you control over when a transaction starts and ends, making batching setups possible within only one EJB method. To use them, mark the bean with @TransactionManagement(TransactionManagementType.BEAN) and start and end transactions through an injected javax.transaction.UserTransaction. If you don't mind using both management types in the same application layer, by all means go for it. It saves you from having to add yet another class.


Increase timeout
Not really a stable solution, as runtime speed is determined by many factors, including machine load. You can never predict how long a certain operation is going to take; it is always going to vary. But if you can say with certainty that a single transaction is going to take AT LEAST 10 minutes, you could always increase the timeout to support such runtimes. Just don't start setting timeouts of hours or days; in such cases you really need to fix the problem at the root.

How to do that is server specific; check out my JBoss 5.1 fun facts article to learn how to do it in JBoss 5.1 and JBoss 6.

A word on manual transaction management

I don't want to end this article and leave this topic untouched. Whether you use JPA in a client application or bean managed transactions in an EJB environment, there will be times in your career when you will have to manage transactions yourself.

Managing transactions is fairly painless (one call to start it, one call to commit it or roll it back). What you should be wary of is the persistence store (what the JPA specification calls the persistence context), an invisible storage of the entities you persist. Every entity that becomes managed is added to the store, which will grow and grow in size until ultimately it can grow so big that your application runs out of memory! Before that time you'll find that persisting new entities becomes slower and slower as the store becomes bigger and bigger.

The biggest job you have while performing manual transaction management is to manage that persistence store; you will want to keep it as small as possible to keep resource usage low and keep things as speedy as possible. JPA gives you multiple tools to do just that.

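a) EntityManager.clear() will empty the persistence store and make all entities that were in it detached, even when a transaction is still active. Call flush() first if pending changes still have to reach the database.
b) EntityManager.remove() will remove the entity from the store... but also from the database! In JPA 2, EntityManager.detach() takes a single entity out of the store without touching the database.
c) EntityManager.close() should be obvious.
d) persist outside of a transaction so the entities do not become managed; this will only work if you do not have entity relations of course. Treat this one with care: it depends on the kind of entity manager you have, and a container-managed, transaction-scoped entity manager throws a TransactionRequiredException when persist() is called without a transaction.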

Out of recent personal experience I can tell you that committing a transaction will not make entities detached and will not remove them from the persistence store; they remain cached. The best strategy that you can follow when working with large volumes of entities (say in a batched insert) is to have a dedicated entity manager for each transaction you create. So create the entity manager before starting the transaction and close the entity manager after committing or rolling back the transaction. This way you mimic closely what the container does during a container managed transaction.
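
The effect of that strategy can be illustrated with a small plain Java sketch. Note the assumptions: StubEntityManager is a made-up stand-in that only mimics how a persistence store fills up, it is NOT a JPA class, and PerTransactionManagers is a hypothetical name. The comments name the real JPA calls that would take the place of the stub in an actual application.

```java
import java.util.ArrayList;
import java.util.List;

// Made-up stand-in for an entity manager and its persistence store.
class StubEntityManager {
    final List<Object> managed = new ArrayList<>();

    void persist(Object entity) { managed.add(entity); } // entity becomes managed
    void commit() { /* committing does not empty the store */ }
    void close() { managed.clear(); }                    // the store is thrown away
}

class PerTransactionManagers {

    // Inserts the items in batches with one fresh manager per batch and
    // returns the largest number of managed entities seen at any one time.
    static int insertInBatches(List<?> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int peak = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Real code: EntityManager em = emf.createEntityManager();
            StubEntityManager em = new StubEntityManager();
            try {
                // Real code: em.getTransaction().begin();
                for (Object item : items.subList(from, to)) {
                    em.persist(item);
                }
                // Real code: em.getTransaction().commit();
                em.commit();
                peak = Math.max(peak, em.managed.size());
            } finally {
                // Real code: roll back if the transaction is still active, then em.close();
                em.close();
            }
        }
        return peak;
    }
}
```

Because every batch gets its own manager, the store never holds more than one batch worth of entities, no matter how large the total data set is.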

For more information on doing manual JPA stuff outside of an EJB container, I refer to my JSF on Tomcat 7 article which also has a section on getting JPA 2 with Hibernate 4 up and running.

Conclusion

Of course, I have not covered everything there is to know about transactions, but whole books have been filled with this very subject; I aim to cover what is useful but more than the bare basics that most articles seem to restrict themselves to. For a more complete picture you should read some books. My two favorite books on the subjects touched upon in this article are Enterprise JavaBeans and Pro JPA 2. With those two on your desk you'll be up and running with EJB and JPA technology very quickly indeed.

42 comments:

  1. Thanks!! I did not know the container takes a @TransactionAttribute into account only when the EJB method is called through the interface!

  2. Yeah, that's one that can really stump you when you really need a nested transaction, especially because it may take you a while to figure out that it isn't actually happening. In the case of batching inserts for example, you'll only really notice it when you throw a big pile of data down the application for the first time.

  3. Yeah, a really good blog for understanding EJB transaction concepts.

  4. Thank you. Nice to get feedback, although I wouldn't mind hearing about parts that are unclear or information that is missing :)

  5. Thanks. It's very helpful for me.

  6. If you are using a JEE6 compliant server (Glassfish V3+, JBoss 6+, Weblogic 12+, Websphere 8+, Geronimo 3+) you also have access to singleton EJBs and asynchronous EJB invocations; both incredibly powerful tools I will some day add to the blog. Be sure to check them out.

  7. Added the section "EJB tech: making assumptions is okay!" to better illustrate how EJB transactions can help to write dumb code without excessive error handling.

  8. Pretty good write-up. I learnt a lot today from your blog. You have a knack for making things simple and funny (had me chuckling with "But it may seem like the container is on the brink of destruction when you first have to deal with this."). I would definitely buy a book if you wrote one :)

  9. Heh, no books in my near future (need to dry up behind the ears a little more). But if it's an EJB book you're looking for, I highly recommend "Enterprise JavaBeans 3.1". Its predecessor Enterprise JavaBeans 3.0 is what taught me the tricks of the trade.

  10. Hi! You say: "What you shouldn't do however is to try and force two persistence units onto the same EJB." What is the reason for this?

  11. Hi, very good article! I got here from the Stackoverflow answer http://stackoverflow.com/a/8435463/1030527.

    If you want to improve the article, I have 2 suggestions:
    1-State explicitly that EJB self-references can solve the "local method trap"
    2-In the "Use bean-managed transactions" section, perhaps add a hint to which annotations and classes are used for this. Or maybe a link to the relevant section in the "Oracle Java ee tutorial"

    Thanks again!

    Replies
    1. Ah, some much appreciated input.

      1-I'm no fan of self-references. If you need one, you likely have a design flaw in the whole "separation of concerns" scheme of things. I'm not really out to provide information just because it exists, but perhaps it is no good either to ignore its existence given the way the article lists alternatives... I'll give it some thought.

      2-The article is not about bean-managed transactions, but indeed I'll add the link to the tutorial to make it a little less of a useless Google hit if people were looking for it.

      Thanks!

    2. https://openid.stackexchange.com/

      Regarding 1: Indeed, this "pattern" feels like a kludge to me also... Here is my use-case for something along these lines:

      I need a global shared configuration that works across clusters. The configuration holds some "enabled" flags for some timer tasks among other things. Currently, this is done with a singleton, but this doesn't work in a clustered environment because one copy of each singleton exists per JVM.

      Here is the idiom I plan on using:

      // Get current AppConfig entity or create a default one if none exist.
      // Defaults to REQUIRED
      public AppConfig getCurrentConfig() {
          AppConfig result = SelectAppConfigInstanceWithHighestId();
          while (result == null && maxTries-- > 0) {
              try {
                  // Create the default configuration entity
                  persistDefaultConfig();
              } catch (Exception e) {
                  // Ignore; this should mean another thread created it
              }
              result = SelectAppConfigInstanceWithHighestId();
          }
          if (result == null) {
              throw new RuntimeException("Could not get or create AppConfig");
          }
          return result;
      }

      // Use REQUIRES_NEW so caller can catch the duplicate entity exception
      @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
      public void persistDefaultConfig() {
          AppConfig appConfig = AppConfig.getDefault();
          appConfig.setId(DEFAULT_CONFIG_ID);
          getEntityManager().persist(appConfig);
      }

      For this to work, the persistDefaultConfig method must really use REQUIRES_NEW to start a new transaction. Splitting this task across two EJBs seems very artificial as both methods are part of the logical process.

      If you know of a better way to achieve something similar, I'm all ears!

      p.s. for some reason, I get "Your OpenID credentials could not be verified." when using https://openid.stackexchange.com/ to authenticate.

    3. That is a design stemming from recovering from a multithreading collision it seems. That is a hairy area of development that is more often than not in need of a review (for example: should you be synchronizing access to getCurrentConfig() to prevent collisions altogether?)

      btw: that code does not use a self-reference; the REQUIRES_NEW is not going to do anything in the current state. But you probably knew that. Also thanks for the link to that stackoverflow page, upon further inspection I notice that this article is actually linked there as an "exceptionally good article". Such praise is a good motivator ;)

    4. This design is really in place to allow for a user-supplied AppConfig entity in the DB OR using (and persisting) a default one. I designed this to provide a cluster-wide singleton because I couldn't find other built-in and portable solutions.

      Multiple EJBs may call getCurrentConfig() at the same time and from different JVMs. These calls can therefore also call persistDefaultConfig() at the same time, but only one may succeed because the entity ID is the same. Of course, you are right and the code as posted does not use a self-reference which is necessary to achieve transaction isolation.

      I must say that the comment you referred to regarding this article is spot on! I really liked your writing style and the fact that you go beyond the basics.

  12. Hi Gimby,
    great article!
    The best one I found to the topic "Long running tasks" with EJBs.
    Thanks a lot!

    Replies
    1. You're very welcome. Do note that this article targets EJB 3.0 only; the EJB 3.1 spec has more features that can help you, like asynchronous EJB invocations. That way a long-running EJB call can become a background task.

  13. This really is the best overview of ejb transactions anywhere. thank you!

    Replies
    1. Thanks. It can still be a good deal better though, since everyone is refusing to give me constructive criticism I'll voice my own: the article is full of stupid hello world type examples. It would be better to have a more real life example, but that is not as easy to come up with as it sounds though :/

  14. Updated the section on the SUPPORTS transaction attribute to make it more to the point.

  15. Hi Gimby
    How to see transaction traces? For example, in your "The local method trap"
    how can I see that EJB container is not creating a new transaction for businessMethod2.
    I'm using weblogic to run my ejbs.

    TIA

    Replies
    1. You don't - you'll have to pay attention and know what you're doing. Failing that, hopefully you'll see unexpected behavior and make the correct interpretation.

      Perhaps WebLogic has some feature to debug transactional states; I suggest the manual or the WebLogic subforum on forums.oracle.com (in the application server category). I don't use WebLogic myself, I use JBoss.

  16. Very informative article on EJB transactions. I was googling to find out why a new transaction was not created on calling a local method that has REQUIRES_NEW. I came across this blog from this URL: http://stackoverflow.com/questions/8435318/requires-new-within-requires-new-within-requires-new-on-and-on.

    I disagree with your 'stupid hello world type examples' comment. These simple hello world examples keep us readers focused on the topic of discussion.

    As someone already asked, I would like to know why we shouldn't force two persistence units onto the same EJB.

    Replies
    1. Give me another year of research and experimentation and I'll be able to write an epic article on the depths of distributed transactions that would illustrate many why and how questions. Until then... let's leave it at keeping it simple through separation of concerns being the driving force behind that statement.

  17. Thank you very much. Your article helped me a lot. The best article I have read about transactions.

  18. Great article!!! Thanks a lot.
    Just one thing - code won't compile because of 'void':

    public void passAlongBoss(String name){
    return ejb3.findBoss(name);
    }

    Replies
    1. Fixed! Thanks for taking the time to share it.

  19. Thanks a lot, it helped me very much. Your article shows the "real world" of EJB transactions. Congratulations!

  20. Excellent article. Thanks a lot for sharing your experiences.

  21. Thank you, thank you very very much.
    This is a great article.
    You helped me a lot.

  22. Thanks a lot, it is a brilliant article.

  23. I realized one issue with the NotSupported transaction attribute. There is usually a parent transaction, which is suspended and can therefore have timed out by the time your NotSupported annotated method finishes.

    Replies
    1. That's a general issue whenever there is an outer transaction, yes.

      That is actually one thing that I don't really understand and was really surprised by actually; my understanding of "suspension" is that it wouldn't time-out anyway. But the time-out in fact still happens.

  24. There is a problem with your assessment of bean managed transactions. If you do decide to commit each persistence unit and JDBC connection manually and separately, you will not maintain transactional integrity. One part of your transaction will commit while another fails and you have to undo all the work that was committed.

    For this reason you are supposed to use the UserTransaction. Every container and transaction management system provides an implementation of the UserTransaction interface. This does not require you to worry about the low level calls to any persistence units or JDBC connections you have.

    You basically look up the transaction object in JNDI (differs for each app server). Once you have it you can then manipulate the transaction.

    You have two options, one to override the container timeout and one to set the transaction to rollback only. Then all you have to do is call the begin() method to start the transaction and either commit() or rollback() to end it. The transaction can be restarted again without a lookup if multiple transactions are required.

    I say this because in many environments large transactions are not batchable. They all have to commit/rollback together and if there is that much work it probably shouldn't be done at the EJB tier.

    So it would look like this in your EJB method:

    // The enclosing method is assumed to declare "throws Exception"
    InitialContext iCtx = new InitialContext();
    UserTransaction ut = (UserTransaction) iCtx.lookup("name/of/transaction");

    try {
        ut.setTransactionTimeout(5000); // in seconds, set before begin()
        ut.begin();

        // do some work with as many beans
        // or persistence units or jdbc connections
        // or any other XA enabled resource

        // ut.setRollbackOnly() takes no argument; call it here only if the work must not commit

        ut.commit();

        // you could begin another transaction here

    } catch (Exception e) {
        ut.rollback();
    }

    The timeout is only applicable to this transaction and will not interfere with the standard container transaction timeout for other transactions. So you can set your container to a sane value of 120 seconds or so, and when you know you have a long running transaction you can override the timeout for just that transaction. We built an estimating routine for our large transactions that estimates the timeout based on the volume of work required, so the time is dynamic for each run.

    Replies
    1. You operate under the assumption that losing transactional integrity is ALWAYS a problem. But it is not. You refer to the section about batched inserts which is exactly one case where it is not a problem and rather by design. But thanks for your addition nonetheless, the article is lacking a reference to UserTransaction.

  25. Nice, thanks for your explanation.

  26. Very clear, understandable, thanks for the article!
    I started applying the knowledge, but I'm stuck at the transactional sending of e-mails. At least a WebLogic book says that JavaMail is not transactional (Mountjoy, J., & Chugh, A. (2009). WebLogic: the definitive guide. O'Reilly.), and the mails are also not rolled back on my GlassFish server. How do you do it?
    Thanks!

    Replies
    1. ... Uh oh, I might have goofed on that. It may just be that it was something else that made it transactional. Perhaps a JBoss specific feature or JBoss Seam 2. I'd have to investigate that and perhaps fix the article.

  27. Hi Gimby,

    I have a problem with transactions. I could not find much information about the SUPPORTS transaction attribute and thought I could describe the problem here.

    I have a stateless bean which looks like this:
    @Local
    @Stateless
    @TransactionAttribute(TransactionAttributeType.SUPPORTS)
    public class MyfirstBean extends AbstractDAO {
        @EJB(name = "ejb/MySecondBean")
        MySecondBean secondBean;

        @TransactionAttribute(TransactionAttributeType.REQUIRED)
        public void update() {
            .........
            secondBean.find(1);
        }
    }

    @Local
    @Stateless
    @TransactionAttribute(TransactionAttributeType.REQUIRED)
    public class IdRecordManagerBean extends AbstractDAO {

        @TransactionAttribute(TransactionAttributeType.SUPPORTS)
        public SomeEntity find(final Long id) {
            .......................
        }
    }

    When I make a call to the update method in my first bean, it gives me this error and no real reason is mentioned in the stack trace.
    Caused by: javax.ejb.TransactionRolledbackLocalException: Client's transaction aborted

    And the stack trace clearly says there is a problem where secondBean.find(,,) is called.

    Do you know why this happens and what needs to be done to fix this issue?

    I hope to get some solution.

    Thanks

    Replies
    1. Probably you have to dig deeper; what you may be seeing is the cascaded result of an earlier exception. If not, well, I'm sorry but I can't help you solve problems I don't have access to. It will probably be a long session between you and the debugger of your IDE to narrow it down.

  28. Thanks for a precise article on EJB.
    I have got a question regarding exception handling.

    We are using CMT, no @TransactionAttribute is defined. Everything is default in the EJB.

    I want to log system exceptions in the database for a business need.
    Something like this:
    catch (Exception ex) {
    employeeService.saveDocParserExceptionInDb(transactionTO);
    }

    saveDocParserExceptionInDb is calling merge(transactionTO).


    Your explanation: after the transaction is rolled back, we cannot use persist(obj). The statement below will not work in the catch block.

    em.persist(log); // bang number 2 (this will no longer work)

    My observation: I am using merge(obj) to save the exception in the database after a transaction failure and it is working fine. Whenever an exception occurs, I am able to save the exception status in the DB.

    Could you help me to understand how this is happening? Maybe I have misunderstood the concept.

    Thanks

  29. Thanks for the awesome article!
