Sunday, June 13, 2010

maven2 - a copy pastable template

Introduction

Maven is a wonderful tool for managing your project dependencies and builds. It is also a source of great confusion and is often not used as the right tool for the job. I'll repeat what several people have already said before: Maven is NOT a build tool like ANT.

What Maven does give you is easy dependency management - instead of shipping the dependencies with your project, they are downloaded from central repositories. To make this even more manageable, you can put up a tool like Nexus, which acts like a proxy server between your local maven projects and the big bad outside world. Instead of downloading dependencies off the internet, you'll be downloading them off of Nexus (which in turn might download them from the net if it doesn't have them yet).
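
As a rough sketch of what routing downloads through Nexus can look like: a mirror entry in your Maven settings.xml (usually found in the .m2 directory of your home folder) redirects repository requests to the proxy. The id and URL below are placeholders for your own Nexus installation, not values from a real server.

```xml
<!-- ~/.m2/settings.xml -->
<settings>
  <mirrors>
    <mirror>
      <!-- hypothetical id and URL; point this at your own Nexus -->
      <id>nexus</id>
      <!-- send requests for all repositories through the proxy -->
      <mirrorOf>*</mirrorOf>
      <url>http://nexus.mycompany.com/content/groups/public</url>
    </mirror>
  </mirrors>
</settings>
```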

Maven can also be used to build, test and package your project and even take that final step of applying release management to it, creating a tag in an SVN repository and possibly putting the resulting artifacts in Nexus so it may be used in other projects.

Maven is great, but it has a learning curve. I hope to make that learning curve slightly less steep by providing a clear and logical template for maven project setups that you can copy and adapt for yourself. I have applied this structure to all projects I have done so far, from a simple library jar to a full-blown enterprise application with ear, ejb and war modules. It just works!

In this document I'll give the full JavaEE project template; you can cut out what you don't need yourself.

Repositories

To be able to work with Maven, you need repositories. Repositories hold the dependencies that Maven can then download on demand. Once downloaded, a dependency exists in your local repository, which is a directory on your hard disk. This prevents Maven from having to download dependencies over and over again, seriously speeding up your builds.

Out of the box Maven knows about the maven central repository, which is found here: http://repo1.maven.org/maven2/

I'd say about 80% of the dependencies you need can be found here. Due to licensing issues certain dependencies are blocked from entering the maven central and will exist in vendor specific repositories. Two useful ones are:

1. The java.net nexus: https://maven.java.net
2. The JBoss repository: http://repository.jboss.org/nexus/content/groups/public-jboss/

The JBoss repository is of course useful for JBoss dependencies such as JBossAS itself, JBPM, Drools, etc. There are also plenty of the more obscure dependencies to be found there. Note that this is a newer repository; there is also an older one that is no longer updated. Always check that you have the above repository configured.

The java.net nexus replaces the old java.net repository which used to be documented here. All dependencies from the old repository have been migrated to Maven central which is a -very- good thing. You should only configure the Java.net nexus if you need a dependency that is in there.

The java.net Maven documentation has a nice guide on how to set up additional repositories.
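
For illustration, adding the JBoss repository to a pom could look roughly like this. The id is an arbitrary name I picked for the example; the URL is the one listed above.

```xml
<repositories>
  <repository>
    <!-- the id is a free-form label chosen for this example -->
    <id>jboss-public</id>
    <url>http://repository.jboss.org/nexus/content/groups/public-jboss/</url>
  </repository>
</repositories>
```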


The pom structure

Maven works with files called 'poms'. These are XML-formatted metadata files that tell Maven how to do its magic and where to put files. Generally you will want to keep these files as sparse as possible; Maven favors 'convention over configuration' and it is wise to stick to the maven way of doing things, instead of enforcing your own rules. I will assume that you actually do follow the convention.

Maven poms can follow a hierarchy; this is more useful than you can imagine. I will right now tell you what hierarchy the template I am describing here will follow:

parent pom
The parent pom is the 'gatekeeper' of the project; it defines which modules are part of the project and it will define which dependencies we will be using, plus which versions of these dependencies. It is important to keep that kind of information in the parent, because otherwise you can end up with version conflicts between the modules.

Module poms
Module poms will result in the actual 'deliverables' of our project. In terms of a JavaEE project that will be an ear, zero or more EJB modules and zero or more WAR modules.


On disk, that might look like this:

project_dir
  • pom.xml {parent}
  • myejb_dir
    • src/main/java
    • pom.xml
  • mywar_dir
    • src/main/java
    • pom.xml
  • myear_dir
    • pom.xml

Quite simple, but very effective.

Groups and artifacts


To refer to dependencies, Maven defines them in 'groups', 'artifacts' and 'versions' (optionally also a classifier, but I wouldn't worry too much about that). For example, if we wanted to include JBPM in our application, we would declare a dependency like this:

<dependency>
 <groupId>org.jbpm.jbpm3</groupId>
 <artifactId>jbpm-jpdl</artifactId>
 <version>3.2.9</version>
</dependency>

How to know what the groupId and artifactId of a certain library are? I generally use the trick of looking the dependency up in the maven repository itself using google. For example searching for "maven 2 jbpm3" gives the following url as a result:

Jboss maven repository.

Now under the jbpm3/jbpm-jpdl subdirectory you'll find all the jbpm3 versions, including our 3.2.9 version. The full path thus becomes:

org/jbpm/jbpm3/jbpm-jpdl/3.2.9

To get the groupId, artifactId and version, you simply strip off elements from right to left.

version=3.2.9
artifactId=jbpm-jpdl

And what is left is the groupId, replacing the slashes with dots.

groupId=org.jbpm.jbpm3

If you know this little trick, it is quite easy! All you need is the name of the API you want to use and Google.


the parent pom


Through the parent pom we will do dependency management, and more specifically we'll manage the versions of those dependencies. But the minimal thing we'll do is define what modules our project is built out of.

First of all, the basic header that all poms should have.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany.myproject</groupId>
  <artifactId>myproject-parent</artifactId>
  <name>Project name</name>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>

Here we define our groupId and the artifactId of the parent. For our modules we'll use the same groupId, but different artifactIds of course.

The version follows a maven standard: a SNAPSHOT release is an 'in development' release; remove the -SNAPSHOT for a stable production release. (the maven release management functions can automatically do that).

The 'pom' packaging type we specify here basically refers to the fact that this is a parent pom - it doesn't lead to a specific product, it is used to manage modules.


Next, we define which modules our project has.

  <modules>
    <module>myproject-ejb</module>
    <module>myproject-web</module>
    <module>myproject-ear</module>
  </modules>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.mycompany.myproject</groupId>
        <artifactId>myproject-ejb</artifactId>
        <version>1.0.0-SNAPSHOT</version>
        <type>ejb</type>
      </dependency>
      <dependency>
        <groupId>com.mycompany.myproject</groupId>
        <artifactId>myproject-web</artifactId>
        <version>1.0.0-SNAPSHOT</version>
        <type>war</type>
      </dependency>
      <!-- more dependencies here -->
    </dependencies>
  </dependencyManagement>

Three modules, as discussed. This not only tells maven which modules are part of the project, but also in which subdirectories the modules will be stored - the subdirectories must match the names as they are written here.

The dependencyManagement section basically says 'if you adopt me in your project, you are taking these specific versions of these dependencies and their transitive dependencies with you as well'. That's why we don't list the EAR module there; that module is of no consequence to the outside world.

Finally, we declare which dependencies we want to manage for our project. You will typically put all dependencies here that you want to use in your EJB, WAR and EAR modules. In place of the 'more dependencies here' comment above, put the dependencies, versions and default scope of everything your project will use. Example:

<!-- more dependencies here -->
    ...
    <dependency>
      <groupId>org.jbpm.jbpm3</groupId>
      <artifactId>jbpm-jpdl</artifactId>
      <version>3.2.9</version>
    </dependency>
    </dependencies>
  </dependencyManagement>
</project>

Note that declaring the dependencies here (in the dependencyManagement section) will not actually add them to the project dependencies yet; think of these declarations only as 'in this project, the modules might use these libraries and I want them all to use these specific versions, plus I want to have them at this scope by default'.

Dependency scope

You can provide another dependency configuration element: the scope. In my own projects I tend to use three scopes:

compile (default)
This means that the dependency is a transitive dependency needed at compile time AND runtime of the application. Generally this will cause the dependency to be packaged with your application, depending on what the artifact type is. Since this is the default scope, when you don't provide scope information the dependency will be in the compile scope.

provided
This scope tells maven 'I need this dependency during compile time, but during runtime it will already be in the working environment'. A good example of this is the JavaEE dependency: when you deploy your web application the server will already have the libraries needed, so you certainly never ever want to deploy dependencies such as the servlet-api with your application. The provided scope can help you to manage this.

test
The test scope is similar to the provided scope, except that the libraries are only put on the classpath when compiling and running JUnit or TestNG unit tests. Dependencies such as JUnit/TestNG itself and mock APIs used for testing purposes belong in the test scope.

Managing and maintaining scopes is a piece of responsibility you HAVE to take when you start to use Maven; putting libraries on the wrong scope can have annoying or even disastrous results. Minimally it will lead to classpath pollution. Please, take the time and effort to put the dependencies on the right scope. You only have to provide that information in your parent pom, your modules will adopt it.

<dependency>
      <groupId>org.testng</groupId>
      <artifactId>testng</artifactId>
      <version>5.9</version>
      <scope>test</scope>
    </dependency>
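
To illustrate the provided scope as well, here is a sketch of how the servlet-api could be managed in the parent pom. The version is picked purely for illustration; use the one your application server actually implements.

```xml
<!-- in the parent pom, inside dependencyManagement/dependencies -->
<dependency>
  <groupId>javax.servlet</groupId>
  <artifactId>servlet-api</artifactId>
  <!-- version chosen for the example, not prescribed by this template -->
  <version>2.5</version>
  <!-- needed to compile, but supplied by the server at runtime -->
  <scope>provided</scope>
</dependency>
```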


the EJB pom

EJB and WAR poms are very much alike. I'll illustrate the EJB pom and then list what is different for the WAR pom.

First of all, the common header.

<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <artifactId>myproject-ejb</artifactId>
  <groupId>com.mycompany.myproject</groupId>
  <name>Myproject EJB layer</name>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>ejb</packaging>

This should speak for itself. We are building an EJB jar, so that's the packaging type. For ease, I would keep the groupId the same as the one of your parent (in fact, use the same groupId for every module).

Now we add a new bit to the POM header.

<parent>
    <artifactId>myproject-parent</artifactId>
    <groupId>com.mycompany.myproject</groupId>
    <version>1.0.0-SNAPSHOT</version>
  </parent>

In the parent we say which modules are part of the project; in the modules we explicitly say which parent the module has, plus its version. This strongly ties the parent to the module and will allow you to link to any specific versions of the project's artifacts (as long as the versions end up in some global repository, such as a Nexus).

The minimum you'll want to add to the modules is which dependencies they have. As discussed, because you have already defined the dependencies and the default scope in the parent pom, you don't need to specify any versions or scope in your modules.

<dependencies>
   <dependency>
      <groupId>org.jbpm.jbpm3</groupId>
      <artifactId>jbpm-jpdl</artifactId>
   </dependency>
   ...
 </dependencies>

Note that the tag name is dependencies, NOT dependencyManagement. Everything that you declare in a dependencies block is an actual dependency of the module. That's the difference:

dependencyManagement: these dependencies I don't want to include right now, but I am going to use them somewhere and I want to use these versions.
dependencies: these dependencies I am going to use right here in this module.

Besides dependencies, you may also want to declare build properties in your EJB/WAR poms, such as the following:

<build>
    <finalName>myproject-ejb</finalName>
    <sourceDirectory>src/main/java</sourceDirectory>
    <testSourceDirectory>src/test/java</testSourceDirectory>
    <defaultGoal>install</defaultGoal>
    <resources>
      <resource>
        <filtering>true</filtering>
        <directory>src/main/resources</directory>
      </resource>
    </resources>
    <testResources>
      <testResource>
        <filtering>true</filtering>
        <directory>src/test/resources</directory>
      </testResource>
      <testResource>
        <directory>src/main/java</directory>
      </testResource>
    </testResources>
  </build>


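Now this build definition is largely unnecessary, as it mostly restates the default conventions that Maven already applies (the source and resource directories, for example). The parts that do differ from the defaults are the finalName without a version number, the default goal and the resource filtering. Any maven project will have its source files in the following structure: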

PROJECT_DIR
/pom.xml
/myproject-ejb/pom.xml
/myproject-ejb/src/main/java
/myproject-ejb/src/main/resources
/myproject-ejb/src/test/java
/myproject-ejb/src/test/resources
/myproject-web/pom.xml
/myproject-web/src/main/java
/myproject-web/src/main/webapp
/myproject-web/src/main/resources
/myproject-web/src/test/java
/myproject-web/src/test/resources
/myproject-ear/pom.xml


So there is a split between application source files and test source files. Also there is a logical split between source files (java) and any other type of file such as images and configuration files (resources).

Even if you are not using maven, this is a very logical, flat and easy path structure to use. I use it for all my projects now, even ones I don't manage with Maven.

One last thing. By default Maven assumes EJB 2.1 spec. You will want to override that by manually declaring the EJB plugin in your parent pom, like this:

<build>
  <plugins>
    <plugin>
      <artifactId>maven-ejb-plugin</artifactId>
      <inherited>true</inherited>
      <configuration>
        <ejbVersion>3.1</ejbVersion>
      </configuration>
    </plugin>
    
  </plugins>
</build>

We'll cover more plugins a little later.

Now you could have slapped packaging type 'jar' on your EJB module and been done with it. There is a good reason to make your poms 'by the book' however: configuring your EJB module with packaging type ejb can influence tools built on top of Maven. A good example is the m2eclipse plugin for Eclipse, which synchronizes your Eclipse project settings with the project poms. If you neglect the packaging type, the project won't receive the EJB facet; if you neglect the plugin definition that specifies the exact version, the plugin will configure the wrong version of the facet.

You can read more about Maven in Eclipse in my article on the subject.

the WAR module


The WAR module is very much the same as the EJB module, with some notable differences.

<packaging>war</packaging>

Easy enough, we'll be creating a war file so that is the packaging type.

<dependencies>
    <dependency>
      <groupId>com.mycompany.myproject</groupId>
      <artifactId>myproject-ejb</artifactId>
      <type>ejb</type>
      <scope>provided</scope>
    </dependency>
    ...
  </dependencies>

Here the EJB module is defined as a dependency in the WAR module - this should make sense by now; you'll want to use the EJBs in your WAR module, so you'll want them on the WAR classpath during compile time. To not confuse maven however, you have to declare the dependency as provided - after all, the EJB jar will certainly be there when you deploy your application!
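
Put together, a minimal WAR pom might look like the sketch below. It assumes the parent and EJB coordinates used throughout this article; the name is just a label I made up. The groupId and version are left out here because a module inherits them from its parent when they are not declared.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>

  <parent>
    <artifactId>myproject-parent</artifactId>
    <groupId>com.mycompany.myproject</groupId>
    <version>1.0.0-SNAPSHOT</version>
  </parent>

  <!-- groupId and version are inherited from the parent -->
  <artifactId>myproject-web</artifactId>
  <name>Myproject web layer</name>
  <packaging>war</packaging>

  <dependencies>
    <!-- the EJB jar is supplied through the EAR, hence provided -->
    <dependency>
      <groupId>com.mycompany.myproject</groupId>
      <artifactId>myproject-ejb</artifactId>
      <type>ejb</type>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
```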


the EAR module

The EAR module consists of nothing but a pom that will tell the maven EAR plugin how to generate it.

<packaging>ear</packaging>

Standard header (including the parent declaration), only we are packaging an EAR.

<build>
    <finalName>myapp</finalName>
    <defaultGoal>install</defaultGoal>
    <plugins>
      <plugin>
        <artifactId>maven-ear-plugin</artifactId>
        <configuration>
          <displayName>Myproject EAR module</displayName>
          <description>Myproject EAR module</description>
          <defaultLibBundleDir>lib</defaultLibBundleDir>
          <modules>
            <ejbModule>
              <groupId>com.mycompany.myproject</groupId>
              <artifactId>myproject-ejb</artifactId>
            </ejbModule>
            <webModule>
              <groupId>com.mycompany.myproject</groupId>
              <artifactId>myproject-web</artifactId>
              <contextRoot>/myproject</contextRoot>
            </webModule>
            ...
          </modules>
        </configuration>
      </plugin>
   </plugins>
  </build>

Basically this piece of configuration defines how the application.xml of the EAR will be generated. The defaultLibBundleDir must not be forgotten; if you leave it out all dependencies will end up in the root of the ear, where they will NOT end up on the classpath of the EAR by default. They must go to the lib subdirectory for that to happen.

As you can see, you define the modules of the application. For EJB modules you will get some generated name (as opposed to the finalName you gave the EJB, for some strange reason); to force a specific filename on the EJB you can add the bundleFileName element, like this:

<ejbModule>
     <groupId>com.mycompany.myproject</groupId>
     <artifactId>myproject-ejb</artifactId>
     <bundleFileName>myproject-ejb.jar</bundleFileName>
   </ejbModule>

You can do the same for war modules, although in my experience war modules do adopt their finalName in the EAR.
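
Should you need it, the war variant would look something like this. The file name is only an example value.

```xml
<webModule>
  <groupId>com.mycompany.myproject</groupId>
  <artifactId>myproject-web</artifactId>
  <contextRoot>/myproject</contextRoot>
  <!-- example file name; pick whatever the war should be called inside the EAR -->
  <bundleFileName>myproject-web.war</bundleFileName>
</webModule>
```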


You must also declare the modules as dependencies, to get the dependency management going:

<dependencies>
    <dependency>
      <groupId>com.mycompany.myproject</groupId>
      <artifactId>myproject-ejb</artifactId>
      <type>ejb</type>
    </dependency>
    <dependency>
      <groupId>com.mycompany.myproject</groupId>
      <artifactId>myproject-web</artifactId>
      <type>war</type>
    </dependency>
    ...
  </dependencies>
</project>

It seems a little excessive to have to do this - I'm sure the Maven developers had a good reason though.

And that's basically it, you now have working templates for creating a standard JEE application!


Transitive dependencies revisited

Dependency management is the main feature of Maven, and we're not quite there yet with our understanding of it. Let me go a little deeper.

You know that dependencies at the compile scope are transitive dependencies. But what does that mean to Maven?

A lot actually! Let's create an example to show just how important it is to properly manage your dependencies and their scopes. Say we want to use JBPM in our EJB module. This means that when we deploy our EAR, the JBPM jar needs to end up in the lib subdirectory of the EAR for it to work during runtime.

To let Maven do that for us, do the following. In your parent pom, activate dependency management on JBPM:

<!-- in the parent pom -->
<dependencyManagement>
  <dependencies>
    ...
    <dependency>
      <groupId>org.jbpm.jbpm3</groupId>
      <artifactId>jbpm-jpdl</artifactId>
      <version>3.2.9</version>
    </dependency>
  </dependencies>
</dependencyManagement>

We've already seen that. We have now told maven that whenever we refer to JBPM, we want version 3.2.9 of it and we want it to be on the compile scope by default.

Now in your EJB, add JBPM to the dependency list:

<!-- in the EJB pom -->
  <dependencies>
   <dependency>
      <groupId>org.jbpm.jbpm3</groupId>
      <artifactId>jbpm-jpdl</artifactId>
   </dependency>
   ...
 </dependencies>

Also nothing new. But now we've made JBPM a transitive dependency of the EJB module - Maven now knows: whenever you add the EJB module as a dependency, JBPM has to be deployed with it!

And that is exactly what happens when we put the EJB in our EAR, like this:

<!-- in the EAR pom -->
  <dependencies>
    <dependency>
      <groupId>com.mycompany.myproject</groupId>
      <artifactId>myproject-ejb</artifactId>
      <type>ejb</type>
    </dependency>
    ...
  </dependencies>

And right there it happens! The EJB is a dependency of the EAR module, so Maven will now automatically package the transitive JBPM dependency with your EAR, putting it in the lib subdirectory (remember: because of the defaultLibBundleDir element we added to the configuration). It is very important that it does this, as all jars in the lib subdirectory of an EAR are automatically picked up by the server; you don't have to configure anything more to make it happen. Slick!

Similarly, transitive dependencies of a WAR module will automatically be put in the WEB-INF/lib directory by Maven when you let it build the war file. It doesn't get any better than that I'd say.

Sharing dependencies between the EJB and the WAR

It might happen you need the same library in an EJB module and in a WAR module. Because the library is a transitive dependency of the EJB, it will end up in the EAR as I discussed in the previous paragraph.

Fact of the matter is, all EAR libraries are also visible to the WAR modules, so letting the library be put in the WAR also is overkill. The solution?

Declare the dependency as provided in the WAR, overriding the compile scope as it is determined by the parent. In this case 'provided' is an accurate description, because the jar will be provided to the WAR through the EAR file. That cleans up a duplicate library mess.

That is a basic rule by the way: scope and version of a dependency CAN be overridden in module poms by adding version and scope tags to the dependency. You should limit this to cases where it is really necessary, like this particular one. In a sane project environment only the parent will contain version declarations.
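
Using the JBPM dependency from earlier as the shared library, the override in the WAR pom could be sketched like this:

```xml
<!-- in the WAR pom -->
<dependencies>
  <dependency>
    <groupId>org.jbpm.jbpm3</groupId>
    <artifactId>jbpm-jpdl</artifactId>
    <!-- overrides the default compile scope managed in the parent;
         the version is still taken from the parent -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```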


Controlling compiler settings

The convention over configuration concept that Maven follows has resulted in many configuration options not being present in the template I have put before you thus far. You can have fine-grained control over many of the maven plugins that are available. Most of them you can easily work out from the online maven documentation; to give you an idea I'll show you how to control compiler settings. For example, let's see how you can make Maven output Java 5 compatible classes. You can declare this in the parent so all the modules adopt it.

<build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
      ...
    </plugins>
  </build>
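
As an alternative sketch, the compiler plugin also reads its source and target settings from user properties, so the following in the parent pom should have the same effect. Treat it as an option to verify against the compiler plugin version you are using, not as part of the template itself.

```xml
<properties>
  <!-- picked up by the maven-compiler-plugin as its source and target defaults -->
  <maven.compiler.source>1.5</maven.compiler.source>
  <maven.compiler.target>1.5</maven.compiler.target>
</properties>
```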

Check out the maven documentation to know which plugins are available and what configuration properties they have!

So what if you don't create a JEE project?


It's flexible, Bill.

Say you want to create a maven project for a library - one jar, nothing more, nothing less. You could define a parent and a single jar module, or you could skip the parent altogether and just create a single pom with jar packaging. It's up to you - just be sure to think about what you are doing. Maven is a wonderful tool, but it isn't a magic wand that can do everything. ANT however can do a lot more, and did you know you can invoke ANT scripts from within Maven?
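
A single-pom library project could be as small as the sketch below. The coordinates and name are made up for the example, and JUnit is included only to show a test-scoped dependency.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <!-- hypothetical coordinates for a standalone library -->
  <groupId>com.mycompany.mylibrary</groupId>
  <artifactId>mylibrary</artifactId>
  <name>My library</name>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <!-- no parent, so versions and scopes are declared right here -->
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
```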


Closing thoughts: Fine grained dependency management


The dependency management system of Maven is fantastic. But it can also lead to problems. Once you start to attach dependencies to your project, you'll find that additional dependencies start popping up all over the place. These are transitive dependencies of your own transitive dependencies - sometimes they are correct runtime dependencies of the libraries you use, other times they are the result of not defining the proper scopes. JBPM is a good example of this; when you include it in your project you'll automatically get such dependencies as Lucene and Jackrabbit as a gift, while they are absolutely not needed during runtime!

You'll want to keep your classpath clean, and that will take a little micromanagement. When you declare a dependency in your parent pom, you can also add exclusions to the dependency. It's easy, like this:

<dependencies>
    <dependency>
      <groupId>org.jbpm.jbpm3</groupId>
      <artifactId>jbpm-jpdl</artifactId>
      <version>3.2.9</version>
      <exclusions>
       <exclusion>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
       </exclusion>
      </exclusions>
    </dependency>
    ...
  </dependencies>

It's annoying, but usually you'll only have to figure this out once and then it will become a copy-paste action among your projects. The question is, how do you figure out where all the transitive dependencies are coming from? Maven comes to the rescue. Simply invoke the following command under your parent or a module:

mvn dependency:tree

When invoked from the parent, that will display all dependencies in all modules in a nice tree structure, showing exactly where all the dependencies and their transitive dependencies are coming from. If you invoke this under a module you'll only get the dependencies in that module.


That's it!

I haven't even covered half of what is possible using Maven in this article. That was not my intention either. I hope I have given you a clean and easy to use template for your own projects that you can copy and abuse.

When I have found some dependable webspace I'll put up a download of a 'complete' maven project. In a future article I'll also describe release management using Maven - it would have made this article too confusing.

6 comments:

  1. Great article.
    We use Maven for all projects, but so far I've only tweaked some existing configurations without much understanding. Your overview helped me grasp the bigger picture.

  2. I'm glad. Maven is a pain to learn, but it is a great benefit to know the gritty details of it. If properly used it can save lots of time, and understanding the poms can help you to setup the entire application framework in under 15 minutes.

  3. Good stuff. Please make sure to correctly write in cAmElCaSe, write artifactId instead of artifactid

  4. Fixed.

    You make a blog post about a copy/pastable template and then of course you introduce a copy/paste mistake :) It was a lesson to teach how evil copy/pasting is! Really!

  5. updated java.net repository information after the migration of download.java.net to Maven central.

  6. EJB module example was using jar packaging type; switched that to ejb packaging type (and added the ejb plugin declaration to force a specific spec version) which will prevent conflicts with IDE integration.
