Monday, May 17, 2010

Alfresco: Installation and Initial Thoughts

We are currently in the final stretch of delivering a product that combines the Drupal CMS with a Java-based publishing system. It has been a long and painful process getting to this point. One of the things that contributed to the pain is the opacity of the Drupal code, compounded by the fact that we don't have much Drupal/PHP talent in-house. So at one point, I wondered whether the process would have been easier had we chosen a Java-based CMS instead.

One well-known and fairly mature open-source Java-based CMS is Alfresco. I decided to check it out and use it to build something similar to our Drupal-based product. What's the objective of this apparently pointless exercise, you ask? Well, it's mainly to learn about Alfresco and see how it compares to Drupal; really just curiosity. And no, in case you were wondering, it's not so I can swap out the Drupal component in the product for one based on Alfresco; that would be too risky, at least at this stage.

According to this somewhat controversial InfoWorld article, Alfresco scores better than Drupal. However, the jury still seems to be out on that.

I think the best way to decide is to figure this out for myself. Prior to working with Drupal, I didn't really know what to look for in a CMS. Not that I know everything there is to know about this even now, but here is my set of "required features" for a CMS.

  • Custom Content - the ability to define custom content types specific to the application.
  • Profile - the ability to store user profile information, which may not be natively supported in the CMS user object. I mention this separately because the user object is usually distinct from a content object.
  • Import Content and Users - there should be some sort of API so I can import content and users from an external (possibly XML) source.
  • Users and Roles - the CMS should support multiple users with different roles.
  • Workflow - documents will have to pass through multiple reviews before being published.
  • Relate content - a document in the CMS may be associated with zero or more other documents in the CMS.
  • Taxonomy - a document may be associated with multiple taxonomy vocabularies. The associations could be 1:1 or 1:n.
  • Enter/Maintain content - there should be a UI in order for users to enter new content and maintain existing content.
  • Interface to Publisher - should be able to send publish/unpublish commands to the current publisher interface.

With Drupal, we had the benefit of a consultant who helped us out with the installation, setup and initial learning curve. So I may be a bit biased towards Drupal because to me it is "simpler". However, with Alfresco, I am more familiar with the components used to build it (Spring, Lucene, Hibernate, JCR), so perhaps the bias will cancel itself out.

Since I am not an Alfresco expert, I plan on spending the next several weeks working through the various "requirements" and seeing how hard or easy each one is compared to Drupal (much of this is already implemented in our Drupal instance, mostly by our consultant and some by me). By the end of this, I hope to know enough to customize an Alfresco instance to a set of semi-realistic base requirements.

This week, all I've been able to do is set up the Alfresco ECM client, which is basically a web application running on Tomcat. I've also set up an Eclipse project to hold my customizations to the base Alfresco package, so that it behaves more like our Drupal installation. Both are described below.

Alfresco ECM Client setup

Prebuilt packages exist for doing 1-click installs of Alfresco on Windows, Linux and MacOS. However, these packages bundle their own Tomcat and MySQL, and I wanted to use the Tomcat that ships with my Mac (which I later ended up upgrading for a different reason) and the MySQL that I had downloaded earlier as part of MAMP for Drupal. So I skipped the prebuilt packages.

I initially checked out the project from SVN but was unable to build it, so I downloaded the latest stable WARs instead and dropped the alfresco.war file into Tomcat's webapps folder. On startup, it complained about various things:

  • Tomcat running out of PermGen - the fix was to set a higher value for PermGen space based on this Alfresco wiki page.
  • Missing ImageMagick and swftools - Alfresco complains about not finding these, so I needed to install them (sudo port install) and update the repository.xml file to point to the correct locations for these two tools.
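To confirm that the larger PermGen setting actually took effect, something like the sketch below can list the JVM's non-heap memory pools and their configured limits, using only the standard java.lang.management API. It reports on the JVM it runs in, so to check Tomcat itself the same calls would have to run inside the web application (from a JSP, say). On the Java 6 JVMs of the time the pool shows up with a name like "PS Perm Gen"; from Java 8 onwards PermGen was replaced by Metaspace, so expect that name instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Reports the configured maximum of each non-heap memory pool in the current JVM.
class PermGenCheck {

    // Pool name -> max size in MB, or -1 if the JVM reports no limit for that pool.
    static Map<String, Long> nonHeapPoolMaxMb() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP || !pool.isValid()) {
                continue; // only non-heap pools (PermGen / Metaspace / code cache) matter here
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue;
            }
            long max = usage.getMax(); // -1 means "undefined" per the MemoryUsage contract
            result.put(pool.getName(), max < 0 ? -1L : max / (1024 * 1024));
        }
        return result;
    }

    public static void main(String[] args) {
        nonHeapPoolMaxMb().forEach((name, mb) ->
            System.out.println(name + ": " + (mb < 0 ? "unbounded" : mb + " MB")));
    }
}
```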

I also had to create the alfresco database in MySQL and grant the alfresco user the appropriate rights as defined here.

At the end of a fairly long startup (the first time round, the database gets populated with the tables and the alf_data directory gets created and initialized), I was rewarded with a working Alfresco client at http://localhost:8080/alfresco.

My first impression of the Alfresco ECM user interface is that it's horribly complex compared to Drupal's, but that could just be my unfamiliarity with it.

Customization Project setup (Eclipse)

I followed Jeff Potts's Alfresco Developer Guide (see References below) almost to the letter while setting up my client customizations project. That way I could use the build.xml file from the book's code download. The build zips up the project files and unpacks them into the exploded Alfresco web application in Tomcat.
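For anyone who wants to see what that zip-and-unpack step amounts to without reading Ant, here is a rough Java equivalent. This is a hypothetical sketch, not the book's build.xml: the mapping of config/ to WEB-INF/classes/ and src/web/ to the webapp root is my assumption based on the directory layout below, and compiling src/java into a JAR is left out entirely.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical equivalent of the build's zip-then-unpack step (folder mapping is assumed).
class ExtensionDeployer {

    // Zips config/ under WEB-INF/classes/ and src/web/ at the root of zipFile.
    static void zipProject(Path projectRoot, Path zipFile) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            addTree(zos, projectRoot.resolve("config"), "WEB-INF/classes/");
            addTree(zos, projectRoot.resolve(Paths.get("src", "web")), "");
        }
    }

    // Adds every regular file under dir to the zip, with entry names prefixed by prefix.
    private static void addTree(ZipOutputStream zos, Path dir, String prefix) throws IOException {
        if (!Files.isDirectory(dir)) {
            return;
        }
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path file : files) {
            String name = prefix + dir.relativize(file).toString().replace('\\', '/');
            zos.putNextEntry(new ZipEntry(name));
            Files.copy(file, zos);
            zos.closeEntry();
        }
    }

    // Unpacks zipFile over the exploded webapp, overwriting files that already exist.
    static void unzipInto(Path zipFile, Path webappDir) throws IOException {
        Path root = webappDir.toAbsolutePath().normalize();
        try (ZipInputStream zis = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) { // guard against "../" entries
                    throw new IOException("Entry escapes webapp dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    // Usage: java ExtensionDeployer <projectRoot> <tomcat>/webapps/alfresco
    public static void main(String[] args) {
        try {
            Path zip = Files.createTempFile("alfresco-ext", ".zip");
            zipProject(Paths.get(args[0]), zip);
            unzipInto(zip, Paths.get(args[1]));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Unpacking over the exploded webapp (rather than rebuilding alfresco.war) keeps the customizations separate from the stock distribution, at the cost of having to redeploy them whenever the webapp is re-exploded.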

The directory structure for the project is as follows:

 PROJECT_ROOT
  |
  +-- src
  |   |
  |   +-- java
  |   |
  |   +-- web
  |       |
  |       +-- META-INF
  |       |
  |       +-- jsp
  |       |   |
  |       |   +-- extension
  |       |
  |       +-- mycompany
  |
  +-- config
       |
       +-- alfresco
            |
            +--- extension

I also downloaded the Alfresco JAR files into a lib directory outside the project, so they don't get copied into the project's ZIP file.

Initial Thoughts

Drupal appears to be more "complete" and intuitive (at least to my web developer intuitions). You can configure it to add your customizations, use its forms interface to generate content, and even use it to power your site's dynamic content pages, all in the same application. From my initial skimming of Munwar Sharif's book (see References below), the Alfresco ECM can do most of what I want from it; I just don't know how to do it yet. However, the general recommendation is to build a separate custom application for the CMS users, communicating with Alfresco's repository over REST/SOAP. For my web users, I would want the application to be decoupled from the ECM anyway, so the absence of a web front-end in Alfresco is a non-issue for me.
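To get a feel for what the REST side of such a custom application might look like, here is a minimal client sketch using Java 11's java.net.http. It assumes the repository's login web script at /alfresco/service/api/login, which (as far as I understand it) takes u and pw parameters and returns a ticket wrapped in a ticket element that later calls can pass as alf_ticket. Verify the exact URLs against the web scripts deployed on your own instance before relying on this.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Minimal client for Alfresco's REST web scripts; the endpoint path is an assumption.
class AlfrescoRestClient {
    private final String baseUrl; // e.g. "http://localhost:8080/alfresco"
    private final HttpClient http = HttpClient.newHttpClient();

    AlfrescoRestClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Assumed login web script: GET {base}/service/api/login?u=...&pw=...
    String loginUrl(String user, String password) {
        return baseUrl + "/service/api/login?u=" + encode(user) + "&pw=" + encode(password);
    }

    // Extracts the ticket from a response of the form <ticket>TICKET_...</ticket>.
    static String parseTicket(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        if (!"ticket".equals(root.getTagName())) {
            throw new IllegalArgumentException("Unexpected root element: " + root.getTagName());
        }
        return root.getTextContent().trim();
    }

    // Logs in and returns the ticket; later requests would append ?alf_ticket=<ticket>.
    String login(String user, String password) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(loginUrl(user, password))).GET().build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Login failed with HTTP " + response.statusCode());
        }
        return parseTicket(response.body());
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Against a local install this would be used as `new AlfrescoRestClient("http://localhost:8080/alfresco").login("admin", "admin")`, assuming the default admin credentials have not been changed.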

Drupal also has a lot more documentation freely available on the Internet. This is probably just because Alfresco is a younger project and it's relatively harder to get into, so fewer people are writing about it. However, there are at least two excellent books about it (see References below), which I suspect I will get much more familiar with over the coming weeks :-).

References

  • Alfresco Enterprise Content Management Implementation - by Munwar Sharif. I've just started reading this, so I don't have much to say about it yet.
  • Alfresco Developer Guide - by Jeff Potts. I've gone through this once already. There is an enormous amount of information in here, which I haven't digested completely either. Hopefully, as I work through my use cases, I will understand more.

Update - 2010-06-04

I wanted a way to override the repository.xml settings using my custom alfresco-global.properties, so I followed the instructions here and here, but had no luck. I ended up adding these properties to the exploded repository.properties file instead. Not clean, but it works. It's probably as much work to maintain a custom version of alfresco.war as it is to maintain a Tomcat instance customized for Alfresco.
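The behaviour I was after boils down to simple property layering: any key set in the override file wins, and anything it omits falls back to the shipped defaults. The sketch below shows that idea with plain java.util.Properties. It is not Alfresco's own loading code (which goes through its Spring configuration), and the key names and values are illustrative only, so check the real names in your repository.properties.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Sketch of property layering: override entries win, missing keys fall back to defaults.
class LayeredProperties {

    static Properties load(Reader defaultsSource, Reader overridesSource) throws IOException {
        Properties defaults = new Properties();
        defaults.load(defaultsSource);
        // getProperty() consults 'defaults' only when a key is absent from 'layered'.
        Properties layered = new Properties(defaults);
        layered.load(overridesSource);
        return layered;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative keys only, standing in for repository.properties and
        // alfresco-global.properties respectively.
        String repository = "img.root=./ImageMagick\nswf.exe=./bin/pdf2swf\ndir.root=./alf_data\n";
        String global = "img.root=/opt/local\nswf.exe=/opt/local/bin/pdf2swf\n";
        Properties p = load(new StringReader(repository), new StringReader(global));
        System.out.println(p.getProperty("img.root")); // /opt/local (overridden)
        System.out.println(p.getProperty("dir.root")); // ./alf_data (inherited default)
    }
}
```

Note that the fallback only applies through getProperty(); a plain get() on the layered object ignores the defaults, which is an easy way to get confusing results.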

3 comments:

Anonymous said...

Hi Sujit,

We just started exploring Drupal and Alfresco. Our team develops apps using Java, so a Java-based CMS would be better for us. I noticed from your blog posts that you have experience using both Drupal and Alfresco. Would you choose Alfresco over Drupal if you were to begin your project now (knowing the pros and cons of both)?

Thanks!

Sujit Pal said...

I would prefer Alfresco, but I think it depends on your situation.

Drupal is very popular, and hence has many third-party add-on modules available, so you can potentially just "assemble" your Drupal installation to do what you want without any programming. Alfresco is not as popular so you will most likely have to build in some features. But if you are a Java shop, it shouldn't be too hard to do.

So if your application doesn't really need features beyond what is already available via add-on modules /and/ you are able to interface the rest of your (presumably Java) application with it in some manner, then Drupal would make sense. To check the first condition, it may make sense to hire a Drupal consultant to build you a custom solution based on your requirements, which you can then play with to see if it works for you.

[Please ignore my last comment - I have deleted it, and I was endorsing Drupal when I meant Alfresco]

Anonymous said...

Thanks for your input. BTW, I bookmarked your blog -- it has a wealth of information!