Saturday, June 12, 2010

Alfresco: Importing Content using NodeService and Friends

This week, I load the three blogs and their associated posts from their Atom feed XML files into Alfresco. For those of you following along, I realize that my progress has been slow, almost glacial. But hopefully, today is the last of the "boring" setup stuff, and I can move on to exploring how to customize Alfresco for the application.

With Drupal, I followed the reverse approach - I started writing my custom module against a stock Drupal instance, logging in as admin. That process did move faster initially, but I then had to modify the original code (multiple times) as more modules were brought into the picture to do certain tasks that we hadn't thought of initially. So I wanted to do it "right" (or something approximating it) this time. From my initial experiences, I am beginning to think it will take the same time regardless of the approach chosen.

On the bright side, though, I have gained a lot of familiarity with the Alfresco Foundation API, which will hopefully help me with the customizations later. I found the Node Reference cookbook, and the pages on Search and Alfresco's Full Text Lucene Query syntax, particularly helpful while working through this.

Loading Blogs

Before I started loading the blogs (all 3 of them :-)), I decided to create a constants interface similar to the Alfresco ContentModel, so we can refer to the type names, property names, etc., by their symbolic names rather than by strings; it's just a bit less error-prone that way. So here is my MyContentModel interface. All it does is list the various types, properties, aspects and associations in my content model (described here).

// Source: src/java/com/mycompany/alfresco/extension/model/MyContentModel.java
package com.mycompany.alfresco.extension.model;

import org.alfresco.service.namespace.QName;

/**
 * Defines constants for the MyCompany content model.
 */
public interface MyContentModel {

  public static final String NAMESPACE_MYCOMPANY_CONTENT_MODEL = 
    "http://www.mycompany.com/model/content/1.0";
  
  // Types
  public static final String TYPE_BASE_DOCUMENT_STR = "baseDoc";
  public static final QName TYPE_BASE_DOCUMENT = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, TYPE_BASE_DOCUMENT_STR);
  public static final String TYPE_BLOG_STR = "blog";
  public static final QName TYPE_BLOG = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, TYPE_BLOG_STR);
  public static final String TYPE_PUBLISHABLE_DOC_STR = "publishableDoc";
  public static final QName TYPE_PUBLISHABLE_DOC = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, TYPE_PUBLISHABLE_DOC_STR);
  public static final String TYPE_POST_STR = "post";
  public static final QName TYPE_POST = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, TYPE_POST_STR);
  
  // Aspects
  public static final String ASPECT_TAGCLASSIFIABLE_STR = "tagClassifiable";
  public static final QName ASPECT_TAGCLASSIFIABLE = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, ASPECT_TAGCLASSIFIABLE_STR);
  
  // Properties
  public static final String PROP_BLOGNAME_STR = "blogname";
  public static final QName PROP_BLOGNAME = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_BLOGNAME_STR);
  public static final String PROP_BYLINE_STR = "byline";
  public static final QName PROP_BYLINE = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_BYLINE_STR);
  public static final String PROP_USER_STR = "name";
  public static final QName PROP_USER = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_USER_STR);
  public static final String PROP_PUBDATE_STR = "pubDate";
  public static final QName PROP_PUBDATE = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_PUBDATE_STR);
  public static final String PROP_PUBSTATE_STR = "pubState";
  public static final QName PROP_PUBSTATE = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_PUBSTATE_STR);
  public static final String PROP_PUBDTTM_STR = "pubDttm";
  public static final QName PROP_PUBDTTM = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_PUBDTTM_STR);
  public static final String PROP_UNPUBDTTM_STR = "unpubDttm";
  public static final QName PROP_UNPUB_DTTM = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_UNPUBDTTM_STR);
  public static final String PROP_FURL_STR = "furl";
  public static final QName PROP_FURL = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_FURL_STR);
  public static final String PROP_BLOGREF_STR = "blogRef";
  public static final QName PROP_BLOGREF = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_BLOGREF_STR);
  public static final String PROP_TAGS_STR = "tags";
  public static final QName PROP_TAGS = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, PROP_TAGS_STR);
  
  // Associations
  public static final String ASSOC_POSTS_STR = "posts";
  public static final QName ASSOC_POSTS = QName.createQName(
    NAMESPACE_MYCOMPANY_CONTENT_MODEL, ASSOC_POSTS_STR);
}
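To make it clearer what these constants boil down to, here is a minimal sketch that runs without Alfresco on the classpath. It uses the JDK's javax.xml.namespace.QName purely as a stand-in for org.alfresco.service.namespace.QName (an assumption for illustration; the two classes are unrelated), and the class name QNameSketch and the helper typeQuery() are hypothetical. The point is only that a qualified name prints as {namespace}localName, which, as far as I can tell, is the same form a Lucene TYPE clause expects.

```java
import javax.xml.namespace.QName;

class QNameSketch {

  // Same namespace URI as in MyContentModel above
  static final String NS = "http://www.mycompany.com/model/content/1.0";

  // JDK QName used as a stand-in for Alfresco's QName (assumption)
  static final QName TYPE_POST = new QName(NS, "post");

  /** Builds a Lucene TYPE clause from a qualified name (hypothetical helper). */
  static String typeQuery(QName type) {
    return "TYPE:\"" + type.toString() + "\"";
  }

  public static void main(String[] args) {
    // Prints the {namespace}localName form, then the query clause built from it
    System.out.println(TYPE_POST);
    System.out.println(typeQuery(TYPE_POST));
  }
}
```

Defining each name once and building strings from it means that a typo shows up as a compile error rather than as a query that silently matches nothing.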

The next step is to define holder beans (POJOs) to hold the Blog data. Two beans are defined (BaseDoc and Blog) in order to mirror the inheritance in the content model. The beans are just data holders for the parsing step, but I think they help because they pull in the properties contributed by the various aspects (found by looking through the contentModel.xml file). The getters and setters are omitted to keep the size of this (already rather long) post down a bit; hopefully it's easy enough to have your IDE generate them.

// Source: src/java/com/mycompany/alfresco/extension/model/BaseDoc.java
package com.mycompany.alfresco.extension.model;

import java.util.Date;

/**
 * POJO to model a base document (abstract).
 */
public class BaseDoc {

  // from cm:object
  private String name;
  // from cm:content
  private String content;

  // from cm:auditable aspect
  private Date created;
  private String creator;
  private Date modified;
  private String modifier;
  // from cm:ownable aspect
  private String owner;
  // from cm:author aspect, although I don't see where it's coming from...
  private String author;
  ...  
}

// Source: src/java/com/mycompany/alfresco/extension/model/Blog.java
package com.mycompany.alfresco.extension.model;

import org.apache.commons.lang.builder.ReflectionToStringBuilder;
import org.apache.commons.lang.builder.ToStringStyle;

/**
 * POJO to model a my:blog content.
 */
public class Blog extends BaseDoc {

  private String blogname;
  private String byline;
  // reference to Person
  private String personRef;
  ...  
}

To import, we parse the three XML files and extract the feed summary information and populate the blog bean. We then use the Alfresco foundation API to log in as each of our three bloggers, and write out a profile.html file (and its associated metadata) into their home directories. The code is shown below. Note that I had to go through multiple runs to get this right, so there is also a deleteAllFiles() method to allow me to go back to a good state.

// Source: src/java/com/mycompany/alfresco/extension/loaders/BlogImporter.java
package com.mycompany.alfresco.extension.loaders;

import java.io.FileInputStream;
import java.io.Serializable;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.transaction.UserTransaction;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

import org.alfresco.model.ContentModel;
import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.repository.ChildAssociationRef;
import org.alfresco.service.cmr.repository.ContentService;
import org.alfresco.service.cmr.repository.ContentWriter;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchService;
import org.alfresco.service.cmr.security.AuthenticationService;
import org.alfresco.service.cmr.security.PersonService;
import org.alfresco.service.namespace.QName;
import org.alfresco.service.transaction.TransactionService;
import org.alfresco.util.ApplicationContextHelper;
import org.apache.commons.lang.StringUtils;
import org.apache.commons.lang.WordUtils;
import org.junit.Test;
import org.springframework.context.ApplicationContext;

import com.mycompany.alfresco.extension.model.Blog;
import com.mycompany.alfresco.extension.model.MyContentModel;

/**
 * Parse and import blog level information.
 */
public class BlogImporter {

  private ServiceRegistry serviceRegistry;
  
  public void init() {
    ApplicationContext ctx = ApplicationContextHelper.getApplicationContext();
    this.serviceRegistry = 
      (ServiceRegistry) ctx.getBean(ServiceRegistry.SERVICE_REGISTRY);
  }
  
  public Blog parse(String author) throws Exception {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader parser = factory.createXMLStreamReader(
      new FileInputStream("/Users/sujit/Projects/Alfresco/" + 
      author + "_atom.xml"));
    Blog blog = new Blog();
    Date now = new Date();
    blog.setCreated(now);
    blog.setModified(now);
    blog.setCreator(author);
    blog.setModifier(author);
    blog.setOwner(author);
    blog.setAuthor(StringUtils.join(new String[] {
      WordUtils.capitalize(author), "Blogger"
    }, " "));
    blog.setName("profile.html"); // special file in each home dir
    for (;;) {
      int evt = parser.next();
      if (evt == XMLStreamConstants.END_DOCUMENT) {
        break;
      }
      if (evt == XMLStreamConstants.START_ELEMENT) {
        String tag = parser.getName().getLocalPart();
        if ("title".equals(tag)) {
          blog.setBlogname(parser.getElementText());
        } else if ("name".equals(tag)) {
          blog.setByline(parser.getElementText());
        } else if ("subtitle".equals(tag)) {
          blog.setContent(parser.getElementText());
        } else if ("entry".equals(tag)) {
          break;
        }
      }
    }
    parser.close();
    return blog;
  }
  
  public void doImport(String author, Blog blog) throws Exception {
    // login as author
    AuthenticationService authService = 
      serviceRegistry.getAuthenticationService();
    authService.authenticate(author, author.toCharArray());
    String username = authService.getCurrentUserName();
    String ticket = authService.getCurrentTicket();
    System.out.println("Logged in as: " + username + " with ticket: " + ticket);
    // start a transaction
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    try {
      // find the home folder for the user
      PersonService personService = serviceRegistry.getPersonService();
      NodeService nodeService = serviceRegistry.getNodeService();
      NodeRef personRef = personService.getPerson(username);
      NodeRef homeDirRef = (NodeRef) nodeService.getProperty(
        personRef, ContentModel.PROP_HOMEFOLDER);
      // create the profile node reference
      NodeRef profileRef = nodeService.createNode(homeDirRef, 
        ContentModel.ASSOC_CONTAINS, 
        QName.createQName(MyContentModel.NAMESPACE_MYCOMPANY_CONTENT_MODEL, 
        blog.getName()),
        MyContentModel.TYPE_BLOG).getChildRef();
      // add the properties
      Map<QName,Serializable> props = new HashMap<QName,Serializable>();
      props.put(ContentModel.PROP_NAME, blog.getName());
      props.put(MyContentModel.PROP_BLOGNAME, blog.getBlogname());
      props.put(MyContentModel.PROP_BYLINE, blog.getByline());
      props.put(MyContentModel.PROP_USER, personRef);
      nodeService.setProperties(profileRef, props);
      // write the content out - this is done after setting properties
      // since Alfresco cannot associate the content with a filename 
      // unless it gets the name property!
      ContentService contentService = serviceRegistry.getContentService();
      ContentWriter writer = contentService.getWriter(
        profileRef, ContentModel.PROP_CONTENT, true);
      writer.setMimetype("text/html");
      writer.putContent(blog.getContent());
      // add the aspects
      Map<QName,Serializable> ownableProps = new HashMap<QName,Serializable>();
      ownableProps.put(ContentModel.PROP_OWNER, blog.getOwner());
      nodeService.addAspect(
        profileRef, ContentModel.ASPECT_OWNABLE, ownableProps);
      Map<QName,Serializable> auditableProps = 
        new HashMap<QName,Serializable>();
      auditableProps.put(ContentModel.PROP_CREATOR, blog.getCreator());
      auditableProps.put(ContentModel.PROP_CREATED, blog.getCreated());
      auditableProps.put(ContentModel.PROP_MODIFIER, blog.getModifier());
      auditableProps.put(ContentModel.PROP_MODIFIED, blog.getModified());
      nodeService.addAspect(
        profileRef, ContentModel.ASPECT_AUDITABLE, auditableProps);
      Map<QName,Serializable> authorProps = new HashMap<QName,Serializable>();
      authorProps.put(ContentModel.PROP_AUTHOR, blog.getAuthor());
      nodeService.addAspect(
        profileRef, ContentModel.ASPECT_AUTHOR, authorProps);
      tx.commit();
    } catch (Exception e) {
      tx.rollback();
      throw e;
    }
    // log out
    authService.invalidateTicket(ticket);
    authService.clearCurrentSecurityContext();
  }
  
  public void deleteAllFiles(String author) throws Exception {
    AuthenticationService authService = 
      serviceRegistry.getAuthenticationService();
    authService.authenticate("admin", "admin".toCharArray());
    String ticket = authService.getCurrentTicket();
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    try {
      SearchService searchService = serviceRegistry.getSearchService();
      NodeService nodeService = serviceRegistry.getNodeService();
      ResultSet results = null;
      try {
        results = searchService.query(
          StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, 
          SearchService.LANGUAGE_LUCENE, 
          "PATH:\"/app:company_home/app:user_homes/sys:" + 
          WordUtils.capitalize(author) + "\"");
        NodeRef homeRef = results.getChildAssocRef(0).getChildRef();
        List<ChildAssociationRef> caRefs = nodeService.getChildAssocs(homeRef);
        for (ChildAssociationRef caRef : caRefs) {
          NodeRef fileRef = caRef.getChildRef();
          System.out.println("deleting noderef: " + fileRef);
          // sys:temporary makes deleteNode() bypass the archive store
          nodeService.addAspect(fileRef, ContentModel.ASPECT_TEMPORARY, null);
          nodeService.deleteNode(fileRef);
        }
      } finally {
        if (results != null) { results.close(); }
      }
      tx.commit();
    } catch (Exception e) {
      // roll back so a failed delete does not leave the transaction open
      tx.rollback();
      throw e;
    }
    // log out
    authService.invalidateTicket(ticket);
    authService.clearCurrentSecurityContext();
  }
  
  @Test
  public void testImport() throws Exception {
    BlogImporter importer = new BlogImporter();
    importer.init();
    String[] authors = new String[] {"happy", "grumpy", "bashful"};
    for (String author : authors) {
      Blog blog = importer.parse(author);
      importer.doImport(author, blog);
//      importer.deleteAllFiles(author);
    }
  }
}

Once this ran successfully, the Alfresco web client showed the profile file in each blogger's home directory. Here is the screenshot from blogger "happy".
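Since I needed several runs to get the import right, it helps to be able to try the feed-header parsing on its own, without a repository running. The following is a minimal sketch of the same StAX loop as in BlogImporter.parse(), reading from an in-memory string instead of a file. The class name FeedHeaderSketch, the parseHeader() method and the sample feed contents are all made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class FeedHeaderSketch {

  /** Returns {title, author name, subtitle} from the part of the feed before the first entry. */
  static String[] parseHeader(String atomXml) {
    String[] header = new String[3];
    try {
      XMLStreamReader parser = XMLInputFactory.newInstance()
        .createXMLStreamReader(new StringReader(atomXml));
      while (parser.hasNext()) {
        if (parser.next() != XMLStreamConstants.START_ELEMENT) {
          continue;
        }
        String tag = parser.getLocalName();
        if ("entry".equals(tag)) {
          break; // the header ends where the first post begins
        } else if ("title".equals(tag)) {
          header[0] = parser.getElementText();
        } else if ("name".equals(tag)) {
          header[1] = parser.getElementText();
        } else if ("subtitle".equals(tag)) {
          header[2] = parser.getElementText();
        }
      }
      parser.close();
    } catch (XMLStreamException e) {
      throw new IllegalArgumentException("feed is not well-formed XML", e);
    }
    return header;
  }

  public static void main(String[] args) {
    // Hypothetical feed, shaped like the header of an Atom export
    String feed =
      "<feed xmlns='http://www.w3.org/2005/Atom'>"
      + "<title>Happy's Blog</title>"
      + "<subtitle>Notes from a cheerful blogger</subtitle>"
      + "<author><name>Happy</name></author>"
      + "<entry><title>First post</title></entry>"
      + "</feed>";
    String[] header = parseHeader(feed);
    System.out.println(header[0] + " | " + header[1] + " | " + header[2]);
  }
}
```

Stopping at the first entry element matters here, because each entry has its own title and author name, which would otherwise overwrite the blog-level values.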

Loading Posts

Loading the posts is similar to loading blogs, just on a (slightly) larger scale. As before, we parse the XML files into a Post bean. Per our content model, the Post bean inherits from PublishableDoc, which inherits from BaseDoc (already shown above). So here are the beans. As before, the getters and setters are omitted; just use your IDE to generate them.

// Source: src/java/com/mycompany/alfresco/extension/model/PublishableDoc.java
package com.mycompany.alfresco.extension.model;

import java.util.Date;

/**
 * POJO to model a Publishable Document (abstract).
 */
public class PublishableDoc extends BaseDoc {

  private String pubState;
  private Date pubDttm;
  private Date unpubDttm;
  private String furl;
  ...  
}

// Source: src/java/com/mycompany/alfresco/extension/model/Post.java
package com.mycompany.alfresco.extension.model;

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.builder.ReflectionToStringBuilder;
import org.apache.commons.lang.builder.ToStringStyle;

/**
 * POJO to model a Post content object.
 */
public class Post extends PublishableDoc {

  private String blogName;
  // from aspect my:tagClassifiable
  private List<String> categoryNames = new ArrayList<String>();
  // from aspect cm:titled
  private String title;
  private String description;
  ...  
}
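One wrinkle in filling the created and modified dates of a Post is the timestamp format. The timestamps in my feeds end in a UTC offset written with a colon (for example -07:00), while the Z pattern letter of SimpleDateFormat expects -0700. Here is a minimal sketch of the workaround, which simply drops the last colon before parsing. It assumes every timestamp ends in a numeric offset of that form; the class name AtomDateSketch, the method parseAtomDate() and the sample timestamp are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

class AtomDateSketch {

  /** Drops the colon in the trailing offset, so "-07:00" becomes "-0700". */
  static String removeColonInOffset(String atomDate) {
    int idx = atomDate.lastIndexOf(':');
    return atomDate.substring(0, idx) + atomDate.substring(idx + 1);
  }

  /** Parses a timestamp such as 2010-06-12T10:15:30.000-07:00. */
  static Date parseAtomDate(String atomDate) {
    // SimpleDateFormat is not thread-safe, so build a fresh one per call
    SimpleDateFormat format =
      new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
    try {
      return format.parse(removeColonInOffset(atomDate));
    } catch (ParseException e) {
      throw new IllegalArgumentException("unexpected date: " + atomDate, e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical timestamp in the same shape as the feed's dates
    Date created = parseAtomDate("2010-06-12T10:15:30.000-07:00");
    System.out.println(created.getTime());
  }
}
```

On Java 7 or later, the pattern letters XXX accept the colon form directly, which would make the string surgery unnecessary.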

And here is the code to parse each file, extract a List of Post objects, then log into Alfresco as that user and load the posts into the user's home folder.

// Source: src/java/com/mycompany/alfresco/extension/loaders/PostImporter.java
package com.mycompany.alfresco.extension.loaders;

import java.io.FileInputStream;
import java.io.Serializable;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.transaction.UserTransaction;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

import org.alfresco.model.ContentModel;
import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.repository.ChildAssociationRef;
import org.alfresco.service.cmr.repository.ContentService;
import org.alfresco.service.cmr.repository.ContentWriter;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchService;
import org.alfresco.service.cmr.security.AuthenticationService;
import org.alfresco.service.cmr.security.PersonService;
import org.alfresco.service.namespace.QName;
import org.alfresco.service.transaction.TransactionService;
import org.alfresco.util.ApplicationContextHelper;
import org.apache.commons.lang.StringUtils;
import org.junit.Test;
import org.springframework.context.ApplicationContext;

import com.mycompany.alfresco.extension.model.MyContentModel;
import com.mycompany.alfresco.extension.model.Post;

public class PostImporter {
  
  private static final SimpleDateFormat ATOM_DATE_FORMATTER = 
    new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
  
  private ServiceRegistry serviceRegistry;
  private Map<String,NodeRef> categoryNodeMap;
  private Map<String,NodeRef> authorBlogMap;
  
  public void init() throws Exception {
    ApplicationContext ctx = ApplicationContextHelper.getApplicationContext();
    serviceRegistry = (ServiceRegistry) ctx.getBean(
      ServiceRegistry.SERVICE_REGISTRY);
    categoryNodeMap = loadCategoryRefs();
    authorBlogMap = loadAuthorBlogMap();
  }

  public List<Post> parse(String author) throws Exception {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    XMLStreamReader parser = factory.createXMLStreamReader(
      new FileInputStream("/Users/sujit/Projects/Alfresco/" + 
      author + "_atom.xml"));
    List<Post> posts = new ArrayList<Post>();
    Post post = null;
    boolean inPost = false;
    for (;;) {
      int evt = parser.next();
      if (evt == XMLStreamConstants.END_DOCUMENT) {
        break;
      }
      switch (evt) {
        case XMLStreamConstants.START_ELEMENT: {
          String tag = parser.getLocalName();
          if ("entry".equals(tag)) {
            post = new Post();
            // we can set a few things directly from our metadata
            post.setOwner(author);
            post.setCreator(author);
            post.setModifier(author);
            post.setBlogName(author); // use to look up node reference
            post.setPubState("Draft"); // to begin with
            // publish/unpublish in future, set to null
            post.setPubDttm(null);
            post.setUnpubDttm(null);
            inPost = true;
          } else if (inPost && "published".equals(tag)) {
            post.setCreated(ATOM_DATE_FORMATTER.parse(
              removeColonInOffset(parser.getElementText())));
          } else if (inPost && "updated".equals(tag)) {
            post.setModified(ATOM_DATE_FORMATTER.parse(
              removeColonInOffset(parser.getElementText())));
          } else if (inPost && "category".equals(tag)) {
            int nattrs = parser.getAttributeCount();
            for (int i = 0; i < nattrs; i++) {
              String attrName = parser.getAttributeLocalName(i);
              if ("term".equals(attrName)) {
                // use terms later to look up the appropriate noderef
                post.getCategoryNames().add(parser.getAttributeValue(i));
              }
            }
          } else if (inPost && "title".equals(tag)) {
            post.setTitle(parser.getElementText());
          } else if (inPost && "content".equals(tag)) {
            String content = parser.getElementText();
            post.setContent(content);
            post.setDescription(extractTeaser(content));
          } else if (inPost && "summary".equals(tag)) {
            // grumpy's Atom feed contains summaries...oh well
            String content = parser.getElementText();
            post.setContent(content);
            post.setDescription(extractTeaser(content));
          } else if (inPost && "name".equals(tag)) {
            post.setAuthor(parser.getElementText());
          } else if (inPost && "link".equals(tag)) {
            // make sure this is the link tag with @rel='alternate'
            // and extract the furl and title from it
            int nattrs = parser.getAttributeCount();
            boolean isRelLink = false;
            String href = null;
            String title = null;
            for (int i = 0; i < nattrs; i++) {
              String attrName = parser.getAttributeLocalName(i);
              if ("href".equals(attrName)) {
                href = parser.getAttributeValue(i);
              } else if ("title".equals(attrName)) {
                title = parser.getAttributeValue(i);
              } else if ("rel".equals(attrName)) {
                isRelLink = "alternate".equals(parser.getAttributeValue(i));
              }
            }
            if (isRelLink) {
              // keep the <title> element's text if the link has no title
              if (title != null) {
                post.setTitle(title);
              }
              post.setName(lastPartOfUrl(href));
              post.setFurl(StringUtils.join(new String[] {
                MyContentModel.TYPE_POST_STR,
                author,
                post.getName()
              }, "/"));
            }
          }
          break; // without this, START_ELEMENT falls through to END_ELEMENT
        }
        case XMLStreamConstants.END_ELEMENT: {
          String tag = parser.getLocalName();
          if ("entry".equals(tag)) {
            if (post.getName() != null) {
              posts.add(post);
            }
            inPost = false;
          }
          break;
        }
      }
    }
    parser.close();
    return posts;
  }
  
  public void doImport(String author, List<Post> posts) throws Exception {
    // login as author
    System.out.println("logging in as " + author);
    AuthenticationService authService = 
      serviceRegistry.getAuthenticationService();
    authService.authenticate(author, author.toCharArray());
    String username = authService.getCurrentUserName();
    String ticket = authService.getCurrentTicket();
    // start a transaction
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    // find the home folder for the user
    PersonService personService = serviceRegistry.getPersonService();
    NodeService nodeService = serviceRegistry.getNodeService();
    NodeRef personRef = personService.getPerson(username);
    NodeRef homeDirRef = (NodeRef) nodeService.getProperty(
      personRef, ContentModel.PROP_HOMEFOLDER);
    NodeRef blogRef = authorBlogMap.get(author);
    try {
      for (Post post : posts) {
        // create the post reference
        NodeRef postRef = nodeService.createNode(homeDirRef, 
          ContentModel.ASSOC_CONTAINS, 
          QName.createQName(MyContentModel.NAMESPACE_MYCOMPANY_CONTENT_MODEL, 
          post.getName()),
          MyContentModel.TYPE_POST).getChildRef();
        // setup the properties
        Map<QName,Serializable> props = new HashMap<QName,Serializable>();
        props.put(ContentModel.PROP_NAME, post.getName());
        props.put(MyContentModel.PROP_BLOGREF, blogRef);
        props.put(MyContentModel.PROP_FURL, post.getFurl());
        props.put(MyContentModel.PROP_PUBSTATE, post.getPubState());
        // null for now, but keep them as placeholders
        props.put(MyContentModel.PROP_PUBDTTM, post.getPubDttm());
        props.put(MyContentModel.PROP_UNPUB_DTTM, post.getUnpubDttm());
        nodeService.setProperties(postRef, props);
        // write out the content
        ContentService contentService = serviceRegistry.getContentService();
        ContentWriter writer = contentService.getWriter(
          postRef, ContentModel.PROP_CONTENT, true);
        writer.setMimetype("text/html");
        writer.putContent(post.getContent());
        // add the aspects
        // cm:auditable
        Map<QName,Serializable> auditableProps = 
          new HashMap<QName,Serializable>();
        auditableProps.put(ContentModel.PROP_CREATOR, post.getCreator());
        auditableProps.put(ContentModel.PROP_CREATED, post.getCreated());
        auditableProps.put(ContentModel.PROP_MODIFIER, post.getModifier());
        auditableProps.put(ContentModel.PROP_MODIFIED, post.getModified());
        nodeService.addAspect(
          postRef, ContentModel.ASPECT_AUDITABLE, auditableProps);
        // cm:ownable
        Map<QName,Serializable> ownableProps = new HashMap<QName,Serializable>();
        ownableProps.put(ContentModel.PROP_OWNER, post.getOwner());
        nodeService.addAspect(
          postRef, ContentModel.ASPECT_OWNABLE, ownableProps);
        // cm:author
        Map<QName,Serializable> authorProps = new HashMap<QName,Serializable>();
        authorProps.put(ContentModel.PROP_AUTHOR, post.getAuthor());
        nodeService.addAspect(
          postRef, ContentModel.ASPECT_AUTHOR, authorProps);
        // cm:titled
        Map<QName,Serializable> titledProps = new HashMap<QName,Serializable>();
        titledProps.put(ContentModel.PROP_TITLE, post.getTitle());
        titledProps.put(ContentModel.PROP_DESCRIPTION, post.getDescription());
        nodeService.addAspect(postRef, ContentModel.ASPECT_TITLED, titledProps);
        // my:tagClassifiable
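        // collect all matching tags into one list; putting each NodeRef
        // under the same key would keep only the last one
        // (this assumes my:tags is declared as a multi-valued property)
        Map<QName,Serializable> categoryProps = 
          new HashMap<QName,Serializable>();
        ArrayList<NodeRef> tagRefs = new ArrayList<NodeRef>();
        for (String categoryName : post.getCategoryNames()) {
          NodeRef categoryNodeRef = categoryNodeMap.get(categoryName);
          if (categoryNodeRef != null) {
            tagRefs.add(categoryNodeRef);
          }
        }
        categoryProps.put(MyContentModel.PROP_TAGS, tagRefs);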
        nodeService.addAspect(postRef, 
          MyContentModel.ASPECT_TAGCLASSIFIABLE, categoryProps);
//        // add association from blog ref to this post
//        System.out.println("add backlink to blog");
//        nodeService.createAssociation(
//          blogRef, postRef, MyContentModel.ASSOC_POSTS);
      }
      tx.commit();
    } catch (Exception e) {
      tx.rollback();
      throw e;
    }
    // log out
    authService.invalidateTicket(ticket);
    authService.clearCurrentSecurityContext();
  }
  
  public void deleteAllPosts() throws Exception {
    AuthenticationService authService = 
      serviceRegistry.getAuthenticationService();
    authService.authenticate("admin", "admin".toCharArray());
    String ticket = authService.getCurrentTicket();
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    try {
      SearchService searchService = serviceRegistry.getSearchService();
      ResultSet resultSet = null;
      try {
        resultSet = searchService.query(
          StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, 
          SearchService.LANGUAGE_LUCENE, 
          "TYPE:\"" + MyContentModel.TYPE_POST.toString() + "\"");
        NodeService nodeService = serviceRegistry.getNodeService();
        for (ChildAssociationRef caref : resultSet.getChildAssocRefs()) {
          NodeRef postRef = caref.getChildRef();
          Map<QName,Serializable> props = nodeService.getProperties(postRef);
          String filename = (String) props.get(ContentModel.PROP_NAME);
          nodeService.addAspect(postRef, ContentModel.ASPECT_TEMPORARY, null);
          nodeService.deleteNode(postRef);
          System.out.println("Deleting file: " + filename);
        }
      } finally {
        if (resultSet != null) { resultSet.close(); }
      }
      tx.commit();
    } catch (Exception e) {
      tx.rollback();
      throw e;
    }
    authService.invalidateTicket(ticket);
    authService.clearCurrentSecurityContext();
  }
  
  private String extractTeaser(String content) {
    if (content.indexOf('<') < 0 && content.indexOf('>') < 0) {
      // plain text already
      return StringUtils.abbreviate(content, 250);
    } else {
      String plainText = content.replaceAll("<.*?>", "").
        replaceAll("\n+", "...");
      return StringUtils.abbreviate(plainText, 250);
    }
  }

  private String removeColonInOffset(String atomDate) {
    return StringUtils.join(new String[] {
      StringUtils.substring(atomDate, 0, atomDate.lastIndexOf(':')),
      StringUtils.substring(atomDate, atomDate.lastIndexOf(':') + 1)
    }, "");
  }

  private String lastPartOfUrl(String url) {
    return StringUtils.substring(url, url.lastIndexOf('/') + 1);
  }

  private Map<String,NodeRef> loadCategoryRefs() throws Exception {
    Map<String,NodeRef> categoryNodeMap = new HashMap<String,NodeRef>();
    AuthenticationService authService = 
      serviceRegistry.getAuthenticationService();
    authService.authenticate("admin", "admin".toCharArray());
    String ticket = authService.getCurrentTicket();
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    try {
      String queryString = 
        "PATH:\"cm:generalclassifiable/cm:Tags/" + 
        "cm:MyCompany_x0020_Post_x0020_Tags\"";
      SearchService searchService = serviceRegistry.getSearchService();
      NodeService nodeService = serviceRegistry.getNodeService();
      ResultSet resultSet = null;
      try {
        resultSet = searchService.query(
          StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, 
          SearchService.LANGUAGE_LUCENE, queryString);
        NodeRef myCompanyTagsRef = resultSet.getChildAssocRef(0).getChildRef();
        for (ChildAssociationRef caref : 
            nodeService.getChildAssocs(myCompanyTagsRef)) {
          NodeRef nodeRef = caref.getChildRef();
          Map<QName,Serializable> props = nodeService.getProperties(nodeRef);
          String name = (String) props.get(ContentModel.PROP_NAME);
          categoryNodeMap.put(name, nodeRef);
        }
      } finally {
        if (resultSet != null) { resultSet.close(); }
      }
      tx.commit();
    } catch (Exception e) {
      tx.rollback();
      throw e;
    }
    authService.invalidateTicket(ticket);
    authService.clearCurrentSecurityContext();
    return categoryNodeMap;
  }

  private Map<String,NodeRef> loadAuthorBlogMap() throws Exception {
    Map<String,NodeRef> authorBlogMap = new HashMap<String,NodeRef>();
    AuthenticationService authService = 
      serviceRegistry.getAuthenticationService();
    authService.authenticate("admin", "admin".toCharArray());
    String ticket = authService.getCurrentTicket();
    TransactionService txService = serviceRegistry.getTransactionService();
    UserTransaction tx = txService.getUserTransaction();
    tx.begin();
    try {
      String queryString = "+TYPE:\"" + 
        MyContentModel.TYPE_BLOG.toString()  + "\"";
      SearchService searchService = serviceRegistry.getSearchService();
      NodeService nodeService = serviceRegistry.getNodeService();
      ResultSet resultSet = null;
      try {
        resultSet = searchService.query(
          StoreRef.STORE_REF_WORKSPACE_SPACESSTORE, 
          SearchService.LANGUAGE_LUCENE, queryString);
        for (NodeRef nodeRef : resultSet.getNodeRefs()) {
          Map<QName,Serializable> props = nodeService.getProperties(nodeRef);
          String owner = (String) props.get(ContentModel.PROP_OWNER);
          authorBlogMap.put(owner, nodeRef);
        }
      } finally {
        if (resultSet != null) { resultSet.close(); }
      }
      tx.commit();
    } catch (Exception e) {
      tx.rollback();
      throw e;
    }
    authService.invalidateTicket(ticket);
    authService.clearCurrentSecurityContext();
    return authorBlogMap;
  }
  
  @Test
  public void testImport() throws Exception {
    PostImporter importer = new PostImporter();
    importer.init();
    importer.deleteAllPosts();   
    String[] authors = new String[] {"happy", "grumpy", "bashful"};
    for (String author : authors) {
      List<Post> posts = importer.parse(author);
      importer.doImport(author, posts);
    }
  }
}
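
The removeColonInOffset() helper in the listing is easier to follow in isolation. Below is a minimal standalone sketch of the same idea using only the JDK. It assumes the Atom timestamps look like 2010-06-12T10:15:00.000-07:00 and that the colon is stripped so that SimpleDateFormat's Z pattern can read the offset; the class name AtomDates and the sample timestamp are made up for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Standalone sketch; AtomDates is a hypothetical name, not part of PostImporter.
final class AtomDates {

  private AtomDates() {}

  // Same idea as removeColonInOffset() in the listing: drop the last colon,
  // turning an offset such as -07:00 into -0700.
  static String removeColonInOffset(String atomDate) {
    int idx = atomDate.lastIndexOf(':');
    return atomDate.substring(0, idx) + atomDate.substring(idx + 1);
  }

  // Parses a timestamp of the assumed form yyyy-MM-dd'T'HH:mm:ss.SSS+hh:mm.
  static Date parse(String atomDate) {
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
    try {
      return fmt.parse(removeColonInOffset(atomDate));
    } catch (ParseException e) {
      throw new IllegalArgumentException("Not in the assumed format: " + atomDate, e);
    }
  }

  public static void main(String[] args) {
    String sample = "2010-06-12T10:15:00.000-07:00";  // made-up sample value
    System.out.println(removeColonInOffset(sample));
    System.out.println(parse(sample).getTime());
  }
}
```

Note that this only works when the timestamp ends in a numeric offset. A timestamp ending in Z would lose a colon from its time part instead, so such values would need separate handling.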

As before, getting this right took multiple iterations, so there is a deleteAllPosts() method that removes every post in the repository before each run. Once the import completed, I could see a bunch of content created in each home directory. Here is another screenshot after the PostImporter ran.

Two things I could not figure out, which I kind of gave up on, at least for the moment.

  1. Unlike the blog filename (which appears as profile.html), the post names are the actual node references. I am setting the name (based on the title) in exactly the same way as when importing blogs, so I am not sure what I am doing wrong. Thinking through this a bit, though, it's probably not a big deal, since in my implementation the blogger will never see this screen. As for the admin, the association type from the home directory does have the file name, so he will be able to navigate to the correct document from the node browser. Besides, when saving from a remote client, we will deal with blobs of content rather than files, so it probably makes sense to store it using its UUID anyway.
  2. I could not associate the post back to the blog using the my:posts association. Alfresco would give me errors saying that the mapping I set up was incorrect. I changed the mapping to mimic one from the examples, but I still have this problem. I can work around this by leaving it as is and setting up the association in the application, but I would like to solve it. I think it's something to do with the way I am calling the nodeService.createAssociation() method.

Under the Hood

There were other gotchas that I tripped up on when loading these two entities. I have a few answers, but also a few unanswered questions.

One time I ended up with a mismatch between the database and the Lucene index, because my initial code did not use transactions and it crashed midway. So I had to manually clean up the database. The (I believe outdated) database schema was quite helpful for this one.
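
The begin/commit/rollback pattern that now appears in every method of the importer can be factored into a small wrapper. The sketch below is not Alfresco code: Tx is a hypothetical stand-in for the begin(), commit() and rollback() methods of UserTransaction, and RecordingTx is a fake used only to show the order of calls. Alfresco also ships a RetryingTransactionHelper that serves a similar purpose, which I have not used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Standalone sketch of the transaction wrapper pattern; all names are illustrative.
final class TxSketch {

  private TxSketch() {}

  /** Hypothetical stand-in for the three UserTransaction methods used in the importer. */
  interface Tx {
    void begin() throws Exception;
    void commit() throws Exception;
    void rollback() throws Exception;
  }

  /** Runs the work inside a transaction, rolling back and rethrowing on any exception. */
  static <T> T inTransaction(Tx tx, Callable<T> work) throws Exception {
    tx.begin();
    try {
      T result = work.call();
      tx.commit();
      return result;
    } catch (Exception e) {
      tx.rollback();
      throw e;
    }
  }

  /** Fake transaction that only records which methods were called. */
  static final class RecordingTx implements Tx {
    final List<String> calls = new ArrayList<String>();
    public void begin() { calls.add("begin"); }
    public void commit() { calls.add("commit"); }
    public void rollback() { calls.add("rollback"); }
  }

  public static void main(String[] args) throws Exception {
    RecordingTx ok = new RecordingTx();
    inTransaction(ok, () -> "done");
    System.out.println(ok.calls);

    RecordingTx failed = new RecordingTx();
    try {
      inTransaction(failed, () -> { throw new IllegalStateException("crash midway"); });
    } catch (IllegalStateException expected) {
      System.out.println(failed.calls);
    }
  }
}
```

With a wrapper like this, a crash in the middle of the work rolls the transaction back, so the database and the index should not be left half-updated.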

My Alfresco data.dir has many Lucene indexes, probably because I was not closing the ResultSet returned from the SearchService.query() call initially. Hoping to clean it up, I deleted the lucene-indexes directory, set index.recovery.mode to FULL and restarted, but got errors on Alfresco startup, so I restored the original indexes.

Another thing I was trying to do was to find the profile.html page with a combination query (e.g., +TYPE:"my:blog" +@cm\:owner:"happy"), but I would get 0 results from the node browser as well as from my code. This leads me to believe that @cm:owner is probably not recorded in the Lucene index, even though Alfresco claims that all fields are by default indexed and stored unless specifically requested otherwise.
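
When the same query is built in Java, the backslash before the colon has to be doubled inside the string literal, which is an easy place to slip. Here is a minimal sketch of a helper that assembles the combination query shown above. The class and method names are made up for illustration, and it does not explain the 0 results; it only rules out an escaping mistake on the Java side.

```java
// Standalone sketch; LuceneQueries is a hypothetical helper, not an Alfresco class.
final class LuceneQueries {

  private LuceneQueries() {}

  /** Escapes the colon in a prefixed name such as cm:owner, giving cm\:owner. */
  static String escapeQName(String prefixedName) {
    return prefixedName.replace(":", "\\:");
  }

  /** Wraps a value in double quotes, escaping embedded backslashes and quotes. */
  static String quote(String value) {
    return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** Builds a query of the form +TYPE:"type" +@property:"value". */
  static String typeAndProperty(String type, String property, String value) {
    return "+TYPE:" + quote(type) + " +@" + escapeQName(property) + ":" + quote(value);
  }

  public static void main(String[] args) {
    // prints +TYPE:"my:blog" +@cm\:owner:"happy"
    System.out.println(typeAndProperty("my:blog", "cm:owner", "happy"));
  }
}
```

The resulting string can be passed as the query argument of searchService.query(), in the same way as the queryString variables in the PostImporter listing.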

In an attempt to verify this, I tried looking at some of these indexes using Luke, but could not even find the record. The Alfresco Search Page does say that Alfresco uses multiple Lucene indexes to manage its data, and there are some more details in the Index Version 2 page. I haven't looked at the last one in much detail, but I plan to do this soon. Hopefully it contains the answers to my problem.

As always, if you know the answers to these questions, or you can spot something obviously wrong with what I have done, please drop me a comment.

Update - 2010-06-17

The first issue, where the post names were not being written into the repository, turned out to be a code bug. Basically, I was setting the properties on create (in an attempt to solve another bug), when I should have created the node first and then set the properties in the next call. The code for PostImporter has been updated.

2 comments (moderated to prevent spam):

Unknown said...

hello, my name is Irina, and i am trying to customize alfresco using java.
I managed to customize the specific functions onCreateNode, onUpdateNode but now i just got stuck with onDeleteNode. I can't get the noderef of the node that is being deleted, and i really need to find a way soon.

I googled for this, but no results.. I saw your blog and thought you could help me..

Please reply soon.

Thanks,
Irina

Sujit Pal said...

Hi Irina, I am afraid I don't completely understand your question. Based on my limited knowledge of Alfresco, I think you would want to do this in a behavior (check out this post where I attempt to simulate Drupal's PATH module using Alfresco Behaviours). In the PathAliasSetter behavior, I have an overridden onDeleteNode() in there, maybe that helps? Also check out the Alfresco NodeRef Cookbook, there are quite a few recipes in there you may find helpful.