Sunday, August 10, 2014

Exploring OpenCalais and Alchemy NER APIs


We are building a new NLP pipeline at work for Natural Language Understanding of Patient Health Records (PHRs). We plan to extend our existing concept mapping technology, which has served us well so far. The text corpora we have dealt with so far range from journal articles and book chapters at one end of the spectrum to blog posts at the other - in other words, articles written by professionals and generally proofread for grammar and spelling. PHRs are written by (and for) busy doctors and nurses, so we are grappling with what appear to be new standards for measuring the correctness of concept annotations.

I figured that a second (and third) opinion might help us understand these new standards, so I decided to investigate two other entity recognition systems - the OpenCalais and Alchemy APIs. The idea is to use the results from these APIs as a baseline to learn what is correct and what is not, and to measure the performance of our pipeline against them. Given that we are working in a narrower domain, our objective should be to do better. This post describes a simple test of running a single PHR against both services.
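
One way to make "measure the performance of our pipeline against these APIs" concrete is to treat one service's entities as the expected set and compute precision, recall and F1 for the pipeline's entities. The sketch below is only an illustration of that idea: Annotation, Score and BaselineComparison are hypothetical names, not part of our pipeline or of either API, and matching on lowercased surface text is an assumption made to keep the example small.

```scala
// Hypothetical sketch: score one annotator's output against a baseline set.
// None of these names come from OpenCalais, Alchemy, or our pipeline.
case class Annotation(text: String, entityType: String)

case class Score(precision: Double, recall: Double, f1: Double)

object BaselineComparison {

  // Assumption: compare on lowercased, trimmed surface text only. Entity
  // types are ignored because each service uses its own type vocabulary.
  def normalize(a: Annotation): String = a.text.trim.toLowerCase

  def score(ours: Seq[Annotation], baseline: Seq[Annotation]): Score = {
    val predicted = ours.map(normalize).toSet
    val expected = baseline.map(normalize).toSet
    val hits = (predicted intersect expected).size.toDouble
    // Guard against empty sets so we never divide by zero.
    val precision = if (predicted.isEmpty) 0.0 else hits / predicted.size
    val recall = if (expected.isEmpty) 0.0 else hits / expected.size
    val denom = precision + recall
    val f1 = if (denom == 0.0) 0.0 else 2 * precision * recall / denom
    Score(precision, recall, f1)
  }
}
```

Calling BaselineComparison.score(pipelineEntities, apiEntities) returns all three numbers in one Score value. Exact string matching is deliberately strict; a real comparison would probably need to match on character offsets or allow partial overlaps.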

Note that both these APIs also provide an easy-to-use web form (here and here) where you can drop your text and get back the annotations, without even signing up for an API key. But it's not as convenient to use this approach for multiple documents, so I built some simple Scala clients using the Java interfaces available for both APIs.

The input file looks like this:

CHIEF COMPLAINT:  Ankle pain.

HISTORY OF PRESENT ILLNESS:  The patient is a pleasant 17-year-old gentleman 
who was playing basketball today in gym.  Two hours prior to presentation, 
he started to fall and someone stepped on his ankle and kind of twisted his 
right ankle and he cannot bear weight on it now.  It hurts to move or bear 
weight.  No other injuries noted.  He does not think he has had injuries to 
his ankle in the past.

PAST MEDICAL HISTORY:  None.

PAST SURGICAL HISTORY:  None.

SOCIAL HISTORY:  He does not drink or smoke.

ALLERGIES:  Unknown.

MEDICATIONS:  Adderall and Accutane.

REVIEW OF SYSTEMS:  As above.  Ten systems reviewed and are negative.

PHYSICAL EXAMINATION:
VITAL SIGNS:  Temperature 97.6, pulse 70, respirations 16, blood pressure 
120/63, and pulse oximetry 100% on room air.

GENERAL:   A pleasant gentleman in no acute distress.

EXTREMITIES:  Focused physical examination, he has full range of motion in 
his right knee.  No pain to palpation over the lateral or medial malleolus.  
No pain over the Achilles tendon.  Pulses are intact.  Capillary refill and 
sensation normal.  He has had pain over the lateral aspect of the right foot 
with some ecchymosis and swelling.  He also has some pain over the dorsum of 
the foot as well.  No laxity is noted.

MEDICAL DECISION MAKING:  This is a pleasant young gentleman with symptoms as 
above, presenting with a foot and ankle injury.  He had an x-ray of his ankle 
that showed a small ossicle versus avulsion fracture of the talonavicular 
joint on the lateral view.  He has had no pain over the metatarsals themselves.
This may be a fracture based upon his exam.  He does want to have me to put him
in a splint.  He was given Motrin here.  He will be discharged home to follow 
up with Dr. X from Orthopedics.

ASSESSMENT:  Acute foot or ankle sprain, possible small fracture.

DISPOSITION:  Crutches and splint were administered here.  I gave him a 
prescription for Motrin and some Darvocet if he needs to length his sleep and 
if he has continued pain to follow up with Dr. X.  Return if any worsening 
problems.

OpenCalais provides a Java interface called J-Calais. The interface is very simple to use. Here is some code that reads the file (above) and sends it to the OpenCalais server for analysis, then reports on the response. The client exposes a single analyze() method and returns all the information in one shot. Results are untyped, i.e., always lists of CalaisObject, so you have to know which fields to look for in each case. The most important information (from my point of view) is the entities and topics. J-Calais also returns relationship information and suggests social tags for the document.

// Source: src/main/scala/com/mycompany/scalcium/utils/OpenCalaisMapper.scala
package com.mycompany.scalcium.utils

import java.io.File
import mx.bigdata.jcalais.rest.CalaisRestClient
import scala.io.Source
import mx.bigdata.jcalais.CalaisResponse
import mx.bigdata.jcalais.CalaisObject
import scala.collection.JavaConversions._

class OpenCalaisMapper {

  val MyApiKey = "..."
  val client = new CalaisRestClient(MyApiKey)
  
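  // Reads the whole file into memory and sends it to OpenCalais in one call.
  def map(file: File): CalaisResponse = {
    val source = Source.fromFile(file)
    // Close the file even if reading it fails.
    val text = try source.mkString finally source.close()
    client.analyze(text)
  }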
  
  def entities(resp: CalaisResponse): List[CalaisObject] =
    resp.getEntities().toList
    
  def topics(resp: CalaisResponse): List[CalaisObject] = 
    resp.getTopics().toList
    
  def socialTags(resp: CalaisResponse): List[CalaisObject] = 
    resp.getSocialTags().toList
    
  def relations(resp: CalaisResponse): List[CalaisObject] = 
    resp.getRelations().toList
}

And here is the corresponding JUnit test that sends our file over to the OpenCalais server and parses the response.

// Source: src/test/scala/com/mycompany/scalcium/utils/OpenCalaisMapperTest.scala
package com.mycompany.scalcium.utils

import org.junit.Test
import java.io.File
import scala.collection.JavaConversions._
import org.junit.Assert

class OpenCalaisMapperTest {

  val InputFile = "..."
    
  @Test
  def testAnalyzeFile(): Unit = {
    val mapper = new OpenCalaisMapper()
    val resp = mapper.map(new File(InputFile))
    Console.println("== Entities ==")
    val entities = mapper.entities(resp)
    entities.foreach(entity => Console.println("%s/%s (%s)".format(
        entity.getField("name"), entity.getField("_type"), 
        entity.getField("relevance"))))
    Assert.assertTrue(entities.size > 0)
    Console.println("== Topics ==")
    val topics = mapper.topics(resp)
    topics.foreach(topic => Console.println("%s (%s)".format(
      topic.getField("categoryName"), topic.getField("score"))))
    Assert.assertTrue(topics.size > 0)
  }
}

Entities and topics (along with their scores) returned by OpenCalais are shown below. For entities, the matched text is shown first, followed by a slash and the entity type, followed by the relevance score OpenCalais reports for it in parentheses. For topics, the topic name is followed by the score in parentheses.

== Entities ==
Motrin/Product (0.106)
Accutane/Product (0.262)
foot and ankle injury/MedicalCondition (0.111)
pain/MedicalCondition (0.513)
ALLERGIES/MedicalCondition (0.272)
Ankle pain/MedicalCondition (0.32)
HISTORY OF PRESENT ILLNESS/MedicalCondition (0.319)
Adderall/Product (0.262)
REVIEW OF SYSTEMS/Company (0.251)
Darvocet/Product (0.058)
injuries/MedicalCondition (0.339)
x-ray/Technology (0.116)

== Topics ==
Health_Medical_Pharma (1)
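
The printed entity lines follow a simple "text/type (score)" pattern, so they can be read back into records for sorting or filtering. The following is a minimal sketch of that, assuming the exact format shown in this post; ScoredEntity and EntityLineParser are hypothetical names introduced for illustration.

```scala
import scala.util.Try
import scala.util.matching.Regex

// Hypothetical record for one printed line such as "Accutane/Product (0.262)".
case class ScoredEntity(text: String, entityType: String, score: Double)

object EntityLineParser {

  // Matched text, then the last slash, then the type, then the score in
  // parentheses. Headers ("== Entities ==") and topic lines do not match.
  private val Line: Regex = """^(.+)/([^/\s]+) \(([0-9.]+)\)$""".r

  def parse(line: String): Option[ScoredEntity] = line.trim match {
    case Line(text, entityType, score) =>
      // A malformed number such as "0.1.2" yields None rather than an exception.
      Try(score.toDouble).toOption.map(s => ScoredEntity(text, entityType, s))
    case _ => None
  }

  // Keep only the lines that parse, highest score first.
  def ranked(lines: Seq[String]): Seq[ScoredEntity] =
    lines.flatMap(line => parse(line).toList).sortBy(e => -e.score)
}
```

Passing the lines of the listing to EntityLineParser.ranked gives the entities in descending score order, which makes it easier to see which assignments the service itself considers weak.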

The Alchemy API is typed and requires multiple calls to the server to get different facets of information, such as entities and concepts (a concept in Alchemy seems to be similar to a topic in OpenCalais). Unlike OpenCalais, there are no social tags - instead, Alchemy returns sentiments. The Alchemy documentation refers to a Java SDK, but a project on GitHub provides a nicer and cleaner API, so I used that instead. The code to contact the Alchemy API is shown below:

// Source: src/main/scala/com/mycompany/scalcium/utils/AlchemyMapper.scala
package com.mycompany.scalcium.utils

import java.io.File
import scala.io.Source
import scala.collection.JavaConversions._
import org.apache.commons.lang.builder.ToStringStyle
import com.likethecolor.alchemy.api.Client
import com.likethecolor.alchemy.api.call.RankedNamedEntitiesCall
import com.likethecolor.alchemy.api.call.`type`.CallTypeText
import com.likethecolor.alchemy.api.params.NamedEntityParams
import com.likethecolor.alchemy.api.entity.NamedEntityAlchemyEntity
import com.likethecolor.alchemy.api.call.RankedConceptsCall
import com.likethecolor.alchemy.api.entity.ConceptAlchemyEntity
import com.likethecolor.alchemy.api.call.SentimentCall
import com.likethecolor.alchemy.api.entity.SentimentAlchemyEntity

class AlchemyMapper {

  val MyApiKey = "..."
  val client = new Client(MyApiKey)
  
  def entities(file: File): List[NamedEntityAlchemyEntity] = {
    val text = toText(file)
    val params = new NamedEntityParams()
    params.setIsCoreference(true)
    params.setIsDisambiguate(true)
    params.setIsLinkedData(true)
    params.setIsQuotations(true)
    params.setIsSentiment(true)
    params.setIsShowSourceText(true)
    val theCall = new RankedNamedEntitiesCall(new CallTypeText(text), params)
    val resp = client.call(theCall)
    resp.iterator.toList
  }
  
  def topics(file: File): List[ConceptAlchemyEntity] = {
    val text = toText(file)
    val theCall = new RankedConceptsCall(new CallTypeText(text))
    val resp = client.call(theCall)
    resp.iterator.toList
  }
  
  def sentiments(file: File): List[SentimentAlchemyEntity] = {
    val text = toText(file)
    val theCall = new SentimentCall(new CallTypeText(text))
    val resp = client.call(theCall)
    resp.iterator.toList
  }
  
  // Reads the whole file into memory as a single string.
  def toText(file: File): String = {
    val source = Source.fromFile(file)
    // Close the file even if reading it fails.
    try source.mkString finally source.close()
  }
}

The JUnit test to call this code and extract and print entities and concepts from the above text is shown below:

// Source: src/test/scala/com/mycompany/scalcium/utils/AlchemyMapperTest.scala
package com.mycompany.scalcium.utils

import org.junit.Test
import java.io.File
import org.junit.Assert

class AlchemyMapperTest {

  val InputFile = "..."

  @Test
  def testAnalyzeFile(): Unit = {
    val mapper = new AlchemyMapper()
    Console.println("== Entities ==")
    val entities = mapper.entities(new File(InputFile))
    entities.foreach(entity => Console.println("%s/%s (%5.3f)".format(
      entity.getText(), entity.getType(), entity.getScore())))
    Assert.assertTrue(entities.size > 0)
    Console.println("== Topics ==")
    val topics = mapper.topics(new File(InputFile))
    topics.foreach(topic => Console.println("%s (%5.3f)".format(
      topic.getConcept(), topic.getScore())))
    Assert.assertTrue(topics.size > 0)
  }
}

The results of analyzing the text using the Alchemy API are shown below. As before, each entity is shown as the matched text and entity type separated by a slash, with the confidence score in parentheses. The concepts correspond to text items that match up with concepts in Alchemy's taxonomy, each followed by a confidence score.

== Entities ==
Achilles tendon/Anatomy (0.766)
medial malleolus/Anatomy (0.746)
Adderall/Drug (0.734)
respirations/Person (0.733)
basketball/Sport (0.723)
ILLNESS/HealthCondition (0.712)
blood pressure/FieldTerminology (0.683)
ecchymosis/HealthCondition (0.678)

== Topics ==
Ankle (0.976)
Medical history (0.666)
Medicine (0.618)
Foot (0.594)
Physical examination (0.572)
Injuries (0.566)
Pulse (0.546)
Decision theory (0.461)

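From the results, it appears that neither platform is perfect and that each has its own strengths. OpenCalais seems to be better at capturing medication names: it found all four drugs mentioned in the note (Motrin, Accutane, Adderall and Darvocet), while Alchemy found only Adderall. The Alchemy API appears to be better at capturing body parts and disease names, picking up the Achilles tendon, the medial malleolus and ecchymosis. Both also made obvious mistakes, such as OpenCalais tagging REVIEW OF SYSTEMS as a Company and Alchemy tagging respirations as a Person.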
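
To see how little the two services agree on this document, the entity strings from the two listings can be compared as sets. This is a rough sketch: the strings are copied from the outputs in this post, EntityOverlap is a hypothetical name, and exact matching after lowercasing is an assumption (so "ILLNESS" and "HISTORY OF PRESENT ILLNESS" count as different entities).

```scala
object EntityOverlap {

  // Surface strings copied from the OpenCalais listing in this post.
  val openCalais: Set[String] = Set(
    "Motrin", "Accutane", "foot and ankle injury", "pain", "ALLERGIES",
    "Ankle pain", "HISTORY OF PRESENT ILLNESS", "Adderall",
    "REVIEW OF SYSTEMS", "Darvocet", "injuries", "x-ray")

  // Surface strings copied from the Alchemy listing in this post.
  val alchemy: Set[String] = Set(
    "Achilles tendon", "medial malleolus", "Adderall", "respirations",
    "basketball", "ILLNESS", "blood pressure", "ecchymosis")

  // Assumption: two entities are the same if they match after lowercasing.
  private def norm(names: Set[String]): Set[String] =
    names.map(_.trim.toLowerCase)

  // Entities reported by both services.
  def shared: Set[String] = norm(openCalais) intersect norm(alchemy)

  // Entities reported by only one of the two services.
  def onlyOpenCalais: Set[String] = norm(openCalais) diff norm(alchemy)
  def onlyAlchemy: Set[String] = norm(alchemy) diff norm(openCalais)
}
```

With exact matching, the only entity in EntityOverlap.shared is Adderall, which suggests the two services are complementary rather than interchangeable as baselines.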

And that's all I have for today. I had been meaning to take a look at these APIs, at the very least to see how they stack up against what we do, but I never got around to it. In fact, I was reminded of this by an email from OpenCalais asking me if I still wanted to be part of the OpenCalais community, given that I hadn't done anything with it since I signed up :-).
