July 13, 2010

CDATA, CDATA Run, Run Data Run

Recently the topic of JAXB's handling of CDATA has come up in a few separate tweets. In this post I will describe how to handle CDATA using EclipseLink JAXB (MOXy).


I will use the following object model. The "bio" property holds HTML markup that the customer can provide to describe themselves:

package blog.cdata;

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="c")
public class Customer {

   private String bio;

   public void setBio(String bio) {
      this.bio = bio;
   }

   public String getBio() {
      return bio;
   }

}

JAXB unmarshals the XML as expected, and when the object is marshalled the necessary characters are escaped. The following code:

package blog.cdata;

import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
 
public class Demo {
 
   public static void main(String[] args) throws Exception {
      JAXBContext jc = JAXBContext.newInstance(Customer.class);
 
      Unmarshaller u = jc.createUnmarshaller();
      String xml = "<c><bio><![CDATA[<html>...</html>]]></bio></c>";
      Customer c = (Customer) u.unmarshal(new StringReader(xml));
      System.out.println("Unmarshal: " + c.getBio());
 
      Marshaller m = jc.createMarshaller();
      m.setProperty(Marshaller.JAXB_FRAGMENT, true);
      System.out.print("Marshal: ");
      m.marshal(c, System.out);
   }

}



This produces the following output:


Unmarshal: <html>...</html>
Marshal: <c><bio>&lt;html>...&lt;/html></bio></c>
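The escaped form and the CDATA form carry the same text as far as an XML parser is concerned, which is why the unmarshalled value is identical either way. The following is a minimal sketch of that equivalence using only the JDK's DOM parser (not JAXB or MOXy); the class name CdataEquivalence and the helper bioText are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataEquivalence {

    // Parses the XML and returns the text content of the first <bio> element.
    static String bioText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("bio").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String fromCdata = bioText("<c><bio><![CDATA[<html>...</html>]]></bio></c>");
        String fromEscaped = bioText("<c><bio>&lt;html>...&lt;/html></bio></c>");

        System.out.println(fromCdata);                     // <html>...</html>
        System.out.println(fromCdata.equals(fromEscaped)); // true
    }
}
```

So the choice between the two forms affects only how the document looks on the wire, not the value a consumer reads from it.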

However, sometimes you need the output to contain a CDATA section rather than escaped characters. EclipseLink JAXB (MOXy) lets you mark a node so that its text is marshalled as CDATA. One way to do this is MOXy's external binding file, which can indicate that the bio property should use CDATA. The binding file would look like:

<?xml version="1.0" encoding="UTF-8"?>
<xml-bindings xmlns="http://www.eclipse.org/eclipselink/xsds/persistence/oxm">
   <java-types>
      <java-type name="blog.cdata.Customer">
         <java-attributes>
            <xml-element java-attribute="bio" cdata="true"/>
         </java-attributes>
      </java-type>
   </java-types>
</xml-bindings>

To use this external file, we modify the original example slightly:

package blog.cdata;

import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.transform.stream.StreamSource;

public class Demo {

   public static void main(String[] args) throws JAXBException {
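      // Map the package name of the model classes to its MOXy binding file
      Map<String, StreamSource> oxm = new HashMap<String, StreamSource>(1);
      oxm.put("blog.cdata", new StreamSource("oxm.xml"));

      // Pass the binding files to MOXy through the context properties
      Map<String, Object> props = new HashMap<String, Object>(1);
      props.put("eclipselink-oxm-xml", oxm);

      Class<?>[] classes = {Customer.class};
      JAXBContext jc = JAXBContext.newInstance(classes, props);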

      Unmarshaller u = jc.createUnmarshaller();
      String xml = "<c><bio><![CDATA[<html>...</html>]]></bio></c>";
      Customer c = (Customer) u.unmarshal(new StringReader(xml));

      System.out.println("Unmarshal: " + c.getBio());

      Marshaller m = jc.createMarshaller();
      m.setProperty(Marshaller.JAXB_FRAGMENT, true);
      System.out.print("Marshal: ");
      m.marshal(c, System.out);

   }

}

Now the following output will be produced:

Unmarshal: <html>...</html>
Marshal: <c><bio><![CDATA[<html>...</html>]]></bio></c>


To use MOXy as your JAXB implementation, you need to add a jaxb.properties file in the same package as your model classes, with the following entry:


javax.xml.bind.context.factory=org.eclipse.persistence.jaxb.JAXBContextFactory
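If the jaxb.properties file does not end up next to the model classes (for example, because a build drops it), MOXy is not selected and MOXy-specific properties such as "eclipselink-oxm-xml" are rejected. The following is a minimal sketch of a classpath check that uses only the JDK; the class name JaxbPropertiesCheck and its helper methods are hypothetical, and in practice you would pass your own model class, such as Customer.class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

public class JaxbPropertiesCheck {

    static final String FACTORY_KEY = "javax.xml.bind.context.factory";

    // Reads the context factory entry from an already opened jaxb.properties stream.
    static String readFactory(InputStream in) {
        try {
            Properties props = new Properties();
            props.load(in);
            return props.getProperty(FACTORY_KEY);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks for jaxb.properties in the same package as the given model class.
    // Returns null when the file or the entry is missing.
    static String configuredFactory(Class<?> modelClass) {
        try (InputStream in = modelClass.getResourceAsStream("jaxb.properties")) {
            return in == null ? null : readFactory(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Replace JaxbPropertiesCheck.class with a model class such as Customer.class.
        String factory = configuredFactory(JaxbPropertiesCheck.class);
        System.out.println(factory == null
                ? "No jaxb.properties entry found next to the class"
                : "Context factory: " + factory);
    }
}
```

Running a check like this against the packaged application, rather than the IDE workspace, is what catches the case where the file exists in source control but never reaches the deployed classpath.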


In EclipseLink 2.2 we will add an annotation to enable this behavior, so the following will be possible:


package blog.cdata;

import javax.xml.bind.annotation.XmlRootElement;
import org.eclipse.persistence.oxm.annotations.XmlCDATA;

@XmlRootElement(name="c")
public class Customer {

   private String bio;

   @XmlCDATA
   public void setBio(String bio) {
      this.bio = bio;
   }

   public String getBio() {
      return bio;
   }

}


You can try the @XmlCDATA annotation now by downloading an EclipseLink 2.2 nightly build from:

19 comments:

  1. Thank you, this was very useful !

    However, it seems that if the 'bio' string contains an inner CDATA section, the object will not be properly marshalled by MOXy (the inner CDATA section is not properly escaped). The generated XML is then not valid.

    Please see "Uses of CDATA sections" at http://en.wikipedia.org/wiki/CDATA

  2. Can you clarify your use case? If I modify the input to:

    <c><bio>Before<![CDATA[<html>...</html>]]>After</bio></c>

    then I see the following as output:

    <c><bio><![CDATA[Before<html>...</html>After]]></bio></c>

  3. If I insert the following line of code in the second version of the Demo class at line 26:

    c.setBio("<![CDATA[]]>");

    I would expect to get the following output:

    <c><bio><![CDATA[<![CDATA[]]]]><![CDATA[>]]></bio></c>

    but I get

    <c><bio><![CDATA[<![CDATA[]]>]]></bio></c>

    which is not well formed.

  4. Thanks for identifying that issue. I have opened the following bug for it (https://bugs.eclipse.org/322358). Do you see this as a common problem, or more a matter of completeness?

  5. I suppose it mostly depends on the domain you work in. I would then say it is more a matter of completeness.

  6. Hi Blaise

    It seems you may be able to answer the following question.

    I have an attribute containing '&lt;' and I want JAXB to unmarshal it as is. The problem is that I cannot define an attribute as CDATA, so the unmarshaller (JAXB2.0-javax.xml.bind.Unmarshaller) converts the '&lt;' to the single character '<'. But I do not want any characters to be converted.

    It works fine when used in tags which can be defined as CDATA.

    I have tried all sorts of stuff incl. trying with DTD to define an attribute as CDATA but that does not work for the unmarshaller. I also tried setting a '@XmlCDATA' on the getters and setters of my Java class containing the attribute but that does not do the trick either.

    I keep getting the '&lt;' converted to '<'.

    Can you help on this?

  7. Hi Erik,

    Offhand I'm not sure how to prevent "&lt;" from being converted to "<". Stack Overflow may be a good place to get an answer on this one.

    -Blaise

  8. Hi Blaise. great blog, I use it a lot. Learned about JAXB&REST here too and other neat stuff.

    I'm at a javax.xml.bind here...

    I have created your Customer class (the @XmlCDATA example using 2.2.0), added it to my model classes, marshalled it and returned it to the browser, and everything works. When I hit 'view source' I can see the String is wrapped in a CDATA block.

    however, I could not get it to work with the other model classes in the same package, and wanted to ask the following:
    1.Is it ok to use both @XmlCDATA and @XmlElement on the same element?
    2.Do I need to define the containing elements as @XmlCDATA elements too? e.g. if Customer was an element in a Customers @XmlRootElement and Customers had 'plain' JAXB annotations would it still work?
    3.Is there anywhere at all (I've seen the unofficial tutorial in java.net, the MOXy tutorial in eclipse's website, and the API) where I can get a substantial example for more than a 'hello world' or small out of context examples or tutorial that explains the way EclipseLink MOXy and/or JAXB work?

    just my 2 cents:
    I seriously appreciate the effort you guys put in here, but I keep seeing unanswered cries for help everywhere(java .net for instance), a decent amount of confusion on how to use annotation frameworks such as Jersey and MOXy, and a certain unfulfilled need for support. I think ultimately this does not serve the purpose of the open source community, and contributes to already widespread misconceptions about how Java is overly complicated and getting too big to carry itself.
    I thank you again for the blog and other contributions and am looking forward to your answers.

  9. Hi Avi,

    "I'm at a javax.xml.bind here...", nice :).

    1. Yes, it is okay to use both @XmlCDATA and @XmlElement on the same field/property.

    2. The containing elements do not need to be marked as @XmlCDATA.

    3. It is hard to find the right balance in the documentation. I have tried to aim the blog posts at a level beyond "Hello World", while at the same time focusing on only a few concepts. This material is flowing back into the EclipseLink MOXy (JAXB) User's Guide. The content has been mostly user driven, so feedback is very much appreciated.

    Agreed there are too many unanswered questions on the forums. I am mainly active on the Oracle (TopLink), Eclipse (EclipseLink), and Stack Overflow forums. I now have my RSS reader pointed at the java.net forum.

    -Blaise

  10. Thank you Blaise, it is really great to hear that.
    Those forums are bound to get some more passers-by now! great for the community and for aspiring pros like myself.
    I eventually got it to work by commenting out the one addition my POJO had (the only thing different from what your example contained).
    I had the @XmlAccessorType(XmlAccessType.FIELD)
    annotation on it, which caused it not to work.
    Can you please explain why this was the case, and maybe why I would use this property?

  11. Hi Blaise,
    I posted a comment about Cdata not getting generated on our build server. It was mainly due to some configuration issue due to which jaxb.properties was not getting pushed to our build correctly. I just found this and it is resolved. Pls. ignore my yesterday's comment. Thanks.

    -Dheeru

  12. Hi Blaise,

    I read an xml that contains html text that was supposed to be wrapped in CDATA[[ ]] but it is not (provider's bug). I would still like my jaxb to parse it as CDATA. Do you know a way to disable the html entity parsing for a pojo field?

    Thanks,
    Alexey

  13. Hi Alexey,

    You could try using @XmlAnyElement with a DomHandler to preserve the XML content as a String. Below is a link to an example:
    - @XmlAnyElement and non-DOM Properties

    -Blaise

  14. That blog post was very helpful. Thanks.

  15. Does anyone know why standard JAXB doesn't support this?

    Replies
    1. You should submit a feature request to the Users mailing list for the JAXB reference implementation:

      - http://java.net/projects/jaxb/lists/

      -Blaise

  16. Thanks for the help! I keep getting the following exception:

    javax.xml.bind.JAXBException: property "eclipselink-oxm-xml" is not supported

    My properties file is setup here:
    InputStream meta = Thread.currentThread().getContextClassLoader().getResourceAsStream("oxm.xml");
    Map metadata = new HashMap();
    metadata.put("eclipselink-oxm-xml", new StreamSource(meta));

    Any thoughts? I have the oxm.xml file in the same directory.

    Replies
    1. Hi Jonathan,

      It appears that you did not include a jaxb.properties file to specify MOXy as the JAXB provider (see: Specifying EclipseLink MOXy as your JAXB Provider). Below is a link to a full example that may help.
      - https://github.com/bdoughan/blog20110908

      -Blaise

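Comment 3 above describes the usual way to keep a CDATA section well formed when the text itself contains "]]>": end the section after "]]" and open a new one for the ">". The following is a minimal sketch of that splitting in plain Java. It is not MOXy's implementation or the fix for the bug mentioned in comment 4, and the class name CdataEscape and the method toCdata are made up for this illustration.

```java
public class CdataEscape {

    // Wraps the text in a CDATA section. Every "]]>" inside the text is split
    // across two sections so that it cannot terminate the section early.
    static String toCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        // Plain markup needs no splitting.
        System.out.println(toCdata("<html>...</html>"));
        // prints <![CDATA[<html>...</html>]]>

        // The value from comment 3 contains "]]>" and is split.
        System.out.println("<c><bio>" + toCdata("<![CDATA[]]>") + "</bio></c>");
        // prints <c><bio><![CDATA[<![CDATA[]]]]><![CDATA[>]]></bio></c>
    }
}
```

A parser reads the adjacent sections back as one continuous text value, so the original string survives the round trip.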