Thursday, January 14, 2010

Optimizing Atom feed parsing with Apache Abdera

I picked Apache Abdera as the Atom processor for a number of small projects. For these applications, the most basic optimization is to skip the processing of unwanted XML elements inside an Atom feed.

For one of these applications, a statistics aggregator of sorts, there was no need to look into the summary and raw contents of each entry. Enter Apache Abdera's built-in filter support, through which one can instruct the parser to accept only certain elements or to ignore them.
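To make the idea concrete, here is a minimal sketch of subtree skipping written against the JDK's StAX API rather than Abdera. It is not how Abdera implements its filters, and the class name SkippingCounter and its method are made up for illustration. It counts the elements a consumer would still see once the ignored names are skipped.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Abdera: counts the elements that are
// still processed when the subtrees of some element names are skipped.
final class SkippingCounter {

    static int countProcessed(String xml, Set<QName> ignored) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int processed = 0;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                if (ignored.contains(reader.getName())) {
                    skipSubtree(reader); // drop the element and its children
                } else {
                    processed++;
                }
            }
            reader.close();
            return processed;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    // Consumes events up to the end tag that matches the current start tag.
    private static void skipSubtree(XMLStreamReader reader)
            throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }
}
```

For an entry holding a title, a summary and an XHTML content block, passing the Atom summary and content names as the ignored set leaves only the feed, entry and title elements to process.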

The samples in the Abdera wiki didn’t quite match the public Javadocs, so I ended up writing my own version of what the wiki described as a black list filter:

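import org.apache.abdera.Abdera;
import org.apache.abdera.parser.Parser;
import org.apache.abdera.parser.ParserOptions;

Abdera abdera = new Abdera();
Parser abderaParser = abdera.getParser();
ParserOptions defaultParserOptions = abderaParser.getDefaultParserOptions();

// Register the filter so the parser consults it for every element.
FavoriteParseFilter fpf = new FavoriteParseFilter();
defaultParserOptions.setParseFilter(fpf);
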

where FavoriteParseFilter is defined like this:

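import javax.xml.namespace.QName;

public class FavoriteParseFilter implements org.apache.abdera.filter.ParseFilter {

    // Atom elements are namespace-qualified, so the names must be too.
    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    private static final QName CONTENT_QNAME =
        new QName(ATOM_NS, "content");

    private static final QName SUMMARY_QNAME =
        new QName(ATOM_NS, "summary");

    /*
     * (non-Javadoc)
     * @see org.apache.abdera.filter.ParseFilter#acceptable(javax.xml.namespace.QName)
     */
    public boolean acceptable(QName n) {
        // Black list: reject content and summary, accept everything else.
        boolean result = !(n.equals(CONTENT_QNAME) ||
                n.equals(SUMMARY_QNAME));
        return result;
    }

    // The other methods declared by ParseFilter are omitted here.
}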
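
The same check can be generalized to any list of element names. The following is only a sketch that uses nothing but the JDK: QNameListFilter is a hypothetical stand-alone class that mirrors the acceptable(QName) test and does not implement Abdera's ParseFilter interface. It covers both the black list case (ignore the listed names) and the white list case (accept only the listed names).

```java
import java.util.Set;
import javax.xml.namespace.QName;

// Hypothetical stand-alone class; it mirrors the acceptable(QName) check
// but is not wired into Abdera in any way.
final class QNameListFilter {

    private final Set<QName> names;
    private final boolean blackList;

    private QNameListFilter(Set<QName> names, boolean blackList) {
        this.names = names;
        this.blackList = blackList;
    }

    // Rejects the listed names and accepts everything else.
    // Set.of throws IllegalArgumentException if a name is listed twice.
    static QNameListFilter blackList(QName... names) {
        return new QNameListFilter(Set.of(names), true);
    }

    // Accepts the listed names and rejects everything else.
    static QNameListFilter whiteList(QName... names) {
        return new QNameListFilter(Set.of(names), false);
    }

    boolean acceptable(QName name) {
        // Listed + black list -> reject; listed + white list -> accept.
        return names.contains(name) != blackList;
    }
}
```

Keep in mind that QName equality includes the namespace URI: new QName("", "content") and the content name in the Atom namespace (http://www.w3.org/2005/Atom) are two different names, so a list built from one will not match the other.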

Results may vary, but I observed a gain of at least 25% in overall throughput with a simple application that fetches a remote feed whose entries are about 2 KB each.
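
For anyone who wants to repeat the comparison, a rough way to put a number on the gain is to time the same parsing task with and without the filter. This is a sketch under assumptions, not the harness behind the figure above: ThroughputCheck and its methods are illustrative names, and the parsing task is passed in as a callback that returns how many entries it handled.

```java
import java.util.function.IntSupplier;

// Hypothetical measurement helper; all names here are illustrative.
final class ThroughputCheck {

    // Runs the task `runs` times and returns the entries handled per second.
    // The task returns the number of entries it parsed in one run.
    static double entriesPerSecond(IntSupplier parseOnce, int runs) {
        long entries = 0;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            entries += parseOnce.getAsInt();
        }
        // Guard against a zero elapsed time on very fast tasks.
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        return entries * 1_000_000_000.0 / elapsedNanos;
    }

    // Relative gain of the filtered run over the baseline, in percent.
    static double gainPercent(double baseline, double filtered) {
        return (filtered - baseline) / baseline * 100.0;
    }
}
```

When the feed is fetched remotely, network latency can dominate the timing, so it helps to run a few warm-up iterations first and to repeat the measurement several times before trusting the percentage.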
