Thursday, January 14, 2010

Optimizing Atom feed parsing with Apache Abdera

I use Apache Abdera as the Atom processor for a number of small projects. For these applications, the most basic optimization is to skip the processing of unwanted XML elements inside an Atom feed.

For one of these applications, a statistics aggregator of sorts, there was no need to look into the summary or the raw content of each entry. Enter Abdera's built-in filter support, through which one can instruct the parser to accept only certain elements, or to ignore them.

The samples in the Abdera wiki didn’t quite match the public Javadocs, so I ended up writing my own version of what the wiki described as a black list filter:

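import org.apache.abdera.Abdera;
import org.apache.abdera.parser.Parser;
import org.apache.abdera.parser.ParserOptions;

Abdera abdera = new Abdera();
Parser abderaParser = abdera.getParser();
ParserOptions defaultParserOptions = abderaParser.getDefaultParserOptions();

// Install the filter so that every parse with the default options uses it
FavoriteParseFilter fpf = new FavoriteParseFilter();
defaultParserOptions.setParseFilter(fpf);
abderaParser.setDefaultParserOptions(defaultParserOptions);

Here FavoriteParseFilter is defined like this: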

import javax.xml.namespace.QName;

public class FavoriteParseFilter implements org.apache.abdera.filter.ParseFilter {

    private static final QName CONTENT_QNAME =
        new QName("http://www.w3.org/2005/Atom", "content");

    private static final QName SUMMARY_QNAME =
        new QName("http://www.w3.org/2005/Atom", "summary");

    /*
     * (non-Javadoc)
     *
     * @see org.apache.abdera.filter.ParseFilter#acceptable(javax.xml.namespace.QName)
     */
    public boolean acceptable(QName n) {
        // Black list: accept every element except atom:content and atom:summary
        return !(n.equals(CONTENT_QNAME) || n.equals(SUMMARY_QNAME));
    }

    // Any other methods that ParseFilter requires in your Abdera version are omitted here.
}
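
The black list check in the filter above boils down to a QName lookup, which can be tried out without Abdera on the classpath. The following is a minimal sketch of that idea only; the class QNameBlacklist and its factory method are hypothetical names made up for illustration and are not part of the Abdera API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.namespace.QName;

// Hypothetical helper, not part of Abdera: holds the element names to skip.
final class QNameBlacklist {

    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    private final Set<QName> rejected;

    QNameBlacklist(Set<QName> rejected) {
        // Defensive copy so later changes by the caller do not affect the filter
        this.rejected = new HashSet<QName>(rejected);
    }

    // Black list semantics: accept everything except the listed names.
    // QName.equals compares namespace URI and local part, ignoring the prefix.
    boolean acceptable(QName name) {
        return !rejected.contains(name);
    }

    // Convenience factory for the two elements skipped in this post.
    static QNameBlacklist withoutContentAndSummary() {
        return new QNameBlacklist(new HashSet<QName>(Arrays.asList(
                new QName(ATOM_NS, "content"),
                new QName(ATOM_NS, "summary"))));
    }
}
```

A ParseFilter implementation could delegate its acceptable(QName) method to such a helper, which keeps the list of skipped elements in one place if it grows beyond two names.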

Results may vary, but I observed a gain of at least 25% in overall throughput in a simple application that fetched a remote feed whose entries were about 2 KB each.
