I picked Apache Abdera as the Atom processor for a number of small projects. Skipping the processing of unwanted XML elements inside an Atom feed is the most basic optimization for these applications.
For one of these applications, a statistics aggregator of sorts, there was no need to look into the summary and raw contents of each entry. Enter the Apache Abdera built-in filter support, through which one can instruct the parser to only accept or ignore certain entry elements.
The samples in the Abdera wiki didn’t quite match the public Javadocs, so I ended up writing my own version of what the wiki described as a black list filter:
Abdera abdera = new Abdera();
Parser abderaParser = abdera.getParser();
ParserOptions defaultParserOptions = abderaParser.getDefaultParserOptions();
FavoriteParseFilter fpf = new FavoriteParseFilter();
defaultParserOptions.setParseFilter(fpf);
abderaParser.setDefaultParserOptions(defaultParserOptions);

where FavoriteParseFilter is defined like this:

public class FavoriteParseFilter implements org.apache.abdera.filter.ParseFilter {
    private static final QName CONTENT_QNAME =
        new QName("http://www.w3.org/2005/Atom", "content");
    private static final QName SUMMARY_QNAME =
        new QName("http://www.w3.org/2005/Atom", "summary");

    /*
     * (non-Javadoc)
     *
     * @see org.apache.abdera.filter.ParseFilter#acceptable(javax.xml.namespace.QName)
     */
    public boolean acceptable(QName n) {
        // Reject atom:content and atom:summary; accept every other element.
        return !(n.equals(CONTENT_QNAME) || n.equals(SUMMARY_QNAME));
    }

    // Any further ParseFilter methods required by the Abdera version in use are omitted here.
}

Results may vary, but I observed a gain of at least 25% in overall throughput using a simple application fetching a remote feed with entries of about 2 KB each.
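The filter above is an "ignore" rule: it rejects two element names and accepts the rest. For comparison, here is a minimal sketch of both flavors (accept-only and ignore) written as plain predicates over QName, so it runs without Abdera on the classpath. The class name QNameFilterSketch and the helpers acceptOnly and ignore are hypothetical names for illustration, not part of the Abdera API; they only mirror the shape of acceptable(QName).

```java
import java.util.Set;
import java.util.function.Predicate;
import javax.xml.namespace.QName;

// Hypothetical illustration only: not an Abdera class.
public class QNameFilterSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Accept-only rule: true for the listed names, false for anything else.
    static Predicate<QName> acceptOnly(Set<QName> allowed) {
        return allowed::contains;
    }

    // Ignore rule: false for the listed names, true for anything else.
    static Predicate<QName> ignore(Set<QName> rejected) {
        return name -> !rejected.contains(name);
    }

    public static void main(String[] args) {
        QName content = new QName(ATOM_NS, "content");
        QName summary = new QName(ATOM_NS, "summary");
        QName title = new QName(ATOM_NS, "title");

        // Same decision as FavoriteParseFilter.acceptable(QName).
        Predicate<QName> ignoreRule = ignore(Set.of(content, summary));
        System.out.println("title accepted:   " + ignoreRule.test(title));
        System.out.println("content accepted: " + ignoreRule.test(content));

        // The inverse approach: keep only what is explicitly listed.
        Predicate<QName> acceptRule = acceptOnly(Set.of(title));
        System.out.println("summary accepted: " + acceptRule.test(summary));
    }
}
```

One design point to keep in mind: if the parser consults the filter for every element name, an accept-only list would probably also need the enclosing feed and entry elements, otherwise nothing below them would survive. That is an assumption worth checking against the Abdera version in use.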

