Parsing XML in Cocoa

This morning, I read Kevin Hoffman’s post on parsing XML, and it is all wrong. Well, except for the part that he copied from my book. That part is right.

In the book, I parse the XML using NSXMLDocument and set the identifiers on the columns of the table views to be the XPaths of the data I want displayed there. This results in two things: very, very few lines of code and excellent performance.
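
A minimal sketch of that approach, assuming each table column's identifier holds an XPath relative to an item node. The helper names StringForXPath and ValueForRow are illustrative and are not the book's code. In a real table view data source, the body of ValueForRow would sit in tableView:objectValueForTableColumn:row:, with [tableColumn identifier] supplying the XPath.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: evaluate an XPath relative to a node and return
// the text of the first match, or nil if nothing matches.
static NSString *StringForXPath(NSString *xPath, NSXMLNode *node)
{
    NSError *error = nil;
    NSArray *matches = [node nodesForXPath:xPath error:&error];
    if (!matches) {
        NSLog(@"XPath %@ failed: %@", xPath, error);
        return nil;
    }
    if ([matches count] == 0) {
        return nil;
    }
    return [[matches objectAtIndex:0] stringValue];
}

// Hypothetical stand-in for the data source method: the column
// identifier is the XPath, and the row picks the item node.
static NSString *ValueForRow(NSArray *itemNodes, NSUInteger row,
                             NSString *columnIdentifier)
{
    NSXMLNode *node = [itemNodes objectAtIndex:row];
    return StringForXPath(columnIdentifier, node);
}
```

With item nodes fetched by something like [doc nodesForXPath:@"//Item" error:&error], a column whose identifier is @"ItemAttributes/Title" would display each item's title with no further code. That element layout is assumed here for illustration.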

But Kevin argues “that’s tightly coupled HACKing.” He has a point. (Although the line “Please, for the love of all that is decent about Cocoa [don't follow Aaron's example]” seemed a bit melodramatic to me.)

The solution he implies is the following: Make a whole syntax tree using NSXMLDocument, and then wander the tree picking out the parts you want and make another tree like it out of domain-specific objects.

This is the wrong solution. If you are willing to write all the code necessary to make domain-specific classes to hold the data from an XML parse, use the low-level parser NSXMLParser. There are three benefits over Kevin’s approach:

  • Performance: it is faster and uses less memory
  • Portability: NSXMLDocument isn’t on the iPhone, NSXMLParser is
  • Swappability: Once you have a collection of SAX-like callbacks, you can move to expat or libxml2 without too much trouble

Why would you want to use expat or libxml2? Because NSXMLParser (and NSXMLDocument, for that matter) doesn’t handle the sexier aspects of XML, like xinclude.

What does it look like to use NSXMLParser? To use the low-level parser in the Amazon example from the book, you would create a class like this:

#import <Foundation/Foundation.h>
@class Book;

@interface AmazonParser : NSObject
{
    NSMutableArray *items;
    Book *bookInProgress;
    NSString *keyInProgress;
    NSMutableString *textInProgress;
}
- (BOOL)parseData:(NSData *)d;
- (NSArray *)items;
@end

The class has a set of methods that get called as the parser goes through the file:

#import "AmazonParser.h"
#import "Book.h"

static NSSet *interestingKeys;

@implementation AmazonParser

+ (void)initialize
{
    if (!interestingKeys) {
        interestingKeys = [[NSSet alloc] initWithObjects:@"Title",
                                          @"DetailPageURL", nil];
    }
}

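- (void)dealloc
{
    // The in-progress ivars are non-nil only if a parse stopped midway
    [bookInProgress release];
    [keyInProgress release];
    [textInProgress release];
    [items release];
    [super dealloc];
}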

- (BOOL)parseData:(NSData *)d
{
    // Release the old items array
    [items release];

    // Create a new, empty items array
    items = [[NSMutableArray alloc] init];

    // Create a parser
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:d];
    [parser setDelegate:self];

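    // Do the parse. -parse is synchronous: it returns only after the
    // delegate has seen the whole document (or the parse has failed)
    BOOL success = [parser parse];

    [parser release];

    NSLog(@"items = %@", items);
    return success;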
}

- (NSArray *)items
{
    return items;
}

#pragma mark Delegate calls

- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    NSLog(@"starting Element: %@", elementName);

    // Is it the start of a new item?
    if ([elementName isEqual:@"Item"]) {

        // Create a Book to hold the title/url for the item
        bookInProgress = [[Book alloc] init];
        return;
    }

    // Is it the title/url for the current item?
    if ([interestingKeys containsObject:elementName]) {
        keyInProgress = [elementName copy];
        // This is a string we will append to as the text arrives
        textInProgress = [[NSMutableString alloc] init];
    }
}

- (void)parser:(NSXMLParser *)parser
 didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
{
    NSLog(@"ending Element: %@", elementName);

    // Is the current item complete?
    if ([elementName isEqual:@"Item"]) {
        [items addObject:bookInProgress];

        // Clear the current item
        [bookInProgress release];
        bookInProgress = nil;
        return;
    }

    // Is the current key complete?
    if ([elementName isEqual:keyInProgress]) {
        if ([elementName isEqual:@"DetailPageURL"]) {
            [bookInProgress setDetailPage:textInProgress];
        } else {
            [bookInProgress setTitle:textInProgress];

        }
        // Clear the text and key
        [textInProgress release];
        textInProgress = nil;
        [keyInProgress release];
        keyInProgress = nil;
    }
}

// This method can get called multiple times for the
// text in a single element
- (void)parser:(NSXMLParser *)parser
foundCharacters:(NSString *)string
{
    [textInProgress appendString:string];
}
@end

Once again, this is faster and uses less memory than building up an entire NSXMLDocument tree just to tear it down again. This class will work on the iPhone. And it does have domain-specific data-bearing classes which will take care of Kevin’s fear of generic XML trees.
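
The Book class is not shown in the listing above. A minimal sketch of what it might look like follows, assuming only the two setters that AmazonParser calls. The ivar names and the getters are assumptions. The setters copy their argument because the parser hands over an NSMutableString that it releases right afterwards.

```objc
#import <Foundation/Foundation.h>

// Hypothetical model class; only setTitle: and setDetailPage: are
// implied by the parser code, the rest is illustrative.
@interface Book : NSObject
{
    NSString *title;
    NSString *detailPage;
}
- (NSString *)title;
- (void)setTitle:(NSString *)t;
- (NSString *)detailPage;
- (void)setDetailPage:(NSString *)d;
@end

@implementation Book

- (void)dealloc
{
    [title release];
    [detailPage release];
    [super dealloc];
}

- (NSString *)title
{
    return title;
}

- (void)setTitle:(NSString *)t
{
    // Copy before releasing, so passing the current value is safe
    NSString *newTitle = [t copy];
    [title release];
    title = newTitle;
}

- (NSString *)detailPage
{
    return detailPage;
}

- (void)setDetailPage:(NSString *)d
{
    NSString *newPage = [d copy];
    [detailPage release];
    detailPage = newPage;
}

@end
```

Copying rather than retaining means a later change to the caller's mutable string cannot alter the book's title behind its back.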

But, if you don’t have Kevin’s issues, I don’t see any problem using NSXMLNode and XPaths directly. It will save you a lot of code and the performance is great. Furthermore, NSXMLNode knows how to generate XML, which can save you a bunch of time. Finally, NSXMLNode conforms to the NSCopying protocol, so you can easily make copies of subtrees.

Let me slip in one last rant: All this stuff is really weak, and that is Apple’s fault. The WebServices framework is quite unusable. NSXMLParser is pathetic when compared to libxml2. These are crucial technologies, and Apple needs to devote serious resources to making them decent, if not brilliant, if they want the Mac or the iPhone to be a great platform for innovation.

Comments

Comment from Kevin Hoffman
Time: July 9, 2008, 5:53 pm

Aaron, first – you need to see some of the humor in my blog posts. I tend to exaggerate so the melodrama was entirely sarcastic, and I realize that sarcasm doesn’t translate on the internet.

My point wasn’t the use of the NSXMLDocument – I know there are faster XML parsers. My point was that by putting an XPath statement directly in the Interface Builder bindings, you have prevented that GUI from working with anything other than an XML base. You can’t bind your data to an alternate source and you don’t have the ability to have a model object encapsulate to allow you to perform additional work on the web service results.

You’re right in that the average Joe probably doesn’t give a crap about that kind of re-use or flexibility… but when I’m building enterprise apps and I’m doing so with short deadlines, an agile development paradigm, and a history of having been burned by tight coupling before, I just felt that using XPath inside the column identifier was a little too “hello world” for me and not practical enough for real-world apps.

Certainly didn’t mean to discount your work. Your Cocoa book is the bible that sits next to my desk no matter where I go.

Comment from Chris Ryland
Time: July 9, 2008, 6:05 pm

I share your instinct about making XML support better, but with libxml2 around, what’s the real incentive? I suppose they could wrap it with Objective-C goodness, but even that seems a bit redundant.

The other thing is that XML is important like ASCII is important: it’s a useful interchange format, and now ubiquitous, but it no longer seems poised to take over the world.

I.e., it’s found its niches, and seems to be stuck there.

At least in my myopic view of the world…

Comment from Aaron Hillegass
Time: July 9, 2008, 8:36 pm

Kevin,

I like your blog — I think it is a nice way to reach out to the .net people of the world. I just think you were trying to speak with authority on a topic that has a history that you are not aware of.

There are certainly good points on both sides of this discussion, but it is the same discussion that we have been having for 20 years. When we started it was NSDictionarys vs. Custom Objects, then it was EOGenericRecords vs. Custom Objects, then it was NSXMLNodes vs. Custom Objects, then it was NSManagedObjects vs. Custom Objects. Same question: What are the trade-offs of using generic data-bearing classes? (And, as a side-effect, binding our views to such generic data-bearing objects.)

While the arguments for custom objects are obvious, let us not forget plists. The plist is a perfect example of the elegance that can come from generic data-bearing classes. You can often recognize the experience of a Cocoa programmer by how he wields his plists.

You acted as if using the generic NSXMLNode (and putting the XPath in the identifier of the table view) was stupid. This was insulting, but more importantly, incorrect. There are many contexts where this is the most flexible, reliable, and reusable solution.

“My point was that by putting an XPath statement directly in the Interface Builder bindings, you have prevented that GUI from working with anything other than an XML base.”

I understood your point and acknowledged it: “He has a point.” But even if we assume that the classic three-tiered design is infinitely better than the design I presented, your post didn’t do a decent job explaining how one would do the three-tiered design in Cocoa. This posting corrected that deficiency.

Comment from Jonathan Wight
Time: July 9, 2008, 10:16 pm

You wrote: “Portability: NSXMLDocument isn’t on the iPhone, NSXMLParser is”

http://code.google.com/p/touchcode/wiki/TouchXML

A workalike NSXMLDocument-style API that covers just enough of NSXMLDocument to be useful. Enjoy!

Comment from Jyda
Time: July 14, 2008, 3:56 pm

Does this code leak an NSXMLParser? I see it alloc’d:

NSXMLParser *parser = [[NSXMLParser alloc] initWithData:d];

But I do not see it released?

Comment from Administrator
Time: July 14, 2008, 6:25 pm

Jyda, you are correct. I forgot to release the parser. That is fixed now. (Arg. I hate it when I make mistakes when I am correcting other people’s mistakes!)

Thanks.

Comment from Charles Brian Quinn
Time: July 17, 2008, 10:56 am

*the sexier aspects of XML*

Now there’s a first!

Comment from Hal Mueller
Time: August 2, 2008, 1:30 pm

This blog post was well timed. I was just beginning to tinker with XML for the first time, and I knew that I wanted to be iPhone compatible. I had seen Brent Simmons’s writeup on the libxml parser, and his example code:
http://inessential.com/?comments=1&postid=3489

But Aaron’s example seemed much simpler, so I decided to try that first.

It took me 58 minutes (I timed it) from the moment I started to modify my project until I had working stub methods to process each element of the admittedly rather simple XML document I needed to parse. In another 2 hours the stub code was fleshed out and all of my instance variables were being correctly populated.

One thing I particularly like about this approach is that it forced me to think about the RSS feed and about the structure of the object I was building. This left me with cleaner code, and less of it.

For the searchers, I’ll mention that the Google Mac GData toolkit, available at
http://code.google.com/p/gdata-objectivec-client/
also includes XML document handling. Our Seattle Mac developer group had a nice talk on that toolkit last week from Google’s own Greg Robbins.

However, for the 2 projects on my immediate horizon which will use XML inputs, I will stick with this NSXMLParser/delegate approach.

Comment from Farshid
Time: August 25, 2008, 2:11 am

So I am really confused about NSXMLParser’s functionality.
I am trying to modify XML elements and their attributes using NSXMLParser, which according to Apple’s documentation should be feasible.
Is there any way to do so using the iPhone’s current SDK and libraries?

gdata-objectivec also exposes functions to parse an XML document, not to modify it!

Comment from James Stanier
Time: September 14, 2008, 10:20 am

I’ve used this example countless times now when using NSXMLParser – it’s a massive help and a fantastic little framework to get started. Thanks very much for writing it up.

Comment from Arthur Clemens
Time: October 3, 2008, 4:22 pm

With some xml documents you need to set [parser setShouldResolveExternalEntities:YES] (for example after [parser setDelegate:self]).

Comment from Rahulkumar
Time: November 18, 2008, 4:06 am

NSXML parser cannot parse through xml files using special characters

Comment from Martin
Time: December 12, 2008, 9:42 am

Does someone know how I could deal with xml chunks where nested tags (on different levels) are sharing the same name?

Comment from Ted
Time: January 8, 2009, 10:35 pm

Thanks for the tutorial.

Why do you copy the elementName instead of retaining it?

keyInProgress = [elementName copy];

I can never get straight when you would want to copy and when you would want to retain.

Comment from Kaveh
Time: May 26, 2009, 1:41 am

Martin,

You could use another delegate object for handling other levels of the XML if there are naming conflicts. Just switch the delegate of the NSXMLParser object by messaging [parser setDelegate:levelTwoDelegate]. Just remember to switch back to the previous delegate when you know you are coming out of that inner level of the XML tree. Or you could handle state in some other way, like using flags.
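
One way to keep that state without switching delegates is to track a stack of open element names, so that same-named tags at different levels have different paths. The sketch below is illustrative only: the class name PathTrackingParser and its method pathsInData: are hypothetical, and it simply records the path of each element it enters. The NSXMLParserDelegate protocol conformance is what current SDKs expect; older SDKs treated these methods as an informal protocol.

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that knows where it is in the tree
@interface PathTrackingParser : NSObject <NSXMLParserDelegate>
{
    NSMutableArray *elementStack;  // names of the currently open elements
    NSMutableArray *paths;         // path of every element entered
}
- (NSArray *)pathsInData:(NSData *)d;
@end

@implementation PathTrackingParser

- (void)dealloc
{
    [elementStack release];
    [paths release];
    [super dealloc];
}

- (NSArray *)pathsInData:(NSData *)d
{
    // Start each parse with fresh state
    [elementStack release];
    elementStack = [[NSMutableArray alloc] init];
    [paths release];
    paths = [[NSMutableArray alloc] init];

    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:d];
    [parser setDelegate:self];
    [parser parse];
    [parser release];
    return paths;
}

- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict
{
    // Push the element, then record the full path to it
    [elementStack addObject:elementName];
    [paths addObject:[elementStack componentsJoinedByString:@"/"]];
}

- (void)parser:(NSXMLParser *)parser
 didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
{
    // Pop the element that just closed
    [elementStack removeLastObject];
}

@end
```

In a real delegate, the element callbacks would compare the joined path (for example Catalog/Name versus Catalog/Item/Name) instead of the bare element name when deciding what to do with the text.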

Comment from ludo
Time: October 21, 2009, 4:30 am

The tutorial is really good.

I wanna ask if someone knows how to deal with EPUB files. Not create them, but how to read them on the iPhone. Any clue or code available?

Thanks

Comment from Paresh Thakor
Time: November 12, 2009, 2:12 am

Hello friends..,

I am in search for framework / api for parsing and using .epub files on iPhone. Please let me know if anyone has an answer…

I’m building an iPhone application, which loads all epub features and book on iPhone screen.

Comment from anon_anon
Time: November 25, 2009, 4:52 pm

you might want to look at vtd-xml for best possible xpath query performance

vtd-xml

Comment from Tom Bradley
Time: February 24, 2010, 5:07 am

Check out http://www.TBXML.co.uk for a super-fast, lightweight, easy to use XML parser!

Comment from Vancouver Web Design
Time: March 15, 2010, 1:46 am

Thanks a lot for a full iphone-XML tutorial. I missed my iphone class where they were dealing with this issue so I had to learn from your post basically :)

Thanks a lot

Comment from Ben Reeves
Time: May 24, 2010, 2:42 pm

If you’re parsing XML from an HTTP stream, you might be interested in my wrapper for expat. There is no delay waiting for the XML file to fully download, and it’s a drop-in replacement for NSXMLParser: http://benreeves.co.uk/iphone-expat-xml-parser-wrapper/

Comment from Robert Simpson
Time: July 18, 2010, 6:43 am

Good article – thanks for sharing it with us.

I have a question which probably illustrates my lack of understanding of memory management but I would really like to try and resolve. It concerns the following bit of code:

// Create a parser
NSXMLParser *parser = [[NSXMLParser alloc] initWithData:d];
[parser setDelegate:self];

// Do the parse
[parser parse];

[parser release];

So we allocate the parser (its retain count is increased to 1), do the parsing and then release the parser (retain count down to zero). Now what is bothering me is that the parsing may be working away and then we call release on the instance before it is even finished. The only way I see this not being a problem is that we only call release when we have finished parsing – is this the case?

The reason I am bothered about this is that I have a custom class called ServerRequest which I can create much in the same way:

ServerRequest *serverRequest = [[ServerRequest alloc] init];
serverRequest.delegate = self;
[serverRequest sendHTTPrequestToServer];
[serverRequest release];

I’m a bit concerned about releasing the serverRequest instance when it hasn’t actually finished sending and receiving all the data. What is the best thing to do in this situation? Sorry if I’m asking silly questions.
