Parsing XML BLAST output with Expat
-----------------------------------

blast_parse.c is a modification of the demo program outline.c. It shows
how Expat might be used to parse XML output produced by BLAST.

The input is an XML file produced by BLAST, read from stdin. Output goes
to stdout and contains, for every HSP produced by BLAST:

    query-identifier database-description expect-value database-identifier

There is one such line per HSP. The lines are separated from each other
by a blank line to make reading easier.

This is only an example XML parser. Feel free to modify it as needed to
output other data. There are three static functions in blast_parse.c
that you might wish to modify in order to see different data:

1.) static void start(void *data, const char *el, const char **attr)

    This is the handler for start tags. If a tag is recognized in this
    function, a global variable is set.

2.) static void text_handler(void *data, const XML_Char *s, int len)

    This is the handler for text (PCDATA in between tags). A global
    variable set in the start handler triggers action in this function.
    Generally the value is saved somewhere and the global variable is
    set to zero again.

3.) static void end(void *data, const char *el)

    This is the handler for end tags. I used this function to print the
    desired information to stdout every time we finish parsing an "Hsp"
    element.

This implementation makes use of global variables and is not very
pretty. The Expat library provides the functions XML_SetUserData and
XML_GetUserData, which can be used to avoid global variables. I have
not taken the time to do this yet.

The file reference.html, found in the doc directory of the downloaded
Expat sources, describes Expat in detail.

Please do not blame Clark Cooper or James Clark for any bugs or ugliness
in the file blast_parse.c.
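
The user-data approach mentioned above could look roughly like the
sketch below. It is not the code in blast_parse.c. The struct, the
helper functions, the tab-separated output and the choice of BLAST
element names (BlastOutput_query-ID, Hit_id, Hit_def, Hsp_evalue) are
assumptions made for illustration. To keep the sketch free of library
dependencies it does not include expat.h, so the handlers take plain
"const char *", which matches XML_Char only in a default (non-Unicode)
Expat build. The registration calls are shown in a comment.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_MAX 256

/* Which field the incoming character data belongs to. */
enum field { F_NONE, F_QUERY_ID, F_HIT_ID, F_HIT_DEF, F_EVALUE };

/* All parser state lives here instead of in global variables
 * (hypothetical layout, not taken from blast_parse.c). */
struct blast_state {
    enum field current;
    char query_id[FIELD_MAX];
    char hit_id[FIELD_MAX];
    char hit_def[FIELD_MAX];
    char evalue[FIELD_MAX];
    int hsp_count;
};

/* Map a tag name to the field it fills (assumed element names). */
static enum field field_for(const char *el)
{
    if (strcmp(el, "BlastOutput_query-ID") == 0) return F_QUERY_ID;
    if (strcmp(el, "Hit_id") == 0) return F_HIT_ID;
    if (strcmp(el, "Hit_def") == 0) return F_HIT_DEF;
    if (strcmp(el, "Hsp_evalue") == 0) return F_EVALUE;
    return F_NONE;
}

/* Return the buffer for a field, or NULL for F_NONE. */
static char *field_buf(struct blast_state *st, enum field f)
{
    switch (f) {
    case F_QUERY_ID: return st->query_id;
    case F_HIT_ID:   return st->hit_id;
    case F_HIT_DEF:  return st->hit_def;
    case F_EVALUE:   return st->evalue;
    default:         return NULL;
    }
}

/* Start tag: remember which field is open and clear its old value. */
static void start(void *data, const char *el, const char **attr)
{
    struct blast_state *st = data;
    char *buf;

    (void)attr;
    st->current = field_for(el);
    buf = field_buf(st, st->current);
    if (buf != NULL)
        buf[0] = '\0';
}

/* Text: append to the open field, truncating if the buffer is full.
 * Appending matters because Expat may deliver one text node in
 * several calls. */
static void text_handler(void *data, const char *s, int len)
{
    struct blast_state *st = data;
    char *buf = field_buf(st, st->current);
    size_t used, room, n;

    if (buf == NULL || len <= 0)
        return;
    used = strlen(buf);
    room = FIELD_MAX - 1 - used;
    n = (size_t)len < room ? (size_t)len : room;
    memcpy(buf + used, s, n);
    buf[used + n] = '\0';
}

/* End tag: close the open field; print one record per "Hsp". */
static void end(void *data, const char *el)
{
    struct blast_state *st = data;

    st->current = F_NONE;
    if (strcmp(el, "Hsp") == 0) {
        st->hsp_count++;
        printf("%s\t%s\t%s\t%s\n\n", st->query_id, st->hit_def,
               st->evalue, st->hit_id);
    }
}

/* With Expat available, the handlers would be registered roughly as:
 *     struct blast_state st = {0};
 *     XML_Parser p = XML_ParserCreate(NULL);
 *     XML_SetUserData(p, &st);
 *     XML_SetElementHandler(p, start, end);
 *     XML_SetCharacterDataHandler(p, text_handler);
 */
```

Unlike the description of text_handler above, this sketch clears the
"current field" marker in the end handler rather than in the text
handler, so that text delivered in several pieces is still collected in
full.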

[NOTE: the following directions have been tested with version 1.95.6 of
Expat under Linux]

To compile this application, download Expat from

    http://sourceforge.net/projects/expat/

Uncompress and unpack the archive, then "configure" Expat as described
in the included READMEs. Then copy blast_parse.c to the "examples"
directory.

There is one Makefile in the top-level Expat directory that builds the
libraries as well as the examples. Edit this Makefile to include
blast_parse. The easiest way to do this is to copy the lines for one
existing target (say "element") and change the word "element" to
"blast_parse". Then invoke "make" from the top-level directory. The
Expat library will be built, as well as blast_parse in the "examples"
directory.

To run blast_parse with an XML input file called "input.xml", invoke:

    ./blast_parse < input.xml

The sample input "input.xml" is included with this package. More BLAST
XML output (input to blast_parse) may be obtained by running searches
on the NCBI Web pages or with stand-alone BLAST.

Please send questions/comments about this file to
madden@ncbi.nlm.nih.gov. Please send questions about running BLAST to
blast-help@ncbi.nlm.nih.gov.

Last modified by Tom Madden (National Center for Biotechnology
Information) 7/31/2003