Introduction to the Standalone WWW Blast server


Introduction

This standalone WWW BLAST server suite was designed to resemble the regular NCBI BLAST server and command-line NCBI BLAST programs such as "blastall", "blastpgp", "rpsblast" and "megablast". It incorporates most features of the NCBI BLAST programs and should be relatively easy to use. The server does not support request queuing or load balancing: as soon as the user presses the "Search" button, BLAST starts immediately, provided the entered information is valid. The server is therefore not intended to handle the heavy load that a public service may see. Queuing and load balancing may, however, be implemented with products such as Load Sharing Facility ("LSF") from Platform Computing Corporation. An interface to "LSF" was implemented at NCBI, but it is not included in this suite. The standalone server assumes that the user has their own BLAST or RPS-BLAST database to search and wants a simple WWW interface to that search. It is STRONGLY recommended that the user have experience installing and running the standalone NCBI BLAST programs.

Once the files are uncompressed, the server is ready to be used immediately. Customizations to the programs are welcome and may be made by experienced programmers using the source code, which is also provided. Recompiling the server executables requires that the programmer has compiled the NCBI toolkit libraries. The toolkit may be downloaded from the NCBI FTP site: ftp://ncbi.nlm.nih.gov

What's new in this revision?

Installation of the Standalone WWW server

After downloading the file wwwblast.Your_platform.tar.gz to your computer, place it in the document directory of the HTTPD server and uncompress it with:

    gzip -d wwwblast.Your_platform.tar.gz
    tar -xvpf wwwblast.Your_platform.tar

Please note that the "p" parameter in the tar options is significant: it preserves the file access permissions stored in the distribution. The temporary directory for BLAST overview images (TmpGifs) should have permission 777, and the logfiles (wwwblast.log and psiblast.log) should have permission 666.
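If the permissions were lost during unpacking, a quick check can save some debugging time. The following is only a minimal sketch: it assumes that TmpGifs and the two logfiles sit directly under the installation directory, and the function name find_wrong_modes is hypothetical, not part of the distribution.

```python
import os
import stat

# Expected permission bits, taken from the text above. The location of
# these entries directly under the "blast" directory is an assumption.
EXPECTED_MODES = {
    "TmpGifs": 0o777,
    "wwwblast.log": 0o666,
    "psiblast.log": 0o666,
}


def find_wrong_modes(blast_dir, expected=EXPECTED_MODES):
    """Return {name: actual_mode} for entries whose permission bits differ.

    Entries that do not exist are reported with a mode of None.
    """
    wrong = {}
    for name, want in expected.items():
        path = os.path.join(blast_dir, name)
        try:
            actual = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            wrong[name] = None
            continue
        if actual != want:
            wrong[name] = actual
    return wrong
```

Calling find_wrong_modes("blast") from the HTTPD document directory should return an empty dictionary when everything is set as described; any reported entry can then be corrected with chmod.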

After you uncompress the distribution file, a "blast" directory will be created. You can access the sample BLAST HTML input forms using URLs:

This distribution comes with 2 BLAST databases: "test_aa_db", a sample protein database, and "test_na_db", a sample nucleotide database. These databases are configured to be searchable immediately with the corresponding BLAST programs.

Description of files in the distribution

blast.cgi

- BLAST search start-up C-shell file

blast_cs.cgi

- BLAST search start-up C-shell file for version with Entrez lookups

.nlmstmanrc

- Configuration file for the graphical overview image (do not edit!)

blast.html

- sample BLAST search input HTML form

blast_cs.html

- sample BLAST search input HTML form for version with Entrez lookups

megablast.html

- sample MEGABLAST search input HTML form

megablast_cs.html

- sample MEGABLAST search input HTML form for version with Entrez lookups

rpsblast.html

- sample RPS BLAST search input HTML form

rpsblast_cs.html

- sample RPS BLAST search input HTML form with Entrez lookups

blast.rc

- Default configuration file for the WWW BLAST server

psiblast.rc

- Default configuration file for the PSI/PHI WWW BLAST server

psiblast.cgi

- PSI/PHI BLAST search start-up C-shell file

psiblast_cs.cgi

- PSI/PHI BLAST search start-up C-shell file for version with Entrez lookups

psiblast.html

- sample PSI/PHI BLAST search input HTML form

psiblast_cs.html

- sample PSI/PHI BLAST search HTML form for version with Entrez lookups

psiblast.REAL

- Main PSI/PHI BLAST server executable

psiblast_cs.REAL

- Main PSI/PHI BLAST server executable for version with Entrez lookups

xmlblast.html

- sample form that produces only XML Blast Output

xmlblast.cgi

- sample start-up C-shell to produce XML content type

blast_form.map

- Auxiliary map file for the front BLAST image

nph-viewgif.cgi

- CGI program used to view and delete overview images

readme.html

- this documentation

wwwblast.log

- default logfile

psiblast.log

- default PSI/PHI Blast logfile

ncbi_blast.rc

- sample file for full NCBI set of databases

Configuration of BLAST databases

To set up databases for the standalone WWW BLAST server, follow these steps:

  1. Copy the file of concatenated FASTA entries that will be used as the database into the directory "./db".
  2. Run the "formatdb" program to format the database there.
  3. Add the name of the database to the server configuration file.
  4. Add the name of the database to the (PSI/PHI) WWW BLAST search form.
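Step 2 can be scripted. The sketch below only builds the formatdb argument list and does not run anything. It relies on the formatdb options "-i" (input FASTA file), "-p" (T for protein, F for nucleotide) and "-o" (T to parse SeqIds); the helper name formatdb_command and the file name in the usage note are hypothetical.

```python
def formatdb_command(fasta_path, protein, parse_seqids=False):
    """Build the argument list for formatting one FASTA file with formatdb.

    fasta_path   -- FASTA file to format (passed to -i)
    protein      -- True for a protein database, False for nucleotide (-p)
    parse_seqids -- True to pass "-o T" so that SeqIds are parsed and indexed
    """
    def as_flag(value):
        # formatdb expects the letters T and F for boolean options.
        return "T" if value else "F"

    return [
        "formatdb",
        "-i", fasta_path,
        "-p", as_flag(protein),
        "-o", as_flag(parse_seqids),
    ]
```

For example, formatdb_command("my_aa_db", protein=True, parse_seqids=True) yields a list that could be handed to subprocess.run(..., cwd="db", check=True), assuming formatdb is on the PATH.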

PSI/PHI Blast notes

The PSI/PHI Blast server has one significant requirement: FASTA files used as the source for BLAST databases must have a GI number in the SeqId field. This is practically always true for FASTA files from the NCBI FTP site. Local databases may not be used with this version of PSI/PHI Blast unless they have a ">gi|12345..." prefix in the definition line.
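Before formatting a local FASTA file for PSI/PHI Blast, it may help to check its definition lines for this prefix. The following is a minimal sketch, assuming that a valid prefix is ">gi|" followed by at least one digit; the function name deflines_without_gi is hypothetical.

```python
import re

# A definition line is accepted if it starts with ">gi|" and digits,
# matching the ">gi|12345..." form described above.
GI_PREFIX = re.compile(r"^>gi\|\d+")


def deflines_without_gi(lines):
    """Return (line_number, defline) pairs for deflines lacking a GI prefix.

    `lines` is any iterable of text lines, for example an open FASTA file.
    Sequence lines (those not starting with ">") are ignored.
    """
    missing = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(">") and not GI_PREFIX.match(line):
            missing.append((number, line.rstrip("\n")))
    return missing
```

An empty result suggests the file satisfies the GI requirement; each reported line would need a GI-style identifier before the database can be used with PSI/PHI Blast.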

Databases for PSI/PHI Blast should always be created with formatdb using the "-o T" flag. The test database "test_aa_db" was created using this flag, and the database "test_na_db" was created without it.

If this distribution was installed somewhere other than the "/blast" directory under the HTTPD documents root directory, then the path to the distribution should be set through the environment variable WWW_ROOT_PATH in the file psiblast.cgi or psiblast_cs.cgi.

Client/server version for Entrez lookup and taxonomy reports

Regular Blast, PSI/PHI Blast and MegaBlast have client/server versions for Entrez gi/accession lookups and for printing Taxonomy reports. The client/server interface from the user to NCBI should be configured as for any other NCBI client/server program. If the program "blastcl3" works without problems, this server should also work. If the user is behind a firewall, the default configuration will fail and special configuration will be required. A general explanation of client/server configuration to NCBI is beyond the scope of this document. Users who have problems with such configuration should write to info@ncbi.nlm.nih.gov for further assistance.

XML output

The ability to produce XML output has been added to this server. The XML definition of BLAST output is tied to a simple ASN.1 specification designed for this purpose. These definitions may be found in the directory ./Src/XML. Recommendations for improving this (possible) standard may be sent to blast-help@ncbi.nlm.nih.gov or directly to me at shavirin@ncbi.nlm.nih.gov. XML may be printed by setting "Alignment view" on the blast.html (or blast_cs.html) page to "Blast XML". The resulting page will look empty, but if you open the page source (in Netscape: View -> Page source) you will see the complete XML document. As an example of XML usage, the sample page "xmlblast.html" was created. This form calls the starter C-shell script "xmlblast.cgi", which sets the Content-type of the output to ncbi/xmlblast. If the browser can handle this content type using a plug-in or a helper application, it is possible to design a custom BLAST output viewer.

RPS Blast

RPS Blast, or "Reversed Position Specific BLAST", was implemented as a very fast alternative to the program IMPALA. It has the same general objective: to search a collection of conserved domains, motifs, profiles or HMMs (there are many names for such a collection). In contrast to IMPALA, RPS Blast has a completely different implementation, which makes profile searches 10 to 100 times faster than IMPALA, depending on search conditions. RPS Blast has a translated variant that allows DNA sequences to be searched against these conserved domains. Currently RPS Blast is one of the tools chosen to annotate the human genome at NCBI and is the basis for the CDD Blast search page.

Databases for RPS Blast are hardware dependent for speed reasons, so they differ between big-endian and little-endian platforms.
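To see why a binary database built on one kind of platform cannot simply be copied to the other, consider how one integer is stored. The sketch below is only an illustration of byte order in general; it makes no claim about the actual layout of RPS database files, and the helper names are hypothetical.

```python
import struct
import sys


def host_byte_order():
    """Return "little" or "big" for the machine running this script."""
    return sys.byteorder


def pack_int32(value, byte_order):
    """Pack a 32-bit integer using "little", "big" or "native" byte order."""
    prefix = {"little": "<", "big": ">", "native": "="}[byte_order]
    return struct.pack(prefix + "i", value)


# The same number produces different bytes on the two kinds of platform,
# so data written in native order is only readable on a matching host.
little_bytes = pack_int32(1, "little")  # b"\x01\x00\x00\x00"
big_bytes = pack_int32(1, "big")        # b"\x00\x00\x00\x01"
```

Running host_byte_order() on the build machine and on the search machine is a simple way to tell whether a database built on one is likely to be usable on the other.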

To build an RPS database, follow the procedure explained in the file "README.rps" that comes with this distribution. A small RPS database is available for testing. This database is part of the real NCBI database used in the CDD search page. The full NCBI database is available in platform-independent form from the FTP site.

Out of Frame BLASTX

The OOF alignment functions were originally written in 1997 by Zheng Zhang, who worked at Penn State University at the time and currently works for Paracel Inc. These functions were never used since then because of the inability to handle different scales for the query and subject of high-scoring pairs (DNA scale vs. protein scale).

His original package included low-level OOF gapped alignment functions, which aligned whole (not-in-frame) DNA to protein.

This package is now incorporated into the regular BLAST API.

Because all currently existing alignment viewers can "live" only in a "same scale world" (protein/protein or DNA/DNA) and cannot show alignments in which the aligned sequences have different scales, I had to implement a custom viewer especially for OOF alignments. It was also necessary to create the whole chain of transforms from Zhang's low-level traceback structures to standard NCBI alignment structures, with new features added to store frameshift information.

This viewer can currently show only pairwise Traditional Blast output. XML output is not yet implemented.

Description of tags for main BLAST input page

This standalone server uses a tag convention analogous to that of the regular NCBI BLAST server. The sample BLAST search forms may be changed to accommodate the particular needs of the user's custom search. Here is the list of these tags and their meanings. If a tag is missing from the search input page, it takes its default value. The exceptions are the tags PROGRAM, DATALIB and SEQUENCE (or SEQFILE), which must always be set.

Server configuration file and logfile

The default configuration file is "blast.rc" and the default logfile is "wwwblast.log". Setting the tag WWW_BLAST_TYPE to a specific value may change these names. This is useful if several different search input pages use the same CGI search engine but differ significantly in content and priorities. Here is the sample configuration file that comes with this distribution:

    # Number of CPUs to use for a single request
    #
    NumCpuToUse     4
    #
    # Here is list of combination program/database, 
    # that allowed by BLAST service. Format:    ...
    #
    blastn test_na_db
    blastp test_aa_db
    blastx test_aa_db
    tblastn test_na_db
    tblastx test_na_db

This file sets how many CPUs of the computer are used in a BLAST search and which databases may be used with which programs. The logfile currently stores only limited information, but programmers may update it to store more values. Please note that HTTPD servers usually run under accounts that do not have write access to disk, so for the logfile to be written its permission should be set to 666.
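To check a configuration file before putting it into service, its contents can be read with a few lines of code. The following is a minimal sketch of such a reader, not the parser used by the server itself. It assumes that lines starting with "#" are comments, that "NumCpuToUse" is followed by an integer, and that every other line is a program name followed by one or more database names; the function names are hypothetical.

```python
def parse_blast_rc(text):
    """Parse configuration text into (num_cpus, allowed).

    num_cpus is an int, or None if NumCpuToUse is absent.
    allowed maps each program name to the list of databases it may search.
    """
    num_cpus = None
    allowed = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if fields[0] == "NumCpuToUse":
            num_cpus = int(fields[1])
        else:
            # Assumption: remaining fields are database names.
            allowed.setdefault(fields[0], []).extend(fields[1:])
    return num_cpus, allowed


def is_allowed(allowed, program, database):
    """Return True if the program/database combination is configured."""
    return database in allowed.get(program, [])
```

With the sample file above, parse_blast_rc would report 4 CPUs, and is_allowed would accept "blastp" with "test_aa_db" but reject "blastp" with "test_na_db".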

How to debug WWW Blast programs

There is a way to debug these programs:
  1. Enable the line "setenv DEBUG_COMMAND_LINE TRUE" in the *.cgi file (uncomment it, or add it if it is missing).
  2. Run the search that causes the problem. This should create the file "/tmp/__web.in".
  3. Set all necessary environment variables on the command line (BLASTDB at least) and run from the command line: "blast.REAL < /tmp/__web.in"
This should perform the problematic search without WWW. If it results in a core dump, you can examine the core file with:

    dbx blast.REAL core

and then use the "where" command to print the stack.
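Step 3 of the procedure can also be wrapped in a small script, which is convenient when the same request has to be replayed several times. This is a sketch only: it assumes that BLASTDB is the only extra environment variable needed, and the function name replay_request is hypothetical.

```python
import os
import subprocess


def replay_request(command, request_file, blastdb):
    """Run `command` with the saved request on stdin and BLASTDB set.

    command      -- argument list, for example ["./blast.REAL"]
    request_file -- file saved by the CGI script, for example "/tmp/__web.in"
    blastdb      -- directory containing the formatted databases
    """
    # Copy the current environment and add BLASTDB for the child process.
    env = dict(os.environ, BLASTDB=blastdb)
    with open(request_file, "rb") as request:
        return subprocess.run(
            command,
            stdin=request,
            env=env,
            capture_output=True,
            check=False,  # a crash is reported through returncode
        )
```

For example, replay_request(["./blast.REAL"], "/tmp/__web.in", "./db") returns an object whose returncode, stdout and stderr attributes show how the search ended; on POSIX systems a negative returncode means the program was terminated by a signal, which is the case worth examining with dbx.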


Sergei Shavirin

Last modified: Fri Sep 1 13:51:15 EST 2000