  ------------------------------------------------------------------------

               Introduction to the Standalone WWW Blast server

  ------------------------------------------------------------------------

Index

   * Introduction
   * What's new in this revision?
   * Installation of the Standalone WWW server
   * Description of files in the distribution
   * Configuration of BLAST databases
   * PSI/PHI Blast notes
   * Client/server versions for Entrez lookup and taxonomy reports
   * XML output
   * Blast 2 sequences
   * RPS Blast
   * Out of Frame BLASTX
   * Description of tags for main BLAST input page
   * Server configuration file and logfile
   * How to debug WWW Blast programs

Introduction

This standalone WWW BLAST server suite was designed to resemble the regular
NCBI BLAST server and command-line NCBI BLAST programs such as "blastall",
"blastpgp", "rpsblast" and "megablast". It incorporates most of the features
of the NCBI BLAST programs and should be relatively easy to use. The server
does not support request queuing or load balancing: as soon as the user hits
the "Search" button, BLAST starts immediately, provided the entered
information is valid. The server is therefore not intended to handle the
heavy load that a public service may experience. Queuing and load balancing
may, however, be implemented with products such as the Load Sharing Facility
("LSF") from Platform Computing Corporation. An interface to "LSF" was
implemented at NCBI, but it is not included in this suite. The standalone
server assumes that the user has their own BLAST or RPS-BLAST database to
search and wants a simple WWW interface to that search. It is STRONGLY
recommended that the user has experience installing and running the
standalone NCBI BLAST programs.

After the files are uncompressed, the server is ready to be used
immediately. Customizations to the programs are welcome and may be made by
experienced programmers using the source code, which is also provided.
Recompiling the server executables requires that the programmer has compiled
the NCBI toolkit libraries. The toolkit may be downloaded from the NCBI FTP
site: ftp://ncbi.nlm.nih.gov

What's new in this revision?

 May, 2 2001           * No major changes. Programs were recompiled and
                         synchronized with the latest NCBI toolkit
                         release.
 November, 3 2000      * Blast 2 sequences was added
 October, 19 2000      * RPS Blast was added
                       * Out of Frame BLASTX (OOF) now available for
                         testing and suggestions.
 September, 28 2000    * Added possibility to limit search to results of
                         Entrez query (Regular client/server BLAST)
 September, 11 2000    * Added MEGABLAST search.
                       * Added possibility to have multiple FASTA query
                         input - batch searches with multiple graphical
                         overviews. (Regular BLAST)
 August, 22 2000       * Added new advanced statistics to PSI Blast and
                         ability to produce Smith-Waterman alignments
                       * Added support for XML output.
 May, 17 2000          * PSI and PHI Blast were added to this distribution
                       * Added support for client/server interface for
                         gi/accession lookups using Entrez
                       * Added possibility to print Taxonomy reports
                       * Added option to print alternative alignment with
                         specific color schema
 March, 20 2000        * Initial revision

Installation of the Standalone WWW server

After downloading the file wwwblast.Your_platform.tar.gz to your computer,
place it into the document directory of the HTTPD server and uncompress it
with:

    gzip -d wwwblast.Your_platform.tar.gz
    tar -xvpf wwwblast.Your_platform.tar

Please note that the "p" in the tar options is significant: it preserves the
file access permissions stored in the distribution. The temporary directory
for BLAST overview images (TmpGifs) should have permission 777, and the
logfiles (wwwblast.log and psiblast.log) should have permission 666.
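
If the permissions were not preserved (for example, because "p" was
omitted), they can be restored by hand. The following is a minimal sketch in
Bourne shell; the function name fix_blast_permissions and the example path
are illustrative assumptions, not part of the distribution.

```shell
# Hypothetical helper: restore the permissions described above.
# $1 is the "blast" directory created by unpacking the distribution.
fix_blast_permissions() {
    blast_dir="$1"
    # Overview images are written here by the HTTPD account.
    chmod 777 "$blast_dir/TmpGifs" || return 1
    # Logfiles must be writable by the HTTPD account as well.
    chmod 666 "$blast_dir/wwwblast.log" "$blast_dir/psiblast.log"
}
```

For example: fix_blast_permissions /usr/local/apache/htdocs/blast (the path
is hypothetical; use your own HTTPD document directory).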

After you uncompress the distribution file, a "blast" directory will be
created. You can access the sample BLAST HTML input forms using these URLs:

   * http://your_hostname/blast/blast.html
   * http://your_hostname/blast/blast_cs.html
   * http://your_hostname/blast/psiblast.html
   * http://your_hostname/blast/psiblast_cs.html

This distribution comes with 2 BLAST databases: "test_aa_db", a sample
protein database, and "test_na_db", a sample nucleotide database. These
databases are configured to be searchable immediately with the corresponding
BLAST programs.

Description of files in the distribution

   * Root directory (./blast):

Files with the suffix "*_cs.*" are analogous to the files without that
suffix, with the added capability to make client/server Entrez lookups for
sequence gis and accessions.

blast.cgi, blast_cs.cgi          - BLAST search start-up C-shell files

.nlmstmanrc                      - Configuration file for the graphical
                                 overview image (do not edit!)
blast.html, blast_cs.html        - sample BLAST search input HTML forms
megablast.html,
megablast_cs.html                - sample MEGABLAST search input HTML forms
rpsblast.html, rpsblast_cs.html  - sample RPS BLAST search input HTML forms

blast.rc                         - Default configuration file for the WWW
                                 BLAST server

psiblast.rc                      - Default configuration file for the
                                 PSI/PHI WWW BLAST server

psiblast.cgi, psiblast_cs.cgi    - PSI/PHI BLAST search start-up C-shell
                                 files

psiblast.html, psiblast_cs.html  - sample PSI/PHI BLAST search input HTML
                                 forms
psiblast.REAL, psiblast_cs.REAL  - Main PSI/PHI BLAST server executables

wblast2.html, wblast2_cs.html    - sample BLAST 2 sequences search input
                                 HTML forms
wblast2.REAL, wblast2_cs.REAL    - Main BLAST 2 sequences server executables

bl2bag.cgi                       - CGI used to create 2 sequences alignment
                                 image on the fly

xmlblast.html                    - sample form that produces only XML Blast
                                 Output

xmlblast.cgi                     - sample start-up C-shell to produce XML
                                 content type

blast_form.map                   - Auxiliary map file for the front BLAST
                                 image

nph-viewgif.cgi                  - CGI program used to view and delete
                                 overview images
readme.html                      - this documentation
wwwblast.log                     - default logfile
psiblast.log                     - default PSI/PHI Blast logfile

ncbi_blast.rc                    - sample file for full NCBI set of
                                 databases

   * ./data directory - matrices used in BLAST search
   * ./db directory - files of the test BLAST databases test_aa_db and
     test_na_db. This directory also contains the formatdb binary.
   * ./docs: - HTML pages used in sample BLAST search input page
   * ./images - images used in the sample BLAST search input page
   * ./Src - source directory for WWW BLAST server and formatdb program.
   * ./Src/XML - source directory for creation of XML output related files.
        o blstxml.asn - ASN.1 definition for Blast XML
        o blstxml.dtd - corresponding DTD
   * ./TmpGifs - storage for temporary BLAST overview gif files

Configuration of BLAST databases

To set up databases for the standalone WWW BLAST server, follow these
steps:

  1. Copy the file of concatenated FASTA entries that will be used as a
     database into the directory "./db"
  2. Run the "formatdb" program there to format the database.
  3. Add the name of the database to the server configuration file
  4. Add the name of the database to the (PSI/PHI) WWW BLAST search form
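
Steps 1 and 2 can be sketched in Bourne shell as follows. The function name
add_blast_db and the example file name are assumptions made for
illustration; "-i" (input file) and "-p" (T for protein, F for nucleotide)
are standard formatdb options. Steps 3 and 4 are still done by hand, because
they depend on your configuration file and search form.

```shell
# Hypothetical helper: copy a FASTA file into ./db and format it.
# $1 - "blast" directory of the distribution
# $2 - FASTA file with concatenated entries
# $3 - T for a protein database, F for a nucleotide database
add_blast_db() {
    blast_dir="$1"
    fasta="$2"
    is_protein="$3"
    db_name=$(basename "$fasta")

    # Step 1: copy the FASTA file into the database directory.
    cp "$fasta" "$blast_dir/db/$db_name" || return 1

    # Step 2: format it in place with the formatdb binary shipped in ./db.
    (cd "$blast_dir/db" && ./formatdb -i "$db_name" -p "$is_protein") || return 1

    # Steps 3 and 4 are manual: print a reminder with the database name.
    echo "Formatted $db_name: now add it to the configuration file and search form"
}
```

For example: add_blast_db /usr/local/apache/htdocs/blast my_proteins T
(both the path and the file name are hypothetical).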

PSI/PHI Blast notes

The PSI/PHI Blast server has one significant requirement: FASTA files used
as a source for BLAST databases must have a GI number in the SeqId field.
This is practically always true for FASTA files from the NCBI FTP site.
Local databases may not be used with this version of PSI/PHI Blast unless
they have a ">gi|12345..." prefix in the definition line.
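
A quick check before formatting can catch definition lines without a GI.
The sketch below, in Bourne shell, counts the definition lines that do not
begin with ">gi|" followed by digits. The function name check_gi_deflines is
a hypothetical choice, and the check covers only the prefix, not the rest of
the SeqId.

```shell
# Hypothetical helper: succeed only if every FASTA definition line
# in file $1 starts with ">gi|<digits>".
check_gi_deflines() {
    # Count definition lines (starting with ">") that lack the gi prefix.
    bad=$(awk '/^>/ && !/^>gi\|[0-9]+/ { n++ } END { print n + 0 }' "$1")
    if [ "$bad" -ne 0 ]; then
        echo "$bad definition line(s) without a gi number in $1" >&2
        return 1
    fi
    return 0
}
```

For example: check_gi_deflines my_proteins && echo "ready for formatdb"
(the file name is hypothetical).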

Databases for PSI/PHI Blast should always be created with formatdb using the
"-o T" flag. The test database "test_aa_db" was created with this flag, and
the database "test_na_db" was created without it.

If this distribution was not installed in the "/blast" directory under the
HTTPD document root, then the path to the distribution should be set in the
environment variable WWW_ROOT_PATH in the file psiblast.cgi or
psiblast_cs.cgi.

Client/server versions for Entrez lookup and taxonomy reports

Regular Blast, PSI/PHI Blast and MegaBlast have client/server versions for
Entrez gi/accession lookups and for printing Taxonomy reports. The
client/server interface from the user to NCBI should be configured as for
any other NCBI client/server program. If the program "blastcl3" works
without problems, this server should also work. If the user is behind a
firewall, the default configuration will fail to work properly and special
configuration will be required. A general explanation of client/server
configuration to NCBI is beyond the scope of this document. Users who have
problems with this configuration should write to info@ncbi.nlm.nih.gov for
further assistance.

XML output

The ability to produce XML output was added to this server. The XML
definition of BLAST output is tied to a simple ASN.1 specification designed
for this purpose. These definitions may be found in the directory ./Src/XML.
Any recommendations on improvements to this (possible) standard may be sent
to blast-help@ncbi.nlm.nih.gov or directly to me at
shavirin@ncbi.nlm.nih.gov. XML may be printed by setting "Alignment view" on
the blast.html (or blast_cs.html) page to "Blast XML". The resulting page
will look empty, but if you open the page source (in Netscape: View -> Page
source) you will see the complete XML document. As an example of XML usage,
the sample page "xmlblast.html" was created. This form calls the starter
C-shell script "xmlblast.cgi", which sets the Content-type of the output to
ncbi/xmlblast. If the browser can handle this type of content using a
plug-in or some helper application, it is possible to design a custom BLAST
output viewer.

Blast 2 sequences

The Blast 2 sequences program was initially written by Tatiana Tatusova and
Tom Madden and was presented in the article: Tatiana A. Tatusova, Thomas L.
Madden (1999), "Blast 2 sequences - a new tool for comparing protein and
nucleotide sequences", FEMS Microbiol Lett. 174:247-250. For the standalone
WWW version the program was rewritten to accommodate the general setup and
to remove obsolete code and interfaces.

RPS Blast

RPS Blast, or "Reversed Position Specific BLAST", was implemented as a very
fast alternative to the program IMPALA. It has the same general objective:
to search a collection of conserved domains, motifs, profiles or HMMs (there
are many names for such a collection). In contrast to IMPALA, RPS Blast has
a completely different implementation, which makes profile searches 10 to
100 times faster, depending on search conditions. RPS Blast has a translated
variant that allows DNA sequences to be searched against these conserved
domains. RPS Blast is currently one of the tools chosen to annotate the
human genome at NCBI and is the basis of the CDD Blast search page.

Databases for RPS Blast are hardware dependent, for speed reasons, so they
differ between big-endian and little-endian platforms.

To build an RPS database, follow the procedure explained in the file
"README.rps", which comes with this distribution. A small RPS database is
available for testing. This database is a part of the real NCBI database
used by the CDD search page. The full NCBI database is available in
platform-independent form from the FTP site.

Out of Frame BLASTX

The OOF alignment functions were originally written in 1997 by Zheng Zhang,
who at that time worked at Penn State University and currently works for
Paracel Inc. These functions had not been used since then, due to the
inability to handle different scales for the query and subject of high
scoring pairs (DNA scale vs. protein scale).

His original package included low-level OOF gapped alignment functions that
aligned whole (not-in-frame) DNA to protein.

This package is now incorporated into the regular BLAST API.

Because all currently existing alignment viewers can "live" only in "the
same scale world" (protein/protein or DNA/DNA) and cannot show alignments
with different scales for the aligned sequences, I had to implement a custom
viewer especially for OOF alignments. It was also necessary to create the
whole chain of transforms from Zhang's low-level traceback structures to
standard NCBI alignment structures, with new features added to store
frameshift information.

This viewer can currently show only pairwise Traditional Blast output. XML
output is not yet implemented.

Description of tags for main BLAST input page

This standalone server uses a tag convention analogous to that of the
regular NCBI BLAST server. The sample BLAST search forms may be changed to
accommodate the particular needs of the user in a custom search. Here is the
list of these tags and their meanings. If a tag is missing from the search
input page, it takes its default value. The exceptions are the tags PROGRAM,
DATALIB and SEQUENCE (or SEQFILE), which should always be set.

   * PROGRAM - name of the BLAST program. Supported values include programs:
     blastn, blastp, blastx, tblastx and tblastn
   * DATALIB - name of the database(s) to search. This implementation
     makes it possible to use multiple databases. To do so, several
     "DATALIB" tags should be used on the page, for example as checkboxes
     (see, for example, the Microbial Genomes Blast Databases BLAST page at
     NCBI). Note that all of these databases must be listed in the server
     configuration file.
   * SEQUENCE and SEQFILE - these tags are used to pass the sequence. The
     SEQUENCE tag is used first for the input sequence; if it is missing,
     the SEQFILE tag is used instead.
   * UNGAPPED_ALIGNMENT - the default BLAST search is a gapped search; this
     tag, if set, turns gapped alignment off
   * MAT_PARAM - used to set 3 parameters at the same time. The value for
     this tag should be in the format "mat_name d1 d2", where mat_name is
     the string name of the matrix (BLOSUM62, etc), d1 is an integer for the
     cost to open a gap and d2 is the cost to extend a gap (-G and -E
     parameters in blastall respectively)
   * GAP_OPEN - set value for cost to open gap - 0 or missing tag invokes
     default behavior
   * GAP_EXTEND - set value for cost to extend gap - 0 or missing tag
     invokes default behavior
   * X_DROPOFF - Dropoff (X) for blast extensions in bits (default if zero)
     (-y parameter in "blastpgp" program)
   * GENETIC_CODE - Query Genetic code to use (for blastx only)
   * THRESHOLD_1 - Threshold for extending hits in first pass in multipass
     model search (-f in blastall)
   * THRESHOLD_2 - Threshold for extending hits in second pass in multipass
     model search
   * MATRIX - Matrix (default is BLOSUM62) (-M in blastall)
   * EXPECT - Expectation value (-e in blastall)
   * NUM_OF_BITS - Number of bits to trigger gapping (-N in blastpgp)
   * NCBI_GI - If the formatted database uses SeqIds in the NCBI format,
     this option turns on printing of gis together with accessions.
   * FILTER - Multiple instances of values of this tag are concatenated and
     passed to the engine as "filter_string" ("L" for low complexity and "m"
     if filter should be set for lookup table only) - any letter will turn
     default filtering on - DUST for nucleotides and SEG for proteins (-F in
     blastall)
   * DESCRIPTIONS - Number of one-line descriptions in the output (-v in
     blastall)
   * ALIGNMENTS - Number of alignments to show (-b in blastall)
   * COLOR_SCHEMA - Color schema to use in printing of alternative
     alignment. This option valid only for blastp and blastn programs. If
     set - it will override option set by "ALIGNMENT_VIEW"
   * TAX_BLAST - Print taxonomy reports. This option is valid only for
     client/server version of regular Blast
   * XML_OUTPUT - Print XML Blast output. All other alignment view options
     will be disabled
   * ENTREZ_QUERY - Limit search to results of Entrez query. Only for
     client/server version
   * RPSBLAST - This tag will turn a "blastp" or "blastx" search into an RPS
     Blast search against the RPS Blast database.
   * OOF_ALIGN - This flag, if set to a non-zero digit, will turn on OOF
     alignment for "blastx" and will set the frame shift penalty to this
     value.
   * OTHER_ADVANCED - this tag allows input of a string analogous to the
     command-line parameters of blastall. Setting a parameter in the
     OTHER_ADVANCED tag overrides all other settings of that parameter.
     Supported options include:
        o -G gap open cost
        o -E gap extend cost
        o -q penalty for nucleotide mismatch
        o -r reward for nucleotide match
        o -e expect value
        o -W wordsize
        o -v Number of descriptions to print
        o -b Number of alignments to show
        o -K Number of best hits from a region to keep
        o -Y effective search space
   * ALIGNMENT_VIEW - will set type of alignment to show. Available options
     include:
        o 0 - Pairwise
        o 1 - master-slave with identities
        o 2 - master-slave without identities
        o 3 - flat master-slave with identities
        o 4 - flat master-slave without identities
   * OVERVIEW - used to turn on or off printing of alignment overview image
   * WWW_BLAST_TYPE - a special tag used to distinguish different BLAST
     search types. See the description of the configuration file.

Server configuration file and logfile

The default configuration file is "blast.rc" and the default logfile is
"wwwblast.log". Setting the tag WWW_BLAST_TYPE to a specific value may
change these names. This is useful if several different search input pages
use the same CGI search engine but differ significantly in content and
priorities. Here is the sample configuration file that comes with this
distribution:

# Number of CPUs to use for a single request
#
NumCpuToUse     4
#
# Here is list of combination program/database,
# that allowed by BLAST service. Format:    ...
#
blastn test_na_db
blastp test_aa_db
blastx test_aa_db
tblastn test_na_db
tblastx test_na_db

This file sets how many CPUs of the computer will be used in a BLAST search
and which databases may be used with which programs. The logfile currently
stores only limited information, but programmers may update the server to
store more values in it. Please note that HTTPD servers usually run under
accounts that do not have write access to the disk, so for the logfile to be
written its permission should be set to 666.
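
To check by hand which program/database pairs a configuration file permits,
a small lookup can help. The Bourne shell sketch below is not the server's
own parser. It assumes only what the sample suggests: lines starting with
"#" are comments, and each remaining line names a program followed by its
database name(s). The function name db_allowed is hypothetical.

```shell
# Hypothetical helper: succeed if program $2 may search database $3
# according to configuration file $1.
db_allowed() {
    awk -v prog="$2" -v db="$3" '
        /^#/ { next }                    # skip comment lines
        $1 == prog {                     # line for the requested program
            for (i = 2; i <= NF; i++)
                if ($i == db) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

For example: db_allowed blast.rc blastp test_aa_db && echo "allowed"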

How to debug WWW Blast programs

There is a way to debug these programs.

  1. Add the line "setenv DEBUG_COMMAND_LINE TRUE" to the *.cgi file
     (uncomment it)
  2. Run the search that causes the problem - this should create the file
     "__web.in" in the "/tmp" directory.
  3. Set all necessary environment variables on the command line (BLASTDB at
     least) and run from the command line: "blast.REAL < /tmp/__web.in"

This should run your problematic search without WWW. If this results in a
core dump, you may look into the core file with:

dbx blast.REAL core

and then use command "where" to print stack.
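
Step 3 can be wrapped in a small Bourne shell function, sketched below. The
function name replay_blast_request is hypothetical, and pointing BLASTDB at
the "db" directory of the distribution is an assumption; use whatever value
your *.cgi file sets. (The *.cgi files themselves are C-shell, hence
"setenv" in step 1.)

```shell
# Hypothetical helper: replay a captured request without the WWW server.
# $1 - "blast" directory of the distribution
# $2 - captured input file (defaults to /tmp/__web.in)
replay_blast_request() {
    blast_dir="$1"
    input="${2:-/tmp/__web.in}"
    [ -r "$input" ] || { echo "cannot read $input" >&2; return 1; }
    # Assumption: databases live in the distribution's db directory.
    BLASTDB="$blast_dir/db"
    export BLASTDB
    # Open the input first, then run from the distribution directory.
    (cd "$blast_dir" && ./blast.REAL) < "$input"
}
```

For example: replay_blast_request /usr/local/apache/htdocs/blast (the path
is hypothetical).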

  ------------------------------------------------------------------------

Sergei Shavirin

Last modified: Fri Sep 1 13:51:15 EST 2000
