  ------------------------------------------------------------------------

               Introduction to the Standalone WWW Blast server

  ------------------------------------------------------------------------

Index

   * Introduction
   * Installation of the Standalone WWW server
   * Description of files in the distribution
   * Configuration of BLAST databases
   * Description of tags for main BLAST input page
   * Server configuration file and logfile

Introduction

This standalone BLAST server is modeled on the regular NCBI BLAST server
and on the command-line NCBI BLAST program "blastall". It incorporates most
of the features of the NCBI BLAST programs and should be relatively easy to
use. The main point to keep in mind is that this server DOES NOT support
request queuing or load balancing. As soon as the user presses the "Search"
button, BLAST starts immediately, provided the entered information is
valid. This server is therefore not intended to handle the heavy load that
a public service may see. The standalone server assumes that the user has
their own BLAST database to search and wants a simple WWW interface to that
search. It is STRONGLY recommended that the user has experience installing
and running the standalone NCBI BLAST programs.

After the files are uncompressed, the server is ready to be used
immediately. Customizations to the program are welcome and may be made by
experienced programmers using the source code, which is also provided.
Recompiling the server executables requires that the programmer has
compiled NCBI toolkit libraries available. The toolkit may be downloaded
from the NCBI FTP site: ftp://ncbi.nlm.nih.gov

Installation of the Standalone WWW server

After downloading the file wwwblast.Your_platform.tar.gz to your computer,
place it in the document directory of the HTTPD server and uncompress it
with:

    gzip -d wwwblast.Your_platform.tar.gz
    tar -xvpf wwwblast.Your_platform.tar

Please note that the "p" option to tar is significant: it preserves the
file access permissions stored in the distribution. The temporary directory
for BLAST overview images (TmpGifs) should have permission 777 and the
logfile (wwwblast.log) should have permission 666.
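
If the permissions were lost while unpacking, they can be restored by hand.
The following is a minimal sketch, assuming the "blast" directory is passed
as the first argument. The function name set_wwwblast_permissions is
hypothetical and is not part of the distribution.

```shell
# Hypothetical helper, not part of the distribution.
# Usage: set_wwwblast_permissions /path/to/document_directory/blast
set_wwwblast_permissions() {
    blast_dir=$1
    # The HTTPD account must be able to create and delete overview images.
    chmod 777 "$blast_dir/TmpGifs" || return 1
    # The HTTPD account must be able to write to the logfile.
    chmod 666 "$blast_dir/wwwblast.log" || return 1
}
```

Mode 777 makes TmpGifs writable by every account on the machine, so this is
best limited to hosts where that is acceptable.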

After you uncompress the distribution file, a "blast" directory will be
created. You can access the sample BLAST HTML input form at the URL
http://your_hostname/blast/blast.html. This distribution comes with 2 BLAST
databases: "test_aa_db", a sample protein database, and "test_na_db", a
sample nucleotide database. These databases are configured to be searchable
immediately with the corresponding BLAST programs.

Description of files in the distribution

   * Root directory (./blast):

blast.cgi       - BLAST search start-up C-shell file
blast.html      - sample BLAST search input HTML form
blast.rc        - default configuration file for the WWW BLAST server
blast.REAL      - main BLAST server executable
blast_form.map  - auxiliary map file for the front BLAST image
nph-viewgif.cgi - CGI program used to view and delete overview images
readme.html     - this documentation
wwwblast.log    - default logfile

   * ./data - matrices used in BLAST searches
   * ./db - files of the test BLAST databases test_aa_db and test_na_db.
     This directory also contains the binary of the formatdb program.
   * ./docs - HTML pages used in the sample BLAST search input page
   * ./images - images used in the sample BLAST search input page
   * ./Src - source directory for the WWW BLAST server and the formatdb
     program
   * ./TmpGifs - storage for temporary BLAST overview GIF files

Configuration of BLAST databases

To set up databases for the standalone WWW BLAST server it is necessary to
follow these steps:

  1. Copy the file of concatenated FASTA entries that will be used as the
     database into the directory "./db".
  2. Run the "formatdb" program there to format the database.
  3. Add the name of the database to the server configuration file.
  4. Add the name of the database to the WWW BLAST search form.
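
The first three steps can be sketched as a small shell function. This is an
illustration under stated assumptions, not a script from the distribution:
the name add_blast_db and the FORMATDB override are hypothetical, and the
-i (input file) and -p (T for protein, F for nucleotide) options are those
of the NCBI toolkit formatdb and should be checked against the bundled
binary. Whether the server accepts a separate blast.rc line for a program
that is already listed is also an assumption.

```shell
# Hypothetical helper, not part of the distribution.
# Usage: add_blast_db /path/to/blast my_proteins blastp T
#   $1  "blast" directory created by the distribution
#   $2  file with concatenated FASTA entries
#   $3  BLAST program allowed to search the database (e.g. blastp)
#   $4  T for a protein database, F for a nucleotide database
add_blast_db() {
    blast_dir=$1
    fasta=$2
    program=$3
    is_protein=$4
    db_name=$(basename "$fasta")

    # Step 1: copy the FASTA file into ./db
    cp "$fasta" "$blast_dir/db/$db_name" || return 1

    # Step 2: format the database in place with the bundled formatdb.
    # FORMATDB may be set to point at a different binary.
    (cd "$blast_dir/db" &&
        "${FORMATDB:-./formatdb}" -i "$db_name" -p "$is_protein") || return 1

    # Step 3: allow the program/database combination in the configuration file
    echo "$program $db_name" >> "$blast_dir/blast.rc"

    # Step 4 remains a manual edit: add the database name as a DATALIB
    # choice in blast.html.
}
```

For a nucleotide database the call would look like
add_blast_db /path/to/blast my_genome blastn F, repeated for every program
(for example tblastn) that should be allowed to search it.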

Description of tags for main BLAST input page

This standalone server uses the same tag conventions as the regular NCBI
BLAST server. The sample BLAST search form may be changed to accommodate
the particular needs of a custom search. Here is the list of these tags and
their meanings. If a tag is missing from the search input page, it takes
its default value. The exceptions are the tags PROGRAM, DATALIB and
SEQUENCE (or SEQFILE), which must always be set.

   * PROGRAM - name of the BLAST program. Supported values are blastn,
     blastp, blastx, tblastx and tblastn
   * DATALIB - name of the database(s) to search. This implementation allows
     multiple databases to be searched. To use multiple databases, several
     "DATALIB" tags should be used on the page, for example as checkboxes
     (see, for example, Microbial Genomes Blast Databases BLAST at NCBI).
     Note that all of these databases must be listed properly in the server
     configuration file.
   * SEQUENCE and SEQFILE - these tags are used to pass the sequence. The
     SEQUENCE tag is tried first for the input sequence; if it is missing,
     the SEQFILE tag is used instead.
   * UNGAPPED_ALIGNMENT - the default BLAST search is a gapped search; this
     tag, if set, turns gapped alignment off
   * MAT_PARAM - used to set 3 parameters at the same time. The value of
     this tag should be in the format "mat_name d1 d2", where mat_name is
     the string name of the matrix (BLOSUM62, etc.), d1 is the integer cost
     to open a gap and d2 is the cost to extend a gap (the -G and -E
     parameters in blastall, respectively)
   * GAP_OPEN - sets the cost to open a gap; 0 or a missing tag invokes the
     default behavior
   * GAP_EXTEND - sets the cost to extend a gap; 0 or a missing tag invokes
     the default behavior
   * X_DROPOFF - Dropoff (X) for blast extensions in bits (default if zero)
     (-y parameter in "blastpgp" program)
   * GENETIC_CODE - Query Genetic code to use (for blastx only)
   * THRESHOLD_1 - Threshold for extending hits in first pass in multipass
     model search (-f in blastall)
   * THRESHOLD_2 - Threshold for extending hits in second pass in multipass
     model search
   * MATRIX - Matrix (default is BLOSUM62) (-M in blastall)
   * EXPECT - Expectation value (-e in blastall)
   * NUM_OF_BITS - Number of bits to trigger gapping (-N in blastpgp)
   * NCBI_GI - if the formatted database uses SeqIds in the NCBI format,
     this option turns on printing of GIs together with accessions.
   * FILTER - multiple values of this tag are concatenated and passed to
     the engine as "filter_string" ("L" for low complexity and "m" if the
     filter should be set for the lookup table only). Any letter turns
     default filtering on: DUST for nucleotides and SEG for proteins (-F in
     blastall)
   * DESCRIPTIONS - Number of one-line descriptions in the output (-v in
     blastall)
   * ALIGNMENTS - Number of alignments to show (-b in blastall)
   * OTHER_ADVANCED - this tag accepts a string analogous to the
     command-line parameters of blastall. A parameter set in the
     OTHER_ADVANCED tag overrides all other settings of that parameter.
     Supported options include:
        o -G gap open cost
        o -E gap extend cost
        o -q penalty for nucleotide mismatch
        o -r reward for nucleotide match
        o -e expect value
        o -W wordsize
        o -v Number of descriptions to print
        o -b Number of alignments to show
        o -K Number of best hits from a region to keep
        o -Y effective search space
   * ALIGNMENT_VIEW - sets the type of alignment to show. Available options
     are:
        o 0 - Pairwise
        o 1 - master-slave with identities
        o 2 - master-slave without identities
        o 3 - flat master-slave with identities
        o 4 - flat master-slave without identities
   * OVERVIEW - used to turn on or off printing of alignment overview image
   * WWW_BLAST_TYPE - a special tag that distinguishes different BLAST
     search types. See the description of the configuration file.
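
To illustrate how the required tags fit together, a search can also be
submitted without the HTML form. The sketch below is hedged in several
ways: the function name submit_blast_search and the CURL override are
hypothetical, the URL of blast.cgi is inferred from the file list above,
and sending the tags as multipart form data (curl -F) is an assumption
about how the server expects its input.

```shell
# Hypothetical helper, not part of the distribution.
# Usage: submit_blast_search URL PROGRAM DATALIB SEQUENCE
# Only the three tags that must always be set are sent; every other tag
# takes its default value.
submit_blast_search() {
    url=$1
    program=$2
    datalib=$3
    sequence=$4
    # --form-string sends the sequence literally, so characters such as
    # ";", "@" or "<" in a FASTA definition line are not interpreted by curl.
    "${CURL:-curl}" \
        -F "PROGRAM=$program" \
        -F "DATALIB=$datalib" \
        --form-string "SEQUENCE=$sequence" \
        "$url"
}
```

A call against the sample protein database might look like
submit_blast_search http://your_hostname/blast/blast.cgi blastp test_aa_db
followed by the query sequence as the last argument. Optional tags such as
EXPECT could be added as further -F fields in the same way.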

Server configuration file and logfile

The default configuration file is "blast.rc" and the default logfile is
"wwwblast.log". Setting the tag WWW_BLAST_TYPE to a specific value may
change these names. This is useful if several different search input pages
use the same CGI search engine but differ significantly in content and
priorities. Here is the sample configuration file that comes with this
distribution:

# Number of CPUs to use for a single request
#
NumCpuToUse     4
#
# Here is list of combination program/database,
# that allowed by BLAST service. Format: <program> <db> <db> ...
#
blastn test_na_db
blastp test_aa_db
blastx test_aa_db
tblastn test_na_db
tblastx test_na_db

This file sets how many CPUs of the computer will be used in a BLAST search
and which databases may be used with which programs. The logfile currently
stores only limited information, but programmers may update the server to
store more values in it. Please note that HTTPD servers usually run under
accounts that do not have write access to disk, so for the logfile to be
written its permission should be set to 666.
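
As a way of reading the configuration format, the following sketch checks
whether a program/database combination is listed in a configuration file.
It is not the server's own parser. The function name is_search_allowed is
hypothetical, and the sketch assumes that a line consists of a program name
followed by one or more database names, with "#" starting a comment line.

```shell
# Hypothetical helper, not part of the distribution.
# Usage: is_search_allowed blast.rc blastp test_aa_db
# Exits with status 0 if the combination is listed, 1 otherwise.
is_search_allowed() {
    awk -v prog="$2" -v db="$3" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        $1 == prog {
            for (i = 2; i <= NF; i++)      # databases follow the program name
                if ($i == db) found = 1
        }
        END { exit !found }
    ' "$1"
}
```

With the sample file above, is_search_allowed blast.rc blastx test_aa_db
succeeds, while the same check for blastx and test_na_db fails because that
combination is not listed.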

  ------------------------------------------------------------------------

Sergei Shavirin

Last modified: Fri Mar 17 13:51:15 EST 2000
