PROGRAM DESCRIPTION:

     CIDentify is a homology-based database search algorithm designed to aid in
the identification of unknown peptides by mass spectrometry (MS). It is made to
be used in conjunction with Lutefisk, the de novo MS/MS interpretation program
written by Rich Johnson, Immunex Corporation, Seattle, WA. (Current Lutefisk 
source and compiled application are available from 
http://www.immunex.com/researcher/lutefisk.html).  
     
     CIDentify uses the list of possible peptides produced by Lutefisk to 
search a database for homologous sequences, taking into account sequence 
ambiguities due to the nature of the MS data. (See J. A. Taylor and R. S. 
Johnson (1997), "Sequence database searches via de novo peptide sequencing by 
tandem mass spectrometry", Rapid Communications in Mass Spectrometry 
11:1067-1075.)

     CIDentify is a derivative of William R. Pearson's FASTA homology-based
database search adapted by Alex Taylor of the University of Washington (now at 
Immunex Corp., Seattle, WA). The current version of CIDentify is based on 
version 20u6 (Aug. 1996) of the FASTA program package. (For information on 
FASTA see W. R. Pearson and D. J. Lipman (1988), "Improved Tools for Biological 
Sequence Analysis", PNAS 85:2444-2448, and W. R. Pearson (1990), "Rapid and 
Sensitive Sequence Comparison with FASTP and FASTA", Methods in Enzymology 
183:63-98.)

CIDentify takes a list of possible peptides from a Lutefisk output file as 
input.

For example:

 Sequence                    Rank  X-corr  IntScr
AEFVNNTK                       1   0.981   1.000
AEFVEVTK                       2   0.969   0.988
AEFVDLTK                       3   0.920   0.937
AEFVN[144]AK                   4   0.908   0.926
AEFMPVTK                       5   0.908   0.925
AEFVEEAK                       6   0.901   0.918
AEFVKTVE                       7   0.865   0.882
AEFVK[201]K                    8   0.865   0.882
AEFVTKVE                       9   0.865   0.882
AEFVT[228]K                   10   0.865   0.882

The Lutefisk rankings and scores are ignored by CIDentify. (Hence it is easy 
to create your own CIDentify input file without using Lutefisk by copying 
the header line,  " Sequence                    Rank  X-corr  IntScr", into a 
new text file and adding your own list of sequences to search - one per line.) 
Numbers in brackets within the peptide sequence are the nominal mass of a 
dipeptide of unknown order and/or identity due to incomplete fragmentation data
in the CID spectra. Every candidate peptide sequence is treated as equally 
probable, and each one is used as a query sequence to search a sequence 
database. The score for each database sequence is the sum, over all query 
sequences, of its best score against each query.
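
As a rough illustration of the input format and scoring rule described above, 
the following Python sketch builds the text of a hand-made input file, splits 
a candidate into residues and bracketed masses, and sums per-query best scores.
The function names (tokenize, format_query_file, combined_score) are 
hypothetical and are not part of CIDentify. The tokenizer assumes upper-case 
one-letter residues and integer masses in brackets only, and the scores in the
usage lines are made up.

```python
import re

# Header line copied from a Lutefisk output file, as described above.
HEADER = " Sequence                    Rank  X-corr  IntScr"

# A token is one upper-case residue letter or a bracketed integer mass, e.g. [144].
TOKEN = re.compile(r"[A-Z]|\[\d+\]")


def tokenize(sequence):
    """Split 'AEFVN[144]AK' into residue letters and integer dipeptide masses."""
    tokens = TOKEN.findall(sequence)
    if "".join(tokens) != sequence:
        raise ValueError(f"unrecognized characters in {sequence!r}")
    return [int(t[1:-1]) if t.startswith("[") else t for t in tokens]


def format_query_file(sequences):
    """Return input-file text: the header line, then one sequence per line."""
    for seq in sequences:
        tokenize(seq)  # reject anything this sketch cannot parse
    return "\n".join([HEADER, *sequences]) + "\n"


def combined_score(scores_by_query):
    """Sum, over the query sequences, the best score of one database sequence.

    scores_by_query maps each query sequence to the scores (assumed numeric)
    that a single database sequence obtained against that query.
    """
    return sum(max(scores) for scores in scores_by_query.values())


print(tokenize("AEFVN[144]AK"))
print(format_query_file(["AEFVNNTK", "AEFVN[144]AK"]), end="")
print(combined_score({"AEFVNNTK": [30, 42], "AEFVEVTK": [25]}))
```

Because the rankings and scores are ignored, the sketch writes only the 
sequence column after the header line.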


     The CIDentify Result Compiler program can be used to combine the CIDentify
output for several peptides derived from the same protein. To use the result 
compiler, you must first create a file containing the full path file names of 
the individual CIDentify output files. Then run the Result Compiler program 
and select this file. The top scoring match in each individual file is given 
a rank score of 200, the second is given a rank score of 199, etc. These rank 
scores are combined for database sequences found in multiple result lists, and
the combined scores are then used to sort the final list. 
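
The rank-score scheme above can be sketched in a few lines of Python. This is 
not the Result Compiler source: the function name combine_rank_scores is 
hypothetical, "combined" is assumed here to mean summed, each result list is 
assumed to hold fewer than 200 distinct identifiers, and ties are broken 
alphabetically purely for reproducibility.

```python
from collections import defaultdict

TOP_RANK_SCORE = 200  # rank score of the top match, per the description above


def combine_rank_scores(result_lists):
    """Combine per-file rankings into one list sorted by total rank score.

    result_lists holds one list per CIDentify output file, each containing
    database sequence identifiers ordered from best match to worst.
    """
    totals = defaultdict(int)
    for hits in result_lists:
        for position, seq_id in enumerate(hits):
            # 200 for the first hit, 199 for the second, and so on.
            totals[seq_id] += TOP_RANK_SCORE - position
    # Highest total first; the alphabetical tie-break is an assumption.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


combined = combine_rank_scores([["P1", "P2"], ["P2", "P3"]])
for seq_id, total in combined:
    print(seq_id, total)
```

A database sequence that appears near the top of several result lists ends up 
ahead of one that tops only a single list.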


To get a brief summary of the important command-line options, invoke CIDentify
with the '-h' option. (On the Macintosh, command-line options are entered on 
the 'Arguments:' line of the initial dialog box. To use command-line options 
under Win32, the executable must be invoked from a DOS prompt, for example by 
using the Command Prompt program.)

The most important command-line options are:

      [-q]  Quiet mode
      [-j Lutefisk query file name]
      [-p Database choice]
      [-C modified cysteine nominal mass]    (e.g.: -C "160")
      [-N N-terminal bonus residues]         (e.g.: -N "KR")
      [-l FASTLIBS file name]
      [-b Number of results to show when using quiet mode]
      [-s scoring matrix file name]
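
For scripted runs, the options listed above can be assembled into a command 
line. The Python sketch below only builds and prints the command; it does not 
run anything. The helper name build_argv is hypothetical, the executable name 
"CIDentify" is an assumption about the local installation, and the database 
choice "A" is a placeholder, since valid choices depend on the local FASTLIBS 
setup.

```python
import shlex


def build_argv(query_file, database, cys_mass=None, nterm_bonus=None,
               max_results=None, executable="CIDentify"):
    """Build an argument list for a quiet-mode run, using only listed options."""
    argv = [executable, "-q", "-j", query_file, "-p", database]
    if cys_mass is not None:
        argv += ["-C", str(cys_mass)]      # modified cysteine nominal mass
    if nterm_bonus is not None:
        argv += ["-N", nterm_bonus]        # N-terminal bonus residues
    if max_results is not None:
        argv += ["-b", str(max_results)]   # results shown in quiet mode
    return argv


# "A" is a placeholder database choice, not a documented value.
argv = build_argv("BSA-200MKDFVAFVDK", "A", cys_mass=160,
                  nterm_bonus="KR", max_results=20)
print(shlex.join(argv))
```

Keeping the arguments as a list avoids quoting problems if the resulting 
command is later passed to a process launcher.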

________________________________________________________________________________
VERSION HISTORY:

1.1   - (Aug. 2002)  Fixed problem with ambiguous amino acid pairs as letters.

1.0.9 - (Apr. 2002)  Introduction of CIDentify_mpi, a parallelized version of
                     CIDentify (See the README_mpi file). Fixed PI bug.

1.0.8 - (Sep, 2001)  Replaced std. dev. with E-score in the output and did some
                     minor code cleanup to fix compile problems on some platforms.

1.0.7 - (Jan, 2001)  Fixed bugs introduced in v1.0.6. K & Q can now be scored
                     differently.

1.0.6 - (Oct, 2000)  Added support for changes in Lutefisk1900 - reading ambiguous
                     amino acid pairs as letter or floating point masses.

1.0.5 - (Oct, 1998)  Added a '-C' command-line option to set the nominal mass
                     of a modified cysteine. Added a '-N' command-line option
                     to set which residues in the database sequence N-terminal 
                     to the alignment produce a bonus. The default is "RK" as 
                     would be expected for tryptic peptides. Made minor 
                     modifications to the CIDentifyX alignment output to make it 
                     consistent with the CIDentify output. Made adjustments to
                     the result compiler to make it more robust. Successfully 
                     compiled CIDentify & CIDentifyX under LINUX and as console 
                     apps for Windoze NT.

1.0.4 - (Oct, 1997)  Added some command-line options and made a few other 
                     adjustments so that it could be run in quiet mode on UNIX 
                     without having to feed it responses. Fixed more bugs in 
                     CIDentifyX dealing with reverse frames, unknown sequence 
                     characters (B, Z, X), and sequences that are too short 
                     stopping the search (found when encountering an entry in 
                     dbEST whose sequence was only "AC"). Adjusted the scoring 
                     of the 1 query residue = 2 library residues case to be the
                     matrix identity score of the query residue instead of the 
                     sum of the library matrix identities to reduce score 
                     inflation. Tweaked the output format slightly:  longer 
                     line length, full descriptors by default, and inclusion of
                     the query sequences used.

1.0.3 - (Sept, 1997) Fixed some bugs in CIDentifyX and successfully compiled 
                     the source code for CIDentify and CIDentifyX under both 
                     OSF and Solaris using gcc v2.7.

1.0   - (June, 1997) Initial release. Based on FASTA version 20u6 (Aug. 1996).
________________________________________________________________________________
CIDentifyMac AND CIDentifyWin32 ARCHIVE CONTENTS:

* Compiled CIDentify, CIDentifyX, and CIDentify Result Compiler applications 
  for PPC or for Win32
* Example Lutefisk output file (CIDentify input) - "BSA-200MKDFVAFVDK"
* Example CIDentify output file - "BSA-200MKDFVAFVDK.out"
* This README file, "README", and the FASTA "COPYRIGHT" 
* "environment" and "fastgbs" files for creating customized database menus
* A folder of FASTA documentation
* A folder of scoring matrices modified for CIDentify


CIDentifySrc SOURCE CODE ARCHIVE CONTENTS:
(The CIDentifySrc.tar.Z archive is a tar archive that has been UNIX compressed.)

* Example Lutefisk output file (CIDentify input) - "BSA-200MKDFVAFVDK"
* Example CIDentify  output file - "BSA-200MKDFVAFVDK.out"
* This README file, "README", and the FASTA "COPYRIGHT" 
* "environment" and "fastgbs" files for creating customized database menus
* A folder of FASTA documentation
* A folder of scoring matrices modified for CIDentify

* C Source code files for CIDentify:  
        - Makefile - Makefile for compiling CIDentify, CIDentifyX and CIDentifyRC
                     on UNIX or LINUX
        - Macintosh/CIDentify.CWP6  - Metrowerks project file for compiling on the 
                                      Macintosh
        - Win32/CIDentify.CWP6.mcp  - Metrowerks project file for compiling on Win32
        - fffasta.c
        - nxgetaa.c
        - f_band.c
        - scalesws.c
        - zzlgmata.c
        - jat.c
        - LutefiskGlobals.c
        - pam.c
        - getenv.c - Needed for Macintosh & Win32 versions
        - getopt.c - Needed for Macintosh & Win32 versions
        - time.c
        - ndispn.c
        - l_band.c
        - llmax.c
        - g_band.c
        - Macintosh/FileDlog.c - Macintosh specific dialog routines
        - Macintosh/fasta.rsrc - Macintosh program resources
        - Macintosh/checkevent.c - Macintosh specific routines
        - Included header files:
               - altlib.h
               - ffasta.h
               - f_band.h
               - getenv.h
               - getopt.h
               - g_band.h
               - jat.h
               - llmax.h
               - Lutefisk.h
               - LutefiskGlobals.h
               - l_band.h
               - mytime.h
               - ndispn.h
               - nxgetaa.h
               - pam.h
               - scalesws.h
               - uascii.gbl
               - upam.gbl
               - zzlgmata.h

* C source code files and changes specific for CIDentifyX (DNA)
        - Macintosh/CIDentifyX.CWP6  - Metrowerks project file for compiling on the 
                                       Macintosh 
        - Win32/CIDentifyX.CWP6.mcp  - Metrowerks project file for compiling on Win32 
        - lx_align3.c
        - lx_band2.c
        - faatran.c
        - zxlgmata.c
        - Included header files:
               - aamap.gbl
        - Remove files: f_band.c, g_band.c, l_band.c, llmax.c, zzlgmata.c
        - (the line: #define TFASTX 
           in ffasta.h must also be uncommented when building the Mac or Win32 versions)

* C Source code files for CIDentify Result Compiler (CIDentifyRC):
        - Macintosh/CIDentifyRC.CWP6  - Metrowerks project file for compiling on the
                                        Macintosh
        - Win32/CIDentifyRC.CWP6.mcp  - Metrowerks project file for compiling on Win32
        - CIDentifyRC.c
        - getopt.c - Needed for Macintosh & Win32 versions
        - checkevent.c and FileDlog.c - Macintosh-specific routines, the same as for CIDentify
        - To use the result compiler, make an index file with the path names of
          the CIDentify output files to be compiled. [ Remember, UNIX uses '/'
          as a directory separator while the Mac uses ':' and Win32 uses '\' ]
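
One way to produce such an index file without typing path separators by hand 
is sketched below in Python. The helper name index_file_text is hypothetical, 
and the layout of one full path per line is an assumption, since the exact 
index format is not spelled out here. os.path.abspath uses the separator of 
the platform the script runs on, so it does not cover classic Mac OS ':' paths.

```python
import os


def index_file_text(output_files):
    """Return index-file text: the full path of each output file, one per line."""
    return "".join(os.path.abspath(name) + "\n" for name in output_files)


# The second file name is made up for illustration.
text = index_file_text(["BSA-200MKDFVAFVDK.out", "another_peptide.out"])
print(text, end="")
```

The returned text can then be saved to a file and selected when the Result 
Compiler is run.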


* Compiling on the Macintosh
Current Metrowerks Projects are included in the "Macintosh" folder. If you have an older 
compiler you will need to create a new "Std C Console PPC" project and add the source files 
as specified above. You may have to change the files to creator 'CWIE' and type 'TEXT' for
CodeWarrior to accept them. Note that under OS X the apps can be compiled using the UNIX 
directions if desired.

* Compiling under Win32
Current Metrowerks Projects are included in the "Win32" folder. If you have an older 
compiler you will need to create a new "C Console App" project and add the source files 
as specified above.

* Compiling on UNIX or LINUX
Simply use the "make all" command after untarring the archive.
________________________________________________________________________________

Questions?  Problems?

contact Alex Taylor at  jataylor@hairyfatguy.com   -or-   ataylor@immunex.com


