#############################################################

Last update to this document: MARCH 10, 1997

README_2 for the beta version of Network Entrez.

This document describes the software that can be obtained by
anonymous FTP from ncbi.nlm.nih.gov in this directory:

    /entrez/

Please note that Network Entrez users should read these documents:

    ftp://ncbi.nlm.nih.gov/entrez/README_1
    ftp://ncbi.nlm.nih.gov/entrez/README_2

#############################################################

National Center for Biotechnology Information (NCBI)
National Library of Medicine
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894, USA

tel:    (301) 496-2475
fax:    (301) 480-9241
e-mail: info@ncbi.nlm.nih.gov

ver 4.000 Oct. 05, 1995
ver 4.010 Oct. 11, 1995
ver 4.012 Nov. 08, 1995
ver 4.013 Nov. 30, 1995 (Unix)/Dec 02, 1995 (mswin)/Dec 03, 1995 (Mac)
ver 4.017 Feb. 06, 1996 (Unix/mswin)/Feb. 05, 1996 (Mac)
ver 4.022 Mar. 15, 1996 (Unix/mswin)/March 19, 1996 (Mac)
ver 4.024 May. 01, 1996
ver 5.002 Jul. 26, 1996 (SunOS Aug 1, 1996)
ver 5.100 Mar. 06, 1997

***** README *****

This README presents some of the highlights of the beta version of
Network Entrez, which incorporates a new genomes division and also
presents MMDB (Molecular Modeling DataBase) and its viewer, Cn3D.

Instructions on installing Network Entrez are available from the
README document (in the same directory where you picked up this text)
or on the WWW at this location (URL):

    http://www3.ncbi.nlm.nih.gov/Entrez/nentrez.overview.html

The /entrez/ directory, and the ones below it, include the latest
version of the Network Entrez program, which will run on various
platforms:

ftp://ncbi.nlm.nih.gov/entrez/

    entrez.hqx                  :Mac - binhexed
    alphaOSF1.tar.Z             :alpha - compressed
    linux.tar.Z                 :Linux - compressed
    mswin/
      win32/entrezz.exe         :NT/Win95 (32 bit) self extracting
      winsock1.1/entrezz.exe    :Win 3.1 (16 bit) self extracting
    sgi.tar.Z                   :SGI - compressed (IRIX 5.3)
    solaris.tar.Z               :Sun - Solaris - compressed
    sun.tar.Z                   :SunOS - compressed

You need only copy (in binary mode) the executable version for the
platform you need and the two README files (README_1 and README_2).
This file is README_2.

If you desire binaries of Network Entrez for a platform not listed
above, you can ask us, and we will see if we can compile one for you,
on an unsupported basis. Please send these requests to:

    toolbox@ncbi.nlm.nih.gov

You should already be familiar with Entrez (Network or CD-ROM) if you
are reading this document. If you are not, you may want to read the
first README document in this directory, as well as the Entrez user
manual, which is available in BinHexed format (Macintosh) in this
file:

    /entrez/docs/entrzdoc.hqx

This new version of Network Entrez (present in /entrez/) should still
be considered "beta". Although we have taken every precaution to
ensure that this program will work without any problems, it may not
always perform as expected. We are also still modifying the code to
add new features, and you should check this directory to make sure
you have the most up-to-date version.

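The download step described above can be sketched in Python. This is
only an illustration, not an NCBI-supplied script: the platform labels
("mac", "win32", and so on) and the function names are assumptions
made up for this example, while the host, directory, and file names
are the ones listed in this README. The server may no longer offer
these files, so download() is defined but not called here.

```python
from ftplib import FTP

HOST = "ncbi.nlm.nih.gov"
BASE_DIR = "/entrez"

# File names come from the listing in this README; the dictionary keys
# are hypothetical platform labels chosen for this sketch.
ARCHIVES = {
    "mac": "entrez.hqx",
    "alpha": "alphaOSF1.tar.Z",
    "linux": "linux.tar.Z",
    "win32": "mswin/win32/entrezz.exe",
    "win16": "mswin/winsock1.1/entrezz.exe",
    "sgi": "sgi.tar.Z",
    "solaris": "solaris.tar.Z",
    "sunos": "sun.tar.Z",
}
READMES = ["README_1", "README_2"]


def files_for(platform):
    """Return the remote paths to fetch: one archive plus both READMEs."""
    archive = f"{BASE_DIR}/{ARCHIVES[platform]}"
    return [archive] + [f"{BASE_DIR}/{name}" for name in READMES]


def download(platform, host=HOST):
    """Fetch the files for one platform by anonymous FTP in binary mode."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        for remote in files_for(platform):
            local = remote.rsplit("/", 1)[-1]
            with open(local, "wb") as out:
                # retrbinary transfers the file without text conversion
                ftp.retrbinary(f"RETR {remote}", out.write)
```

For example, files_for("linux") lists /entrez/linux.tar.Z together
with the two README files, and download("linux") would attempt to
fetch all three into the current directory.
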
The build dates at the time of writing were:

    March 5    entrez.hqx
    March 6    alphaOSF1.tar.Z
    March 6    linux.tar.Z
               mswin/
    March 6      win32/entrezz.exe
    March 6      winsock1.1/entrezz.exe
    March 6    sgi.tar.Z
    March 6    solaris.tar.Z
    March 6    sun.tar.Z

The current version is: 5.100

We are also changing the ergonomics and layout of some of the
features in this new program, and suggestions and feedback are very
welcome. These should be sent to this e-mail address:

    toolbox@ncbi.nlm.nih.gov

The text which follows explains some of the new features of this
version of Network Entrez, as well as how to navigate to them. Again,
we assume that you are already familiar with Entrez, and with the
general way of going from one information space to another by linking
or neighboring.

=================================================================

**** Network Entrez: From Genome to Structure ****

The NCBI made a major new release of Entrez available in October
1995. The new release adds graphical access to a new "genomes"
division of GenBank as well as graphical views of standard Entrez
sequence records. The new release also provides a database of
3-dimensional structures derived from the PDB crystallographic
database.

** Graphical Views of Sequences

A tabbed-folder sequence viewer has been added to Entrez. This allows
the quick selection of alternate report formats for a sequence entry,
including GenBank, EMBL, and a graphical representation. The viewer
is resizable, and permits easy visualization of complex annotations
such as segmented sequences or alternative splicing in coding
regions.

************************************
** 3D Structure in Network Entrez **
************************************

Since September 1995, Network Entrez has included an explicit 3D
structure database, based on crystallographic and NMR structure
determinations.

With the release of Entrez 5.0 in July 1996, NCBI has added a new
3D-structure viewer called Cn3D ("See in 3D"). Cn3D allows one to
visualize and rotate structure data entries from Entrez.

Structure data can provide a wealth of information on the biological
function and mechanism of action of macromolecules. By adding the
structure database to Entrez we hope to make this information easily
accessible to biologists.

The structure data comprise a new database from NCBI called MMDB
(Molecular Modeling DataBase), derived from the Brookhaven Protein
DataBank 3-dimensional structures (currently over 3,000
biomolecules). MMDB is a database of ASN.1-formatted records, not
PDB-formatted records. MMDB is capable of archiving conventional
structure data as well as future descriptions of biomolecules, such
as those generated by electron microscopy (surface models).

** Searching Structures in MMDB

MMDB is also referred to as the structure division of Entrez. The
structure database may be queried directly, using specific fields
such as author names, or text terms occurring anywhere in the
structure description. One may in this way check for structure data
on a specific protein or nucleic acid.

A more powerful approach, however, is to identify the molecule of
interest in the sequence or MEDLINE databases, identify its sequence
neighbors (homologues), and then, by linking to the structure
database, ask whether structure data are available for any of the
members of this family. The structure database is smaller than the
protein or nucleotide databases, but very many sequenced proteins
have homologues in this set, and one may often learn more about a
protein by examining the 3D structure of its homologues. Soon Entrez
will include 3D structural neighbors, which can help link protein
neighbors in the "twilight zone" of sequence homology.

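The neighbor-then-link search strategy described above can be
pictured with a small Python sketch. It is not Entrez code and does
not talk to any NCBI service: the two dictionaries are toy data, and
every identifier in them ("protA", "struct1", and so on) is
hypothetical, invented only to show the idea of widening a query to
its sequence neighbors before looking for structure links.

```python
# Toy data: all identifiers below are hypothetical, for illustration only.
SEQUENCE_NEIGHBORS = {
    "protA": ["protB", "protC"],
    "protB": ["protA"],
    "protC": ["protA"],
}
STRUCTURE_LINKS = {
    "protC": ["struct1"],
}


def structures_for_family(query):
    """Collect structure records linked to the query or its neighbors."""
    # The "family" is the query itself plus its sequence neighbors.
    family = [query] + SEQUENCE_NEIGHBORS.get(query, [])
    found = {}
    for member in family:
        links = STRUCTURE_LINKS.get(member, [])
        if links:
            found[member] = links
    return found


print(structures_for_family("protA"))
```

In this toy data set "protA" has no structure record of its own, but
the search still finds one through its neighbor "protC".
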
** Viewing Structures

Structure data from Network Entrez may be viewed in 3D, with
real-time rotation, using the built-in Cn3D software, or using the
public domain graphics programs RasMol or Kinemage.

Cn3D is a new structure viewer that is based on the Entrez data
model. It is a client-server application, meaning you can immediately
fetch structures you wish to see over the Internet in a single
session. You can also save the structures you have found from Cn3D.
Cn3D reads in ASN.1-formatted structures which originate from MMDB.
The new ASN.1 file format for 3D structure data is more compact than
the PDB file format, resulting in faster transmission over the
Internet.

Cn3D is freely available for many platforms, including Mac, Windows
and UNIX. For a complete list of platforms and the on-line manual,
please see the WWW-based Cn3D homepage at:

    http://www.ncbi.nlm.nih.gov/Structure/cn3d.html

Information for obtaining and using RasMol and Kinemage is at:

    http://www.ncbi.nlm.nih.gov/Structure/struchelp.html

** MMDB Content and Updates

As new PDB data become available from Brookhaven, MMDB is updated.
The PDB data are also changed in both form and content. PDB data are
checked and validated for consistency in the purported chemistry, the
sequence, and the 3D coordinates. Entrez users may occasionally
notice, for example, that the sequence of a PDB-derived entry differs
slightly from the PDB file, since all non-standard or chemically
modified residues, as judged by their 3D structure, are explicitly
identified as such in MMDB. These changes to PDB data are intended to
support computational applications such as homology modeling and
structure comparison.

The MMDB database also differs from PDB in that it provides
pre-computed "views" of structures, containing increasing levels of
detail.

MMDB also has explicit secondary structure information in addition to
any provided by PDB, and this information is used to create vector
models for the purposes of structural comparison and alignment. As
you can see, MMDB is a "value-added" structure database.

********************************************
** Complete Genomes in Entrez and GenBank **
********************************************

Network Entrez now offers a new "genomes" division which presents
genome-level views of a large number of complete chromosomes, from
organelle, through virus and phage, to completely sequenced
chromosomes from yeast or bacteria, to integrated genetic and
physical maps and contiged sequence islands from eukaryotes such as
human, mouse and Drosophila. Following the Entrez tradition, the
chromosome views are tightly linked to DNA and protein sequence
records, MEDLINE citations, and the new three-dimensional structure
division described above.

** Small Genomes

A number of chromosomes from viruses or organelles have been
completely sequenced and have been available from GenBank for some
time. However, there are often multiple versions of these sequences,
of parts of the sequences, or of population variants. NCBI has
selected a reference sequence in these cases, then searched the
database and aligned the other versions of sequence from the same
chromosome with the reference sequence.

In the genomes division of Entrez, selecting such an entry will bring
up a graphical map showing the coordinate system of the whole
chromosome. Selecting all or part of this map with the mouse
designates the region to be displayed in the other viewers. This is
done by "rubber-banding" (clicking and dragging over) the area of
interest. Once you have selected the area of interest in the map
view, you must then choose one of the other views (e.g. Graphic) by
clicking on the appropriate "TAB".

Choosing the Graphic view shows the detailed feature table of the
selected region of the reference sequence and the positions of other
GenBank records that align to it. Vertical black lines below the
aligned sequences indicate insertions relative to the reference
sequence, while black lines within the sequence indicate gaps.

In the Alignment view, sequences aligned to the reference are
retrieved over the network and a new display is constructed which
shows the coding region features on the reference sequence and those
on the aligned sequences, permitting comparison of annotation between
entries within the alignment. In addition, red lines now show
mismatches between the aligned sequences, and insertions and
deletions are shown as before. If you click on these records, you
will see the GenBank flatfile view, but if you "rubber-band" an area
of interest, you will see the alignment of the sequence you selected.

** Chromosomes from Contigs

Larger chromosomes have recently been completely sequenced from yeast
and Haemophilus. These records exist in GenBank as many smaller
overlapping records, as required by the international guidelines for
sequence data exchange to ensure compatibility with existing software
tools, and to provide convenient units of data for updating or
detailed analysis (See NCBI newsletter: Sept. 1995). Entrez provides
a view for these chromosomes which presents a virtual sequence
representing the whole chromosome, with bands of alternating colors
to indicate where it is made from different GenBank records. The
Graphic and Alignment views use the same display, but also show the
details of overlaps of the pieces, as well as the features and
alignments described above. Once you have located a region of
interest in the genome view, you can readily retrieve the appropriate
constituent record with a double click of the mouse.

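The idea of a virtual chromosome sequence assembled from overlapping
records can be sketched in Python as follows. This is a toy model
under stated assumptions, not the Entrez implementation: the record
labels ("rec1" to "rec3") and the coordinates are hypothetical, and
the real viewer's data structures are not described in this document.
The sketch only shows how a chromosome coordinate maps back to its
constituent record(s), and how the overlaps between adjacent pieces
can be listed.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    record: str  # hypothetical record label
    start: int   # first chromosome coordinate covered (inclusive)
    end: int     # last chromosome coordinate covered (inclusive)


# Toy tiling: three overlapping records covering coordinates 1..2500.
TILING = [
    Segment("rec1", 1, 1000),
    Segment("rec2", 901, 2000),
    Segment("rec3", 1951, 2500),
]


def records_at(position, tiling=TILING):
    """Return the records that cover one chromosome coordinate."""
    return [s.record for s in tiling if s.start <= position <= s.end]


def overlaps(tiling=TILING):
    """List (left, right, start, end) for adjacent records that overlap."""
    ordered = sorted(tiling, key=lambda s: s.start)
    result = []
    for left, right in zip(ordered, ordered[1:]):
        if right.start <= left.end:
            shared_end = min(left.end, right.end)
            result.append((left.record, right.record, right.start, shared_end))
    return result
```

With this toy tiling, records_at(950) returns both "rec1" and "rec2"
because coordinate 950 falls inside their overlap, which is the kind
of lookup a double click on a region of interest would need.
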
As larger chromosomes become available (and this is true for all the
examples in the next section, "Integrated Maps") and larger amounts
of data are requested, you may reach the upper limit of the system by
selecting too much of a chromosome. In that case you will simply see
a warning message, and the maximum default size will be selected
instead, so you can then "TAB" to the Graphic view.

** Integrated Maps

In the higher eukaryotes, relatively small parts of chromosomes have
been sequenced. In these cases, the NCBI has collected various
genetic and physical maps for a particular organism, mapped them onto
a common coordinate system, and aligned any markers they share. The
beginning of a sequence map for the chromosome is made by building
contigs of sequence from the same region and organism, then placing
the composite sequence onto the coordinate system provided by the
integration of the maps. For Human, these composites are known as the
"UniGene" set and are being used as mapping reagents by collaborating
groups. As more of the UniGene-derived markers are placed on the
maps, more and more sequence records will also be placed.

For Human, the Map view shows the integrated map, which includes
Genethon (as derived from MIT), the MIT physical map, the CHLC
framework map, the GDB cytogenetic map, the Stanford radiation hybrid
map (at present just for chromosome 4), and the NCBI sequence map.

------------------------------------------------------------------

** Availability

Network Entrez clients are available for Macintosh, MS Windows, MS
Windows 95, MS Windows NT, most UNIX machines under X11, and VMS
under X11, among others, via anonymous FTP from ncbi.nlm.nih.gov.

We are still refining the ergonomics and presentation and we welcome
comments and suggestions. This software should be considered to be in
"beta" test. There will be a series of software updates throughout
the rest of the year. We welcome comments, suggestions, and offers of
curatorial assistance with reference sequences.

You can reach NCBI's Entrez development team by sending e-mail to:
toolbox@ncbi.nlm.nih.gov.

** Credits

The genomes division and the graphical viewers for Entrez have been
built by: Jinghui Zhang, Jonathan Kans, Alex Smirnov, Denis Vakatov,
Jonathan Epstein, Greg Schuler, Tatiana Tatusov, John Kuzio, Colombe
Chappey and Jim Ostell.

The structure database for Entrez (MMDB) has been a joint project of
Hitomi Ohkawa, Christopher Hogue, Steve Bryant, Jonathan Kans,
Jonathan Epstein, Greg Schuler and Jim Ostell.

The NCBI would like to acknowledge the following sources for their
contributions to the map information in the GenBank Genomes division:

    B.subtilis (NRSUB)           Guy Perriere, Universite Claude
                                 Bernard, Lyon, France.
    Drosophila Physical Map      Bill Gelbart and Wayne Rindone,
                                 Harvard.
    Human Genetic Map            Ken Buetow, CHLC.
    Human Physical Map           Lincoln Stein, Whitehead Institute,
                                 MIT.
    Human Radiation Hybrid Map   Kathleen McKusick and David Cox,
                                 Stanford.
    Mouse Genetic Map            Prakash Nadkarni, Yale, and Janan
                                 Eppig & Lori Corbani, Jackson Labs.
    Saccharomyces cerevisiae     Michael J. Cherry, SGD, Stanford.
    T4 genome                    Tom Stidham, Evergreen State
                                 College, Olympia, WA.

==================================================================

Comments and suggestions are welcome!

A FAQ (Frequently Asked Questions) document about using Entrez, MMDB,
and Kinemage can be obtained by e-mail request to:
info@ncbi.nlm.nih.gov.

If you want to be added to the mailing list for our free newsletter,
which will announce our new developments and projects, please send
your complete postal address to the e-mail address above.

DOCUMENT REVISION HISTORY:

Date    | Change
======================================================================
12-12-95| Added revision history, and updated version dates for
        | SGI compiled for IRIX 5.3.
-----------------------------------------------------------------------
02-06-96| Updated this document to show new version installed (4.017).
-----------------------------------------------------------------------
03-15-96| Updated this document to show new version installed (4.022);
        | added Linux version.
-----------------------------------------------------------------------
03-20-96| Updated this document to show new Mac version installed
        | (4.022).
-----------------------------------------------------------------------
05-01-96| Updated this document to show new version for all
        | platforms (minor bug fixes) (4.024).
-----------------------------------------------------------------------
07-26-96| Release 5.0 of Nentrez now includes the new 3D
        | viewer "Cn3D".
-----------------------------------------------------------------------
08-02-96| Updated this document to show new SunOS version installed
        | (5.002).
-----------------------------------------------------------------------
03-06-97| New Nentrez version, version 5.100
-----------------------------------------------------------------------