Steven Salzberg's home page

 



Director, Center for Bioinformatics and Computational Biology, and Horvitz Professor, Department of Computer Science, 3125 Biomolecular Sciences Bldg #296, University of Maryland, College Park, MD 20742.
Affiliate Professor, Department of Cell Biology & Molecular Genetics.
Faculty member, Bioengineering graduate program
Phone: 301-405-5936. Email: s a l z b e r g (at) u m i a c s . u m d . e d u
Blog: genefinding.blogspot.com

My group's software: Glimmer, MUMmer, AMOS assembler, JIGSAW, TransTermHP, and others
Courses, current and past

Editorials and opinion pieces
My opinion piece on genome annotation from Genome Biology, 1 February 2007 (subscription may be required; email me for a reprint).
My editorial on evolution and the flu from The Philadelphia Inquirer newspaper, Nov 2, 2005.
The letter to Nature from the GISAID consortium on rapid release of avian influenza data, Aug. 24 2006, signed by over 70 scientists from 34 countries.
Our letter to the editor of Nature in favor of rapid release of influenza genome data, with Elodie Ghedin and David Spiro, Nature 440 (30 Mar 2006), 605.
It is time to end the patenting of software.  J. Quackenbush and S.L. Salzberg.  Bioinformatics 22:12 (2006), 1416-7.
Beware of mis-assembled genomes.  S.L. Salzberg and J.A. Yorke.  Bioinformatics 21:24 (2005), 4320-21.
Nature journal club article about viruses as living organisms.  Nature 438 (10 Nov 2005), 133.
Our letter to the editor of Nature in favor of unrestricted access to genome data (with Ewan Birney, Sean Eddy, and Owen White), Nature 422 (2003), 801.

Selected publications (Genomics, Bioinformatics, or older machine learning papers)

Genomics research papers (click on the titles to download a reprint)
  1. Genome analysis linking recent European and African influenza (H5N1) viruses. Steven L. Salzberg, Carl Kingsford, Giovanni Cattoli, David J. Spiro, Daniel A. Janies, Mona Mehrez Aly, Ian H. Brown, Emmanuel Couacy-Hymann, Gian Mario De Mia, Do Huu Dung, Annalisa Guercio, Tony Joannis, Ali Safar Maken Ali, Azizullah Osmani, Iolanda Padalino, Magdi D. Saad, Vladimir Savić, Naomi A. Sengamalay, Samuel Yingst, Jennifer Zaborsky, Olga Zorman-Rojs, Elodie Ghedin, and Ilaria Capua. Emerging Infectious Diseases 13:5 (May 2007).
  2. Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis. J. Carlton, et al., Science 315 (2007), 207-212.
  3. Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote. J.A. Eisen, et al. PLoS Biology 4:9 (2006): e286.
  4. Large-scale sequencing of human influenza reveals the dynamic nature of viral genome evolution.  (Reprint) (Abstract) E. Ghedin, N.A. Sengamalay, M. Shumway, J. Zaborsky, T. Feldblyum, V. Subbu, D.J. Spiro, J. Sitz, H. Koo, P. Bolotov, D. Dernovoy, T. Tatusova, Y. Bao, K. St George, J. Taylor, D.J. Lipman, C.M. Fraser, J.K. Taubenberger, and S.L. Salzberg.  Nature 437 (2005), 1162-1166.
  5. Whole-Genome Analysis of Human Influenza A Virus Reveals Multiple Persistent Lineages and Reassortment among Recent H3N2 Viruses. E.C. Holmes, E. Ghedin, N. Miller, J. Taylor, Y. Bao, K. St. George, B.T. Grenfell, S.L. Salzberg, C.M. Fraser, D.J. Lipman, and J.K. Taubenberger.  PLoS Biology 3:9 (2005), e300.  [Local PDF copy]
  6. Comparative Genomics of Trypanosomatid Parasitic Protozoa.  N.M. El-Sayed et al.  Science 309 (2005), 404-409.
  7. The genome of the basidiomycetous yeast and human pathogen Cryptococcus neoformans.  B.J. Loftus et al. Science 307 (Feb 25 2005), 1321-4.
  8. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.  (local PDF copy) S.L. Salzberg, J.C. Dunning Hotopp, A.L. Delcher, M. Pop, D.R. Smith, M.B. Eisen, and W.C. Nelson.  Genome Biology 2005, 6:R23.
  9. Yeast rises again.  S.L. Salzberg, Nature 423 (2003), 233-234.
  10. The genome assembly archive: a new public resource.  S.L. Salzberg, D. Church, M. DiCuccio, E. Yaschenko, and J. Ostell. PLoS Biology 2:9 (2004), 1273-1275.  [Local PDF copy]
  11. Genomic insights into methanotrophy: the complete genome sequence of Methylococcus capsulatus (Bath).  N. Ward, et al., PLoS Biology 2:10 (2004), e303.
  12. Comparative genome sequencing for discovery of novel polymorphisms in Bacillus anthracis T.D. Read, S.L. Salzberg, M. Pop, M. Shumway, L. Umayam, L. Jiang, E. Holtzapple, J. Busch, K.L. Smith, J.M. Schupp, D. Solomon, P. Keim, and C.M. Fraser. Science 296 (2002), 2028-2033.
  13. Genome sequence of the human malaria parasite Plasmodium falciparum.  M.J. Gardner et al., Nature 419 (2002), 498-511.
  14. The genome sequence of the malaria mosquito Anopheles gambiae.  R.A. Holt et al., Science 298 (2002), 129-149.
  15. Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster.  E.M. Zdobnov et al., Science 298 (2002), 149-159.
  16. Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii.  J.M. Carlton et al., Nature 419 (2002), 512-519.
  17. A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome.  R.J. Mural et al. (176 authors).  Science 296 (2002), 1661-1671.
  18. Microbial Genes in the Human Genome: Lateral Transfer or Gene Loss? (Abstract) (Full text) (PDF file) S.L. Salzberg, O. White, J. Peterson, and J.A. Eisen, Science 292 (2001), 1903-1906.   See also the Enhanced Perspective in Science. ANNOTATED! See the annotated version of this paper, designed to help students and teachers of science, developed by the SCOPE project and the Editors of Science.
  19. The Sequence of the Human Genome.  (free at the Science website) J. Craig Venter et al. (274 authors), Science 291 (2001), 1304-1351.  Get the figures showing genome-scale duplications in PDF format here: [Page 1] [Page 2]
  20. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana.  The Arabidopsis Genome Initiative (143 authors), Nature 408 (2000), 796-815.  (Also contains links to our papers on chromosomes 1, 2, and 3 of Arabidopsis.)
  21. Evidence for symmetric chromosomal inversions around the replication origin in bacteria.  Jonathan A. Eisen, John F. Heidelberg, Owen White, and Steven L. Salzberg. Genome Biology 1:6 (2000), 1-9.
  22. Microbial genome sequencing.  Claire M. Fraser, Jonathan A. Eisen, and Steven L. Salzberg.  Nature 406 (2000), 799-803.
  23. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae.  John F. Heidelberg et al., Nature 406 (2000), 477-483.
  24. Genome sequences of Chlamydia trachomatis MoPn and C. pneumoniae AR39.  Timothy D. Read et al.,  Nucleic Acids Research 28:6 (2000), 1397-1406.
  25. Gene Index analysis of the human genome estimates approximately 120,000* genes.  F. Liang, I.E. Holt, G. Pertea, S. Karamycheva, S.L. Salzberg, and J. Quackenbush. Nature Genetics 25:2 (2000), 239-240. *Estimate corrected to 56,000 genes; Nature Genetics 26:4 (2000), 501.
  26. Sequence and Analysis of Chromosome 2 of Arabidopsis thaliana (get abstract).  Xiaoying Lin et al., Nature 402  (1999), 761-768.
  27. Complete genome sequence of Neisseria meningitidis serogroup B strain MC58 (get abstract).  Herve Tettelin et al.  Science 287 (2000), 1809-1815.
  28. Optimized Multiplex PCR: Efficiently Closing a Whole-Genome Shotgun Sequencing Project (PDF).  H. Tettelin, D. Radune, S. Kasif, H. Khouri, and S.L. Salzberg. Genomics 62 (1999), 500-507.
  29. Genome Sequence of the Radioresistant Bacterium Deinococcus radiodurans R1 (get abstract).   Owen White et al. , Science 286 (1999), 1571-1577.
  30. DNA uptake signal sequences in naturally transformable bacteria.  H.O. Smith, M.L. Gwinn, and S.L. Salzberg.  Research in Microbiology, 150 (1999), 603-616.
  31. Evidence for lateral gene transfer between Archaea and Bacteria from genome sequence of Thermotoga maritima (get abstract).  Karen E. Nelson et al. ,  Nature 399 (1999), 323-329.
  32. Chromosome 2 sequence of the human malaria parasite Plasmodium falciparum (get abstract).  Malcolm J. Gardner et al., Science 282 (1998), 1126-1132.
  33. Complete Genomic Sequence of Treponema pallidum, the Syphilis Spirochete.  C.M. Fraser et al., Science 281 (1998), 375-388.
  34. Genomic Sequence of a Lyme Disease Spirochaete, Borrelia burgdorferi.   C.M. Fraser et al., Nature 390 (1997), 580-586.
 Bioinformatics papers (and one book)
  1. Identifying bacterial genes and endosymbiont DNA with Glimmer. A.L. Delcher, K.A. Bratke, E.C. Powers, and S.L. Salzberg. Bioinformatics 2007 Mar 15;23(6):673-9. This is the Glimmer 3 paper.
  2. Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake.  C.L. Kingsford, K. Ayanbule, and S.L. Salzberg.  Genome Biology 2007;8(2):R22.
  3. Hawkeye: an interactive visual analytics tool for genome assemblies.  M. Schatz, A.M. Phillippy, B. Shneiderman, and S.L. Salzberg.  Genome Biology 2007 Mar 9;8(3):R34.
  4. Minimus: a fast, lightweight genome assembler.  D.D. Sommer, A.L. Delcher, S.L. Salzberg, and M. Pop.  BMC Bioinformatics 2007 Feb 26;8:64.
  5. A phylogenetic generalized hidden Markov model for predicting alternatively spliced exons. J.E. Allen and S.L. Salzberg. Algorithms for Molecular Biology 1:14 (2006).
  6. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions.  J.E. Allen, W.H. Majoros, M. Pertea, and S.L. Salzberg.  Genome Biology 2006, 7(Suppl):S9.
  7. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding.  W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 21:9 (2005), 1782-88.
  8. Efficient decoding algorithms for generalized hidden Markov model gene finders.  W.H. Majoros, M. Pertea, A.L. Delcher, and S.L. Salzberg.  BMC Bioinformatics 6 (2005), 16.
  9. Comparative genome assembly.  M. Pop, A. Phillippy, A.L. Delcher, and S.L. Salzberg.  Briefings in Bioinformatics 5:3 (2004), 237-248.
  10. Automated correction of genome sequence errors.  P. Gajer, M. Schatz, and S.L. Salzberg.  Nucleic Acids Research 32:2 (2004), 562-569.  This describes the AutoEditor system, with open source code available here.
  11. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders.  W.H. Majoros, M. Pertea, and S.L. Salzberg.  Bioinformatics 20:16 (2004), 2878-79.
  12. An empirical analysis of training protocols for probabilistic gene finders.  W.H. Majoros and S.L. Salzberg.  BMC Bioinformatics 5 (2004), 206.
  13. Versatile and open software for comparing large genomes.  S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.  Genome Biology 5:R12 (2004), http://genomebiology.com/2004/5/2/R12.  This is the MUMmer3 paper, with open source code available here.
  14. DAGChainer: A tool for mining segmental genome duplications and synteny.  B.J. Haas, A.L. Delcher, J.R. Wortman, and S.L. Salzberg.  Bioinformatics 20:18 (2004), 3643-6.
  15. Hierarchical scaffolding with Bambus. M. Pop, D. Kosack, and S.L. Salzberg.  Genome Research 14 (2004), 149-159.  This describes our open source system for the scaffolding phase of genome assembly.
  16. Computational gene prediction using multiple sources of evidence.  J.E. Allen, M. Pertea, and S.L. Salzberg.  Genome Research 14 (2004), 142-148.  This describes our open source system for producing a gene prediction based on multiple gene finders, alignment programs, and other evidence.
  17. Fast algorithms for large-scale genome alignment and comparison (Abstract) (Full text PDF) A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg. Nucleic Acids Research 30:11 (2002), 2478-2483.  (This is the MUMmer 2 paper.)
  18. Full-length messenger RNA sequences greatly improve genome annotation.  B.J. Haas, N. Volfovsky, C.D. Town, M. Troukhan, N. Alexandrov, K.A. Feldmann, R.B. Flavell, O. White, and S.L. Salzberg.  Genome Biology 3:6 (2002), research0029.1-12.
  19. Book: Computational Methods in Molecular Biology (1998; in paperback since 1999) edited by S.L. Salzberg, D.B. Searls, and S. Kasif. See the table of contents here.
  20. GeneSplicer: a new computational method for splice site prediction M. Pertea, X. Lin, and S.L. Salzberg.  Nucleic Acids Research 29:5 (2001) 1185-1190.
  21. A probabilistic method for identifying start codons in bacterial genomes.  B.E. Suzek, M.D. Ermolaeva, M. Schreiber, and S.L. Salzberg.  Bioinformatics 17:12 (2001), 1123-1130.
  22. Prediction of operons in microbial genomes. M.D. Ermolaeva, O. White and S.L. Salzberg.  Nucleic Acids Research 29:5 (2001), 1216-1221.
  23. A clustering method for repeat analysis in DNA sequences.  N. Volfovsky, B.J. Haas, and S.L. Salzberg.  Genome Biology 2:8 (2001), research0027:1-11.  This describes the RepeatFinder software.
  24. Finding genes in Plasmodium falciparum chromosome 3.  M. Pertea, S.L. Salzberg, and M.J. Gardner. Nature 404 (2000), 34.
  25. An optimized protocol for analysis of EST sequences.  F. Liang, I.E. Holt, G. Pertea, S. Karamycheva, S.L. Salzberg, and J. Quackenbush.  Nucleic Acids Research 28:18 (2000), 3657-3665.
  26. Prediction of transcription terminators in bacterial genomes (get abstract).  M.D. Ermolaeva, H. Khalak, O. White, H.O. Smith, and S.L. Salzberg.  J. Molecular Biology 301 (2000), 27-33.
  27. Improved microbial gene identification with GLIMMER  A.L. Delcher, D. Harmon, S. Kasif, O. White, and S.L. Salzberg.  Nucleic Acids Research, 27:23 (1999), 4636-4641.
  28. Interpolated Markov models for eukaryotic gene finding.  S.L. Salzberg, M. Pertea, A.L. Delcher, M.J. Gardner, and H. Tettelin.  Genomics, 59 (1999), 24-31.  This describes the GlimmerM gene finder, available below.
  29. Alignment of Whole Genomes  A.L. Delcher, S. Kasif, R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg.  Nucleic Acids Research, 27:11 (1999), 2369-2376.  Note that Figure 6 is supposed to be in color, and was mistakenly printed as black and white.  Click here for the color figure (PDF).  This describes the MUMmer system, available below.
  30. Microbial gene identification using interpolated Markov models S.L. Salzberg, A.L. Delcher, S. Kasif, and O. White. Nucleic Acids Research, 26:2 (1998), 544-548. This paper describes the original Glimmer system (version 1.0), available here.
  31. Skewed Oligomers and Origins of Replication. S.L. Salzberg, A.J. Salzberg, A.R. Kerlavage, and J.-F. Tomb. Gene 217:1-2 (1998), 57-67.
  32. Finding Genes in Human DNA with a Hidden Markov Model. J. Henderson, S.L. Salzberg, and K. Fasman. This describes the VEIL system for finding genes. Journal of Computational Biology 4:2 (1997), 127-141.
  33. A Decision Tree System for Finding Genes in DNA (preprint only).  S.L. Salzberg, A.L. Delcher, K. Fasman, and J. Henderson. Journal of Computational Biology 5:4 (1998), 667-680.
  34. A Method for Identifying Splice Sites and Translational Start Sites in Eukaryotic mRNA. S.L. Salzberg.  Computer Applications in the Biosciences (CABIOS) 13:4 (1997), 365-376. 
  35. Locating Protein Coding Regions in Human DNA using a Decision Tree Algorithm. S.L. Salzberg.  Journal of Computational Biology, 2:3 (1995), 473-485.
Machine learning papers (1997 and earlier)
  1. Decision Trees for Automated Identification of Cosmic Ray Hits in Hubble Space Telescope Images. S.L. Salzberg, R. Chandar, H. Ford, S. Murthy, and R. White. Publications of the Astronomical Society of the Pacific 107, May 1995, 1-10. (460K, compressed PostScript)
  2. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach.  S.L. Salzberg. Data Mining and Knowledge Discovery 1:3 (1997), 317-327.
  3. A System for Induction of Oblique Decision Trees by S.K. Murthy, S. Kasif, and S. Salzberg. Journal of Artificial Intelligence Research 2:1 (1994), 1-32. (475K, PostScript).  Four figures in obsolete PostScript format may not print - here are scanned versions in jpg: Figure1 Figure2 Figure3 Figure7.
  4. Best-Case Results for Nearest-Neighbor Learning.  S.L. Salzberg, A.L. Delcher, D. Heath, and S. Kasif. IEEE Transactions on Pattern Analysis and Machine Intelligence 17:6, June 1995, 599-608. (Earlier version, "Learning with a Helpful Teacher," appeared in the IJCAI-91 proceedings.)

Bioinformatics software from my group, all open source

Computational Gene Finding

  1. Glimmer, a system that uses interpolated Markov models to find genes in microbial DNA. March 2003: New release, version 2.1, automatically optimizes ORF length for training.
  2. TWAIN, a Generalized Pair HMM to predict genes simultaneously in two closely related eukaryotic organisms.
  3. GlimmerHMM (formerly GlimmerM), an interpolated Markov Model system for finding genes in many eukaryotes, including P. falciparum, A. thaliana, rice (O. sativa), mosquito (A. aegypti), B. malayi, C. neoformans, and others.
  4. GeneZilla, a generalized HMM for eukaryotic gene finding that improves upon and replaces TigrScan, a 2003-vintage generalized HMM with a design similar to Genscan.
  5. JIGSAW (formerly called Combiner), a program that predicts gene models using the output from multiple sources of evidence, including other gene finders, Blast searches, and other alignment data.
  6. GeneSplicer, a fast system for detecting splice sites in genomic DNA of various eukaryotes.
  7. PIRATE, a website collecting many links to our gene finders and others.

Genome assembly and large-scale genome alignment

  1. MUMmer, a system for aligning whole genomes, chromosomes, and other very long DNA sequences. Since April 2003: MUMmer 3.0 and later releases are open source.
  2. The AMOS Assembler project is a set of tools, libraries, and freestanding genome assemblers, all open source. AMOS is also an open consortium that includes TIGR, the University of Maryland, The Karolinska Institutet, and the Marine Biological Laboratory.
  3. Hawkeye, a flexible graphical interface to genome assemblies from a variety of assemblers.
  4. AMOScmp is a comparative genome assembler, which uses one genome as a reference on which to assemble another, closely related species.  See the journal paper here.
  5. MINIMUS (new in August 2004) is a small, lightweight assembler for small jobs such as assembling a viral genome, assembling a set of reads that match a single gene, or other tasks that don't require the complex infrastructure of a large-genome assembler.
  6. BAMBUS, the first publicly available, standalone genome sequence scaffolding program. It orders and orients contigs into scaffolds based on various types of linking information.
  7. AutoEditor, a tool for correcting sequencing and basecaller errors using sequence assembly and chromatogram data. On average AutoEditor corrects 80% of erroneous base calls, with an accuracy of 99.99%.

Transcription terminators, operons, and motif analysis tools

  1. ELPH, a motif finder that can find ribosome binding sites, exon splicing enhancers, or regulatory sites.
  2. SeeESE, an online tool for identifying exon splicing enhancers (ESEs) in Arabidopsis, Drosophila, and other species.
  3. TransTermHP (new release, November 2005), a program that finds rho-independent transcription terminators in bacterial genomes.
  4. OperonDB (new release, fall 2006), results from our operon-finding software on a large number of prokaryotic genomes.  (Originally described in Ermolaeva et al., Prediction of operons in microbial genomes.)
  5. Skewed oligomers from bacterial and archaeal genomes (described in Gene 217:1-2 (1998), listed above).  Get the source code or Linux executable here. Tables of skewed oligomers for: A. fulgidus, B. burgdorferi, B. subtilis, C. trachomatis, E. coli, H. influenzae, H. pylori, M. genitalium, M. jannaschii, M. pneumoniae, M. thermoautotrophicum, Synechocystis sp. PCC 6803, T. maritima, T. pallidum

Laboratory members, current and recent

Machine Learning systems (pre-1995)

  1. The OC1 decision tree system (source code included)
  2. The PEBLS memory-based reasoning system (source code included)

Students' Ph.D. theses

Courses

Personal

My father, Herman Salzberg, has a home page at the University of South Carolina.
My brother Alan Salzberg is president of Quantitative Analysis, Inc., a statistical consulting company.
My wife Claudia and I have two daughters, Annika and Alyssa.