During the past decade, DNA sequencing output has been dominated by second generation sequencing platforms such as Illumina, which are characterized by low cost, high throughput and shorter read lengths. The emergence and development of so-called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, will enable comparison of existing and novel bioinformatics approaches, and will encourage interest in the development of innovative experimental and computational methods for NGS data.

It has been a decade since the release of the initial Next Generation Sequencing (NGS) platform by 454 Life Sciences (now Roche) 1. During these ten years several NGS platforms including 454, Illumina, SOLiD, Ion Torrent and Pacific Biosciences (PacBio) have been released and improved 2. Currently, Illumina offers the highest throughput and the lowest per-base cost 3, while the third generation sequencing technology provider Pacific Biosciences (PacBio) has median read lengths in the range of 4–5 kb and can produce reads longer than 20 kb 4. Performance comparisons of various NGS platforms and recent advances have been summarised elsewhere 2, 3, 5. In general, the second generation sequencing platforms are characterized by shorter read lengths, while third generation platforms generate significantly longer, but fewer and more error-prone, reads. The majority of published draft genomes have been sequenced using second generation sequencing technologies (Illumina and 454) and these data are readily available 6.

Since its introduction, the PacBio sequencing platform has become more widely used due to the utility of its longer read lengths 7 and range of applications 8. A limitation of earlier versions of PacBio technology for producing accurate genome assemblies was high error rates (>15%) combined with lower sequence output (100 Mb) 9. To address this, efficient algorithms were developed 9, 10, which require either >100x PacBio sequence coverage or accurate Illumina reads for error correction. Therefore, the development of hybrid approaches which utilize previous sequencing data and also provide an option to employ long-read data remains a major scientific focus area. An evaluation of various hybrid assembly strategies was published in mid-2014 (ref. 11), and within a short time frame the field continued to progress with the release of newer hybrid algorithms 12–16 and updates to existing ones 17, 18. This underlines the requirement for, and utility of, hybrid approaches to the scientific community. The long-read PacBio platform was speculated to be increasingly used to produce finished microbial genome assemblies 4, 6, a view supported by several recent examples 19–23, and the utility of long-read sequencing for microbial genomes has been reviewed recently 24. The combination of greater sequence output and longer reads, together with the random nature of PacBio errors, has facilitated improved de novo assembly outputs.
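To make the coverage requirement mentioned above concrete, the following is a minimal arithmetic sketch and not part of the described datasets or methods. It assumes a hypothetical 4 Mb microbial genome and treats the quoted 100 Mb figure as the output of a single sequencing run, which the text does not specify. The function names are illustrative only.

```python
import math


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def runs_needed(target_coverage: float, genome_size: int, bases_per_run: int) -> int:
    """Smallest number of runs whose combined output reaches the target coverage."""
    if bases_per_run <= 0:
        raise ValueError("bases_per_run must be positive")
    return math.ceil(target_coverage * genome_size / bases_per_run)


# Assumed values for illustration only (neither is stated per run or per genome in the text).
GENOME_SIZE = 4_000_000      # hypothetical 4 Mb microbial genome
BASES_PER_RUN = 100_000_000  # 100 Mb, read here as output per run (assumption)

print(fold_coverage(BASES_PER_RUN, GENOME_SIZE))     # coverage from one run
print(runs_needed(100, GENOME_SIZE, BASES_PER_RUN))  # runs needed to reach 100x
```

Under these assumptions a single run yields about 25x coverage, so roughly four runs would be needed to reach 100x. This illustrates why low sequence output made PacBio-only error correction costly, and why error correction with accurate short reads was an attractive alternative.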