This website is now retired. Please visit our new one!

Visit caenorhabditis.org for updates, and to browse, BLAST and download our genomes.

Or head directly to:

ensembl.caenorhabditis.org: custom Ensembl genome browser

blast.caenorhabditis.org: BLAST server

download.caenorhabditis.org: download server

FTP server

Visit our HTTP server. We have uploaded the preliminary assemblies for the first 16 new species. You can BLAST search them here. The poster and booklet for the CGP workshop at the C. elegans meeting in LA (June 2015) are also available.

Project Strategy

Genomics
C. elegans is but one nematode, an “anecdotal” instance of how a genomic system generates a complex organism. But how did this system come to be? Which parts are historical accident and which are the result of selection? What competing forces are at work in shaping the genome: its composition, size, synteny and linkage dynamics, repeat content, mobile element diversity, gene structure, gene birth and death, sequence diversity, … ? To answer these questions (and many more), we contend that genome sequence information from as many related species within the genus Caenorhabditis as possible will form an essential backdrop to specific research programmes.

The time is now ripe for a programme to sequence the diversity of Caenorhabditis. In the last 10 years there has been a remarkable global effort to discover new species, sparked by the Félix lab’s discovery of the likely “true” ecology of Caenorhabditis in rotting fruits and other plant material [2]. The number of species in culture now exceeds 40 (and is growing), and their relationships have been robustly inferred from multi-locus analyses by the Kiontke lab [3]. Several genomes are already available. After the success of the C. elegans genome project, the genome of C. briggsae was sequenced [4]; the NHGRI sponsored the sequencing of C. nigoni, C. brenneri and C. tropicalis at WUGSC [5]; the Sternberg lab has sequenced C. angaria [6]; the Phillips lab has sequenced C. remanei; and the Blaxter lab has sequenced C. wallacei (aka sp. 16), sp. 5 and sp. 1.

A step change in sequencing technologies and assembly algorithms now means that good-enough genomes can be generated quickly, efficiently and cheaply. We have therefore embarked on a project to “complete” the sequencing of all Caenorhabditis species currently available in culture: the Caenorhabditis Genomes Project (CGP). The project will be funded largely through generous intramural support from Edinburgh Genomics (http://genomics.ed.ac.uk) and led by the Blaxter laboratory in Edinburgh (http://www.nematodes.org), but we invite all interested researchers to join us in an open collaboration. Additional funding will be sought to improve the genome assemblies, and any support from the community will significantly extend what can be done. We expect that additional species will be discovered, and we hope to add them to the project as they are defined.

The strategy
The current roster of genomes, and their status, is available at http://caenorhabditis.bio.ed.ac.uk. We intend the CGP to be an open collaboration, and we will make data available for free download under the “usual” agreements: essentially, that anyone carrying out whole-genome analyses contacts us before proceeding to publication (and preferably much earlier) so that we can all coordinate efforts. There is so much to be done that collaboration will be essential.

Data generation
Our strategy is to ask researchers with live cultures, preferably of inbred strains, to prepare DNA and RNA and ship them to Edinburgh for sequencing. We are not demanding that inbred lines be generated, as this process often takes many months and can produce very sick nematodes that are unlikely to be good representatives of their species. Advances in assembly routines mean that we are much better able to deal with heterozygosity during assembly. We are currently generating a standard dataset for each species (125 b paired-end data from two short-insert genomic libraries with 350 and 550 base inserts [~80 M read pairs, or ~100x coverage], and stranded RNASeq data [~25 M read pairs]) using Illumina HiSeq 2500 (v4) instruments. For selected species we may also produce Illumina mate-pair libraries and/or PacBio data (and would encourage colleagues with a special interest in a species to “sponsor” the generation of these additional scaffolding data).
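The coverage figure in the standard dataset follows from simple arithmetic: mean coverage is the total number of sequenced bases divided by the genome size (C = N × L / G). The Python sketch below is illustrative only; it assumes two reads per pair, and the genome sizes in the loop are hypothetical examples rather than project figures, since no single genome size is stated here.

```python
# Back-of-envelope mean-coverage arithmetic: C = N x L / G.
# Assumption: each read pair contributes two reads of `read_length` bases.

def total_bases(read_pairs: float, read_length: int) -> float:
    """Total sequenced bases from paired-end data (two reads per pair)."""
    return read_pairs * 2 * read_length


def coverage(read_pairs: float, read_length: int, genome_size: float) -> float:
    """Mean fold-coverage of a genome of `genome_size` bases."""
    return total_bases(read_pairs, read_length) / genome_size


if __name__ == "__main__":
    bases = total_bases(80e6, 125)  # ~80 M pairs of 125 b reads
    print(f"total bases: {bases / 1e9:.0f} Gb")  # prints "total bases: 20 Gb"
    # Hypothetical genome sizes, chosen only to illustrate the scaling.
    for size_mb in (100, 150, 200):
        cov = coverage(80e6, 125, size_mb * 1e6)
        print(f"{size_mb} Mb genome: ~{cov:.0f}x")
```

On this arithmetic, ~80 M pairs of 125 b reads (about 20 Gb) give ~100x for a genome of roughly 200 Mb; smaller genomes would receive proportionally deeper coverage.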

Primary analyses
Raw data will be posted on the project website as it is generated and passes QC (and also uploaded to SRA). Colleagues are free to download and analyse the raw data. We will build best-effort assemblies for each genome, possibly through collaboratively competitive mini-assemblathons for each set of species as they come off the sequencers. Assemblies will be posted together with core quality metrics and explicit recipes describing how they were generated.

Annotation
We will perform best-practice gene finding on each species using the stranded RNASeq data and comparative data from other species, and decorate the genomes with annotation (sequence similarity, domains, expression values). The genome annotation files (and a description of the protocols used) will be posted for download. A combination of skills and approaches will give the best results, and we will coordinate “annotatathons”, perhaps using collaborative platforms such as WebApollo. In particular, we propose to periodically reannotate all species in bulk, following the same protocols for each (for example, when we reach 15, 20, or all species).

Genome databasing and publication
Genome sequences, genes and annotations will be made available through a local genome explorer (a BADGER [7] instance). The BADGER “versions” of the genomes will not act as “databases of record” (we are not intending to replicate WormBase) but rather as interim homes for the data to spur research and cooperation. When a genome reaches a stable annotation status, we will deposit it in INSDC (ENA/GenBank/DDBJ) and WormBase [8]. We will aim to promote peer-reviewed publication of the genomes and analyses, and will also publish data papers so that the genomes can be sensibly used and cited as early as possible.

Project timing, oversight, staffing
We have started the project. In addition to the three species already sequenced by the Blaxter lab in collaboration with Asher Cutter and Marie-Anne Félix, the Félix lab has provided genomic DNA and RNA from eight new species, and data has been generated for four of these (as of 01 Nov 2014). DNA and RNA are being prepared for many other species, and the Rockman, Phillips, Fierst and Wang labs are sequencing additional taxa (and strains). We hope to complete the sequencing in Edinburgh by late spring 2015 and to have assemblies by late summer 2015. Because data will be released as it is generated, there will be incremental updates as we approach completion.

We will maintain a project blog announcing upcoming data, and an annotation/interest roster where individuals and groups can register interest in species or analysis topics. An open Google group will be used to foster discussion and data sharing. Management and oversight will be light. We propose that an oversight group (composed, at a minimum, of Mark Blaxter, Marie-Anne Félix, Karin Kiontke, Erich Schwarz and a WormBase representative) will coordinate data release announcements and assure quality through open conference calls. We would like to hold a “Genomics of the genus Caenorhabditis” workshop at the 2015 International C. elegans Meeting.

Caenorhabditis phylogenetic tree from Kiontke et al. (2011)

In total, 16 new Caenorhabditis species were discovered by collecting samples from rotting fruit at a number of locations worldwide. Their phylogeny was then determined from sequence data for two rRNA genes and nine protein-coding genes.

To view the genomes of these species, navigate to our interactive version of this phylogenetic tree.

Access the full paper here.

The Blaxter lab will coordinate this project.

Mark Blaxter   mark.blaxter@ed.ac.uk

Lewis Stevens  lewis.stevens07@gmail.com

Edinburgh Genomics is 'sponsoring' the sequencing of 20 Caenorhabditis genomes.

genomics.ed.ac.uk

Header image source: Bob Goldstein. Navigate to his lab website for more C. elegans videos.