Systematic genome-scale phenotyping
To connect genotypes to phenotypes, we measured the growth of 58,101 Chlamydomonas mutants representing 14,695 genes (83% of all genes encoded in the Chlamydomonas genome, based on the current genome annotation, v5.6) under 121 environmental and chemical stress conditions (both control and experimental conditions are given in Supplementary Tables 1 and 2). We pooled the entire Chlamydomonas mutant collection from plates into a liquid culture and used molecular barcodes to quantify the relative abundance of each mutant after competitive growth (Fig. 1a–f). Growth conditions included heterotrophic, mixotrophic and photoautotrophic growth under different photon flux densities and CO2 concentrations, as well as abiotic stress conditions such as various pH and temperatures. We also subjected the library to known chemical stressors, including DNA-damaging agents, reactive oxygen species, antimicrobial drugs such as paromomycin and spectinomycin and the actin-depolymerizing drug latrunculin B (LatB). To further expand the range of stressors in the dataset, we identified 1,222 small molecules from the Library of AcTive Compounds on Arabidopsis (LATCA)13 that negatively influence Chlamydomonas growth (Extended Data Fig. 1, Supplementary Table 3 and Supplementary Data 1) and performed competitive growth experiments in the presence of 52 of the most potent compounds. We chose to screen the LATCA library for active compounds in Chlamydomonas because we believed that these compounds would be more likely to impact pathways both in Chlamydomonas and in plants, thus providing more general insights into gene functions in the green lineage. Taken together, this effort represents, to the best of our knowledge, the largest mutant-by-phenotype dataset to date for any photosynthetic organism, with 16.8 million data points (Supplementary Table 4).
Mutants show genotype–phenotype specificity
To identify mutants with growth defects or enhancements due to a specific treatment, we compared the abundance of each mutant after growth under the treatment condition to its abundance after growth under a control condition (Fig. 2a). We called this comparison a screen and the ratio of these abundances the mutant phenotype (Fig. 2b,c). Mutant phenotypes were reproducible between independent replicates of a screen (Fig. 2c,d).
Individual mutants exhibited genotype–phenotype specificity. For example, mutants disrupted in the DNA repair gene POLYMERASE ZETA (POLZ, encoded by Cre09.g387400) exhibited growth defects in the presence of the DNA crosslinker cisplatin, and these mutants did not show growth defects in unrelated screens (Fig. 2d). We observed similar genotype–phenotype specificity for other genes and phenotypes, including sensitivity to low CO2, ciliogenesis and LatB sensitivity (Fig. 2d).
In many screens, mutants that exhibited phenotypes were enriched for disruptions in genes with expected function. In 46 out of 223 screens, at least one Gene Ontology (GO)14 term was enriched (FDR < 0.05) in the genes disrupted in mutants whose growth was perturbed in the screen (Fig. 2e, Extended Data Fig. 2 and Supplementary Table 5). These enriched GO terms corresponded to functions known to be required for survival under the respective treatments. For example, screens with DNA-damaging agents resulted in GO term enrichments such as ‘DNA replication’, ‘Nucleotide binding’ or ‘Damaged DNA binding’. These GO term enrichments suggest that the phenotypic screens are correctly identifying genes required for growth under the corresponding treatments.
In total, 10,380 genes (59% of all Chlamydomonas genes) are represented by one or more 5′ untranslated region (UTR), coding DNA sequence (CDS) or intron insertion mutant that showed a phenotype (decreased abundance below our detection limit) in at least one screen (Fig. 2f). Although a lone mutant showing a phenotype is not sufficient evidence to conclusively establish a gene–phenotype relationship, we anticipate that these data will be useful to the research community in at least three ways. First, they can help prioritize the characterization of candidate genes identified by other means, such as transcriptomics or protein–protein interactions. Second, they facilitate the generation of hypotheses about the functions of poorly characterized genes. Third, they enable prioritization of available mutant alleles for further studies, including to establish a gene–phenotype relationship by complementation and/or backcrossing. The genotype–phenotype specificity of individual mutants and the enrichment of expected functions suggest that our data can serve as a guide for understanding the functions of thousands of poorly characterized genes.
High-confidence gene–phenotype relationships
The availability of multiple independent mutant alleles for individual genes allowed us to identify high-confidence gene–phenotype relationships. When multiple independent mutant alleles for the same gene show the same phenotype, the confidence in a gene lesion–phenotype relationship increases, because it is less likely that the phenotype is due to a mutation elsewhere in the genome (on average, there are 1.2 integration events per mutant, and the mutants can also carry other mutations such as point mutations) or that there was an error in mapping of the mutation12. Using a statistical framework that leverages multiple independent mutations in the same gene (Methods), we identified 1,218 high-confidence (FDR < 0.3) gene–phenotype relationships involving 684 genes (Fig. 3a and Supplementary Tables 6 and 7), including hundreds of genes with no functional annotation in the green lineage (Supplementary Table 8). Our gene–phenotype relationships include 302 high-confidence (FDR < 0.3) interactions involving 195 genes and 39 LATCA drugs, providing clues to the drugs’ targets and improving the value of these drugs as tools for perturbing specific pathways (Supplementary Table 7). Based on the highest-confidence (FDR < 0.05) phenotypes, we suggest names for 89 previously unnamed genes (Supplementary Table 9).
As an example of how individual gene–phenotype relationships advance our understanding, we made the unexpected observation that mutants in the gene encoding the chloroplast unfolded protein response (cpUPR) kinase, MUTANT AFFECTED IN CHLOROPLAST-TO-NUCLEUS RETROGRADE SIGNALING (MARS1)15, were sensitive (FDR < 10−9) to the DNA-damaging agent methyl methanesulfonate (MMS) (Fig. 3b). We validated this phenotype in a separate growth assay and showed that the MMS sensitivity of these mutants is rescued by complementation with a wild-type copy of MARS1, but not by a kinase-dead version (Fig. 3c,d). We also determined that treatment with MMS led to induction of VESICLE-INDUCING PROTEIN IN PLASTIDS 2 (VIPP2), a highly selective cpUPR marker, in wild-type cells but not in mutants lacking MARS1 (Fig. 3e). These results illustrate the value of our high-throughput data and suggest the intriguing possibility that the cpUPR is activated via MARS1 upon DNA damage or protein alkylation and has a protective role against these stressors.
From phenotypes to pathways
To facilitate data visualization and predict the functions of poorly characterized genes in our dataset, we used the principle that mutant alleles with similar phenotypes tend to occur in genes that function in the same pathway5. We clustered the 684 genes with high-confidence phenotypes based on the similarity of their phenotypes across different treatments (Fig. 4a and Supplementary Data 2). The correlation of phenotypes was largely unrelated to transcriptional expression correlation16, suggesting that the two approaches provide complementary information (Extended Data Fig. 3 and Supplementary Table 10). We named some of our gene clusters based on the presence of previously characterized genes or based on the conditions that produced the most dramatic phenotypes in a cluster (Fig. 4b–f and Supplementary Table 9). Below, we provide examples of how the data recapitulate known genetic relationships and provide insights into the functions of poorly characterized genes.
Essential DNA repair pathways are conserved in green algae
DNA damage repair pathways are among the best-characterized and most highly conserved across all organisms17,18; thus, they serve as a useful test case of the quality of our data. In our dataset, homologues of known DNA repair proteins are present in a large cluster (Fig. 4e), demonstrating the quality of our phenotypic data, validating our ability to identify that these genes work in a common pathway and extending the conservation of their functions to green algae.
Mutants for various DNA repair genes exhibit expected differences in their sensitivities to different types of DNA damage: (1) DNA double-strand breaks (zeocin and bleomycin), (2) DNA crosslinks (mitomycin C and cisplatin) and (3) DNA alkylation (MMS). For example, mutants exhibiting sensitivity to all DNA damage conditions included mutants lacking upstream DNA damage-sensing kinase ATAXIA TELANGIECTASIA AND RAD3-related protein (ATR, encoded by Cre10.g467200) (ref. 19), as well as mutants lacking the cell cycle checkpoint control protein RADIATION SENSITIVE 9 (RAD9, encoded by Cre16.g682950) or its binding partner HYDROXYUREA-SENSITIVE 1 (HUS1, encoded by Cre12.g524350) (ref. 20). Mutants specifically sensitive to the double-strand break-inducing agents zeocin and bleomycin included the upstream sensor of double-strand breaks, the kinase ATAXIA-TELANGIECTASIA MUTATED (ATM, encoded by Cre13.g564350) (ref. 21) (Supplementary Table 6) and DNA POLYMERASE THETA (POLQ, encoded by Cre16.g664301), which facilitates error-prone double-strand break repair and can maintain genome integrity when other repair pathways are insufficient22,23 (Supplementary Table 6). Mutants specifically sensitive to the DNA crosslinker cisplatin included cells with genetic lesions in the helicases REGULATOR OF TELOMERE ELONGATION HELICASE 1 (RTEL1, encoded by Cre02.g089608) (ref. 24) and FANCONI ANEMIA COMPLEMENTATION GROUP M (FANCM, encoded by Cre03.g208833) and in the crossover junction endonuclease METHANSULFONATE UV SENSITIVE 81 (MUS81, encoded by Cre12.g555050).
Our data suggest several instances where a given factor is required for the repair of a specific class of DNA damage in Chlamydomonas, but not in Arabidopsis, or vice versa, suggesting lineage-specific differences in how DNA damage is repaired. For example, Chlamydomonas fancm mutants are sensitive to the DNA crosslinker cisplatin, whereas Arabidopsis fancm mutants are not25. Conversely, Arabidopsis mus81 mutants are sensitive to the alkylating agent MMS and the DNA crosslinker mitomycin C26, whereas Chlamydomonas mus81 mutants were not.
Taken together, our data suggest that the core eukaryotic DNA repair machinery defined in other systems is generally conserved in green algae. Moreover, the observation of expected phenotypes illustrates the quality of the presented data and the utility of the platform for chemical genomic studies.
Classification of genes based on photosynthesis phenotypes
Our data allowed the classification of 38 genes whose disruption leads to a photoautotrophic growth defect into two clusters. One cluster consisted of genes whose disruption confers sensitivity to light when grown on medium supplemented with acetate, whereas the other contained genes whose disruption does not (Fig. 4b,c and Supplementary Data 2).
The light-sensitive cluster (Fig. 4c) included genes encoding core photosynthesis components and biogenesis factors such as the mRNA trans-splicing factors RNA MATURATION Of PSAA (RAA1)27, RAA328, OCTOTRICOPEPTIDE REPEAT 120 (OPR120) and OPR10429; photosystem II biogenesis factor CONSERVED IN PLANT LINEAGE AND DIATOMS 10 (CPLD10)29,30; the chlorophyll biogenesis factor Mg-CHELATASE SUBUNIT D (CHLD)31; the ATP synthase translation factor TRANSLATION DEFICIENT ATPase 1 (TDA1)32; the Rubisco mRNA stabilization factor MATURATION OF RBCL 1 (MRL1)33; and the Calvin–Benson–Bassham cycle enzymes SEDOHEPTULOSE-BISPHOSPHATASE 1 (SEBP1)34 and PHOSPHORIBULOKINASE 1 (PRK1)35. Several highly conserved but poorly characterized genes are also found in this cluster, including the putative Rubisco methyltransferase encoded by Cre12.g52450036, the putative thioredoxin Cre01.g037800, the predicted protein with a domain of unknown function (DUF1995) Cre06.g281800 (which we named LIGHT SENSITIVE AND/OR ACETATE-REQUIRING 4 (LSAR4)) and Cre13.g572100 (which we named LIGHT GROWTH SENSITIVE 4 (LGS4)), as well as four Chlorophyta-specific genes. The mutant phenotypes of these poorly characterized genes and their presence in this light-sensitive cluster together suggest that their products could mediate the biogenesis, function or regulation of core components of the photosynthetic machinery.
The low CO2-sensitive cluster (Fig. 4b) contains known and new components of the algal CO2-concentrating mechanism (CCM), as detailed below.
New CCM components
The CCM increases the CO2 concentration around the CO2-fixing enzyme Rubisco, thus enhancing the rate of carbon uptake. The mechanism uses carbonic anhydrases in the chloroplast stroma to convert CO2 to HCO3−, which is transported into the lumen of the thylakoid membranes that traverse a Rubisco-containing structure called the pyrenoid37. There, the lower pH drives the conversion of HCO3− back into concentrated CO2 that feeds Rubisco37. Mutants deficient in the CCM are unable to grow photoautotrophically in air, but their photoautotrophic growth is rescued in 3% CO2 (ref. 37). We observed this phenotype for one or more alleles of genes whose disruption was previously shown to disrupt the CCM (Supplementary Table 4), including genes encoding the chloroplast envelope HCO3− transporter LOW CO2 INDUCIBLE GENE A (LCIA)38, and the thylakoid lumen CARBONIC ANHYDRASE 3 (CAH3) (ref. 39), the stromal carbonic anhydrase LOW CO2 INDUCIBLE GENE B (LCIB)40, the master transcriptional regulator CCM1/CIA5 (refs. 41,42) and the pyrenoid structural protein STARCH GRANULES ABNORMAL 1 (SAGA1) (ref. 43) (Supplementary Table 6).
We observed similar high CO2 rescue of photoautotrophic growth defects for mutants in multiple poorly characterized genes in the light-insensitive cluster, suggesting that many of these genes are new components in the CCM. These genes formed a cluster with SAGA1 (ref. 43), the only previously known CCM gene with enough alleles to be present in the cluster. We named one of these genes, Cre06.g259100, SAGA3 (STARCH GRANULES ABNORMAL FAMILY MEMBER 3) because its protein product shows homology to the two pyrenoid structural proteins SAGA1 and SAGA2 (ref. 44) (Extended Data Fig. 4). Consistent with a role in the CCM, SAGA3 localizes to the pyrenoid45. We also observed this phenotype in mutants lacking the pyrenoid starch sheath-localized protein STARCH BRANCHING ENZYME 3 (SBE3) (ref. 46), suggesting that this enzyme plays a key role in the biogenesis of the pyrenoid starch sheath, a structure surrounding the pyrenoid that was recently shown to be important for pyrenoid function under some conditions47. Our cluster also contains the gene encoding FUZZY ONIONS (FZO)-like (FZL), a dynamin-related membrane remodeling protein involved in thylakoid fusion and light stress; mutants in this gene have pyrenoid shape defects48. Our results suggest that thylakoid organization influences pyrenoid function. The cluster additionally includes genes encoding CLV1 (Cre13.g574000), a predicted voltage-gated chloride channel that we hypothesize is important for regulating the ion balance in support of the CCM or, alternatively, may directly mediate HCO3− transport; a protein containing a Rubisco-binding motif44 (Cre12.g528300, which we named LOW CO2 SENSITIVE 1 (LOCO1)); and a predicted Ser-Thr kinase HIGH LEAF TEMPERATURE 1 (HT1) (Cre02.g111550). The kinase is a promising candidate for a regulator in the CCM, as multiple CCM components are known to be phosphorylated49,50,51, but no kinase had previously been shown to have a CCM phenotype.
Also in this cluster are genes encoding the predicted PYRUVATE DEHYDROGENASE 2 (PDH2) (Cre03.g194200) and the predicted DIHYDROLIPOYL DEHYDROGENASE (DLD2) (Cre01.g016514). We hypothesize that these proteins are part of a glycine decarboxylase complex that functions in photorespiration, a pathway that recovers carbon from the products of the Rubisco oxygenation reaction. PDH2 was found in the pyrenoid proteome52, suggesting the intriguing possibility that glycine decarboxylation may be localized to the pyrenoid, where the released CO2 could be recaptured by Rubisco.
New genes with roles in cilia function
Chlamydomonas cells swim using two motile cilia. To identify mutants with abnormal cilia function, we separated cells based on swimming ability by placing the mutant pool in a vertical column and collecting the supernatant and pellet. In this assay, mutants with altered swimming behavior were enriched in GO terms such as ‘dynein complex’, which comprises motor proteins involved in ciliary motility (Fig. 2e). Eighteen genes were represented by enough alleles to provide high confidence (FDR < 0.3) that their disruption produces a defect in swimming (Fig. 4d). These genes were enriched (P = 0.0075, Fisher’s exact test) in genes encoding proteins found in the Chlamydomonas flagella proteome53. Half of these genes or their orthologs have previously been associated with a cilia-related phenotype in Chlamydomonas and/or mice (Supplementary Table 11).
In our analysis, these 18 genes formed four clusters that appeared to subclassify their function (Fig. 4d). The first cluster is enriched in known regulators of ciliary membrane composition and includes the gene encoding NEPHROCYSTIN-4-LIKE PROTEIN (NPHP4)54; the gene encoding its physical interactor TRANSMEMBRANE PROTEIN 67 (TMEM67, also named MECKEL SYNDROME TYPE 3 (MKS3) in mammals), which has been implicated in photoreceptor intraciliary transport55; and the gene encoding CENTRIOLE PROTEOME PROTEIN 290 (CEP290) (ref. 56). We validated the swimming defect of tmem67 and observed that the mutant has shorter cilia (Extended Data Fig. 5). The poorly annotated gene Cre15.g638551 clusters with these genes, suggesting that it may also regulate ciliary membrane composition.
The second cluster contains genes encoding BARDET-BIEDL SYNDROME 1 PROTEIN 1 (BBS1) and BBS9, components of the Bardet–Biedl syndrome-associated complex that regulates targeting of proteins to cilia57. The poorly annotated gene Cre15.g640502 clustered with these genes, suggesting that it may also play a role in targeting proteins to cilia.
The third cluster contains eight genes, four of which relate to the dynein complex, including the ciliary dynein assembly factor DYNEIN ASSEMBLY LEUCINE-RICH REPEAT PROTEIN (DAU1) (ref. 58,59), OUTER DYNEIN ARM (ODA), DYNEIN ARM INTERMEDIATE CHAIN 1 (DIC1) (ref. 60), DYNEIN HEAVY CHAIN 1 (DHC1) (ref. 61) and TUBULIN-TYROSINE LIGASE 9 (TTLL9), which modulates ciliary beating through the addition of polyglutamate chains to alpha-tubulin62. The predicted thioredoxin peroxidase gene Cre04.g218750 and three poorly annotated genes (Cre07.g338850, Cre01.g012900 and Cre16.g675600) clustered with these genes, suggesting possible roles in dynein assembly or regulation.
The fourth cluster contains three poorly characterized genes, FLAGELLA ASSOCIATED PROTEIN2 (FAP2), FLAGELLA ASSOCIATED PROTEIN 81 (FAP81) and TEF24. The protein encoded by FAP81 (Cre06.g296850) was identified in the Chlamydomonas cilia proteome53, and its human homologue DELETED IN LUNG AND ESOPHAGEAL CANCER PROTEIN 1 (DLEC1) localizes to motile cilia63. We validated the swimming defect of the fap81 mutant and established that it has shorter cilia (Extended Data Fig. 5). The localization to motile cilia in humans and our finding that mutating the encoding gene leads to a ciliary motility defect together suggest the intriguing possibility that impaired cilia motility contributes to certain lung and esophageal cancers.
New genes required for actin cytoskeleton integrity
Our analysis revealed a group of genes that render cells sensitive to LatB when any are mutated (Fig. 4f). LatB binds to monomers of actin, one of the most abundant and conserved proteins in eukaryotic cells, and prevents actin polymerization64 (Fig. 5a). LatB was first discovered as a small molecule that protects the sea sponge Latrunculina magnifica from predation by fish65 and is an example of the chemical warfare that organisms use to defend themselves and compete in nature (Fig. 5b).
Chlamydomonas protects itself against LatB-mediated inhibition of its conventional actin INNER DYNEIN ARM5 (IDA5) by upregulating the highly divergent actin homologue NOVEL ACTIN-LIKE PROTEIN 1 (NAP1), which appears to perform most of the same functions as actin but is resistant to inhibition by LatB66. Upon inhibition of IDA5 by LatB, IDA5 is degraded and the divergent actin NAP1 is expressed66. The expression of NAP1 is dependent on three other known genes, LatB-SENSITIVE1-3 (LAT1–LAT3) (Fig. 5c); thus, mutants lacking any of these four genes are highly sensitive to LatB66.
Our phenotype data revealed three new components of this F-actin homeostasis pathway, which we named LAT5 (encoded by Cre17.g721950), LAT6 (encoded by Cre15.g640101) and LAT7 (encoded by Cre11.g482750). LAT5 and LAT6 clustered together with three previously known components of the pathway (NAP1, LAT2 and LAT3), and disruption of all six genes rendered cells sensitive to LatB (Supplementary Table 6). Mutants in all three new components show a relatively mild phenotype when compared to those mutants in LAT1–LAT3 (Fig. 5d), illustrating the sensitivity of our phenotyping platform.
Ubiquitin proteasome-mediated proteolysis of IDA5 has been hypothesized to drive the degradation of IDA5 and promote the formation of F-NAP1 (ref. 67), but the factors involved were unknown. LAT5 and LAT6 encode predicted subunits of a SKP1, CDC53/CULLIN, F-BOX RECEPTOR (SCF) E3 ubiquitin ligase complex, whose homologues promote the degradation of target proteins68. The disruption of LAT5 and LAT6 impaired degradation of IDA5 upon LatB treatment, suggesting that LAT5 and LAT6 mediate IDA5 degradation (Fig. 5e). LAT7 encodes a predicted importin, and its disruption impairs IDA5 degradation after LatB treatment (Fig. 5e), suggesting that nuclear import is required for IDA5 degradation.
It was previously not clear how broadly conserved this F-actin homeostasis pathway is. We found that the land plant model Arabidopsis has homologs of IDA5, NAP1, LAT3, LAT5, LAT6 and LAT7. We observed that Arabidopsis mutants disrupted in LAT3, LAT5 and LAT6 are sensitive to LatB treatment (Fig. 5f,g), which was not expected a priori, suggesting that this pathway for actin cytoskeleton integrity and the gene functions identified here are conserved in land plants.