We’re a group of scientists representing the Human Cell Atlas, an international team effort to create comprehensive reference maps of all human cells—the fundamental units of life—as a basis for understanding human health as well as diagnosing, monitoring, and treating disease. Ask us anything!


Our bodies have 37 trillion cells. For decades, scientists have been sorting them into buckets of different types, such as neurons, skin cells, liver cells and so on. However, we still don't have a comprehensive understanding of the cell types in our bodies. Without this knowledge, it's impossible to know which cells express the genes involved in a particular disease, and thus to fully understand these diseases and develop effective and safe treatments for them.

But the quest for a complete "periodic table of cells" is suddenly within reach. New, powerful sequencing and imaging techniques allow us to determine which genes are expressed in each of tens of millions of individual cells, and we have accompanying big data algorithms to analyze the data they generate. It is now possible to comprehensively map the cells in our bodies.

A large and growing international team of 632 scientists from 47 countries, the Human Cell Atlas consortium, has come together to make this a reality and build an open "Google Maps of the human body," as an ultimate reference for human biology. Because this team will be making its data openly available, researchers worldwide will be able to zoom in on this Google Map to the level of molecules and zoom out to the level of entire tissues and organs. Our team includes physicians, computer scientists, biologists, organ experts, technologists, software engineers, cell biologists and more, and they're collaborating in 238 projects across 22 human tissues.

We’re doing this AMA as part of the National Human Genome Research Institute’s celebration for National DNA Day, and we’d love to answer your questions about our vision, our science, or anything else you’d like to know about the Human Cell Atlas effort. Ask us anything!

Your hosts today are:

Aviv Regev, Ph.D.: Co-chair of the Human Cell Atlas Organizing Committee, Professor of Biology at MIT, Investigator at the Howard Hughes Medical Institute, and Chair of the Faculty at the Broad Institute of MIT and Harvard

Dana Pe'er, Ph.D.: Member of the Human Cell Atlas Organizing Committee; Co-Chair, Analysis Working Group, Human Cell Atlas; Chair, Computational and Systems Biology, Sloan Kettering Institute; Director, Gerry Center for Metastasis and Tumor Ecosystems

Miriam Merad, M.D., Ph.D.: Member of the Human Cell Atlas Organizing Committee, Professor of Oncological Sciences, Professor of Medicine, Hematology and Medical Oncology, Immunology Institute Mount Sinai School of Medicine

Orit Rozenblatt-Rosen, Ph.D.: Lead Scientist at the Broad Institute, Human Cell Atlas, Institute Scientist, Scientific Director of the Klarman Cell Observatory, Associate Director of the Cell Circuits Program

Jane Lee: Project Manager at the Broad Institute, Human Cell Atlas, Administrative Operations Manager, Klarman Cell Observatory and Core Faculty Member and Chair of the Faculty, Broad Institute

Jennifer Rood, Ph.D.: Senior Development Writer at the Broad Institute

Garry Nolan, Ph.D.: Member of the Human Cell Atlas Organizing Committee, Rachford and Carlotta Harris Professor, Microbiology & Immunology, Stanford University School of Medicine

Kerstin Meyer, Ph.D.: Lead Scientist at the Wellcome Sanger Institute, Human Cell Atlas, Principal Staff Scientist, Wellcome Sanger Institute

More info here: https://www.humancellatlas.org/

Thanks for all of these wonderful questions! Even though this Reddit AMA is wrapping up, the Human Cell Atlas is really just getting started. We’d love to keep you updated on our progress, and of course, would always enjoy hearing from all of you as well. Please check us out at https://www.humancellatlas.org/ or on Twitter @humancellatlas. We’ll talk again soon!

Hi guys, thanks for taking the time to do this.

So, by now we have been shown that the more we try to categorize things, the more we realize that in nature categories are not as clear-cut as we humans like, and most times differences are gradual rather than discrete (and even more so in Biology).

During your study, how do you establish where a cell type ends and a new one begins, and how does this relate to other known earlier categorizations that have been made according to function, morphology or other molecular markers?

Along the same line, when establishing cell types, do you take into account non-molecular characteristics of the cells, like morphology or interactions with other cells, into the categorization process? And if so, how?

Finally, as I understand it, a lot of your work will be based on single cell analyses of dissociated cells. How do you manage to reduce or account for the effect that the loss of its physiological environment and interactions has on a cell's gene expression program after dissociation? And to what extent can you expect this data to reflect the real cell type variety of cells in the body?

Thank you again for doing this, I find your pursuit very interesting and I wish you the best of luck! Also, sorry if you go into detail on any of this on your website; I could not read all the info in depth, but I'll try later.


This is a great set of questions.

To your first question, discrete cell types are only one layer of information we’ll have in the HCA. In some cases, it might be more informative to observe how cells change over time or as they move throughout tissues, rather than classifying them into cell types. In our white paper, we describe several ways to ascertain whether a cell type is distinct.

Empowered by the data collected by the HCA, computational biologists (with backgrounds in computer science, mathematics, statistics and physics), together with biologists (including pathologists, molecular and cell biologists, and domain experts) are developing new definitions, abstractions and frameworks to represent and organize cell phenotypes, types and states.

Another way to think about where “one cell begins and another ends” in defining cell types is to start organizing cells according to which other cells they interact with in tissues. You can think of cells as the first level of tissue organization. But the neighboring cells actually help define the function, too. (For instance, a T cell alone in a tissue, or surrounded only by other T cells, might suggest one biology, but a T cell in close proximity to dendritic cells suggests another. So, in that context we are already finding that in a continuum of cells, say, in B cell development, modest changes in surface expression of certain proteins define an address of where in the tissue that cell will be found, and by definition, the other cells that are nearby.) So, we need to stop thinking about cells as individual components isolated from their environment and start to think about cell context; we are multicellular organisms, after all. This, of course, is the fundamental goal of the tissue atlas: understanding architecture and cell-cell relationships in a 3D structure.

Also, the HCA includes a spatial and a cellular branch of equal importance. There is rapid progress in spatial methods like CODEX, IMC and MIBI for proteins and MERFISH, seqFISH and FISSEQ for RNA, and many more are coming. Fortunately, most of these assays can be applied to preserved tissue and so we can first test the methods and then apply them.

And, we fully take into account non-molecular features: in fact, this is why the spatial methods are so important! We want to see in what ways cells can be categorized by their intrinsic (internal) features, which can be the RNA and proteins they express but also their morphology, and then by their extrinsic features (“tell me thy neighbor”) and how these relate to each other. And, we very much hope that we can find in this way the neighborhood and little communities of cells that actually make up the structures in tissues, and how these organize hierarchically into tissue architectures of increasing scale.
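To make the “tell me thy neighbor” idea concrete, here is a minimal sketch in Python. It is only an illustration, not an HCA method: the coordinates, the cell type labels, the radius and the function name `neighbor_composition` are all hypothetical. For each cell it counts the types of the other cells found within a fixed distance, which is one simple way to describe a cell by its extrinsic features.

```python
import math
from collections import Counter

def neighbor_composition(cells, radius):
    """For each cell, count the types of the other cells within `radius`.

    `cells` is a list of (x, y, cell_type) tuples. The coordinates and
    type labels are hypothetical and used for illustration only.
    """
    compositions = []
    for i, (xi, yi, _) in enumerate(cells):
        counts = Counter()
        for j, (xj, yj, type_j) in enumerate(cells):
            # Skip the cell itself; keep neighbors inside the radius.
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                counts[type_j] += 1
        compositions.append(counts)
    return compositions

# Toy tissue section: three cells close together, two far away.
cells = [
    (0.0, 0.0, "T cell"),
    (1.0, 0.0, "dendritic cell"),
    (0.0, 1.0, "T cell"),
    (10.0, 10.0, "T cell"),
    (10.5, 10.0, "T cell"),
]
comps = neighbor_composition(cells, radius=2.0)
for (x, y, cell_type), comp in zip(cells, comps):
    print(cell_type, (x, y), dict(comp))
```

In this toy example the first T cell has a dendritic cell among its neighbors while the fourth T cell is surrounded only by another T cell, so the two would get different neighborhood descriptions even though their own label is the same. The all-pairs loop is fine for a sketch but would be too slow for real tissue sections.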

Now to your question about dissociation. It is true that single cell methods require dissociation and sometimes this has unwanted effects. Some effects can be on the expression of genes; this is observed but does not seem to be the biggest issue. A bigger issue is that different kinds of cells in the same tissue can be more or less sensitive, and so we may get biases in our recovery. For example, GABAergic neurons are much more sensitive than other neurons and glia in a brain sample, and epithelial cells are more sensitive than immune T cells, and so on. One way to address this is single-nucleus RNA-seq, because this can be applied to frozen or lightly fixed samples. Protocols for this are already available, including on our protocols.io repo (https://www.protocols.io/groups/hca). Other members of HCA have come up with ways to do dissociation at cold temperatures, which also helps; you can check it out on our protocols repo as well. Having both the cellular and spatial data helps us find the biases of each method, too, and correct for them.

You can read more in our white paper and also watch our YouTube channel.

Given the recent discovery of a wholly new organ - the interstitium - can we expect to find wholly new major classes of cell types with this initiative? If so, what types in particular?


Awesome question, and yes! We do not just expect it; we are already finding new cells, even in some of the most well studied organ systems such as the immune system. You can check out our YouTube channel for some of the recent highlights!

To give you one example, late last year HCA members from the UK and the US found new varieties of the rarest cells in our blood, called dendritic cells, which are sentinels in the front lines of defense against disease. We didn’t know that these cells existed before! Because they were so rare, they were not seen. One of these new cell types could be important for a very rare kind of cancer, and another could be targeted to make better vaccines. Very recently, HCA members found another new cell type, this time in airways. It happens to express the gene for a well-known genetic disease, which we thought all along was expressed in a completely different cell type. Knowing the relevant cell type will be critical for developing targeted disease therapies. HCA scientists have also discovered new kinds of neurons and so on. And, finding new cells (and what genes they express) can help find new structures in tissues, which is important for identifying new things like the interstitium.

What we do not know is if we will find entirely new major categories of cells, say something as major as epithelial cells that make up many organs. If these are common it’s quite unlikely, but if these are rare, then it could be possible.

One of the cool questions to ponder is also whether, once we have studied enough cells, we could predict ones we have not yet observed, like predicting the presence of an element in the periodic table.

There are advantages to different kinds of RNA-seq approaches, so how do you decide what to use for each experiment? (SMART-seq giving the best data but lower throughput vs. shallow 3’ DE information from 10X) While cost prohibitive, would it be best to do multiple approaches on each cell type to get the “true” representation of the transcriptome?


This is a great question, which we discuss at length in the HCA White Paper https://www.humancellatlas.org/files/HCA_WhitePaper_18Oct2017.pdf

As you said, there is a tradeoff today between number of cells and amount of information per cell. In fact, even if the methods capture both, cost considerations would maintain this tradeoff. And so, HCA decided to take a strategy we call Sky Dive where we start with many cells profiled shallowly (thousands of molecules per cell, collected uniformly from tissue so that rarer cell types are not well characterized) and then use this information to identify regions to look more deeply (and find rare cell types). The draft Immune Cell Atlas (https://preview.data.humancellatlas.org/) will have data collected in such a scheme to help assess these options.

Then, our Analysis Working Group runs jamborees where dozens of labs meet face to face to analyze this data and help develop solutions to analysis challenges. Specifically, a recent Jamboree worked on developing new statistical and computational experimental design methods to address just that: How many cells do we need to sample? Where do we need to sample more? And at what depth? Great progress was made in three days of intense collaborative brainstorming, and the groups continue to address these challenges by working together remotely. Some of the conclusions from the first jamboree (e.g. how to detect an empty droplet?) are already online in the bioRxiv.
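To give a feel for the "how many cells do we need to sample?" question, here is a minimal sketch in Python. It is not the statistical model developed at the jamboree. It assumes, as a simplification, that cells are sampled independently and that a cell type with a given frequency is captured without bias, so the number of cells of that type in a sample follows a binomial distribution. The function names and the example numbers are hypothetical.

```python
import math

def prob_at_least(n_cells, freq, k):
    """P(X >= k) for X ~ Binomial(n_cells, freq), with 0 < freq < 1.

    Terms are computed in log space so large n_cells does not overflow.
    """
    below = 0.0
    for i in range(k):
        log_term = (
            math.lgamma(n_cells + 1)
            - math.lgamma(i + 1)
            - math.lgamma(n_cells - i + 1)
            + i * math.log(freq)
            + (n_cells - i) * math.log1p(-freq)
        )
        below += math.exp(log_term)
    return 1.0 - below

def cells_needed(freq, k, confidence):
    """Smallest sample size that yields at least k cells of a type with
    frequency `freq`, with probability >= `confidence`."""
    lo = hi = k
    # Double the upper bound until the target confidence is reached.
    while prob_at_least(hi, freq, k) < confidence:
        lo, hi = hi, hi * 2
    # Binary search; P(X >= k) only increases with sample size.
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_at_least(mid, freq, k) >= confidence:
            hi = mid
        else:
            lo = mid + 1
    return lo

# A cell type making up 1% of a tissue, seen at least once, 95% of the time.
print(cells_needed(freq=0.01, k=1, confidence=0.95))  # -> 299
```

Under these assumptions, seeing a 1% cell type at least once with 95% confidence takes 299 cells, and asking for more cells of that type (a larger `k`) or a rarer type pushes the number up quickly. Real tissues violate the independence and no-bias assumptions, for example through the dissociation biases discussed in this AMA, so this is a lower bound on the effort rather than a design rule.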

Also, HCA members are mounting two efforts to systematically compare such protocols. One is led by Holger Heyn, and uses a single set of samples, shipped out to many labs. The other, by Joshua Levin, uses five different types of samples (including one from the Heyn team), all tested simultaneously across many protocols. A similar effort is being mounted across spatial techniques in a project called SpaceTx.

During your research, which type of cells have surprised you the most?


HCA scientists have already had some pretty big surprises, only some of which have been published so far, so more to come! But to give you one example, late last year HCA members from the UK and the US found new varieties of the rarest cells in our blood, called dendritic cells, which are sentinels in the front lines of defense against disease. We didn’t know that these cells existed before! Because they were so rare, they were not seen. One of these new cell types could be important for a very rare kind of cancer, and another could be targeted to make better vaccines.

Very recently, HCA members found another new cell type, this time in airways. It happens to express the gene for a well-known genetic disease, which we thought all along was expressed in a completely different cell type. Knowing the relevant cell type will be critical for developing targeted disease therapies.

You can hear more about these discoveries in talks in our YouTube Channel. The latest and greatest is here: https://www.youtube.com/watch?v=xY6MqOOo4Vo&list=PLkef4SGmngdYA47GG9Z_Q00EtIrSAyJxn

At the end of this, if there is such a thing, what are your best case scenario results? I’m sure that there have to literally be thousands, if not millions, of positive ways to utilize your research and I imagine that’s one thing some of you lie in bed at night and think about.


The Human Genome Project (a map of all our genes) fundamentally changed the way we understand, and make progress in, biomedicine. We hope that the Human Cell Atlas (a map of our cells, tissues and organs) will generate an even bigger revolution in the way we understand and are empowered to conduct biomedical research. Specifically, we are hoping for two things. First, we want to change the fundamental way in which we understand biology by finding the biological programs that control how cells, tissues and organs form, and how different cell types relate to their neighbors (forming the next level of organization above the single cell). Second, the HCA would give us both a reference map and a set of navigation tools by which to compare “normal” reference tissue to disease. In that world, our methods will then make their way to clinical practice: in how patients’ blood and biopsies are monitored and interpreted, and in providing the first clues to pursue new drug targets. We do know that there is a big gap between a basic research project to build the reference and the overall long-term impact; a lot of additional work will be required, but that is our goal.

To your question if there is an end for such a project, we do have a specific plan on how to complete each phase. Like with the Human Genome Project before us, our goal is not to study every cell in every person (That won’t work…), but to sample enough to understand what the common features are. Analogous to studies that mapped disease genes after the Human Genome Project, future and parallel projects will study subsets of cells, organs and systems in more individuals and in specific diseases, like different cancers, or Alzheimer’s, or diabetes, or food allergy, or inflammatory bowel disease, and so on.

It is important to remember that there is no “one” map… there is a range of what is “normal” for humans. The first map will be, literally, a patchwork of maps from (probably) hundreds of individuals. Each tissue map will collate the results from samples from several individual humans. While we will at the beginning be looking at the common features that define a tissue’s biology and morphology, the DIFFERENCES are important because they define the range of human normal, the differences among us and the dynamic biology that allows tissues to function despite minor differences from individual to individual.

What are the different conditions under which you're sequencing the cells? What was the most complicated part in the project?


The aim of the Human Cell Atlas team is to profile cells in their normal healthy state. Of course, this is not really possible, since cells exist in tissues and inside the human body. A difficult part of the project is access to healthy human tissues. This is not always possible, as many tissues are only accessible during surgical procedures, and surgical procedures are not done on healthy tissues. We therefore had to come up with substitutes, for example obtaining healthy tissue surrounding cancer lesions or tissue from organ donors, knowing that organ donors receive a lot of medications that can affect cell state.

To access the cells, we take tissue samples and measure with spatial technologies that let us map where the cells live in the tissue. Then, we need to separate out—or dissociate—the cells from the tissue before we can carry out single-cell sequencing experiments. This dissociation of living cells can lead to changes in the way the cells behave, but by carrying out control experiments, we can begin to understand what those changes are and account for them.

We can also use fixed tissue (frozen or paraffin). In this case, there are fewer changes associated with the dissociation, but it is not possible to isolate intact cells. Instead, we study the individual nuclei of cells. This also works really well, but it can be tricky to isolate nuclei from rare cell types. Furthermore, we are aiming to use different techniques to look at the cells. The results obtained from single dissociated cells can be compared to different types of spatial gene or protein technologies that profile individual cells within the tissue context, which is helpful because we can’t usually study as many genes within the tissue context.

By combining results of all these different approaches, we will hopefully understand how the different cell types behave in their normal tissue context. In some studies we are beginning to compare this to their behaviour in diseased tissues or during infections.

To your second question about the most complicated part of the project: Carrying out a single-cell sequencing experiment with human tissues requires a large team of people that each have specialist skills and in many ways, the trickiest aspect is to bring all the right experts together in a coordinated way with a shared language.

The first step is taking tissue from a donor. For tissues that are accessible (e.g., skin, upper airways, etc.), samples can be taken from healthy volunteers, usually by clinicians in a hospital setting. However, when trying to sample internal organs (liver, spleen, etc.) it is not usually possible to take samples from healthy volunteers. For these types of tissues, we have been able to obtain tissues from deceased organ donors, which can still be used for research. The next step involves a team of experimental biologists who carry out the tissue dissociation and the actual sequencing, as well as application of spatial technologies. This requires access to the latest types of sequencing and imaging technologies to carry out the experiment.

After the data is generated, computational biologists help to interpret the information. A complex infrastructure needs to be available to store the data, make it accessible, and then analyze it, collaboratively with domain experts, with tools developed by computational biologists, and integrate with existing data sets.

Once the computational analysis has been carried out, clinicians and biologists with expertise in the relevant organs need to interpret the data and come up with the actual insights from the experiments.

Bringing all these different puzzle pieces together, through collaborative community efforts comprised of people from different disciplines and backgrounds, is a huge challenge, but at the same time it is great fun to interact with experts from many different disciplines. Moreover, the HCA aims to share its data freely, so often the different collaborators not only have different backgrounds, but are also located on different continents!

I'm sure there are future implications and innovations that will result from the completion of this project that we can't predict. That being said, what is your biggest hope and/or wildest possibility that might result from researchers having access to this data in the future?


We hope the project will both transform our basic understanding of biology and have a deep impact on human health in the long term. For basic biology, we hope to build a “periodic table of the cells.” That could allow us to predict the existence of some cell types even before we observe them, or to find the molecular programs that lead to this amazing diversity of cells. We also think there could be new mathematical principles in this data.

In human health, we envision a change in diagnostics, based on very precise information on all the cells and molecules in a blood draw or a biopsy, for example. This would allow us to create something like a new “Complete Blood Count”, or CBC 2.0, where you know all the cells that should be present and can use it as a reference to compare to a new patient. Maybe we can get to a point where machine learning algorithms, trained on this big data, will let us predict all the cells and molecules that are in a patient’s tumor biopsy.

Also, we envision that when geneticists discover a new gene important in a human disease, whether a rare disease like cystic fibrosis or a common disease like asthma or Alzheimer’s, the atlas will tell them where precisely that gene is active. And, we envision that you could use the atlas to find new targets for therapies. Finally, this atlas can serve as a literal map for regenerative medicine (https://report.nih.gov/nihfactsheets/ViewFactSheet.aspx?csid=62).

I have a feeling that biology and medicine are on the verge of an explosion of breakthroughs with the quick advances in single cell sequencing, do you second this feeling?

Of all the questions that until now were always impossible to answer, yet now feel like they are within reach, which one(s) are you most excited about?

PS: on a more personal note, I wrote the loom viewer part of the mousebrain.org atlas. While I'm no longer employed by Sten Linnarsson, I still have some bugs I would like to iron out and features I would like to implement. If you have seen it, I would be very happy to get some feedback on what you like and dislike about it (and yes, I know of the scatter plot labelling bug :P)


These are indeed good times! Last week Francis Collins did an AMA (https://www.reddit.com/r/science/comments/8dn0jo/im_francis_collins_director_of_the_national/) and shared your opinion. There he was asked, “aside from CRISPR, what's the next big thing in genetics?” He answered, “The next big thing may be the ability to do biology on individual single cells. That is starting to happen using technologies that are capable of telling you which genes are on or off in just one cell. Since cells are the unit of life for all organisms, this opens up a window of biological understanding that will have profound consequences.”

Different members of HCA are likely excited by different questions. Some are really eager to find new cells we did not realize existed. Some of us want to find the code of development, i.e., how so many different cells develop from just one. Some are fascinated by the ecosystem or community of cells inside tissue, how they form structure and how they interact to maintain the tissue. It’s cool that we can look at all of these problems at once, and we often find interesting connections between them.

P.S. Loom is so cool, thank you so much for developing it and sharing broadly!

Hello and thanks for doing this AMA. I think this work is very interesting and I'm eager to hear your views.

The scale of this project is clearly unprecedented. In your view, how likely is it to give a truly complete description of the different cell types present in humans? Related questions: How abundant and important are rare cell types? Might factors that (I'm guessing) you have less power to discover at scale actually be important determinants of cell type? For example: localization and time course of expression or regulation of translation. Are all cell types present in all people? To what extent do pathogenic states of cells (e.g. cancer, maybe senescence) constitute cell types in themselves?


These are a lot of excellent questions. We will answer some of them, but also knowing that some will only be addressed when the project is further down the road. Our goal is to achieve profiles of cells up to a defined level of rarity (for types) or speed (for transitions). We discuss it in our white paper and our recent review as well (https://elifesciences.org/articles/27041 and https://www.humancellatlas.org/files/HCA_WhitePaper_18Oct2017.pdf ). We think about it analogously to thinking about human genetic variation: you may not know all the variants segregating in the population, but you want to get to all the variation down to a pre-set threshold, with prespecified confidence. To that predefined level, we aim to be comprehensive, and we are working on statistical models to guide and adapt the process (to sample cells, tissues, organs and individuals). And, we’ve already found rare cell types. (To give you one example, late last year HCA members found new varieties of the rarest cells in our blood, called dendritic cells, which are sentinels in the front lines of defense against disease. We didn’t know that these cells existed before! Because they were so rare, they were not seen. One of these new cell types could be important for a very rare kind of cancer, and another could be targeted to make better vaccines. Very recently, HCA members found another new cell type, this time in airways. It happens to express the gene for a well-known genetic disease, which we thought all along was expressed in a completely different cell type.)

Also, it is often easy to underestimate the power of having a very large number of cells (or tissue sections, etc.): while each may be very noisy, we can infer a pretty robust estimate of the distribution.
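As a rough illustration of why many noisy cells still give a robust estimate, here is a small Python sketch. It assumes independent Gaussian noise around a single true expression level, which is a simplification: real single-cell measurements are sparse counts. The function names and numbers are hypothetical. Under that assumption the uncertainty of the average shrinks with the square root of the number of cells.

```python
import math
import random

def standard_error(noise_sd, n_cells):
    """Standard error of the mean of n_cells independent noisy measurements."""
    return noise_sd / math.sqrt(n_cells)

def simulate_mean(true_level, noise_sd, n_cells, seed=0):
    """Average n_cells simulated measurements with Gaussian noise
    (an assumed noise model, chosen only for simplicity)."""
    rng = random.Random(seed)
    total = sum(rng.gauss(true_level, noise_sd) for _ in range(n_cells))
    return total / n_cells

true_level, noise_sd = 2.0, 1.0
for n_cells in (10, 1000, 100000):
    estimate = simulate_mean(true_level, noise_sd, n_cells)
    print(n_cells, round(estimate, 3), standard_error(noise_sd, n_cells))
```

With noise as large as half the signal in any single cell, ten cells give a shaky average, while a hundred thousand cells pin the mean down to within a few thousandths under this model.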

The features you mention are very interesting. Because many processes in the body are not coordinated with each other, many temporal patterns can be captured by a large number of cells, as long as the process is sufficiently continuous. Of course, if the transition is very rapid, the cells may be rare (though can be enriched with sorting techniques). We hope that some of the amazing new spatial methods (for RNA and proteins) will give sufficient resolution for localization.

As to whether pathogenic cell types (e.g., cancer cells) are cell types themselves, there are multiple ways to think about this. One can consider pathogenic cells as an adulterated version of normal cells. Mathematically, one defines such a cell by its “distance” or the level of difference from a normal cell from which it was derived. For instance, in cancer, pathologists measure the stage of a cancer by the level of its abnormality. A slightly abnormal cell will arrange itself with other cells in a disorganized pattern in the tissue that a pathologist recognizes as early stage cancer (this is called dysplasia). As the cancer progresses (usually meaning additional genomic mutations driving more aggressive cancer) the disorders accumulate. However, we still think of a cancer cell as having a tissue origin — indeed, as having a single cell of origin. So, when one examines the genomic and epigenetic signatures of such a cell, it is possible to trace it to a cell of origin… and thereby measure the distance from normality that defines the disease state.
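One simple way to picture such a “distance” is sketched below in Python. This is an assumption made for illustration, not the measure used by HCA groups or pathologists: each cell is reduced to a short list of expression values for a few hypothetical genes, the normal reference is the average of some normal cells, and the distance is plain Euclidean distance. All names and values are made up.

```python
import math

def centroid(profiles):
    """Average expression profile of a list of cells (one list per cell)."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def distance(profile_a, profile_b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

# Hypothetical expression of three genes in normal cells of one type.
normal_cells = [[5.0, 1.0, 0.0], [4.0, 2.0, 0.0], [6.0, 0.0, 0.0]]
reference = centroid(normal_cells)

# Hypothetical abnormal cells derived from that type.
mildly_abnormal = [5.0, 1.0, 1.0]
strongly_abnormal = [1.0, 1.0, 6.0]

print(distance(mildly_abnormal, reference))
print(distance(strongly_abnormal, reference))
```

The cell that has lost most of the first gene's expression and switched on the third one ends up much farther from the normal reference than the mildly altered cell, which is the intuition behind ordering diseased cells by how far they have drifted from their cell of origin.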

What is the expected time to project completion? Thanks in advance!


The project will proceed in several phases. We expect the first draft to be completed within approximately 5 years from our launch of the data collection phase (October 2017). This draft could have as many as 100M cells, span most major tissues and systems, from healthy donors of both genders, with some geographic and ethnic diversity and some age diversity. We will know how these cells are also organized in the tissue, but will not yet have complete information on entire organs. The deeper characterization of diversity (age, ethnicity, geography) and full atlas of each organ will be in the later phases of the atlas.

Salutations, it is great to know this is available for inquiry.

First question: How might your "Google Maps of the Human Body" project help in finding a way to effectively eradicate diseases like cancer? (i.e., will the results of the project allow one to understand what makes cancer cells different well enough to work out how they can be destroyed more efficiently?)

Second Question: How might the project aid in curtailing certain genetic diseases like Sickle Cell Anemia or Huntington's Disease?

Third Question: What are some pointers that you have for someone such as myself who is interested in studying the cell types for the sake of finding out how to cure disease? Such as how to obtain more lab experience and hands-on understanding in the field, who I should try to get in contact with (specifically in the area of Virginia or just the East Coast of the US), what programs or online clubs I should be looking to join, etc. Based on your experiences of course.


To answer your first question:

Yes! In fact, HCA members across the world have formed a Tumor Cell Atlas effort. They benefit from the methods that HCA develops, both lab methods and computational algorithms that can be applied in tumors. This is important in cancer especially because heterogeneity, or differences between cells, is a major hallmark of cancer. The cancer cells vary from normal cells and from each other because they have different mutations and also different states.

The healthy data collected by the HCA will be essential to interpret and understand the tumor data. To understand disease, one needs to know how it differs from the normal state. For this reason, a comprehensive “normal” atlas is needed. Moreover, our current tumor atlas efforts reveal that tumor cells “reuse” programs from healthy cells (such as those for development, healing, migration) in new contexts to achieve their malignant abilities.

Another major opportunity is that the spatial relationships discovered, i.e., “neighborhoods” of cells, will inform us about the next layer of tissue organization. These cell-cell interactions are a fundamental feature of what defines a tissue’s function.

Moreover, the tumor has many other cell types, including non-cancer cells that both fight and feed the cancer cells. In the last few years, with immunotherapy, it has become clear that sometimes the best therapies for cancer come from targeting these non-cancer cells that live inside the tumor and trying to affect the way they interact with the cancer cells—effectively trying to make the fighter cells fight better or cut off the feeder cells.

In summary, the cell atlas will let us compare the malignant cells to normal cells and see how they changed, and will ALSO let us compare all the non-cancer cells to the same cells in healthy tissue, such as healthy lung compared to lung cancer. This will help us find more genes to target.

What do you guys think about the Seurat batch correction algorithm?


Full disclosure: Rahul Satija is a member of HCA and its Analysis Working Group :)

We assume you are referring to this work: https://www.biorxiv.org/content/early/2017/07/18/164889 (now also published in NBT). We would also point to related work by another HCA member and Analysis Working Group (AWG) co-chair, John Marioni: https://www.biorxiv.org/content/early/2017/07/18/165118

These approaches tackle the challenge of combining datasets across batches, technologies etc. Our community is very excited about this direction, because HCA will include data from multiple batches and techniques, and these pioneering approaches provide a framework for doing so and demonstrate how it’s achievable. We are also excited about the ability to combine data of different modalities. Another Reddit participant asked about combining cell morphology and gene expression data. Such frameworks could be very helpful.

More generally, these are early first attempts to solve one of the HCA’s biggest and most challenging computational questions, and we expect to see much more emerging in this area, with a diversity of approaches. At this very moment, John Marioni, Rahul Satija and 85 additional computational biology groups funded by the Chan Zuckerberg Initiative (CZI) are working together at a retreat to combine the strengths of each group’s method. We hope this will lead to much improved methods for alignment, batch correction and normalization.
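To make the idea of a batch effect concrete, here is a minimal toy sketch in Python. It is not the Seurat method or the approach in either linked preprint, which are considerably more sophisticated; it only shows the simplest possible correction, subtracting each batch's per-gene mean. The function name `center_per_batch` and the toy numbers are hypothetical, chosen for illustration.

```python
from statistics import mean


def center_per_batch(expression, batches):
    """Subtract each batch's per-gene mean from the cells in that batch.

    expression: list of per-cell lists of gene values (cells x genes)
    batches:    list of batch labels, one per cell
    Illustrative only; not how Seurat or the linked methods work.
    """
    n_genes = len(expression[0])

    # Group cell (row) indices by batch label.
    members = {}
    for i, label in enumerate(batches):
        members.setdefault(label, []).append(i)

    corrected = [row[:] for row in expression]
    for idx in members.values():
        for g in range(n_genes):
            batch_mean = mean(expression[i][g] for i in idx)
            for i in idx:
                corrected[i][g] = expression[i][g] - batch_mean
    return corrected


# Toy data (made up): the same two kinds of cell measured in two batches,
# where batch "B" reads 5 units higher for every gene.
expression = [[1.0, 4.0], [3.0, 2.0], [6.0, 9.0], [8.0, 7.0]]
batches = ["A", "A", "B", "B"]

corrected = center_per_batch(expression, batches)
for label, row in zip(batches, corrected):
    print(label, row)
```

After centering, matching cells from the two batches have identical values. This only works here because both toy batches contain the same mix of cells. When batches differ in cell composition, mean-centering distorts the data, which is one reason more careful alignment methods are needed.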

Definitely looking forward to the completion of this project! Thank you for answering questions today.

1) What do we, as a community, need to do with this data? I can easily see this having the potential to be groundbreaking on many levels, but not getting enough attention, training or accessibility to be meaningful to scientists distant from transcriptomic methods.

2) What future steps do you anticipate to introduce human cell atlas information and services similar to it in clinical practice alongside precision medicine?

3) Beyond computational analysis, what experimental validation or replication do you see being the most valuable with the data from the finished Human Cell Atlas?


To answer your first question: This is a great question! First, we are committed to making all data open and accessible. There will be multiple portals that will allow a user to explore the data and ask key questions: Which cells are there? What distinguishes “my” cell of interest from others? Which cell types does my sample map to? What other cells does “my” cell prefer to be in close proximity to in tissues? Where is my gene of interest expressed? When my cell of interest induces genes XYZ, what happens to other cells? And so on. It will take time, but portals with some of these functionalities are already emerging.

Second, we are committed to making the computational tools available openly on top of the (also open) Data Coordination Platform. This means you can access and analyze everyone’s data, and some of the early tasks will be automatically done by the platform. In fact, just today researchers from 85 teams are meeting in California to focus on coordinating such efforts! Tutorials, workshops and other educational outreach will follow.

What is the difference between "cell state" and "cell type"?


That’s the big question….

This is not well-defined and the HCA data already shows how these two concepts are intermingled. At some levels, the distinctions are clear: types are discrete and don’t change much over time, while states are more in flux. There are likely to be cell states that can be seen in multiple cell types. For example, there are particular characteristics of different stages of the cell cycle, and the signatures defining these states can be found in very diverse cell types. Similarly, we expect we’ll identify signatures of cell states associated with aging, starvation, hypoxia and other conditions across cell and tissue types. The data may also tell us about new cell states that we are not aware of yet.

Cell types are generally thought of by investigators as entities that have a given set of functions. A cell type, though, can have many cell states. But like anything that is defined in extremes, somewhere in the middle the nice distinctions may break down, along with our intuitions. It could be that a data-driven definition of cell type or state will highlight different features or mathematical characteristics. We also discussed this in a review paper published at https://elifesciences.org/articles/27041. We believe that after more data is collected and analyzed we will be better poised to provide a more accurate answer to that question, and it is indeed our intent to tackle it.

In addition, questions about cell lineages (relatedness between different cell types and states) may require experimental validation, such as fate mapping models, to directly address questions about cell differentiation or cell progeny.

Howdy and thanks for the AMA!

Will the Human Atlas project have a focus on active and inactive proteins per cell? Or will the project focus solely on gene expression levels?


The atlas is not restricted to RNA, although single-cell RNA-seq gave it a major head start and an enormous push. There are now great methods for multiplex measurements of proteins, which can also distinguish activation states (with appropriate antibodies). For example, CODEX, MIBI, and IMC all allow measurement of dozens of proteins, some in tissue sections and some also in whole mount (3D). The Human Protein Atlas, https://www.proteinatlas.org/, a partner of HCA, will greatly help with such work. We also expect to have measurements of chromatin state, and other key molecular and structural parameters.

As new technologies give us greater and greater resolution (magnification), we will start thinking of cells as collections of protein machines carrying out functions. The cell is a convenient way for an organism to separate functions. Above the level of the cell is the level of cell-cell interactions and tissue function. Below the level of the cell are the molecular machines that drive function. We fully expect that by the time we have finished a first major draft of the HCA, new techniques will allow us to begin thinking about mapping proteins, RNA, chromatin, metabolites, etc. within cells with a similar focus on architecture and spatial organization.

What are some of the most important applications you think can be created with this research?


The Human Cell Atlas will change how we understand, diagnose, monitor and treat disease, both because it will provide a reference for healthy tissue, and because the methods used to create the atlas can and will also be applied in disease. Here are a few examples.

Diagnostics: A blood test today, like a complete blood count, measures only about six cell categories. In the age of the HCA, we will have ways to measure dozens of fine distinctions between cell types and all the genes they express. This means we can diagnose disease earlier and more precisely. Similarly, today’s tumor biopsies are analyzed only for structures and a handful of molecules. In the age of the HCA, we expect that we could identify all the key genes and cells and how they relate to each other. This is rich information, and it will let pathologists and clinicians know much more precisely what the patient has, how they would likely respond to different therapies and provide powerful leads for new drug targets and therapeutic approaches.

For example, the hope is that a gut tissue atlas will inspire clinicians working with patients with inflammatory bowel disease (IBD) to mine gut tissue biopsies prior to therapy and at different time points during therapy to capture all the cell and gene changes that could be relevant to better understanding the disease course, response or resistance to treatment. These atlases will help reclassify human diseases into groups with similar outcomes and responses to drugs. We could imagine a world where the term IBD will be replaced with more useful clinical classifications, leading to a direct impact on clinical care.

Another major area is drug discovery: the atlas -- and the methods we invent -- will let scientists screen potential drugs much faster and more precisely, and also allow us to monitor patients better throughout clinical trials to gain an improved understanding of how the drug acts.

This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.