What do the ‘?’ and ‘=’ symbols mean in the Catalogue of Life taxonomic dataset? If you’ve ever downloaded or browsed this massive global species checklist, you might have seen some species names prefixed by a question mark or an equals sign. At first glance, these symbols can be puzzling, but they’re actually clues to the underlying taxonomic certainty and naming history of each entry. Let’s dive into what these symbols mean, why they’re used, and how they help researchers navigate the complexities of biological classification.
Short answer: In the Catalogue of Life (CoL) dataset, a ‘?’ before a taxon name indicates that the taxonomic status or placement of that name is uncertain or dubious—often meaning the record is unresolved, unaccepted, or its identity is questionable. An ‘=’ before a name marks it as a synonym, meaning that name is not the currently accepted one but is another scientific name historically used for the same species or taxon.
Let’s unpack these definitions and see how they function in practice, why they matter, and how they fit into the bigger picture of global taxonomy.
Understanding the Catalogue of Life and Its Data
The Catalogue of Life is one of the world’s most ambitious biodiversity projects. It seeks to provide “a comprehensive catalog of all known species of organisms on Earth,” drawing on over 160 taxonomic data sources and integrating the expertise of hundreds of scientists worldwide, as described on re3data.org and guides.library.ucla.edu. As of 2021, it lists over two million accepted species names and nearly as many synonyms, with coverage spanning animals, plants, fungi, and microorganisms, according to marinespecies.org and link.springer.com. This enormous dataset helps standardize species names for everything from conservation planning to international trade regulations.
Because taxonomy is a dynamic field—names change, new species are discovered, and old ones are re-examined—the CoL must constantly update its records. To manage this complexity, the dataset uses not just names and identifiers, but also symbols like ‘?’ and ‘=’ that quickly communicate the status of each taxon.
What Does ‘=’ Mean? Synonyms in the Catalogue of Life
The equals sign ‘=’ is used to indicate that a particular scientific name is a synonym, not the currently accepted name for that taxon. In taxonomy, synonyms arise when a species has been described more than once under different names, or when revisions in classification merge previously separate species. As the World Register of Marine Species notes, “a similar number of synonyms, homonyms, and prior combinations” exist alongside accepted species names in CoL, reflecting the ongoing changes and debates in the field.
For example, biology.stackexchange.com describes an entry like “=Halobacteriota Chuvochina et al., 2024 [phylum],” signifying that “Halobacteriota” is not the accepted name for this group but is recognized as a historical or alternative scientific name now considered a synonym. This helps users trace the naming history and avoid confusion when encountering different names for the same organism in older literature or other databases.
This system of synonym tracking is crucial for data integration and comparison across datasets. According to docs.ropensci.org, resolving synonyms ensures that researchers aren’t double-counting species or missing connections between records that use different names. For instance, in their analysis, they note that some bird species in the Breeding Bird Survey dataset resolve to synonyms in CoL, and the equals sign helps clarify which name is currently accepted.
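To make the double-counting point concrete, here is a minimal Python sketch. It assumes you already have a lookup from synonym to accepted name; the `SYNONYM_TO_ACCEPTED` table and the `count_by_accepted_name` helper are hypothetical names invented for this illustration and are not part of any CoL or rOpenSci API. The single mapping shown is the Sittasomus pair from the biology.stackexchange.com discussion.

```python
from collections import Counter

# Hypothetical lookup table (synonym -> accepted name). In practice this
# would be built from the checklist itself; here it is hard-coded.
SYNONYM_TO_ACCEPTED = {
    "Sittasomus aequatorialis": "Sittasomus griseicapillus aequatorialis",
}


def count_by_accepted_name(observed_names):
    """Tally observations after mapping each synonym to its accepted name."""
    counts = Counter()
    for name in observed_names:
        # Names that are not known synonyms are counted under themselves.
        counts[SYNONYM_TO_ACCEPTED.get(name, name)] += 1
    return counts


records = [
    "Sittasomus aequatorialis",
    "Sittasomus griseicapillus aequatorialis",
]
print(count_by_accepted_name(records))
```

Without the lookup, the two records would be tallied as two different taxa; with it, both fall under one accepted name.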
What Does ‘?’ Mean? Taxonomic Uncertainty
The question mark ‘?’ before a taxon name flags uncertainty or doubt about the taxonomic status, identity, or placement of that name. This might mean that the species’ classification is unresolved, it’s not yet accepted by consensus, or there’s insufficient evidence to confidently assign it to a particular group.
On biology.stackexchange.com, examples include “?Sulfobacillaceae Cavalier-Smith & Chao, 2020 [family]” and “?Xylanibacter muris Hitch et al., 2023 [species].” Here, the question mark signals that these taxa are, in some way, problematic—perhaps based on incomplete data, lack of supporting publications, or ongoing debate among taxonomists.
Why does this matter? Taxonomic uncertainty is a constant challenge in biodiversity science. As experts.illinois.edu explains, the process of assembling the Catalogue of Life involves “collaborating with data providers to resolve detected taxonomic conflicts,” including “missing parents, duplicated and homonymic names,” and cases where names are “not yet accepted.” The question mark is a practical way for CoL to indicate to users that a taxon’s standing is provisional or under review.
How These Symbols Fit Into Data Quality and Use
The use of ‘?’ and ‘=’ is not just a quirk of formatting; it is a deliberate way to make data quality and status immediately visible. In a dataset covering over two million species, clarity is critical. A synonym flag (‘=’) prevents misidentification and aids record linkage, while a question mark (‘?’) warns users to treat the entry cautiously, because its taxonomy may change or needs further study.
This approach is echoed in the technical infrastructure of CoL. As described by experts.illinois.edu, the project runs “standardized data integrity checks” to detect issues like “character encoding corruption, non-Latin characters in scientific names, missing parents, duplicated and homonymic names within the GSD and among other GSDs, split taxonomic groups that have been assigned to multiple parent taxa, and other issues.” The symbols help communicate the results of these checks to users.
From a user perspective, these indicators are vital for everything from ecological research to regulatory compliance. For example, when building automated pipelines to reconcile species lists, as discussed on docs.ropensci.org, researchers need to know whether a name is accepted, synonymous, or uncertain to avoid errors in analysis and reporting.
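As a rough illustration of that first pipeline step, the sketch below sorts raw checklist lines into three buckets by their leading symbol. It assumes each line begins with ‘=’, ‘?’, or neither, as in the entries quoted from biology.stackexchange.com; the `triage` function and the bucket labels are made up for this example and are not official CoL terms.

```python
def triage(lines):
    """Split raw checklist lines into buckets keyed by their leading symbol."""
    # Bucket labels are this sketch's own wording, not CoL vocabulary.
    buckets = {"unflagged": [], "synonym": [], "uncertain": []}
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank lines
        if text.startswith("="):
            buckets["synonym"].append(text[1:].strip())
        elif text.startswith("?"):
            buckets["uncertain"].append(text[1:].strip())
        else:
            buckets["unflagged"].append(text)
    return buckets


sample = [
    "=Halobacteriota Chuvochina et al., 2024 [phylum]",
    "?Sulfobacillaceae Cavalier-Smith & Chao, 2020 [family]",
]
for status, names in triage(sample).items():
    print(status, names)
```

A downstream step could then resolve the synonym bucket to accepted names and route the uncertain bucket to manual review instead of silently including it.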
Concrete Examples and Real-World Impact
Consider the following real examples from biology.stackexchange.com:
“=Sittasomus aequatorialis Ridgway, 1891 [species]” tells us that “Sittasomus aequatorialis” is not the accepted species name, but rather a synonym of another, currently accepted name (in this case, “Sittasomus griseicapillus aequatorialis” as confirmed by user mgkrebbs).
“?Acmaeodera gaditana Verdugo, 2016 [species]” indicates that the status of this species is uncertain, perhaps due to lack of sufficient data or unresolved taxonomic placement.
These notations save time for researchers by immediately flagging potential issues, guiding them to the accepted names or alerting them to unresolved taxa that may need closer scrutiny before inclusion in studies or policy documents.
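If you want to pull those pieces apart programmatically, a small parser is enough. The sketch below assumes every entry has the shape seen in the examples above: an optional ‘?’ or ‘=’, then the name with its authorship, then the rank in square brackets. The `Entry` type, `parse_entry` function, and status labels are hypothetical names for this illustration, and the name and authorship are deliberately left unsplit because that boundary cannot be inferred reliably from the text alone.

```python
import re
from typing import NamedTuple

# Assumed entry shape, inferred only from the quoted examples:
#   [optional '?' or '='] name-with-authorship [rank]
ENTRY_RE = re.compile(
    r"^(?P<flag>[?=])?\s*(?P<label>.+?)\s*\[(?P<rank>[^\]]+)\]\s*$"
)

# Status labels are this sketch's own wording, not official CoL terms.
FLAG_MEANING = {"=": "synonym", "?": "uncertain", None: "unflagged"}


class Entry(NamedTuple):
    status: str  # "synonym", "uncertain", or "unflagged"
    label: str   # scientific name plus authorship, left unsplit
    rank: str    # text inside the square brackets


def parse_entry(line: str) -> Entry:
    """Parse one checklist line into its status, label, and rank."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised entry: {line!r}")
    return Entry(
        FLAG_MEANING[match.group("flag")],
        match.group("label"),
        match.group("rank"),
    )


print(parse_entry("=Sittasomus aequatorialis Ridgway, 1891 [species]"))
print(parse_entry("?Acmaeodera gaditana Verdugo, 2016 [species]"))
```

Raising an error on lines that do not fit the assumed shape is a deliberate choice here: it surfaces unexpected formats early rather than guessing at their meaning.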
According to guides.library.ucla.edu, the CoL compiles data from more than 160 taxonomic sources and is a critical reference for “identifying, classifying, and naming organisms.” The inclusion of these symbols makes it easier to cross-reference accepted names and identify entries that require further investigation or expert input.
Why Taxonomic Clarity Matters
The implications of these notations go far beyond database management. As link.springer.com emphasizes, “scientific names for species play many important roles beyond biology,” including underpinning conservation, trade regulation, and biosecurity. When a name is flagged as a synonym, it helps prevent the duplication of conservation efforts or misapplication of regulations. When a name is flagged as uncertain, it signals to policymakers and researchers alike that more work is needed before the entry can be relied upon in decision-making.
Moreover, because biodiversity databases like CoL act as reference points for other global initiatives, such as the Global Biodiversity Information Facility (GBIF) and the IUCN Red List, clear notation of synonyms and uncertainties helps data flow between systems with fewer errors and makes biodiversity metrics more reliable.
Ongoing Challenges and Community Involvement
The Catalogue of Life is a living, evolving project. As marinespecies.org notes, “close to 500 individuals are actively involved in making the COL Checklist possible,” with contributions from organizations like the Smithsonian Institution and the Chinese Academy of Sciences. The process of updating records, resolving synonyms, and clarifying uncertain taxa is ongoing, reflecting new research and discoveries.
Even with these robust systems, challenges remain. The project description on experts.illinois.edu highlights issues such as “character encoding corruption” and “taxonomic conflicts,” which can leave some entries mislabeled or misclassified until they are corrected. User feedback and expert review are essential to maintaining the accuracy and utility of the dataset.
In Summary
The ‘=’ symbol in the Catalogue of Life marks a taxon as a synonym—an alternative name for a species or group that is not currently accepted but is important for historical context and data integration. The ‘?’ symbol denotes uncertainty or doubt about the taxonomic status of a name, signaling that its classification is unresolved or under review.
These symbols are not arbitrary; they are essential tools for managing the complexity and dynamism of global taxonomy. They help researchers, conservationists, and policymakers distinguish between accepted, synonymous, and uncertain names, ensuring that biodiversity data remains accurate, transparent, and actionable.
As the Catalogue of Life continues to grow, now listing over two million species and integrating data from more than 160 sources, these clear, standardized notations will remain crucial for the integrity and usability of the world’s most comprehensive taxonomic resource. Whether you’re building a species list for research or navigating regulatory frameworks, understanding what ‘?’ and ‘=’ mean is key to making sense of the living tree of life.