Materials Genome Initiative accelerates materials discovery: A Q&A with James A. Warren

The Materials Genome Initiative launched in 2011 to discover, manufacture, and deploy advanced materials in the United States twice as fast, at a fraction of the cost.

According to mgi.gov, accelerating discovery and deployment of advanced material systems will be crucial to achieving global competitiveness in the 21st century. Therefore, MGI goals drive policies, resources, and infrastructure to foster and accelerate materials discovery at U.S. institutions.

Since MGI’s launch, the U.S. government has invested hundreds of millions of dollars into the initiative through funding opportunities, unique partnerships, and the creation of new centers and facilities focused on materials discovery. We asked James A. Warren, technical program director of materials genomics at NIST’s Material Measurement Laboratory, about current progress and future directions of MGI. What follows are his responses (JW) to our questions.

James A. Warren

What successes has MGI produced in the five years since it launched? What do you anticipate for its future directions, goals, and accomplishments?

JW: MGI is fostering acceleration of the discovery, design, development, and deployment of new materials by creating a Materials Innovation Infrastructure. This infrastructure is built from linked computational tools, experimental tools, and digital data. The MGI approach was not, in and of itself, a new idea, and already had been successfully implemented many times—with large returns on investment.

For example, IBM Research was kind enough to share the following story about its efforts in the mid-2000s to design polymeric materials using computer-aided methods. The program, which included efforts in synthesis and application (performance), found that the most fruitful approach was a tight collaboration between laboratory and computational work that focused on speeding up cycles of informative experiments. This strategy led to nine U.S. patent filings and 15 scientific publications on unique materials or material processes that we would not have invented without computational modeling—including one case in which the company accurately simulated and predicted the properties of a high-value polymer before it was synthesized and tested.

Many more examples, including Ford Motor Company’s design of a new aluminum alloy for engine applications—and a return on investment of about 7:1—can be found in the National Research Council 2008 report Integrated Computational Materials Engineering.¹ More recently, there has been strong evidence that the alloys used in the Apple Watch were designed in about two years, placing the product design cycle nearly in sync with the materials design cycle—a remarkable accomplishment.² Indeed, such successes are a crucial inspiration for MGI.

However, in general, the means for such rapid development lie out of reach for most materials developers and users. Hence, the successes of MGI will be marked by creation, and subsequent use, of the newly developed MGI infrastructure. In other words, success is marked by the uptake of technologies that reduce barriers to discovery, design, development, and deployment of new materials into manufactured products. Measuring this effect is something of great interest to the effort, but it is highly nontrivial.

The U.S. government has made considerable investments in this infrastructure, and, not surprisingly, we are starting to see returns as the research curve bends from basic to applied. We should note, however, that investments are still ramping up in this space, and returns will be even larger down the road. Ultimately, in some cases, we will be able to trace design of new materials directly to the existence of this infrastructure. However, the infrastructure will ideally be so seamless that its influence is noticeable only by its absence.

Some of the most visible efforts in the MGI involve amassing large amounts of data, which then can be mined using machine learning and related techniques. It is clear from these efforts that we can discover things when sufficiently high-quality data are aggregated around a topic of interest. There are several approaches to amassing materials data: compute it from models with known reliability; conduct high-throughput experiments; and provide the data infrastructure to enable publication and discovery of the vast amounts of materials data being generated worldwide. Below I briefly address each:

Compute it. Perhaps the most familiar example of success within the framework of the MGI involves construction of databases of compound properties computed using density functional theory (usually at 0 K). There are many active, MGI-funded projects (e.g., Materials Project, AFLOWLIB, and OQMD) that have yielded considerable insights and discoveries pointing the way to new materials for thermoelectrics, batteries, and other applications.³
Conduct high-throughput experiments. Many researchers have pursued high-throughput experiments to provide sufficient data to apply machine learning techniques to discover new materials with improved properties.⁴ At this time, these approaches are built around bespoke codes and tools, coupled with access to beam lines, which are a scarce resource. Making high-throughput experimental methods more widely accessible is essential to the ultimate success of MGI, because the existing paucity of experimental data is a critical impediment to materials discovery.⁵
Provide the data infrastructure to enable publication and subsequent discovery of materials data. To meet the infrastructural needs demanded by MGI, NIST has implemented a program that revolves around data and model exchange, data and model quality, and new methods and metrologies that can flow from this infrastructure as it matures. Simultaneously, within its Material Measurement Laboratory, NIST has established an Office of Data and Informatics, with a focus on the provision of the highest-quality reference data and development and dissemination of best practices in data science. At this time, NIST is deploying several tools, including a materials data repository (materialsdata.nist.gov) for depositing data, a materials data curation system (https://mdcs1.nist.gov), and, to facilitate discovery, a materials resource registry (http://mgi.nist.gov/materials-resource-registry). Uptake of these tools, and their counterparts being developed under MGI funding, will be a strong measure of the initial success of MGI.

I would be remiss if I did not note that there are many complementary efforts to NIST’s effort to build this data infrastructure. For instance, the DOE has funded the Predictive Integrated Structural Materials Science (PRISMS) center, based at the University of Michigan (Ann Arbor, Mich.), which is focused on establishing a unique scientific platform that will enable accelerated predictive materials science for structural metals. This platform includes the Materials Commons, a repository for materials research data as well as a host of simulation tools.

Other data infrastructure efforts include the Materials Project’s MPContrib effort, a new addition to the infrastructure long under development at Lawrence Berkeley National Laboratory (Berkeley, Calif.); the Materials Data Facility, supported by the Center for Hierarchical Materials Design at Northwestern University (Evanston, Ill.); and the efforts of Citrine Informatics (Redwood City, Calif.), a small company focused on hosting vast amounts of materials data and providing analytical capabilities to enable materials discovery.

Tell us more about the Materials Resource Registry.

JW: The Materials Resource Registry (MRR) is a considerable programmatic focus for NIST. The MGI is supported by NIST through three primary programmatic thrusts, all joined by the thread of digital data. Specifically, NIST is now devoting considerable effort, in concert with its partners in industry, academia, and government, to develop the tools, standards, and techniques for establishing model and data exchange infrastructure; establishing best practices and new methods for ensuring data and model quality; and developing the analytical tools to enable data-driven materials science.

When NIST first began its efforts to support MGI, it was natural to assume that NIST would focus on its historical primary mission—development and delivery of the highest-quality data and models. What came as a bit of a surprise, at least for me, was the need to develop a significant amount of infrastructure to enable information exchange. And at the top of the pyramid of this data infrastructure is the NIST Materials Resource Registry. One of the most exciting things about the Materials Resource Registry is that we can do it now and do it well. This year will see worldwide deployment of this approach. An instance of the NIST Materials Resource Registry can be found at http://bit.ly/MaterialsResourceRegistry.

The simplest way to think of the Materials Resource Registry is as a yellow pages or phone book, but with a lot more information, enabling in-depth, worldwide searches of available resources. Conceptually, the idea of a registry is simple and already has been of considerable use in fields outside materials science, particularly astronomy.

Indeed, in 2014, realizing it could capitalize on progress made in other disciplines, NIST hired Robert Hanisch, formerly of the Virtual Astronomical Observatory (VAO), as founding director of the Office of Data and Informatics. VAO, working with its counterparts worldwide, achieved something remarkable: a system that enabled discovery and access of astronomical data from a remarkable number of instruments from many countries.⁶ At the top of the VAO infrastructure was a registry, and NIST has transferred this concept to create the NIST Materials Resource Registry, which is powered by an advanced software stack developed by NIST’s Information Technology Laboratory.

The power of the Materials Resource Registry lies in its simplicity. The goal is to create a listing of high-level entities, such as research organizations, data repositories, services, or software packages (not, typically, a specific data set). The aim of the registry is to enable people to find the resource so they can access it at the resource home (e.g., institutional website or domain repository). The Materials Resource Registry does not host materials data, only the metadata describing the resource and, thereby, directs searchers to the resource.

The registry is designed so that entries can be added in minutes by the resource host (e.g., owner or curator), and, if an institution so desires, a local version of the Materials Resource Registry can be instantiated onsite (the source of the software can be found on GitHub). Resources can be added to the registry by the owner of the registry instance or by external participating institutions and their representatives. Usually there is some vetting to ensure the authenticity of the source before an institution will allow an outsider to add to their registry.
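The registry idea described above, metadata-only entries that point searchers to a resource's home, can be sketched in a few lines of Python. This is an illustrative sketch only, not the NIST software: the class names, field names, and example.org URLs are all hypothetical, and the actual registry defines its own, richer metadata schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResourceEntry:
    """Metadata describing one resource; the resource itself lives elsewhere."""
    name: str
    resource_type: str  # e.g. "organization", "repository", "service", "software"
    home_url: str       # where searchers are directed to access the resource
    description: str = ""
    keywords: tuple = ()


@dataclass
class Registry:
    """A hypothetical in-memory registry that stores entries, never data sets."""
    entries: list = field(default_factory=list)

    def add(self, entry: ResourceEntry) -> None:
        self.entries.append(entry)

    def search(self, term: str) -> list:
        """Return entries whose name, description, or keywords mention the term."""
        needle = term.lower()
        return [
            e for e in self.entries
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in k.lower() for k in e.keywords)
        ]


# Hypothetical entries, added by their (imaginary) resource hosts.
registry = Registry()
registry.add(ResourceEntry(
    name="Example Alloy Repository",
    resource_type="repository",
    home_url="https://repository.example.org",
    keywords=("alloys", "mechanical properties"),
))
registry.add(ResourceEntry(
    name="Example Phase-Field Code",
    resource_type="software",
    home_url="https://code.example.org",
    description="Simulation package for microstructure evolution",
))

# A search returns pointers to resource homes, not the data themselves.
hits = registry.search("alloy")
print([hit.home_url for hit in hits])
```

Keeping each entry small is what makes the "added in minutes" goal plausible: a host only has to describe the resource and say where it lives.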

The multiple instances of the Materials Resource Registry, coupled with a protocol such as the Open Archives Initiative Protocol for Metadata Harvesting, enable metadata from each instance of the Materials Resource Registry to be harvested by the other registry instances, yielding global awareness of available resources. Thus, as we gain worldwide acceptance for the concept of the Materials Resource Registry, discoverability of resources will become ever easier. To gain this acceptance, NIST is leading an effort within the Research Data Alliance, an international organization devoted to eliminating barriers to sharing research data.
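To make the harvesting step concrete, here is a minimal Python sketch of one harvesting round under the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). It builds a ListRecords request URL and reads record identifiers and the paging token from a response page. The base URL, the identifiers, and the trimmed sample response are hypothetical; the sketch makes no network call and does not describe how any particular registry instance exposes its endpoint.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for one page of results."""
    if resumption_token:
        # When resuming, the token is sent as the only argument besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from one page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text.strip()
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
        if el.text
    ]
    token_el = root.find(".//oai:ListRecords/oai:resumptionToken", OAI_NS)
    has_token = token_el is not None and token_el.text
    return identifiers, (token_el.text.strip() if has_token else None)


# Hypothetical, trimmed response page (real records also carry a metadata element).
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:registry.example.org:resource-1</identifier>
        <datestamp>2016-01-01</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:registry.example.org:resource-2</identifier>
        <datestamp>2016-01-02</datestamp>
      </header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

identifiers, token = parse_list_records(SAMPLE_PAGE)
next_url = list_records_url("https://registry.example.org/oai", resumption_token=token)
print(identifiers, next_url)
```

A harvesting registry would repeat the request with each returned token until a page arrives without one, then merge the harvested entries into its own listing.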

Although the MGI goal of halving the materials development time is somewhat arbitrary, it was important to set a target so that the initiative could ultimately benchmark its progress against current practices. Ultimately, the raw material of the materials innovation infrastructure is digital data, and the Materials Resource Registry is the means to discover what is out there in terms of the performance of existing materials and in the predictive models needed to discover the next great material. Without this discovery capability, the data remain essentially in the dark—and out of the research work.

What do you think about the challenges of specification standards and the problem of data provenance in constructing ceramic material property databases?

JW: This is, of course, a long-standing challenge, and NIST continues to strive to improve the existing state of affairs. In general, research domains need to develop better practices across their communities, which involves working toward consensus, and then develop the means to enforce these standards. Indeed, for decades NIST has collaborated extensively with ACerS on this problem, yielding products like the Ceramics WebBook.

Ultimately, in my opinion, the challenges can be broken down into a few issues:

What is the existing (sub)culture of data curation? Can we find our own data? Now that we have a full born-digital data lifecycle, should we be doing a better job managing data for our own research?
We need tools that implement standards without too much disruption of current practices. Requiring researchers to modify their workflows is fraught with pitfalls and is an approach unlikely to succeed anytime soon.
A top-down approach, surprisingly, can work well, if we let communities grow standards from the minimal to the exact—top to bottom. This is counterintuitive, but an attempt to develop monolithic standards yields results that are unlikely to be widely adopted because of the enormous costs associated with implementing the standard. This is especially true for process descriptions, such as those that dominate materials fabrication and characterization, where standard workflows are found only in rigid industrial settings, not research labs. NIST relies on consensus-based standards and understands that some level of standardization is often preferable to none. Yet, as I have indicated above, this part of the problem is more about communities and people coming to consensus than specific, externally imposed standards.

So, after these caveats, NIST is approaching this problem in a twofold manner: develop tools to enable the attachment of metadata, such as provenance and other relevant information about the data; and work with communities to develop standards for this metadata, with the understanding that the standards should be ones the community is prepared to adopt (and, so, often will be very “lightweight”). These ideas are manifested in the Materials Data Curation System (https://mdcs1.nist.gov, and the source at https://github.com/usnistgov/MDCS) and the metadata schema library that provides templates for those who choose to either adopt or extend the standards (https://github.com/MDCS-community).

How has the approach to research data changed in the digital age?

JW: Today, all data are essentially born digital, but we have yet to fully capitalize on this reality. As discussed, researchers are unable to easily share their data with colleagues, let alone publish data in a manner that has maximum utility. However, this is not the fundamental problem. Indeed, it is often the case that researchers

Do not, or cannot, save all of their data;
Cannot find their own data, even if they saved it; or
Cannot figure out what their own data mean, even after a relatively short amount of time, let alone reuse the data of a fellow researcher or student.

These issues have nothing to do, per se, with widespread discovery of information, but only with researchers’ ability to perform the process of data discovery on their own data. Acknowledging this problem is the beginning of the process of designing the tools and infrastructure for materials data discovery. This also is why NIST has been investing a great deal of effort in the development of the Materials Data Curation System. With this tool, and others like it, we believe people can significantly improve their own data management, thereby improving the quality and “shelf life” of materials research.

How do you foresee that availability and access to data could change how research is performed, interpreted, and built upon?

JW: As I elucidated above, once a researcher has access to a great deal of data about a problem, a great deal of science can be done. Of course, the traditional modality of science is not mooted by MGI approaches, only enhanced. That is, theories and models must be validated and extended to encompass available experimental data. This is possible only to the extent that such data are available.

In that sense, MGI promises to extend the reach and speed of science and engineering by more tightly integrating experiment and computation, by making the results of each more readily available. Of course, one of the true measures of progress in the sciences is extension of existing models beyond their current range of applicability, or even, more importantly, invalidation of a model in favor of an improved description. The proliferation of data promises to accelerate this central scientific paradigm.

In addition, improved access to and organization of massive amounts of data will reduce the likelihood of duplication of efforts across research domains (although some is always necessary) and also raise the potential for increases in cross-field fertilization. This outcome is not a foregone conclusion, however, because the increases in data also will test our ability to keep up with the proliferation of sources in this data deluge.

Indeed, this is likely a fascinating research topic in itself. Recently we put these ideas to the test through the Materials Science and Engineering Data Challenge, a competition led by the Air Force Research Laboratory, NIST, and NSF to foster new, high-quality, materials research based upon shared research data. The details of this competition can be found at bit.ly/29MRaz4.

Finally, with the coming availability of large amounts of data, and the associated tools to evaluate its quality, we are entering an era where the so-called fourth paradigm of data-driven materials science is firmly within our grasp. Although such approaches necessarily must complement existing approaches, as we move in our understanding from correlation to causation, these methods offer the opportunity to stand on the shoulders of our colleagues, and, thereby, obtain insights that otherwise would be beyond our horizon.⁷,⁸

For more information, contact Warren at james.warren@nist.gov.

Cite this article

A. Gocha, “Materials Genome Initiative accelerates materials discovery: A Q&A with James A. Warren,” Am. Ceram. Soc. Bull. 2016, 95(7): 24–27.

Issue

September 2016

Category

Basic science


Article References

¹Integrated Computational Materials Engineering: A Transformational Discipline for Improved Competitiveness and National Security. National Research Council, 2008. DOI: 10.17226/12199

²F. Lambert. “Elon Musk hires Apple’s alloy expert to lead materials engineering at both Tesla and SpaceX,” http://9to5mac.com/2016/02/24/apple-alloy-kuehmann-musk-tesla-spacex/.

³A. Jain, K.A. Persson, and G. Ceder. “Research update: The materials genome initiative: Data sharing and the impact of collaborative ab initio databases,” APL Mater., 4, 053102 (2016). DOI: 10.1063/1.4944683

⁴A.G. Kusne, T. Gao, A. Mehta, L. Ke, M.C. Nguyen, K.-M. Ho, V. Antropov, C.-Z. Wang, M.J. Kramer, C. Long, and I. Takeuchi. “On-the-fly machine learning for high-throughput experiments: Search for rare-earth-free permanent magnets,” Sci. Rep., 4, 6367 (2014). DOI: 10.1038/srep06367

⁵M.L. Green, I. Takeuchi, and J.R. Hattrick-Simpers. “Applications of high-throughput (combinatorial) methodologies to electronic, magnetic, optical, and energy-related materials,” J. Appl. Phys., 113, 231101 (2013). DOI: 10.1063/1.4803530

⁶M. Demleitner, G. Greene, P. Le Sidaner, and R.L. Plante. “The virtual observatory registry,” Astron. Comput., 7–8, 101–107 (2014). DOI: 10.1016/j.ascom.2014.07.001

⁷C.H. Ward, J.A. Warren, and R.J. Hanisch. “Making materials science and engineering data more valuable research products,” Integr. Mater. Manuf. Innovation, 3, 22 (2014). DOI: 10.1186/s40192-014-0022-8

⁸A. Agrawal and A. Choudhary. “Perspective: Materials informatics and big data: Realization of the ‘fourth paradigm’ of science in materials science,” APL Mater., 4, 053208 (2016). DOI: 10.1063/1.4946894
