Hinoki Togo Limited Company

1103 Bladensburg Rd NE
Washington, DC 20002
USA
202 900 9017

Topics in Data Science: Big Metadata / March 6, 2019 by Admin

Data Science is Metadata Science


Data Science is one of those topics I can never get enough of… even the definition of Data Science given by the Data Science Association resonates with me: “the scientific study of the creation, validation and transformation of data to create meaning.”

Data Science is a compelling topic because of the immense potential hidden in data sets; unlocking that insight can help address our most significant societal challenges. The promise of Data Science is to contribute more fully to the greater good by advancing our knowledge and leading to impactful discoveries. The impact of Big Metadata in the Data Science framework is the subject of this post, the first in a new series called “Topics in Data Science”. Large data repositories generate massive amounts of metadata, enabling big data analytics to leverage technological and methodological advances in data science for the quantitative study of science. This post introduces a definition of Big Metadata in the context of data science and discusses the challenges and possibilities in Big Metadata analytics.


What is Metadata?

Metadata can more universally be thought of as value-added language that serves as an integrated layer in an information system. Metadata is structured data supporting functions associated with an object, an object being any “entity, form, or mode”. Metadata serves as a connection to the lifecycle of the digital object being represented or tracked. While Big Data offers undreamed-of possibilities to find new data-driven solutions, Big Metadata can be perceived as data that encompasses information about the relationships among data, leading to the creation of a structure where data relationships can be explained.
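To make the idea of metadata as a structured layer that records relationships among data more concrete, here is a minimal Python sketch. The `MetadataRecord` class, its field names, the `derived_from` relation, and the sample datasets are all hypothetical, invented for illustration; they do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Structured data describing one digital object (hypothetical schema)."""
    object_id: str
    title: str
    created: str  # ISO 8601 date string
    # Relationships to other objects: relation name -> list of object ids
    relations: dict = field(default_factory=dict)


def related_objects(records, object_id, relation):
    """Return the records linked to `object_id` through `relation`."""
    by_id = {r.object_id: r for r in records}
    source = by_id[object_id]
    return [by_id[i] for i in source.relations.get(relation, []) if i in by_id]


# Two hypothetical datasets; the second records where it came from.
records = [
    MetadataRecord("ds-1", "Raw sensor readings", "2019-01-10"),
    MetadataRecord("ds-2", "Cleaned sensor readings", "2019-02-01",
                   relations={"derived_from": ["ds-1"]}),
]

parents = related_objects(records, "ds-2", "derived_from")
print([p.title for p in parents])
```

The point of the sketch is that the relationship between the two datasets lives in the metadata, not in the data itself, so it can be queried and explained without opening either dataset.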

Smart Data

Data Scientists work with large unstructured data sets, and these data sets are inherently messy, lacking the structures that make them suitable for analytics. To realize maximum value from a data lake, you must be able to ensure data quality and reliability, and make that data smart. Metadata is inherently Smart Data because it provides context and meaning for data, and enables actions that draw on the metadata to enhance the connections that have been made. Smart Data is high quality, trusted data. It is accessible across the enterprise. Smart Data is actionable and can be ingested and understood by humans and/or machines.

Structure is everything

Data Science endeavors rely not only on data, but also on an accurate description of the data - hence metadata.

In the practice of data science, much of the attention is focused on the beautiful visualizations or amazing discoveries made from analyzing large data sets. Little attention is given to the process the data scientist uses to get those results, and specifically the time-consuming process of preparing data. Data preparation accounts for anywhere from 80–90% of the work of data scientists. They spend 60% of their time cleaning and organizing data and 19% of their time collecting data sets, meaning data scientists spend about 79%, or roughly 80%, of their time preparing and cleaning their data for analysis. Detecting data anomalies and ameliorating data entry errors generally involves writing code, an intuitive part of the data exploration and confirmation process.
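As an illustration of the kind of code that anomaly detection involves, here is a minimal Python sketch in which the metadata (a description of each field) drives the checks. The `SCHEMA` dictionary, the field names, and the allowed ranges are assumptions made up for this example, not taken from any real data set.

```python
# Hypothetical metadata: expected type and allowed range for each field.
SCHEMA = {
    "age": {"type": int, "min": 0, "max": 120},
    "temperature_c": {"type": float, "min": -90.0, "max": 60.0},
}


def find_anomalies(rows, schema):
    """Return (row_index, field, reason) for every value that violates the schema."""
    problems = []
    for i, row in enumerate(rows):
        for name, rule in schema.items():
            if name not in row or row[name] is None:
                problems.append((i, name, "missing"))
                continue
            value = row[name]
            if not isinstance(value, rule["type"]):
                problems.append((i, name, "wrong type"))
            elif not rule["min"] <= value <= rule["max"]:
                problems.append((i, name, "out of range"))
    return problems


rows = [
    {"age": 34, "temperature_c": 21.5},
    {"age": 340, "temperature_c": 19.0},   # likely a data entry error
    {"age": "41", "temperature_c": None},  # wrong type and a missing value
]

for problem in find_anomalies(rows, SCHEMA):
    print(problem)
```

Because the rules live in the schema and not in the checking function, the same function can be reused on any data set that comes with an accurate description of its fields.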

David Lyle, VP of Business Transformative Services at Informatica, wrote that “the difference between success and failure is proportional to the investment an organization makes in its metadata management system”.


Big Metadata Management

Large data sets used in data science pose unique data management challenges. Big metadata analytics requires careful design of datasets, with close attention to data structure. A misstep in data preparation may cause a stalled server, a never-ending loop, or a session that is disconnected from the network after a long period of inactivity. Merging is a particular risk: when keys do not match cleanly, the merged dataset can explode out of proportion. Successfully linking or merging data fields hinges on the extent to which metadata conforms to a standardized structure. The largest advantage of aligning a metadata source with a standardized metadata structure is a formalized conceptual definition, which allows a defined metadata interchange across datasets and fosters stable, reproducible results. Big metadata analytics performed on an ad hoc basis without standardized metadata exchange formats is more likely to suffer from mistakes in both conceptual and computational workflows. Elevating the metadata from the underlying storage allows its use in the meta-mining process.
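The merge problem described above can be checked before the merge is run. The following Python sketch, using only the standard library, counts how many rows an inner join would produce and flags duplicated keys. The function names, the `site_id` key, and both sample datasets are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the key values that appear more than once in `rows`."""
    counts = Counter(row[key] for row in rows)
    return {k for k, n in counts.items() if n > 1}


def expected_join_size(left, right, key):
    """Count the rows an inner join on `key` would produce, without building it."""
    right_counts = Counter(row[key] for row in right)
    return sum(right_counts[row[key]] for row in left)


# Hypothetical datasets; "site_id" is meant to be unique in `sites`.
readings = [{"site_id": "A", "value": v} for v in range(1000)]
sites = [
    {"site_id": "A", "name": "Alpha"},
    {"site_id": "A", "name": "alpha "},  # same site, inconsistent entry
    {"site_id": "B", "name": "Beta"},
]

print(duplicate_keys(sites, "site_id"))
print(expected_join_size(readings, sites, "site_id"))
```

Here the check reports site `A` as duplicated and predicts 2000 joined rows from 1000 readings. With more duplicates on both sides the growth is multiplicative, which is how a modest merge can exhaust a server.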





  Tags: Data Science, Metadata, Tagging
Hinoki Togo Limited Company Blog

A technology blog with a focus on innovation and collaboration


Featured Posts

News
U.S. Economic Development Administration Reauthorized by Congress for First Time in 20 Years
about 5 months ago

The U.S. Department of Commerce’s Economic Development Administration (EDA) celebrates its historic reauthorization by Congress, allowing it to continue its legacy of promoting American innovation and competitiveness by providing grants and support to communities across the country. Since 1965, EDA has led some of the nation’s most impactful programs to strengthen public works and infrastructure, job creation and workforce development, disaster recovery, and technology and industry advancement. EDA has not been formally reauthorized since 2004.

NSF Workshops to Identify Educational Requirements of the Future Ocean Technical Workforce
about a year ago

The National Science Foundation (NSF) anticipates increased investments in future ocean observing systems, ocean renewable energy systems, and other blue economy industries. Development of new instrumentation, maintenance of and improvements to equipment related to ocean-based systems, and data management efforts to ensure the quality and accuracy of the data and the analysis of large data sets generated by these systems all require a workforce with specialized training in ocean sciences, engineering, manufacturing, and data science. However, there are few academic programs that prepare students with the skills required for the expanding needs of the ocean technology workforce. Therefore, the NSF Directorate for Engineering (ENG), Division of Engineering Education and Centers (EEC) and Division of Civil, Mechanical and Manufacturing Innovation (CMMI); the Directorate for Technology, Innovation and Partnerships (TIP), Division of Innovation and Technology Ecosystems (ITE); the Directorate for Geosciences, Division of Ocean Sciences (OCE); and the Directorate for Education (EDU), Division of Undergraduate Education (DUE) are encouraging proposals for workshops that will engage industry and academia in discussions that identify skills needed and curriculum changes that are required to prepare students to participate in the current and future ocean technical workforce.

Licensing Technology with NASA
about 2 years ago

NASA develops technology to solve the tough challenges of exploring space, advancing the understanding of our home planet, and improving air transportation. Often, those same inventions have other untapped applications. Through patent licensing, those technologies can be transformed into commercial products and solutions that can give your business that competitive edge.

Administrator Guzman Applauds Passage of Small Business Innovation Research (SBIR) Program Reauthorization
about 2 years ago

Administrator Isabella Casillas Guzman, head of the U.S. Small Business Administration and voice for America’s 33 million small businesses in President Biden’s Cabinet, released the following statement today after the House voted to reauthorize funding for the Small Business Innovation Research (SBIR) program. Reauthorizing the programs enables SBA and our partner federal agencies to advance domestic commercialization of innovative technologies developed through the SBIR/STTR programs. The reauthorization extends the program and critical pilot initiatives, strengthens research security due diligence efforts, expands open topics solicitations, and increases company performance standards.

Enterprise Parts Management System (EPMS)
about 2 years ago

The Department of Defense (DoD) is seeking a prototype that establishes the capability to manage electronic parts across the enterprise. The purpose of this prototype is to grant visibility into the supply chain, enable better supply chain risk management, allow aggregation of demand, improve purchasing power, enable collaborative solutions to obsolescence and other parts related issues, reduce the risk of counterfeit parts, and enable more DoD wide design modernization. EPMS will enable better microelectronics-focused parts selection and management throughout the entire life cycle of an acquisition program. By aggregating part data at the Service and DoD level, the EPMS will provide unprecedented insight into supply chains enterprise wide. Such insight will lead to improvements in traditional and cyber supply chain risk management, hardware assurance, collaborative solutions to obsolescence and other parts related issues, counterfeit prevention, and DoD wide design modernization. In addition, the demand consolidation made possible by EPMS will increase DoD’s purchasing power through economies of scale.

DARPA BRIDGES: Bringing Classified Innovation to Defense and Government Systems
about 2 years ago

The Defense Advanced Research Projects Agency (DARPA) is looking for small business concerns and/or nontraditional defense contractors that do not have facility clearances but are interested in performing classified work for the Department of Defense (DoD). The BRIDGES initiative seeks to help small companies with disruptive ideas gain facility clearances so that they can bring innovation to the DoD and help solve classified technical challenges.

A virtual Q&A session will be held on October 24, 2022, from 1:00 pm to 3:00 pm Eastern Time. Registration for the Q&A session is required by October 19, 2022.
