As the field of big data moves out of the “early adopter” phase and into more mainstream use, many colleges and universities are struggling to balance the challenges it creates with the opportunities it affords. Finding the answer to a question on the Internet has famously been compared to getting a sip of water from a fire hose. The same metaphor applies to big data, where the pursuit of a relatively simple question can bring a deluge of data, and of answers supported by that data, from all sides.
While there are several methods to improve the likelihood of an effective research effort using big data, there is no element as critical to your success as a research team that includes staff with the necessary skills and knowledge for the effort. When assembling your team, the following roles and skill sets are requisite.
Senior Sponsor: The Senior Sponsor is usually a member of the senior academic or administrative staff. This individual is often responsible for the operations of the area that is being researched. This may include academic affairs, student services, enrollment management, faculty administration, institutional research, information technology, etc. The Senior Sponsor is responsible for ensuring that the research process stays true to its intention, represents the strategic needs of the institution, and is reported correctly and without bias to all of the necessary channels.
Data Scientist: In many ways the Data Scientist is the nexus for the entire research process. Data Scientists must have several skills and abilities to accomplish their responsibilities within the team. Since this role provides the primary oversight and direction of the study, the Data Scientist must have a clear understanding of all phases of the research, including its design, acquisition and preparation of data, analysis, interpretation of output, and reporting. Staff in these roles need to pay particular attention to the seams that naturally occur in the “hand-off” between the phases of research. These seams are often prone to error, and the integrity of both the data and the analysis can be compromised there.
Data Scientists have the ultimate responsibility for the fidelity of the process and its output. In areas where the institutional talent is not deep enough to support the needs, the college or university may elect to consider outside experience and expertise.
Subject Matter Expert/Content Representative: The Subject Matter Expert is the team member who works most closely in the area represented in the research. Of all the team members, this individual is “in the trenches” daily and has the greatest knowledge and awareness of the research topic and its context.
Subject Matter Experts are engaged in validating the research plans and questions as they pertain to the area of analysis. They are excellent resources for identifying important internal and external data sources and can often be called upon to judge the quality and completeness of records as part of the data audit. Subject Matter Experts represent the needs and perspective of the consumer of the research. As such, they are in a critical position to inform and shape the format of the research output.
Data Hygienist: The Data Hygienist is responsible for evaluating and determining the suitability of the data for use in analysis. This includes an audit of the accuracy and completeness of the records. The Data Hygienist also prepares the data according to the agreed-upon standards for ingestion and organizes it in the format specified for analysis.
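As a rough illustration of what the completeness portion of such an audit might look like, the minimal Python sketch below reports, for each required field, the share of records that contain a value. The function name, the field names, and the sample student records are all hypothetical and chosen only for illustration; a real audit would follow the standards the team has agreed upon and would also need to check accuracy, which this sketch does not attempt.

```python
def completeness_report(records, required_fields):
    """Return, per field, the share of records with a non-empty value.

    A value counts as missing if the field is absent, None, or an
    empty string. This is an assumed definition for the sketch only.
    """
    records = list(records)
    total = len(records)
    report = {}
    for field in required_fields:
        filled = sum(
            1 for rec in records
            if rec.get(field) not in (None, "")
        )
        # Avoid dividing by zero when there are no records at all.
        report[field] = filled / total if total else 0.0
    return report


# Hypothetical student records; the field names are illustrative only.
sample = [
    {"student_id": "A1", "major": "Biology", "gpa": 3.4},
    {"student_id": "A2", "major": "", "gpa": 3.1},
    {"student_id": "A3", "major": "History", "gpa": None},
    {"student_id": "A4", "major": "Physics", "gpa": 3.9},
]

report = completeness_report(sample, ["student_id", "major", "gpa"])
for field, share in report.items():
    print(f"{field}: {share:.0%} complete")
```

A report like this gives the Data Hygienist and the Subject Matter Expert a shared starting point for deciding whether a field is complete enough to use or needs to be repaired first.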
Data Journalist: The Data Journalist is responsible for the design and composition of the reporting elements of the research. This includes determining the ideal format for presenting clear and actionable output for the identified audience. The Data Journalist often needs to convert technical statistical information into knowledge that can be easily consumed by those without quantitative expertise. The staff member(s) in this position should be comfortable determining the correct and accurate method for graphically representing statistical data. Finally, the Data Journalist ensures that the reporting accurately represents the outcome of the research in a fair and unbiased manner.
IT Team: The IT challenges associated with big data are legion and a subject for another blog, if not a book! The development of the IT subteam that supports big data is a challenge unto itself. Institutions seeking to work with big data must be committed to recruiting and retaining IT staff who are capable of extracting, storing, accessing, aggregating, formatting, and manipulating structured and semi-structured data from disparate sources. The institution will need data engineers, programmers, database managers, systems architects, and related staff with proficiency in coding, modeling, machine learning, database management, information systems management, and related areas. No research effort involving big data will be successful without a skilled and talented IT subteam in place.
While the difficulties in using big data for research are significant, the advantages are considerable. The detail and sheer volume of data now available to your institution have never been matched. When that data is used appropriately, colleges and universities can make decisions informed by analysis of unprecedented depth. But this possibility is unlikely to be realized without a skilled and dedicated research team.
Dr. Rob Sapp, Senior Associate