Photo: Cecilia Earls
Departments establish curriculums; labs invest in new technologies; we admit students, hire faculty, monitor meal plans, and define security protocols. How can we optimize those decisions over the coming years? How do we know if we are meeting our goals? Can we use our data to make better decisions?
I think the answer is yes, but only if we couple the use of state-of-the-art analytical methods with a focused approach to how and when we engage our data to make decisions. Our data strategy must reflect not only our institutional goals, but also the novel ways in which we can now collect and analyze data to attain those goals. Part of my role as a data scientist at Cornell University is to help guide this strategy by establishing a common understanding of, and vocabulary around, the data-driven decision-making process.
A Team Effort
Simply hiring a data scientist does not create a data-driven organization. Identifying and realizing relevant and measurable goals through a well-thought-out data strategy does, and this requires collaboration. It is essential that data scientists partner with four types of stakeholders:
- Visionaries. These are the leaders with a vision of our organization's future. They can identify the areas where informed decision-making would have the greatest impact on achieving our institutional goals. Bottom line: they know what our "big questions" really should be.
- Subject experts. These are members of our community who deeply understand the area chosen for analysis. We rely on them to identify which variables are important. Subject experts can help guide the analysis because they understand the types of change that are truly possible. If we offer a "solution" that cannot be implemented, it is the wrong solution.
- Data experts and archivists. These individuals know where the relevant data are stored and how they can be accessed. This group also includes experts on data quality and how the data have been collected.
- Technology experts. Setting up a secure data ecosystem requires substantial computer expertise and resources. Many data scientists do not have this expertise and need support from those who do.
The Analysis: Machine Learning and Statistical Inference
At their core, supervised and unsupervised machine learning and statistical analysis are simply sets of algorithms used to extract useful information from data. While you can expect your data scientist to choose which algorithm to use, everyone on the team should have a basic understanding of what these algorithms do.
Both supervised machine learning algorithms and traditional statistical inference depend on historical data for either:
- prediction—accurately estimating future outcomes; or
- estimation—determining which variables are related to the outcome, and how and to what degree they are related to the outcome.
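The difference between these two uses can be sketched with a very small example. The following is a minimal illustration, not a recommended analysis: it fits a one-variable least-squares line in plain Python, and the data (weekly tutoring hours and final course scores), the variable names, and the helper functions `fit_line` and `predict` are all invented for this sketch. The same fitted line serves both purposes: its slope is an estimate of how the variable relates to the outcome, and evaluating it at a new value gives a prediction.

```python
# Illustrative only: the numbers below are invented, not real institutional data.
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor: y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Evaluate the fitted line at a new value of the predictor."""
    return intercept + slope * x


# Hypothetical historical records: weekly tutoring hours and final course score.
hours = [0, 1, 2, 3, 4, 5]
scores = [61, 64, 70, 71, 77, 80]

intercept, slope = fit_line(hours, scores)

# Estimation: how, and how strongly, is the variable related to the outcome?
print(f"Estimated change in score per extra hour: {slope:.2f}")

# Prediction: what outcome do we expect for a case we have not yet observed?
print(f"Predicted score at 6 hours: {predict(intercept, slope, 6):.1f}")
```

In practice a data scientist would likely use many variables, check the model's assumptions, and report uncertainty around both numbers; the sketch only shows that prediction and estimation are two questions asked of the same historical data.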
Conclusion
Big data science is gaining purchase in higher education, and our diverse institutions provide exceptionally fertile ground for impactful data-driven decision-making. We are not corporations; we are small, vibrant communities that make decisions every day regarding critical issues such as safety, facilities management, risk management, housing, recruitment, admissions, research support, academic freedom, instruction, campus life, alumni relations, athletics, career services, support services, and healthcare. Each of these components creates independent data stores that, when analyzed collectively, can offer valuable insights for the institution as a whole.
Realizing this potential, however, requires that the entire community of decision makers, data and subject experts, technology experts, and analysts work collaboratively and communicate effectively.
Source: EDUCAUSE Review