By Dr Yakub Sebastian
According to Forbes magazine, by 2025 most of the world’s major companies will collectively generate approximately 180 zettabytes of data. To put this into perspective, one zettabyte is enough to store 36 million years’ worth of high-definition video. Data has thus become the new oil, as companies increasingly monetize it as their main source of revenue.
Enter a new breed of professionals: the data scientists. A data scientist employs a range of statistical and computational skills to analyse and interpret complex data in order to help businesses make decisions.
In 2012, a Harvard Business Review article called it the ‘sexiest job of the 21st century’. Meanwhile, a survey released in Singapore last year revealed that a junior data scientist could earn up to RM 130,000 per annum.
Despite its increasing popularity, data scientist is only one of a growing number of data science-related roles whose holders we may collectively call data professionals. They include data modellers, data analysts and data engineers. According to the Malaysian Digital Economy Corporation (MDEC), Malaysia is one of the few countries in the world that prioritise data science excellence as part of their national strategy. By 2020, the country aims to churn out 20,000 data professionals. This article offers several thoughts on this national goal.
Cultivate data-driven thinking in schools
Given the anticipated future demand for data professionals, we should start cultivating a data-driven mindset in our children early. Data science curricula should begin making their way into our primary and secondary school classrooms. There is already evidence of how this could be done.
Aspiring Minds (www.aspiringminds.com), an India-based employability assessment company, recently piloted a data science education project among 5th to 9th graders in India and the United States, in which students were given half-day, hands-on tutorials on how to perform a full-cycle data science task.
The project adopted a data science teaching design that aimed to maximise student engagement while minimising prerequisite knowledge. The students were fully engaged because they were given highly relatable problem statements, such as predicting whether a particular kid is ‘friend-worthy’. To do this, they learned to construct a friendship dataset from scratch and build a predictive model from the collected data.
All that was required was a basic knowledge of counting, addition, percentages and comparisons, plus basic computer skills. The kids’ responses were overwhelmingly positive. A similar approach could be adopted in our schools.
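To make the classroom exercise described above more concrete, here is a minimal Python sketch of a ‘friend-worthy’ predictor that needs nothing beyond counting, percentages and comparisons. It is a hypothetical illustration only: the dataset, the yes/no questions (`shares_snacks`, `likes_same_games`) and the 50% threshold are all invented here, and this is not the actual Aspiring Minds curriculum, whose exact method the pilot description does not spell out.

```python
# Hypothetical friendship dataset: each row is one classmate, described by
# yes/no answers, plus whether the student considered them a friend.
DATA = [
    {"shares_snacks": True,  "likes_same_games": True,  "friend": True},
    {"shares_snacks": True,  "likes_same_games": False, "friend": True},
    {"shares_snacks": False, "likes_same_games": True,  "friend": True},
    {"shares_snacks": False, "likes_same_games": False, "friend": False},
    {"shares_snacks": True,  "likes_same_games": False, "friend": False},
    {"shares_snacks": False, "likes_same_games": True,  "friend": False},
]

# Hypothetical questions asked about each classmate.
FEATURES = ["shares_snacks", "likes_same_games"]


def percent_friends(rows, feature, value):
    """Count the rows with this answer, then the percentage that were friends."""
    matching = [r for r in rows if r[feature] == value]
    if not matching:
        return 0.0
    friends = sum(1 for r in matching if r["friend"])
    return 100.0 * friends / len(matching)


def train(rows):
    """The 'model' is simply a table of percentages, one per question and answer."""
    return {
        (feature, value): percent_friends(rows, feature, value)
        for feature in FEATURES
        for value in (True, False)
    }


def predict(model, kid, threshold=50.0):
    """Average the percentages that match the kid's answers and compare with a threshold."""
    scores = [model[(feature, kid[feature])] for feature in FEATURES]
    return sum(scores) / len(scores) >= threshold


model = train(DATA)
new_kid = {"shares_snacks": True, "likes_same_games": True}
print(predict(model, new_kid))  # True
```

Each number in the model answers a question a primary school pupil could work out by hand, for example: of the classmates who share snacks, what percentage became friends?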
Becoming a data professional is not just for computer geeks
People from diverse educational backgrounds, not only those with computer science or statistics degrees, should be encouraged to explore careers as data professionals. In fact, some of the most impactful data professionals in history had no computer experience at all. It is insatiable curiosity, a relentless drive to solve problems and strong communication skills that often make a great data scientist.
Florence Nightingale is widely regarded as the founder of modern nursing. But many of us probably do not realise that Nightingale was also a prodigious statistician and a true pioneer of data visualization. At the height of the Crimean War in the 19th century, Nightingale set about analysing soldier mortality data from various British military camps. The findings were striking. Her analysis showed that more British soldiers had died in these camps from preventable infections and diseases than had been killed on the battlefield. Using a pie chart-like visualization now known as the coxcomb (or polar area) diagram, Nightingale showed a strong link between soldiers’ mortality rate and the camps’ hygiene levels. Subsequent improvements to the camps’ sanitation reduced the death rate from 42% to merely 2%, prompting nationwide sanitary reform by the British Government.
Greater industry-academia-government synergies
More intensive industry-academia synergies are needed to train truly market-ready data professionals, and there are already some good examples of them. The European Union-funded EDISON project (www.edison-project.eu) recently released the EDISON Data Science Framework, which provides a comprehensive set of model data science curricula that universities worldwide can adopt. Importantly, the EDISON project serves as an excellent venue for academia-industry dialogue aimed at creating more industry-aligned data science curricula.
Closer to home, the newly established ASEAN Data Analytics Exchange (ADAX) (www.adax.asia) in Kuala Lumpur aims to become a regional hub for collaboration among businesses, academia, governments and start-ups that wish to rapidly adopt data science solutions as an integral part of their operations.
Moreover, its new Data Science Finishing School for Graduates initiative gives fresh university graduates the opportunity to take part in a six-month paid data science internship with various industry partners.
In short, the need to create more data professionals in Malaysia is real. But if we are serious about growing highly capable data professionals for the future, we must focus our efforts on helping our young people develop data-centric thinking early, encouraging multidisciplinary interest in the profession, and building stronger synergies between industry, academia and government.
Dr Yakub Sebastian is a lecturer with the Faculty of Engineering, Computing and Science at Swinburne University of Technology Sarawak Campus. He is contactable via ysebastian@swinburne.edu.my